View on GitHub

Pwrake

Pwrake: Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.

Download this project as a .zip file Download this project as a tar.gz file

Pwrake

Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.

README in Japanese, GitHub Repository, RubyGems

Features

Installation

Install with RubyGems:

$ gem install pwrake

Or download source tgz/zip and expand, cd to subdirectory and install:

$ ruby setup.rb

In the latter case, you need install Parallel manually. It is required by Pwrake for processor count.

If you use rbenv, your system may fail to find pwrake command after installation:

-bash: pwrake: command not found

In this case, you need the rehash of command paths:

$ rbenv rehash

Usage

Parallel execution using 4 cores at localhost:

$ pwrake -j 4

Parallel execution using all cores at localhost:

$ pwrake -j

Parallel execution using total 2*2 cores at remote 2 hosts:

  1. Share your directory among remote hosts via distributed file system such as NFS, Gfarm.
  2. Allow passphrase-less access via SSH in either way:
    • Add passphrase-less key generated by ssh-keygen. (Be careful)
    • Add passphrase using ssh-add.
  3. Make hosts file in which remote host names and the number of cores are listed:

     $ cat hosts
     host1 2
     host2 2
    
  4. Run pwrake with an option --hostfile or -F:

     $ pwrake -F hosts
    

Use MPI to start remote worker

  1. Setup MPI on your cluster.
  2. Install MPipe gem. (requires mpicc)
  3. Run pwrake-mpi command.

     $ pwrake-mpi -F hosts
    

Options

Pwrake command line options (in addition to Rake option)

-F, --hostfile FILE              [Pw] Read hostnames from FILE
-j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)
-L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY
    --ssh-opt, --ssh-option OPTION
                                 [Pw] Option passed to SSH
    --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm)
    --gfarm                      [Pw] FILESYSTEM=gfarm
-A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)
-S, --disable-steal              [Pw] Turn OFF task steal
-d, --debug                      [Pw] Output Debug messages
    --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML
    --show-conf, --show-config   [Pw] Show Pwrake configuration options
    --report LOGDIR              [Pw] Generate `report.html' (Report of workflow statistics) in LOGDIR and exit.
    --report-image IMAGE_TYPE    [Pw] Gnuplot output format (png,jpg,svg etc.) in report.html.
    --clear-gfarm2fs             [Pw] Clear gfarm2fs mountpoints left after failure.

pwrake_conf.yaml

Task Properties

Example of Rakefile:

desc "ncore=4 allow=ourhost*" # desc has no effect on rule in original Rake, but it is used for task property in Pwrake.
rule ".o" => ".c" do
  sh "..."
end

(1..n).each do |i|
  desc "ncore=2 steal=no" # desc should be inside of loop because it is effective only for the next task.
  file "task#{i}" do
    sh "..."
  end
end

Properties (The leftmost item is default):

ncore=integer     - The number of cores used by this task.
exclusive=no|yes  - Exclusively execute this task in a single node.
allow=hostname    - Allow this host to execute this task. (accepts wild card)
deny=hostname     - Deny this host to execute this task. (accepts wild card)
order=deny,allow|allow,deny - The order of evaluation.
steal=yes|no      - Allow task stealing for this task.
retry=integer     - The number of retry for this task.

Note for Gfarm

Scheduling with Graph Partitioning

Current version

Tested Platform

Acknowledgment

This work is supported by: