Pwrake

Pwrake: Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.

README in Japanese, GitHub Repository, RubyGems

Features

Requirement

Installation

Install with RubyGems:

$ gem install pwrake

Or download the source tgz/zip, extract it, cd into the extracted directory, and install:

$ ruby setup.rb

If you use rbenv, your shell may fail to find the pwrake command after installation:

-bash: pwrake: command not found

In this case, rehash the command paths:

$ rbenv rehash

Usage

Parallel execution using 4 cores on localhost:

$ pwrake -j 4

Parallel execution using all cores on localhost:

$ pwrake -j
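
The option list describes the default for -j as the number of processors. As a rough check of what that might be on your machine, the sketch below prints the count that Ruby's standard library reports. It is an assumption that Pwrake arrives at the same number; this README does not say how the count is detected.

```ruby
require "etc"

# Number of processors as reported by Ruby's standard library.
# Assumption: this matches the default used by `pwrake -j`.
cores = Etc.nprocessors
puts "Ruby reports #{cores} processor(s)"
```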

Parallel execution using a total of 2*2 cores on 2 remote hosts:

  1. Share your working directory among the remote hosts via a distributed file system such as NFS or Gfarm.
  2. Allow passphrase-less access via SSH in either of the following ways:
    • Add a passphrase-less key generated by ssh-keygen. (Be careful)
    • Register the passphrase using ssh-add.
  3. Create a hosts file that lists each remote host name and its number of cores:

     $ cat hosts
     host1 2
     host2 2
    
  4. Run pwrake with the --hostfile (or -F) option:

     $ pwrake -F hosts
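
To sanity-check a hosts file before launching, the following minimal sketch reads the two-column format shown in step 3 and totals the cores. `parse_hosts` is a hypothetical helper and not part of Pwrake. It assumes every non-blank line is exactly "hostname cores", so it does not cover any other syntax the real hostfile reader may accept.

```ruby
# Hypothetical helper (not part of Pwrake): parse "hostname cores" lines.
# Assumes every non-blank line has exactly these two fields.
def parse_hosts(text)
  text.each_line.each_with_object({}) do |line, hosts|
    line = line.strip
    next if line.empty?            # skip blank lines
    name, cores = line.split
    hosts[name] = Integer(cores)   # raises if the core count is not a number
  end
end

hosts = parse_hosts("host1 2\nhost2 2\n")
total = hosts.values.sum
puts "#{hosts.size} hosts, #{total} cores in total"
```

For a real file, replace the string literal with `File.read("hosts")`.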
    

Substitute MPI for SSH to start remote workers (Experimental)

  1. Set up MPI on your cluster.
  2. Install the MPipe gem. (requires mpicc)
  3. Run the pwrake-mpi command:

     $ pwrake-mpi -F hosts
    

Options

Pwrake command-line options (in addition to the Rake options)

-F, --hostfile FILE              [Pw] Read hostnames from FILE
-j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)
-L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY
    --ssh-opt, --ssh-option OPTION
                                 [Pw] Option passed to SSH
    --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm2fs)
    --gfarm                      [Pw] (obsolete; Start pwrake on Gfarm FS)
-A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)
-S, --disable-steal              [Pw] Turn OFF task steal
-d, --debug                      [Pw] Output Debug messages
    --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML
    --show-conf, --show-config   [Pw] Show Pwrake configuration options
    --report LOGDIR              [Pw] Generate `report.html' (Report of workflow statistics) in LOGDIR and exit.
    --report-image IMAGE_TYPE    [Pw] Gnuplot output format (png,jpg,svg etc.) in report.html.
    --clear-gfarm2fs             [Pw] Clear gfarm2fs mountpoints left after failure.

pwrake_conf.yaml

Task Properties

Example of Rakefile:

desc "ncore=4 allow=ourhost*" # In plain Rake, desc has no effect on a rule; Pwrake uses it to set task properties.
rule ".o" => ".c" do
  sh "..."
end

(1..n).each do |i|
  desc "ncore=2 steal=no" # Keep desc inside the loop: it applies only to the next task.
  file "task#{i}" do
    sh "..."
  end
end

Properties (the leftmost value is the default):

ncore=integer|rational - The number of cores used by this task.
exclusive=no|yes       - Exclusively execute this task in a single node.
reserve=no|yes         - Gives higher priority to this task if ncore>1. (reserve a host)
allow=hostname         - Allow this host to execute this task. (accepts wildcards)
deny=hostname          - Forbid this host from executing this task. (accepts wildcards)
order=deny,allow|allow,deny - The order of evaluation.
steal=yes|no           - Allow task stealing for this task.
retry=integer          - The number of retries for this task.
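
To illustrate how a property string such as "ncore=4 allow=ourhost*" breaks down into key/value pairs, here is a minimal sketch. `parse_properties` and `host_allowed?` are hypothetical helpers, not Pwrake's own code. The wildcard check uses Ruby's `File.fnmatch`, on the assumption that Pwrake's wildcards behave like shell globs. The deny and order properties are left out.

```ruby
# Hypothetical helpers for illustration only; Pwrake's own parsing may differ.

# Split a desc string such as "ncore=4 allow=ourhost*" into a Hash of strings.
def parse_properties(desc)
  desc.scan(/(\w+)=(\S+)/).to_h
end

# True when no allow= pattern is set or the host matches it.
# Assumption: wildcards follow shell-glob rules (File.fnmatch).
def host_allowed?(props, host)
  pattern = props["allow"]
  pattern.nil? || File.fnmatch(pattern, host)
end

props = parse_properties("ncore=4 allow=ourhost*")
p props
p host_allowed?(props, "ourhost01")
p host_allowed?(props, "otherhost")
```

With the property string from the Rakefile example, a host named ourhost01 passes the check and otherhost does not.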

Note for Gfarm

Scheduling with Graph Partitioning

Publications

Acknowledgment

This work is supported by: