Using Linux Clusters

Logging In

In order to log into the clusters, you must first request access. To do so, send an email to consultants@pdx.edu. Once you have access, you can connect using ssh and your Odin credentials to either rocks.research.pdx.edu or gravel.research.pdx.edu.

Dispatching Jobs

The Linux clusters are intended to run MPI programs. We currently have both OpenMPI and MPICH2 installed on both rocks and gravel. Torque manages the clusters' resources, and Maui schedules jobs. To use the resource manager and scheduler, you must start an MPI program from a job script.

Using OpenMPI

Assuming you have a Hello World MPI program called hellompi (a minimal sketch of such a program appears after the script below), you can use a script similar to this one to execute it on 20 nodes:

  • #!/bin/bash
    #PBS -l nodes=20
    source /opt/torque/etc/openmpi-setup.sh
    mpirun hellompi
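
The script above assumes a compiled MPI executable named hellompi in the directory the job starts in. As a rough sketch only (the file name hellompi.c and the mpicc compiler wrapper invocation are assumptions for illustration, not site requirements), a minimal hellompi in C could look like the following, built with something like "mpicc -o hellompi hellompi.c":

  • /* hellompi.c -- minimal MPI "Hello World" sketch (illustrative only). */
    #include <mpi.h>
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);               /* start the MPI runtime           */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank (0..size-1) */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes       */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();                       /* shut down the MPI runtime       */
        return 0;
    }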

Here is another example, this time running on 4 cores on each of 10 nodes for a total of 40 processes:

  • #!/bin/bash
    #PBS -l nodes=10:ppn=4
    source /opt/torque/etc/openmpi-setup.sh
    mpirun hellompi
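
To confirm how a nodes/ppn request spreads processes across machines, a small variation on the hello program can report each rank's host name. This is only an illustrative sketch (the file name hellompi-hosts.c is made up):

  • /* hellompi-hosts.c -- illustrative: show which node each rank runs on. */
    #include <mpi.h>
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);   /* name of the node running this rank */
        printf("Rank %d of %d on %s\n", rank, size, host);
        MPI_Finalize();
        return 0;
    }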

Here is a more complex example. It requests 6 cores on the host "compute-1-3" and 8 cores on the host "compute-0-5", with each process needing 1 gigabyte of memory and running at a niceness of 10; standard output goes to the file "hello.out", standard error goes to the file "hello.err", and the job is named "Hello":

  • #!/bin/bash
    #PBS -l nodes=compute-1-3:ppn=6+compute-0-5:ppn=8,pmem=1gb,nice=10
    #PBS -o hello.out
    #PBS -e hello.err
    #PBS -N Hello
    source /opt/torque/etc/openmpi-setup.sh
    mpirun hellompi
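
The -o and -e directives determine where the job's standard output and standard error end up, so anything the MPI processes print is collected into those files. As a hedged illustration (hello-io.c is a made-up name, not part of the example above), output written like this would land in hello.out and hello.err:

  • /* hello-io.c -- illustrative: stdout goes to the #PBS -o file, stderr to -e. */
    #include <mpi.h>
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Rank %d: normal output (collected into hello.out)\n", rank);
        if (rank == 0)
            fprintf(stderr, "Rank 0: diagnostic output (collected into hello.err)\n");
        MPI_Finalize();
        return 0;
    }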

If you want your program to run on every core in the cluster, there is a special queue for that called "full". Here is a script specifying the "full" queue:

  • #!/bin/bash
    #PBS -N BigJob
    #PBS -q full
    source /opt/torque/etc/openmpi-setup.sh
    mpirun hellompi
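
Since the "full" queue hands the job however many cores the cluster currently has, programs run this way should divide their work based on the communicator size at run time rather than a hard-coded process count. A hedged sketch of that pattern follows (the harmonic-sum workload is made up purely for illustration):

  • /* sumpi.c -- illustrative: split a fixed workload across whatever number */
    /* of processes the queue provides, then combine the results on rank 0.   */
    #include <mpi.h>
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        const long N = 100000000L;            /* made-up total workload */
        int rank, size;
        double local = 0.0, total = 0.0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (long i = rank; i < N; i += size) /* each rank sums its own slice */
            local += 1.0 / (double)(i + 1);
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("Sum over %d processes: %f\n", size, total);
        MPI_Finalize();
        return 0;
    }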

Once you have created the script, you can submit it to the job scheduler with the command "qsub myscript".

Using MPICH2

In order to use MPICH2 with Torque and Maui on our clusters, you must first alter your PATH variable. Add this line to the end of the '.bashrc' file in your home directory: "PATH=/opt/mpiexec/bin:/opt/mpich2/bin:$PATH"

Now, to run an MPI program called ring on every core in the cluster, create a script called 'ring.q' -- this file can have any name you wish -- with the following contents (a sketch of a typical ring program follows the script):

  • #!/bin/bash
    #PBS -N ring
    #PBS -q full
    mpiexec ring
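
The ring executable is assumed to already exist in the directory the job starts in. For illustration only, a typical ring program passes a token around all of the ranks; a hedged sketch in C (assuming the job runs with at least two processes) might look like this, built with something like "mpicc -o ring ring.c":

  • /* ring.c -- illustrative: pass a token once around a ring of ranks. */
    #include <mpi.h>
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        int rank, size, token;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int next = (rank + 1) % size;          /* neighbor to send to      */
        int prev = (rank - 1 + size) % size;   /* neighbor to receive from */
        if (rank == 0) {
            token = 0;
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Token came back to rank 0 with value %d\n", token);
        } else {
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            token += 1;                        /* each rank increments the token */
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }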

If you only want ring to run on 1 core on each of 4 nodes and want to be sure it runs in a specific directory, you can use a script like this one:

  • #!/bin/bash
    #PBS -N small-ring
    #PBS -l nodes=4
    cd /home/username/sub/directory
    mpiexec ring

The script can be submitted to the job scheduler with the command "qsub ring.q".

Monitoring Jobs

You can monitor the status of gravel and rocks through their web interfaces. You can also use the command line with the following commands: "showq", "qstat", and "checkjob". The first lists all jobs along with their job IDs, the second can optionally be given a job ID to monitor, and the last requires a job ID to check.

Deleting Jobs

To delete one of your currently running jobs, look up the job ID with "showq" and then delete the job with "qdel <ID>". Deleting a job may take several seconds to complete.

Further Resources

Contact Academic & Research Computing for additional assistance.