SAIL Compute Cluster

Overview

The Stanford AI Lab cluster aggregates research compute nodes from various groups within the lab and controls them via a central batch queueing system that coordinates all jobs running on the cluster. The nodes should not be accessed directly, as the scheduler allocates resources such as CPU, memory, and GPU exclusively to each job.

Once you have access to the cluster, you can submit, monitor, and cancel jobs from the headnode, sc.stanford.edu. This machine should not be used for any compute-intensive work; however, you can get a shell on a compute node simply by starting an interactive job. You may also monitor (read-only) your jobs and the status of the cluster using the web-based dashboard at https://sc.stanford.edu.

You can use the cluster by starting batch jobs or interactive jobs. Interactive jobs give you access to a shell on one of the nodes, from which you can execute commands by hand, whereas batch jobs run from a given shell script in the background and automatically terminate when finished.

If you encounter any problems using the cluster, please send us a request via http://support.cs.stanford.edu and be as specific as you can when describing your issue.


Usage Policy

To gain access to the cluster, please have your advisor or one of your research group leaders submit a request via http://support.cs.stanford.edu and state the following: your CS login ID, the name of the advisor you're working with (and cc them on the form), and an estimated access expiration date.

By default, there is no sharing of compute resources between partitions; in other words, only use partition(s) and compute resources from your own group. For collaboration, please present us with written approval from your collaborators and their advisors. If we observe a deliberate attempt at abuse, the incident will be reported and may incur negative consequences.

Access to the headnode, sc.stanford.edu, is only available on the Stanford network or via the Stanford VPN service.

If we have any trouble with your job, we will try to get in touch with you, but we reserve the right to kill your jobs at any time.

If you have questions about the cluster, send us a request at http://support.cs.stanford.edu.


Job Submissions

Use of the cluster is coordinated by a batch queue scheduler, which assigns compute nodes to jobs in an order that depends on various factors, such as the time submitted, the number of nodes requested, and the availability of the resources being requested (e.g., GPU, memory).

There are two basic types of jobs on the cluster: interactive and batch.

Interactive jobs give you access to a shell on one of the nodes, from which you can execute commands by hand, whereas batch jobs run from a given shell script in the background and automatically terminate when finished.

Generally speaking, interactive jobs are used for building, prototyping and testing, while batch jobs are used thereafter.

Batch Jobs

Batch jobs are the preferred way to interact with the cluster, and are useful when you do not need to interact with the shell to perform the desired task. Two clear advantages are that your job is managed automatically after submission, and that placing your setup commands in a shell script lets you efficiently dispatch multiple similar jobs. To start a simple batch job on a partition (the group you work with; see the bottom of the page), ssh into sc and type:

sbatch my_script.sh

There are many parameters you can set depending on your requirements. You can refer to a sample submit script at /sailhome/software/sample-batch.sh.
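
As a rough reference, a minimal batch script might look like the following; the partition name, resource values, output path, and commands are placeholders, so adapt them to your group's configuration (the sample script above is the canonical starting point):

#!/bin/bash
#SBATCH --partition=mypartition       # placeholder: your group's partition
#SBATCH --job-name=myjob
#SBATCH --output=myjob-%j.out         # %j expands to the job ID
#SBATCH --gres=gpu:1                  # omit if you do not need a GPU
#SBATCH --mem=16G
#SBATCH --cpus-per-task=4
#SBATCH --time=1-00:00:00             # 1 day of walltime

# your actual workload goes here
source yourvirtualenvironment
python my_experiment.py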

For further documentation on submitting batch jobs via Slurm, see the online sbatch documentation via SchedMD.

Our friends at the Stanford Research Computing Center, who run the Sherlock cluster on Slurm, also have a wonderful write-up, and most of it applies to us too: Sherlock Cluster

Interactive Jobs

Interactive jobs are useful for compiling and prototyping code intended to run on the cluster, performing one-time tasks, and executing software that requires runtime feedback. To start an interactive job, ssh into sc and type:

srun --partition=mypartition --pty bash

The above will allocate a node in mypartition and drop you into a bash shell. You can also add other parameters as necessary.

srun --partition=mypartition --nodelist=node1 --gres=gpu:1 --pty bash

The above will allocate node1 in mypartition with 1 GPU and drop you into a bash shell.

If you need X11 forwarding, please make sure you have an X server installed (such as XQuartz) and add --x11 to your srun command:

srun --partition=mypartition --nodelist=node1 --gres=gpu:1 --pty --x11 xclock

GPU specifics

Users can request a specific type of GPU or specify a memory constraint if they choose to:

srun --partition=mypartition --gres=gpu:titanx:1 --pty bash

The above will request 1 TitanX GPU from any node in mypartition.

srun --partition=mypartition --gres=gpu:1 --constraint=12G --pty bash

The above will request 1 GPU with 12G of VRAM from any node in mypartition.

Of course this varies from partition to partition depending on their hardware configurations. Please visit https://sc.stanford.edu and click on "Partition" at the top right to see the types of GPU available in each partition. As for constraints, you can refer to the specifications from NVIDIA (1080 Ti = 11G, Titan = 12G, etc.).

For further documentation on the srun command, see the online srun documentation via SchedMD.


Managing Jobs

One tool we have found very useful and installed on the SC cluster is "pestat" (https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat).
It gives you a line-by-line overview of the entire cluster, or of a specific partition, node, or user.

Status of each node on the cluster:

pestat -G

Status of each node within a partition:

pestat -p mypartition -G

Status of a specific node:

pestat -n mynode -G

List nodes that have a job owned by a specific user:

pestat -u myuser -G

You can also use standard Slurm commands. To view a list of all jobs running on the cluster, type:

squeue

You can view detailed information for a specific job by typing:

scontrol show job jobid

To cancel a job you started, type:

scancel jobid

There is also a Slurm web dashboard at https://sc.stanford.edu if you prefer; note that this is only accessible from within the Stanford network or via the Stanford VPN.

For a good comparison between Torque/PBS commands and Slurm, please head to https://www.sdsc.edu/~hocks/FG/PBS.slurm.html
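
For quick reference, a few of the most common equivalents are:

qsub my_script.sh  ->  sbatch my_script.sh
qstat              ->  squeue
qdel jobid         ->  scancel jobid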


Storage

There are several storage options for the SAIL cluster:

Home directory: /sailhome/csid

All sc cluster nodes mount a common network volume for your home directory. This is a good option for submission scripts, outputs, etc.; there is a quota of 20GB for each user.

Scratch Storage via NFS

Each group has its own network storage mounted via autofs (meaning it will mount when you access the path). Space varies from group to group, so ask your group for details or contact us if you are not sure where to store your files.

Dedicated data-transfer node:

scdt.stanford.edu

Since we want to keep resource contention to a minimum, we have a dedicated machine for handling data I/O. If you need to move a large amount of data or run any prolonged I/O operations, please do so on SCDT. The machine is equipped with higher-bandwidth interfaces and mounts all network storage within the cluster, so transfers are often faster here than anywhere else.
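
For example, to copy a local dataset into your group's scratch space through SCDT, something like the following should work from your own machine (the destination path is a placeholder, so ask your group for the actual scratch location, and replace csid with your CS login ID):

rsync -avP ./mydata/ csid@scdt.stanford.edu:/path/to/group/scratch/mydata/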


FAQ

Partition Defaults

There are a number of global defaults you should be aware of (an example of overriding them follows the list):

Memory per job = 4GB; users can specify more via --mem
Cores per job = 2; users can specify more via --cpus-per-task
Walltime = varies by partition; check https://sc.stanford.edu/ (Partition). Most are 7 days; users can specify up to 21 days via --time
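
For example, to start an interactive job that overrides these defaults (the partition name and values are placeholders):

srun --partition=mypartition --mem=32G --cpus-per-task=8 --time=3-00:00:00 --pty bash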

Screen/TMUX

We see many users wrap their interactive jobs in screen or tmux so they can detach and re-attach later. While this is a feasible use case, we want to point out that if there is any network interruption between the headnode (sc) and the compute nodes (and these do happen occasionally), such jobs will be cancelled automatically by Slurm. Jobs submitted via sbatch, on the other hand, can better sustain these kinds of interruptions.
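
For example, instead of keeping a long-running training command alive inside tmux, you can submit it as a batch job (the partition name and script are placeholders):

sbatch --partition=mypartition --time=7-00:00:00 train.sh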

Virtual Environment for Python:

Almost all users use some kind of virtual Python environment, whether virtualenv, Anaconda, Miniconda, etc. We install a small number of default Python packages to get things going, but you are responsible for creating your own environment.
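
As a minimal sketch using virtualenv (assuming Python 3 is available on the node; the environment name and packages are placeholders):

python3 -m venv ~/envs/myenv
source ~/envs/myenv/bin/activate
pip install --upgrade pip
pip install numpy    # install whatever your project needs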

CUDA/CuDNN

At the moment, CUDA 9.0 is the default across the cluster, but each group (partition) can have its own default. Contact us if you think your group is ready for a new version of CUDA; multiple CUDA versions can co-exist, but adding one often requires a GPU driver update, which in turn requires a reboot of all the nodes.
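
To check what a given node currently provides, you can run the standard NVIDIA tools from an interactive job:

nvidia-smi       # shows the driver version and the GPUs on the node
nvcc --version   # shows the CUDA toolkit version, if nvcc is on your PATH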

iPython/Jupyter Notebook:

Do not run IPython/Jupyter notebooks on the headnode (they can cause memory spikes); instead, run them on one of the compute nodes.

ssh sc.stanford.edu
screen
srun -p mypartition --pty bash    # add --gres=gpu:1 if you need a GPU
export XDG_RUNTIME_DIR=""         # important
source yourvirtualenvironment
jupyter-notebook --no-browser --port=8880 --ip='0.0.0.0'

Follow the resulting URL in your browser to open your notebook.

Extra credit: if you do this often, you can easily convert the above into a script and use sbatch to run it in batch mode.
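
A minimal sketch of such a script (the partition name, output path, and environment are placeholders):

#!/bin/bash
#SBATCH --partition=mypartition
#SBATCH --gres=gpu:1                 # omit if you do not need a GPU
#SBATCH --output=jupyter-%j.log      # the notebook URL will appear in this log

export XDG_RUNTIME_DIR=""
source yourvirtualenvironment
jupyter-notebook --no-browser --port=8880 --ip='0.0.0.0'

Submit it with sbatch, then check the output log for the URL to open in your browser.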