Working on Gattaca

gattaca.cs.middlebury.edu (Gattaca) is a 16-core (dual socket) Linux workstation with 256 GB of RAM and two NVIDIA Titan Xp GPUs running Ubuntu 16.04 LTS. This document is a brief guide to successfully using the system.

Getting an Account

Contact Michael Linderman (mlinderman@middlebury.edu) to obtain an account. Please provide your public SSH key (the GitHub documentation has helpful instructions for generating a key pair on different platforms). Your username will be the same as your Middlebury username (mine, for example, is mlinderman), but, at present, this account is distinct from your Middlebury account and will have a distinct password.
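
For example, on OSX or Linux a key pair can typically be generated with the following (the comment string is just a label; send the resulting .pub file, never the private key):

ssh-keygen -t ed25519 -C "you@middlebury.edu"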

Storage

At present Gattaca has approximately 43.6 TB of storage. Storage on Gattaca, including home directories, is not backed up in any way; you are responsible for protecting your data from loss. Gattaca home directories share approximately 300 GB on SSD. Approximately 36 TB is available in /data/projects and 7.6 TB in /data/scratch. Every user has a personal scratch directory at /data/scratch/$USER. /home should only be used for source code and other "small" data; use /data/projects for your larger durable project files, e.g. data sets, and /data/scratch for larger temporary or working files. Note that /tmp is on the small SSD, so if your application generates many temporary files, point its temporary directory to your "scratch" directory. Expect that in the future files more than two weeks old will be routinely expunged from /data/scratch. Both "projects" and "scratch" are striped across two disks in an 8-disk JBOD array.
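
For example, many (though not all) programs honor the TMPDIR environment variable; a minimal sketch of redirecting temporary files to scratch (check which setting your application actually respects):

# create a personal temporary directory on scratch and point TMPDIR at it
mkdir -p /data/scratch/$USER/tmp
export TMPDIR=/data/scratch/$USER/tmp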

Compute

Gattaca is running the Slurm job queueing system, version 15.08.7. At present you are not required to use the queueing system and can access the entire machine directly, but best practice is to use Slurm (and in the future you will be required to do so). Note that because of hyper-threading, Slurm sees each socket as having 16 (virtual) cores.

There are many guides to using Slurm. At present there are no specific partitions to which jobs must be submitted.
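
For example, a minimal batch script might look like the following sketch (the job name, resource requests, and /your/program are placeholders to adapt to your own work):

#!/bin/bash
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --cpus-per-task=4       # request 4 (virtual) cores
#SBATCH --mem=16G               # request 16 GB of RAM

/your/program

Submit the script with sbatch and monitor the queue with squeue.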

Slurm manages the GPUs as generic resources. To schedule GPU jobs, add --gres=gpu:titanxp:1 to your job resource requirements.
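
For example, to run a program on one of the Titan Xp GPUs via Slurm (again using /your/program as a placeholder):

srun --gres=gpu:titanxp:1 /your/program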

Software

Specialty software is available via Modules. To bring various packages and their associated libraries, etc. into your environment, load the associated modules, e.g. module load samtools. You can see the currently available modules with module avail. Please contact Michael Linderman if there is software you think should be installed globally.
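
For example, a typical workflow (using samtools as in the example above):

module avail              # list all available modules
module load samtools      # add samtools to your environment
module list               # show what is currently loaded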

Additional Topics

Transferring Files to and from Gattaca

The simplest approach is to use scp (secure copy), available via the OSX terminal or PuTTY on Windows (or in a variety of GUI applications). Alternatively, you can use tools like sshfs or SFTPNetDrive to mount Gattaca directories locally as a remote disk.
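
For example, to copy a file to your scratch directory and back (substituting your own username, file names, and paths):

scp myfile.txt mlinderman@gattaca.cs.middlebury.edu:/data/scratch/mlinderman/
scp mlinderman@gattaca.cs.middlebury.edu:/data/scratch/mlinderman/results.txt .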

Installing Python packages

You can install Python packages for yourself on Gattaca (for Python 3) via

pip3 install --user <package>

Alternatively, you can use virtualenv to install packages in their own directory, ensuring that they won't interfere with any other installed packages.
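
A minimal sketch, with myenv as a placeholder environment name:

virtualenv -p python3 myenv     # create an isolated Python 3 environment
source myenv/bin/activate       # activate it for this shell session
pip install <package>           # installs into myenv only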

Working with TensorFlow on the GPU

The necessary CUDA tools (version 10.1) are already installed for you and available via the module system, e.g.

module load cuda/10.1

With the module loaded you can install TensorFlow in a virtualenv as described in the TensorFlow documentation. Make sure to install the tensorflow-gpu package (as opposed to the tensorflow package).
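
Putting these steps together, a sketch of one possible setup (tf-env is a placeholder name; consult the TensorFlow documentation for the release matching CUDA 10.1):

module load cuda/10.1           # make the CUDA tools available
virtualenv -p python3 tf-env    # create an isolated environment
source tf-env/bin/activate      # activate it
pip install tensorflow-gpu      # GPU-enabled TensorFlow package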

To control which GPUs you are using, you can use the CUDA_VISIBLE_DEVICES environment variable. For example, launching your program via:

CUDA_VISIBLE_DEVICES=1 /your/program

would use the GPU:1 device. You can use the nvidia-smi command to see the available GPUs and their current utilization. module load cuda will set CUDA_DEVICE_ORDER=PCI_BUS_ID so that the CUDA device IDs are consistent with nvidia-smi.

If you encounter a ResourceExhaustedError, the available GPUs have insufficient memory for your application. Try reducing its memory requirements (e.g. by reducing the batch size).

Using Screen so Jobs Continue to Run after You Log Out

I suggest using the screen or tmux "detachable" terminals. Start a screen session and start running your programs within that session. When you need to log out, detach from the screen; any tasks will continue to run. When you reconnect to Gattaca you can attach to your screen session and pick up where you left off.
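
A typical workflow with screen (mysession is a placeholder name, /your/program a placeholder task):

screen -S mysession      # start a named session
/your/program            # run your long-running task inside it
# detach with Ctrl-a d; the session (and your task) keeps running
screen -r mysession      # reattach after logging back in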

Trouble Connecting Via SSH

There are many possible problems, but a common issue is mismatched usernames between your local computer and Gattaca. When you execute ssh gattaca.cs.middlebury.edu, SSH uses your local username, which is likely not the same as your Middlebury username. Instead, explicitly specify the username, e.g. ssh mlinderman@gattaca.cs.middlebury.edu. On OSX and Linux you can set the username and other options once in your ~/.ssh/config file. For example:

Host *middlebury.edu
        User mlinderman