Introduction
This section will describe running a simple task on the cluster, using hipercow
. We make some assumptions, common to the R version:
- You are a member of DIDE at Imperial College London
- You have an account on our cluster; you can check this by logging into the web portal with your DIDE credentials. If you cannot log in, please email Wes.
- You are working in a network share that the cluster can see. Ideally this is a project share and not your network home directory, as project shares are much faster and more reliable. See the R documentation for more on this topic, including how to configure this on your machine
- You have some Python code that you would like to run on the cluster, which currently works for you locally.
- You are confident executing commands at the command line (bash or similar). If you are not, please do talk with us as we'd be interested in your workflows.
We also make the assumption that you are OK with some rough edges while we develop this system. Please be prepared to work with us to track down and understand the bugs that you will definitely run into, so that we can make this tool work for people as well as the R package does.
Project layout
You may have an existing project, or you might be starting from scratch. We are not at all prescriptive about how you might structure your files, but we will create a directory hipercow/
at the root of your project, and you must not manually change or delete any file within that directory. It is safe to mix R and Python hipercow
s within the same project but at the moment they are completely unaware of each other's existence despite occupying the same space. It is likely that you will have a pyproject.toml
or a requirements.txt
file at this level, and quite possibly your .git/
directory.
Interaction with hipercow
If you have installed hipercow
successfully, then you will be able to run commands with the hipercow
tool; try running
$ hipercow --help
Usage: hipercow [OPTIONS] COMMAND [ARGS]...
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
dide
driver
environment
init
task
Authentication
Before starting anything, we should sort out your DIDE credentials. You need your DIDE username and password - the password might differ from your ICT password, see our guide to passwords. If unsure, you can check by logging into the web portal.
You can run hipercow dide authenticate
to store credentials in your system keychain.
$ hipercow dide authenticate
# Please enter your DIDE credentials
We need to know your DIDE username and password in order to log you into
the cluster. This will be shared across all projects on this machine, with
the username and password stored securely in your system keychain. You will
have to run this command again on other computers
Your DIDE password may differ from your Imperial password, and in some
cases your username may also differ. If in doubt, perhaps try logging in
at https://mrcdata.dide.ic.ac.uk/hpc" and use the combination that works
for you there.
DIDE username (default: rfitzjoh) >
Using username 'rfitzjoh'
Password:
I am going to to try and log in with your password now.
If this fails we can always try again
Success! I'm saving these into your keyring now so that we can reuse these
when we need to log into the cluster.
At any point you can check credentials by running
$ hipercow dide authenticate check
Fetching credentials
Testing credentials
Success!
and you can clear them by running
$ hipercow dide authenticate clear
Initialisation
At the root of your project, run:
$ hipercow init .
Initialised hipercow at .
which creates the hipercow/
directory and enables all other commands.
Because you will want to submit tasks to the DIDE cluster, we need to configure the dide
driver too:
$ hipercow driver configure dide
Configured hipercow to use 'dide'
Running hello world
Tasks are submitted using hipercow task create
, followed by any shell command. In order to make sure that credentials are correct, we recommend submitting a simple task first, like this:
$ hipercow task create --wait echo hello hipercow world
a182aa2b169c2e04aa0a5d27fff1acaa
Waiting......OK
hello hipercow world
The --wait
option should occur before your command and indicates that hipercow
should wait for the task to complete before returning. The string printed out (a182aa2b169c2e04aa0a5d27fff1acaa
) is the "task id" - every task gets one of these and they are unique.
Running some python code
Running python code requires additional steps; we'll need to set an environment for our task to run in, containing any packages that are needed. If you are in the position where your project does not depend on any packages other than those distributed with python (so no numpy
, matplotlib
, or anything else) you can skip this step.
We need to know what packages you need. There are a couple of ways of configuring this, though probably the easiest is to write a requirements.txt
file suitable to use with pip
. We might have a requirements.txt
file that contains simply:
cowsay
indicating that we need that package installed. Or it could be a bunch of packages and version constraints.
The next step is to indicate that we want a pip
-based environment; this is the default but this command still needs to be run
$ hipercow environment new
Creating environment 'default' using 'pip'
Then we need to install our packages on the cluster, so that tasks that we submit find these packages:
$ hipercow environment provision
Waiting...OK
Actual environment location may have moved due to redirects, links or junctions.
Requested location: "Q:\cluster\testing\hipercow\env\default\contents\venv-windows\Scripts\python.exe"
Actual location: "\\qdrive\homes\rfitzjoh\cluster\testing\hipercow\env\default\contents\venv-windows\Scripts\python.exe"
Obtaining dependency information for cowsay from https://files.pythonhosted.org/packages/f1/13/63c0a02c44024ee16f664e0b36eefeb22d54e93531630bd99e237986f534/cowsay-6.1-py3-none-any.whl.metadata
Downloading cowsay-6.1-py3-none-any.whl.metadata (5.6 kB)
Downloading cowsay-6.1-py3-none-any.whl (25 kB)
Installing collected packages: cowsay
This can take a little while, and we're not really sure why.
- You will see
Waiting
followed by dots untilOK
while the provisioning task queues. - The next lines of text are python creating a new empty virtual environment and complaining about paths (this can be ignored)
- Finally, we see the installation of the
cowsay
package
Now, we can submit tasks that use the cowsay
package:
$ hipercow task create --wait -- cowsay -t hello hipercow
fdfaa803fc22c9f4e05fd93247358671
Waiting....OK
______________
| hello hipercow |
==============
\
\
^__^
(oo)\_______
(__)\ )\/\
||----w |
|| ||
The --
here in the command is (at least currently) required to separate the command between the bits that relate to hipercow
and the bits for your command (here, the -t
would make the tool complain).