Reference

hipercow.root

Interact with the hipercow root.

OptionalRoot = None | str | Path | Root module-attribute

Optional root type, for user-facing functions.

Represents different inputs to the user-facing functions in hipercow, recognising that most of the time the working directory will be within a hipercow environment. The possible types represent different possibilities:

  • None: search for the root from the current directory (like git does)
  • str: the name of a path to search from
  • Path: a pathlib.Path object representing the path to search from
  • Root: a root previously opened with open_root
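The None case can be pictured as an upward search for the hipercow/ directory that init creates. The following is a minimal sketch of that idea using only the standard library; find_root_dir and its marker argument are hypothetical names chosen for illustration, and this is not necessarily how hipercow itself performs the search.

```python
from __future__ import annotations

from pathlib import Path


def find_root_dir(start: str | Path | None = None, marker: str = "hipercow") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found.

    Hypothetical helper: it mirrors the "search like git does" idea only.
    """
    here = (Path.cwd() if start is None else Path(start)).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (here, *here.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory found at or above {here}")
```

Accepting None, a str or a Path in one function is the same convenience that OptionalRoot offers to the user-facing functions.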

init(path)

Initialise a hipercow root.

Sets up a new hipercow root at path, creating the directory <path>/hipercow/ which will contain all of hipercow's files. It is safe to re-initialise an already-created hipercow root (in which case nothing happens) and safe to initialise a Python hipercow root at the same location as an R one, though at the moment they do not interact.

Parameters:

  • path (str | Path) –

    The path to the project root

Returns:

  • None

    Nothing, called for side effects only.

open_root(path=None)

Open a hipercow root.

Locate and validate a hipercow root, converting an OptionalRoot type into a real Root object. This function is used in most user-facing functions in hipercow, but you can call it yourself to validate the root early.

Parameters:

  • path (OptionalRoot, default: None ) –

    A path to the root to open, a Root, or None.

Returns:

  • Root

    The opened Root object.
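As a usage sketch, init and open_root can be combined to set up a project and validate it once, so that later calls can reuse the opened Root. The wrapper name prepare_root is hypothetical; only init and open_root come from this reference.

```python
from __future__ import annotations

from pathlib import Path


def prepare_root(path: str | Path):
    """Initialise a root at `path` (harmless if it already exists) and open it."""
    # Imported lazily so the sketch can be read without hipercow installed.
    from hipercow.root import init, open_root

    init(path)
    # Validate early; the returned Root can be passed as `root=` elsewhere.
    return open_root(path)
```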

hipercow.configure

configure(name, *, root=None, **kwargs)

Configure a driver.

Configures a hipercow root to use a driver.

Parameters:

  • name (str) –

    The name of the driver. This will be dide unless you are developing hipercow itself :)

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

  • **kwargs (Any, default: {} ) –

    Arguments passed to, and supported by, your driver.

Returns:

  • None

    Nothing, called for side effects only.

unconfigure(name, root=None)

Unconfigure (remove) a driver.

Parameters:

  • name (str) –

    The name of the driver. This will be dide unless you are developing hipercow itself :)

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • None

    Nothing, called for side effects only.

hipercow.task

Functions for interacting with tasks.

TaskStatus

Bases: Flag

Status of a task.

Tasks move from CREATED to SUBMITTED to RUNNING to one of SUCCESS or FAILURE. In addition a task might be CANCELLED (this could happen from CREATED, SUBMITTED or RUNNING) or might be MISSING if it does not exist.

A runnable task is one that we could use task_eval with; it might be CREATED or SUBMITTED.

A terminal task is one that has reached the latest state it will reach, and is SUCCESS, FAILURE or CANCELLED.

is_runnable()

Check if a status implies a task can be run.

is_terminal()

Check if a status implies a task is completed.
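To make the runnable and terminal groupings concrete, here is a small stand-in built on enum.Flag. SketchStatus is a hypothetical model that follows the description above; the real TaskStatus may use different values or internals.

```python
from enum import Flag, auto


class SketchStatus(Flag):
    """Illustrative model of the statuses described above (not hipercow's class)."""

    CREATED = auto()
    SUBMITTED = auto()
    RUNNING = auto()
    SUCCESS = auto()
    FAILURE = auto()
    CANCELLED = auto()
    MISSING = auto()

    def is_runnable(self) -> bool:
        # Runnable: the task could still be evaluated.
        return bool(self & (SketchStatus.CREATED | SketchStatus.SUBMITTED))

    def is_terminal(self) -> bool:
        # Terminal: the task will not change state again.
        finished = SketchStatus.SUCCESS | SketchStatus.FAILURE | SketchStatus.CANCELLED
        return bool(self & finished)
```

Because the statuses are flags, several of them can be combined with `|` into a single value, which is one way to express a set of statuses.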

task_driver(task_id, root)

Get the driver used to submit a task.

This may not always be set (e.g., a task was created before a driver was configured), in which case we return None.

Parameters:

  • task_id (str) –

    The task identifier to look up.

  • root (Root) –

    The hipercow root, as an opened Root object (for example, one returned by open_root).

Returns:

  • str | None

    The driver name, if known. Otherwise None.

task_exists(task_id, root=None)

Test if a task exists.

A task exists if the task_id was used with this hipercow root (i.e., if any files associated with it exist).

Parameters:

  • task_id (str) –

    The task identifier, a 32-character hex string.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • bool

    True if the task exists.

task_last(root=None)

Return the most recently created task.

Parameters:

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

A task identifier (a 32-character hex string) if any tasks have been created (and if the recent task list has not been truncated), or None if no tasks have been created.

task_list(*, root=None, with_status=None)

List known tasks.

Warning

This function could take a long time to execute on large projects with many tasks, particularly on large file systems. Because the tasks are just returned as a list of strings, it may not be terribly useful either. Think before building a workflow around this.

Parameters:

  • root (OptionalRoot, default: None ) –

    The root to search from.

  • with_status (TaskStatus | None, default: None ) –

    Optional status, or set of statuses, to match

Returns:

  • list[str]

    A list of task identifiers.

task_log(task_id, *, outer=False, root=None)

Read the task log.

Not all tasks have logs: tasks that have not yet started (those with a status of CREATED or SUBMITTED, and those CANCELLED before starting) will not have logs, and tasks that were run without capturing output will not produce a log either. Be sure to check whether a string was returned.

Parameters:

  • task_id (str) –

    The task identifier to fetch the log for, a 32-character hex string.

  • outer (bool, default: False ) –

    Fetch the "outer" logs; these are logs from the underlying HPC software before it hands off to hipercow.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • str | None

    The log as a single string, if present.
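Since the return value may be None, callers usually need a small guard. The sketch below shows one way to do that; describe_log and print_task_log are hypothetical helper names, and only task_log comes from this reference.

```python
from __future__ import annotations


def describe_log(log: str | None) -> str:
    """Turn the optional log into something that is always printable."""
    if log is None:
        return "(no log available)"
    return log.rstrip("\n")


def print_task_log(task_id: str, *, outer: bool = False) -> None:
    """Fetch and print a task log, coping with tasks that have none."""
    # Imported lazily so the helper above can be used without hipercow installed.
    from hipercow.task import task_log

    print(describe_log(task_log(task_id, outer=outer)))
```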

task_recent(*, root=None, limit=None)

Return a list of recently created tasks.

Parameters:

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

  • limit (int | None, default: None ) –

    The maximum number of tasks to return.

Returns:

A list of task identifiers. The most recent tasks will be last in this list (we might change this in a future version - yes, that will be annoying). Note that this is recency in creation, not completion.

task_recent_rebuild(*, root=None, limit=None)

Rebuild the list of recent tasks.

Parameters:

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

  • limit (int | None, default: None ) –

    The maximum number of tasks to add to the recent list. Use limit=0 to truncate the list.

Returns:

  • None

    Nothing, called for side effects only.

task_status(task_id, root=None)

Read task status.

Parameters:

  • task_id (str) –

    The task identifier to check, a 32-character hex string.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

task_wait(task_id, *, root=None, allow_created=False, **kwargs)

Wait for a task to complete.

Parameters:

  • task_id (str) –

    The task to wait on.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

  • allow_created (bool, default: False ) –

    Allow waiting on a task that has status CREATED. Normally this is not allowed because a task that is CREATED (and not SUBMITTED) will not start; if you pass allow_created=True it is expected that you are also manually evaluating this task!

  • **kwargs (Any, default: {} ) –

    Additional arguments to taskwait.taskwait.

Returns:

  • bool

    True if the task completes successfully, False if it fails. A timeout will throw an error. We return this boolean rather than the TaskStatus because this generalises to multiple tasks.

hipercow.task_create

task_create_shell(cmd, *, environment=None, envvars=None, resources=None, driver=None, root=None)

Create a shell command task.

This is the first type of task that we support, and more types will likely follow. A shell command will evaluate an arbitrary command on the cluster - it does not even need to be written in Python! However, if you are using the pip environment engine then it will need to be pip-installable.

The interface here is somewhat subject to change, but we think the basics here are reasonable.

Parameters:

  • cmd (list[str]) –

    The command to execute, as a list of strings

  • environment (str | None, default: None ) –

    The name of the environment to evaluate the command in. The default (None) will select default if available, falling back on empty.

  • envvars (dict[str, str] | None, default: None ) –

    A dictionary of environment variables to set before the task runs. Do not set PATH in here, it will not currently have an effect.

  • resources (TaskResources | None, default: None ) –

    Optional resources required by your task.

  • driver (str | None, default: None ) –

    The driver to launch the task with. Generally this is not needed as we expect most people to have a single driver set.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • str

    The newly-created task identifier, a 32-character hex string.
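A typical sequence is to create a task, wait for it, and then look at its log. The sketch below strings together task_create_shell, task_wait and task_log as documented here; the wrapper run_and_report and the MY_SETTING variable are hypothetical, and the sketch assumes a root with a configured driver is found from the current directory.

```python
from __future__ import annotations


def run_and_report(cmd: list[str]) -> bool:
    """Create a shell task, wait for it to finish and print its log if any."""
    # Imported lazily so the sketch can be read without hipercow installed.
    from hipercow.task import task_log, task_wait
    from hipercow.task_create import task_create_shell

    # MY_SETTING is a made-up variable; PATH should not be set here.
    task_id = task_create_shell(cmd, envvars={"MY_SETTING": "1"})
    succeeded = task_wait(task_id)  # True on success, False on failure
    log = task_log(task_id)
    if log is not None:
        print(log)
    return succeeded
```

A call might look like `run_and_report(["python", "--version"])`, provided that command is available in the selected environment.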

hipercow.bundle

Support for bundles of related tasks.

Bundle

Bases: BaseModel

A bundle of tasks.

Attributes:

  • name (str) –

    The bundle name

  • task_ids (list[str]) –

    The task identifiers in the bundle

bundle_create(task_ids, name=None, *, validate=True, overwrite=True, root=None)

Create a new bundle from a list of tasks.

Parameters:

  • task_ids (list[str]) –

    The task identifiers in the bundle

  • name (str | None, default: None ) –

    The name for the bundle. If not given, we randomly create one. The format of the name is subject to change.

  • validate (bool, default: True ) –

    Check that all tasks exist before creating the bundle.

  • overwrite (bool, default: True ) –

    Overwrite a bundle if it already exists.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • str

    The name of the newly created bundle. Also, as a side effect, writes out the task bundle to disk.
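As a usage sketch, a list of task identifiers can be grouped and read back as a Bundle. The wrapper bundle_tasks is a hypothetical name; bundle_create and bundle_load are the functions documented in this module.

```python
from __future__ import annotations


def bundle_tasks(task_ids: list[str], name: str | None = None):
    """Group existing tasks into a bundle and return the loaded Bundle."""
    # Imported lazily so the sketch can be read without hipercow installed.
    from hipercow.bundle import bundle_create, bundle_load

    # With name=None a name is generated, so keep the one that is returned.
    bundle_name = bundle_create(task_ids, name, validate=True)
    return bundle_load(bundle_name)
```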

bundle_delete(name, root=None)

Delete a bundle.

Note that this does not delete the tasks in the bundle, just the bundle itself.

Parameters:

  • name (str) –

    The name of the bundle to delete

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • None

    Nothing, called for side effects only.

bundle_list(root=None)

List bundles.

Parameters:

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

The names of known bundles. Currently the order of these

bundle_load(name, root=None)

Load a task bundle.

Parameters:

  • name (str) –

    The name of the bundle to load

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • Bundle

    The loaded bundle.

bundle_status(name, root=None)

Get the statuses of tasks in a bundle.

Depending on the context, bundle_status_reduce() may be a more appropriate function to use; it attempts to reduce the list of statuses into the single "worst" status.

Parameters:

  • name (str) –

    The name of the bundle to get the statuses for.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

bundle_status_reduce(name, root=None)

Get the overall status from a bundle.

Parameters:

  • name (str) –

    The name of the bundle to get the statuses for.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

hipercow.environment

environment_check(name, root=None)

Validate an environment name for this root.

This function can be used to ensure that name is a reasonable environment name to use in your root. It returns the resolved name (selecting between empty and default if name is None), and errors if the requested environment is not found.

Parameters:

  • name (str | None) –

    The name of the environment to use, or None to select the appropriate default.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • str

    The resolved environment name.

environment_delete(name, root=None)

Delete an environment.

Parameters:

  • name (str) –

    The name of the environment to delete.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • None

    Nothing, called for side effects only.

environment_exists(name, root=None)

Check if an environment exists.

Note that this function will return False for empty, even though empty is always a valid choice. We might change this in future.

Parameters:

  • name (str) –

    The name of the environment to check.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • bool

    True if the environment exists, otherwise False.

environment_list(root=None)

List known environments.

Parameters:

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • list[str]

    A sorted list of environment names. The name empty will always be present.

environment_new(name, engine, root=None)

Create a new environment.

Creating an environment selects a name and declares the engine for the environment. After doing this, you will certainly want to provision the environment using provision().

Parameters:

  • name (str) –

    The name for the environment. The name default is a good choice if you only want a single environment, as this is the environment used by default. You cannot use empty as that is a special empty environment.

  • engine (str) –

    The environment engine to use. The options here are pip and empty. Soon we will support conda too.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • None

    Nothing, called for side effects.

hipercow.provision

provision(name, cmd, *, driver=None, root=None)

Provision an environment.

This function requires that your root has a driver configured (with hipercow.configure) and an environment created (with hipercow.environment_new).

Note that in the commandline tool, this command is grouped into the environment group; we may move this function into the environment module in future.

Parameters:

  • name (str) –

    The name of the environment to provision

  • cmd (list[str] | None) –

    Optionally the command to run to do the provisioning. If None then the environment engine will select an appropriate command if it is well defined for your setup. The details here depend on the engine.

  • driver (str | None, default: None ) –

    The name of the driver to use in provisioning. Normally this can be omitted, as None (the default) will select your driver automatically if only one is configured.

  • root (OptionalRoot, default: None ) –

    The root, or if not given search from the current directory.

Returns:

  • None

    Nothing, called for side effects only.
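The prerequisites above suggest an order of operations: initialise, configure a driver, create an environment, then provision it. The following sketch shows that order using functions from this reference. The wrapper set_up_default_environment is hypothetical, and it assumes a fresh project where the dide driver needs no extra keyword arguments, which may not hold for your setup.

```python
from __future__ import annotations

from pathlib import Path


def set_up_default_environment(path: str | Path) -> None:
    """Prepare a project so that tasks can use a pip-based default environment."""
    # Imported lazily so the sketch can be read without hipercow installed.
    from hipercow.configure import configure
    from hipercow.environment import environment_exists, environment_new
    from hipercow.provision import provision
    from hipercow.root import init, open_root

    init(path)
    root = open_root(path)
    configure("dide", root=root)
    if not environment_exists("default", root=root):
        environment_new("default", "pip", root=root)
    # cmd=None asks the engine to choose a provisioning command itself.
    provision("default", None, root=root)
```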

hipercow.resources

Specify and interact with resources.

ClusterResources dataclass

Resources available on a cluster.

This will be returned by a cluster driver and will be used to validate resources.

Attributes:

  • queues (Queues) –

    Valid queues.

  • max_memory (int) –

    The maximum RAM across all nodes in the cluster.

  • max_cores (int) –

    The maximum cores across all nodes in the cluster.

validate_resources(resources)

Check resources are valid on this cluster.

Takes a set of resources and checks that the requested queue, number of cores and RAM are valid for the cluster. If the queue is not provided, then we will take the default.

Parameters:

Queues dataclass

Queues available on the cluster.

Attributes:

  • valid (set[str]) –

    The set of valid queue names. Being a set, the order does not imply anything.

  • default (str) –

    The default queue, used if none is explicitly given

  • build (str) –

    The queue used to run build jobs

  • test (str) –

    The queue used to run test jobs

simple(name) staticmethod

Create a Queues object with only one valid queue.

This situation is common enough that we provide a small wrapper.

Parameters:

  • name (str) –

    The only supported queue. This will become the set of valid queues, the default queue, the build queue and the test queue.
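The effect of simple can be shown with a stand-in dataclass. SketchQueues is a hypothetical model with the same four attributes as described above, not the class shipped with hipercow, and the queue name in the usage note is made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchQueues:
    """Illustrative model of the queue description above."""

    valid: set[str]
    default: str
    build: str
    test: str

    @staticmethod
    def simple(name: str) -> SketchQueues:
        # One queue plays every role: valid set, default, build and test.
        return SketchQueues(valid={name}, default=name, build=name, test=name)
```

For example, `SketchQueues.simple("AllNodes")` yields an object whose default, build and test queues are all "AllNodes".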

TaskResources

Bases: BaseModel

Resources required for a task.

We don't support everything that the R version does yet; in particular we've not set up hold_until, priority or requested_nodes, as these are not widely used.

Attributes:

  • queue (str | None) –

    The queue to run on. If not given (or None), we use the default queue for your cluster. Alternatively, you can provide .default for the default queue or .test for the test queue.

  • cores (int | float) –

    The number of cores to request. Adding more cores does not necessarily make your task any faster; your task must have some mechanism to exploit this parallelism (e.g., using the multiprocessing package). Specify math.inf if you want to request all the cores on a single node.

  • exclusive (bool) –

    Request exclusive access to a node.

  • max_runtime (int | None) –

    The maximum run time (wall clock), in seconds.

  • memory_per_node (int | None) –

    Specify that your task can only run on a node with at least this much memory, in GB (e.g., 128 is 128GB).

  • memory_per_task (int | None) –

    An estimate of how much memory your task requires (across all cores), in GB. If you provide this, the scheduler can attempt to arrange tasks such that they will all fit in the available RAM.
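The checks described for validate_resources (queue, cores and memory, with the default queue filled in when none is given) can be sketched with plain dataclasses. SketchCluster, SketchRequest and validate are hypothetical stand-ins covering only a few of the fields above; hipercow's own validation may differ in detail.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchCluster:
    valid_queues: frozenset[str]
    default_queue: str
    max_cores: int
    max_memory: int  # GB


@dataclass(frozen=True)
class SketchRequest:
    queue: str | None = None
    cores: int | float = 1
    memory_per_node: int | None = None  # GB


def validate(req: SketchRequest, cluster: SketchCluster) -> SketchRequest:
    """Return the request with its queue resolved, or raise ValueError."""
    queue = cluster.default_queue if req.queue is None else req.queue
    if queue not in cluster.valid_queues:
        raise ValueError(f"Unknown queue '{queue}'")
    # math.inf means "all cores on one node", so it is always acceptable.
    if req.cores != math.inf and req.cores > cluster.max_cores:
        raise ValueError(f"{req.cores} cores exceeds the maximum of {cluster.max_cores}")
    if req.memory_per_node is not None and req.memory_per_node > cluster.max_memory:
        raise ValueError(f"{req.memory_per_node}GB exceeds the maximum of {cluster.max_memory}GB")
    return replace(req, queue=queue)
```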

hipercow.environment_engines

Support for environment engines.

Empty

Bases: EnvironmentEngine

The empty environment, into which nothing may be installed.

check_args(cmd)

Check arguments to the empty environment, which must be empty.

Parameters:

  • cmd (list[str] | None) –

    A list or None; if the list is not empty, an error is thrown.

Returns:

create(**kwargs)

Create the empty environment, which already exists.

Returns:

  • None

    Never returns, but throws if called.

exists()

The empty environment always exists.

Returns:

path()

The path to the empty environment, which never exists.

Returns:

  • Path

    Never returns, but throws if called.

provision(cmd, **kwargs)

Install packages into the empty environment, which is not allowed.

Returns:

  • None

    Never returns, but throws if called.

EnvironmentEngine

Bases: ABC

Base class for environment engines.

Attributes:

  • root

    The hipercow root

  • name

    The name of the environment to provision

  • platform

    Optionally, the platform to target.

check_args(cmd) abstractmethod

Check arguments provided in cmd for suitability.

This method runs on the client (the Python process initiating the provisioning request) and does not run in the context of the process that will create the environment. In particular, don't assume that the platform information is the same.

Parameters:

  • cmd (list[str] | None) –

    A list of arguments to provision an environment, or None if the user provided none. In the latter case you must provide suitable defaults or error.

Returns:

  • list[str]

    A validated list of arguments.

create(**kwargs) abstractmethod

Create (or initialise) the environment.

This method will be called on the target platform, not on the client platform. Most environment systems have a concept of initialisation; this will typically create the directory referred to by path(), and do any required bootstrapping. It will not typically install anything for the user.

In general, we expect create() to be called only once per environment lifetime, while provision() we expect to be called every time the environment is modified (one or many times).

Parameters:

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments passed on to the concrete method

Returns:

  • Nothing ( None ) –

    Called for side-effects only.
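The "create once, provision many times" expectation can be sketched with a toy engine. SketchEngine, RecordingEngine and ensure_provisioned are hypothetical names, and the sketch makes exists() abstract purely for brevity; it illustrates the calling pattern only, not hipercow's internals.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SketchEngine(ABC):
    """Minimal stand-in for an environment engine."""

    @abstractmethod
    def exists(self) -> bool: ...

    @abstractmethod
    def create(self) -> None: ...

    @abstractmethod
    def provision(self, cmd: list[str]) -> None: ...


class RecordingEngine(SketchEngine):
    """Toy engine that records calls instead of touching the file system."""

    def __init__(self) -> None:
        self.created = False
        self.calls = []

    def exists(self) -> bool:
        return self.created

    def create(self) -> None:
        self.created = True
        self.calls.append("create")

    def provision(self, cmd: list[str]) -> None:
        self.calls.append("provision " + " ".join(cmd))


def ensure_provisioned(engine: SketchEngine, cmd: list[str]) -> None:
    # create() runs only the first time; provision() runs on every change.
    if not engine.exists():
        engine.create()
    engine.provision(cmd)
```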

exists()

Test if an environment exists.

This method is not abstract, and generally should not need to be replaced by derived classes.

Returns:

  • bool

    True if the environment directory exists, otherwise False. Note that True does not mean that the environment is usable; this is mostly intended to be used to determine if create() needs to be called.

path()

Compute path to the environment contents.

This base method version will return a suitable path within the root. Implementations can use this path directly (say, if the environment path does not need to differ according to platform etc), or compute their own. We might change the logic here in future to make this base-class returned path more generally useful.

Returns:

  • Path

    The path to the directory that will store the environment.

provision(cmd, **kwargs) abstractmethod

Provision an environment.

Install packages or software into the environment.

Parameters:

  • cmd (list[str]) –

    A command to run in the environment. Most of the time this just calls hipercow.util.subprocess_run directly.

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments passed through to the concrete method.

Returns:

  • Nothing ( None ) –

    Called for side-effects only.

run(cmd, *, env=None, **kwargs) abstractmethod

Run a command within an environment.

Both provisioning and running tasks will run in the context of an environment. This method must be specialised to activate the environment and then run the given shell command.

This method should (eventually) call hipercow.util.subprocess_run, returning the value from that function.

Parameters:

  • cmd (list[str]) –

    The command to run

  • env (dict[str, str] | None, default: None ) –

    An optional dictionary of environment variables that will be set within the environment.

  • **kwargs (Any, default: {} ) –

    Additional arguments passed from the provisioner or the task runner.

Returns:

Information about the completed process. Note that errors are not thrown unless the keyword argument check=True is provided.

Pip

Bases: EnvironmentEngine

Python virtual environments, installed by pip.

check_args(cmd)

Validate pip installation command.

Checks if cmd is a valid pip command.

If cmd is None or the empty list, we try and guess a default command, based on files found in your project root.

  • if you have a pyproject.toml file, then we will try and run pip install --verbose .

  • if you have a requirements.txt, then we will try and run pip install --verbose -r requirements.txt

(In both cases these are returned as a list of arguments.)

If there are other reasonable conventions that we might follow, please let us know.
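The default selection described above can be sketched as a small function. guess_pip_command is a hypothetical name, and two details are assumptions rather than documented behaviour: pyproject.toml takes precedence when both files exist, and an error is raised when neither is present.

```python
from __future__ import annotations

from pathlib import Path


def guess_pip_command(project_root: str | Path) -> list[str]:
    """Pick a default pip command from the files found in the project root."""
    root = Path(project_root)
    if (root / "pyproject.toml").exists():
        return ["pip", "install", "--verbose", "."]
    if (root / "requirements.txt").exists():
        return ["pip", "install", "--verbose", "-r", "requirements.txt"]
    raise FileNotFoundError(
        f"Neither pyproject.toml nor requirements.txt found in {root}"
    )
```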

Parameters:

  • cmd (list[str] | None) –

    The command to validate

Returns:

  • list[str]

    A validated list of arguments.

create(**kwargs)

Create the virtual environment.

Calls

python -m venv <path>

with the result of path().

Parameters:

  • **kwargs (Any, default: {} ) –

    Additional arguments to subprocess_run

Returns:

  • None

    Nothing, called for side effects only.

provision(cmd, **kwargs)

Provision a virtual environment using pip.

Parameters:

  • cmd (list[str]) –

    The command to run

  • **kwargs (Any, default: {} ) –

    Additional arguments to Pip.run

Returns:

  • None

    Nothing, called for its side effect only.

run(cmd, *, env=None, **kwargs)

Run a command within the pip virtual environment.

Parameters:

  • cmd (list[str]) –

    The command to run

  • env (dict[str, str] | None, default: None ) –

    Environment variables, passed into subprocess_run. We will add additional environment variables to control the virtual environment activation. Note that PATH cannot be safely set through env yet, because we have to modify that to activate the virtual environment, and because subprocess.Popen requires the PATH to be set before finding the program to call on Windows. We may improve this in future.

  • **kwargs (Any, default: {} ) –

    Keyword arguments to subprocess_run.
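The PATH caveat is easier to see with a sketch of how a virtual environment is conventionally activated: VIRTUAL_ENV is set and the environment's bin (or Scripts on Windows) directory is put at the front of PATH. The function venv_environment is a hypothetical illustration of that convention; whether hipercow builds its environment in exactly this way is an assumption.

```python
from __future__ import annotations

import os
from pathlib import PurePosixPath, PureWindowsPath


def venv_environment(
    venv_path: str,
    system: str,
    base: dict[str, str] | None = None,
    extra: dict[str, str] | None = None,
) -> dict[str, str]:
    """Build the environment variables for running inside a virtual environment."""
    env = dict(os.environ if base is None else base)
    env.update(extra or {})
    if system == "windows":
        bin_dir, sep = str(PureWindowsPath(venv_path) / "Scripts"), ";"
    else:
        bin_dir, sep = str(PurePosixPath(venv_path) / "bin"), ":"
    old_path = env.get("PATH", "")
    env["VIRTUAL_ENV"] = venv_path
    # PATH is rewritten here, which is why a user-supplied PATH is not safe.
    env["PATH"] = bin_dir + (sep + old_path if old_path else "")
    return env
```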

Details about the process, if `check=True` is not

Platform dataclass

Information about a platform.

The most basic information about a platform that we need to set up an environment, derived from the platform module.

Attributes:

  • system (str) –

    The name of the system, in lowercase. Values will be linux, windows or darwin (macOS). We may replace this with an Enum in future.

  • version (str) –

    The python version, as a 3-element version string.

local() staticmethod

Platform information for the running Python.

A convenience function to construct suitable platform information for the currently running system.
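Both attributes can be obtained from the standard library platform module. The stand-in below shows one way to do so; SketchPlatform is a hypothetical model, and the exact calls hipercow uses are an assumption.

```python
from __future__ import annotations

import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPlatform:
    """Illustrative model of the platform description above."""

    system: str   # e.g. "linux", "windows" or "darwin"
    version: str  # Python version such as "3.12.1"

    @staticmethod
    def local() -> SketchPlatform:
        # platform.system() returns e.g. "Linux"; lowercase it to match the docs.
        return SketchPlatform(platform.system().lower(), platform.python_version())
```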