Reference

`hipercow.root`

Interact with the hipercow root.

`OptionalRoot = None | str | Path | Root` `module-attribute`

Optional root type, for user-facing functions.

Represents the different inputs accepted by the user-facing functions in hipercow, recognising that most of the time the working directory will be within a hipercow environment. Each type is interpreted differently:

None: search for the root from the current directory (like git does)
str: the name of a path to search from
Path: a pathlib.Path object representing the path to search from
Root: a root previously opened with open_root
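The None case can be pictured as an upward search, much as git looks for its .git directory. The following is a minimal sketch only, assuming the root is recognised by the hipercow/ directory that init creates; the helper name find_root_dir is hypothetical and is not part of hipercow.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_root_dir(start: str | Path | None = None) -> Path:
    """Hypothetical helper: walk upwards until a 'hipercow' directory is found."""
    here = Path.cwd() if start is None else Path(start)
    here = here.resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in [here, *here.parents]:
        if (candidate / "hipercow").is_dir():
            return candidate
    msg = f"No hipercow root found from '{here}'"
    raise FileNotFoundError(msg)


# Demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "project"
    nested = project / "analysis" / "scripts"
    nested.mkdir(parents=True)
    (project / "hipercow").mkdir()
    found = find_root_dir(nested)
    found_matches = found == project
```

In real code you would call open_root instead; the sketch only illustrates why a script run from a subdirectory can still find its root.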

`init(path)`

Initialise a hipercow root.

Sets up a new hipercow root at path, creating the directory <path>/hipercow/ which will contain all of hipercow's files. It is safe to re-initialise an already-created hipercow root (in which case nothing happens) and safe to initialise a Python hipercow root at the same location as an R one, though at the moment they do not interact.

Parameters:

path (str | Path) –

The path to the project root

Returns:

None –

Nothing, called for side effects only.

`open_root(path=None)`

Open a hipercow root.

Locate and validate a hipercow root, converting an OptionalRoot type into a real Root object. This function is used in most user-facing functions in hipercow, but you can call it yourself to validate the root early.

Parameters:

path (OptionalRoot, default: None ) –

A path to the root to open, a Root, or None.

Returns:

Root –

The opened Root object.

`hipercow.configure`

`configure(name, *, root=None, **kwargs)`

Configure a driver.

Configures a hipercow root to use a driver.

Parameters:

name (str) –

The name of the driver. This will be dide unless you are developing hipercow itself :)
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.
**kwargs (Any, default: {} ) –

Arguments passed to, and supported by, your driver.

Returns:

None –

Nothing, called for side effects only.

`unconfigure(name, root=None)`

Unconfigure (remove) a driver.

Parameters:

name (str) –

The name of the driver. This will be dide unless you are developing hipercow itself :)
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

None –

Nothing, called for side effects only.

`hipercow.task`

Functions for interacting with tasks.

`TaskStatus`

Bases: Flag

Status of a task.

Tasks move from CREATED to SUBMITTED to RUNNING to one of SUCCESS or FAILURE. In addition a task might be CANCELLED (this could happen from CREATED, SUBMITTED or RUNNING) or might be MISSING if it does not exist.

A runnable task is one that we could use task_eval with; it might be CREATED or SUBMITTED.

A terminal task is one that has reached the latest state it will reach, and is SUCCESS, FAILURE or CANCELLED.
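The runnable and terminal groupings above can be sketched with a small Flag class. This is an illustrative stand-in, not hipercow's own TaskStatus: the class name SketchStatus and the member values are assumptions, and only the member names and groupings come from the description above.

```python
from enum import Flag, auto


class SketchStatus(Flag):
    """Illustrative stand-in for TaskStatus; not hipercow's own class."""

    CREATED = auto()
    SUBMITTED = auto()
    RUNNING = auto()
    SUCCESS = auto()
    FAILURE = auto()
    CANCELLED = auto()
    MISSING = auto()

    def is_runnable(self) -> bool:
        # Runnable: CREATED or SUBMITTED.
        return bool(self & (SketchStatus.CREATED | SketchStatus.SUBMITTED))

    def is_terminal(self) -> bool:
        # Terminal: SUCCESS, FAILURE or CANCELLED.
        terminal = (
            SketchStatus.SUCCESS | SketchStatus.FAILURE | SketchStatus.CANCELLED
        )
        return bool(self & terminal)


# Flags combine with "|", so a single value can describe a set of statuses.
finished = SketchStatus.SUCCESS | SketchStatus.FAILURE
matches = bool(SketchStatus.SUCCESS & finished)
```

Because the real class is also a Flag, the same "|" combination is presumably how a set of statuses is passed to with_status in task_list.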

`is_runnable()`

Check if a status implies a task can be run.

`is_terminal()`

Check if a status implies a task is completed.

`task_driver(task_id, root)`

Get the driver used to submit a task.

This may not always be set (e.g., a task was created before a driver was configured), in which case we return None.

Parameters:

task_id (str) –

The task identifier to look up.
root (Root) –

The root to look the task up in. Unlike most functions here, this argument is required and must be an opened Root.

Returns:

str | None –

The driver name, if known. Otherwise None.

`task_exists(task_id, root=None)`

Test if a task exists.

A task exists if the task_id was used with this hipercow root (i.e., if any files associated with it exist).

Parameters:

task_id (str) –

The task identifier, a 32-character hex string.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

bool –

True if the task exists.

`task_last(root=None)`

Return the most recently created task.

Parameters:

root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

str | None –

A task identifier (a 32-character hex string) if any tasks have been created (and if the recent task list has not been truncated), or None if no tasks have been created.

`task_list(*, root=None, with_status=None)`

List known tasks.

Warning

This function could take a long time to execute on large projects with many tasks, particularly on large file systems. Because the tasks are just returned as a list of strings, it may not be terribly useful either. Think before building a workflow around this.

Parameters:

root (OptionalRoot, default: None ) –

The root to search from.
with_status (TaskStatus | None, default: None ) –

Optional status, or set of statuses, to match

Returns:

list[str] –

A list of task identifiers.

`task_log(task_id, *, outer=False, root=None)`

Read the task log.

Not all tasks have logs. Tasks that have not yet started (those with a status of CREATED or SUBMITTED, and those CANCELLED before starting) will not have logs, and tasks that were run without capturing output will not produce a log either. Be sure to check if a string was returned.

Parameters:

task_id (str) –

The task identifier to fetch the log for, a 32-character hex string.
outer (bool, default: False ) –

Fetch the "outer" logs; these are logs from the underlying HPC software before it hands off to hipercow.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

str | None –

The log as a single string, if present.

`task_recent(*, root=None, limit=None)`

Return a list of recently created tasks.

Parameters:

root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.
limit (int | None, default: None ) –

The maximum number of tasks to return.

Returns:

list[str] –

A list of task identifiers. The most recent tasks will be last in this list (we might change this in a future version - yes, that will be annoying). Note that this is recency in creation, not completion.

`task_recent_rebuild(*, root=None, limit=None)`

Rebuild the list of recent tasks.

Parameters:

root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.
limit (int | None, default: None ) –

The maximum number of tasks to add to the recent list. Use limit=0 to truncate the list.

Returns:

None –

Nothing, called for side effects only.

`task_status(task_id, root=None)`

Read task status.

Parameters:

task_id (str) –

The task identifier to check, a 32-character hex string.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

TaskStatus –

The status of the task.

`task_wait(task_id, *, root=None, allow_created=False, **kwargs)`

Wait for a task to complete.

Parameters:

task_id (str) –

The task to wait on.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.
allow_created (bool, default: False ) –

Allow waiting on a task that has status CREATED. Normally this is not allowed because a task that is CREATED (and not SUBMITTED) will not start; if you pass allow_created=True it is expected that you are also manually evaluating this task!
**kwargs (Any, default: {} ) –

Additional arguments to taskwait.taskwait.

Returns:

bool –

True if the task completes successfully, False if it fails. A timeout will throw an error. We return this boolean rather than the TaskStatus because this generalises to multiple tasks.
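The return convention (True on success, False on failure, an error on timeout) can be illustrated with a small polling loop. This is a sketch under stated assumptions, not how task_wait is implemented: the function wait_for, the plain-string status names and the treatment of a cancelled task as False are all hypothetical.

```python
import time
from collections.abc import Callable


def wait_for(
    get_status: Callable[[], str], timeout: float = 5.0, poll: float = 0.01
) -> bool:
    """Hypothetical polling loop; statuses are plain strings in this sketch."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "success":
            return True
        # Assumption: a cancelled task is reported like a failed one.
        if status in ("failure", "cancelled"):
            return False
        if time.monotonic() >= deadline:
            msg = f"Task did not complete within {timeout} seconds"
            raise TimeoutError(msg)
        time.sleep(poll)


# A fake task that moves through its states one poll at a time.
statuses = iter(["submitted", "running", "running", "success"])
result = wait_for(lambda: next(statuses))
```

Returning a boolean keeps the calling code simple: `if wait_for(...)` reads the same whether one task or many are being waited on.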

`hipercow.task_create`

`task_create_shell(cmd, *, environment=None, envvars=None, resources=None, driver=None, root=None)`

Create a shell command task.

This is the first type of task that we support, and more types will likely follow. A shell command will evaluate an arbitrary command on the cluster - it does not even need to be written in Python! However, if you are using the pip environment engine then it will need to be pip-installable.

The interface here is somewhat subject to change, but we think the basics here are reasonable.

Parameters:

cmd (list[str]) –

The command to execute, as a list of strings
environment (str | None, default: None ) –

The name of the environment to evaluate the command in. The default (None) will select default if available, falling back on empty.
envvars (dict[str, str] | None, default: None ) –

A dictionary of environment variables to set before the task runs. Do not set PATH in here, it will not currently have an effect.
resources (TaskResources | None, default: None ) –

Optional resources required by your task.
driver (str | None, default: None ) –

The driver to launch the task with. Generally this is not needed as we expect most people to have a single driver set.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

str –

The newly-created task identifier, a 32-character hex string.
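A typical create, wait, read-log sequence might look like the sketch below. It uses only the signatures documented on this page, and assumes that hipercow is installed, that the root has a driver configured so the task is actually submitted, and that the command echo is available on the cluster. The wrapper name run_hello is hypothetical.

```python
def run_hello(root=None):
    """Create a shell task, wait for it, and return (success, log)."""
    # Imports are deferred so this sketch can be defined without hipercow installed.
    from hipercow.task import task_log, task_wait
    from hipercow.task_create import task_create_shell

    task_id = task_create_shell(
        ["echo", "hello"],
        envvars={"GREETING": "hello"},  # do not set PATH here
        root=root,
    )
    ok = task_wait(task_id, root=root)
    # The log may be None, for example if output was not captured.
    log = task_log(task_id, root=root)
    return ok, log
```

Calling `run_hello()` from anywhere inside a configured hipercow root would search for the root from the current directory, because root is left as None.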

`hipercow.bundle`

Support for bundles of related tasks.

`Bundle`

Bases: BaseModel

A bundle of tasks.

Attributes:

name (str) –

The bundle name
task_ids (list[str]) –

The task identifiers in the bundle

`bundle_create(task_ids, name=None, *, validate=True, overwrite=True, root=None)`

Create a new bundle from a list of tasks.

Parameters:

task_ids (list[str]) –

The task identifiers in the bundle
name (str | None, default: None ) –

The name for the bundle. If not given, we randomly create one. The format of the name is subject to change.
validate (bool, default: True ) –

Check that all tasks exist before creating the bundle.
overwrite (bool, default: True ) –

Overwrite a bundle if it already exists.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

str –

The name of the newly created bundle. Also, as a side effect, writes out the task bundle to disk.

`bundle_delete(name, root=None)`

Delete a bundle.

Note that this does not delete the tasks in the bundle, just the bundle itself.

Parameters:

name (str) –

The name of the bundle to delete
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

None –

Nothing, called for side effects only.

`bundle_list(root=None)`

List bundles.

Parameters:

root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

list[str] –

The names of known bundles. Currently the order of these is arbitrary.

`bundle_load(name, root=None)`

Load a task bundle.

Parameters:

name (str) –

The name of the bundle to load
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

Bundle –

The loaded bundle.

`bundle_status(name, root=None)`

Get the statuses of tasks in a bundle.

Depending on the context, bundle_status_reduce() may be a more appropriate function to use; it attempts to reduce the list of statuses into the single "worst" status.

Parameters:

name (str) –

The name of the bundle to get the statuses for.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

list[TaskStatus] –

A list of statuses, one per task. These are stored in the same order as the original bundle.

`bundle_status_reduce(name, root=None)`

Get the overall status from a bundle.

Parameters:

name (str) –

The name of the bundle to get the statuses for.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

TaskStatus –

The overall bundle status.
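Reducing a list of statuses to a single "worst" one amounts to ranking the statuses and taking the lowest-ranked entry. The sketch below shows the idea with plain strings. The ranking in WORST_FIRST is an assumption made for illustration; this page does not say which ordering hipercow uses, and the function name reduce_statuses is hypothetical.

```python
# Assumed ranking, worst first; hipercow's actual ordering may differ.
WORST_FIRST = [
    "failure",
    "cancelled",
    "missing",
    "created",
    "submitted",
    "running",
    "success",
]


def reduce_statuses(statuses: list[str]) -> str:
    """Hypothetical reduction: return the worst status under WORST_FIRST."""
    if not statuses:
        msg = "Can't reduce an empty list of statuses"
        raise ValueError(msg)
    return min(statuses, key=WORST_FIRST.index)


overall = reduce_statuses(["success", "running", "success"])
```

Under this ranking a bundle only reduces to success once every task in it has succeeded.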

`hipercow.environment`

`environment_check(name, root=None)`

Validate an environment name for this root.

This function can be used to ensure that name is a reasonable environment name to use in your root. It returns the resolved name (selecting between empty and default if name is None), and errors if the requested environment is not found.

Parameters:

name (str | None) –

The name of the environment to use, or None to select the appropriate default.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

str –

The resolved environment name.
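The resolution rule described above (None selects default if it exists, otherwise empty; unknown names are an error) can be sketched as follows. The function resolve_environment and its known argument are hypothetical stand-ins for a lookup against the real root, and the exception type is an assumption.

```python
from __future__ import annotations


def resolve_environment(name: str | None, known: set[str]) -> str:
    """Hypothetical resolver; 'known' stands in for the environments in a root."""
    if name is None:
        # Prefer 'default' when it has been created, else fall back on 'empty'.
        return "default" if "default" in known else "empty"
    # 'empty' is always a valid choice, even if it was never created.
    if name == "empty" or name in known:
        return name
    msg = f"No such environment '{name}'"
    raise ValueError(msg)


resolved = resolve_environment(None, {"default", "analysis"})
```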

`environment_delete(name, root=None)`

Delete an environment.

Parameters:

name (str) –

The name of the environment to delete.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

None –

Nothing, called for side effects only.

`environment_exists(name, root=None)`

Check if an environment exists.

Note that this function will return False for empty, even though empty is always a valid choice. We might change this in future.

Parameters:

name (str) –

The name of the environment to check.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

bool –

True if the environment exists, otherwise False.

`environment_list(root=None)`

List known environments.

Parameters:

root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

list[str] –

A sorted list of environment names. The name empty will always be present.

`environment_new(name, engine, root=None)`

Create a new environment.

Creating an environment selects a name and declares the engine for the environment. After doing this, you will certainly want to provision the environment using provision().

Parameters:

name (str) –

The name for the environment. The name default is a good choice if you only want a single environment, as this is the environment used by default. You cannot use empty as that is a special empty environment.
engine (str) –

The environment engine to use. The options here are pip and empty. Soon we will support conda too.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

None –

Nothing, called for side effects.

`hipercow.provision`

`provision(name, cmd, *, driver=None, root=None)`

Provision an environment.

This function requires that your root has a driver configured (with hipercow.configure) and an environment created (with hipercow.environment_new).

Note that in the command-line tool, this command is grouped into the environment group; we may move this function into the environment module in future.

Parameters:

name (str) –

The name of the environment to provision
cmd (list[str] | None) –

Optionally the command to run to do the provisioning. If None then the environment engine will select an appropriate command if it is well defined for your setup. The details here depend on the engine.
driver (str | None, default: None ) –

The name of the driver to use in provisioning. Normally this can be omitted, as None (the default) will select your driver automatically if only one is configured.
root (OptionalRoot, default: None ) –

The root, or if not given search from the current directory.

Returns:

None –

Nothing, called for side effects only.

`hipercow.resources`

Specify and interact with resources.

`ClusterResources` `dataclass`

Resources available on a cluster.

This will be returned by a cluster driver and will be used to validate resources.

Attributes:

queues (Queues) –

Valid queues.
max_memory (int) –

The maximum ram across all nodes in the cluster.
max_cores (int) –

The maximum cores across all nodes in the cluster.

`validate_resources(resources)`

Check resources are valid on this cluster.

Takes a set of resources and checks that the requested queue, number of cores and ram are valid for the cluster. If the queue is not provided, then we will take the default.

Parameters:

resources (TaskResources) –

Resources to validate.
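The kind of check described here can be sketched with plain arguments. This is an illustration only: the function check_resources, its argument names and its error messages are hypothetical, and treating math.inf cores as always acceptable is an assumption based on the TaskResources description of cores.

```python
from __future__ import annotations

import math


def check_resources(
    queue: str | None,
    cores: float,
    memory: int | None,
    *,
    valid_queues: set[str],
    default_queue: str,
    max_cores: int,
    max_memory: int,
) -> str:
    """Hypothetical check; returns the queue that would be used."""
    if queue is None:
        queue = default_queue  # no queue given, so take the default
    if queue not in valid_queues:
        msg = f"'{queue}' is not a valid queue"
        raise ValueError(msg)
    # Assumption: math.inf means "all the cores on a node", so it always passes.
    if not math.isinf(cores) and cores > max_cores:
        msg = f"{cores} is too many cores, the cluster maximum is {max_cores}"
        raise ValueError(msg)
    if memory is not None and memory > max_memory:
        msg = f"{memory}GB is too much memory, the cluster maximum is {max_memory}GB"
        raise ValueError(msg)
    return queue


cluster = {
    "valid_queues": {"main", "testing"},
    "default_queue": "main",
    "max_cores": 32,
    "max_memory": 512,
}
queue_used = check_resources(None, 4, 16, **cluster)
```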

`Queues` `dataclass`

Queues available on the cluster.

Attributes:

valid (set[str]) –

The set of valid queue names. Being a set, the order does not imply anything.
default (str) –

The default queue, used if none is explicitly given
build (str) –

The queue used to run build jobs
test (str) –

The queue used to run test jobs

`simple(name)` `staticmethod`

Create a Queues object with only one valid queue.

This situation is common enough that we provide a small wrapper.

Parameters:

name (str) –

The only supported queue. This will become the set of valid queues, the default queue, the build queue and the test queue.
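What the wrapper saves you from writing is easiest to see with a stand-in dataclass. The class SketchQueues below mirrors the documented attributes but is not hipercow's Queues, and the queue name "AllNodes" is purely illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchQueues:
    """Illustrative stand-in for Queues; not hipercow's own class."""

    valid: set[str]
    default: str
    build: str
    test: str

    @staticmethod
    def simple(name: str) -> SketchQueues:
        # One name fills every role: valid set, default, build and test.
        return SketchQueues(valid={name}, default=name, build=name, test=name)


queues = SketchQueues.simple("AllNodes")
```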

`TaskResources`

Bases: BaseModel

Resources required for a task.

We don't support everything that the R version does yet; in particular we've not set up hold_until, priority or requested_nodes, as these are not widely used.

Attributes:

queue (str | None) –

The queue to run on. If not given (or None), we use the default queue for your cluster. Alternatively, you can provide .default for the default queue or .test for the test queue.
cores (int | float) –

The number of cores to request. Adding more cores does not necessarily make your task any faster; your task must have some mechanism to exploit this parallelism (e.g., using the multiprocessing package). Specify math.inf if you want to request all the cores on a single node.
exclusive (bool) –

Request exclusive access to a node.
max_runtime (int | None) –

The maximum run time (wall clock), in seconds.
memory_per_node (int | None) –

Specify that your task can only run on a node with at least this much memory, in GB (e.g., 128 is 128GB).
memory_per_task (int | None) –

An estimate of how much memory your task requires (across all cores), in GB. If you provide this, the scheduler can attempt to arrange tasks such that they will all fit in the available RAM.

`hipercow.environment_engines`

Support for environment engines.

`Empty`

Bases: EnvironmentEngine

The empty environment, into which nothing may be installed.

`check_args(cmd)`

Check arguments to the empty environment, which must be empty.

Parameters:

cmd (list[str] | None) –

A list or None. If the list is not empty, an error is thrown.

Returns:

list[str] –

The empty list.

`create(**kwargs)`

Create the empty environment, which already exists.

Returns:

None –

Never returns, but throws if called.

`exists()`

The empty environment always exists.

Returns:

bool –

True

`path()`

The path to the empty environment, which never exists.

Returns:

Path –

Never returns, but throws if called.

`provision(cmd, **kwargs)`

Install packages into the empty environment, which is not allowed.

Returns:

None –

Never returns, but throws if called.

`EnvironmentEngine`

Bases: ABC

Base class for environment engines.

Attributes:

root –

The hipercow root
name –

The name of the environment to provision
platform –

Optionally, the platform to target.

`check_args(cmd)` `abstractmethod`

Check arguments provided in cmd for suitability.

This method runs on the client (the Python process initiating the provisioning request) and does not run in the context of the process that will create the environment. In particular, don't assume that the platform information is the same.

Parameters:

cmd (list[str] | None) –

A list of arguments to provision an environment, or None if the user provided none. In the latter case you must provide suitable defaults or error.

Returns:

list[str] –

A validated list of arguments.

`create(**kwargs)` `abstractmethod`

Create (or initialise) the environment.

This method will be called on the target platform, not on the client platform. Most environment systems have a concept of initialisation; this will typically create the directory referred to by path(), and do any required bootstrapping. It will not typically install anything for the user.

In general, we expect create() to be called only once per environment lifetime, while provision() we expect to be called every time the environment is modified (one or many times).
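The create-once, provision-many lifecycle can be sketched with a toy engine. ToyEngine only mimics the documented method names and records its calls; it does not derive from EnvironmentEngine, and the helper ensure_and_provision is hypothetical.

```python
import tempfile
from pathlib import Path


class ToyEngine:
    """Illustrative stand-in that mimics the documented method names."""

    def __init__(self, path: Path):
        self._path = path
        self.log = []  # records calls so the lifecycle can be inspected

    def path(self) -> Path:
        return self._path

    def exists(self) -> bool:
        return self._path.is_dir()

    def create(self) -> None:
        self._path.mkdir(parents=True)
        self.log.append("create")

    def provision(self, cmd) -> None:
        self.log.append("provision " + " ".join(cmd))


def ensure_and_provision(engine, cmd) -> None:
    # create() runs only when the environment directory is missing.
    if not engine.exists():
        engine.create()
    engine.provision(cmd)


with tempfile.TemporaryDirectory() as tmp:
    engine = ToyEngine(Path(tmp) / "env")
    ensure_and_provision(engine, ["pip", "install", "."])
    ensure_and_provision(engine, ["pip", "install", "requests"])
    calls = list(engine.log)
```

After the two calls, the log holds a single create followed by two provision entries.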

Parameters:

**kwargs (Any, default: {} ) –

Additional keyword arguments passed on to the concrete method

Returns:

None –

Nothing, called for side effects only.

`exists()`

Test if an environment exists.

This method is not abstract, and generally should not need to be replaced by derived classes.

Returns:

bool –

True if the environment directory exists, otherwise False. Note that True does not mean that the environment is usable; this is mostly intended to be used to determine if create() needs to be called.

`path()`

Compute path to the environment contents.

This base method version will return a suitable path within the root. Implementations can use this path directly (say, if the environment path does not need to differ according to platform etc), or compute their own. We might change the logic here in future to make this base-class returned path more generally useful.

Returns:

Path –

The path to the directory that will store the environment.

`provision(cmd, **kwargs)` `abstractmethod`

Provision an environment.

Install packages or software into the environment.

Parameters:

cmd (list[str]) –

A command to run in the environment. Most of the time this just calls hipercow.utils.subprocess_run directly
**kwargs (Any, default: {} ) –

Additional keyword arguments passed through to the concrete method.

Returns:

None –

Nothing, called for side effects only.

`run(cmd, *, env=None, **kwargs)` `abstractmethod`

Run a command within an environment.

Both provisioning and running tasks will run in the context of an environment. This method must be specialised to activate the environment and then run the given shell command.

This method should (eventually) call hipercow.util.subprocess_run, returning the value from that function.

Parameters:

cmd (list[str]) –

The command to run
env (dict[str, str] | None, default: None ) –

An optional dictionary of environment variables that will be set within the environment.
**kwargs (Any, default: {} ) –

Additional keyword arguments passed from the provisioner or the task runner.

Returns:

CompletedProcess –

Information about the completed process. Note that errors are not thrown unless the keyword argument check=True is provided.

`Pip`

Bases: EnvironmentEngine

Python virtual environments, installed by pip.

`check_args(cmd)`

Validate pip installation command.

Checks if cmd is a valid pip command.

If cmd is None or the empty list, we try and guess a default command, based on files found in your project root.

If you have a pyproject.toml file, then we will try and run pip install --verbose .
If you have a requirements.txt file, then we will try and run pip install --verbose -r requirements.txt

(In both cases these are returned as a list of arguments.)

If there are other reasonable conventions that we might follow, please let us know.
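The default-command guessing described above can be sketched as a file check. The function guess_pip_command is hypothetical; preferring pyproject.toml when both files exist, and raising when neither does, are assumptions that this page does not confirm.

```python
import tempfile
from pathlib import Path


def guess_pip_command(project: Path) -> list[str]:
    """Hypothetical sketch of choosing a default pip command from project files."""
    # Assumption: pyproject.toml wins if both files are present.
    if (project / "pyproject.toml").exists():
        return ["pip", "install", "--verbose", "."]
    if (project / "requirements.txt").exists():
        return ["pip", "install", "--verbose", "-r", "requirements.txt"]
    msg = "Can't guess a pip command: no pyproject.toml or requirements.txt found"
    raise FileNotFoundError(msg)


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "requirements.txt").write_text("requests\n")
    from_requirements = guess_pip_command(project)
    (project / "pyproject.toml").write_text("[project]\nname = 'demo'\n")
    from_pyproject = guess_pip_command(project)
```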

Parameters:

cmd (list[str] | None) –

The command to validate

Returns:

list[str] –

A validated list of arguments.

`create(**kwargs)`

Create the virtual environment.

Calls

python -m venv <path>

with the result of path().

Parameters:

**kwargs (Any, default: {} ) –

Additional arguments to subprocess_run

Returns:

None –

Nothing, called for side effects only.

`provision(cmd, **kwargs)`

Provision a virtual environment using pip.

Parameters:

cmd (list[str]) –

The command to run
**kwargs (Any, default: {} ) –

Additional arguments to Pip.run

Returns:

None –

Nothing, called for side effects only.

`run(cmd, *, env=None, **kwargs)`

Run a command within the pip virtual environment.

Parameters:

cmd (list[str]) –

The command to run
env (dict[str, str] | None, default: None ) –

Environment variables, passed into subprocess_run. We will add additional environment variables to control the virtual environment activation. Note that PATH cannot be safely set through env yet, because we have to modify that to activate the virtual environment, and because subprocess.Popen requires the PATH to be set before finding the program to call on Windows. We may improve this in future.
**kwargs (Any, default: {} ) –

Keyword arguments to subprocess_run.

Returns:

CompletedProcess –

Details about the process, if `check=True` is not present in kwargs.
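The reason PATH cannot safely be passed through env is easier to see in a sketch of what activating a virtual environment involves. The helper venv_environment below is hypothetical and is not hipercow's implementation; it follows the usual venv layout (bin on POSIX, Scripts on Windows) and sets VIRTUAL_ENV the way the standard activate scripts do.

```python
from __future__ import annotations

import os
from pathlib import Path


def venv_environment(
    venv: Path, env: dict[str, str] | None = None, *, windows: bool = False
) -> dict[str, str]:
    """Hypothetical helper: environment variables that 'activate' a venv."""
    bin_dir = venv / ("Scripts" if windows else "bin")
    result = dict(os.environ)
    result.update(env or {})
    # PATH is rebuilt here, so any PATH supplied through env would be overwritten.
    result["PATH"] = str(bin_dir) + os.pathsep + os.environ.get("PATH", "")
    result["VIRTUAL_ENV"] = str(venv)
    return result


demo_venv = Path("/tmp/demo-venv")
activated = venv_environment(demo_venv, {"MY_SETTING": "1"})
```

The resulting dictionary could then be handed to subprocess.run via its env argument, so that the venv's executables are found first.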

`Platform` `dataclass`

Information about a platform.

The most basic information about a platform that we need to set up an environment, derived from the platform module.

Attributes:

system (str) –

The name of the system, in lowercase. Values will be linux, windows or darwin (macOS). We may replace this with an Enum in future.
version (str) –

The python version, as a 3-element version string.

`local()` `staticmethod`

Platform information for the running Python.

A convenience function to construct suitable platform information for the currently running system.
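A stand-in built on the standard platform module shows the shape of the information involved. SketchPlatform is illustrative only and is not hipercow's Platform; how the real class derives its values is not stated here beyond its use of the platform module.

```python
import platform
from dataclasses import dataclass


@dataclass
class SketchPlatform:
    """Illustrative stand-in for Platform; not hipercow's own class."""

    system: str
    version: str

    @staticmethod
    def local() -> "SketchPlatform":
        # python_version_tuple() gives (major, minor, patchlevel) as strings.
        version = ".".join(platform.python_version_tuple())
        return SketchPlatform(system=platform.system().lower(), version=version)


here = SketchPlatform.local()
```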
