Reference
hipercow.root
Interact with the hipercow root.
OptionalRoot = None | str | Path | Root (module attribute)
Optional root type, for user-facing functions.
Represents the different inputs accepted by hipercow's user-facing functions, recognising that most of the time the working directory will be within a hipercow environment. The possible types are:
- None: search for the root from the current directory (like git does)
- str: the name of a path to search from
- Path: a pathlib.Path object representing the path to search from
- Root: a root previously opened with open_root
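To make the None, str and Path cases concrete, here is a minimal sketch of a git-style upward search. It assumes a root is recognised by the hipercow/ directory that init creates; find_root_sketch is a hypothetical name, not part of hipercow's API, and the real open_root also validates the root and accepts Root objects.

```python
from __future__ import annotations

from pathlib import Path


def find_root_sketch(start: str | Path | None = None) -> Path:
    """Hypothetical: walk upwards from `start` (default: the current
    directory) to the first directory containing a `hipercow/` subdirectory."""
    path = (Path.cwd() if start is None else Path(start)).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in [path, *path.parents]:
        if (candidate / "hipercow").is_dir():
            return candidate
    raise FileNotFoundError(f"Could not find a hipercow root from '{path}'")
```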
init(path)
Initialise a hipercow root.
Sets up a new hipercow root at path, creating the directory <path>/hipercow/, which will contain all of hipercow's files. It is safe to re-initialise an already-created hipercow root (in which case nothing happens) and safe to initialise a Python hipercow root at the same location as an R one, though at the moment they do not interact.
Parameters:
- path – The location at which to set up the hipercow root.
Returns:
- None – Nothing, called for side effects only.
open_root(path=None)
Open a hipercow root.
Locate and validate a hipercow root, converting an OptionalRoot type into a real Root object. This function is used in most user-facing functions in hipercow, but you can call it yourself to validate the root early.
Parameters:
- path (OptionalRoot, default: None) – A path to the root to open, a Root, or None.
Returns:
- Root – The opened Root object.
hipercow.configure
configure(name, *, root=None, **kwargs)
Configure a driver.
Configures a hipercow root to use a driver.
Parameters:
- name (str) – The name of the driver. This will be dide unless you are developing hipercow itself :)
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
- **kwargs (Any, default: {}) – Arguments passed to, and supported by, your driver.
Returns:
- None – Nothing, called for side effects only.
unconfigure(name, root=None)
Unconfigure (remove) a driver.
Parameters:
- name (str) – The name of the driver. This will be dide unless you are developing hipercow itself :)
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- None – Nothing, called for side effects only.
hipercow.task
Functions for interacting with tasks.
TaskStatus
Bases: Flag
Status of a task.
Tasks move from CREATED to SUBMITTED to RUNNING to one of SUCCESS or FAILURE. In addition, a task might be CANCELLED (this could happen from CREATED, SUBMITTED or RUNNING) or might be MISSING if it does not exist.
A runnable task is one that could be evaluated with task_eval; it might be CREATED or SUBMITTED.
A terminal task is one that has reached its final state: SUCCESS, FAILURE or CANCELLED.
is_runnable()
Check if a status implies a task can be run.
is_terminal()
Check if a status implies a task is completed.
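Since TaskStatus is based on enum.Flag, the runnable and terminal groupings can be expressed as combined flags. The sketch below is a hypothetical stand-in (TaskStatusSketch) written to mirror the states described above; it is not hipercow's source, and the exact member values are assumptions.

```python
from enum import Flag, auto


class TaskStatusSketch(Flag):
    """Hypothetical stand-in for hipercow's TaskStatus."""

    CREATED = auto()
    SUBMITTED = auto()
    RUNNING = auto()
    SUCCESS = auto()
    FAILURE = auto()
    CANCELLED = auto()
    MISSING = auto()

    def is_runnable(self) -> bool:
        # Runnable: a task that could still be evaluated with task_eval.
        return bool(self & (TaskStatusSketch.CREATED | TaskStatusSketch.SUBMITTED))

    def is_terminal(self) -> bool:
        # Terminal: the task has reached its final state.
        terminal = (
            TaskStatusSketch.SUCCESS
            | TaskStatusSketch.FAILURE
            | TaskStatusSketch.CANCELLED
        )
        return bool(self & terminal)
```

Because members combine with |, a value such as SUCCESS | FAILURE can stand for a set of statuses, which is presumably how the with_status argument of task_list can take either one status or several.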
task_driver(task_id, root)
Get the driver used to submit a task.
This may not always be set (e.g., a task was created before a driver was configured), in which case we return None.
Parameters:
- task_id (str) – The task identifier to look up.
- root (Root) – The root to look up the task in (an opened Root, for example from open_root).
Returns:
- str | None – The driver name, if known. Otherwise None.
task_exists(task_id, root=None)
Test if a task exists.
A task exists if the task_id was used with this hipercow root (i.e., if any files associated with it exist).
Parameters:
- task_id (str) – The task identifier, a 32-character hex string.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- bool – True if the task exists.
task_last(root=None)
Return the most recently created task.
Parameters:
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
A task identifier (a 32-character hex string) if any tasks have been created (and if the recent task list has not been truncated), or None if no tasks have been created.
task_list(*, root=None, with_status=None)
List known tasks.
Warning
This function could take a long time to execute on large projects with many tasks, particularly on large file systems. Because the tasks are just returned as a list of strings, it may not be terribly useful either. Think before building a workflow around this.
Parameters:
- root (OptionalRoot, default: None) – The root to search from.
- with_status (TaskStatus | None, default: None) – Optional status, or set of statuses, to match.
Returns:
- list[str] – The identifiers of the matching tasks.
task_log(task_id, *, outer=False, root=None)
Read the task log.
Not all tasks have logs: tasks that have not yet started (status CREATED or SUBMITTED, and those CANCELLED before starting) will not have logs, and tasks that were run without capturing output will not produce a log either. Be sure to check whether a string was returned.
Parameters:
- task_id (str) – The task identifier to fetch the log for, a 32-character hex string.
- outer (bool, default: False) – Fetch the "outer" logs; these are logs from the underlying HPC software before it hands off to hipercow.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- str | None – The log as a single string, if present.
task_recent(*, root=None, limit=None)
Return a list of recently created tasks.
Parameters:
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
- limit (int | None, default: None) – The maximum number of tasks to return.
Returns:
A list of task identifiers. The most recent tasks will be last in this list (we might change this in a future version; yes, that will be annoying). Note that this is recency in creation, not completion.
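The ordering and limit rules for task_recent and task_last can be illustrated with a minimal sketch. It assumes, purely for illustration, that recent tasks are kept in a plain file with one identifier per line, oldest first; the file format and the names recent_sketch and last_sketch are hypothetical, not hipercow's storage or API.

```python
from __future__ import annotations

from pathlib import Path


def recent_sketch(path: Path, limit: int | None = None) -> list[str]:
    """Hypothetical: read a recent-task file (one ID per line, oldest first)."""
    if not path.exists():
        return []
    ids = path.read_text().split()
    if limit is not None:
        # Keep the `limit` newest IDs; the newest stays last in the list.
        ids = ids[max(len(ids) - limit, 0):]
    return ids


def last_sketch(path: Path) -> str | None:
    """Hypothetical: the most recently created task, or None if there is none."""
    ids = recent_sketch(path, limit=1)
    return ids[0] if ids else None
```

Slicing from max(len(ids) - limit, 0) rather than ids[-limit:] avoids the trap where limit=0 would otherwise return the whole list.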
task_recent_rebuild(*, root=None, limit=None)
Rebuild the list of recent tasks.
Parameters:
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
- limit (int | None, default: None) – The maximum number of tasks to add to the recent list. Use limit=0 to truncate the list.
Returns:
- None – Nothing, called for side effects only.
task_status(task_id, root=None)
Read task status.
Parameters:
- task_id (str) – The task identifier to check, a 32-character hex string.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- TaskStatus – The status of the task.
task_wait(task_id, *, root=None, allow_created=False, **kwargs)
Wait for a task to complete.
Parameters:
- task_id (str) – The task to wait on.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
- allow_created (bool, default: False) – Allow waiting on a task that has status CREATED. Normally this is not allowed because a task that is CREATED (and not SUBMITTED) will not start; if you pass allow_created=True it is expected that you are also manually evaluating this task!
- **kwargs (Any, default: {}) – Additional arguments to taskwait.taskwait.
Returns:
hipercow.task_create
task_create_shell(cmd, *, environment=None, envvars=None, resources=None, driver=None, root=None)
Create a shell command task.
This is the first type of task that we support, and more types will likely follow. A shell command will evaluate an arbitrary command on the cluster; it does not even need to be written in Python! However, if you are using the pip environment engine then it will need to be pip-installable.
The interface here is somewhat subject to change, but we think the basics here are reasonable.
Parameters:
- cmd (list[str]) – The command to execute, as a list of strings.
- environment (str | None, default: None) – The name of the environment to evaluate the command in. The default (None) will select default if available, falling back on empty.
- envvars (dict[str, str] | None, default: None) – A dictionary of environment variables to set before the task runs. Do not set PATH in here; it will not currently have an effect.
- resources (TaskResources | None, default: None) – Optional resources required by your task.
- driver (str | None, default: None) – The driver to launch the task with. Generally this is not needed, as we expect most people to have a single driver set.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- str – The newly-created task identifier, a 32-character hex string.
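Because cmd is a list of strings rather than a single shell string, it can help to convert a command you would normally type at a prompt. The standard library's shlex.split does this using POSIX shell quoting rules (not Windows cmd rules); analysis.py and "data file.csv" are made-up names for illustration.

```python
import shlex

# Split a shell-style string into the list-of-strings form that cmd expects.
cmd = shlex.split('python analysis.py --input "data file.csv"')
print(cmd)  # ['python', 'analysis.py', '--input', 'data file.csv']

# The result could then be passed on, e.g.:
# task_id = hipercow.task_create_shell(cmd)
```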
hipercow.bundle
Support for bundles of related tasks.
Bundle
bundle_create(task_ids, name=None, *, validate=True, overwrite=True, root=None)
Create a new bundle from a list of tasks.
Parameters:
- task_ids (list[str]) – The task identifiers in the bundle.
- name (str | None, default: None) – The name for the bundle. If not given, we randomly create one. The format of the name is subject to change.
- validate (bool, default: True) – Check that all tasks exist before creating the bundle.
- overwrite (bool, default: True) – Overwrite a bundle if it already exists.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- str – The name of the newly created bundle. Also, as a side effect, writes out the task bundle to disk.
bundle_delete(name, root=None)
Delete a bundle.
Note that this does not delete the tasks in the bundle, just the bundle itself.
Parameters:
- name (str) – The name of the bundle to delete.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- None – Nothing, called for side effects only.
bundle_list(root=None)
List bundles.
Parameters:
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
The names of known bundles. Currently the order of these
bundle_load(name, root=None)
Load a task bundle.
Parameters:
- name (str) – The name of the bundle to load.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- Bundle – The loaded bundle.
bundle_status(name, root=None)
Get the statuses of tasks in a bundle.
Depending on the context, bundle_status_reduce() may be a more appropriate function to use; it attempts to reduce the list of statuses into the single "worst" status.
Parameters:
- name (str) – The name of the bundle to get the statuses for.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- list[TaskStatus] – A list of statuses, one per task, in the same order as the tasks in the original bundle.
bundle_status_reduce(name, root=None)
Get the overall status from a bundle.
Parameters:
- name (str) – The name of the bundle to get the statuses for.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- TaskStatus – The overall bundle status.
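Reducing to a "worst" status needs a ranking of statuses. The documentation above does not state hipercow's ranking, so the order in WORST_FIRST below is an assumption chosen for illustration (a bundle is treated as only as far along as its least-progressed task); Status and reduce_status_sketch are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    CREATED = "created"
    SUBMITTED = "submitted"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"
    CANCELLED = "cancelled"
    MISSING = "missing"


# Hypothetical ranking, worst first; hipercow's real order may differ.
WORST_FIRST = [
    Status.MISSING,
    Status.FAILURE,
    Status.CANCELLED,
    Status.CREATED,
    Status.SUBMITTED,
    Status.RUNNING,
    Status.SUCCESS,
]


def reduce_status_sketch(statuses: list[Status]) -> Status:
    """Collapse per-task statuses into the single 'worst' one."""
    if not statuses:
        raise ValueError("Cannot reduce an empty bundle")
    # The lowest index in WORST_FIRST is the worst status present.
    return min(statuses, key=WORST_FIRST.index)
```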
hipercow.environment
environment_check(name, root=None)
Validate an environment name for this root.
This function can be used to ensure that name is a reasonable environment name to use in your root. It returns the resolved name (selecting between empty and default if name is None), and errors if the requested environment is not found.
Parameters:
- name (str | None) – The name of the environment to use, or None to select the appropriate default.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- str – The resolved environment name.
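The resolution rule (default if it exists, otherwise empty, which is always available) can be sketched as a small pure function. Here known stands for the environments created with environment_new; resolve_environment_sketch is a hypothetical name, and the real function reads this information from the root rather than taking it as an argument.

```python
from __future__ import annotations


def resolve_environment_sketch(name: str | None, known: set[str]) -> str:
    """Hypothetical: resolve an environment name against `known`.

    `empty` is always accepted, even though it is never in `known`.
    """
    if name is None:
        # Prefer 'default' when it has been created, else fall back on 'empty'.
        return "default" if "default" in known else "empty"
    if name == "empty" or name in known:
        return name
    raise ValueError(f"Environment '{name}' does not exist")
```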
environment_delete(name, root=None)
Delete an environment.
Parameters:
- name (str) – The name of the environment to delete.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- None – Nothing, called for side effects only.
environment_exists(name, root=None)
Check if an environment exists.
Note that this function will return False for empty, even though empty is always a valid choice. We might change this in future.
Parameters:
- name (str) – The name of the environment to check.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- bool – True if the environment exists, otherwise False.
environment_list(root=None)
List known environments.
Parameters:
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
environment_new(name, engine, root=None)
Create a new environment.
Creating an environment selects a name and declares the engine for the environment. After doing this, you will certainly want to provision the environment using provision().
Parameters:
- name (str) – The name for the environment. The name default is a good choice if you only want a single environment, as this is the environment used by default. You cannot use empty, as that is a special empty environment.
- engine (str) – The environment engine to use. The options here are pip and empty. Soon we will support conda too.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- None – Nothing, called for side effects.
hipercow.provision
provision(name, cmd, *, driver=None, root=None)
Provision an environment.
This function requires that your root has a driver configured (with hipercow.configure) and an environment created (with hipercow.environment_new).
Note that in the command-line tool, this command is grouped into the environment group; we may move this function into the environment module in future.
Parameters:
- name (str) – The name of the environment to provision.
- cmd (list[str] | None) – Optionally, the command to run to do the provisioning. If None, then the environment engine will select an appropriate command if it is well defined for your setup. The details here depend on the engine.
- driver (str | None, default: None) – The name of the driver to use in provisioning. Normally this can be omitted, as None (the default) will select your driver automatically if only one is configured.
- root (OptionalRoot, default: None) – The root, or if not given search from the current directory.
Returns:
- None – Nothing, called for side effects only.
hipercow.resources
Specify and interact with resources.
ClusterResources (dataclass)
Resources available on a cluster.
This will be returned by a cluster driver and will be used to validate resources.
Attributes:
- queues (Queues) – Valid queues.
- max_memory (int) – The maximum RAM across all nodes in the cluster.
- max_cores (int) – The maximum number of cores across all nodes in the cluster.
validate_resources(resources)
Check that resources are valid on this cluster.
Takes a set of resources and checks that the requested queue, number of cores and RAM are valid for the cluster. If the queue is not provided, then we use the default.
Parameters:
- resources (TaskResources) – Resources to validate.
Queues (dataclass)
Queues available on the cluster.
Attributes:
- valid (set[str]) – The set of valid queue names. Being a set, the order does not imply anything.
- default (str) – The default queue, used if none is explicitly given.
- build (str) – The queue used to run build jobs.
- test (str) – The queue used to run test jobs.
simple(name) (staticmethod)
Create a Queues object with only one valid queue.
This situation is common enough that we provide a small wrapper.
Parameters:
- name (str) – The only supported queue. This will become the set of valid queues, the default queue, the build queue and the test queue.
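The single-queue wrapper amounts to filling every role with the same name. The dataclass below is a hypothetical mirror of the Queues attributes listed above (QueuesSketch, with "main" as a made-up queue name), not hipercow's own class.

```python
from dataclasses import dataclass


@dataclass
class QueuesSketch:
    """Hypothetical mirror of the Queues dataclass."""

    valid: set[str]
    default: str
    build: str
    test: str

    @staticmethod
    def simple(name: str) -> "QueuesSketch":
        # One queue fills every role: valid set, default, build and test.
        return QueuesSketch({name}, name, name, name)


q = QueuesSketch.simple("main")
```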
TaskResources
Bases: BaseModel
Resources required for a task.
We don't support everything that the R version does yet; in particular we've not set up hold_until, priority or requested_nodes, as these are not widely used.
Attributes:
- queue (str | None) – The queue to run on. If not given (or None), we use the default queue for your cluster. Alternatively, you can provide .default for the default queue or .test for the test queue.
- cores (int | float) – The number of cores to request. Adding more cores does not necessarily make your task any faster; your task must have some mechanism to exploit this parallelism (e.g., using the multiprocessing package). Specify math.inf if you want to request all the cores on a single node.
- exclusive (bool) – Request exclusive access to a node.
- max_runtime (int | None) – The maximum run time (wall clock), in seconds.
- memory_per_node (int | None) – Specify that your task can only run on a node with at least this much memory, in GB (e.g., 128 is 128 GB).
- memory_per_task (int | None) – An estimate of how much memory your task requires (across all cores), in GB. If you provide this, the scheduler can attempt to arrange tasks such that they will all fit in the available RAM.
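Putting ClusterResources and TaskResources together, validation might look like the sketch below: resolve the queue aliases (None, .default, .test), then compare cores and memory with the cluster limits. Every class and function name here is a hypothetical stub, and two details are assumptions not stated above: that max_memory is in GB, and that math.inf cores resolves to max_cores.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class QueuesStub:
    valid: set[str]
    default: str
    build: str
    test: str


@dataclass
class ClusterStub:
    queues: QueuesStub
    max_memory: int  # assumed to be in GB
    max_cores: int


@dataclass
class RequestStub:
    queue: str | None = None
    cores: int | float = 1
    memory_per_node: int | None = None


def validate_sketch(req: RequestStub, cluster: ClusterStub) -> RequestStub:
    """Hypothetical: resolve queue aliases and check limits."""
    q = cluster.queues
    aliases = {None: q.default, ".default": q.default, ".test": q.test}
    queue = aliases.get(req.queue, req.queue)
    if queue not in q.valid:
        raise ValueError(f"'{queue}' is not a valid queue")
    # Assumption: math.inf ("all the cores on a node") becomes max_cores.
    cores = cluster.max_cores if math.isinf(req.cores) else req.cores
    if cores > cluster.max_cores:
        raise ValueError(f"{cores} cores requested, at most {cluster.max_cores}")
    mem = req.memory_per_node
    if mem is not None and mem > cluster.max_memory:
        raise ValueError(f"{mem} GB requested, at most {cluster.max_memory} GB")
    return RequestStub(queue=queue, cores=cores, memory_per_node=mem)
```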
hipercow.environment_engines
Support for environment engines.
Empty
Bases: EnvironmentEngine
The empty environment, into which nothing may be installed.
check_args(cmd)
create(**kwargs)
Create the empty environment, which already exists.
Returns:
- None – Never returns, but throws if called.
exists()
path()
The path to the empty environment, which never exists.
Returns:
- Path – Never returns, but throws if called.
provision(cmd, **kwargs)
Install packages into the empty environment, which is not allowed.
Returns:
- None – Never returns, but throws if called.
EnvironmentEngine
Bases: ABC
Base class for environment engines.
Attributes:
- root – The hipercow root.
- name – The name of the environment to provision.
- platform – Optionally, the platform to target.
check_args(cmd) (abstractmethod)
Check arguments provided in cmd for suitability.
This method runs on the client (the Python process initiating the provisioning request), not in the context of the process that will create the environment. In particular, don't assume that the platform information is the same.
Parameters:
- cmd (list[str] | None) – A list of arguments to provision an environment, or None if the user provided none. In the latter case you must provide suitable defaults or throw an error.
Returns:
create(**kwargs) (abstractmethod)
Create (or initialise) the environment.
This method will be called on the target platform, not on the client platform. Most environment systems have a concept of initialisation; this will typically create the directory referred to by path() and do any required bootstrapping. It will not typically install anything for the user.
In general, we expect create() to be called only once per environment lifetime, while we expect provision() to be called every time the environment is modified (one or many times).
Parameters:
- **kwargs (Any, default: {}) – Additional keyword arguments passed on to the concrete method.
Returns:
- Nothing (None) – Called for side effects only.
exists()
Test if an environment exists.
This method is not abstract, and generally should not need to be replaced by derived classes.
Returns:
path()
Compute path to the environment contents.
This base method version will return a suitable path within the root. Implementations can use this path directly (say, if the environment path does not need to differ according to platform etc.), or compute their own. We might change the logic here in future to make this base-class returned path more generally useful.
Returns:
- Path – The path to the directory that will store the environment.
provision(cmd, **kwargs) (abstractmethod)
Provision an environment.
Install packages or software into the environment.
Parameters:
- cmd (list[str]) – A command to run in the environment. Most of the time this just calls hipercow.utils.subprocess_run directly.
- **kwargs (Any, default: {}) – Additional keyword arguments passed through to the concrete method.
Returns:
- Nothing (None) – Called for side effects only.
run(cmd, *, env=None, **kwargs) (abstractmethod)
Run a command within an environment.
Both provisioning and running tasks happen in the context of an environment. This method must be specialised to activate the environment and then run the given shell command.
This method should (eventually) call hipercow.util.subprocess_run, returning the value from that function.
Parameters:
- cmd (list[str]) – The command to run.
- env (dict[str, str] | None, default: None) – An optional dictionary of environment variables that will be set within the environment.
- **kwargs (Any, default: {}) – Additional keyword arguments passed from the provisioner or the task runner.
Returns:
Information about the completed process. Note that errors are not thrown unless the keyword argument check=True is provided.
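The division of labour between the abstract methods can be seen in a cut-down skeleton. EngineSketch below is a hypothetical stand-in for EnvironmentEngine (its path() layout is invented), and SystemPython is a made-up engine that simply runs commands with the current environment plus any extra variables. It uses the standard subprocess.run in place of hipercow's subprocess_run, which likewise only raises on failure when check=True is passed.

```python
from __future__ import annotations

import os
import subprocess
import sys
from abc import ABC, abstractmethod
from pathlib import Path


class EngineSketch(ABC):
    """Hypothetical, cut-down stand-in for EnvironmentEngine."""

    def __init__(self, root: Path, name: str):
        self.root = root
        self.name = name

    def path(self) -> Path:
        # Invented layout; the real base class chooses its own location.
        return self.root / "hipercow" / "env" / self.name

    def exists(self) -> bool:
        return self.path().exists()

    @abstractmethod
    def check_args(self, cmd: list[str] | None) -> list[str]: ...

    @abstractmethod
    def create(self, **kwargs) -> None: ...

    @abstractmethod
    def provision(self, cmd: list[str], **kwargs) -> None: ...

    @abstractmethod
    def run(self, cmd: list[str], *, env: dict[str, str] | None = None,
            **kwargs) -> subprocess.CompletedProcess: ...


class SystemPython(EngineSketch):
    """Hypothetical engine with no isolation at all."""

    def check_args(self, cmd):
        if not cmd:
            raise ValueError("No default provisioning command for this engine")
        return cmd

    def create(self, **kwargs):
        self.path().mkdir(parents=True, exist_ok=True)

    def provision(self, cmd, **kwargs):
        kwargs.setdefault("check", True)  # provisioning failures should raise
        self.run(cmd, **kwargs)

    def run(self, cmd, *, env=None, **kwargs):
        # Merge extra variables over the current environment, then run.
        full_env = {**os.environ, **(env or {})}
        return subprocess.run(cmd, env=full_env, **kwargs)
```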
Pip
Bases: EnvironmentEngine
Python virtual environments, installed by pip.
check_args(cmd)
Validate pip installation command.
Checks if cmd is a valid pip command.
If cmd is None or the empty list, we try to guess a default command, based on files found in your project root:
- if you have a pyproject.toml file, then we will try to run pip install --verbose .
- if you have a requirements.txt, then we will try to run pip install --verbose -r requirements.txt
(In both cases these are returned as a list of arguments.)
If there are other reasonable conventions that we might follow, please let us know.
Parameters:
Returns:
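The default-guessing rules for Pip.check_args can be sketched as follows. Two details are assumptions not confirmed by the text: that a "valid pip command" is one whose first element is pip, and that pyproject.toml takes precedence when both files are present. pip_default_cmd_sketch is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def pip_default_cmd_sketch(cmd: list[str] | None, root: Path) -> list[str]:
    """Hypothetical version of the pip default-command rules."""
    if cmd:
        # Assumption: a valid pip command starts with "pip".
        if cmd[0] != "pip":
            raise ValueError("Expected a command starting with 'pip'")
        return cmd
    # Assumption: pyproject.toml wins when both files are present.
    if (root / "pyproject.toml").exists():
        return ["pip", "install", "--verbose", "."]
    if (root / "requirements.txt").exists():
        return ["pip", "install", "--verbose", "-r", "requirements.txt"]
    raise ValueError("Can't determine a default pip command for this root")
```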
create(**kwargs)
Create the virtual environment.
Calls python -m venv <path> with the result of path().
Parameters:
- **kwargs (Any, default: {}) – Additional arguments to subprocess_run.
Returns:
- None – Nothing, called for side effects only.
provision(cmd, **kwargs)
run(cmd, *, env=None, **kwargs)
Run a command within the pip virtual environment.
Parameters:
- cmd (list[str]) – The command to run.
- env (dict[str, str] | None, default: None) – Environment variables, passed into subprocess_run. We will add additional environment variables to control the virtual environment activation. Note that PATH cannot be safely set through env yet, because we have to modify that to activate the virtual environment, and because subprocess.Popen requires the PATH to be set before finding the program to call on Windows. We may improve this in future.
- **kwargs (Any, default: {}) – Keyword arguments to subprocess_run.
Returns:
- CompletedProcess – Details about the process, if check=True is not present in kwargs.
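Activating a virtual environment conventionally means setting VIRTUAL_ENV and putting the environment's executable directory (bin/ on Unix, Scripts/ on Windows) first on PATH, which is why a user-supplied PATH would be overwritten. The sketch below shows that convention; the exact variables hipercow sets are an assumption, and venv_env_sketch and its windows parameter are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def venv_env_sketch(venv: Path, env: dict[str, str] | None = None,
                    *, windows: bool | None = None) -> dict[str, str]:
    """Hypothetical: variables a conventional venv activation would set."""
    if windows is None:
        windows = os.name == "nt"
    # venv puts executables in Scripts/ on Windows and bin/ elsewhere.
    bin_dir = venv / ("Scripts" if windows else "bin")
    sep = ";" if windows else ":"
    result = dict(env or {})
    result["VIRTUAL_ENV"] = str(venv)
    # Prepend the venv's executables so its python and pip are found first;
    # any PATH the caller put in `env` is replaced here.
    parts = [str(bin_dir)]
    existing = os.environ.get("PATH")
    if existing:
        parts.append(existing)
    result["PATH"] = sep.join(parts)
    return result
```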
Platform (dataclass)
Information about a platform.
The most basic information about a platform that we need to set up an environment, derived from the platform module.
Attributes:
- system (str) – The name of the system, in lowercase. Values will be linux, windows or darwin (macOS). We may replace this with an Enum in future.
- version (str) – The Python version, as a 3-element version string.
local() (staticmethod)
Platform information for the running Python.
A convenience function to construct suitable platform information for the currently running system.
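Both attributes map directly onto the standard platform module: platform.system() returns names such as "Linux", "Windows" or "Darwin", and platform.python_version() returns a "major.minor.patchlevel" string. PlatformSketch is a hypothetical mirror of the dataclass, shown to illustrate how local() could be built from those calls.

```python
import platform
from dataclasses import dataclass


@dataclass
class PlatformSketch:
    """Hypothetical mirror of the Platform dataclass."""

    system: str
    version: str

    @staticmethod
    def local() -> "PlatformSketch":
        # Lowercase the system name, e.g. "Linux" -> "linux".
        return PlatformSketch(platform.system().lower(), platform.python_version())


p = PlatformSketch.local()
```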