Orderly Beginnings

Install orderly2

From the mrc-ide r-universe (recommended):

install.packages(
  "orderly2",
  repos = c("https://mrc-ide.r-universe.dev",
            "https://cloud.r-project.org"))

From GitHub using remotes:

remotes::install_github("mrc-ide/orderly2")

From PyPI (for Python):

pip install pyorderly

Check your version

packageVersion("orderly2")
## [1] '1.99.54'
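If you share scripts with other people, it can help to stop early when orderly2 is missing or older than expected. Here is a minimal base-R sketch. `pkg_at_least()` is a hypothetical helper, not part of orderly2, and the minimum version is just the one used in this tutorial, not an official requirement:

```r
# Hypothetical helper (not part of orderly2): TRUE if `pkg` is installed
# at version `min_version` or newer, FALSE otherwise
pkg_at_least <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

# 1.99.54 is simply the version shown above
if (!pkg_at_least("orderly2", "1.99.54")) {
  message("Please install or update orderly2 (see the options above)")
}
```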

My first orderly report / task

There is a discussion to have here about naming. We might have this in a break…

The setup

First, load the package and create a new empty orderly root.

library(orderly2)
orderly_init("workdir/part1")
## ✔ Created orderly root at '/home/runner/work/orderly-tutorial/orderly-tutorial/workdir/part1'
## ✔ Wrote '.gitignore'

(For the rest of this section, we have used setwd() to move into this directory, although some commands below still spell out the full workdir/part1 path. You should create an RStudio “Project” here.)

What’s in the box?

fs::dir_tree("workdir/part1")
## workdir/part1
## └── orderly_config.yml

Really, including hidden files:

fs::dir_tree("workdir/part1", all = TRUE)
## workdir/part1
## ├── .gitignore
## ├── .outpack
## │   ├── config.json
## │   ├── location
## │   ├── metadata
## │   └── r
## │       └── git_ok
## └── orderly_config.yml

Leave everything in .outpack/ alone, just as you would with .git/.
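fs::dir_tree() is a convenient way to look around. If you would rather not depend on fs, base R's list.files() can give a flat listing that includes hidden entries such as .outpack/. A sketch; `list_all()` is a hypothetical helper:

```r
# Hypothetical helper: list every file and directory under `path`,
# including hidden ones (names starting with "."), relative to `path`
list_all <- function(path) {
  sort(list.files(path, all.files = TRUE, recursive = TRUE,
                  include.dirs = TRUE, no.. = TRUE))
}

list_all("workdir/part1")
```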

Create an empty report

orderly_new("example")
## ✔ Created 'src/example/example.R'

Our contents now:

fs::dir_tree("workdir/part1")
## workdir/part1
## ├── orderly_config.yml
## └── src
##     └── example
##         └── example.R

The name example.R is important: the script must always live at src/<name>/<name>.R
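Put another way, the script's path can be built from the report name alone. A tiny sketch; `report_script()` is a hypothetical helper written for illustration, not part of orderly2:

```r
# Hypothetical helper: the path (relative to the orderly root) where
# orderly2 expects the script for report `name`
report_script <- function(name) {
  file.path("src", name, paste0(name, ".R"))
}

report_script("example")
## [1] "src/example/example.R"

# From the root, check that the script is where it should be
file.exists(report_script("example"))
```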

Hello orderly world

We have edited src/example/example.R to contain:

d <- data.frame(greeting = "hello", to = "world")
write.csv(d, "hello.csv", row.names = FALSE)

Now we run the report:

id <- orderly_run("example")
## ℹ Starting packet 'example' `20241024-123857-4f24210d` at 2024-10-24 12:38:57.317003
## > d <- data.frame(greeting = "hello", to = "world")
## > write.csv(d, "hello.csv", row.names = FALSE)
## ✔ Finished running 'example.R'
## ℹ Finished 20241024-123857-4f24210d at 2024-10-24 12:38:57.373542 (0.05653906 secs)

Files created

fs::dir_tree("workdir/part1")
## workdir/part1
## ├── archive
## │   └── example
## │       └── 20241024-123857-4f24210d
## │           ├── example.R
## │           └── hello.csv
## ├── draft
## │   └── example
## ├── orderly_config.yml
## └── src
##     └── example
##         └── example.R

  • Directory named after the id (in archive/example)
  • We have copied example.R into the directory
  • Output sits next to inputs
  • Metadata is stored in a hidden location (under .outpack/)

Contents of hello.csv:

read.csv(file.path("workdir/part1/archive/example", id, "hello.csv"))
##   greeting    to
## 1    hello world
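The script writes hello.csv but never tells orderly2 that this file is an output, which is why the artefacts table in the metadata below is empty. orderly2 provides orderly_artefact() for declaring outputs. Below is a sketch of example.R with such a declaration. The description text is made up, and the arguments are passed by name; check ?orderly_artefact for the exact signature in your installed version:

```r
library(orderly2)

# Sketch of src/example/example.R with its output declared, so that
# orderly2 can check the file was produced and record it in the metadata
orderly_artefact(description = "A friendly greeting", files = "hello.csv")

d <- data.frame(greeting = "hello", to = "world")
write.csv(d, "hello.csv", row.names = FALSE)
```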

Every packet has a unique id

id
## [1] "20241024-123857-4f24210d"
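Judging by the output above, an id appears to be the start time (YYYYMMDD-HHMMSS) followed by eight hexadecimal characters. The sketch below checks that shape and recovers the timestamp. Both helpers are hypothetical, the pattern is inferred from the ids shown here rather than taken from a specification, and treating the time as UTC is an assumption that matches the metadata below:

```r
# Hypothetical helper based on the id shape seen above:
# "YYYYMMDD-HHMMSS-" followed by 8 lower-case hex characters
is_packet_id <- function(x) {
  grepl("^[0-9]{8}-[0-9]{6}-[0-9a-f]{8}$", x)
}

# Hypothetical helper: parse the leading timestamp (assumed to be UTC)
packet_id_time <- function(x) {
  as.POSIXct(substr(x, 1, 15), format = "%Y%m%d-%H%M%S", tz = "UTC")
}

is_packet_id("20241024-123857-4f24210d")
## [1] TRUE
packet_id_time("20241024-123857-4f24210d")
## [1] "2024-10-24 12:38:57 UTC"
```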

and a bunch of metadata:

orderly_metadata(id)
## $schema_version
## [1] "0.1.1"
## 
## $name
## [1] "example"
## 
## $id
## [1] "20241024-123857-4f24210d"
## 
## $time
## $time$start
## [1] "2024-10-24 12:38:57 UTC"
## 
## $time$end
## [1] "2024-10-24 12:38:57 UTC"
## 
## 
## $parameters
## NULL
## 
## $files
##        path size
## 1 example.R   95
## 2 hello.csv   32
##                                                                      hash
## 1 sha256:541682d8b8dba9b2ddb4ac5809c03e6bedd58b52ab3e64a662f3f48e66a9639f
## 2 sha256:b9f0704f459f7ad9785ddee01a281d81f95a461dbb682436a263e0b7252e92b7
## 
## $depends
## [1] packet query  files 
## <0 rows> (or 0-length row.names)
## 
## $git
## $git$sha
## [1] "d184c8b2d3c0f223b6592b841d8b9622e552f0c5"
## 
## $git$branch
## [1] "main"
## 
## $git$url
## [1] "https://github.com/mrc-ide/orderly-tutorial"
## 
## 
## $custom
## $custom$orderly
## $custom$orderly$artefacts
## [1] description paths      
## <0 rows> (or 0-length row.names)
## 
## $custom$orderly$role
##        path    role
## 1 example.R orderly
## 
## $custom$orderly$description
## $custom$orderly$description$display
## NULL
## 
## $custom$orderly$description$long
## NULL
## 
## $custom$orderly$description$custom
## NULL
## 
## 
## $custom$orderly$shared
## [1] here  there
## <0 rows> (or 0-length row.names)
## 
## $custom$orderly$session
## $custom$orderly$session$platform
## $custom$orderly$session$platform$version
## [1] "R version 4.4.1 (2024-06-14)"
## 
## $custom$orderly$session$platform$os
## [1] "Ubuntu 22.04.5 LTS"
## 
## $custom$orderly$session$platform$system
## [1] "x86_64, linux-gnu"
## 
## 
## $custom$orderly$session$packages
##        package version attached
## 1     orderly2 1.99.54     TRUE
## 2       crayon   1.5.3    FALSE
## 3        vctrs   0.6.5    FALSE
## 4          cli   3.6.3    FALSE
## 5        knitr    1.48    FALSE
## 6        rlang   1.1.4    FALSE
## 7         xfun    0.48    FALSE
## 8     jsonlite   1.8.9    FALSE
## 9         glue   1.8.0    FALSE
## 10     openssl   2.2.2    FALSE
## 11     askpass   1.2.1    FALSE
## 12   htmltools 0.5.8.1    FALSE
## 13         sys   3.4.3    FALSE
## 14       fansi   1.0.6    FALSE
## 15   rmarkdown    2.28    FALSE
## 16    evaluate   1.0.1    FALSE
## 17      tibble   3.2.1    FALSE
## 18     fastmap   1.2.0    FALSE
## 19        yaml  2.3.10    FALSE
## 20   lifecycle   1.0.4    FALSE
## 21    compiler   4.4.1    FALSE
## 22          fs   1.6.4    FALSE
## 23   pkgconfig   2.0.3    FALSE
## 24      digest  0.6.37    FALSE
## 25        gert   2.1.4    FALSE
## 26          R6   2.5.1    FALSE
## 27        utf8   1.2.4    FALSE
## 28      pillar   1.9.0    FALSE
## 29 credentials   2.0.2    FALSE
## 30    magrittr   2.0.3    FALSE
## 31       withr   3.0.1    FALSE
## 32       tools   4.4.1    FALSE
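The metadata is an ordinary nested R list, so the usual $ access works on it. A sketch that pulls out a few fields; `summarise_packet()` is hypothetical and assumes the field names printed above:

```r
# Hypothetical helper: collect a few commonly-needed fields from the list
# returned by orderly_metadata(); field names follow the output above
summarise_packet <- function(meta) {
  list(
    id = meta$id,
    name = meta$name,
    seconds = as.numeric(difftime(meta$time$end, meta$time$start,
                                  units = "secs")),
    files = meta$files$path,
    git_sha = meta$git$sha
  )
}
```

You would call it as summarise_packet(orderly_metadata(id)). If the metadata has no git section, git_sha is simply NULL.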

What is a hash?

A one-way transformation from data of any size to a short, fixed-length string:

orderly_hash_data("hello", "sha256")
## [1] "sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

Very small changes to the input (here, the digit 1 in place of the letter l) give a completely different hash:

orderly_hash_data("hel1o", "sha256")
## [1] "sha256:28ad6a376c5c22d1a3ad7f115c0cd8f4b8a9c94f55325c17f9302b6a4e41b29c"

This means we can compare hashes and be confident that we are looking at the same file (git relies heavily on this).
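orderly2 records a sha256 hash for every file in a packet (the $files table in the metadata above), so we can check that a file on disk has not changed since the packet was made. A sketch, assuming orderly_hash_file() takes a path and an algorithm name in the same way orderly_hash_data() takes data; `file_unchanged()` is a hypothetical helper:

```r
library(orderly2)

# Hypothetical helper: does the file at `path` still match a recorded
# hash such as "sha256:..."? Assumes orderly_hash_file(path, algorithm)
file_unchanged <- function(path, expected) {
  identical(orderly_hash_file(path, "sha256"), expected)
}

# A file containing exactly "hello" (no trailing newline) should give
# the same hash as orderly_hash_data("hello", "sha256") above
tmp <- tempfile()
writeBin(charToRaw("hello"), tmp)
file_unchanged(tmp, orderly_hash_data("hello", "sha256"))
```

For the packet in this section, the recorded hash for hello.csv is meta$files$hash[meta$files$path == "hello.csv"], where meta <- orderly_metadata(id).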

Run it again, Sam

orderly_run("example")
## ℹ Starting packet 'example' `20241024-123857-73786148` at 2024-10-24 12:38:57.455559
## > d <- data.frame(greeting = "hello", to = "world")
## > write.csv(d, "hello.csv", row.names = FALSE)
## ✔ Finished running 'example.R'
## ℹ Finished 20241024-123857-73786148 at 2024-10-24 12:38:57.48949 (0.03393054 secs)
## [1] "20241024-123857-73786148"

  • The new run gets a new id, and its own copy of the outputs

A copy saved every time we run

Stop naming files data_final-rgf (2).csv, please

fs::dir_tree("workdir/part1")
## workdir/part1
## ├── archive
## │   └── example
## │       ├── 20241024-123857-4f24210d
## │       │   ├── example.R
## │       │   └── hello.csv
## │       └── 20241024-123857-73786148
## │           ├── example.R
## │           └── hello.csv
## ├── draft
## │   └── example
## ├── orderly_config.yml
## └── src
##     └── example
##         └── example.R
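To try this without touching workdir/part1, here is a sketch that builds a throwaway root in a temporary directory, writes the same script by hand into the place orderly_new() would use, and runs it twice. It assumes that orderly_run() accepts a root argument; the directory name orderly-demo is arbitrary:

```r
library(orderly2)

# A throwaway root, so nothing here touches workdir/part1
root <- file.path(tempdir(), "orderly-demo")
orderly_init(root)

# Write the script to src/<name>/<name>.R
dir.create(file.path(root, "src", "example"), recursive = TRUE)
writeLines(c('d <- data.frame(greeting = "hello", to = "world")',
             'write.csv(d, "hello.csv", row.names = FALSE)'),
           file.path(root, "src", "example", "example.R"))

# Run the same report twice
id1 <- orderly_run("example", root = root)
id2 <- orderly_run("example", root = root)

# Two different ids, each with its own archived copy
id1 != id2
list.files(file.path(root, "archive", "example"))
```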

A high-level overview of packets:

orderly_metadata_extract()
##                         id    name parameters
## 1 20241024-123857-4f24210d example           
## 2 20241024-123857-73786148 example

(More on this later.)
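The result is an ordinary data frame, so base R works on it directly. Because every id starts with a fixed-width timestamp, sorting ids puts packets created in different seconds into chronological order; this sketch makes no claim about ordering within a single second. `latest_id()` is a hypothetical helper:

```r
# Hypothetical helper: the most recent id for report `name`, judged by
# the timestamp prefix of the id; NA if there are no packets for `name`
latest_id <- function(packets, name) {
  ids <- packets$id[packets$name == name]
  if (length(ids) == 0) {
    return(NA_character_)
  }
  max(ids)
}

# e.g. latest_id(orderly_metadata_extract(), "example")
```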

Next steps