Reproducible dependencies and environments

Objectives

Hardly any code has no dependencies. How should we deal with them?

Instructor note

  • xx min teaching/discussion

How to avoid: “It works on my machine 🤷”

Use a standard way to list dependencies in your project:

  • Python: requirements.txt or environment.yml

  • R: DESCRIPTION or renv.lock

  • Rust: Cargo.toml and Cargo.lock

  • Julia: Project.toml and Manifest.toml

  • C/C++/Fortran: CMakeLists.txt, Makefile, spack.yaml, the module system on clusters, or containers

  • Other languages: …

Install dependencies into isolated environments:

  • For each project, create a new environment.

  • Don’t install dependencies globally for all projects.

  • Install them from a file, so that the same file also documents them.
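
The three points above can be sketched in a few lines. This is only an illustration, assuming a plain Python project that lists its dependencies in requirements.txt and uses the standard-library venv module instead of conda. The function names create_project_env and install_from_file and the .venv directory name are hypothetical choices, not part of the example project.

```python
import os
import subprocess
import sys
import venv
from pathlib import Path


def create_project_env(project_dir: Path, with_pip: bool = True) -> Path:
    """Create an isolated environment in <project_dir>/.venv.

    Returns the path to the environment's own Python interpreter.
    The ".venv" name is an assumption made for this sketch.
    """
    env_dir = project_dir / ".venv"
    # Mirror the default of "python -m venv": symlinks everywhere except Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_from_file(env_python: Path, requirements: Path) -> None:
    """Install what the requirements file lists, into that environment only."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,  # raise if pip reports a failure
    )
```

Calling create_project_env(Path(".")) and then install_from_file with the returned interpreter and Path("requirements.txt") gives the project its own environment, and the requirements file doubles as documentation. With conda, the corresponding step is conda env create -f environment.yml.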

Demonstration

  1. The dependencies in our example project are listed in an environment.yml file.

    Discussion

    • Shouldn’t the dependencies in the environment.yml file be pinned to specific versions?

    • When is a good time to pin them?

  2. We also have a container definition file:

Where to explore more

Exercises

Exercise Reproducibility-1: Time-capsule of dependencies

Imagine the following situation: five students (A, B, C, D, E) each wrote code that depends on a couple of libraries and uploaded their projects to GitHub. We now travel 3 years into the future, find their GitHub repositories, and try to re-run their code before adapting it.

  • Which version do you expect to be easiest to re-run? Why?

  • What problems do you anticipate in each solution?

A: You find a couple of library imports across the code but that’s it.

B: The README file lists which libraries were used but does not mention any versions.

C: You find an environment.yml file with:

name: student-project
channels:
  - conda-forge
dependencies:
  - scipy
  - numpy
  - sympy
  - click
  - python
  - pip
  - pip:
    - git+https://github.com/someuser/someproject.git@master
    - git+https://github.com/anotheruser/anotherproject.git@master

D: You find an environment.yml file with:

name: student-project
channels:
  - conda-forge
dependencies:
  - scipy=1.3.1
  - numpy=1.16.4
  - sympy=1.4
  - click=7.0
  - python=3.8
  - pip
  - pip:
    - git+https://github.com/someuser/someproject.git@d7b2c7e
    - git+https://github.com/anotheruser/anotherproject.git@sometag

E: You find an environment.yml file with:

name: student-project
channels:
  - conda-forge
dependencies:
  - scipy=1.3.1
  - numpy=1.16.4
  - sympy=1.4
  - click=7.0
  - python=3.8
  - someproject=1.2.3
  - anotherproject=2.3.4
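
To compare listings such as C and D mechanically, one can check which entries fix a version at all. The following is a rough heuristic sketch, not a conda or pip feature. The helper names is_pinned and unpinned are hypothetical, the specs are passed as plain strings rather than parsed from YAML, and a git reference is treated as pinned only when it looks like a commit hash, since branches and tags can be moved.

```python
import re

# Conda-style spec: a name followed by "=" or "==" and a version starting with a digit.
_CONDA_PIN = re.compile(r"^[A-Za-z0-9_.\-]+={1,2}[0-9][^\s*]*$")
# Heuristic: a git URL counts as pinned only if it ends in a hex commit hash.
_GIT_COMMIT = re.compile(r"@[0-9a-f]{7,40}$")


def is_pinned(spec: str) -> bool:
    """Return True if the dependency spec names a specific version or commit."""
    spec = spec.strip()
    if spec.startswith("git+"):
        return bool(_GIT_COMMIT.search(spec))
    return bool(_CONDA_PIN.match(spec))


def unpinned(specs):
    """Return the specs that do not fix a version, in their original order."""
    return [s for s in specs if not is_pinned(s)]


if __name__ == "__main__":
    listing_d = [
        "scipy=1.3.1",
        "numpy=1.16.4",
        "python=3.8",
        "pip",
        "git+https://github.com/someuser/someproject.git@d7b2c7e",
        "git+https://github.com/anotheruser/anotherproject.git@sometag",
    ]
    for spec in unpinned(listing_d):
        print("not pinned:", spec)
```

For listing D this flags pip and the @sometag reference. For listing C it flags every entry.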

Keypoints

If somebody asks you what dependencies your project has, you should be able to answer with a file.
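
One way to produce such a file for an existing Python environment is to record every installed package together with its version. Below is a minimal sketch using the standard-library importlib.metadata module. The function names are hypothetical, and in practice tools such as pip freeze or conda env export serve this purpose.

```python
from importlib.metadata import distributions
from pathlib import Path


def installed_requirements() -> list[str]:
    """Return sorted 'name==version' lines for the packages this interpreter can see."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata lacks a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


def write_requirements(path: Path) -> int:
    """Write the pinned list to `path` and return the number of packages recorded."""
    lines = installed_requirements()
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    count = write_requirements(Path("requirements.txt"))
    print(f"recorded {count} packages")
```

Committing the resulting file next to the code means the answer to "what does this depend on?" travels with the project.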