Version control and code review

Objectives

  • Browse commits and branches of a Git repository.

  • Remember that commits are like snapshots of the repository at a certain point in time.

  • Know the difference between Git (something that tracks changes) and GitHub/GitLab (a platform to host Git repositories).

Instructor note

  • xx min teaching/discussion

Why do we need to keep track of versions?

Version control is an answer to the following questions (do you recognize some of them?):

  • “It broke … hopefully I have a working version somewhere?”

  • “Can you please send me the latest version?”

  • “Where is the latest version?”

  • “Which version are you using?”

  • “Which version have the authors used in the paper I am trying to reproduce?”

  • “Found a bug! Since when was it there?”

  • “I am sure it used to work. When did it change?”

  • “My laptop is gone. Is my thesis now gone?”

Features: roll-back, branching, merging, collaboration

Problem: Your code worked two days ago, but is giving an error now. You don’t know what you changed.

Problem: You and your colleague want to work on the same code at the same time.

  • Roll-back: you can always go back to a previous version and compare

  • Branching and merging:

    • Work on different ideas at the same time

    • Different people can work on the same code/project without interfering

    • You can experiment with an idea and discard it if it turns out to be a bad idea
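The roll-back and branching ideas above can be sketched on the command line. This is a minimal illustration, not a prescribed workflow: the file name analysis.sh, the branch name experiment, and the author identity are all made up for the example, and everything happens in a throwaway directory.

```shell
set -eu

# Work in a throwaway directory so nothing on your machine is touched.
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity

# Two snapshots of a hypothetical script: a working one and a broken one.
echo 'echo "result: 42"' > analysis.sh
git add analysis.sh
git commit -q -m "Add working analysis script"

echo 'echo "result: oops"' > analysis.sh
git commit -q -am "Change the analysis"

# Roll-back, part 1: compare the current version with the previous one.
git --no-pager diff HEAD~1 HEAD -- analysis.sh

# Roll-back, part 2: restore the file as it was one commit ago and record that.
git restore --source=HEAD~1 -- analysis.sh
git commit -q -am "Restore the working analysis script"

# Branching: try an idea on a separate branch without touching main.
git switch -q -c experiment
echo 'echo "a risky idea"' >> analysis.sh
git commit -q -am "Try a risky idea"

# Merging: bring the idea into main once you are happy with it.
git switch -q main
git merge -q experiment
# Had it been a bad idea, you could discard it instead: git branch -D experiment
```

Note that the broken version is never lost: it remains in the history, and the roll-back is itself a new snapshot.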

Branching explained with a gopher

Image created using https://gopherize.me/ (inspiration).

Reproducibility

Problem: Someone asks you about your results from 5 years ago. Can you get the same results now?

  • How do you indicate which version of your code you have used in your paper?

  • When you find a bug, how do you know precisely when it was introduced? (Are published results affected? Do you need to inform collaborators or users of your code?)
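One possible way to indicate which version was used for a paper is to tag the corresponding commit. The sketch below assumes a hypothetical file compute.py and a hypothetical tag name paper-v1; the identity is a placeholder, and the repository is created in a temporary directory only for illustration.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity

# The state of the code that produced the published results (hypothetical).
echo "print('results')" > compute.py
git add compute.py
git commit -q -m "Version used for the paper"

# Give this snapshot a human-readable name and note its checksum.
git tag -a paper-v1 -m "Code used for the results in the paper"
paper_commit=$(git rev-parse HEAD)
echo "Results were produced with commit $paper_commit"

# Development continues after publication.
echo "print('new results')" > compute.py
git commit -q -am "Change the computation"

# Years later: what changed since the paper, and what did the code look like then?
git --no-pager log --oneline paper-v1..HEAD
git --no-pager show paper-v1:compute.py
```

Quoting either the tag name or the full checksum in the paper lets readers retrieve exactly that snapshot.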

With version control we can “annotate” code (browse this example online):

Example of a git-annotated code with code and history side-by-side.

What we typically like to snapshot

  • Software (this is how it started but Git/GitHub can track a lot more)

  • Scripts

  • Documents (plain text files are much better suited than Word documents; this material itself is tracked using Git)

  • Manuscripts (Git is great for collaborating/sharing LaTeX or Quarto manuscripts)

  • Configuration files

  • Website sources

  • Data

Demonstration

  • Example repository: https://github.com/coderefinery/planets

  • Commits are like snapshots and if we break something we can go back to a previous snapshot.

  • Commits carry metadata about changes: author, date, commit message, and a checksum.

  • Branches are like parallel universes where you can experiment with changes without affecting the default branch: https://github.com/coderefinery/planets/network (“Insights” -> “Network”)

  • With version control we can annotate code (example).

  • Collaboration: We can fork (make a copy on GitHub), clone (make a copy to our computer), review, compare, share, and discuss.

  • Code review: Others can suggest changes using pull requests or merge requests. These can be reviewed and discussed before they are merged. Conceptually, they are similar to “suggesting changes” in Google Docs.
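The commit metadata and the "parallel universe" view of branches mentioned above can also be inspected locally. The sketch below builds a tiny stand-in repository rather than using the example repository, so the file name notes.txt, the branch name experiment, and the identity are assumptions made for illustration.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity

echo "Mercury" > notes.txt                   # hypothetical content
git add notes.txt
git commit -q -m "Add first planet"

# Every commit carries metadata: checksum, author, date, and message.
git --no-pager log -1 \
    --format='checksum: %H%nauthor:   %an <%ae>%ndate:     %ad%nmessage:  %s'

# A branch is a parallel line of work that does not affect main.
git switch -q -c experiment
echo "Pluto?" >> notes.txt
git commit -q -am "Experiment with a dwarf planet"

# Text version of the network view: all branches and their commits.
git --no-pager log --oneline --graph --all

# Back on main, the experiment is not visible in the file.
git switch -q main
cat notes.txt
```

The same log commands should work in a clone of the example repository, where the history is longer and more interesting.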

Where to explore more

Exercises

Exercise Git-1: Turn a project into a Git repo and share it

  1. Create a new directory called myproject with one or a few files in it. This represents your own project. It is not yet a Git repository.

  2. Turn this new directory into a Git repository.

  3. Share this repository on GitHub (or GitLab, since it really works the same).

We offer three different paths for this exercise:

  • Via the GitHub web interface: easy and can be a good starting point if you are completely new to Git.

  • Via VS Code: quite easy, since VS Code can offer to create the GitHub repository for you.

  • Via the command line: you need to create the repository on GitHub and link it yourself.

Create a repository on GitHub

First log into GitHub, then follow the screenshots and descriptions below.

Screenshot on GitHub before a new repository form is opened

Click on the “plus” symbol at the top right, then on “New repository”.

Then:

Screenshot on GitHub just before a new repository is created

Choose a repository name, add a short description, and in this case make sure to check “Add a README file”. Finally, click “Create repository”.

Upload your files

Now that the repository is created, you can upload your files:

Screenshot on GitHub just before uploading files

Click on the “+” symbol and then on “Upload files”.
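For the command-line path, the steps might look roughly like the sketch below. To keep it runnable without a GitHub account, a local bare repository stands in for the repository on GitHub; with a real one you would use its URL instead (of the form https://github.com/USER/myproject.git, where USER is a placeholder) and it should be created empty, without a README. The file names and identity are also assumptions.

```shell
set -eu

workdir=$(mktemp -d)

# Stand-in for the empty repository you would create on GitHub.
git init -q --bare -b main "$workdir/remote.git"

# Step 1: a directory with a few files, not yet a Git repository.
mkdir "$workdir/myproject"
cd "$workdir/myproject"
echo "My project" > README.md
echo "print('hello')" > script.py            # hypothetical project file

# Step 2: turn the directory into a Git repository and record a first snapshot.
git init -q -b main
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity
git add README.md script.py
git commit -q -m "Initial commit"

# Step 3: link the local repository to the remote one and share it.
git remote add origin "$workdir/remote.git"  # on GitHub: the repository URL
git push -q -u origin main

# Check where the repository is shared.
git remote -v
```

The -u option makes the local main branch track origin/main, so later a plain git push or git pull is enough.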

Exercise Git-2: Contribute to the example repository

TODO: Have something in example repo that anyone could contribute to?

  • Fork the example repository: https://github.com/coderefinery/planets

  • Create a new branch in your fork and give it a descriptive name.

  • Make a modification on the new branch and create a new commit in the web interface.

  • The new branch and the new commit now exist only in your fork, not yet in the original repository.

  • If you would like to contribute your change back to the original repository, you can create a pull request (you are welcome to try). TODO: Full workflow with Issue and PR description?

TODO: If you wanted to work on this exercise locally, the process would be: fork in the web interface, clone to your computer, create a new branch, work on the branch, add and commit to the local branch, then push the new branch to the remote. You are then at the same stage as when working in the web interface.
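A rough sketch of that local workflow follows. Forking itself happens in the web interface, so here a local bare repository plays the role of your fork; with a real fork you would clone its GitHub URL instead. The file notes.txt, the branch name add-neptune, and the identity are hypothetical.

```shell
set -eu

workdir=$(mktemp -d)

# Setup only: build a local stand-in for your fork on GitHub.
git init -q -b main "$workdir/seed"
cd "$workdir/seed"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity
echo "Mercury" > notes.txt                   # hypothetical content
git add notes.txt
git commit -q -m "Initial commit"
git clone -q --bare "$workdir/seed" "$workdir/fork.git"

# 1. Clone the fork to your computer.
git clone -q "$workdir/fork.git" "$workdir/planets"
cd "$workdir/planets"
git config user.name "Example Author"
git config user.email "author@example.org"

# 2. Create a new branch with a descriptive name.
git switch -q -c add-neptune

# 3. Work on the branch, then add and commit.
echo "Neptune" >> notes.txt
git add notes.txt
git commit -q -m "Add Neptune to the list"

# 4. Push the new branch to the fork.
git push -q -u origin add-neptune

# The fork now has the new branch; its main branch is unchanged.
git --no-pager branch --all
```

From here, the pull request would be opened in the web interface, exactly as when the commit was made there.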

Exercise Git-3: Archaeology using Git annotate (“blame”)

Your goal is to find out when precisely this line was last modified (in which commit).
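On the command line, this kind of archaeology is done with git blame. The sketch below uses a small made-up repository, so the file units.txt, its contents, and the identity are assumptions; in the exercise you would run the blame command on the actual file and line instead.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"   # placeholder identity

# First commit: a hypothetical two-line file.
printf 'distance_unit = km\nmass_unit = g\n' > units.txt
git add units.txt
git commit -q -m "Add units"

# Second commit: only line 2 changes.
printf 'distance_unit = km\nmass_unit = kg\n' > units.txt
git commit -q -am "Use kilograms for mass"

# Annotate line 2: which commit last touched it, by whom, and when?
git --no-pager blame -L 2,2 -- units.txt

# Extract just the checksum (first field of the porcelain output) ...
last_change=$(git blame -L 2,2 --porcelain -- units.txt | head -n 1 | cut -d ' ' -f 1)

# ... and inspect that commit: message, author, date, and the change itself.
git --no-pager show "$last_change"
```

Line 1 is attributed to the first commit and line 2 to the second, which is the same information the annotated view in the web interface shows next to each line.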

Keypoints

TODO