Version control in Jupyter notebooks
When you’re writing a Jupyter notebook, it’s useful to track changes. This means that you can go back to a previous version of the notebook, or compare different versions. As with any document, tracking changes lets you edit freely without worrying about losing your previous work.
Git can version control Jupyter notebooks, but other Jupyter-compatible tools offer more ergonomic options.
Using Git to version control Jupyter notebooks
Jupyter notebooks are just files, so the default option is to version control with Git. This works especially well for production systems or libraries where there is already code being tracked with Git. It fits right in with your existing workflow.
However, this comes with several drawbacks. Git is a tool made for software engineers working primarily in text files. It’s not designed for the specific needs of data scientists or of Jupyter notebooks.
- You have to remember to make commits; otherwise, your changes won’t be tracked. When you’re working in a notebook, you’re often making small changes and running code. You might not want to commit every time you run a cell.
- You have to remember to sync with a remote repository; otherwise, your changes won’t be backed up, or they will conflict with other people’s changes.
- By default, diffs come up as ugly JSON, which is hard to read.
- Resolving conflicts is almost impossible without specialized tooling.
Without additional software, this is the reality of comparing and merging notebooks in Git.
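Much of the noise in a raw notebook diff comes from cell outputs and execution counts, which change every time you re-run a cell. One common mitigation is to clear them before committing. The following is a minimal sketch using only the standard library, assuming the nbformat 4 layout (a top-level `cells` list, with `outputs` and `execution_count` on code cells). The function name `strip_outputs` and the in-memory `example` notebook are made up for illustration.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    # Deep copy via a JSON round trip so the caller's notebook is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook in nbformat 4 layout, for illustration only.
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
            "source": ["print(1 + 1)"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

stripped = strip_outputs(example)
# Stable key order and indentation keep the committed file easy to diff.
print(json.dumps(stripped, indent=1, sort_keys=True))
```

In practice you would read the `.ipynb` file with `json.load`, pass it through a function like this, and write the result back before committing. The trade-off is that the committed notebook no longer carries its rendered results.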
Extra tooling will make your life easier if you want to go down this route. For example, nbdev has impressive capabilities for resolving conflicts and JupyterLab has a Git extension.
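To give a feel for what notebook-aware diffing involves, here is a rough sketch that compares only the cell sources of two notebooks and ignores outputs and metadata. It is not how nbdev or the JupyterLab Git extension work internally. It is a simplified illustration built on Python’s standard `difflib` module, and the helper names `cell_sources` and `notebook_diff`, as well as the sample notebooks, are hypothetical.

```python
import difflib


def cell_sources(notebook: dict) -> list:
    """Flatten a notebook dict into lines of cell source, with one marker line per cell."""
    lines = []
    for index, cell in enumerate(notebook.get("cells", [])):
        lines.append(f"# cell {index} ({cell.get('cell_type')})")
        source = cell.get("source", "")
        # nbformat allows source to be a single string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        lines.extend(source.splitlines())
    return lines


def notebook_diff(old: dict, new: dict) -> list:
    """Return a unified diff of the cell sources of two notebook dicts."""
    return list(
        difflib.unified_diff(
            cell_sources(old),
            cell_sources(new),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# Two versions of a one-cell notebook; outputs and counts differ but are ignored.
old_nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
            "source": ["x = 1\n", "print(x)"],
        }
    ]
}
new_nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 2,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
            "source": ["x = 2\n", "print(x)"],
        }
    ]
}

diff_lines = notebook_diff(old_nb, new_nb)
print("\n".join(diff_lines))
```

The diff contains only the changed line of code, not the surrounding JSON structure. Dedicated tools go much further, for example by handling outputs, metadata, and merge conflicts.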
Use a Jupyter-compatible notebook with version control built-in
If you don’t need the software engineering discipline that Git offers, there are other options. Fully managed notebook tools come with version control built in. You make a change to the notebook, and it’s automatically saved. You can go back to previous versions, and you can see a diff of the changes.
This is a great option for data scientists who want to focus on the data science, not the software engineering.
The best tools offer real-time collaboration as well as versioning. This way, you never have to worry about conflicting with other people or overwriting each other’s work.
Below are some notebook tools that are Jupyter-compatible, have version control, and have real-time collaboration.
Deepnote
Deepnote is a new kind of data notebook that’s built for collaboration — Jupyter compatible, works magically in the cloud, and sharing is as easy as sending a link.
Hex
The Data Workspace for Teams. Work with data in collaborative SQL and Python notebooks. Share as interactive data apps that anyone can use.
Databricks Notebooks
Collaborate across engineering, data science, and machine learning teams with support for multiple languages, built-in data visualizations, automatic versioning, and operationalization with jobs.
DataCamp Workspace
DataCamp Workspace is an AI-powered data notebook to help you get from data to insights, faster.
JetBrains Datalore
A powerful online environment for Jupyter notebooks. Use smart coding assistance for Python in online Jupyter notebooks, run code on powerful CPUs and GPUs, collaborate in real-time, and easily share the results.
Nextjournal
Runs anything you can put into a Docker container. Improve your workflow with polyglot notebooks, automatic versioning and real-time collaboration. Save time and money with on-demand provisioning, including GPU support.
Noteable
Noteable is a collaborative notebook platform that enables teams to use and visualize data, together.