KubeFlow Custom Jupyter Image (+ GitHub for notebook source control)

I’ve been playing around with KubeFlow a bit lately and found that a lot of the tutorials and examples of Jupyter notebooks on KubeFlow do the pip installs and other setup and config in the notebook itself, which feels icky.

If you were working in Jupyter notebooks on KubeFlow for real, you’d want to build a lot of this into the image used to create the notebook server. Luckily, as with most of KubeFlow, it’s pretty flexible to customize and extend as you want, in this case by adding custom Jupyter images.

Two main use cases for this are making sure some custom Python package you have built (e.g. my_utils) is readily available in all your notebooks, and making sure external libraries you use all the time (e.g. kubeflow pipelines) are available too.

To that end, here is a Dockerfile that illustrates this (and here is the corresponding image on Docker Hub).

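# Start from the KubeFlow TensorFlow notebook image (override with --build-arg BASE_CONTAINER=...)
ARG BASE_CONTAINER=gcr.io/kubeflow-images-public/tensorflow-1.13.1-notebook-cpu:v0.5.0
FROM $BASE_CONTAINER
LABEL maintainer="andrewm4894@gmail.com"
LABEL version="01"
# Custom package, installed straight from GitHub
RUN pip3 install git+https://github.com/andrewm4894/my_utils.git#egg=my_utils
# KubeFlow Pipelines SDK
RUN pip3 install kfp --upgrade
# Default base URL; KubeFlow sets NB_PREFIX itself when it launches the server
ENV NB_PREFIX /
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]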
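
Before KubeFlow can use the image, it has to be built and pushed to a registry the cluster can pull from. A minimal sketch of that step in Python is below. It only wraps the standard docker build and docker push commands, and the image name is a made-up placeholder, not the image linked above.

```python
import subprocess


def image_commands(image, context="."):
    """Return the docker CLI calls that build `image` from `context` and push it."""
    return [
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


def build_and_push(image, context="."):
    """Run the build and the push, stopping at the first command that fails."""
    for cmd in image_commands(image, context):
        subprocess.run(cmd, check=True)


# Hypothetical image name: swap in your own registry, repository and tag.
IMAGE = "your-dockerhub-user/kubeflow-jupyter-custom:01"

# Dry run: print the commands instead of executing them.
for cmd in image_commands(IMAGE):
    print(" ".join(cmd))
```

Calling build_and_push(IMAGE) from the directory that holds the Dockerfile runs both steps for real. That assumes docker is installed and you are already logged in to the registry.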

Once you have such a custom image building fine it’s pretty easy to just point KubeFlow at it when creating a Jupyter notebook server.

Just specify your custom image

Now when you create a new notebook on that Jupyter server you have all your custom goodness ready to go.
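
A quick way to confirm the image did its job is to check, in the first cell of a fresh notebook, that the packages can be found by the kernel. This is just a sketch. It assumes the import names are my_utils and kfp, matching the pip installs in the Dockerfile.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that this kernel cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means everything baked into the image is importable.
print(missing_packages(["my_utils", "kfp"]))
```

Anything that shows up in the printed list was not installed into the Python environment the notebook kernel is using.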

GitHub for notebooks

As I was looking around, it seems there are currently plans to implement some git functionality in the notebooks on KubeFlow in a more native way (see this issue).

For now I decided to just create an SSH key (help docs) for the persistent workspace volume connected to the notebook server (see step 10 here).
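
As a rough sketch, the key can be created from a notebook cell by calling ssh-keygen. The key path, the comment and the empty passphrase below are my assumptions for illustration. The sketch also assumes the home directory lives on the persistent workspace volume, so the key survives a restart of the notebook server.

```python
import subprocess
from pathlib import Path


def ssh_keygen_command(key_path, comment):
    """Return an ssh-keygen call for a 4096-bit RSA key with no passphrase."""
    return [
        "ssh-keygen", "-t", "rsa", "-b", "4096",
        "-C", comment,        # label stored in the public key
        "-f", str(key_path),  # where the private key is written
        "-N", "",             # empty passphrase (assumption: convenience over security)
    ]


def create_key(key_path, comment):
    """Generate the key pair and return the public key text to paste into GitHub."""
    key_path = Path(key_path)
    if key_path.exists():
        # ssh-keygen would otherwise stop and ask whether to overwrite the key.
        raise FileExistsError(str(key_path))
    key_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    subprocess.run(ssh_keygen_command(key_path, comment), check=True)
    return Path(str(key_path) + ".pub").read_text()


# Assumption: the home directory sits on the persistent workspace volume.
KEY_PATH = Path.home() / ".ssh" / "id_rsa"

# Dry run: show the command without creating anything.
print(" ".join(ssh_keygen_command(KEY_PATH, "you@example.com")))
```

Running create_key(KEY_PATH, "you@example.com") creates the pair and returns the public half, which is what gets added to your GitHub account.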

Then, once you want to git push from your notebook server, you can just hack together a little notebook like this that you can use as a poor man’s git UI 🙂
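
The linked notebook is not reproduced here, but the idea can be sketched in a few lines of Python. This version is my own illustration, not the original notebook. The remote, the branch and the repo path are assumptions, and it sticks to plain git add, commit and push.

```python
import subprocess


def git_push_commands(message, remote="origin", branch="master"):
    """Return the git calls that stage everything, commit, and push."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def run_all(commands, cwd="."):
    """Run each command in `cwd`, echoing its output, and stop at the first failure."""
    for cmd in commands:
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # git writes progress to stderr
            universal_newlines=True,
        )
        print("$", " ".join(cmd))
        print(result.stdout)
        if result.returncode != 0:
            return False
    return True


# Example call (hypothetical repo path on the workspace volume):
# run_all(git_push_commands("update notebooks"), cwd="/home/jovyan/my_repo")
```

Keeping the command list separate from the code that runs it makes it easy to print the commands first and see exactly what will be pushed.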