Putting your Stan model and data into a Jupyter notebook lets your audience work through your analysis step by step. The challenge in a demo, talk, or classroom situation is getting everyone in the room on the same page when that page is your presentation notebook and you want it to display properly and be runnable from their browser. Unfortunately, the law of conservation of energy mandates that the easier things are for your audience, the harder they are for you.
This report takes you through the tedious details of setting up a Jupyter notebook so that anyone with a modern web browser and a Google account can run your Stan analysis on Google Colaboratory's free cloud servers, with plenty of screenshots along the way. Inasmuch as clouds are always moving and changing, I’ve titled this report “cloud-compute-2020”. While the screenshots may have a sell-by date of Q3 2020, the challenges will remain.
Simplicity and modularity: these packages wrap CmdStan and just provide functions to compile models, do inference, and assemble and save the results; other packages are needed for downstream analysis.
Keep up with Stan releases: these interfaces can use any (recent) version of CmdStan, including the current release, Stan 2.23.
Quick and easy installation: minimal dependencies on other packages and no direct calls to C++.
Flexible licensing: BSD-3.
A Jupyter notebook consists of blocks of markdown text interleaved with blocks of code, a format that unifies the exposition of ideas and arguments with the necessary supporting data, computation, and visualizations. The three core programming languages supported are Julia, Python, and R, hence the name, Ju-Pyt-R. To author a Jupyter notebook on your machine, you need a local install of both Python and Jupyter, as outlined in the Jupyter installation instructions. Once the Jupyter server is running, you can run existing notebooks and create new ones via your favorite web browser.
A notebook document is a JSON file with suffix .ipynb which contains a list of cells, one cell per content block (code, text or image), and a dictionary of metadata which specifies the kernel used to run the notebook, here, either R or Python. When viewed in a browser, the notebook is displayed as an HTML page where the contents of the text cells are rendered by default while the code cells are displayed with controls which allow them to be executed independently. Additional controls allow you to create, edit, and publish notebooks as HTML or PDF documents.
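As a rough illustration of this structure, the following Python sketch parses a tiny hand-written notebook document using only the standard `json` module. The sample notebook, its cell contents, and the helper name `summarize_notebook` are invented for illustration; the key names (`cells`, `cell_type`, `source`, `metadata`, `kernelspec`) follow the nbformat version 4 layout, in which text cells have type `markdown`.

```python
import json

# A minimal, hand-written notebook document (illustrative only).
notebook_json = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# My Stan analysis"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["print('hello')"]}
  ],
  "metadata": {
    "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
  },
  "nbformat": 4,
  "nbformat_minor": 4
}
"""


def summarize_notebook(text):
    """Return the kernel name and a list of (cell_type, first source line)."""
    nb = json.loads(text)
    kernel = nb["metadata"].get("kernelspec", {}).get("name", "unknown")
    cells = []
    for cell in nb["cells"]:
        source = cell["source"]
        # "source" may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        first_line = source.splitlines()[0] if source else ""
        cells.append((cell["cell_type"], first_line))
    return kernel, cells


kernel, cells = summarize_notebook(notebook_json)
print("kernel:", kernel)
for cell_type, first_line in cells:
    print(cell_type, "|", first_line)
```

The same function should work on a real notebook if you pass it the text of the `.ipynb` file, although real notebooks usually carry more metadata and cell outputs than this sample.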
If you distribute an .ipynb notebook file together with your Stan model and data, your audience can replicate your analysis on their machine. But in order to do this, they must have a local install of the Jupyter notebook server or another IDE that can run the notebook. This is not always possible; they might not have enough permissions to install software on their computer, or their machine might not have enough memory or a powerful enough CPU to run Stan. You could give them access to your machine by running a Jupyter notebook server for single-user access or a JupyterHub server for multiple-user access, but this requires bandwidth, compute power, and careful attention to security. Regarding security, here’s some wisdom from the Jupyter blog (emphasis added):
It is important to keep in mind that, since arbitrary code execution is the main point of IPython and Jupyter, running a publicly accessible server without authentication or encryption is a very bad idea indeed.
Hence the need for Jupyter servers in the cloud: many instances of someone else’s server where the audience can run your notebook.
It is a truth, universally acknowledged, that there ain’t no such thing as a free lunch. Nonetheless, as of 2020, Google is providing free email via Gmail, free file storage via Google Drive, and free Jupyter notebook servers via Google Colab. To use these services, you must sign up for a Google account. The Colab server instances are limited, as is the amount of storage on your Google Drive, and should you use Gmail, Google examines all messages and serves ads accordingly; there ain’t no such thing as a free lunch, Q.E.D.
Google Colaboratory, nicknamed Colab, is a free Jupyter notebook environment that runs in the cloud, i.e., on remote Google servers. Google makes no promises about VM availability or resources (see https://research.google.com/colaboratory/faq.html); however, you can inspect the virtual machine instance that a notebook is running on via system commands. To see how this works, I created a notebook called show_VM_specs.ipynb which is stored on my Google Drive in a folder called ‘Colab_Notebooks’. In order to run this notebook, I can either open it with Google Colab or download it and run it locally.
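To give a flavor of what inspecting a machine can look like, here is a minimal sketch that gathers a few basic facts using only the Python standard library. It is not the contents of show_VM_specs.ipynb, and it uses Python calls rather than shell commands; the function name `vm_specs` and the choice of fields are assumptions made for illustration. The memory figure is read from `/proc/meminfo`, which exists only on Linux, so that field is simply skipped on other systems.

```python
import os
import platform
import shutil


def vm_specs(path=os.path.abspath(os.sep)):
    """Collect a few basic facts about the machine running this code."""
    specs = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(shutil.disk_usage(path).total / 1e9, 1),
    }
    # /proc/meminfo is Linux-specific; MemTotal is reported in kiB
    meminfo = "/proc/meminfo"
    if os.path.exists(meminfo):
        with open(meminfo) as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    kib = int(line.split()[1])
                    specs["mem_total_gb"] = round(kib * 1024 / 1e9, 1)
                    break
    return specs


for name, value in vm_specs().items():
    print(f"{name}: {value}")
```

Run in a notebook cell, this prints one line per field. Because no particular resources are promised, the values may differ from one session to the next.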