This post is a short tutorial on Google Colab as an alternative to Jupyter for running Python code. We show how to bring in, modify, and run a Jupyter Notebook from a GitHub repository.
Colab (short for “Colaboratory”) is a Google cloud service that lets users write and execute Python code in a web-based environment without installing anything locally. Within limits, Colab is free to use, and it can access a user’s Google Drive, so Colab notebooks can import additional Python libraries from *.py files stored there. Because everything runs in the browser, these instructions also work from a Chromebook or from a computer that does not allow local file storage. To demonstrate Colab, we use a case study: running the Jupyter Notebook in Pandas_Intro_For_Noncoders, a GitHub repository that introduces Pandas. This tutorial walks step by step through using Colab to run the notebook, including modifying the notebook to import its data from the repository folder on Google Drive.
Note that several helpful code snippets are available in pasteable format at the bottom of this post.
Opening Colab and Cloning the Pandas_Intro_For_Noncoders GitHub Repository
- Open Colab by navigating to https://colab.research.google.com. This opens a blank notebook (e.g., Untitled0.ipynb in the picture below).
- To access your Google Drive from the notebook, mount the drive by executing two Python statements in a cell: `from google.colab import drive` followed by `drive.mount('/content/drive')`.
- Type the statements into a blank cell (use the +Code button to add cells as needed).
- Run the cell by clicking its run button (black circle with a triangle) or by clicking in the cell and pressing Shift+Enter.
- Optionally, add a new folder (e.g., Projects_Python in the example) to your Google Drive by clicking the three-dot menu next to the parent folder and choosing New Folder.
- The picture shows how to access your Google Drive’s folder tree.
- It is also helpful to open a browser tab pointing to your Google Drive (https://drive.google.com/).
- Use the +Code button at the top of the notebook to add two blank cells.
- Enter and run the %cd command to change directory to the desired folder (e.g., %cd "/content/drive/My Drive/Projects_Python" for the example folder).
- Enter and run the !git clone command shown below to clone (i.e., make a copy of) the Pandas_Intro_For_Noncoders GitHub repository directly into the selected Google Drive folder.
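The steps above use %cd rather than !cd for a reason: each `!` command runs in its own short-lived shell, so a `!cd` is forgotten as soon as that shell exits, while the %cd magic changes the notebook process's own working directory (IPython implements it with `os.chdir`). The minimal sketch below demonstrates the difference in plain Python, assuming a temporary folder stands in for the Google Drive parent folder. The `clone_into` helper is hypothetical; it runs `git clone` with `cwd=` so the clone lands in the chosen folder without a separate directory change.

```python
import os
import subprocess
import tempfile

# Assumption for this demo: a temporary folder stands in for the Google Drive parent folder
target = tempfile.mkdtemp()
start = os.getcwd()

# Like "!cd": the cd runs in a child shell that exits immediately, so this Python
# process (the notebook kernel) keeps its original working directory
subprocess.run(f'cd "{target}"', shell=True, check=True)
cwd_after_shell_cd = os.getcwd()

# Like "%cd": os.chdir changes the working directory of the kernel process itself
os.chdir(target)
cwd_after_chdir = os.getcwd()


def clone_into(parent_dir, repo_url):
    """Hypothetical helper: run "git clone repo_url" inside parent_dir (cwd= replaces %cd)."""
    subprocess.run(["git", "clone", repo_url], cwd=parent_dir, check=True)
```

In Colab, you would call `clone_into` with the example Drive folder path and the repository's clone URL; after `%cd` has been run, a plain `!git clone` does the same job.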
Opening and Running the Pandas_Fundamentals.ipynb Notebook in Colab
- The Colab window does not have a way to open the notebook directly. Go to your Google Drive tab and right-click (or Control-click) the Pandas_Fundamentals.ipynb file.
- Choose Open With / Google Colaboratory. This opens the notebook in a separate Colab browser tab.
- We are done with the earlier Untitled0.ipynb notebook, so it is OK to close its browser tab.
- Note that you can run notebook cells individually or use Colab’s Runtime / Run All menu command.
- Jupyter notebooks such as this one typically refer to data files by a path on the local hard drive (a local folder). That path does not exist in Colab, so the notebook raises a FileNotFoundError when it tries to open the sample Excel data several cells into the notebook.
- To point the Colab notebook to your Google Drive folder, insert cells as shown below to (a) mount the drive and (b) create a prefix string for the Google Drive path.
- Add the dir_google_drive string as a filename prefix in the read_excel statement, as shown. This lets the notebook run from the copy of the sample data on your Google Drive. Be careful to include the “/” delimiters as shown, and note that the path is case-sensitive.
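One way to make the prefix fix above less fragile is to build the path with `os.path.join`, which adds the separator only when it is missing, and to fall back to a local folder when Drive is not mounted. This is a minimal sketch, not the repository's code: `data_path` is a hypothetical helper, and the prefix value is the example folder used in this post.

```python
import os

# Example prefix from this post's snippets (adjust to your own Drive folder)
DIR_GOOGLE_DRIVE = '/content/drive/My Drive/Projects_Python/Pandas_Intro_For_Noncoders/'


def data_path(filename, local_dir='.', mount_point='/content/drive',
              dir_google_drive=DIR_GOOGLE_DRIVE):
    """Hypothetical helper: path to filename on Drive if it is mounted, else in local_dir."""
    if os.path.exists(mount_point):
        # os.path.join inserts a path separator only when the prefix lacks one,
        # so a missing trailing "/" on the prefix does not corrupt the path
        return os.path.join(dir_google_drive, filename)
    return os.path.join(local_dir, filename)
```

The read_excel call would then look like `df = pd.read_excel(data_path('sample_data.xlsx'))`, where 'sample_data.xlsx' is a placeholder for the notebook's actual Excel file name (including any subfolder). The same notebook then runs unchanged in Colab and in local Jupyter.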
That gets the repository’s notebook running in Google Colab with the repository’s sample data!
Useful code snippets:
```python
# Mount the user's Google Drive (without first checking whether it is already mounted)
from google.colab import drive
drive.mount('/content/drive')

# Clone a GitHub repository as a folder in the current (Google Drive) working directory.
# First use %cd xxx to change directory to the desired parent folder
# (!cd would run in a temporary subshell, so the change would not persist)
!git clone https://github.com/jlandgre/Python_Colab_Template.git

# Attempt to mount the user's Google Drive only if it is not already mounted
import os

is_drive_mounted = os.path.exists('/content/drive')
if not is_drive_mounted:
    try:
        from google.colab import drive
        drive.mount('/content/drive')
    except ModuleNotFoundError:
        # Add statements for the case where the notebook runs as local Jupyter
        pass

# Set a string prefix for the Google Drive directory path
# (newer Colab versions may show this folder as 'MyDrive', with no space;
# check the folder name in Colab's file browser)
dir_google_drive = '/content/drive/My Drive/Projects_Python/Pandas_Intro_For_Noncoders/'

# Add the project's libs folder to sys.path to allow importing *.py libraries
import sys

dir_libs = dir_google_drive + 'libs'
if dir_libs not in sys.path:
    sys.path.append(dir_libs)
```
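The last snippet can also be wrapped in a small function so it is safe to re-run and does not depend on the prefix ending in "/". This is a sketch under the assumption that helper modules live in a libs subfolder of the project, as in the snippet; `add_libs_to_path` is a hypothetical name.

```python
import os
import sys


def add_libs_to_path(dir_project, libs_subdir='libs'):
    """Hypothetical helper: append dir_project/libs_subdir to sys.path once."""
    dir_libs = os.path.join(dir_project, libs_subdir)
    if dir_libs not in sys.path:  # avoid duplicate entries when the cell is re-run
        sys.path.append(dir_libs)
    return dir_libs
```

After `add_libs_to_path(dir_google_drive)`, a statement such as `import my_helpers` (a hypothetical libs/my_helpers.py) works as usual. Python caches imported modules, so if you edit the *.py file on Drive afterward, use `importlib.reload(my_helpers)` to pick up the changes.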