Google Colab Information
Part of the content of this document was inspired by Internet searches and a YouTube video, all of which are linked at the end. The Frequently Asked Questions page available on Google Colab was also used.
This document is divided into 3 sections:
- General Overview
- Information about Notebooks
- Resources
What is Google Colab (Colaboratory)?
Colab allows you to create notebooks to write and execute Python code in your browser (most browsers should work but the most extensive tests have been conducted with the latest versions of Chrome, Firefox, and Safari). Colab is built on top of Jupyter Notebook, meaning that Colab notebooks are simply Jupyter notebooks hosted by Colab. Google Colab is particularly well suited for data analyses, machine learning, and education.
Positive points of Google Colab:
- Very user-friendly, making it a great tool for exploration and learning how to code.
- When typing code, an auto-complete feature is available. A description of the parameters of the function or method you are calling also pops up when you start typing arguments inside its parentheses.
- As with other types of Google documents, Colab notebooks can be easily shared with others, meaning that multiple people can work on the same notebook. This also means that Colab notebooks can be used for example code that will be shared with others.
- Since these notebooks are stored in Google Drive, they are very unlikely to be lost once they are saved.
- Comes with many data science and machine learning libraries already preinstalled (unlike Jupyter Notebook). This means that if the libraries you want to use are already installed, you will only have to worry about importing them and that a person you share a notebook with should be able to run your notebook instantly without having to worry about installing the right libraries. If for some reason a particular library is not already installed, you can easily install it using either !pip install followed by the name of the library or !apt-get install followed by the name of the library.
- Hosted by Google, meaning you do not have to use your own computing power when running code.
- Google’s network is extremely fast, meaning that it shouldn’t take too long to download a data file onto Colab.
- Gives access to GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units). Access to a TPU is particularly interesting because it is generally not possible to get one for your own computer.
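Whether a GPU is actually attached to the current runtime can be checked from a code cell. The following is a minimal sketch, assuming an NVIDIA GPU runtime that exposes the `nvidia-smi` command-line tool. The helper name `gpu_available` is hypothetical, and this check does not detect TPUs.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    # On a CPU-only runtime the tool is usually absent from PATH.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU attached:", gpu_available())
```

If this reports no GPU in a notebook where one is expected, the runtime type is probably still set to None.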
Cons of Google Colab:
- The resources available on Google Colab (e.g., the GPU or TPU) are limited so they might not always be available when you need them (this could occur if many people are already using the resources).
- Not all libraries are preinstalled and those that are not will need to be reinstalled every time you open a new session.
- But any pip package can be installed with
!pip install <PACKAGE>
- Notebooks and any data you upload are stored and processed on Google’s servers rather than on machines you control, meaning that Colab shouldn’t be used when working with confidential data.
- If you leave your notebook inactive for too long, the runtime may disconnect and the results of your computations (variables and files in the session) might not be saved.
- Can run out of memory (so not ideal if working with very large datasets).
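Because libraries that are not preinstalled have to be reinstalled in every new session, it can help to install them only when they are missing. The following is a minimal sketch, assuming the package's import name and pip name are the same unless stated otherwise. The helper names `is_installed` and `ensure_installed` are hypothetical.

```python
import importlib.util
import subprocess
import sys
from typing import Optional


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be imported in this runtime."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name: str, pip_name: Optional[str] = None) -> bool:
    """Install a package with pip only if its module is missing.

    Returns True if an installation was run, False otherwise.
    """
    if is_installed(module_name):
        return False
    # Same effect as running `!pip install <PACKAGE>` in a Colab cell.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    return True


# `json` ships with Python, so nothing is installed here.
print(ensure_installed("json"))  # prints False
```

Placing such a cell at the top of a shared notebook means other people can run it without checking which libraries their session already has.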
Both a pro and a con:
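- Code can be run for hours without interruption, but only up to a maximum session length (at most 24 hours; the Colab FAQ lists a lower limit of 12 hours for the free version, and the exact limit depends on availability and usage).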
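Because a runtime can be disconnected when it is idle or when the session limit is reached, long computations are often written so that they can resume where they stopped. The following is a minimal sketch, assuming the progress is small enough to store as JSON. The file name `checkpoint.json` and the helper names are hypothetical, and the summing loop is only a stand-in for real work.

```python
import json
import os


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_index": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temporary file first so an interrupted write cannot corrupt the checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(items: list, path: str = "checkpoint.json", save_every: int = 100) -> int:
    """Sum `items`, saving progress every `save_every` items and at the end."""
    state = load_checkpoint(path)
    for i in range(state["next_index"], len(items)):
        state["total"] += items[i]  # stand-in for an expensive step
        state["next_index"] = i + 1
        if state["next_index"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

In Colab the checkpoint would need to be stored somewhere persistent, for example in a mounted Google Drive folder, because files in the session's own storage are deleted when the runtime is recycled.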
When you open Google Colab directly (for example, by searching “Google Colab” in your search engine), a Home Menu appears. From there you can create a new notebook (bottom right corner of the menu) and see other recently opened notebooks. There is also a tab called Examples with example notebooks outlining different Google Colab features. Fun fact: these notebooks are interactive, meaning that anyone can modify them.
Here is a small breakdown of the contents of these notebooks:
- Overview of Colab features: discusses the different types of cells in Google Colab (code and text cells), convenience functions available (e.g., system aliases, Magics, tab-completion and exploring code – notice that the first two examples provide explanations that would work just as well in Jupyter as they do in Colab). The integration of Google Colab with Google Drive, which allows you to share, comment, and collaborate on a document/notebook with multiple people at once is also discussed.
- Markdown Guide: text cells in Colab are formatted using Markdown, a simple markup language. This notebook outlines how to see the markdown source and rendered version of a text cell, how to format text cells using Markdown, different examples of markdown text as well as the differences between Colab markdown and other markdown dialects. Some useful references on the topic are also provided.
- Charts in Colab: shows example code to create different types of charts in Colab. Most examples use matplotlib but other libraries are used as well.
- External data: Drive, Sheets, and Cloud Storage: shows code to import and download files from your local file system in Colab, discusses how to access files from Google Drive when in Colab (with examples), how to create a Google Sheet with Python data and download data from a Google Sheet into Python, and how to use Google Colab with Google Cloud Storage.
- Getting started with BigQuery: outlines the steps that need to be completed before starting with BigQuery, how to use BigQuery via Magics, through Google-Cloud BigQuery, and how to use BigQuery through pandas-gbq.
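The local-file import covered in the External data notebook relies on the `google.colab.files` module. The following is a minimal sketch, assuming it runs in a Colab code cell, where `files.upload()` opens a file picker and returns a dictionary mapping each file name to its contents as bytes. The helper `summarize_uploads` is hypothetical, and outside Colab the upload step is skipped so the cell still runs.

```python
def summarize_uploads(uploaded: dict) -> dict:
    """Map each uploaded file name to its size in bytes."""
    return {name: len(content) for name, content in uploaded.items()}


try:
    from google.colab import files  # only available inside Colab
except ImportError:
    files = None

if files is not None:
    uploaded = files.upload()  # opens a browser file picker
    print(summarize_uploads(uploaded))
else:
    print("Not running in Colab; skipping the upload dialog.")
```

The reverse direction uses `files.download("<FILE NAME>")`, which asks the browser to download a file from the session to your computer.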
The Home Menu also contains a tab called GitHub where you can enter a GitHub URL and find code that is on GitHub, either under an organization’s or a user’s name (yes, that means you can search for code on the lab’s GitHub page directly from Google Colab)! Under the Google Drive tab, you will be able to see all your Colab files that are saved to your Google Drive. Finally, the Home Menu contains a tab called Import, which allows you to import files from your computer into Google Colab.
After closing the Home Menu (which can be done by pressing “cancel”), you will see some more information about Google Colab in (another) interactive notebook. This includes some examples of applications for which Machine Learning can be used in Colab notebooks as well as links to tutorials showing how certain Machine Learning analyses can be conducted on Google Colab. Other resources are also linked (including a notebook providing instructions on how to save and load notebooks to and from GitHub).
To create a new Colab notebook, you can log in to your Google Drive account, and then click on the “New” button located on the top left corner of the page, right under the Google Drive logo. Once you are in the “New” menu, put your mouse over “more” and then click on Colaboratory. If you do not see the Colaboratory option when putting your mouse over “more”, you will want to click on “connect more apps” at the bottom of the menu. A window allowing you to search apps will then pop up and you will be able to search for Colaboratory. When you find Colaboratory, simply click on the “Connect” button and Colaboratory should become available in the previously discussed menu.
When you are in a Colab notebook (whether it is new or already has content), you can insert new text or code cells and start coding. The three vertical dots (found on the right of a code or text cell) allow you to delete or copy that cell.
You can type your code in code cells the same way you would type Python code elsewhere. To run the code in a code cell, you simply need to press the “play” button on the left of the cell.
Text cells are split in half, the left half being where you type your text and the right half showing you a preview of what your text looks like as you type it. The leftmost icon (little T and big T) in a text cell changes your text into header-text, making it bigger and bolder in the right half of the cell. The arrow brackets icon will format your text into code, which means that you can actually enter lines of code into a text cell and make it distinguishable from other text (but the code cannot be run from there). Other icons in text cells allow you to add a link to a web page, add images, indent your text, and add bulleted and numbered lists.
The tabs at the top of the page give you access to the following options:
- File: import notebooks that are saved elsewhere, save a copy of the notebook in Google Drive, as a GitHub Gist, or in GitHub, look through the history of versions for the notebook, and download the .ipynb file or the .py code.
- Display: look at the notebook’s information and modify its parameters.
- Runtime: choose which part(s) of your code to run. The option “change runtime type”, which lets you select which device to use (None, GPU, or TPU), can also be found under this tab.
- Tools: access all the commands you can execute in your notebook (e.g., activate/deactivate line numbering, display the notebook’s source code), along with shortcuts for some of these commands. If you want to compare two notebooks, you can click on “Difference between two notebooks” in this tab.
- Help: access a Frequently Asked Questions page and a quick link to Stack Overflow, where you can ask questions. You can also search code excerpts from multiple sources and report bugs from this tab.
**Note that the tabs at the top of the page are available both from your own notebooks and from the notebook you land on after closing the Home Menu.**
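A downloaded `.ipynb` file is a JSON document, so its code and text cells can be inspected with the Python standard library. The following is a minimal sketch, assuming the nbformat 4 layout (a top-level `cells` list whose entries carry a `cell_type` such as `code` or `markdown`). The function name `count_cells` and the small in-memory notebook are hypothetical.

```python
import json
from collections import Counter


def count_cells(notebook_json: str) -> dict:
    """Count cells by type in the JSON text of a notebook."""
    notebook = json.loads(notebook_json)
    return dict(Counter(cell["cell_type"] for cell in notebook.get("cells", [])))


# A tiny stand-in for the contents of a downloaded notebook file.
example = json.dumps({
    "nbformat": 4, "nbformat_minor": 0, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["print(1)"],
         "outputs": [], "execution_count": None},
    ],
})
print(count_cells(example))
```

For a real notebook, the text passed to `count_cells` would come from reading the downloaded file instead of the in-memory example.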
You will notice that clicking on the topmost icon (which is called Summary) will display a summary of the different sections of your notebook.
It is also possible to connect your Google Drive account to your notebook. Doing so lets the code you run in the notebook modify files saved in your Google Drive account and access files (such as datasets) saved in any folder of that account. To connect your Google Drive account to a Colab notebook, click on the Folder icon on the left-hand side of the notebook. Then click on the Mount Drive icon (the rightmost icon that appears after you click the Folder icon), and select “connect to Google Drive”. When connected to Google Drive, you will see a new folder labelled Drive in your folder list. By clicking on it, you will be able to see all your Drive folders and subfolders. From the Folder icon, you can also access an “import files into the session’s storage space” icon (the leftmost icon after clicking on the Folder icon), which allows you to import files from your computer into your current runtime environment (the files you import will be deleted when the runtime environment is recycled).
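The same Drive connection can be made from a code cell instead of the Folder icon. The following is a minimal sketch, assuming it runs inside Colab, where `google.colab.drive.mount` asks for authorization and attaches Drive under the given mount point. The path `/content/drive` is the conventional location, and the wrapper `mount_drive` is hypothetical.

```python
import os


def mount_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # asks for permission on first use
    return True


if mount_drive():
    # Drive folders appear below the mount point once it is attached.
    print(os.listdir("/content/drive"))
else:
    print("Not running in Colab; Drive was not mounted.")
```

Once mounted, files in Drive can be read and written with ordinary Python file paths, and unlike files imported into the session's storage they remain after the runtime is recycled.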
Link to a notebook introducing Python very quickly and some basic Colab features: https://colab.research.google.com/github/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l01c01_introduction_to_colab_and_python.ipynb
The Python Data Science Handbook is available on Colab at this link (it can also be found on GitHub): https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb
Resources used in the making of this document
YouTube video: https://www.youtube.com/watch?v=vVe648dJOdI
Other resources:
- https://www.reddit.com/r/learnpython/comments/7wkrnl/google_colab_a_free_gpu_enabled_jupyer_notebook/
- https://www.forbes.com/sites/gregoryferenstein/2019/08/18/a-review-of-googles-colab-and-cocalc-for-collaborative-data-science/#ac990d56f024
- https://www.quora.com/What-advantages-does-Google-Colab-have-over-Jupyter
- https://www.shoutcoders.com/introduction-to-google-colab/
- https://www.kaggle.com/general/105547