Python - MSU-AI/.github GitHub Wiki


Python has become the most popular language for machine learning in recent years. It is also a common language for people who have just started learning programming in general.

Installation

You can find a rather comprehensive (unofficial) installation guide at https://realpython.com/installing-python/. Here, we add a few remarks on points that the guide omits, has not kept up to date, or does not emphasize enough.

Choosing a version

The latest version is usually good enough for most purposes. Just be aware that some libraries may not support the latest version right after its release. For example, TensorFlow only started supporting Python 3.10 a few months after the release (read more here). You can always visit the library's homepage to check which Python versions it supports.

Of course, do not install a Python version that is too old or about to reach end of life either. For example, support for Python 3.7 was scheduled to end on 2023-06-27, meaning that any bugs found after that date will no longer be fixed.
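To check which version you are currently running, a minimal sketch along these lines may help. The threshold `MINIMUM_VERSION` and the function name `describe_python` are hypothetical and only for illustration; pick the minimum that your libraries actually require.

```python
import sys

# Hypothetical threshold, chosen only for illustration.
MINIMUM_VERSION = (3, 8)


def describe_python(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Report whether the given (major, minor, ...) version meets the minimum."""
    current = (version_info[0], version_info[1])
    status = "ok" if current >= minimum else "too old"
    return f"Python {current[0]}.{current[1]}: {status}"


# Describe the interpreter that is running this script.
print(describe_python())
```

Comparing tuples such as `(3, 10) >= (3, 8)` avoids the pitfall of comparing version strings, where "3.10" would sort before "3.8".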

Operating Systems

Pick your poison:

Windows 🪟

Python does not come directly with Windows OS.

There are at least two popular ways of getting Python installed on a Windows machine: through the Anaconda distribution or the full official installer. For whatever reason, Anaconda has become quite popular at MSU in various CS classes, but here we would recommend against it for now (see here for its downsides).

Also, pay attention to whether you are downloading the 32-bit version or the 64-bit version. Most likely, you are using a 64-bit machine.
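If Python is already installed and you are unsure which build you have, the following sketch reports the bitness of the running interpreter. Note that it describes the Python build, not the machine itself: a 32-bit Python can run on a 64-bit machine. The function name `interpreter_bits` is a hypothetical helper.

```python
import struct


def interpreter_bits():
    """Return the pointer size of the running interpreter, in bits."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


print(f"This is a {interpreter_bits()}-bit Python build")
```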

macOS 🍎

macOS may already come with Python, so you may or may not need to install it again. Use your terminal to check the currently installed version (see here).

There are at least three popular ways of getting Python installed on a Mac: through the Anaconda distribution, the Homebrew package manager, or the full official installer. For whatever reason, Anaconda has become quite popular at MSU in various CS classes, but we would recommend against it for now (see here for its downsides). Homebrew is usually fine if you are already familiar with it. Otherwise, just stick to the official installer. See here for instructions.

Also, pay attention to the chip that your Mac uses. That determines whether you should get the universal installer or the Intel-only installer. Read here.
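If some version of Python is already available, a rough way to see which chip family it reports is sketched below. The helper `describe_cpu` and its labels are hypothetical. Treat the result as a hint only: an Intel build of Python running under Rosetta on Apple silicon may still report `x86_64`.

```python
import platform


def describe_cpu(machine=None):
    """Map a machine string (default: this machine's) to a rough chip family."""
    machine = (machine or platform.machine()).lower()
    if machine in ("arm64", "aarch64"):
        return "ARM (e.g. Apple silicon)"
    if machine in ("x86_64", "amd64"):
        return "Intel/AMD 64-bit"
    return f"other ({machine})"


print(describe_cpu())
```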

Linux 🐧

Many distros already come with Python installed. If it is already the version you need, then you don't have to install it again.

There are many ways to get Python installed on a Linux machine. Downloading through the Anaconda distribution is certainly one way, but we would recommend against it for now (see here for its downsides). Other than that, please check out https://realpython.com/installing-python/#how-to-install-python-on-linux.


Package management

When using Python, you almost always want to import some packages to save time and effort. Consider a simple example of computing the sample standard deviation for a given list x_list = [2, 3, 5, 7], which can be implemented without importing any libraries as below:

x_list = [2, 3, 5, 7]
n = len(x_list)
x_mean = sum(x_list) / n  # sample mean
x_stdev = (sum([(x - x_mean)**2 for x in x_list]) / (n - 1))**0.5 # 2.217355782608345

But unless this is your homework assignment, why would you want to waste your time reinventing the wheel? 🦽

During actual development, we almost always prefer to use code written by other people, which has probably been optimized and tested numerous times. The Python standard library includes a module called statistics, which we can import to calculate the sample standard deviation with ease:

import statistics
x_stdev = statistics.stdev(x_list) # 2.217355782608345
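As a quick sanity check, the two approaches can be compared side by side. The sketch below repeats the manual formula and confirms that it agrees with `statistics.stdev`. It also computes `statistics.pstdev`, the population standard deviation, which divides by n instead of n - 1 and is therefore smaller; mixing up the two is a common mistake.

```python
import math
import statistics

x_list = [2, 3, 5, 7]

# Manual sample standard deviation (divides by n - 1).
n = len(x_list)
x_mean = sum(x_list) / n
manual = (sum((x - x_mean) ** 2 for x in x_list) / (n - 1)) ** 0.5

# The same quantity from the standard library.
library = statistics.stdev(x_list)

# Population standard deviation (divides by n), for contrast.
population = statistics.pstdev(x_list)

print(math.isclose(manual, library))  # True
print(population < library)           # True
```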

Package manager

Once you have Python installed, you are already equipped with more than 100 packages that can be readily imported from the standard library. But to create artificial intelligence apps, we often need more specialized libraries or packages, i.e. third-party libraries. This is when a package manager becomes helpful.

There are at least three popular package managers used by Python developers: pip, conda, and brew. We will only focus on pip, the standard package manager for Python. When a package is created or updated, pip almost always has the latest version, whereas the other two depend on whether the package authors are also active on those platforms. Also, starting from Python 3.4 (~2014), pip is included with the standard installer. If you have Python, you most likely already have pip installed too. 😄
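To confirm that pip is available for the interpreter you are actually using, one possible check is sketched below. The function name `pip_is_available` is hypothetical. Running pip as `python -m pip` ties it to a specific interpreter, which helps when several Python versions are installed on the same machine.

```python
import importlib.util
import sys


def pip_is_available():
    """Return True if the running interpreter can import pip."""
    return importlib.util.find_spec("pip") is not None


if pip_is_available():
    # sys.executable is the path of the interpreter running this script.
    print("pip is available; install with:", sys.executable, "-m pip install <package>")
else:
    print("pip was not found for", sys.executable)
```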

Python Package Index

The Python Package Index (PyPI) is where all Python packages can be installed from and published to. Let's take pytest, a popular Python testing tool, for example.

To install, simply open up your terminal and type

pip install pytest

Depending on the package you are installing, the installation can take anywhere from a few seconds to (very rarely) hours. Assuming there are no errors, this single line of "pip install something" is all you need. But before you go crazy and start installing a hundred packages with pip, please read the next section about Virtual Environment. Otherwise, you proceed at your own risk. 💀
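After an installation finishes, you may want to confirm from Python that the package is visible and see which version you got. A minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is part of the standard library); the helper name `installed_version` is hypothetical:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if pytest is installed, otherwise None.
print(installed_version("pytest"))
```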

🚨 Because literally anyone can publish to PyPI, it is sometimes unclear which page is the official one. For example, could you easily tell which one is the official pytest package among all the search results?

image

There are three ways to counter this:

  1. The real one usually looks nicer, with logos, links, description, etc. :trollface:
  2. Don't use the search engine within PyPI. Use something like Google.
  3. Visit the package's homepage, e.g. the pytest homepage here, and follow its installation instructions.
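As an extra check after installing, you can look at the links that an installed package declares in its own metadata and compare them with the homepage you expected. This is only a sketch, assuming Python 3.8 or newer; `project_links` is a hypothetical helper, and because the metadata is written by whoever published the package, it is a hint rather than proof.

```python
from importlib import metadata


def project_links(package_name):
    """Return the homepage and project URLs declared by an installed package."""
    try:
        meta = metadata.metadata(package_name)
    except metadata.PackageNotFoundError:
        return []
    links = []
    home_page = meta.get("Home-page")  # older-style metadata field
    if home_page:
        links.append(home_page)
    # Newer packages list their links as repeated "Project-URL" fields.
    links.extend(meta.get_all("Project-URL") or [])
    return links


for link in project_links("pytest"):
    print(link)
```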

To read more, visit https://realpython.com/what-is-pip/.


Virtual Environment

We have already explained the usefulness of packages in Python development in the previous section. When developing a real project, you will often find yourself installing numerous packages. Occasionally, some of these packages only work well with one another at particular versions.

So here is a hypothetical scenario: You have two projects, project A and project B. Project A, for whatever reason, only works with pandas version 1.3 or above. Project B, on the other hand, only works with an older version of pandas, version 1.1. How can you deal with this? Use virtual environments!

The venv package

[To be written]
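Until this section is filled in, here is a minimal sketch of the idea using the standard-library venv module. It creates one isolated environment per project in a temporary directory; the directory names `project_a_env` and `project_b_env` and the helper functions are hypothetical. In everyday use, most people create an environment from the terminal with `python -m venv <directory>` and then activate it, rather than calling the module from a script.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment and return the path of its config file."""
    # with_pip=False keeps this demo fast; with_pip=True also installs pip.
    venv.create(str(env_dir), with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"


def in_virtual_env():
    """Return True if this script is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


with tempfile.TemporaryDirectory() as tmp:
    # One environment per project, so each can hold its own pandas version.
    cfg_a = create_env(Path(tmp) / "project_a_env")
    cfg_b = create_env(Path(tmp) / "project_b_env")
    both_created = cfg_a.exists() and cfg_b.exists()

print("Both environments created:", both_created)
print("Currently inside a virtual environment:", in_virtual_env())
```

Each environment has its own site-packages directory, so a package installed into one is not visible from the other.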

To read more, visit https://realpython.com/python-virtual-environments-a-primer/.


Learning Resources

Beginners

Intermediate/Advanced
