Installing the Data Science Stack - KeithChamberlain/PythonJourney Wiki

Original URL: https://github.com/KeithChamberlain/PythonJourney/wiki/Installing-the-Data-Science-Stack

Installing a Data Science Stack for Python

Installing the data science stack for Python on a MacBook Pro starts with installing the latest Anaconda build. Once Anaconda is installed, I set up conda environments to handle the different Python versions needed for different purposes.
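A minimal sketch of that environment setup, assuming one main analysis environment and one older interpreter for compatibility work (the names, versions, and package lists here are placeholders, not from the original):

conda create -n ds python=3.9 numpy pandas matplotlib scikit-learn   # main analysis environment
conda create -n py38 python=3.8                                      # older interpreter for compatibility testing
conda activate ds                                                    # switch into an environment
conda env list                                                       # confirm the environments exist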

To work with Git (to come) and switch between conda environments conveniently, a few terminal customizations help. For those, I use a terminal replacement called iTerm2 with Oh My Zsh, which themes the prompt with the active conda environment and the current Git repo and status. Git is then installed and paired with a corresponding GitHub account to enable version control.
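A minimal sketch of that shell setup, assuming iTerm2 is already installed and zsh is the login shell; the specific theme is an assumption, not named in the original:

sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"   # official Oh My Zsh installer
# then pick a theme that shows Git branch/status in ~/.zshrc, for example:
# ZSH_THEME="agnoster"
# conda prepends the active environment name to the prompt by default, so no extra theming is required for it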

Of course, an editor such as VS Code (available through Anaconda), along with Jupyter Lab (included with Anaconda) and a suitable command line editor (CLE) such as vim (preinstalled on macOS), is also needed.
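A quick, assumed check that those pieces are in place once the Anaconda environment is active (the `code` shell command only works if VS Code's command line launcher is enabled):

code --version      # VS Code, if the shell command is installed
jupyter lab         # launches Jupyter Lab in the browser
vim --version       # vim ships with macOS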

Rather than dealing with troublesome installs of separate environments for things like the various flavors of SQL, NoSQL, Google Cloud, or AWS, I install Docker. Docker has some advantages over full virtualization because it uses lightweight containers. Docker can, for example, be installed on an AWS EC2 server instance for quick, compatible installation of big data packages such as containerized MongoDB and Spark. Along the way, several Python libraries are needed.
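As an illustration, a containerized MongoDB can be pulled and started in a couple of commands; the container name and port mapping below are just placeholders:

docker pull mongo                                   # official MongoDB image
docker run -d --name mongodb -p 27017:27017 mongo   # run it detached, exposing the default port
docker ps                                           # confirm the container is up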

I signed up for an AWS account to learn how to use AWS S3 cloud storage and Elastic Compute Cloud (EC2) server instances, which offer discounted rates for working with big data.

For near real-time data collection, several cloud developer APIs sounded attractive, such as those from Twitter, the New York Times (NYT), and SoundCloud. Where APIs are not provided, I installed Python packages for web scraping and developed data-mixing pipelines for later use.
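The original does not name the scraping packages; as an assumption, a typical toolkit for this installs like so (tweepy standing in for a Twitter API client):

conda install requests beautifulsoup4 lxml     # HTTP requests and HTML parsing
conda install -c conda-forge tweepy            # Twitter API client, if needed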

Table of Contents (TOC)

Anaconda

AWS

AWS and its components EC2 (the server and its local storage) and S3 (cloud object storage) require a few packages to interface with Python: a Docker container, a command line interface, and some Python libraries.

conda install -c conda-forge awscli   # AWS command line interface; avoids the `pip install`
conda install s3fs                    # filesystem-style access to S3 from Python
conda install boto3                   # AWS SDK for Python
# docker: the AWS CLI is also distributed as a Docker image (amazon/aws-cli)
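Once installed, the CLI needs credentials before the shell or Python can reach S3. A minimal check, where the bucket name is a placeholder:

aws configure                                    # enter the access key, secret key, and default region
aws s3 ls                                        # list the buckets the credentials can see
aws s3 cp local_file.csv s3://my-bucket/data/    # example upload; bucket name is hypothetical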