5_what_are_the_tools - lotusflyer/hack_2018 GitHub Wiki

Tools for Data Science Analytics

There are many tools for doing statistical analysis, visualizations and predictive analytics.

Older, very widely used commercial tools include SPSS and SAS.

Open source tools include RapidMiner, the statistical programming language R and its libraries, and the general-purpose programming language Python and its libraries.

Commercial numerical computing tools include MATLAB and Mathematica.

Advanced analytics tools can provide a drag-and-drop GUI, a programming language, or both.

In this demonstration we will focus on tools that:

  • Are open source and therefore free
  • Use a programming language
  • Combine numerical analysis with statistics and machine learning
  • Have an interactive, exploratory interface

The Tools Chosen

  • Version control --> Git and GitHub
  • Programming Language and Framework --> Python
  • Python package manager --> pip and Pipenv
  • Interactive development environment --> Jupyter Notebooks
  • Numerical Analysis --> NumPy and pandas
  • Statistics --> Statsmodels
  • Machine Learning --> Scikit-learn
  • Plotting and Visualizations --> Matplotlib
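
Before starting, it can help to confirm that the Python libraries in the list above are available in your environment. The snippet below is a minimal sketch, not part of the original demonstration: the `TOOLS` mapping and the `check_tools` helper are illustrative names, and only the Python standard library is used to do the check. Note that Scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Display name -> import name for the Python libraries listed above.
# (Illustrative mapping; scikit-learn is imported as "sklearn".)
TOOLS = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "Statsmodels": "statsmodels",
    "Scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
}


def check_tools(tools=TOOLS):
    """Return {display name: bool}, True when the module can be located."""
    return {name: find_spec(module) is not None for name, module in tools.items()}


if __name__ == "__main__":
    # Print one line per library so missing packages are easy to spot.
    for name, installed in check_tools().items():
        status = "installed" if installed else "missing"
        print(f"{name:<14}{status}")
```

Any library reported as missing can then be installed with pip or Pipenv, for example `pip install numpy`.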

Let's get started!