Low code Data Exploration Tools - clizarraga-UAD7/Workshops GitHub Wiki
Data Exploration or Exploratory Data Analysis
[Image credit: Devopedia]
For reading and cleaning data, as well as for analyzing it, the pandas Python library is the preferred choice for everyday data science tasks. Pandas also includes a set of essential visualization functions (built on Matplotlib) for exploring a dataset's properties.
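The basic pandas workflow described above can be tried on a tiny, made-up DataFrame. The column names and values below are hypothetical. `describe()`, `isna()`, `value_counts()` and `corr()` are standard pandas methods. `DataFrame.hist()` needs Matplotlib, so that step is skipped if Matplotlib is not installed.

```python
import pandas as pd

# Tiny illustrative dataset (hypothetical values, for demonstration only)
df = pd.DataFrame({
    "age": [23, 35, 31, 52, 46, None],
    "income": [32000, 54000, 49000, 88000, 71000, 60000],
    "city": ["Tucson", "Phoenix", "Tucson", "Mesa", "Phoenix", "Tucson"],
})

summary = df.describe()                   # count, mean, std, min, quartiles, max of numeric columns
missing = df.isna().sum()                 # number of missing values per column
city_counts = df["city"].value_counts()   # frequency table of a categorical column
corr = df.select_dtypes("number").corr()  # Pearson correlation between numeric columns

# Built-in pandas plotting needs Matplotlib; skip gracefully if it is missing.
try:
    import matplotlib
    matplotlib.use("Agg")                 # headless backend for scripts; omit this line in Jupyter
    axes = df.hist(figsize=(8, 3))        # one histogram per numeric column
except ImportError:
    axes = None

print(summary)
print(missing)
```

Each of these calls answers one narrow question. The tools reviewed below bundle many such summaries and plots into a single report.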
We will present a small collection of open-source Python tools that help us carry out an Exploratory Data Analysis (EDA) of a dataset with very little coding.
There is a significant number of tools of this type; we will review the following:
- ydata-profiling | Documentation. ydata-profiling (formerly known as pandas-profiling) provides a consistent and fast one-line Exploratory Data Analysis (EDA) experience. Like the pandas `df.describe()` function, `ydata-profiling` delivers an extended analysis of a DataFrame, and it lets you export the analysis in different formats such as `html` and `json`. (Please read the installation notes.)
- Sweetviz. Sweetviz is an open-source Python library that generates high-density visualizations to kickstart EDA with just two lines of code. Its output is a fully self-contained HTML application.
- Lux API | Documentation. Lux is a Python library that makes data science easier by automating certain aspects of the data exploration process. Lux is designed to facilitate faster experimentation with data, even when the user does not have a clear idea of what they are looking for. It is integrated with an interactive Jupyter widget that lets users quickly browse through large collections of data directly within their Jupyter notebooks.
- DataPrep | Documentation. DataPrep.EDA describes itself as the fastest and easiest EDA tool in Python. It lets you understand a pandas or Dask DataFrame with a few lines of code, in seconds.
- AutoViz. AutoViz aims to automatically visualize any dataset, of any size, with a single line of code. Its interactive charts can also be saved automatically as HTML files with the `chart_format="html"` setting.
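A minimal ydata-profiling sketch, assuming the package is installed with `pip install ydata-profiling`. `ProfileReport` and `to_file()` are the library's documented entry points, and `to_file()` picks HTML or JSON output from the file extension. The wrapper name `profile_report` and its default file names are hypothetical.

```python
import pandas as pd


def profile_report(df: pd.DataFrame,
                   html_path: str = "eda_report.html",
                   json_path: str = "eda_report.json",
                   title: str = "EDA Report",
                   minimal: bool = False):
    """Build a ydata-profiling report for `df` and export it as HTML and JSON.

    `profile_report` is a hypothetical helper, not part of ydata-profiling.
    """
    # Imported lazily so this module loads even if the package is absent.
    from ydata_profiling import ProfileReport

    # minimal=True turns off some expensive computations (e.g. correlations).
    report = ProfileReport(df, title=title, minimal=minimal)
    report.to_file(html_path)  # self-contained HTML report
    report.to_file(json_path)  # the same analysis serialized as JSON
    return report
```

For example, `profile_report(df, minimal=True)` on a large DataFrame trades some detail for speed. In Jupyter, displaying the returned report object renders it inline.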
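The "two lines of code" in Sweetviz are an analysis call followed by an HTML export. The sketch below assumes `pip install sweetviz` and uses `sv.analyze`, `sv.compare` and `show_html`, as shown in the Sweetviz README. The helper name `sweetviz_report` and the labels "Dataset A" and "Dataset B" are hypothetical.

```python
import pandas as pd


def sweetviz_report(df: pd.DataFrame,
                    path: str = "SWEETVIZ_REPORT.html",
                    target=None,
                    compare_df=None):
    """Write a Sweetviz HTML report; `sweetviz_report` is a hypothetical helper."""
    import sweetviz as sv  # lazy import: requires `pip install sweetviz`

    if compare_df is None:
        # Single-dataset analysis; target_feat must be a boolean or numeric column.
        report = sv.analyze(df, target_feat=target)
    else:
        # Side-by-side comparison, e.g. a training set vs. a test set.
        report = sv.compare([df, "Dataset A"], [compare_df, "Dataset B"],
                            target_feat=target)
    # Write the self-contained HTML file without launching a browser.
    report.show_html(path, open_browser=False)
    return report
```

Passing `compare_df` is the most common use of the comparison mode, for example checking that a train/test split has similar distributions.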
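Lux works mainly through its Jupyter widget, so there is little code to write. The sketch below assumes `pip install lux-api`. Lux replaces pandas' DataFrame class when it is imported, so the import has to come before the data is loaded. `df.intent` and `df.recommendation` are Lux DataFrame attributes. The helper name `lux_recommendations` is hypothetical.

```python
import pandas as pd


def lux_recommendations(csv_path: str, intent=None):
    """Load a CSV with Lux enabled and return its visualization recommendations.

    `lux_recommendations` is a hypothetical helper, not part of Lux.
    """
    # Importing lux swaps in Lux-aware DataFrames, so it must run
    # before the DataFrame is created.
    import lux  # noqa: F401

    df = pd.read_csv(csv_path)
    if intent is not None:
        # Column names (or lux.Clause objects) the analysis should focus on.
        df.intent = intent
    # Dict mapping an action name (e.g. "Correlation") to recommended charts;
    # in Jupyter, simply displaying `df` shows the same results in the widget.
    return df.recommendation
```

In a notebook, the usual pattern is even shorter: `import lux`, load the DataFrame, and display it. Then use the widget's toggle to switch between the table and the recommended charts.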
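DataPrep.EDA is organized around task-centric functions in `dataprep.eda`. These are `plot` for an overview or a single column, `plot_missing`, `plot_correlation`, and `create_report` for a full report. This sketch assumes `pip install dataprep`. The wrapper name `dataprep_overview` and its tuple return value are hypothetical.

```python
import pandas as pd


def dataprep_overview(df: pd.DataFrame, column=None, title: str = "DataPrep Report"):
    """Run the main DataPrep.EDA tasks on `df`.

    `dataprep_overview` is a hypothetical helper, not part of DataPrep.
    """
    from dataprep.eda import create_report, plot, plot_correlation, plot_missing

    # Whole-dataset overview, or a detailed view of a single column.
    overview = plot(df) if column is None else plot(df, column)
    missing = plot_missing(df)            # where and how much data is missing
    correlations = plot_correlation(df)   # correlations between numeric columns
    report = create_report(df, title=title)  # all of the above in one report
    return overview, missing, correlations, report
```

In Jupyter, each returned object renders as interactive output. Outside a notebook, `create_report(df).show_browser()` opens the full report in a web browser.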
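AutoViz's "single line of code" is a call to `AutoViz_Class().AutoViz(...)`. With `filename=""` and `dfte=df`, it reads an in-memory DataFrame instead of a file, and `chart_format="html"` saves interactive charts as HTML files. This sketch assumes `pip install autoviz` and follows the parameter names in the AutoViz README. The folder where the HTML files are written may vary between AutoViz versions. The helper name `autoviz_html` is hypothetical.

```python
import pandas as pd


def autoviz_html(df: pd.DataFrame, target: str = ""):
    """Visualize `df` with AutoViz and save the charts as HTML files.

    `autoviz_html` is a hypothetical helper, not part of AutoViz.
    """
    from autoviz.AutoViz_Class import AutoViz_Class  # lazy import

    av = AutoViz_Class()
    # filename="" tells AutoViz to use the DataFrame passed via dfte;
    # depVar="" means there is no target variable to analyze against.
    return av.AutoViz(
        filename="",
        dfte=df,
        depVar=target,
        chart_format="html",  # interactive charts saved as HTML files
        verbose=0,
    )
```

AutoViz returns a DataFrame, which may be a sample of the input when the dataset is large. Passing a column name as `target` adds charts that relate every feature to that variable.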
Please see the Jupyter Notebook with examples.
References:
- 11 Open Source Data Exploration Tools You Need to Know in 2023. ODSC - Open Data Science, Medium.
- ydata-profiling Examples.
- Auditing Data Quality with Pandas Profiling. Fabiana Clemente. YData, Medium.
- Powerful EDA (Exploratory Data Analysis) in just two lines of code using Sweetviz. Francois Bertrand. Towards Data Science, Medium.
- Sweetviz Examples in Google Colab.
- Lux API Examples.
- DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python (Paper)
- DataPrep Demo Jupyter Notebook
- Autoviz: The Key to Effortless Data Visualization. Emine Bozkus. DataDrivenInvestor, Medium.
Created: 03/16/2023; Updated: 03/16/2023
Carlos Lizárraga, Data Science Institute, University of Arizona