Count number of record in Dataframe

#Total row in DF
len(df.index)

#Count for Female only
len(df[df['gender_label']=='Female'].index )

#count for specific column. SQL Statement same as "select bmi_risk, count(1) from table group by bmi_risk"
df['bmi_risk'].value_counts()

#using Groupby to count
df.groupby(['smoke','alco']).size()

**Display data structure **

df.info()

Display all the columns in the list This list of columns will be useful for you to copy and paste to select part of columns instead of typing one by one.

df.columns

output of the df.columns from superstore dataset.

Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
       'Customer ID', 'Customer Name', 'Segment', 'Country/Region', 'City',
       'State', 'Postal Code', 'Region', 'Product ID', 'Category',
       'Sub-Category', 'Product Name', 'Sales', 'Quantity', 'Discount',
       'Profit'],
      dtype='object')

df.describe()

Display the top 6 records. This is like select using LIMIT n in postgres or select TOP n in T-SQL

df.head(6)

Display the last 10 records.

df.tail(10)

Display random sampling of 10 records.

df.sample(10)

Extract Dataframe with Null values

df[ df['user_id'].isnull()]

Get statistic of your dataset.

df.describe()

Sort Dataframe

df.sort_values(by=['col1', 'col2'], ascending=True)

Using Groupby

df.groupby(['col1']).sum()
df.groupby(['col1']).mean()
df.groupby(['col1']).median()

Using GroupBy with multiple columns on GroupBy and Calculation culumns

calculate average api_hi and api_lo from combination of active and cardio fields.

df.groupby(['active','cardio'])[['ap_hi','ap_lo']].mean()

Group column is for reference only, it is not created from groupby function.

Group	active	cardio	ap_hi	ap_lo
1	no	no	119.418 419	78.232473
2	no	yes	132.217714	84.076955
3	yes	no	119.235340	78.094759
4	Yes	yes	133.803466	84.746746

Another example using multiple function

# Calculate the min, average, median, max total_stays for each of the market_segment_type
df_pivot.groupby(['market_segment'])[['row_total']].agg(['min','mean','median','max'])

Script	Description
df['Category'].unique()	select distinct category
df['Category'].nunique()	select count distinct category. Return number of distinct records
df['Category'].value_counts()	select Category and return total count of the category. Similar to Select category,count(1) From superstore Group by category
df['Category'].value_counts(normalize=True) * 100 Output: Office Supplies 60.296178 Furniture 21.222734 Technology 18.481089	Get ratio of the category record

EDA Using Chart

Correlation in table format

df.corr()

create heatmap

matrix_corr = df.corr()
sns.heatmap(matrix_corr, annot=True)

Create pairplot

sns.pairplot(df)

Scatterplot for Visualizing Relationship between 2 Variables

sns.scatterplot('Father', 'Height', hue='Gender', data=df)

Exploratory Data Analysis (EDA) - setiamanlhc/python-snippet-code GitHub Wiki

Count number of record in Dataframe

Extract Dataframe with Null values

Sort Dataframe

Using Groupby

Using GroupBy with multiple columns on GroupBy and Calculation culumns

EDA Using Chart

Correlation in table format

create heatmap

Create pairplot

Scatterplot for Visualizing Relationship between 2 Variables

⚠️ GitHub.com Fallback ⚠️

Exploratory Data Analysis (EDA) - setiamanlhc/python-snippet-code GitHub Wiki

Count number of record in Dataframe

Extract Dataframe with Null values

Sort Dataframe

Using Groupby

Using GroupBy with multiple columns on GroupBy and Calculation culumns

EDA Using Chart

Correlation in table format

create heatmap

Create pairplot

Scatterplot for Visualizing Relationship between 2 Variables

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️