03 04 Creating and Visualizing DataFrames - HannaAA17/Data-Scientist-With-Python-datacamp GitHub Wiki

Visualizing data

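  • Histograms: df["column"].hist()
  • Bar plots: series.plot(kind="bar", title="...")
  • Line plots: series.plot(kind="line")
  • Scatter plots: dog_pack.plot(x="height_cm", y="weight_kg", kind="scatter")
  • Layering plots: call several plotting methods before plt.show() so they are drawn on the same axes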
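
The histogram entry in the list above can be sketched as follows. This is a minimal illustration, not code from the course: the dog_pack values are invented, and the Agg backend is an assumption made only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration
dog_pack = pd.DataFrame({
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

# Series.hist() draws the histogram and returns the matplotlib Axes;
# bins controls how many bars the value range is split into
ax = dog_pack["height_cm"].hist(bins=5)
ax.set_title("Dog heights (cm)")
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure.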

Bar Plot

# Import matplotlib.pyplot with alias plt
import matplotlib.pyplot as plt

# Look at the first few rows of data
print(avocados.head())

# Get the total number of avocados sold of each size
nb_sold_by_size = avocados.groupby("size")["nb_sold"].sum()

# Create a bar plot of the number of avocados sold by size
nb_sold_by_size.plot(kind="bar")

# Show the plot
plt.show()

Line Plot

# Import matplotlib.pyplot with alias plt
import matplotlib.pyplot as plt

# Get the total number of avocados sold on each date
nb_sold_by_date = avocados.groupby("date")["nb_sold"].sum()

# Create a line plot of the number of avocados sold by date
nb_sold_by_date.plot(kind="line")

# Show the plot
plt.show()

Scatter Plot

# Scatter plot of nb_sold vs avg_price with title
avocados.plot(
    x="nb_sold",
    y="avg_price",
    kind="scatter",
    title="Number of avocados sold vs. average price",
)

# Show the plot
plt.show()

Layering Plot

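# Histogram of conventional avg_price with 20 bins and 50% transparency (alpha=0.5)
avocados[avocados["type"] == "conventional"]["avg_price"].hist(bins=20, alpha=0.5)

# Histogram of organic avg_price, also with 20 bins and alpha=0.5, drawn on the same axes
avocados[avocados["type"] == "organic"]["avg_price"].hist(bins=20, alpha=0.5)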

# Add a legend
plt.legend(["conventional", "organic"])

# Show the plot
plt.show()

Missing data

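  • .isna(): detect whether each value is missing (returns True or False for every value)
  • .isna().any(): check whether each column has at least one missing value
  • .isna().sum(): count the missing values in each column
  • Removing missing values: .dropna() drops the rows that contain missing values
  • Replacing missing values: .fillna(0)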
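
A minimal sketch of the methods listed above, run on a small made-up DataFrame. The dogs data and all variable names are hypothetical, invented only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values (NaN)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "weight_kg": [25.0, np.nan, 23.0, np.nan],
    "height_cm": [56.0, 43.0, np.nan, 49.0],
})

# Boolean DataFrame: True wherever a value is missing
missing_mask = dogs.isna()

# One Boolean per column: True if the column has at least one missing value
has_missing = dogs.isna().any()

# One integer per column: how many values are missing
missing_counts = dogs.isna().sum()

# Drop every row that contains a missing value
dogs_complete = dogs.dropna()

# Replace every missing value with 0
dogs_filled = dogs.fillna(0)

print(has_missing)
print(missing_counts)
print(dogs_complete)
print(dogs_filled)
```

Note that .dropna() can discard a lot of data when missing values are spread across many rows: here only one of the four rows has no missing value.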

Replacing missing values

Compare histograms before and after filling to see how replacing missing values can affect the distribution.

# Histograms of the columns with missing values, before filling
cols_with_missing = ["small_sold", "large_sold", "xl_sold"]
avocados_2016[cols_with_missing].hist()
plt.show()

# Fill in missing values with 0
avocados_filled = avocados_2016.fillna(0)

# Create histograms of the filled columns
avocados_filled[cols_with_missing].hist()

# Show the plot
plt.show()
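
The shift in the distribution can also be checked numerically. The following is a minimal sketch with invented numbers (the sales DataFrame is hypothetical, not the course's avocados_2016 data): filling with 0 adds artificial zero values, which pulls the mean down.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures with two missing values
sales = pd.DataFrame({"small_sold": [100.0, 120.0, np.nan, 110.0, np.nan]})

# .mean() skips NaN by default, so only the three known values are averaged
mean_before = sales["small_sold"].mean()

# After filling, the two zeros count as real observations
sales_filled = sales.fillna(0)
mean_after = sales_filled["small_sold"].mean()

print(mean_before)  # 110.0
print(mean_after)   # 66.0
```

In a histogram, the same effect shows up as an extra spike at 0 that was not present in the original data.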

Creating DataFrames

  • Dictionaries
    • from a list of dictionaries (by row)
    • from a dictionary of lists (by column)

List of dictionaries

# Import pandas with alias pd
import pandas as pd

# Create a list of dictionaries with new data (one dictionary per row)
avocados_list = [
    {"date": "2019-11-03", "small_sold": 10376832, "large_sold": 7835071},
    {"date": "2019-11-10", "small_sold": 10717154, "large_sold": 8561348},
]

# Convert list into DataFrame
avocados_2019 = pd.DataFrame(avocados_list)

# Print the new DataFrame
print(avocados_2019)

Dictionary of lists

# Import pandas with alias pd
import pandas as pd

# Create a dictionary of lists with new data (one list per column)
avocados_dict = {
    "date": ["2019-11-17", "2019-12-01"],
    "small_sold": [10859987, 9291631],
    "large_sold": [7674135, 6238096],
}

# Convert dictionary into DataFrame
avocados_2019 = pd.DataFrame(avocados_dict)

# Print the new DataFrame
print(avocados_2019)

Reading and writing CSVs

DataFrame to CSV

  • new_dogs.to_csv("new_dogs_with_bmi.csv")
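
A minimal round-trip sketch covering both directions, writing with .to_csv() and reading back with pd.read_csv(). The new_dogs values are invented, and an in-memory io.StringIO buffer stands in for a file on disk so that nothing is written; passing a file path works the same way.

```python
import io

import pandas as pd

# Hypothetical data, invented for illustration
new_dogs = pd.DataFrame({
    "name": ["Ginger", "Scout"],
    "bmi": [17.8, 29.6],
})

# DataFrame to CSV: index=False leaves out the row labels,
# which are otherwise written as an extra unnamed first column
buffer = io.StringIO()
new_dogs.to_csv(buffer, index=False)

# CSV to DataFrame: rewind the buffer, then parse it
buffer.seek(0)
loaded = pd.read_csv(buffer)

print(loaded)
```

Whether to pass index=False is a design choice: keep the index when it carries information (for example, dates), and drop it when it is just the default 0, 1, 2, ... numbering.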