Chapter 10 problem set 1 - UCD-pbio-rclub/python_problems GitHub Wiki

Chapter 10 problem set 1

John

  1. Load the diamonds dataset.
  2. Find the max price for each cut and color combination.
  3. Find the min, max, and average dimensions (x, y, z) of a diamond for each carat size.
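One possible approach to items 2 and 3, sketched on a tiny made-up table rather than the real data. It assumes the intended source is the diamonds dataset shipped with seaborn, so the column names (carat, cut, color, price, x, y, z) follow that dataset. The rows themselves are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the diamonds data (values are made up for illustration).
# Assumption: with seaborn installed and network access,
# sns.load_dataset("diamonds") returns the full table with these column names.
diamonds = pd.DataFrame({
    "carat": [0.3, 0.3, 0.5, 0.5, 0.5],
    "cut":   ["Ideal", "Ideal", "Good", "Good", "Ideal"],
    "color": ["E", "E", "E", "J", "J"],
    "price": [400, 450, 1200, 1000, 1500],
    "x": [4.3, 4.4, 5.1, 5.0, 5.2],
    "y": [4.3, 4.5, 5.1, 5.1, 5.2],
    "z": [2.6, 2.7, 3.1, 3.2, 3.2],
})

# Item 2: maximum price for each (cut, color) combination
max_price = diamonds.groupby(["cut", "color"])["price"].max()

# Item 3: min, max, and mean of each dimension for each carat size
dims_by_carat = diamonds.groupby("carat")[["x", "y", "z"]].agg(["min", "max", "mean"])

print(max_price)
print(dims_by_carat)
```

If cut and color are loaded as categoricals, passing observed=True to groupby keeps only the combinations that actually occur in the data.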

Min-Yao

Use the same data as in chapter 8: import my RNA-Seq CPM data from the 'Expression Browser_CPM_practice.xlsx' file.

1. Grouping by genotypes and treatments and focusing on COMT (Solyc03g080180.3), please find which genotype and treatment combination has the highest average expression level, and which combination has the lowest average expression level.

2. Grouping by chromosome number, please find which gene has the highest average expression level, and which gene has the largest expression level changes.

Kae

Using the data from before, group the passengers by embark code and total the survivors in each group.
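A minimal sketch of the grouping step, assuming a passenger table with one row per passenger. The column names embarked and survived are hypothetical placeholders (adjust them to match the actual file), and the rows are made up.

```python
import pandas as pd

# Toy passenger table; column names and values are assumptions for illustration.
passengers = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "C", "S"],
    "survived": [0, 1, 1, 0, 1, 0],  # 1 = survived, 0 = did not
})

# Summing a 0/1 column within each group counts the survivors per embark code
survivors_by_port = passengers.groupby("embarked")["survived"].sum()
print(survivors_by_port)
```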

Joel

Using the weather data from last week (https://raw.githubusercontent.com/hadley/nycflights13/master/data-raw/weather.csv):

  1. Use grouping functions to visualize the average temperature and humidity by airport over the year
1 (answer)

import pandas as pd
import seaborn as sns

# Read the data
weather = pd.read_csv('https://raw.githubusercontent.com/hadley/nycflights13/master/data-raw/weather.csv')

# Group by airport and month, then average. The .reset_index() method converts the group keys back into columns, which makes them easy to reference by name when plotting.
origin_month_mean = weather.groupby(['origin', 'month'])[['temp','humid']].mean().reset_index()

# Plot with seaborn
sns.lineplot(data=origin_month_mean, x="month",y="temp",hue="origin",style="origin",palette="tab10", linewidth=2.5)

sns.lineplot(data=origin_month_mean, x="month",y="humid",hue="origin",style="origin",palette="tab10", linewidth=2.5)
  2. How can you visualize the average daily ranges of humidity over the year (monthly)?
2 (answer)

# Include day in the grouping factors
origin_month_day_mean = weather.groupby(['origin', 'month','day'])[['temp','humid']].mean().reset_index()


# With several daily values per month, seaborn draws the monthly mean as the line and, by default, shades a 95% confidence interval around it (not the full min-max range of the days).
sns.lineplot(data=origin_month_day_mean, x="month",y="humid",hue="origin",style="origin",palette="tab10", linewidth=2.5)

# Another way to visualize it is with a swarm plot or a boxen plot (an enhanced box plot)
sns.catplot(data=origin_month_day_mean, x="month",y="humid",hue="origin",palette="tab10",kind="swarm",col="origin")

sns.catplot(data=origin_month_day_mean, x="month",y="humid",hue="origin",palette="tab10",kind="boxen",col="origin")
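The plots above work from daily means. If "daily range" is instead read as the spread between the highest and lowest reading within each day, one way to compute it is sketched here on made-up hourly values. The column names match those used with weather.csv above, and the interpretation of "range" as max minus min is an assumption.

```python
import pandas as pd

# Toy hourly readings (made-up values) using the weather.csv column names
toy = pd.DataFrame({
    "origin": ["EWR"] * 4 + ["JFK"] * 4,
    "month":  [1, 1, 1, 1, 1, 1, 1, 1],
    "day":    [1, 1, 2, 2, 1, 1, 2, 2],
    "humid":  [50.0, 70.0, 40.0, 80.0, 60.0, 65.0, 55.0, 70.0],
})

# Daily range = max - min humidity within each airport and day
daily = toy.groupby(["origin", "month", "day"])["humid"].agg(["min", "max"])
daily["range"] = daily["max"] - daily["min"]

# Average the daily ranges within each airport and month
monthly_range = (
    daily.groupby(level=["origin", "month"])["range"].mean().reset_index()
)
print(monthly_range)
```

The resulting monthly_range table has one row per airport and month, so it can be passed to sns.lineplot in the same way as origin_month_mean, with y="range".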

Rie

This one is a bit off-topic, but the data is FiveThirtyEight's Halloween candy ranking. If you are interested, the article is here.

  1. Find how many candies contain chocolate and/or caramel (or neither). 0 means "no" whereas 1 means "yes".

  2. Find the averages for sugar content, price and win percent for each category above.
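A rough sketch of both items on an invented table. The column names (chocolate, caramel, sugarpercent, pricepercent, winpercent) are assumed to match the FiveThirtyEight candy file and should be checked against the actual data. The candy names and values are made up.

```python
import pandas as pd

# Toy candy table (made-up rows); column names are assumed, see note above.
candy = pd.DataFrame({
    "competitorname": ["A", "B", "C", "D", "E"],
    "chocolate":    [1, 1, 0, 0, 1],
    "caramel":      [1, 0, 1, 0, 0],
    "sugarpercent": [0.8, 0.6, 0.4, 0.2, 0.4],
    "pricepercent": [0.9, 0.5, 0.3, 0.1, 0.7],
    "winpercent":   [80.0, 60.0, 40.0, 30.0, 70.0],
})

# Grouping by both 0/1 flags yields the four categories:
# neither, caramel only, chocolate only, both
groups = candy.groupby(["chocolate", "caramel"])

# Item 1: number of candies in each category
counts = groups.size()

# Item 2: average sugar, price, and win percent in each category
means = groups[["sugarpercent", "pricepercent", "winpercent"]].mean()

print(counts)
print(means)
```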
