Chapter 5 problem set 1 - UCD-pbio-rclub/python_problems GitHub Wiki

Enter your problems below. Remember to use markdown formatting.

Julin

Q1.

  1. Import the "Tomato.csv" data set from this repository
  2. Make a new data frame that retains the "trt", "hyp", and "species" columns.
  3. Now subset the new data frame so that it only has the S. chilense data.
  4. Calculate the mean of the hyp column separately for the "H" and "L" treatments.
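
One possible approach to the steps above, as a minimal sketch rather than a reference solution. The rows below are invented stand-ins for "Tomato.csv" so the example runs on its own; it assumes the species value is spelled exactly "S. chilense" in the real file, and the extra `leafnum` column is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for Tomato.csv: these rows are invented for illustration.
toy_csv = io.StringIO(
    "trt,hyp,species,leafnum\n"
    "H,10.0,S. chilense,5\n"
    "H,20.0,S. chilense,6\n"
    "L,30.0,S. chilense,5\n"
    "L,50.0,S. chilense,7\n"
    "H,99.0,S. pennellii,4\n"
)
# With the real file this would be pd.read_csv("Tomato.csv").
tomato = pd.read_csv(toy_csv)

# Step 2: keep only the three columns of interest.
subset = tomato[["trt", "hyp", "species"]]

# Step 3: boolean mask selecting the S. chilense rows.
chilense = subset[subset["species"] == "S. chilense"]

# Step 4: mean of hyp within each treatment group.
hyp_means = chilense.groupby("trt")["hyp"].mean()
print(hyp_means)
```

Using `groupby` gives both treatment means in one call; filtering on `trt == "H"` and `trt == "L"` separately and calling `.mean()` on each would work as well.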

Min-Yao

Q2.

  1. Import the data that we used last week as a DataFrame.
  2. We want to only focus on wyo_leaf_FPsc samples, so please slice out these samples.
  3. In addition, assume that wyo_leaf_FPsc_02_052 is our control sample. Select only the genes whose expression level in wyo_leaf_FPsc_02_052 is > 1, and slice out a new DataFrame based on this criterion.
  4. How many genes and samples do we have in this new DataFrame?
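
A minimal sketch of one way to do the slicing, assuming the data has genes as rows (in the index) and samples as columns. The toy table is invented for illustration: the gene IDs, the values, and the `wyo_root_FPsc_01_001` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy expression table; gene IDs and values are invented.
cpm = pd.DataFrame(
    {
        "wyo_leaf_FPsc_02_052": [0.5, 2.0, 3.5, 1.0],
        "wyo_leaf_FPsc_04_141": [1.0, 2.5, 3.0, 0.0],
        "wyo_root_FPsc_01_001": [4.0, 0.1, 0.2, 9.0],  # hypothetical non-leaf sample
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

# Step 2: keep only the columns whose names start with the sample prefix.
leaf = cpm.loc[:, cpm.columns.str.startswith("wyo_leaf_FPsc")]

# Step 3: keep only the genes expressed above 1 in the control sample.
expressed = leaf[leaf["wyo_leaf_FPsc_02_052"] > 1]

# Step 4: shape is (number of rows, number of columns) = (genes, samples).
n_genes, n_samples = expressed.shape
print(n_genes, n_samples)
```

Note that the comparison is strict, so a gene with an expression level of exactly 1 in the control is dropped.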

Rie

Q3.

  1. Import the Brapa_cpm.csv dataset that we used last week.
  2. Extract the wyo_leaf_FPsc_04_141, wyo_leaf_FPsc_04_170, and wyo_leaf_FPsc_04_174 samples, assuming those samples are biological replicates.
  3. Make a new data frame (I call it dataA) with these 3 samples. Calculate the average for each gene. Append the averages to dataA.
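
A minimal sketch of the replicate averaging, assuming genes are rows and that "append the averages" means adding them as a new column. The toy values, the gene IDs, and the column name `average` are hypothetical choices for illustration.

```python
import pandas as pd

replicates = [
    "wyo_leaf_FPsc_04_141",
    "wyo_leaf_FPsc_04_170",
    "wyo_leaf_FPsc_04_174",
]

# Hypothetical toy stand-in for Brapa_cpm.csv; gene IDs and values are invented.
cpm = pd.DataFrame(
    {
        "wyo_leaf_FPsc_02_052": [7.0, 8.0],
        "wyo_leaf_FPsc_04_141": [1.0, 0.0],
        "wyo_leaf_FPsc_04_170": [2.0, 0.0],
        "wyo_leaf_FPsc_04_174": [3.0, 3.0],
    },
    index=["geneA", "geneB"],
)

# Step 2 and 3: select the replicate columns; copy so cpm itself is not modified.
dataA = cpm[replicates].copy()

# axis=1 averages across columns, giving one value per gene (row).
dataA["average"] = dataA[replicates].mean(axis=1)
print(dataA)
```

Taking the mean over `dataA[replicates]` rather than all of `dataA` keeps the result correct if the line is re-run after the `average` column already exists.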

Ruijuan

https://github.com/cuttlefishh/python-for-data-analysis/blob/master/assignments/assignment6.md