Chapter 8 problem set 2 - UCD-pbio-rclub/python_problems GitHub Wiki
I decided to write a function that produces random bingo balls.
import random
def RandomBingoNumber():
    letter = random.choice('BINGO')
    if letter == 'B':
        number = str(random.randint(1,15))
    if letter == 'I':
        number = str(random.randint(16,30))
    if letter == 'N':
        number = str(random.randint(31,45))
    if letter == 'G':
        number = str(random.randint(46,60))
    if letter == 'O':
        number = str(random.randint(61,75))
    call = {letter:[number]}
    return call
Using this function and everything we have learned so far in this book, create a valid bingo card in the form of a data frame. The numbers on your card must be generated using the function above. There are simpler ways to do this, but we are going to do it the hard way. Output should look like this:
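A minimal sketch of one possible approach, not the intended solution. It assumes that a valid card means five distinct numbers under each of the letters B, I, N, G and O, and it ignores the free centre square found on some cards. `make_bingo_card` is a hypothetical helper name, and the function from the problem is repeated so the block runs on its own.

```python
import random
import pandas as pd

def RandomBingoNumber():
    # Function from the problem statement, repeated so the sketch is self-contained.
    letter = random.choice('BINGO')
    if letter == 'B':
        number = str(random.randint(1, 15))
    if letter == 'I':
        number = str(random.randint(16, 30))
    if letter == 'N':
        number = str(random.randint(31, 45))
    if letter == 'G':
        number = str(random.randint(46, 60))
    if letter == 'O':
        number = str(random.randint(61, 75))
    call = {letter: [number]}
    return call

def make_bingo_card():
    # Hypothetical helper: keep drawing balls until every letter has
    # five distinct numbers, then assemble the columns into a DataFrame.
    columns = {letter: [] for letter in 'BINGO'}
    while any(len(numbers) < 5 for numbers in columns.values()):
        call = RandomBingoNumber()
        for letter, numbers in call.items():
            number = numbers[0]
            # Skip duplicates and letters whose column is already full.
            if len(columns[letter]) < 5 and number not in columns[letter]:
                columns[letter].append(number)
    return pd.DataFrame(columns, columns=list('BINGO'))

card = make_bingo_card()
print(card)
```

The numbers stay as strings because that is what the function returns; convert them with `card.astype(int)` if integers are preferred.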
Use the same data as last week: import my RNA-Seq CPM data from the 'Expression Browser_CPM_practice.xlsx' file. Please use your formatted DataFrame from last week with hierarchical indexing.
First, please calculate the average expression level of each gene for each combination of genotype and treatment.
Answer
import pandas as pd
xlsx = pd.ExcelFile('HW7/Expression Browser_CPM_practice.xlsx')
RNASeq = pd.read_excel(xlsx, 'Expression Browser_CPM')
RNASeq.head()
RNASeq = RNASeq.set_index('Name')
RNASeq.head()
RNASeq_col_list = RNASeq.columns.tolist()
RNASeq_col_list
RNASeq_column = pd.DataFrame(RNASeq_col_list)
RNASeq_column
RNASeq_column2 = RNASeq_column[0].apply(lambda x: pd.Series(list(x.replace('_',''))))
RNASeq_column2
RNASeq_column3 = RNASeq_column2.drop(columns=[3])
RNASeq_column3
multiindex_column = pd.MultiIndex.from_frame(RNASeq_column3)
multiindex_column
RNASeq.columns = multiindex_column
RNASeq.head()
RNASeq.columns.names = ['genotype', 'treatments', 'sample_number']
RNASeq.head()
index_list = pd.DataFrame(RNASeq.index.tolist())
index_list
chromosome = index_list[0].apply(lambda x: x[5:7])
chromosome
solycID = pd.Series(RNASeq.index)
solycID
newRNASeq = RNASeq.set_index([solycID, chromosome])
newRNASeq
newRNASeq.index.names = ['SolycID','chromosome']
newRNASeq
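# DataFrame.mean(level=...) was removed in pandas 2.0, so group the transposed frame by column level instead
newRNASeq.T.groupby(level=['genotype','treatments']).mean().T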
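To see what averaging over column levels does, here is a toy sketch. The gene names, labels and values are made up for illustration and are not taken from the RNA-Seq file. It transposes the frame and uses `groupby(level=...)`, which should behave the same across pandas versions.

```python
import pandas as pd

# Hypothetical toy data: one genotype ('6'), two treatments ('c', 't'),
# two samples per treatment.
columns = pd.MultiIndex.from_tuples(
    [('6', 'c', '1'), ('6', 'c', '2'), ('6', 't', '1'), ('6', 't', '2')],
    names=['genotype', 'treatments', 'sample_number'],
)
toy = pd.DataFrame(
    [[1.0, 3.0, 10.0, 20.0],
     [2.0, 4.0, 0.0, 6.0]],
    index=['geneA', 'geneB'],
    columns=columns,
)

# Transpose so the column levels become row levels, average within each
# genotype/treatment group, then transpose back.
toy_mean = toy.T.groupby(level=['genotype', 'treatments']).mean().T
print(toy_mean)
```

Each gene ends up with one mean per genotype and treatment pair, and the `sample_number` level disappears from the columns.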
Answer
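# DataFrame.mean(level=...) was removed in pandas 2.0, so group the transposed frame by column level instead
newRNASeq_mean = newRNASeq.T.groupby(level=['genotype','treatments']).mean().T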
newRNASeq_mean
newRNASeq_mean_column = pd.DataFrame(newRNASeq_mean.columns.tolist())
newRNASeq_mean_column
newRNASeq_mean_column[2] = 'm'
newRNASeq_mean_column
multiindex_mean_column = pd.MultiIndex.from_frame(newRNASeq_mean_column)
multiindex_mean_column
newRNASeq_mean.columns = multiindex_mean_column
newRNASeq_mean
newRNASeq_mean.columns.names = ['genotype', 'treatments', 'sample_number']
newRNASeq_mean.head()
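The same relabelling can be sketched more compactly on toy data by rebuilding the column tuples directly. This is an alternative illustration, not the answer above; the values and labels are hypothetical.

```python
import pandas as pd

# Hypothetical two-level mean table (genotype, treatments).
two_level = pd.MultiIndex.from_tuples(
    [('6', 'c'), ('6', 't')], names=['genotype', 'treatments']
)
toy_mean = pd.DataFrame(
    [[2.0, 15.0], [3.0, 3.0]], index=['geneA', 'geneB'], columns=two_level
)

# Append the constant label 'm' to every column tuple so the means carry
# the same three levels as the per-sample columns.
toy_mean.columns = pd.MultiIndex.from_tuples(
    [(genotype, treatment, 'm') for genotype, treatment in toy_mean.columns],
    names=['genotype', 'treatments', 'sample_number'],
)
print(toy_mean.columns.tolist())
```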
Third, please merge these two DataFrames (add the mean gene expression values to the formatted DataFrame). This adds 8 new columns with hierarchical indexing. Please order the columns so that each mean column sits at the end of its treatment group rather than at the end of the whole table.
Answer
final_RNASeq = pd.merge(newRNASeq,newRNASeq_mean,left_index=True, right_index=True)
final_RNASeq
final_RNASeq0 = final_RNASeq.reindex(columns=list('6523'), level=0)
final_RNASeq0.head()
final_RNASeq1 = final_RNASeq0.reindex(columns=list('ct'), level=1)
final_RNASeq1.head()
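The ordering step can be checked on a small example. The sketch below uses made-up values, two genotypes and one sample per treatment; it merges on the index and then reorders the columns one level at a time, as in the answer above.

```python
import pandas as pd

names = ['genotype', 'treatments', 'sample_number']

# Hypothetical per-sample values for one gene.
samples = pd.DataFrame(
    [[1.0, 10.0, 2.0, 20.0]],
    index=['geneA'],
    columns=pd.MultiIndex.from_tuples(
        [('6', 'c', '1'), ('6', 't', '1'), ('5', 'c', '1'), ('5', 't', '1')],
        names=names,
    ),
)
# Hypothetical mean columns, labelled 'm' in the sample_number level.
means = pd.DataFrame(
    [[2.0, 20.0, 1.0, 10.0]],
    index=['geneA'],
    columns=pd.MultiIndex.from_tuples(
        [('5', 'c', 'm'), ('5', 't', 'm'), ('6', 'c', 'm'), ('6', 't', 'm')],
        names=names,
    ),
)

# Merging on the index appends the mean columns after all sample columns.
merged = pd.merge(samples, means, left_index=True, right_index=True)

# Reindexing on one level at a time groups the columns by genotype first
# and then by treatment, keeping the existing order inside each group.
by_genotype = merged.reindex(columns=['6', '5'], level=0)
ordered = by_genotype.reindex(columns=['c', 't'], level=1)
print(ordered.columns.tolist())
```

Because the mean columns are appended last by the merge, they end up as the last column of each genotype and treatment group after the two reindex calls.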
Using the merged babyname dataset we made last time (1996 and 1998), try stacking the data.
Answer
import pandas as pd
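# Read the 1996 and 1998 files; they have no header row, so name the columns here
bb6 = pd.read_csv('/Users/klombardo/Garbage/babynames1996.txt', names=['name', 'sex', 'count'])
bb8 = pd.read_csv('/Users/klombardo/Garbage/babynames1998.txt', names=['name', 'sex', 'count'])
# Outer merge keeps names found in either year; '_merge' records where each row came from
new = pd.merge(bb6, bb8, on=['name', 'sex'], how='outer', indicator=True)
# Show the rows that appear in only one of the two years
new[new['_merge'].str.contains("right_only|left_only")]
columns = pd.Index(['name', 'sex', 'count_x', 'count_y', '_merge'])
data = new.reindex(columns=columns)
# Stack the columns into one long column called 'namestats'
ldata = data.stack().reset_index().rename(columns={0: 'namestats'})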
ldata
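For a smaller view of what stacking does, here is a sketch on toy data. The names and counts are invented, and the suffixes `_1996` and `_1998` are an assumption made for readability (the answer above keeps the default `_x` and `_y`). Setting `name` and `sex` as the index first means only the count columns are stacked.

```python
import pandas as pd

# Hypothetical miniature versions of the two baby name tables.
bb6 = pd.DataFrame({'name': ['Emily', 'Jacob'], 'sex': ['F', 'M'], 'count': [10, 20]})
bb8 = pd.DataFrame({'name': ['Emily', 'Hannah'], 'sex': ['F', 'F'], 'count': [12, 30]})

merged = pd.merge(bb6, bb8, on=['name', 'sex'], how='outer',
                  suffixes=('_1996', '_1998'))

# Move the identifying columns into the index, then stack the count columns
# into a single long Series indexed by (name, sex, column label).
wide = merged.set_index(['name', 'sex'])
long = wide.stack()

# Back to a flat table with one row per name, sex and year column.
# Depending on the pandas version, rows with missing counts may or may
# not be kept by stack().
ldata = long.rename_axis(['name', 'sex', 'variable']).reset_index(name='namestats')
print(ldata)
```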
Next, try pivoting the data.
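As a starting point, pivoting is roughly the reverse of stacking: it spreads one long column back out into several. The sketch below uses a hypothetical long table with a `year` column, which is an assumption and not the layout produced by the stacking answer.

```python
import pandas as pd

# Hypothetical long-format table: one row per name and year.
ldata = pd.DataFrame({
    'name': ['Emily', 'Emily', 'Jacob', 'Jacob'],
    'year': [1996, 1998, 1996, 1998],
    'count': [10, 12, 20, 18],
})

# One row per name, one column per year, counts as the cell values.
wide = ldata.pivot(index='name', columns='year', values='count')
print(wide)
```

`pivot` raises an error if an index and column pair occurs more than once, so with the real data `name` and `sex` may need to be used together as the index.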
Merge the two data files by two columns ('lib' and 'totalReads')
https://github.com/UCD-pbio-rclub/python-data-analysis_RieU/blob/master/Rie.21ntCounts.022119.csv
https://github.com/UCD-pbio-rclub/python-data-analysis_RieU/blob/master/Rie.miRCounts.022119.csv
Answer
import pandas as pd
import numpy as np
sRNAdata = pd.read_csv('Rie.21ntCounts.022119.csv')
miRdata = pd.read_csv('Rie.miRCounts.022119.csv')
mergeddata = pd.merge(sRNAdata, miRdata, on=['lib','totalReads'], how='left')
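The effect of `how='left'` with two key columns can be seen on toy data. In this sketch only `lib` and `totalReads` come from the problem; the other column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two count tables.
sRNA = pd.DataFrame({
    'lib': ['A', 'B', 'C'],
    'totalReads': [100, 200, 300],
    'count21nt': [5, 6, 7],
})
miR = pd.DataFrame({
    'lib': ['A', 'B'],
    'totalReads': [100, 200],
    'miRcount': [1, 2],
})

# A row matches only when both 'lib' and 'totalReads' agree. A left merge
# keeps every row of the first table and fills unmatched rows with NaN.
merged = pd.merge(sRNA, miR, on=['lib', 'totalReads'], how='left')
print(merged)
```

Library `C` has no partner in the second table, so its `miRcount` is missing after the merge.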