Chapter 8 problem set 2 - UCD-pbio-rclub/python_problems GitHub Wiki

Chapter 8 problem set 2

John

I decided to write a function that produces random bingo balls.

import random
def RandomBingoNumber():
    letter = random.choice('BINGO')
    if letter == 'B':
        number = str(random.randint(1,15))
    if letter == 'I':
        number = str(random.randint(16,30))
    if letter == 'N':
        number = str(random.randint(31,45))
    if letter == 'G':
        number = str(random.randint(46,60))
    if letter == 'O':
        number = str(random.randint(61,75))
    call = {letter:[number]}
    return call

Using this function and everything we have learned so far in this book, create a valid bingo card in the form of a data frame. The numbers on your card must be generated using the function above. There are simpler ways to do this, but we are going to do it the hard way. Output should look like this:
[Image: Bingo Card]
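
One possible approach, offered as a sketch rather than the intended answer: keep calling `RandomBingoNumber()` until every letter has five distinct numbers, then build the data frame from the collected columns. The helper name `make_bingo_card` is hypothetical. The free centre square and the rule that a column holds no repeated numbers are assumptions about what "valid" means here.

```python
import random
import pandas as pd

def RandomBingoNumber():
    # Copied from the problem statement so that this sketch runs on its own
    letter = random.choice('BINGO')
    if letter == 'B':
        number = str(random.randint(1,15))
    if letter == 'I':
        number = str(random.randint(16,30))
    if letter == 'N':
        number = str(random.randint(31,45))
    if letter == 'G':
        number = str(random.randint(46,60))
    if letter == 'O':
        number = str(random.randint(61,75))
    call = {letter:[number]}
    return call

def make_bingo_card():
    # Hypothetical helper: draw balls until each letter has five distinct numbers
    drawn = {letter: [] for letter in 'BINGO'}
    while any(len(numbers) < 5 for numbers in drawn.values()):
        call = RandomBingoNumber()
        letter = list(call)[0]      # the single key of the returned dict
        number = call[letter][0]    # the single number, stored as a string
        if len(drawn[letter]) < 5 and number not in drawn[letter]:
            drawn[letter].append(number)
    card = pd.DataFrame(drawn, columns=list('BINGO'))
    card.loc[2, 'N'] = 'FREE'       # assumption: the centre square is a free space
    return card

card = make_bingo_card()
print(card)
```

The numbers stay as strings because that is what the function returns; convert them with `int()` if numeric columns are preferred.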

Min-Yao

Use the same data as last week: import my RNA-Seq CPM data from the file 'Expression Browser_CPM_practice.xlsx'. Please use your formatted DataFrame with hierarchical indexing from last week.

We are going to use the average gene expression for further analysis.

First, please calculate the average expression level of each gene for each combination of genotype and treatment.

Answer

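import pandas as pd

# Load the CPM sheet and use the gene name as the row index
xlsx = pd.ExcelFile('HW7/Expression Browser_CPM_practice.xlsx')
RNASeq = pd.read_excel(xlsx, 'Expression Browser_CPM')
RNASeq.head()
RNASeq = RNASeq.set_index('Name')
RNASeq.head()

# Split each column name into single characters (underscores removed),
# then drop the column labelled 3 so that three levels remain
RNASeq_col_list = RNASeq.columns.tolist()
RNASeq_col_list
RNASeq_column = pd.DataFrame(RNASeq_col_list)
RNASeq_column
RNASeq_column2 = RNASeq_column[0].apply(lambda x: pd.Series(list(x.replace('_',''))))
RNASeq_column2
RNASeq_column3 = RNASeq_column2.drop(columns=[3])
RNASeq_column3

# Use the three remaining characters as a hierarchical column index
multiindex_column = pd.MultiIndex.from_frame(RNASeq_column3)
multiindex_column
RNASeq.columns = multiindex_column
RNASeq.head()
RNASeq.columns.names = ['genotype', 'treatments', 'sample_number']
RNASeq.head()

# Build a two-level row index: the gene ID plus characters 5 and 6 of the ID,
# which are taken as the chromosome
index_list = pd.DataFrame(RNASeq.index.tolist())
index_list
chromosome = index_list[0].apply(lambda x: x[5:7])
chromosome
solycID = pd.Series(RNASeq.index)
solycID
newRNASeq = RNASeq.set_index([solycID, chromosome])
newRNASeq
newRNASeq.index.names = ['SolycID','chromosome']
newRNASeq

# Average the samples within each genotype/treatment combination.
# mean(level=..., axis=1) was removed in pandas 2.0, so group the transposed
# frame instead; sort=False keeps the original column order.
newRNASeq.T.groupby(level=['genotype','treatments'], sort=False).mean().T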
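
To check the per-group averaging on something small, here is a sketch on a made-up table. The gene names, values, and two-genotype layout are invented for illustration and are not the real CPM data. It groups the transposed frame by the two outer column levels, which gives one mean column per genotype and treatment pair.

```python
import numpy as np
import pandas as pd

# Made-up table: 2 genes x (2 genotypes x 2 treatments x 2 samples)
toy_columns = pd.MultiIndex.from_product(
    [['6', '5'], ['c', 't'], ['1', '2']],
    names=['genotype', 'treatments', 'sample_number'])
toy = pd.DataFrame(np.arange(16.0).reshape(2, 8),
                   index=['geneA', 'geneB'], columns=toy_columns)

# Group the transposed frame by the two outer column levels, average the
# samples in each group, then transpose back so that genes are rows again
toy_mean = toy.T.groupby(level=['genotype', 'treatments'], sort=False).mean().T
print(toy_mean)
```

Each resulting column is the mean of the two sample columns that share a genotype and treatment.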

Second, please make a new DataFrame of these mean gene expression values with hierarchical indexing.

Answer

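# mean(level=..., axis=1) was removed in pandas 2.0; grouping the transposed
# frame gives the same per-group means, and sort=False keeps the column order
newRNASeq_mean = newRNASeq.T.groupby(level=['genotype','treatments'], sort=False).mean().T
newRNASeq_mean

# Add a third column level, 'm', marking these columns as means
newRNASeq_mean_column = pd.DataFrame(newRNASeq_mean.columns.tolist())
newRNASeq_mean_column
newRNASeq_mean_column[2] = 'm'
newRNASeq_mean_column
multiindex_mean_column = pd.MultiIndex.from_frame(newRNASeq_mean_column)
multiindex_mean_column
newRNASeq_mean.columns = multiindex_mean_column
newRNASeq_mean

# Give the levels the same names as in the formatted DataFrame
newRNASeq_mean.columns.names = ['genotype', 'treatments', 'sample_number']
newRNASeq_mean.head()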

Third, please merge these two DataFrames (add the mean gene expression values to the formatted DataFrame). This adds 8 new columns with hierarchical indexing. Please order the columns so that each mean column sits at the end of its treatment rather than at the end of the whole file.

Answer

final_RNASeq = pd.merge(newRNASeq,newRNASeq_mean,left_index=True, right_index=True)
final_RNASeq
final_RNASeq0 = final_RNASeq.reindex(columns=list('6523'), level=0)
final_RNASeq0.head()
final_RNASeq1 = final_RNASeq0.reindex(columns=list('ct'), level=1)
final_RNASeq1.head()
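
As an alternative sketch of the same step, the column order can also be written out explicitly instead of relying on `reindex` with `level`. The table below is made up (two genes, two genotypes, two treatments, two samples), and the variable names are hypothetical. It uses `pd.concat` in place of `pd.merge` and places each mean column directly after the samples of its own genotype and treatment.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the formatted DataFrame
level_names = ['genotype', 'treatments', 'sample_number']
raw = pd.DataFrame(
    np.arange(16.0).reshape(2, 8),
    index=['geneA', 'geneB'],
    columns=pd.MultiIndex.from_product([['6', '5'], ['c', 't'], ['1', '2']],
                                       names=level_names))

# Per-group means, labelled 'm' on the third column level
means = raw.T.groupby(level=['genotype', 'treatments'], sort=False).mean().T
means.columns = pd.MultiIndex.from_tuples(
    [(g, t, 'm') for g, t in means.columns], names=level_names)

# Put the two frames side by side; both share the same row index
combined = pd.concat([raw, means], axis=1)

# Spell out the column order: the samples of each group, then that group's mean
ordered = []
for g, t, _ in means.columns:
    ordered += [col for col in raw.columns if col[:2] == (g, t)]
    ordered.append((g, t, 'm'))
combined = combined.reindex(
    columns=pd.MultiIndex.from_tuples(ordered, names=level_names))
print(combined)
```

Writing the order out by hand is longer, but the resulting layout is easy to verify by reading the `ordered` list.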

Kae

Using the merged babyname dataset we made last time (1996 and 1998), try stacking the data.

Answer

import pandas as pd
bb6=pd.read_csv('/Users/klombardo/Garbage/babynames1996.txt', names=['name','sex','count'])
bb8=pd.read_csv('/Users/klombardo/Garbage/babynames1998.txt',names=['name','sex','count'])

new=pd.merge(bb6, bb8, on=['name','sex'],how='outer',indicator=True)
new[new['_merge'].str.contains("right_only|left_only")]
columns=pd.Index(['name','sex','count_x','count_y','_merge'])
data=new.reindex(columns=columns)
ldata=data.stack().reset_index().rename(columns={0:'namestats'})
ldata
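
For comparison, a small sketch of what `stack` does when the identifying columns are moved into the index first. The names and counts below are made up and are not taken from the baby name files. In the answer above, `name` and `sex` are ordinary columns, so they are stacked along with the counts. Setting them as the index keeps only the count columns in the stacked values.

```python
import pandas as pd

# Made-up wide table: one row per name and sex, one count column per year
toy_wide = pd.DataFrame({
    'name': ['Emily', 'Jacob'],
    'sex': ['F', 'M'],
    'count_x': [10, 7],   # stands in for the 1996 counts
    'count_y': [12, 9],   # stands in for the 1998 counts
}).set_index(['name', 'sex'])

# stack() moves the column labels into a new innermost index level,
# giving one row per (name, sex, count column)
toy_long = toy_wide.stack().reset_index()
toy_long.columns = ['name', 'sex', 'source', 'count']
print(toy_long)
```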

Next, try pivoting the data.
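
A minimal sketch of pivoting, assuming the data is in long format with a `year` column. That column, the names, and the counts are all made up for illustration. `pivot_table` turns the distinct values of `year` into columns, with one row per name and sex.

```python
import pandas as pd

# Made-up long table: one row per name, sex, and year
toy_long_years = pd.DataFrame({
    'name': ['Emily', 'Emily', 'Jacob', 'Jacob'],
    'sex': ['F', 'F', 'M', 'M'],
    'year': [1996, 1998, 1996, 1998],
    'count': [10, 12, 7, 9],
})

# Each distinct year becomes a column; aggfunc='sum' keeps the counts as totals
toy_pivot = toy_long_years.pivot_table(index=['name', 'sex'], columns='year',
                                       values='count', aggfunc='sum')
print(toy_pivot)
```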

Rie

Merge the two data files on two columns ('lib' and 'totalReads').

https://github.com/UCD-pbio-rclub/python-data-analysis_RieU/blob/master/Rie.21ntCounts.022119.csv

https://github.com/UCD-pbio-rclub/python-data-analysis_RieU/blob/master/Rie.miRCounts.022119.csv

Answer

import pandas as pd
import numpy as np

sRNAdata = pd.read_csv('Rie.21ntCounts.022119.csv')
miRdata = pd.read_csv('Rie.miRCounts.022119.csv')

mergeddata = pd.merge(sRNAdata, miRdata, on=['lib','totalReads'], how='left')
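
To see what `how='left'` does with two key columns, here is a sketch on made-up tables. Only `lib` and `totalReads` come from the problem; the other column names and all values are invented. Every row of the left table is kept, and rows without a match in the right table get `NaN` in the right-hand columns.

```python
import pandas as pd

# Made-up stand-ins for the two count files
left = pd.DataFrame({'lib': ['A', 'B', 'C'],
                     'totalReads': [100, 200, 300],
                     'count21nt': [5, 6, 7]})      # hypothetical column
right = pd.DataFrame({'lib': ['A', 'B'],
                      'totalReads': [100, 200],
                      'miRCount': [1, 2]})         # hypothetical column

# Rows match only when both 'lib' and 'totalReads' agree
toy_merged = pd.merge(left, right, on=['lib', 'totalReads'], how='left')
print(toy_merged)
```

Library C has no partner in `right`, so its `miRCount` is missing after the merge.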