Chapter 7 problem set 2

John

Find me beer using this API: https://api.openbrewerydb.org/breweries?

1. Create a dataframe from the API address above that contains every brewery in the database. You will need to use page and per_page as parameters. The maximum value of per_page is 50.
2. Filter this data set down to only micro breweries in states whose names begin and end with the same letter.
3. From the breweries found in part 2, find the farthest north, south, east, and west breweries. You may need to change the dtype of the columns.
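
One way to approach the paging in part 1 is sketched below. It assumes that pages are numbered from 1 and that the API returns an empty list once the page number runs past the last record; both are assumptions worth checking against the live API. The helper names `fetch_page` and `fetch_all` are made up for this sketch.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.openbrewerydb.org/breweries"

def fetch_page(page, per_page=50):
    """Request a single page and return it as a list of dicts."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    with urllib.request.urlopen(f"{BASE_URL}?{query}") as response:
        return json.load(response)

def fetch_all(get_page=fetch_page, per_page=50):
    """Collect pages until one comes back empty (assumed end-of-data signal)."""
    records = []
    page = 1
    while True:
        batch = get_page(page, per_page)
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records
```

Passing the result to `pd.DataFrame(fetch_all())` should then give one row per brewery. The `get_page` argument exists only so the loop can be tried out with a stand-in function instead of real network calls.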

Rie

Read the CSV file named "sRNAalignmentToTE.ForChap7Assignment.013119.csv" in my repository (https://github.com/UCD-pbio-rclub/python-data-analysis_RieU/blob/master/sRNAalignmentToTE.ForChap7Assignment.013119.csv) and inspect the data.

Answer

import pandas as pd
data = pd.read_csv('sRNAalignmentToTE.ForChap7Assignment.013119.csv')

Some columns contain a number followed by a percentage in parentheses. Remove the parentheses, then move the values that were inside them into separate columns.

Answer

# Split on "(" or ")" and keep the first two pieces: the count and the value that was in parentheses
data[['Aligned','AlignedRatio']] = data['Aligned'].str.split(r'\(|\)', expand = True).iloc[:,[0,1]]
data[['notAligned','notAlignedRatio']] = data['notAligned'].str.split(r'\(|\)', expand = True).iloc[:,[0,1]]

# Rearrange the order of the columns
col = ['totalReads', 'Aligned', 'AlignedRatio', 'notAligned','notAlignedRatio', 'libName']
data = data.reindex(columns=col)
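
As an alternative to splitting, the two values can be captured with `str.extract`, which also makes it easy to convert them to numbers. This is a sketch only: the rows below are invented, and the cell format "count(percent%)" is an assumption about how the real file looks.

```python
import pandas as pd

# Hypothetical rows imitating the assumed "count(percent%)" layout
data = pd.DataFrame({
    "totalReads": [1000, 2000],
    "Aligned": ["900(90.00%)", "1500(75.00%)"],
    "notAligned": ["100(10.00%)", "500(25.00%)"],
    "libName": ["libA", "libB"],
})

# Group 1 captures the count, group 2 the percentage without the % sign
pattern = r"^\s*(\d+)\s*\(([\d.]+)%\)"

for col in ["Aligned", "notAligned"]:
    parts = data[col].str.extract(pattern)
    data[col] = parts[0].astype(int)
    data[col + "Ratio"] = parts[1].astype(float)

print(data)
```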

Kae

Let's do some regex practice!

Create a random DNA sequence of length 30. Using regex, please convert the DNA sequence to RNA.

Answer

import re
import random

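def seqstring(length):
    # build a random DNA sequence of the requested length
    return ''.join(random.choice("ATCG") for _ in range(length))

DNA = seqstring(30)

# replace every T with U to turn the DNA sequence into RNA
regex = re.compile("T")
RNA = regex.sub("U", DNA)
print(RNA)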

I work for a telemarketing company and I've been given a list of phone numbers the branch needs to call today. Luckily for me, I'm only responsible for calling properly formatted (XXX)XXX-XXXX numbers with an area code of either 603 or 503. What regular expression can I use to pull these numbers from my list? Try this if you're having trouble.

Answer

^\((?:603|503)\)[0-9]{3}-[0-9]{4}$
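
To see the pattern in action, it can be run over a short list. The phone numbers below are invented for illustration.

```python
import re

PHONE = re.compile(r"^\((?:603|503)\)[0-9]{3}-[0-9]{4}$")

# Made-up numbers: only the first two have the right format and area code
numbers = [
    "(603)555-0134",
    "(503)555-0199",
    "(212)555-0101",   # wrong area code
    "603-555-0134",    # missing parentheses
    "(503) 555-0199",  # extra space
    "(603)555-01345",  # one digit too many
]

matches = [n for n in numbers if PHONE.match(n)]
print(matches)
```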

Min-Yao

1. Use the same data as last week (import my RNA-Seq CPM data from the 'Expression Browser_CPM_practice.xlsx' file). Remove the genes that show no expression across all samples, and keep the other no-expression values as "0". Check the data distribution in each sample.

Answer

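# Assumes RNASeq was loaded last week with gene IDs as the index and NaN marking "no expression"
reduced_RNASeq = RNASeq.dropna(how='all').fillna(0)
reduced_RNASeq.describe()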
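
If "no expression" is stored as 0 rather than as a missing value, the all-zero genes can be dropped with a boolean mask instead. The small table below is invented for illustration; `cpm` and the gene names are hypothetical.

```python
import pandas as pd

# Hypothetical CPM table: geneA has no expression in any sample
cpm = pd.DataFrame(
    {"sample1": [0.0, 12.5, 0.0], "sample2": [0.0, 0.0, 7.0]},
    index=["geneA", "geneB", "geneC"],
)

# Keep a gene if at least one sample has a non-zero value
expressed = cpm.loc[(cpm != 0).any(axis=1)]
print(expressed)
```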

2. We want to filter outliers. Please find the genes whose expression is higher than 15000 in any sample.

Answer

import numpy as np
reduced_RNASeq[(np.abs(reduced_RNASeq) > 15000).any(axis=1)]

3. We would like to transform outliers. Please cap any expression level that exceeds 15000 in absolute value at 15000. Then check the new data distribution in each sample.

Answer

reduced_RNASeq[np.abs(reduced_RNASeq) > 15000] = np.sign(reduced_RNASeq) * 15000
reduced_RNASeq.describe()
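
The same filtering and capping can also be written with `DataFrame.clip`, which returns a new frame instead of modifying the original in place. This is a sketch on an invented table, not the real CPM data; `cpm`, `LIMIT`, and the gene names are hypothetical.

```python
import pandas as pd

# Hypothetical CPM table with two values above the limit
cpm = pd.DataFrame(
    {"sample1": [10.0, 20000.0, 0.0], "sample2": [16000.0, 5.0, 3.0]},
    index=["geneA", "geneB", "geneC"],
)
LIMIT = 15000

# Genes with at least one value beyond the limit in absolute terms
outliers = cpm[(cpm.abs() > LIMIT).any(axis=1)]

# Cap every value to the range [-LIMIT, LIMIT]
capped = cpm.clip(lower=-LIMIT, upper=LIMIT)
print(capped.describe())
```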

Joel

Create a function to determine whether a string is a palindrome or not.

(A palindrome is a word or sentence that has the same letters backwards and forwards)

a) Replacing whitespaces is a good way to start
b) Watch out for upper/lower case differences

Answer

# use the .replace() method to remove the whitespaces
# use the .lower() method to make it case insensitive

def isPalindrome(myString):
    original = myString.replace(" ", "").lower()
    # reverse the cleaned string (named `backwards` to avoid shadowing the built-in `reversed`)
    backwards = original[::-1]
    if original == backwards:
        print("Palindrome")
    else:
        print("Not a palindrome")

Try with words/sentences like:

"Was it a car or a cat I saw"
"Ceci n'est pas un palindrome"
GCCTTCCG
CGTAATGC
Anita lava la tina
kayak
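
A possible variant returns True or False instead of printing, and uses a regular expression to drop everything except letters and digits, so punctuation such as apostrophes does not get in the way. The name `is_palindrome` is made up for this sketch, and the pattern only keeps unaccented letters a-z and digits 0-9.

```python
import re

def is_palindrome(text):
    # Lowercase first, then strip anything that is not a-z or 0-9
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return cleaned == cleaned[::-1]

examples = [
    "Was it a car or a cat I saw",
    "Ceci n'est pas un palindrome",
    "GCCTTCCG",
    "CGTAATGC",
    "Anita lava la tina",
    "kayak",
]

for example in examples:
    print(example, "->", is_palindrome(example))
```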

Chapter 7 problem set 2 - UCD-pbio-rclub/python_problems GitHub Wiki
