Chapter 7 problem set 2 - UCD-pbio-rclub/python_problems GitHub Wiki
Find me beer using this API: https://api.openbrewerydb.org/breweries?
- Create a dataframe using the API address above with every brewery in the database. You will need to use page and per_page as parameters; the per_page maximum is 50.
- Filter this data set down to only micro breweries in states whose names begin and end with the same letter.
- From the breweries found in part 2, find the farthest north, south, east, and west breweries. You may need to change the dtype of the columns.
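The three brewery steps can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes the API returns a JSON list of brewery objects with brewery_type, state, latitude and longitude fields, that latitude and longitude may arrive as strings (hence the dtype change), and that a page past the end comes back as an empty list. The helper names fetch_all_breweries, micro_same_letter and extremes are hypothetical. "Farthest east" is taken as the largest longitude, which assumes no breweries lie across the 180° meridian.

```python
import json
import urllib.request

import pandas as pd

# Base URL from the problem statement; adjust it if the service has moved (assumption).
BASE_URL = "https://api.openbrewerydb.org/breweries"


def fetch_all_breweries(per_page=50):
    """Request page 1, 2, 3, ... and stop at the first empty page."""
    rows, page = [], 1
    while True:
        url = f"{BASE_URL}?page={page}&per_page={per_page}"
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)  # assumed: a JSON list of brewery dicts
        if not batch:
            break
        rows.extend(batch)
        page += 1
    return pd.DataFrame(rows)


def micro_same_letter(df):
    """Keep micro breweries in states whose names start and end with the same letter."""
    state = df["state"].fillna("").str.lower()
    same_letter = (state.str.len() > 0) & (state.str[0] == state.str[-1])
    return df[(df["brewery_type"] == "micro") & same_letter].copy()


def extremes(df):
    """Return the farthest north, south, east and west rows."""
    df = df.copy()
    # Coordinates may arrive as strings (or be missing), so convert them first
    for col in ["latitude", "longitude"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df = df.dropna(subset=["latitude", "longitude"])
    return {
        "north": df.loc[df["latitude"].idxmax()],
        "south": df.loc[df["latitude"].idxmin()],
        "east": df.loc[df["longitude"].idxmax()],
        "west": df.loc[df["longitude"].idxmin()],
    }


# Usage (makes one network request per page):
# breweries = fetch_all_breweries()
# micro = micro_same_letter(breweries)
# for direction, row in extremes(micro).items():
#     print(direction, row["name"], row["state"])
```

Keeping the download separate from the filtering means the two later steps can be tried on a saved copy of the data without hitting the API again.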
- Read the csv data named "sRNAalignmentToTE.ForChap7Assignment.013119.csv" in my repository and inspect it. (https://github.com/UCD-pbio-rclub/python-data-analysis_RieU/blob/master/sRNAalignmentToTE.ForChap7Assignment.013119.csv)
Answer
import pandas as pd
data = pd.read_csv('sRNAalignmentToTE.ForChap7Assignment.013119.csv')
- Some columns contain a percentage inside parentheses. Remove the parentheses, then move the values that were inside them into separate columns.
Answer
# Split each value on "(" or ")": piece 0 is the count, piece 1 is the value inside the parentheses
data[['Aligned', 'AlignedRatio']] = data['Aligned'].str.split(r'\(|\)', expand=True).iloc[:, [0, 1]]
data[['notAligned', 'notAlignedRatio']] = data['notAligned'].str.split(r'\(|\)', expand=True).iloc[:, [0, 1]]
# Remove any whitespace left around the count
data['Aligned'] = data['Aligned'].str.strip()
data['notAligned'] = data['notAligned'].str.strip()
# Rearrange the order of the columns
col = ['totalReads', 'Aligned', 'AlignedRatio', 'notAligned', 'notAlignedRatio', 'libName']
data = data.reindex(columns=col)
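An alternative sketch uses str.extract with one capture group per part, which drops the parentheses, leaves no stray whitespace or empty column, and converts both parts to numbers in one pass. It assumes the cells look like "800 (80.00%)"; the rows below are made up, and split_count_percent is a hypothetical helper name.

```python
import pandas as pd


def split_count_percent(df, columns):
    """For each column holding 'count (percent%)' strings, keep the count in the
    original column and put the percentage in a new '<column>Ratio' column."""
    out = df.copy()
    # Group 1: the count; group 2: the number inside the parentheses (the % sign is optional)
    pattern = r"^\s*(\d+)\s*\(\s*([\d.]+)\s*%?\s*\)\s*$"
    for col in columns:
        parts = out[col].str.extract(pattern)
        out[col] = pd.to_numeric(parts[0])
        out[col + "Ratio"] = pd.to_numeric(parts[1])
    return out


# Hypothetical rows in the assumed format
data = pd.DataFrame({
    "totalReads": [1000, 2000],
    "Aligned": ["800 (80.00%)", "1500 (75.00%)"],
    "notAligned": ["200 (20.00%)", "500 (25.00%)"],
    "libName": ["lib1", "lib2"],
})
data = split_count_percent(data, ["Aligned", "notAligned"])
data = data[["totalReads", "Aligned", "AlignedRatio",
             "notAligned", "notAlignedRatio", "libName"]]
print(data)
```

Rows that do not match the pattern come back as NaN instead of raising, which makes malformed cells easy to spot with isna().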
Let's do some regex practice!
- Create a random DNA sequence of length 30. Using regex, please convert the DNA sequence to RNA.
Answer
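import re
import random

def seqstring(length):
    # Build a random DNA string by drawing one of A, T, C, G per position
    return ''.join(random.choice("ATCG") for _ in range(length))

DNA = seqstring(30)
# Transcription: every T in the DNA becomes U in the RNA
regex = re.compile("T")
RNA = regex.sub("U", DNA)
print(RNA)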
- I work for a telemarketing company and I've been given a list of phone numbers the branch needs to call today. Luckily for me, I'm only responsible for calling properly formatted (XXX)XXX-XXXX numbers with area codes of either 603 or 503. What regular expression can I use to pull these numbers from my list? Try this if you're having trouble.
Answer
^\((?:603|503)\)[0-9]{3}-[0-9]{4}$
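Applied in Python, the pattern above can filter a call list as sketched below. The phone numbers are made up and numbers_to_call is a hypothetical helper. The sketch uses fullmatch rather than match because with match a "$" also accepts a string that ends in a trailing newline.

```python
import re

# Same pattern as the answer: (603) or (503), three digits, a hyphen, four digits
PHONE = re.compile(r"^\((?:603|503)\)[0-9]{3}-[0-9]{4}$")


def numbers_to_call(numbers):
    """Return only the properly formatted 603/503 numbers, in their original order."""
    return [n for n in numbers if PHONE.fullmatch(n)]


# Hypothetical call list
calls = ["(603)555-0199", "(503)555-0123", "(212)555-0100",
         "603-555-0199", "(503)5550123", "(503)555-01234"]
print(numbers_to_call(calls))  # ['(603)555-0199', '(503)555-0123']
```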
1. Use the same data as last week (import my RNA-Seq CPM data from the 'Expression Browser_CPM_practice.xlsx' file). Remove the genes that are not expressed in any sample, and record the remaining missing expression values as "0". Check the data distribution in each sample.
Answer
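# Drop genes that are missing (NaN) in every sample, then fill the remaining NaNs with 0
# (assumes gene IDs are the index, so each row holds only sample values)
reduced_RNASeq = RNASeq.dropna(how='all').fillna(0)
reduced_RNASeq.describe()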
2. We want to filter outliers. Find the genes whose expression is higher than 15000 in any sample.
Answer
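import numpy as np
# Genes (rows) with at least one sample whose absolute expression exceeds 15000
reduced_RNASeq[(np.abs(reduced_RNASeq) > 15000).any(axis=1)]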
3. We would like to transform outliers. Change every expression level whose absolute value exceeds 15000 to 15000, then check the new data distribution in each sample.
Answer
# Cap values at +/-15000 while keeping each value's sign
reduced_RNASeq[np.abs(reduced_RNASeq) > 15000] = np.sign(reduced_RNASeq) * 15000
reduced_RNASeq.describe()
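The three steps can be checked end to end on a tiny made-up CPM table. The sketch assumes genes are the index, samples are the columns, and NaN means no expression; none of this is taken from the real file. DataFrame.clip is shown as a shorter alternative for step 3. For this table it gives the same values as the np.sign assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical CPM table: genes as the index, samples as columns, NaN = no expression
cpm = pd.DataFrame(
    {"s1": [np.nan, 10.0, 20000.0, 5.0],
     "s2": [np.nan, np.nan, 300.0, 16000.0]},
    index=["geneA", "geneB", "geneC", "geneD"],
)

# Step 1: drop genes missing in every sample, then fill the remaining gaps with 0
reduced = cpm.dropna(how="all").fillna(0)

# Step 2: genes with an absolute value above the threshold in any sample
outliers = reduced[(reduced.abs() > 15000).any(axis=1)]

# Step 3: cap values at +/-15000 (same effect as the np.sign assignment)
capped = reduced.clip(lower=-15000, upper=15000)

print(outliers)
print(capped.describe())
```

Here geneA is dropped because it has no values at all, geneC and geneD are flagged as outliers, and both of their large values end up at 15000.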
- Create a function to determine whether a string is a palindrome.
(A palindrome is a word or sentence that has the same letters backwards and forwards)
Hints: a) removing whitespace is a good way to start; b) watch out for upper/lower case differences.
Answer
# Use the .replace() method to remove the whitespace
# Use the .lower() method to make the check case-insensitive
def isPalindrome(myString):
    cleaned = myString.replace(" ", "").lower()
    # A palindrome is equal to its own reverse
    if cleaned == cleaned[::-1]:
        print("Palindrome")
    else:
        print("Not a palindrome")
Try with words/sentences like:
- "Was it a car or a cat I saw"
- "Ceci n'est pas un palindrome"
- GCCTTCCG
- CGTAATGC
- Anita lava la tina
- kayak
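A variant that returns True/False instead of printing, and uses a regex to drop punctuation (such as the apostrophe in "n'est") as well as spaces, is sketched below. The name is_palindrome is hypothetical. The character class keeps only ASCII letters and digits, so accented letters would be removed too.

```python
import re


def is_palindrome(text):
    """True if text reads the same backwards, ignoring case, spaces and punctuation."""
    # Lowercase first, then delete everything that is not an ASCII letter or digit
    cleaned = re.sub(r"[^0-9a-z]", "", text.lower())
    return cleaned == cleaned[::-1]


examples = ["Was it a car or a cat I saw", "Ceci n'est pas un palindrome",
            "GCCTTCCG", "CGTAATGC", "Anita lava la tina", "kayak"]
for s in examples:
    print(f"{s!r}: {is_palindrome(s)}")
```

All of the examples except "Ceci n'est pas un palindrome" come out True. Both DNA strings read the same when reversed as plain text.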