Chapter 6 problem set 1 - UCD-pbio-rclub/python_problems GitHub Wiki

Julin

I had hoped to set a relatively simple HTML extraction problem. However, I was unable to extract the tables from any of the following three sites. I am still posting them here in case someone else can figure out how to do it:

https://www.wrh.noaa.gov/mesowest/getobext.php?wfo=sto&sid=KEDU&num=72&raw=0

https://www.shastaavalanche.org/weather/weather-stations/skibowl-stats

https://mesowest.utah.edu/cgi-bin/droman/meso_base_dyn.cgi?stn=MSSKI&unit=0&timetype=LOCAL
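A minimal sketch of HTML table extraction using only the standard library's `html.parser`. It assumes the table rows are present in the HTML the server actually returns. One possible, unverified reason the three pages above resist extraction is that their tables are filled in by JavaScript after the page loads. In that case the downloaded HTML contains no rows, and any static parser, this one included, returns nothing. `TableExtractor` and `extract_tables` are hypothetical names.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


sample = "<table><tr><th>Time</th><th>Temp (F)</th></tr><tr><td>10:00</td><td> 45 </td></tr></table>"
print(extract_tables(sample))
```

To try it on a live page, pass it `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")`. The parser does not handle nested tables and assumes explicit `</td>` and `</tr>` closing tags, so treat it as a starting point.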

Min-Yao

  1. Please try scraping data from the USDA Food Database. Check the first 5 rows to see whether the format is correct.
  2. Please compute the number of food items in each Food Group, or from each Manufacturer. https://ndb.nal.usda.gov/ndb/search/list
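Once the scraped results are in a pandas DataFrame, the counting step reduces to a `value_counts` call. This sketch assumes the scraped table has columns named `Food Group` and `Manufacturer`. Those names are assumptions, so check them against the actual page. It also uses a few made-up rows in place of real scraped data. `count_by` is a hypothetical helper.

```python
import pandas as pd


def count_by(foods: pd.DataFrame, column: str) -> pd.Series:
    """Number of food records for each value in `column`, largest first.

    dropna=False keeps a count for rows where the value is missing,
    e.g. foods that list no manufacturer.
    """
    return foods[column].value_counts(dropna=False)


# Made-up rows standing in for the scraped search results.
foods = pd.DataFrame({
    "Food Group": ["Beverages", "Beverages", "Snacks"],
    "Manufacturer": ["Acme", None, "Acme"],
})
print(foods.head(5))  # question 1: eyeball the first rows
print(count_by(foods, "Food Group"))
print(count_by(foods, "Manufacturer"))
```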

John Davis

I guess I had the same idea: web scraping.

  1. Using the Wikipedia article on the Python programming language (https://en.wikipedia.org/wiki/Python_(programming_language)), print out a table describing all of Python 3's immutable built-in types.
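One possible approach is `pandas.read_html`, which parses each `<table>` on a page into a DataFrame. It needs `lxml`, or `beautifulsoup4` plus `html5lib`, to be installed. This sketch assumes the relevant table in the article has columns named `Type`, `Mutability` and `Description`. Those names are an assumption about the current article and may need adjusting. `immutable_rows` is a hypothetical helper, and the demo uses a small hand-made table rather than the live page.

```python
import pandas as pd


def immutable_rows(table: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose Mutability cell is exactly 'immutable'.

    Whole-cell comparison is used rather than a substring test,
    because 'immutable' itself contains 'mutable'.
    """
    mutability = table["Mutability"].astype(str).str.strip().str.lower()
    return table.loc[mutability == "immutable", ["Type", "Description"]]


# Live version (needs network access and an HTML parser backend):
# url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
# table = pd.read_html(url, match="Mutability")[0]
table = pd.DataFrame({
    "Type": ["int", "list", "str"],
    "Mutability": ["immutable", "mutable", "immutable"],
    "Description": ["Integer", "List", "Character string"],
})
print(immutable_rows(table).to_string(index=False))
```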

Joel Rodriguez

Download the CSV file from https://github.com/rodriguezmDNA/pythonproblems (I originally downloaded the data from Kaggle but put it on my GitHub for easier access).

  1. There is something funny about the headers. Use a pandas method to remove the redundant data.
1 (answer)

The first two rows both contain column names. Use the first one as the header (header=0) and drop the other, which becomes the first data row (index 0) even though it is technically the second row in the raw data.

import pandas as pd

sfArt = pd.read_csv('sf-sf-civic-art-collection/sf-civic-art-collection.csv', header=0)  # line 0 of the file becomes the header
sfArt = sfArt.drop(0)  # remove the first data row, which repeats the column names
sfArt.head(5)
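An alternative is to skip the duplicated header line while reading, using `read_csv`'s `skiprows` parameter. It is sketched here on a small made-up CSV rather than the Kaggle file. Because the text row never enters the DataFrame, pandas can infer numeric dtypes for the remaining rows. After `drop(0)`, by contrast, those columns keep a non-numeric dtype unless they are converted. The toy columns only mirror names used in the answers.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV: line 0 is the header, line 1 repeats it.
raw = (
    "artist,title,created_at\n"
    "artist,title,created_at\n"
    "Smith,Untitled,1990\n"
    "Lee,Harbor,1985\n"
)

# skiprows=[1] skips raw line 1 (0-indexed), i.e. the duplicated header.
clean = pd.read_csv(io.StringIO(raw), header=0, skiprows=[1])

# For comparison, the drop(0) approach leaves created_at as text.
dropped = pd.read_csv(io.StringIO(raw), header=0).drop(0)

print(clean.dtypes["created_at"], dropped.dtypes["created_at"])
```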
  2. Which artist appears most frequently in the collection, and how many of their paintings are there?
2 (answer)

sfArt.artist.value_counts().head(1)  # most frequent artist and their count
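`value_counts().head(1)` shows only one artist, even when several are tied for first place. This small sketch returns every tied artist together with the shared count. `top_counts` is a hypothetical helper, and the Series holds made-up data. On the real data it would be called as `top_counts(sfArt.artist)`.

```python
import pandas as pd


def top_counts(values: pd.Series) -> pd.Series:
    """All values sharing the highest frequency, mapped to that frequency."""
    counts = values.value_counts()
    return counts[counts == counts.max()]


artists = pd.Series(["Smith", "Lee", "Smith", "Lee", "Park"])  # made-up data
print(top_counts(artists))
```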
  3. For this artist, what are the titles and dates of his works?
3 (answer)

freqArtist = sfArt.artist.value_counts().head(1).index[0]  # name of the most frequent artist
sfArt[sfArt.artist == freqArtist][["created_at", "title"]]  # that artist's works
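Here is the same selection written with `.loc`, plus a sort so the works come out in date order. `idxmax()` on the counts returns the most frequent label directly. This is a sketch on a made-up table, and `works_by` is a hypothetical helper. The sort assumes the `created_at` values are mutually comparable, for example all numbers or all ISO-formatted dates.

```python
import pandas as pd


def works_by(art: pd.DataFrame, artist: str) -> pd.DataFrame:
    """created_at and title of one artist's works, oldest first."""
    mask = art["artist"] == artist
    return art.loc[mask, ["created_at", "title"]].sort_values("created_at")


art = pd.DataFrame({  # made-up rows
    "artist": ["Smith", "Lee", "Smith"],
    "title": ["Harbor", "Bridge", "Untitled"],
    "created_at": [2001, 1999, 1995],
})
top_artist = art["artist"].value_counts().idxmax()  # label with the highest count
print(works_by(art, top_artist))
```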

source: https://www.kaggle.com/san-francisco/sf-sf-civic-art-collection#sf-civic-art-collection.csv

Junqi Lu

  1. Scrape the UC Davis Dining Commons (DC) menu page https://housing.ucdavis.edu/dining/menus/dining-commons/tercero/ and build a dictionary that stores each meal's dishes.

  2. For each dish, scrape the first 5 image URLs from the Google Images search results for that dish's name.

  3. Use those URLs to construct a simple HTML file.

  4. Email that HTML file to yourself from Python.
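Steps 3 and 4 can be sketched with the standard library alone. `html.escape` builds the page safely, and `email.message.EmailMessage` attaches it as an HTML alternative to a plain-text body. The dish names, image URLs, addresses and SMTP host below are placeholders, not values from the menu page, and the scraping in steps 1 and 2 is not shown. `build_html` and `build_message` are hypothetical helpers.

```python
import html
from email.message import EmailMessage


def build_html(images_by_dish: dict) -> str:
    """Minimal HTML page: one heading per dish, followed by up to 5 of its images."""
    parts = ["<html><body>"]
    for dish, urls in images_by_dish.items():
        parts.append(f"<h2>{html.escape(dish)}</h2>")
        for url in urls[:5]:
            parts.append(f'<img src="{html.escape(url, quote=True)}" width="200">')
    parts.append("</body></html>")
    return "\n".join(parts)


def build_message(page: str, sender: str, recipient: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = "Today's DC menu"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("This message contains an HTML menu.")  # plain-text fallback
    msg.add_alternative(page, subtype="html")  # makes it multipart/alternative
    return msg


page = build_html({"Veggie Burger": ["https://example.com/a.jpg"]})  # placeholder data
msg = build_message(page, "me@example.com", "me@example.com")

# To send (hypothetical host and credentials):
# import smtplib
# with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
#     server.login("me@example.com", "app-password")
#     server.send_message(msg)
```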
