Chapter 6 problem set 1 - UCD-pbio-rclub/python_problems GitHub Wiki
I had hoped to have a relatively simple HTML extraction problem, but I was unable to extract tables from any of the following three sites. I am posting them here anyway; maybe someone else can figure out how to do it:
https://www.wrh.noaa.gov/mesowest/getobext.php?wfo=sto&sid=KEDU&num=72&raw=0
https://www.shastaavalanche.org/weather/weather-stations/skibowl-stats
https://mesowest.utah.edu/cgi-bin/droman/meso_base_dyn.cgi?stn=MSSKI&unit=0&timetype=LOCAL
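One likely reason these pages resist extraction is that their tables are rendered client-side with JavaScript, so the raw HTML that `pandas.read_html` downloads contains no `<table>` element at all. A minimal sketch of the approach on a static stand-in table (assuming `pandas` with an HTML parser such as `lxml` is installed; the column names here are made up):

```python
from io import StringIO
import pandas as pd

# A static table like the one the weather pages *would* serve
html = """
<table>
  <tr><th>Time</th><th>Temp (F)</th><th>Wind (mph)</th></tr>
  <tr><td>10:00</td><td>54</td><td>7</td></tr>
  <tr><td>11:00</td><td>58</td><td>9</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))  # returns a list of DataFrames
obs = tables[0]
print(obs.shape)  # two observations, three columns
```

If a page returns no tables, check the browser's network tab for a separate JSON/XHR request carrying the data, and fetch that URL directly instead.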
- Please try web scraping the USDA Food Database. Check the first 5 rows to see whether the format is correct.
- Please compute the number of food entries from each Food Group or Manufacturer. https://ndb.nal.usda.gov/ndb/search/list
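Once the search results are scraped into a DataFrame, both checks reduce to `head` and `value_counts`. A sketch on hypothetical rows (the column names and entries below are invented to mimic the NDB listing, not taken from the site):

```python
import pandas as pd

# Hypothetical rows in the rough shape of the NDB search results:
# (NDB number, description, food group, manufacturer)
rows = [
    ("45001524", "PASTA SAUCE", "Branded Food Products Database", "Conagra"),
    ("45001525", "TOMATO SOUP", "Branded Food Products Database", "Campbell"),
    ("01001", "Butter, salted", "Dairy and Egg Products", ""),
]
foods = pd.DataFrame(rows, columns=["ndb_no", "description", "food_group", "manufacturer"])

print(foods.head(5))                         # check the format of the first rows
counts = foods["food_group"].value_counts()  # number of foods per Food Group
print(counts)
```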
I guess I had the same idea, web scraping.
- Using the wikipedia article on the python programming language (https://en.wikipedia.org/wiki/Python_(programming_language)) print out a table describing all of Python 3's immutable built-in types
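Against the live article you would call `pd.read_html` on the URL, pick the summary-of-types table out of the returned list, and filter on its mutability column. A sketch on a stand-in table (the real one has more columns and rows; assumes `pandas` with `lxml`):

```python
from io import StringIO
import pandas as pd

# Stand-in for the article's table of built-in types (abbreviated)
html = """
<table>
  <tr><th>Type</th><th>Mutability</th></tr>
  <tr><td>int</td><td>immutable</td></tr>
  <tr><td>list</td><td>mutable</td></tr>
  <tr><td>tuple</td><td>immutable</td></tr>
</table>
"""

types = pd.read_html(StringIO(html))[0]
immutable = types[types["Mutability"] == "immutable"]
print(immutable["Type"].tolist())
```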
Download the CSV file from https://github.com/rodriguezmDNA/pythonproblems (I downloaded the data originally from Kaggle but put it on my GitHub for easier access).
- There is something funny about the headers. Use a pandas method to remove redundant data
1 (answer)
The first two rows both contain column names: use the first one (index 0) as the header, then drop the duplicate, which pandas labels row index 0 once the header is consumed (technically it is the second row of the raw file).
import pandas as pd
sfArt = pd.read_csv('sf-sf-civic-art-collection/sf-civic-art-collection.csv', header=0)
sfArt = sfArt.drop(0)  # remove the first data row, which duplicates the column names
sfArt.head(5)
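The same trick can be checked on a toy CSV (the rows below are invented): `header=0` consumes the first line as column names, so the duplicate names become the data row labeled 0, and `drop(0)` removes exactly that row.

```python
from io import StringIO
import pandas as pd

# Toy CSV reproducing the problem: the column names appear twice
raw = "artist,title\nartist,title\nDoe,Mural A\nDoe,Mural B\n"

df = pd.read_csv(StringIO(raw), header=0)  # row 0 still holds the duplicate names
df = df.drop(0)                            # drop that duplicate row by its index label
print(len(df))                             # only the real records remain
```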
- Which artist is found most frequently in the collection, and how many of their paintings are there?
2 (answer)
sfArt.artist.value_counts().head(1)
- For this artist, what are the names and dates of their works?
3 (answer)
freqArtist = sfArt.artist.value_counts().head(1).index[0]
sfArt[sfArt.artist == freqArtist][["created_at","title"]]
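The two answers above chain together: `value_counts` sorts artists by frequency, `.index[0]` pulls out the top name, and a boolean mask selects that artist's rows. A self-contained check on a toy stand-in for the table (artist names and dates invented):

```python
import pandas as pd

# Toy stand-in for the civic-art collection
sfArt = pd.DataFrame({
    "artist": ["Doe", "Doe", "Roe"],
    "title": ["Mural A", "Mural B", "Statue"],
    "created_at": ["1990", "1992", "2001"],
})

top = sfArt.artist.value_counts().head(1)  # Series: artist -> count
freqArtist = top.index[0]                  # name of the most frequent artist
works = sfArt[sfArt.artist == freqArtist][["created_at", "title"]]
print(freqArtist, len(works))
```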
source: https://www.kaggle.com/san-francisco/sf-sf-civic-art-collection#sf-civic-art-collection.csv
- Scrape the UC Davis DC menu page, https://housing.ucdavis.edu/dining/menus/dining-commons/tercero/, and construct a dictionary storing each meal's dishes.
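The meal-to-dishes dictionary can be built with the standard library's `html.parser`, no third-party packages needed. The markup below is a guess at the page's shape (each meal as a heading, dishes as list items); the real page's tags and classes will differ, so only the parsing pattern carries over:

```python
from html.parser import HTMLParser

# Hypothetical markup shape: each meal is an <h3>, its dishes are <li> items
sample = """
<h3>Breakfast</h3><ul><li>Pancakes</li><li>Eggs</li></ul>
<h3>Lunch</h3><ul><li>Tacos</li></ul>
"""

class MenuParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.menu = {}      # meal name -> list of dishes
        self._meal = None   # meal currently being filled
        self._tag = None    # tag whose text we are inside
    def handle_starttag(self, tag, attrs):
        self._tag = tag
    def handle_endtag(self, tag):
        self._tag = None
    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "h3":
            self._meal = text
            self.menu[text] = []
        elif self._tag == "li" and self._meal:
            self.menu[self._meal].append(text)

p = MenuParser()
p.feed(sample)
print(p.menu)  # {'Breakfast': ['Pancakes', 'Eggs'], 'Lunch': ['Tacos']}
```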
- For each dish, scrape the first 5 image URLs from the Google image search results for that dish's name.
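Fair warning: Google frequently blocks or obfuscates scripted image searches, so results vary. Whatever HTML you do get back, collecting the first five `<img src=...>` values is the easy part; the sketch below parses a stand-in page (the fetch shown in comments is one possible approach, not tested against the live site):

```python
from html.parser import HTMLParser

class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""
    def __init__(self):
        super().__init__()
        self.urls = []
    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)

# In practice you would fetch the search-results HTML first, e.g.:
#   import urllib.request, urllib.parse
#   q = urllib.parse.quote("pancakes")
#   html = urllib.request.urlopen(
#       f"https://www.google.com/search?q={q}&tbm=isch").read().decode()
# Google often rejects such requests, so this parses a stand-in page instead:
html = ('<img src="a.jpg"><img src="b.jpg"><img src="c.jpg">'
        '<img src="d.jpg"><img src="e.jpg"><img src="f.jpg">')

c = ImgCollector()
c.feed(html)
first_five = c.urls[:5]
print(first_five)
```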
- Use those URLs to construct a simple HTML file.
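Building the page is plain string work: wrap each URL in an `<img>` tag and write the result to disk. A minimal sketch (URLs and filename are placeholders):

```python
# Turn a list of image URLs (placeholders here) into a minimal HTML page
urls = ["a.jpg", "b.jpg"]
body = "\n".join(f'<img src="{u}" alt="dish image">' for u in urls)
page = f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"

with open("menu_images.html", "w") as f:
    f.write(page)
```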
- Email that HTML file from Python to yourself.
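The standard library covers this: build an `email.message.EmailMessage` with the HTML as an alternative body, then hand it to `smtplib`. The sketch below constructs the message but leaves the actual send commented out, since it needs real credentials (addresses and the Gmail server shown are placeholders; Gmail in particular requires an app password):

```python
from email.message import EmailMessage
import smtplib  # only needed for the actual send

page = "<html><body><img src='a.jpg'></body></html>"  # the file built above

msg = EmailMessage()
msg["Subject"] = "DC menu images"
msg["From"] = "me@example.com"    # placeholder addresses
msg["To"] = "me@example.com"
msg.set_content("Open in an HTML-capable mail client.")  # plain-text fallback
msg.add_alternative(page, subtype="html")                # the HTML body

# To actually send (hypothetical provider/credentials):
# with smtplib.SMTP_SSL("smtp.gmail.com", 465) as s:
#     s.login("me@example.com", "app-password")
#     s.send_message(msg)

print(msg.get_content_type())  # multipart/alternative
```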