Bryce Whitney's Work Report - spatial-data-discovery/sdd-2021 GitHub Wiki

11/04/2021 - Week 10

Summary of Work

  • Learn NetCDF
  • Sandbox 6
  • Discussion 10

Overview of Tasks

  • Learn NetCDF: Learned about the NetCDF file format and how it compares with the HDF5 file format.
  • Sandbox 6: Completed Sandbox 6, where I transferred the EVI data into a NetCDF file.
  • Discussion 10: Completed Discussion 10 by converting my Sandbox 3 ASCII raster data into a NetCDF file, and then reviewed Professor Davis's file.

Next Steps

  • Project Idea
  • Quiz 9
  • Quiz 10

Question of the Week

What dimensions?: Dimensions are critical to NetCDF files because they help users understand what the data represents. One of the most useful aspects of the NetCDF file format is that a dimension can be unlimited (only one in the classic format). This allows time-series data to be stacked along that dimension indefinitely, so someone can easily access all of the data in one file.

10/28/2021 - Week 9

Summary of Work

  • Sandbox 5
  • Learn HDF5
  • Discussion 9

Overview of Tasks

  • Sandbox 5: Attempted to complete Sandbox 5. I had success reading and outputting the data in the ASCII raster file, but had more trouble producing the .png file in QGIS. I struggled for a while to switch the GeoCRS of the raster data, and am not 100% confident I did it properly.
  • Learn HDF5: I spent a lot of time reviewing the PowerPoint about the HDF5 file format so I could complete Sandbox 5.
  • Discussion 9: Completed Discussion 9, talking about the benefits and weaknesses of the HDF5 format and how it is useful in the deep learning field.

Next Steps

  • Quiz 9
  • Discussion 10
  • Quiz 10

Question of the Week

What attributes?: I think it is better to have too many attributes than too few. You would rather someone have unused metadata than have them constantly email you with questions about the dataset so they can analyze it properly. Personally, I think it would be good practice to put information about who created the dataset in the root, and then put more data-specific attributes as close as possible to the data they describe.

10/21/2021 - Week 8

Summary of Work

Overview of Tasks

  • de La Beaujardiere (2019) Reading: Completed the reading about The Four V's of Data Science.
  • Quiz 6: Attempted Quiz 6 a bunch of times until I got a perfect score, as I kept missing one or two questions.
  • Quiz 7: Completed Quiz 7 for a perfect score.
  • Quiz 8: Completed Quiz 8 for a perfect score.

Next Steps

  • Discussion 9
  • Quiz 9
  • Brainstorm Final Project Ideas

Question of the Week

What's the Challenge?: When writing algorithms in computer science, there is always a trade-off between how much memory is used and how fast the algorithm runs. If you want something to run fast, you usually need to use a lot of memory, and if you don't want to use a lot of memory, you usually need to sacrifice speed. This seems similar to the Five V's of data science. Let's take variability caused by surges in data arrival as an example. Cloud computing makes it so companies only have to pay for resources they use, and not for idle times. But there are sacrifices that need to be made to obtain this. For example, more data security concerns pop up when you are running it through other vendors' servers instead of your own. It doesn't seem possible to get the ideal for all five V's without making significant sacrifices elsewhere.
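
A small illustration of the memory-versus-speed trade-off mentioned above, using memoization. This is only a sketch: the Fibonacci functions are a stand-in example chosen for brevity and are not taken from the course material.

```python
from functools import lru_cache


def fib_slow(n):
    """Keep no cache: very little extra memory, but exponential running time."""
    if n < 2:
        return n
    return fib_slow(n - 1) + fib_slow(n - 2)


@lru_cache(maxsize=None)
def fib_fast(n):
    """Cache every result: linear running time, but the cache grows with n."""
    if n < 2:
        return n
    return fib_fast(n - 1) + fib_fast(n - 2)


if __name__ == "__main__":
    print(fib_slow(20), fib_fast(20))
    # cache_info() reports how many results are being held in memory.
    print(fib_fast.cache_info())
```

Both functions return the same value; the second one simply pays for its speed by holding one cached result per distinct argument.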

10/14/2021 - Week 7

Summary of Work

Overview of Tasks

  • Sandbox 4: Completed Sandbox 4 which involved creating a script to output whether ASCII raster data files were valid or not. I relied on lots of try/except blocks in my code.
  • Sarafanov et al. (2020) Reading: Completed this reading to learn more about some gap-filling techniques. It served as a starting point and inspiration for how we would go about completing the Sparse Data Challenge.
  • Discussion 7: Participated in Discussion 7 by describing our process and solution to the Sparse Data Challenge.
  • Sparse Data Challenge: Worked with Luke Denoncourt to tackle the Sparse Data Challenge. This involved reading in three color bands of ASCII raster files, filling in the missing data with a nearest-neighbor approach, and outputting the resulting ASCII raster files and the image they create.
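
The kind of try/except validation described for Sandbox 4 might look roughly like the sketch below. It is not the submitted script: the function name is hypothetical, it works on text rather than a file path, and it assumes the common header layout (ncols, nrows, xllcorner, yllcorner, cellsize, plus an optional NODATA_value line) without handling the xllcenter/yllcenter variant.

```python
REQUIRED_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")


def is_valid_ascii_raster(text):
    """Return True if `text` parses as a well-formed ASCII raster, else False."""
    # Tokenize every non-blank line once up front.
    lines = [line.split() for line in text.splitlines() if line.strip()]
    try:
        # Header: five "key value" lines; unpacking fails on any other shape.
        header = {key.lower(): value for key, value in lines[:5]}
        if any(key not in header for key in REQUIRED_KEYS):
            return False
        ncols, nrows = int(header["ncols"]), int(header["nrows"])
        cellsize = float(header["cellsize"])
        float(header["xllcorner"])
        float(header["yllcorner"])
        if ncols < 1 or nrows < 1 or cellsize <= 0:
            return False

        # Optional sixth header line.
        first_data = 5
        if len(lines) > 5 and lines[5][0].lower() == "nodata_value":
            float(lines[5][1])
            first_data = 6

        # The body must have exactly nrows rows of ncols numeric cells.
        rows = lines[first_data:]
        if len(rows) != nrows:
            return False
        for row in rows:
            if len(row) != ncols:
                return False
            for cell in row:
                float(cell)  # raises ValueError on non-numeric text
    except (ValueError, IndexError):
        return False
    return True
```

Catching only ValueError and IndexError keeps the check from hiding unrelated bugs, which a bare except would do.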

Next Steps

Question of the Week

What's the process?: For the Sparse Data Challenge, we wanted to have a plan of attack before jumping into the coding. We started by researching different gap-filling techniques and evaluating two things: 1) how effective each method seemed to be, and 2) how easy it would be to implement in Python. Unfortunately, a lot of the methods we looked at were based in R, making it tough to use them in Python out of the box. For time reasons, we ultimately decided on a nearest-neighbor approach. Then we wrote some basic pseudocode to lay out what we wanted the code to accomplish, and in what order. This allowed us to be much more efficient when implementing our code because we had clearly defined what tasks we wanted each function to complete, and what tasks it didn't need to worry about.
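
A minimal sketch of nearest-neighbor gap filling on a single band, not the code submitted for the challenge. The function name, the -9999 nodata value, and the brute-force search are assumptions made for illustration; a real raster would likely need a faster search.

```python
NODATA = -9999  # assumed placeholder for missing cells


def fill_gaps(grid, nodata=NODATA):
    """Return a copy of `grid` with each nodata cell set to its nearest valid value."""
    # Collect (row, col, value) for every cell that already holds data.
    valid = [
        (r, c, value)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value != nodata
    ]
    filled = [row[:] for row in grid]
    if not valid:
        return filled  # nothing to copy from

    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == nodata:
                # Search the original valid cells only, by squared Euclidean
                # distance; ties go to the first cell in row-major order.
                nearest = min(
                    valid, key=lambda v: (v[0] - r) ** 2 + (v[1] - c) ** 2
                )
                filled[r][c] = nearest[2]
    return filled
```

For a three-band image, the same function would be called once per band. Filling from the original valid cells only, rather than from cells filled earlier in the loop, keeps the result independent of the order in which gaps are visited.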

10/06/2021 - Week 6

Summary of Work

Overview of Tasks

  • Quiz 5: Completed Quiz 5 for a perfect score.
  • Discussion 5: Created a flowchart for the typical day of a William & Mary student with Caroline Wall and posted it to the discussion board
  • Discussion 6: Visualized my ASCII raster data file in QGIS after scaling it down so it didn't cover the entire map
  • Intro to GIS Reading: Completed the Intro to GIS reading, with emphasis on the QGIS and Python sections

Next Steps

  • Complete Sandbox 4
  • Quiz 6
  • Sparse Data Challenge

Question of the Week

How does it look?: In reference to my ASCII raster file, I scaled it down so it covers only the southwest quadrant of the map instead of the whole map. It is a gradient of increasing values from the top-left cell to the bottom-right. This leads to darker colors in the top-left, which gradually shift to lighter colors as you move to the bottom-right.
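
A raster like the one described could be generated with a sketch along these lines. The actual cell values and header of my file are not reproduced here: the row-plus-column values, the 15-degree cell size, and the function name are assumptions chosen so that a 6 by 12 grid covers longitudes -180 to 0 and latitudes -90 to 0.

```python
def gradient_raster(nrows=6, ncols=12, xll=-180.0, yll=-90.0, cellsize=15.0):
    """Build ASCII raster text whose values increase from top-left to bottom-right."""
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        "NODATA_value -9999",
    ]
    # The first data row is the northernmost one, so r + c is smallest
    # in the top-left cell and largest in the bottom-right cell.
    rows = [" ".join(str(r + c) for c in range(ncols)) for r in range(nrows)]
    return "\n".join(header + rows) + "\n"


if __name__ == "__main__":
    print(gradient_raster(), end="")
```

With the defaults, the grid is 12 × 15 = 180 degrees wide and 6 × 15 = 90 degrees tall, which is exactly one quadrant of a whole-earth map.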

09/29/2021 - Week 5

Summary of Work

Overview of Tasks

  • Sandbox 2: Completed Sandbox 2 by creating sandbox/sb2-bswhitneyWM.py, which takes in JPEG images, extracts their Exif tags to create a GeoJSON containing the location where each image was taken, and then marks those locations on a map.
  • Quiz 4: Attempted Quiz 4 until I got a perfect score.
  • ASCII Raster Format Reading: Read about the ASCII Raster data format, focusing on the specific formatting requirements so I could create my own for Sandbox 3.
  • Sandbox 3: Pushed data/bswhitneyWM.txt as my custom ASCII Raster file, which contains 6 rows and 12 columns spanning the entire earth.
  • Koops & Galič Reading: Completed the reading which discussed the difference between spaces and places, along with some of the different philosophical approaches people have taken in the past.
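
The Exif-to-GeoJSON step from Sandbox 2 can be sketched as follows. This is not sb2-bswhitneyWM.py itself: it assumes the GPS tags have already been read from each JPEG into a dictionary (reading them needs an image library, which is left out), and the function names and the "name" property are hypothetical.

```python
import json


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus an N/S/E/W ref to decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    return -value if ref in ("S", "W") else value


def photos_to_geojson(photos):
    """Build a GeoJSON FeatureCollection with one Point per photo."""
    features = []
    for photo in photos:
        lat = dms_to_decimal(photo["GPSLatitude"], photo["GPSLatitudeRef"])
        lon = dms_to_decimal(photo["GPSLongitude"], photo["GPSLongitudeRef"])
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": photo["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Made-up sample tags, not taken from a real image.
    sample = [{
        "name": "example.jpg",
        "GPSLatitude": (37, 16, 12.0), "GPSLatitudeRef": "N",
        "GPSLongitude": (76, 42, 36.0), "GPSLongitudeRef": "W",
    }]
    print(json.dumps(photos_to_geojson(sample), indent=2))
```

The longitude-first ordering is the easiest thing to get wrong, since Exif and most people list latitude first.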

Next Steps

Question of the Week

What spaces/places?: In simple terms, I believe places are just special, meaningful spaces. All places are spaces, but not all spaces are places, in the same way that a square is a rectangle but a rectangle is not necessarily a square. Individual people will have different sets of "places", as individuals have different locations that are meaningful to them. For example, while the White House is likely a place for everyone in the United States over the age of 6, the same cannot be said about my house. My house is meaningful to me, so it is a place, but for someone across the country who knows nothing about me, it is simply a space on earth.

09/22/2021 - Week 4

Summary of Work

  • Discussion 3 and Podcast
  • Quiz 3
  • Schwartz Reading
  • Discussion 4

Overview of Tasks

  • Discussion 3 and Podcast: I created my podcast about my utility script randomPassword.py and added it to my "About" page. I then did part B of the discussion by reviewing the dad_joke_emailer.py script created by Luke. I got it running after a minor syntax edit to the code.
  • Quiz 3: I attempted Quiz 3 until I got a perfect score. I had to review reference links in Markdown before this could happen.
  • Schwartz Reading: I completed the reading regarding the ethics of Google's (supposed) monopoly on the map industry.
  • Discussion 4: I completed Discussion 4 by noting my views and definitions before and after completing the Google Earth activity with Paris in class on 09/20/2021.

Next Steps

Question of the Week

What is Spatial Data?: I believe spatial data is anything that refers to a point or area within a coordinate system. This is most commonly thought of as a map where longitude and latitude represent the coordinates. In sports, spatial data could describe where on the field different events happen, such as data that describes where on a football field every interception was thrown throughout the season. The only constraint I would add is that the coordinates can't have more than three dimensions. This is because humans can't comprehend more than three spatial dimensions, so for all practical purposes, this "spatial data" would likely be useless.

09/15/2021 - Week 3

Summary of Work

  • Discussion 2
  • The "About" Page
  • Utility Script

Overview of Tasks

  • Discussion 2: I made a post on GitHub about my thoughts on sustainable authorship, based on the Tenen & Wythoff and Lowndes readings.
  • The "About" Page: I created my "About" page, which included a picture of the W&M Ultimate Team, my bio, and a list of my hobbies. I went back and added my script and the accompanying code later.
  • Utility Script: I created and published my utility script randomPassword.py. It generates a random password of a given length, with the option to set a random seed for reproducibility purposes. I then updated scripts/README.md to reflect the script I added.
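
The behavior described for randomPassword.py (a given length plus an optional seed) can be sketched as below. This is an illustration rather than the published script: the function name and character set are assumptions.

```python
import random
import string


def random_password(length, seed=None):
    """Return a random password of `length` characters; a seed makes it repeatable."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # A private generator leaves Python's global random state untouched.
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(random_password(16))           # different on each run
    print(random_password(16, seed=42))  # same output for the same seed
```

The random module is fine for demonstrating reproducibility, but it is not designed for security. For passwords that will actually be used, Python's secrets module is the better choice, at the cost of losing the seed option.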

Next Steps

  • Utility Script Podcast
  • Discussion 3
  • Quiz 3
  • Consider Topics for the Project

Question of the Week

What is Utility?: I believe utility refers to something that is useful or beneficial to one's life. Something that provides utility will make one's life easier, oftentimes helping them accomplish tasks in a more efficient, less effort-intensive manner. An example of this could be college notebooks. These give students a place to write notes without having to keep track of all the different pages and organize them. This is already done for them, allowing students to focus more on the subject, and less on making sure they don't lose some of their notes.

09/08/2021 - Weeks 1 & 2

Summary of Work

Overview of Tasks

  • Goldsberry Reading: This reading discussed the importance of spatial thinking and how it has long been missing from the education system. There is a lot to be learned from the pre-computing era, and without this knowledge scientists can lack the crucial spatial thinking and spatial reasoning skills needed to accompany the new data visualization techniques provided by computing.
  • Quiz 1: After completing the Goldsberry reading I attempted this quiz twice until I got a perfect score.
  • Sandbox 1: I created a script to add two numbers that are entered on the command line using Python's sys library.
  • Tenen & Wythoff and Lowndes Readings: These readings discussed the importance of sustainable authorship, with the Lowndes reading demonstrating a real-life application. They stressed the importance of using simple formats, such as plain text, to communicate findings with others and to increase the ease of access and sustainability of any documents people share with the world. There has always been a struggle to communicate scientific findings to the public, so using simple, platform-independent formats that don't require special software to read is a great step toward making this process easier and more accessible for everyone.
  • Quiz 2: After completing the two readings for this week, I attempted the quiz twice until I got a perfect score.
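
For reference, adding two command-line numbers with sys, as in Sandbox 1, can be sketched like this. It is not the original sandbox script; the function name and the usage message are made up for the example.

```python
import sys


def add_args(args):
    """Add two numbers given as strings, for example the values after the script name."""
    if len(args) != 2:
        raise ValueError("expected exactly two numbers")
    # float() raises ValueError if either argument is not numeric.
    return float(args[0]) + float(args[1])


if __name__ == "__main__":
    # sys.argv[0] is the script name, so the numbers start at index 1.
    if len(sys.argv) == 3:
        print(add_args(sys.argv[1:]))
    else:
        print("usage: provide exactly two numbers", file=sys.stderr)
```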

Next Steps

  • Discussion 2 post
  • The "About" Page
  • Utility script

Question(s) of the Week:

  1. What is spatial?: In the first discussion I mentioned that I have an interest in, and am currently involved in, Ultimate Frisbee. While analysis would usually involve crunching numbers, I do believe there could be some spatial analysis of the field. It could be especially useful to analyze where on the field the most scores and turnovers occur, as this could help teams develop a "smarter" offensive plan (according to the data). I think it is very important to expand our view of what counts as "spatial data", as it can be more than data about the earth's surface.
  2. Getting Started: Everything has been going great so far, and I look forward to the rest of the semester!