Bryce Whitney's Work Report - spatial-data-discovery/sdd-2021 GitHub Wiki

11/04/2021 - Week 10

Summary of Work

Learn NetCDF
Sandbox 6
Discussion 10

Overview of Tasks

Learn NetCDF: Learned about the NetCDF file format and how it compares with the HDF5 file format.
Sandbox 6: Completed Sandbox 6, where I transferred the EVI data into a NetCDF file.
Discussion 10: Completed Discussion 10 by converting my Sandbox 3 ASCII raster data into a NetCDF file, and then reviewed Professor Davis's file.

Next Steps

Project Idea
Quiz 9
Quiz 10

Question of the Week

What dimensions?: Dimensions are critical to NetCDF files because they help users understand what the data represents. One of the most useful aspects of the NetCDF file format is that one dimension can be unlimited in length. This allows time-series data to be appended indefinitely, so someone can easily access all of the data in one file.

10/28/2021 - Week 9

Summary of Work

Sandbox 5
Learn HDF5
Discussion 9

Overview of Tasks

Sandbox 5: Attempted to complete Sandbox 5. I had success reading and outputting the data in the ASCII raster file, but had more trouble producing the .png file in QGIS. I struggled for a while to switch the GeoCRS of the raster data, and am not 100% confident I did it properly.
Learn HDF5: I spent a lot of time reviewing the PowerPoint about the HDF5 file format to be able to complete Sandbox 5.
Discussion 9: Completed Discussion 9, talking about the benefits and weaknesses of the HDF5 format, and how it is useful in the deep learning field.

Next Steps

Quiz 9
Discussion 10
Quiz 10

Question of the Week

What attributes?: I think it is better to have too many attributes than too few. You would rather give someone data they never use than have them constantly emailing you with questions about the dataset so they can analyze it properly. Personally, I think it would be good practice to put information about who created the dataset in the root, and then place more data-specific attributes as close as possible to the data they describe.

10/21/2021 - Week 8

Summary of Work

de La Beaujardiere (2019) Reading
Quiz 6
Quiz 7
Quiz 8

Overview of Tasks

de La Beaujardiere (2019) Reading: Completed the reading about The Four V's of Data Science
Quiz 6: Attempted Quiz 6 a bunch of times until I got a perfect score, as I would always miss one or two questions
Quiz 7: Completed Quiz 7 for a perfect score
Quiz 8: Completed Quiz 8 for a perfect score

Next Steps

Discussion 9
Quiz 9
Brainstorm Final Project Ideas

Question of the Week

What's the Challenge?: When writing algorithms in computer science, there is always a trade-off between how much memory is used and how fast the algorithm runs. If you want something to run fast, you usually need to use a lot of memory, and if you don't want to use a lot of memory, you usually need to sacrifice speed. This seems similar to the Five V's of data science. Let's take variability caused by surges in data arrival as an example. Cloud computing makes it so companies only have to pay for resources they use, and not for idle times. But there are sacrifices that need to be made to obtain this. For example, more data security concerns pop up when you are running it through other vendors' servers instead of your own. It doesn't seem possible to get the ideal for all five V's without making significant sacrifices elsewhere.

10/14/2021 - Week 7

Summary of Work

Sandbox 4
Sarafanov et al. (2020) Reading
Discussion 7
Sparse Data Challenge

Overview of Tasks

Sandbox 4: Completed Sandbox 4 which involved creating a script to output whether ASCII raster data files were valid or not. I relied on lots of try/except blocks in my code.
Sarafanov et al. (2020) Reading: Completed this reading to learn more about some gap-filling techniques. It served as a starting point and inspiration for how we would go about completing the Sparse Data Challenge.
Discussion 7: Participated in Disc 7 by describing our process and solution to the Sparse Data Challenge.
Sparse Data Challenge: Worked with Luke Denoncourt to tackle the Sparse Data Challenge. This involved reading in three color bands of ASCII raster files, filling in the missing data with a nearest-neighbor approach, and outputting the resulting ASCII raster files and image they create.
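
The try/except approach mentioned for Sandbox 4 could look roughly like the sketch below. This is not the submitted script; it is a minimal illustration that assumes the corner-style header (ncols, nrows, xllcorner, yllcorner, cellsize) with an optional NODATA_value line, and the function name is_valid_ascii_raster is hypothetical.

```python
REQUIRED_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")


def is_valid_ascii_raster(text):
    """Return True if text looks like a well-formed ASCII raster."""
    lines = [line.split() for line in text.strip().splitlines()]
    header = {}
    try:
        # The first five lines are expected to be "key value" pairs.
        for key, value in lines[:5]:
            header[key.lower()] = float(value)
        if set(header) != set(REQUIRED_KEYS):
            return False
        ncols = int(header["ncols"])
        nrows = int(header["nrows"])
        body = lines[5:]
        # An optional NODATA_value line may follow the required keys.
        if body and body[0][0].lower() == "nodata_value":
            float(body[0][1])
            body = body[1:]
        # Every remaining token must parse as a number.
        rows = [[float(token) for token in row] for row in body]
    except (ValueError, KeyError, IndexError):
        # Any parsing failure means the file is not valid.
        return False
    return (
        len(rows) == nrows
        and all(len(row) == ncols for row in rows)
        and header["cellsize"] > 0
    )
```

Catching the exceptions in one place keeps the function's answer to a plain True or False, which is all a validity report needs.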

Next Steps

Quiz 6
Quiz 7
de La Beaujardiere (2019) Reading
Discussion 8

Question of the Week

What's the process?: For the Sparse Data Challenge, we wanted to have a plan of attack before jumping into the coding. We started by researching different gap-filling techniques and evaluating two things: 1) how effective a method seemed to be, and 2) how easy it would be to implement in Python. Unfortunately, a lot of the methods we looked at were based in R, making it tough to use them in Python out of the box. For time reasons, we ultimately decided on a nearest-neighbor approach. Then we wrote some basic pseudocode to lay out what we wanted the code to accomplish, and in what order. This allowed us to be much more efficient when implementing our code because we had clearly defined which tasks each function should complete, and which tasks it didn't need to worry about.
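
A nearest-neighbor fill of the kind described above can be sketched in a few lines. This is an illustration, not the code we submitted: it assumes the band is already loaded as a list of rows, that missing cells are marked with -9999, and that a brute-force search is acceptable. The name fill_nearest is hypothetical.

```python
def fill_nearest(grid, nodata=-9999):
    """Return a copy of grid with nodata cells replaced by the nearest valid value."""
    # Collect every valid cell once so each gap can be compared against them.
    valid = [
        (r, c, value)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value != nodata
    ]
    if not valid:
        raise ValueError("grid has no valid cells to copy from")
    filled = [list(row) for row in grid]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == nodata:
                # Squared distance is enough for ranking neighbors.
                _, _, nearest = min(
                    valid,
                    key=lambda cell: (cell[0] - r) ** 2 + (cell[1] - c) ** 2,
                )
                filled[r][c] = nearest
    return filled


band = [[5, -9999, -9999, 8]]
print(fill_nearest(band))
```

The same function would be applied to each of the three color bands in turn. The brute-force search is slow on large rasters, so a spatial index would be a natural next step.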

10/06/2021 - Week 6

Summary of Work

Quiz 5
Discussion 5
Discussion 6
Intro to GIS Reading

Overview of Tasks

Quiz 5: Completed Quiz 5 for a perfect score
Discussion 5: Created a flowchart for the typical day of a William & Mary student with Caroline Wall and posted it to the discussion board
Discussion 6: Visualized my ASCII raster data file in QGIS after scaling it down so it didn't cover the entire map
Intro to GIS Reading: Completed the Intro to GIS reading, with emphasis on the QGIS and Python sections

Next Steps

Complete Sandbox 4
Quiz 6
Sparse Data Challenge

Question of the Week

How does it look?: In reference to my ASCII raster file, I scaled it down so it covers the entire southwest quadrant of the map, instead of the entire thing. It is a gradient from the top-left cell to the bottom-right, with increasing values. This leads to darker colors in the top-left, which gradually move to lighter colors as you move to the bottom-right.
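
A raster like the one described could be generated along these lines. The 6 rows and 12 columns come from my Sandbox 3 file; the cell size of 15 degrees, the lower-left corner at (-180, -90), and the row-plus-column values are assumptions chosen so the grid fills the southwest quadrant and increases toward the bottom-right. The function name make_gradient_raster is hypothetical.

```python
def make_gradient_raster(nrows=6, ncols=12, xll=-180.0, yll=-90.0, cellsize=15.0):
    """Build ASCII raster text whose values grow from top-left to bottom-right."""
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        "NODATA_value -9999",
    ]
    # Rows are written top to bottom, so row 0 is the northernmost row.
    rows = [
        " ".join(str(r + c) for c in range(ncols))
        for r in range(nrows)
    ]
    return "\n".join(header + rows) + "\n"


print(make_gradient_raster())
```

With these defaults the grid spans 180 degrees of longitude and 90 degrees of latitude, which is one quadrant of a whole-earth map.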

09/29/2021 - Week 5

Summary of Work

Sandbox 2
Quiz 4
ASCII Raster Format Reading
Sandbox 3
Koops & Galič Reading

Overview of Tasks

Sandbox 2: Completed sandbox 2 by creating sandbox/sb2-bswhitneyWM.py, which takes in JPEG images, extracts their Exif tags to create a GeoJSON containing the location where each image was taken, and then marks those locations on a map.
Quiz 4: Attempted Quiz 4 until I got a perfect score.
ASCII Raster Format Reading: Read about the ASCII Raster data format, focusing on the specific formatting requirements so I could create my own for Sandbox 3
Sandbox 3: Pushed data/bswhitneyWM.txt as my custom ASCII Raster file, which contains 6 rows and 12 columns spanning the entire earth.
Koops & Galič Reading: Completed the reading which discussed the difference between spaces and places, along with some of the different philosophical approaches people have taken in the past.
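
The core of the Sandbox 2 idea, turning Exif GPS values into GeoJSON points, can be sketched without any image library. This is not sb2-bswhitneyWM.py itself: it assumes the degrees, minutes, seconds, and hemisphere reference have already been read from each JPEG, the function names are hypothetical, and the coordinates in the example are placeholder values.

```python
import json


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert Exif-style degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    return -value if ref in ("S", "W") else value


def photos_to_geojson(photos):
    """Build a GeoJSON FeatureCollection from (name, lat, lon) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON orders coordinates as longitude, then latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        }
        for name, lat, lon in photos
    ]
    return {"type": "FeatureCollection", "features": features}


# Placeholder values, not taken from a real photo.
lat = dms_to_decimal(37, 16, 12.0, "N")
lon = dms_to_decimal(76, 42, 36.0, "W")
collection = photos_to_geojson([("example.jpg", lat, lon)])
print(json.dumps(collection, indent=2))
```

The longitude-first ordering is the detail most likely to go wrong, since Exif and most people list latitude first.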

Next Steps

Discussion 5
Herring et al. Reading
QGIS Print Layout Reading
Sandbox 4
Quiz 5

Question of the Week

What spaces/places?: In simple terms, I believe places are just special, meaningful spaces. Every place is a space, but not every space is a place, in the same way that every square is a rectangle but not every rectangle is a square. Individual people will have a different set of "places", as individuals have different locations that are meaningful to them. For example, while the White House is likely a place for everyone in the United States over the age of 6, the same cannot be said about my house. My house is meaningful to me, so it is a place, but for someone across the country who knows nothing about me, it is simply a space on earth.

09/22/2021 - Week 4

Summary of Work

Discussion 3 and Podcast
Quiz 3
Schwartz Reading
Discussion 4

Overview of Tasks

Discussion 3 and Podcast: I created my podcast about my utility script randomPassword.py and added it to my "About" page. I then did part B of the discussion by reviewing the dad_joke_emailer.py script created by Luke. I was successfully able to get that running after a minor syntax edit to the code.
Quiz 3: I attempted quiz 3 until I got a perfect score. I had to review reference links in markdown before this could happen.
Schwartz Reading: I completed the reading regarding the ethics of Google's (supposed) monopoly on the map industry.
Discussion 4: I completed discussion 4 by noting my views and definitions before and after completing the Google Earth activity with Paris in class on 09/20/2021.

Next Steps

Quiz 4
Sandbox 2
Esri ASCII Raster format Reading

Question of the Week

What is Spatial Data?: I believe spatial data is anything that refers to a point or area within a coordinate system. This is most commonly thought of as a map where longitude and latitude represent the coordinates. In sports, spatial data could describe where on the field different events happen, such as data that describes where on a football field every interception was thrown throughout the season. The only constraint I would impose is that the coordinates can't have more than three dimensions. Humans can't comprehend more than three spatial dimensions, so for all practical purposes, such "spatial data" would likely be useless.

09/15/2021 - Week 3

Summary of Work

Discussion 2
The "About" Page
Utility Script

Overview of Tasks

Discussion 2: I made a post on GitHub talking about my thoughts on sustainable authorship based on the Tenen & Wythoff and Lowndes readings.
The "About" Page: I created my "About" page, which included a picture of the W&M Ultimate Team, my bio, and a list of my hobbies. I went back and added my script and the accompanying code later.
Utility Script: I created and published my utility script randomPassword.py. It generates a random password of a given length, with the option to set a random seed for reproducibility purposes. I then updated scripts/README.md to reflect the script I added.
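
The behavior described for randomPassword.py can be sketched as follows. This is an illustration of the idea rather than the published script: the function name random_password, the character set, and the error handling are assumptions.

```python
import random
import string


def random_password(length, seed=None):
    """Return a random password of the given length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # A dedicated Random instance keeps the seed from affecting other code.
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(rng.choice(alphabet) for _ in range(length))


# The same seed reproduces the same password.
print(random_password(12, seed=42) == random_password(12, seed=42))
```

Seeding makes results reproducible, which is useful for demonstrations, but the random module is not designed for security. For real passwords, Python's secrets module is the safer choice, at the cost of reproducibility.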

Next Steps

Utility Script Podcast
Discussion 3
Quiz 3
Consider Topics for the Project

Question of the Week

What is Utility?: I believe utility refers to something that is useful or beneficial to one's life. Something that provides utility will make one's life easier, oftentimes helping them to accomplish tasks in a more efficient, less effort-intensive manner. An example of this could be college notebooks. These give students a place to write notes without having to worry about keeping track of all the different pages and organizing them. This is already done for them, allowing students to focus more on the subject, and less on making sure they don't lose some of their notes.

09/08/2021 - Weeks 1 & 2

Summary of Work

Goldsberry reading
Quiz 1
Sandbox 1
Tenen & Wythoff and Lowndes readings
Quiz 2

Overview of Tasks

Goldsberry Reading: This reading discussed the importance of spatial thinking and how it has been missing from the education system for a while. There is a lot to be learned from the pre-computing era, and without this knowledge scientists may lack the crucial spatial thinking and spatial reasoning skills needed to accompany the new data visualization techniques provided by computing.
Quiz 1: After completing the Goldsberry reading I attempted this quiz twice until I got a perfect score.
Sandbox 1: I created a script to add two numbers that are entered on the command line using Python's sys library.
Tenen & Wythoff and Lowndes Readings: These readings discussed the importance of sustainable authorship, with the Lowndes reading demonstrating a real-life application. They stressed the importance of using simple formats, such as plain text, to communicate findings with others and to increase the ease of access and sustainability of any documents people share with the world. There has always been a struggle to communicate scientific findings to the public, so using simple, platform-independent formats that don't require special software to read is a great step toward making this process easier and more accessible for everyone.
Quiz 2: After completing the two readings for this week, I attempted the quiz twice until I got a perfect score.
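
The Sandbox 1 script can be sketched as below. It is a reconstruction of the idea, not the submitted file: the function name add_from_args, the usage message, and the choice to parse inputs as floats are assumptions.

```python
import sys


def add_from_args(argv):
    """Add the two numbers given after the script name in argv."""
    if len(argv) != 3:
        sys.exit("usage: add.py NUMBER NUMBER")
    try:
        first, second = float(argv[1]), float(argv[2])
    except ValueError:
        sys.exit("both arguments must be numbers")
    return first + second


# In a real script this would be add_from_args(sys.argv); a fixed list is
# used here so the example runs without command-line input.
print(add_from_args(["add.py", "2", "3.5"]))
```

Passing the argument list into the function, instead of reading sys.argv inside it, makes the addition easy to test without launching a separate process.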

Next Steps

Discussion 2 post
The "About" Page
Utility script

Question(s) of the Week:

What is spatial?: In the first discussion I mentioned that I have an interest in, and am currently involved in, Ultimate Frisbee. While analysis would usually involve crunching numbers, I do believe there could be some spatial analysis of the field. It could be especially useful to analyze where on the field the most scores and turnovers occur, as this could help teams develop a "smarter" offensive plan (according to the data). I think it is very important to expand our view of what counts as "spatial data", as it can be more than data about the earth's surface.
Getting Started: Everything has been going great so far, and I look forward to the rest of the semester!