Accessing data - Griffith-ICT/1701ICT-Creative-Coding GitHub Wiki
Historically, data was stored in local files on the computer you were using. Today, it is increasingly common for data to be located remotely, on another computer across the network.
p5.js provides a number of mechanisms for accessing files. Which mechanism you use depends on the file type and where it is located.
One focus of this course is data visualisation, and one aspect we will look at is using open data, i.e. data that is freely usable. Open data is commonly available as comma-separated values (CSV) text files, as these files are simple to read and write with computer programs. Another popular format is JSON (JavaScript Object Notation). JSON is directly supported by JavaScript, and it is now supported by many other languages too. A more general-purpose format is XML, but JSON has become more popular than XML for a number of reasons, one being that it uses fewer characters.
p5.js provides built-in support for raw text, CSV, JSON, and XML files.
The loadStrings() function will read an entire text file and return each line as a separate string in an array. On its own this is of limited use for working with data, as we would generally need to process each line further before we could use the values in it.
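To give a feel for that extra processing, here is a minimal sketch in plain JavaScript. The `lines` array is a made-up stand-in for what loadStrings() would return, and `linesToRecords` is a hypothetical helper name, not a p5.js function. It assumes the first line is a header and that no field contains a comma.

```javascript
// Hypothetical stand-in for the array that loadStrings() would give us:
// one string per line of the file.
var lines = [
  "name,legs",
  "emu,2",
  "wombat,4"
];

// Turn raw lines into an array of objects keyed by the header names.
// Assumes a header line and that no field contains a comma.
function linesToRecords(lines) {
  var header = lines[0].split(",");
  var records = [];
  for (var i = 1; i < lines.length; i++) {
    var fields = lines[i].split(",");
    var record = {};
    for (var j = 0; j < header.length; j++) {
      record[header[j]] = fields[j];
    }
    records.push(record);
  }
  return records;
}

var records = linesToRecords(lines);
console.log(records[1].name + " has " + records[1].legs + " legs"); // wombat has 4 legs
```

Note that every value is still a string at this point, so "4" would need converting before we could do arithmetic with it.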
We are going to focus today on the CSV file type.
CSV is a common file type which is simply a raw text file with a specific format. Each line represents a record of data, and each attribute of that record (or column) is separated by a comma. There are some variations of CSV: some files include a header line describing each column, and others use a different delimiter (e.g. a tab). p5.js' loadTable() function can load CSV files and supports headers and tab delimiters.
An example CSV file with a header line describing each column:
Scientific Name,Common Name,Current Scientific Name,Threatened status,ACT,NSW,NT,QLD,SA,TAS,VIC,WA,ACI,CKI,CI,CSI,JBT,NFI,HMI,AAT,CMA,ListedID,CurrentID,Kingdom,Class,Profile,Date extracted
"Neophoca cinerea","Australian Sea-lion, Australian Sea Lion","-","Vulnerable","-","-","-","-","Yes","-","-","Yes","-","-","-","-","-","-","-","-","Yes","22","-","Animalia","Mammalia","http://www.environment.gov.au/cgi-bin/sprat/public/publicspecies.pl?taxon_id=22","2017-Mar-20"
"Mirounga leonina","Southern Elephant Seal","-","Vulnerable","-","-","-","-","Yes","Yes","-","-","-","-","-","-","-","-","Yes","Yes","Yes","26","-","Animalia","Mammalia","http://www.environment.gov.au/cgi-bin/sprat/public/publicspecies.pl?taxon_id=26","2017-Mar-20"
"Balaenoptera borealis","Sei Whale","-","Vulnerable","-","Yes","-","Yes","Yes","Yes","Yes","Yes","-","Yes","Yes","Yes","-","-","Yes","Yes","Yes","34","-","Animalia","Mammalia","http://www.environment.gov.au/cgi-bin/sprat/public/publicspecies.pl?taxon_id=34","2017-Mar-20"
This data is from an open data set on the Australian Government's Open Data Portal.
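Notice that the common name in the first record contains a comma inside its quotes, so simply splitting each line on commas would put values in the wrong columns. The following is a rough, hand-written sketch of the kind of work a CSV reader has to do. It is not how loadTable() is implemented, and `splitCsvLine` is a hypothetical helper name.

```javascript
// Split one CSV line into fields, treating commas inside double quotes as
// part of the field. A doubled quote ("") inside a quoted field is read as a
// literal quote character. Hypothetical helper, not part of p5.js.
function splitCsvLine(line) {
  var fields = [];
  var current = "";
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // end of a field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current); // the last field has no trailing comma
  return fields;
}

// The first four fields of the first record in the example file.
var sample = '"Neophoca cinerea","Australian Sea-lion, Australian Sea Lion","-","Vulnerable"';

console.log(sample.split(",").length);    // 5: the common name is cut in two
console.log(splitCsvLine(sample).length); // 4
console.log(splitCsvLine(sample)[1]);     // Australian Sea-lion, Australian Sea Lion
```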
There appear to be some problems with loading files over the network, so the best thing to do for now is to upload the CSV file to your project directory, in the same way we do for images.
To load this file using the loadTable() function we will need to indicate that the file contains a header row:
var table = loadTable("epbcthreatenedspecies.csv", "header");
The loadTable() function returns a special p5.js table object. The table object contains a function getRowCount() which returns the number of rows.
print("Number of rows: " + table.getRowCount());
In this example we have used the print() function. The print() function outputs text to the console. This is useful for debugging.
In this case it prints out that there are 1808 rows in the file:
Number of rows: 1808
We can also get the number of columns using table.getColumnCount() which in this case returns 27.
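As a sketch of how these pieces might fit together in a complete program, the following assumes p5.js 1.x, where load functions are normally called inside preload() so that the file has finished loading before setup() runs. The file name is the one used above. The noCanvas() call is an optional addition for a sketch that only prints to the console.

```javascript
var table;

// p5.js 1.x runs preload() before setup() and waits for the load
// functions called inside it to finish.
function preload() {
  table = loadTable("epbcthreatenedspecies.csv", "header");
}

function setup() {
  noCanvas(); // this sketch only writes to the console
  print("Number of rows: " + table.getRowCount());
  print("Number of columns: " + table.getColumnCount());
}
```

Calling getRowCount() before the file has finished loading would not give the expected result, which is why the counts are printed from setup() rather than straight after the loadTable() call.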
The p5.js table object provides a get() function which allows us to get the value at a particular row and column. The row is specified first, followed by the column. The row is a number starting from 0, whereas the column may be either a number, also starting from 0, or the title of the column.
For example, the following returns the Scientific Name for the first row:
print("Scientific Name: " + table.get(0, "Scientific Name"));
Scientific Name: Neophoca cinerea
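To show both ways of naming a column side by side, here is a small sketch. `describeSpecies` is a hypothetical helper, and it assumes the table was loaded with the "header" option and that "Scientific Name" is column 0, as in the example file.

```javascript
// Build a one-line description of the species in the given row.
// Hypothetical helper; assumes the column layout of the example file.
function describeSpecies(table, row) {
  var scientific = table.get(row, 0);                 // column by number
  var common = table.get(row, "Common Name");         // column by title
  var status = table.get(row, "Threatened status");   // column by title
  return common + " (" + scientific + "): " + status;
}

// In setup(), once the table has loaded:
//   print(describeSpecies(table, 1));
```

For the sample records shown earlier, row 1 is the Southern Elephant Seal, so the call above would be expected to describe that species.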
We could use a loop to get all of the scientific names:
for (var i = 0; i < table.getRowCount(); i++) {
  print("Scientific name for row " + i + ": " + table.get(i, "Scientific Name"));
}
For example, the last few lines of the output are:
Scientific name for row 1802: Chamelaucium sp. Gingin (N.G.Marchant 6)
Scientific name for row 1803: Acacia terminalis subsp. terminalis MS
Scientific name for row 1804: Hypotaenidia philippensis andrewsi
Scientific name for row 1805: Hypotaenidia philippensis macquariensis
Scientific name for row 1806: Diomedea epomophora
Scientific name for row 1807: Diomedea exulans
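If we want all the values of one column as an ordinary array, rather than printing them one at a time, the table object's getColumn() function can be used instead of a loop. A minimal sketch follows. `sortedScientificNames` is a hypothetical helper name.

```javascript
// Return every scientific name in the table, sorted.
// Hypothetical helper; getColumn() gives one array entry per row.
function sortedScientificNames(table) {
  var names = table.getColumn("Scientific Name");
  // slice() copies the array so the sort cannot disturb the original order.
  // sort() compares strings by character code, so capital letters sort
  // before lowercase ones.
  return names.slice().sort();
}

// In setup(), once the table has loaded:
//   var names = sortedScientificNames(table);
//   print("First name alphabetically: " + names[0]);
```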
In the data file each species is labelled as either Extinct, Endangered, or Vulnerable. We want to know the number of Extinct species.
We can use a for loop, access the "Threatened status" column of each row, and test if it has the value "Extinct".
var extinct = 0;
for (var i = 0; i < table.getRowCount(); i++) {
  if (table.get(i, "Threatened status") == "Extinct") {
    extinct++;
  }
}
print("Number of extinct species: " + extinct);
Number of extinct species: 90
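The same loop can be extended to count every status in a single pass, which is handy if we later want to chart them. This is a sketch only. `countByColumn` is a hypothetical helper, and it uses a plain object as a tally with one key per distinct value.

```javascript
// Count how many rows have each distinct value in the given column.
// Hypothetical helper; returns an object such as { Extinct: ..., Vulnerable: ... }.
function countByColumn(table, column) {
  // Object.create(null) makes an empty object with no inherited keys,
  // so any text in the file is safe to use as a key.
  var counts = Object.create(null);
  for (var i = 0; i < table.getRowCount(); i++) {
    var value = table.get(i, column);
    if (counts[value] === undefined) {
      counts[value] = 0; // first time we have seen this value
    }
    counts[value]++;
  }
  return counts;
}

// In setup(), once the table has loaded:
//   var counts = countByColumn(table, "Threatened status");
//   print("Number of extinct species: " + counts["Extinct"]);
```

The tally for "Extinct" should agree with the count produced by the loop above.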
We can narrow our search to only count the number of extinct species in Queensland. The QLD column contains "Yes" if that species is found in Queensland.
var extinct = 0;
for (var i = 0; i < table.getRowCount(); i++) {
  if (table.get(i, "Threatened status") == "Extinct" && table.get(i, "QLD") == "Yes") {
    extinct++;
  }
}
print("Number of extinct species in Qld: " + extinct);
Number of extinct species in Qld: 26
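As an alternative sketch, the table object also has a findRows() function which returns the rows whose value in a given column matches exactly. We can then check the region column on just those rows. `countExtinctIn` is a hypothetical helper, and the region is assumed to be one of the state or territory column titles from the file, such as "QLD".

```javascript
// Count the extinct species found in one region.
// Hypothetical helper; `region` is a column title such as "QLD" or "NSW".
function countExtinctIn(table, region) {
  // findRows() returns an array of row objects, each with its own get().
  var extinctRows = table.findRows("Extinct", "Threatened status");
  var count = 0;
  for (var i = 0; i < extinctRows.length; i++) {
    if (extinctRows[i].get(region) === "Yes") {
      count++;
    }
  }
  return count;
}

// In setup(), once the table has loaded:
//   print("Number of extinct species in Qld: " + countExtinctIn(table, "QLD"));
```

Passing the region in as a parameter means the same function can be reused for any of the other region columns without copying the loop.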