Functional data cleaning - RooyyDoe/functional-programming GitHub Wiki
I first set up the folder structure of my project and read up on functional programming through a simple but good article on the basics by Ali Spittel. She mainly covers the core principles of functional programming and illustrates them with small examples.
I needed to work with a .then chain so that I could make use of async functions and get rid of the IIFE. This was pretty new to me and I did not know anything about it. Thijs Spijker showed me how all of this worked and gave me examples. Promise chaining would also improve the readability of my code. I started trying to get the .then working in my code and to understand how it all worked and what I could do with it. My final .then chain will look as follows:
- .then => Getting rid of the IIFE and changing it into a .then chain
- .then => runQuery fetch function for getting data from the API
- .then => Cleaning the rawContinent data
- .then => Creating a mainData array with an object inside of it
- .then => Adding all the category Uri's to mainData
- .then => Making an object of the category array
- .then => Promise chain that loops through the query 50+ times (array with objectCount)
- .then => Adding up every objectCount array from one category
- .then => Making a global objectCount for every continent
- .then => Calculating the percentage of every category in a continent.
- .then => Sending it to D3.js
This is how I started: I read my IIFE from top to bottom and tried to figure out how I needed to build my .then chain. For the time being the IIFE was a good way to get results in the browser console, but now I need to rebuild it. This will make my code much more readable for anyone looking into it, and I can explain it step by step far more easily.
The idea is that the result is passed through the chain of .then handlers:
- The initial promise resolves in 1 second (*).
- Then the .then handler is called (**).
- The value that it returns is passed to the next .then handler (***).
- And so on.
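The markers (*), (**) and (***) point at lines in a small example. A minimal sketch of such a chain could look like the following; the helper `delayedValue` and the function `runChain` are made up for this illustration and are not part of the project.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
function delayedValue(value, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Each handler receives whatever the previous handler returned.
function runChain(ms = 1000) {
  return delayedValue(1, ms)        // (*)   the initial promise resolves after `ms`
    .then((result) => result * 2)   // (**)  receives 1, returns 2
    .then((result) => result * 2);  // (***) receives 2, returns 4
}

runChain().then((result) => console.log(result)); // logs 4 after about one second
```

Every .then returns a new promise, which is why the handlers can be written one under the other instead of nested inside each other.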
IIFE:
.then chain:
Final .then chain:
I needed a fetch function that helped me get the data from the API. I have done this in frontend applications before, so I took that piece of code as an example and tried to rebuild it in my functional programming project. I found out that whenever I tried to receive the data I got a pending response, and I did not know how to work around this. This was also when I found out I could use an IIFE. This is also how I start my main .then chain: you just invoke the function and pass the url and query as parameters.
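The pending response shows up because a function that returns a promise hands back the promise itself, not the data. The sketch below shows the difference; `fakeQuery` is a hypothetical stand-in for the real API call so that it runs without a network.

```javascript
// Hypothetical stand-in for an API call; resolves with a fixed array.
function fakeQuery() {
  return Promise.resolve(['Africa', 'Asia']);
}

// The return value is a promise, not the data itself.
const pending = fakeQuery();
console.log(pending instanceof Promise); // true

// Option 1: an async IIFE that awaits the result.
(async () => {
  const data = await fakeQuery();
  console.log(data.length); // 2
})();

// Option 2: a .then chain that receives the result in a handler.
fakeQuery().then((data) => console.log(data.length)); // 2
```

Both options wait for the same promise; the .then version simply makes each step a separate, named link in the chain.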
Old runQuery
function runQuery(url, query){
return fetch(url+'?query='+ encodeURIComponent(query) +'&format=json')
.then(res => res.json())
.then(json => {
return json.results.bindings;
});
}
My new runQuery is an async function that can be re-used for the different queries I am using in my project. It fetches the url with the query, waits for the response, parses it as JSON and saves the result in the variable json. At the end the function returns the result bindings, so the data can be used outside the function scope through await or .then.
New runQuery
async function runQuery(url, query){
let response = await fetch(url+'?query='+ encodeURIComponent(query) +'&format=json');
let json = await response.json();
return json.results.bindings;
}
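One thing the function above does not cover is a failed request. As a hedged sketch of how that could be handled, the variant below checks `response.ok` before parsing. The name `runQuerySafe` and the third parameter `fetchImpl` are assumptions added only so the example can be tried without a network; the project itself calls the global fetch directly.

```javascript
// Assumption: `fetchImpl` is an extra parameter for illustration and testing;
// it defaults to the global fetch when it is not passed in.
async function runQuerySafe(url, query, fetchImpl = fetch) {
  const response = await fetchImpl(url + '?query=' + encodeURIComponent(query) + '&format=json');
  // response.ok is false for HTTP status codes outside the 200-299 range.
  if (!response.ok) {
    throw new Error('Query failed with status ' + response.status);
  }
  const json = await response.json();
  return json.results.bindings;
}
```

A thrown error rejects the promise that the async function returns, so a .catch at the end of the main chain would receive it.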
Dirty data
The first step in my .then chain was to clean up the continent data that I received from my main query. I needed a function that cleans the API data, and it needed to be dynamic so that I could re-use it for all the other queries in this project. That way I can just invoke the clean function and it automatically cleans the data for me. This happens a couple of times in my process.
Old Cleaning function
let cleanResults = [];
// index is the second argument that forEach passes to its callback
rawResults.forEach(function (rawResult, index) {
for(let key in rawResult) {
cleanResults.push(rawResult[key].value);
}
console.log(rawResults[0].categoryName.value, index);
});
return cleanResults;
In my new cleaning function I also made it turn the objectCount into an integer when it has the datatype of an integer. As a result I get an array with only integers, and to get a total count of the objects I need to invoke a separate function. This will be explained later in the process.
New Cleaning Function
function cleanData(rawResults) {
return rawResults.reduce((cleanResults, rawResult) => {
for(let key in rawResult) {
if (rawResult[key].datatype === 'http://www.w3.org/2001/XMLSchema#integer') {
let parsed = parseInt(rawResult[key].value, 10);
cleanResults.push(parsed);
} else cleanResults.push(rawResult[key].value);
}
return cleanResults;
},[]);
}
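To make the effect of the datatype check concrete, here is a small usage sketch. The function is repeated so the block runs on its own, and `sampleBindings` is made-up data in the shape of SPARQL JSON result bindings, not real output from the API.

```javascript
function cleanData(rawResults) {
  return rawResults.reduce((cleanResults, rawResult) => {
    for (let key in rawResult) {
      if (rawResult[key].datatype === 'http://www.w3.org/2001/XMLSchema#integer') {
        cleanResults.push(parseInt(rawResult[key].value, 10));
      } else {
        cleanResults.push(rawResult[key].value);
      }
    }
    return cleanResults;
  }, []);
}

// Made-up sample: one URI binding and one integer-typed literal.
const sampleBindings = [
  { continent: { type: 'uri', value: 'https://example.org/continent/1' } },
  { categoryAmount: { type: 'literal', datatype: 'http://www.w3.org/2001/XMLSchema#integer', value: '42' } }
];

console.log(cleanData(sampleBindings));
// [ 'https://example.org/continent/1', 42 ]
```

The URI stays a string, while the count comes out as the number 42 instead of the string '42'.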
To load my data into d3.js I need a certain data structure. I have written this structure down to get a good overview for myself and to see how I need to build it. I am going to do this in .then steps and will explain them in this process.
Data Structure Example
I started building this data structure by creating a variable mainData. In this variable I save the results that I get from cleaning the continent query. I use map to turn every entry into an object { uri: uri }. This gives a list of the continent URIs and is the start of my data structure.
Mapping object into mainData
.then((mainData) => mainData.map(uri => {
return {uri: uri}
}))
This function maps over mainData and returns an object for each of the five continent URIs. This is what it looks like before the function is invoked:
And this is after:
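As a rough illustration of this step, with made-up URIs (the real ones come from the continent query) and a hypothetical helper name `toUriObjects`:

```javascript
// Made-up URIs for illustration only.
const before = [
  'https://example.org/continent/1',
  'https://example.org/continent/2'
];

// Same mapping as in the .then step: wrap every string in an object.
const toUriObjects = (uris) => uris.map((uri) => ({ uri: uri }));

const after = toUriObjects(before);
console.log(after);
// [
//   { uri: 'https://example.org/continent/1' },
//   { uri: 'https://example.org/continent/2' }
// ]
```

Wrapping each string in an object is what makes it possible to attach more keys, such as categories and count, in the later steps.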
As the next .then step I want to create a category array in every continent with all the main category URIs. To do this I needed to make a function that runs the query to get this data and then invoke it in the .then.
getCategories function
function getCategories() {
return new Promise(async(resolve) => {
const categoryQuery = `
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?superCategory WHERE {
<https://hdl.handle.net/20.500.11840/termmaster2802> skos:narrower ?superCategory .
}`;
runQuery(url, categoryQuery)
.then((rawCategoryData) => cleanData(rawCategoryData))
// .then((cleanCategoryData => combineContinentWithCategory(continentUriArray, cleanCategoryData)))
.then((cleanResults) => resolve(cleanResults));
});
}
I am making use of a promise within this function so that I can hand the result of its .then chain back to the caller once the promise is done. So when the query has run and the data is cleaned, the promise resolves with the clean results, and I can invoke this function in the top-level .then chain.
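Because runQuery already returns a promise, the extra new Promise wrapper is not strictly needed: returning the chain itself gives the caller the same cleaned results and also lets errors travel up the chain. This is an alternative, not the code used in the project. In the sketch, `getCategoriesDirect`, the stubbed `runQuery`, the trimmed-down `cleanData` and the `url` value are all placeholders for illustration.

```javascript
const url = 'https://example.org/sparql'; // placeholder endpoint

// Stub standing in for the real runQuery so the sketch runs offline.
async function runQuery(url, query) {
  return [{ superCategory: { type: 'uri', value: 'https://example.org/category/1' } }];
}

// Trimmed-down stand-in for cleanData: collects every binding value.
function cleanData(rawResults) {
  return rawResults.flatMap((rawResult) =>
    Object.values(rawResult).map((binding) => binding.value)
  );
}

function getCategoriesDirect() {
  const categoryQuery = 'SELECT ?superCategory WHERE { }'; // query text left out of this sketch
  // Returning the chain is enough; the caller can await it or attach .then.
  return runQuery(url, categoryQuery).then((raw) => cleanData(raw));
}

getCategoriesDirect().then((categories) => console.log(categories));
// [ 'https://example.org/category/1' ]
```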
I am making an async .then in which I invoke getCategories() with an await, so it waits until the query data is returned and then puts the data into the catArray variable. It then maps over mainData, adds a categories key to every continent with catArray as its value, and returns the continent object.
.then(async (mainData) => {
let catArray = await getCategories();
return mainData.map((continent) => {
continent.categories = catArray;
return continent;
});
})
Old code
At the bottom of the function I made another .then chain. I run cleanData over this query and after that I run my new function combineContinentWithCategory. When that function is done it resolves the promise and goes back to the main .then chain.
In this function I am combining the uri from the continent with the uri from the main categories. This is how I wanted to start structuring my data. But when I tried to figure out how to count up the objects and add the results to this data structure as well, I got stuck.
This is what I got as a result from combining the URIs. I decided to leave it at this for now, because I did not know how to take it further. I started working on the problem that I got one big array with only integers; for this I made another function with a promise inside it.
Results after new code
This is how the catArray looks
And this is after the function runs
Now that I have an array in every continent with all the URIs, I need to convert its entries to objects in the next .then. This is exactly the same as what I did when I turned the continent URIs into objects. Only the categories array is nested inside continent, so I need to use map twice to get into the categories.
.then((mainData)=> mainData.map(continent => {
continent.categories = continent.categories.map(uri => {
return {uri: uri};
});
return continent;
}))
This is how the data in categories looks before this .then has run.
And this is after the .then runs
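A small illustration of the nested step with made-up data. Note that this sketch uses a non-mutating variant (object spread) instead of assigning to continent.categories, so it is a variation on the step rather than the exact code.

```javascript
// Made-up data in the shape mainData has before this step.
const mainData = [
  {
    uri: 'https://example.org/continent/1',
    categories: ['https://example.org/category/1', 'https://example.org/category/2']
  }
];

// Outer map walks the continents, inner map walks each categories array.
const result = mainData.map((continent) => ({
  ...continent,
  categories: continent.categories.map((uri) => ({ uri: uri }))
}));

console.log(JSON.stringify(result[0].categories));
// [{"uri":"https://example.org/category/1"},{"uri":"https://example.org/category/2"}]
```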
This .then is for getting all the objectCounts of the different categories. In this .then I also need to invoke a function that I made, in which I load a new query. I need to loop through this query to get all the data I need. But this gives 50+ responses, so I need to use Promise.all in my .then to make it work. Thijs Spijker helped me a lot by explaining Promise.all, and helped me get actual results instead of a chain of unresolved promises.
Old function
I also added the last query that I made, which counts every object in the different categories.
SELECT (COUNT(?category) AS ?categoryAmount) WHERE {
<https://hdl.handle.net/20.500.11840/termmaster3> skos:narrower* ?continent .
?obj dct:spatial ?continent .
<https://hdl.handle.net/20.500.11840/termmaster2803> skos:narrower* ?category .
?obj edm:isRelatedTo ?category .
?category skos:prefLabel ?categoryName .
} GROUP BY ?categoryName
I needed to turn the string into an integer with parseInt, and this is where I used the if statement in the cleanData code. It checks the datatype of every key, and if that datatype is an integer it parses the value into an integer instead of a string and adds it to a new array. When this is done the count-up function (countCategoryResults) runs over this array, adds up every integer and returns the final value.
After doing this I want to loop through a query and link all the values to each other. I was trying to do this with template literals; Thijs Spijker showed me how this worked and how I could use it. At first I needed to make a forEach that looped through the continents and inside it another forEach that looped through every category.
// foreach that loops over all the cleaned continent results
cleanedContinentResults.forEach(async continentUri => {
// foreach that loops over all the cleaned category results
cleanedCategories.forEach(async categoryUri => {
// Using template literals to put the results of the foreach into the query
let catQuery = `
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (COUNT(?category) AS ?categoryAmount) WHERE {
<${continentUri}> skos:narrower* ?continent .
?obj dct:spatial ?continent .
<${categoryUri}> skos:narrower* ?category .
?obj edm:isRelatedTo ?category .
?category skos:prefLabel ?categoryName .
} GROUP BY ?categoryName`;
// Variable with the results of the query that I made with the template literals
let catResults = await runQuery(url, catQuery );
// Cleaning the catQuery with my cleaning function
let cleanResult = cleanData(catResults);
console.log(cleanResult);
});
});
When I run this code it logs everything in the console, even empty arrays. I want to clean this up with an if statement so that I get a better overview of the data I am getting back.
I wanted to filter all the empty arrays out of this clean result. I did this with an if statement that only shows the arrays with at least one value in them. It is still not perfect but it is a big step. After that I made a string with template literals to give an overview of what I have right now.
if (cleanResult.length > 0) {
let finalCatResult = countCategoryResults(cleanResult);
// Presentation of what I have right now. with template literals.
console.log(`Continent ${continentUri} has ${finalCatResult} in categorie ${categoryUri}`);
}
getCountOfCategory() function
This function returns a promise for the same reason as the getCategories() function. But now it is running multiple queries instead of one, so I need to use Promise.all to get the data that I need. If I don't do this I get an array of promises that aren't resolved yet.
function getCountOfCategory(continentUri, categoryUri) {
return new Promise(async(resolve) => {
let totalResult = `
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (COUNT(?category) AS ?categoryAmount) WHERE {
<${continentUri}> skos:narrower* ?continent .
?obj dct:spatial ?continent .
<${categoryUri}> skos:narrower* ?category .
?obj edm:isRelatedTo ?category .
?category skos:prefLabel ?categoryName .
} GROUP BY ?categoryName`;
runQuery(url, totalResult)
.then((rawCountData) => cleanData(rawCountData))
// .then((countUpData) => countCategoryResults(countUpData))
.then((countResults) => resolve(countResults));
// .then((cleanCountData => combineCountWithCategory(continentUriArray, cleanCategoryData)))
// .then((cleanResults) => resolve(cleanResults));
});
}
With this async function I run the query multiple times, each time filling in a continent URI and a category URI. I make a variable count that invokes getCountOfCategory() with the continent and category URIs as arguments. For the first continent it goes over every category URI; when there are no more category URIs it moves on to the next continent and does it all over again until it reaches the last continent.
This gives back an array of promises. I pass all of these promises to Promise.all, which waits until every promise has resolved. The result is awaited, and after that the object is returned with the count in it.
.then(async (mainData) => {
let mainDataPromiseArray = mainData.map(async continent => {
let categoriesPromiseArray = continent.categories.map(async categorie => {
let count = await getCountOfCategory(continent.uri, categorie.uri);
return {
uri: categorie.uri,
count: count
};
});
let newCategories = await Promise.all(categoriesPromiseArray);
continent.categories = newCategories;
return continent;
});
let newContinents = await Promise.all(mainDataPromiseArray);
return newContinents;
})
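The core of this step is that map with an async callback produces an array of promises, and Promise.all turns that array into one promise for all the results. A stripped-down sketch with a stubbed `getCountOfCategory` (the stub and its made-up return values exist only for this example):

```javascript
// Stub standing in for getCountOfCategory; resolves with a made-up count array.
function getCountOfCategory(continentUri, categoryUri) {
  return Promise.resolve([continentUri.length, categoryUri.length]);
}

// map returns promises straight away; nothing has been awaited yet.
const pendingCounts = ['cat-a', 'cat-bb'].map((categoryUri) =>
  getCountOfCategory('c1', categoryUri)
);
console.log(pendingCounts[0] instanceof Promise); // true

// Promise.all resolves with the results in the same order as the input array.
Promise.all(pendingCounts).then((counts) => console.log(counts));
// [ [ 2, 5 ], [ 2, 6 ] ]
```

Promise.all rejects as soon as one of the promises rejects, so a single failed query would fail the whole step unless it is caught.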
Results of .then function
This is how the chain looks while it is getting all the data
And this is how it looks when it is done and all the promises have resolved
Now that I have arrays of objectCount values I want to add up all the values in each array and get one value as a result. I do this in the next .then of the chain. In this .then I invoke countCategoryResults() so that it runs over all the arrays and returns one value for each.
.then((mainData) => mainData.map(continent => {
// continent.categories.count = countCategoryResults(continent.categories.count);
continent.categories = continent.categories.map(category => {
category.count = countCategoryResults(category.count);
return category;
});
return continent;
}))
Results I am getting
This is what one category object looks like
When the .then function has finished it returns this
countCategoryResults() function
function countCategoryResults(results) { return results.reduce((a, b) => a + b, 0); }
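A quick usage sketch of this function with made-up numbers. The second argument to reduce (0) is the starting value, which is also what comes back for an empty array.

```javascript
function countCategoryResults(results) {
  return results.reduce((a, b) => a + b, 0);
}

console.log(countCategoryResults([3, 4, 5])); // 12
console.log(countCategoryResults([]));        // 0
```

Without the starting value, calling reduce on an empty array would throw a TypeError, which matters here because some continent and category combinations return no results.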
I wanted to have a global count in the continent object. I needed to add up every continent.categories[i].count and return the total as a new property on continent. This is what that looks like
.then((mainData) => mainData.map(continent => {
let sum = 0;
for (let i = 0; i < continent.categories.length; i++) {
sum = continent.categories[i].count + sum;
}
return {
categories: continent.categories,
uri: continent.uri,
count: sum
};
}))
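The for loop above could also be written with reduce, in line with the rest of the chain. A sketch of that alternative, where `addContinentCount` is a hypothetical name and the sample data is made up:

```javascript
// Hypothetical helper: returns a new continent object with a total count.
function addContinentCount(continent) {
  const count = continent.categories.reduce((sum, category) => sum + category.count, 0);
  return {
    categories: continent.categories,
    uri: continent.uri,
    count: count
  };
}

const sampleContinent = {
  uri: 'https://example.org/continent/1',
  categories: [{ uri: 'a', count: 10 }, { uri: 'b', count: 5 }]
};

console.log(addContinentCount(sampleContinent).count); // 15
```

In the chain this would be used as .then((mainData) => mainData.map(addContinentCount)).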
Results of .then function
It adds every category count to sum
And here it stores the value of sum as count on the continent object
This is one of the last .thens in the chain. I want to calculate the percentage of each category count and add it as a new property to every object in continent.categories. This is of course nested, so I need to use two maps to get into the categories and add the percentage once it is calculated
.then((mainData) => mainData.map(continent => {
for (let i = 0; i < continent.categories.length; i++) {
continent.categories[i].percentage = continent.categories[i].count / continent.count;
}
continent.categories = continent.categories.map(categories => {
return {
uri: categories.uri,
count: categories.count,
percentage: categories.percentage
};
});
return continent;
}))
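Two details of this step are worth spelling out. The value stored in percentage is a fraction between 0 and 1 (count divided by the continent total), and a continent with a total of 0 would produce NaN. The sketch below does the same calculation in a single inner map with a guard for that case; `addPercentages` is a hypothetical name and the sample data is made up.

```javascript
// Hypothetical helper: adds a percentage (as a fraction) to every category.
function addPercentages(continent) {
  const categories = continent.categories.map((category) => ({
    uri: category.uri,
    count: category.count,
    // Guard against dividing by zero when a continent has no objects.
    percentage: continent.count > 0 ? category.count / continent.count : 0
  }));
  return { ...continent, categories: categories };
}

const sample = {
  uri: 'c1',
  count: 4,
  categories: [{ uri: 'a', count: 1 }, { uri: 'b', count: 3 }]
};

console.log(addPercentages(sample).categories.map((c) => c.percentage));
// [ 0.25, 0.75 ]
```

Multiplying the fraction by 100 gives a value on a 0 to 100 scale, if the visualization expects that.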
Results
When the for loop is running it will look like this
And once the value is stored as percentage, the final data structure will look like this
To get it into my d3 code I needed to make a new object in which I only put two values: the name (the URI for now) and the value (the percentage). I then return this so that I can link it to the data variable that is used for the d3 visualization.
.then((mainData) => mainData.map(continent => {
// Return one array per continent with only the two values d3 needs.
// Returning the object from map avoids overwriting continent.categories.
return continent.categories.map(category => {
return {
axis: category.uri,
value: category.percentage
};
});
}))
My final data structure will look like this
But the final data structure I need to load into d3 will look like