Functional data cleaning - RooyyDoe/functional-programming GitHub Wiki

Functional programming process

At first I created the folder structure of my project and read up on functional programming in a simple but good article on the basics by Ali Spittel. She mainly covers the core principles of functional programming and gives a small example of each.

I needed to work with a .then chain so that I could use async functions and get rid of the IIFE. This was new to me and I did not know anything about it. Thijs Spijker showed me how it all works and gave me examples. Using promise chaining would also improve the readability of my code. I started trying to get the .then working in my code, to understand how it works and what I could do with it. My final .then chain will look as follows:

  • .then => Getting rid of the IIFE and changing it into a .then chain
  • .then => RunQuery Fetch function for getting data from API
  • .then => Cleaning the rawContinent data
  • .then => Creating a mainData array with an object inside of it
  • .then => Adding all the category Uri's to mainData
  • .then => Making an object of the category array
  • .then => Promise chain of looping through query 50+ times (Array with objectCount)
  • .then => Adding up every objectCount array together from one category
  • .then => Making global objectCount of every continent
  • .then => Calculating the percentage of every category in a continent.
  • .then => Sending it to D3.js

Getting rid of the IIFE and changing it into a .then chain

This is how I started: I read my IIFE from top to bottom and tried to figure out how to turn it into a .then chain. For the time being the IIFE was a good way to get results in the browser console, but now I need to rebuild it. This will make my code much more readable for anyone looking into it, and I can explain it step by step much more easily.

The idea is that the result is passed through the chain of .then handlers.

  1. The initial promise resolves in 1 second (*),
  2. Then the .then handler is called (**).
  3. The value that it returns is passed to the next .then handler (***)
  4. and so on..
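The (*) markers in the steps above refer to lines in a code example that is not shown here. A minimal sketch of such a chain, with made-up values, could look like this:

```javascript
// Each .then handler receives the value returned by the previous one.
const chain = new Promise(resolve => {
	setTimeout(() => resolve(1), 1000); // (*) the initial promise resolves after 1 second
})
	.then(result => result * 2) // (**) receives 1 and returns 2
	.then(result => result * 2) // (***) receives 2 and returns 4
	.then(result => {
		console.log(result); // 4
		return result;
	});
```

Because every handler returns a value, the next handler can pick it up, which is what makes it possible to split the process into small steps.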

IIFE:

Schermafdruk 2019-11-15 02 04 30

.then chain:

Schermafdruk 2019-11-12 12 36 06

Final .then chain:

Schermafdruk 2019-11-15 02 11 09

RunQuery Fetch function for getting data from API

I needed a fetch function to get the data from the API. I have done this in frontend applications before, so I took that piece of code as an example and tried to rebuild it for my functional programming project. I found out that whenever I tried to receive the data I got a pending promise back, and I did not know how to work around this. This was also when I found out I could use an IIFE. This is also how my main .then chain starts: you invoke the function and pass the url and query as parameters.

Old runQuery

function runQuery(url, query){
	return fetch(url+'?query='+ encodeURIComponent(query) +'&format=json')
		.then(res => res.json()) 
		.then(json => {
			return json.results.bindings;
		});
	}

My new runQuery is an async function that can be reused for the different queries in my project. It fetches the url with the query, waits for the response, parses the response body as JSON and saves it in the variable json. At the end the function returns the result bindings, so the data can be used outside the function scope.

New runQuery

async function runQuery(url, query){
	let response = await fetch(url+'?query='+ encodeURIComponent(query) +'&format=json');
	let json = await response.json();
	return json.results.bindings;
}	
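An async function always returns a promise, which is why logging its result directly shows a pending promise instead of the data. The sketch below is a possible variant, not the version used in the project: it also rejects on HTTP errors, and the extra fetchFn parameter and the fake response are assumptions added only so that it can be tried without a network connection.

```javascript
// Sketch: same request as runQuery, but it rejects when the HTTP status is not OK.
// fetchFn is a hypothetical extra parameter; by default the global fetch is used.
async function runQuerySafe(url, query, fetchFn = fetch) {
	const response = await fetchFn(url + '?query=' + encodeURIComponent(query) + '&format=json');
	if (!response.ok) {
		throw new Error('Query failed with status ' + response.status);
	}
	const json = await response.json();
	return json.results.bindings;
}

// Made-up stand-in for fetch that answers with one binding
const fakeFetch = async () => ({
	ok: true,
	status: 200,
	json: async () => ({
		results: { bindings: [{ continent: { type: 'uri', value: 'https://example.org/continent/1' } }] }
	})
});

// The data only becomes available inside .then (or after await)
runQuerySafe('https://example.org/sparql', 'SELECT * WHERE { ?s ?p ?o }', fakeFetch)
	.then(bindings => console.log(bindings.length)); // 1
```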

Cleaning the rawContinent data

Dirty data

Schermafdruk 2019-11-08 12 33 16

The first step in my .then chain was to clean up the continent data that I received from my main query. So I needed a function that cleans the API data, and this function needs to be dynamic so that I can reuse it for all the other queries in this project. That way I can just invoke the clean function and it cleans the data for me. This is done a couple of times in my process.

Old Cleaning function

let cleanResults = [];
rawResults.forEach(function (rawResult) {
	for (let key in rawResult) {
		cleanResults.push(rawResult[key].value);
	}
	console.log(rawResults[0].categoryName.value, index);
});

return cleanResults;

In my new cleaning function I also made it turn the objectCount into an integer when the value has the integer datatype. As a result I get an array with only integers, and to get a total count of the objects I need to invoke another function. This will be explained later in the process.

New Cleaning Function

function cleanData(rawResults) {
	// Collect every value of every result row into one flat array
	return rawResults.reduce((cleanResults, rawResult) => {
		for (let key in rawResult) {
			if (rawResult[key].datatype === 'http://www.w3.org/2001/XMLSchema#integer') {
				// Integer literals arrive as strings, so parse them first
				cleanResults.push(parseInt(rawResult[key].value, 10));
			} else {
				cleanResults.push(rawResult[key].value);
			}
		}
		return cleanResults;
	}, []);
}
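To make the effect of the cleaning step concrete, here is a small usage sketch. The function is repeated so the example runs on its own; the bindings are made-up values in the shape a SPARQL endpoint returns as JSON, not real data from the API.

```javascript
function cleanData(rawResults) {
	return rawResults.reduce((cleanResults, rawResult) => {
		for (let key in rawResult) {
			if (rawResult[key].datatype === 'http://www.w3.org/2001/XMLSchema#integer') {
				cleanResults.push(parseInt(rawResult[key].value, 10));
			} else {
				cleanResults.push(rawResult[key].value);
			}
		}
		return cleanResults;
	}, []);
}

// Hypothetical bindings: two integer counts and one URI
const rawResults = [
	{ categoryAmount: { type: 'literal', datatype: 'http://www.w3.org/2001/XMLSchema#integer', value: '12' } },
	{ categoryAmount: { type: 'literal', datatype: 'http://www.w3.org/2001/XMLSchema#integer', value: '30' } },
	{ superCategory: { type: 'uri', value: 'https://example.org/term/1' } }
];

const cleaned = cleanData(rawResults);
console.log(cleaned); // [ 12, 30, 'https://example.org/term/1' ]
```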

Creating a mainData array with an object inside of it

To load my data into D3.js I need a certain data structure. I have written this structure down to get a good overview for myself and to see how I need to build it. I am going to do this in .then steps and will explain them in this process.

Data Structure Example

Schermafdruk 2019-11-13 21 41 13

I started building this data structure by creating a variable mainData. In this variable I save the results that I get from cleaning the continent query. I use map to replace each entry with an object { uri: uri }. This gives a list of the continent URIs and is the start of my data structure.

Mapping object into mainData

.then((mainData) => mainData.map(uri => {
	return {uri: uri}
}))

This step maps over mainData and returns an array of objects, one for each of the five continent URIs. This is what it looks like before the function is invoked:

Schermafdruk 2019-11-15 09 54 37

And this is after:

Schermafdruk 2019-11-15 09 59 20
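In case the screenshots do not render, the same step can be sketched with made-up URIs (the example.org addresses are placeholders, not the real continent URIs):

```javascript
// Hypothetical cleaned result of the continent query
const cleanedContinents = [
	'https://example.org/continent/1',
	'https://example.org/continent/2'
];

// Wrap every URI string in an object so more keys can be added later
const mainData = cleanedContinents.map(uri => ({ uri: uri }));

console.log(mainData);
// [ { uri: 'https://example.org/continent/1' }, { uri: 'https://example.org/continent/2' } ]
```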

Adding all the category Uri's to mainData

As the next .then step I want to create a categories array in every continent with all the main category URIs. To do this I needed to make a function that runs the query to get this data, and then invoke it in the .then.

getCategories function

function getCategories() {
	return new Promise(async(resolve) => {
		const categoryQuery = `
		PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
		PREFIX dc: <http://purl.org/dc/elements/1.1/>
		PREFIX dct: <http://purl.org/dc/terms/>
		PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
		PREFIX edm: <http://www.europeana.eu/schemas/edm/>
		PREFIX foaf: <http://xmlns.com/foaf/0.1/>

		SELECT ?superCategory  WHERE {
				<https://hdl.handle.net/20.500.11840/termmaster2802> skos:narrower ?superCategory .
		}`;

		runQuery(url, categoryQuery)
			.then((rawCategoryData) => cleanData(rawCategoryData))
			// .then((cleanCategoryData => combineContinentWithCategory(continentUriArray, cleanCategoryData)))
			.then((cleanResults) => resolve(cleanResults));
	});
}

I am using a promise inside this function so that the result of the inner .then chain can be handed back when the work is done. Once the query has run and the data is cleaned, the promise resolves with the cleaned results, so I can invoke this function from the main .then chain.
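Since runQuery already returns a promise, a possible alternative is to return its .then chain directly instead of wrapping it in new Promise. The sketch below is only an illustration of that idea: url, runQuery and cleanData are replaced by small stand-ins with made-up data so that it runs on its own, and the query text is shortened.

```javascript
// Stand-ins so the sketch runs on its own; in the project these are the real
// endpoint url and the real runQuery and cleanData functions.
const url = 'https://example.org/sparql';
const runQuery = async (endpoint, query) => [
	{ superCategory: { type: 'uri', value: 'https://example.org/category/1' } },
	{ superCategory: { type: 'uri', value: 'https://example.org/category/2' } }
];
const cleanData = rawResults => rawResults.map(result => result.superCategory.value);

function getCategories() {
	const categoryQuery = 'SELECT ?superCategory WHERE { ... }'; // query text shortened for this sketch
	// Returning the chain gives the caller a promise for the cleaned data.
	// If the query fails, that promise rejects instead of staying pending.
	return runQuery(url, categoryQuery)
		.then(rawCategoryData => cleanData(rawCategoryData));
}

getCategories().then(categories => console.log(categories));
// [ 'https://example.org/category/1', 'https://example.org/category/2' ]
```

The caller does not change: await getCategories() works the same way with either version.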

In an async .then I invoke getCategories() with await, so the step waits until the query data is returned and then stores it in the catArray variable. It then maps over mainData, adds a categories key to every continent with catArray as its value, and returns the continent object.

.then(async (mainData) => {
	let catArray = await getCategories();

	return mainData.map((continent) => {
		continent.categories = catArray;
		return continent;
	});
})

Old code

Schermafdruk 2019-11-12 12 35 15

At the bottom of the function I made another .then chain. I run cleanData over the query results and after that I run my new function combineContinentWithCategory. When that function is done, the promise resolves and control goes back to the main .then chain.

Schermafdruk 2019-11-12 12 34 57

In this function I combine the URI of each continent with the URIs of the main categories. This is how I wanted to start structuring my data. But when I tried to figure out how to count up the objects and add the results to this data structure as well, I got stuck.

Schermafdruk 2019-11-12 12 35 37 Schermafdruk 2019-11-12 12 35 50

This is what I got as a result of combining the URIs. I decided to leave it as it was for now, because I did not know how to take it further. Instead I started working on getting one big array with only integers; for this I made another function with a promise inside it.

Results after new code

This is how the catArray looks

Schermafdruk 2019-11-15 10 31 06

And this is after the function runs

Schermafdruk 2019-11-15 10 33 09

Making an object of the category array

Now that every continent has an array with all the category URIs, I need to convert each entry to an object in the next .then. This is exactly the same as what I did when I turned the continent URIs into objects. The only difference is that the categories array is nested inside continent, so I need to use map twice to get into the categories.

.then((mainData)=> mainData.map(continent => {
	continent.categories = continent.categories.map(uri => {
		return {uri: uri};
	}); 
	return continent;
}))

This is how the data in categories looks when this .then has not run yet.

Schermafdruk 2019-11-15 10 33 09

And this is after the .then runs

Schermafdruk 2019-11-15 10 38 28
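The nested map can also be written so that it builds new objects instead of changing the existing continent objects, which fits the functional style a little better. This is a sketch with made-up data, not the code used in the project:

```javascript
// Hypothetical mainData after the categories were added as plain strings
const mainData = [
	{
		uri: 'https://example.org/continent/1',
		categories: ['https://example.org/category/1', 'https://example.org/category/2']
	}
];

// Outer map walks the continents, inner map wraps every category URI in an object
const withCategoryObjects = mainData.map(continent => ({
	...continent,
	categories: continent.categories.map(uri => ({ uri }))
}));

console.log(withCategoryObjects[0].categories);
// [ { uri: 'https://example.org/category/1' }, { uri: 'https://example.org/category/2' } ]
console.log(mainData[0].categories[0]); // still the string 'https://example.org/category/1'
```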

Promise chain of looping through query 50+ times (Array with objectCount)

This .then is for getting all the objectCounts of the different categories. In this .then I also need to invoke a function that I made, in which I load a new query. I need to loop over this query to get all the data I need. But this produces 50+ responses, so I need to use Promise.all in my .then to make it work. Thijs Spijker helped me a lot by explaining Promise.all and helping me get actual results instead of a chain of promises.

old function

I also added the last query that I made, in which I count every object in the different categories.

SELECT (COUNT(?category) AS ?categoryAmount) WHERE {
  
       <https://hdl.handle.net/20.500.11840/termmaster3> skos:narrower* ?continent .
  	   ?obj dct:spatial ?continent .
  
  	   <https://hdl.handle.net/20.500.11840/termmaster2803> skos:narrower* ?category .
       ?obj edm:isRelatedTo ?category .
  	   ?category skos:prefLabel ?categoryName .
  	   
} GROUP BY ?categoryName

I needed to turn the string into an integer with parseInt, and this is where I used the if statement in the cleanData code. It checks whether a value has a datatype, and if that datatype is an integer it converts the value from a string to an integer and adds it to a new array. When this is done the countUp function runs over this array, adds up every integer in it and shows the final value.

Schermafdruk 2019-11-08 13 21 59

After doing this I want to loop through a query and link all the values to each other. I tried to do this with template literals; Thijs Spijker showed me how they work and how I could use them. First I needed to make a forEach that loops through the continents, and inside it another forEach that loops through every category.

// foreach that loops over all the cleaned continent results
	cleanedContinentResults.forEach(async continentUri => {
		// foreach that loops over all the cleaned category results
		cleanedCategories.forEach(async categoryUri => {
			// Using template literals to put the results of the foreach into the query
			let catQuery = `
				PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
				PREFIX dc: <http://purl.org/dc/elements/1.1/>
				PREFIX dct: <http://purl.org/dc/terms/>
				PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
				PREFIX edm: <http://www.europeana.eu/schemas/edm/>
				PREFIX foaf: <http://xmlns.com/foaf/0.1/>
	
				SELECT (COUNT(?category) AS ?categoryAmount) WHERE {
					
					<${continentUri}> skos:narrower* ?continent .
						?obj dct:spatial ?continent .
	
					<${categoryUri}> skos:narrower* ?category .
						?obj edm:isRelatedTo ?category .
						?category skos:prefLabel ?categoryName .
					
			} GROUP BY ?categoryName`;
			// Variable with the results of the query that I made with the template literals
			let catResults = await runQuery(url, catQuery );
			// Cleaning the catQuery with my cleaning function
			let cleanResult = cleanData(catResults);
                        console.log(cleanResult);
		});
	});
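The template literal part can be isolated in a small helper that only builds the query string for one continent and category. buildCountQuery is a hypothetical name and the URIs are placeholders; only the three prefixes that the query actually uses are included to keep the sketch short.

```javascript
// Hypothetical helper: fills the continent and category URI into the count query
function buildCountQuery(continentUri, categoryUri) {
	return `
		PREFIX dct: <http://purl.org/dc/terms/>
		PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
		PREFIX edm: <http://www.europeana.eu/schemas/edm/>

		SELECT (COUNT(?category) AS ?categoryAmount) WHERE {
			<${continentUri}> skos:narrower* ?continent .
			?obj dct:spatial ?continent .
			<${categoryUri}> skos:narrower* ?category .
			?obj edm:isRelatedTo ?category .
			?category skos:prefLabel ?categoryName .
		} GROUP BY ?categoryName`;
}

const query = buildCountQuery('https://example.org/continent/1', 'https://example.org/category/1');
console.log(query.includes('<https://example.org/continent/1> skos:narrower* ?continent')); // true
```

Keeping the string building separate from the fetching makes it possible to check the generated query without running it.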

QueryResult

When I run this code it logs everything to the console, even empty arrays. I want to clean this up with an if statement so that I get a better overview of the data that comes back.

I wanted to filter all the empty arrays out of this clean result. I did this with an if statement that only shows the arrays with at least one value in them. It is still not perfect, but it is a big step. After that I made a string with template literals to give an overview of what I have right now.

if (cleanResult.length > 0) {
	let finalCatResult = countCategoryResults(cleanResult);
	// Presentation of what I have right now. with template literals.
	console.log(`Continent ${continentUri} has ${finalCatResult} in categorie ${categoryUri}`);
}

BetterQuery

getCountOfCategory() function

This function returns a promise for the same reason as the getCategories() function. But now I am running multiple queries instead of one, so I need Promise.all to get the data that I need. If I don't do this, I get an array of promises that aren't resolved yet.
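The difference between an array of unresolved promises and the resolved values can be shown with a small sketch. fakeCount is a made-up stand-in for a query that answers a little later:

```javascript
// Hypothetical stand-in for a query: resolves with a number after 10 ms
const fakeCount = n => new Promise(resolve => setTimeout(() => resolve(n * 10), 10));

// Mapping over the inputs starts every request and gives back promises, not numbers
const pending = [1, 2, 3].map(n => fakeCount(n));
console.log(pending[0] instanceof Promise); // true

// Promise.all turns the array of promises into one promise for an array of values,
// in the same order as the input
const allCounts = Promise.all(pending);
allCounts.then(counts => console.log(counts)); // [ 10, 20, 30 ]
```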


function getCountOfCategory(continentUri, categoryUri) {
	return new Promise(async(resolve) => {
		let totalResult = `
			PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
			PREFIX dc: <http://purl.org/dc/elements/1.1/>
			PREFIX dct: <http://purl.org/dc/terms/>
			PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
			PREFIX edm: <http://www.europeana.eu/schemas/edm/>
			PREFIX foaf: <http://xmlns.com/foaf/0.1/>

			SELECT (COUNT(?category) AS ?categoryAmount) WHERE {
				
				<${continentUri}> skos:narrower* ?continent .
					?obj dct:spatial ?continent .

				<${categoryUri}> skos:narrower* ?category .
					?obj edm:isRelatedTo ?category .
					?category skos:prefLabel ?categoryName .
				
		} GROUP BY ?categoryName`;

		runQuery(url, totalResult)
			.then((rawCountData) => cleanData(rawCountData))
			// .then((countUpData) => countCategoryResults(countUpData))
			.then((countResults) => resolve(countResults));
		// .then((cleanCountData => combineCountWithCategory(continentUriArray, cleanCategoryData)))
		// .then((cleanResults) => resolve(cleanResults));
	});
}

In this async .then I run the query multiple times, each time filled in with a continent URI and a category URI. I make a variable count that invokes getCountOfCategory() with the continent and category URIs as arguments. It takes the first continent and goes through it until there are no more category URIs; then it moves on to the next continent and does it all over again, until it reaches the last continent.

This gives back an array of promises. I pass all of these promises to Promise.all, which waits until every one of them has resolved. I await that result, and after that the object is returned with the count in it.

	.then(async (mainData) => {
	
		let mainDataPromiseArray = mainData.map(async continent => {

			let categoriesPromiseArray = continent.categories.map(async categorie => {
				let count = await getCountOfCategory(continent.uri, categorie.uri);
				return {
					uri: categorie.uri,
					count: count
				};
	
			});

			let newCategories = await Promise.all(categoriesPromiseArray);

			continent.categories = newCategories;

			return continent;
		});

		let newContinents = await Promise.all(mainDataPromiseArray);

		return newContinents;

	})

Results of .then function

This is what the chain looks like while it is getting all the data

Schermafdruk 2019-11-15 11 31 25

And this is what it looks like when it is done and all the promises have resolved

Schermafdruk 2019-11-15 11 44 48

Adding up every objectCount array together from one category

Now that I have arrays of objectCount, I want to add up all the values in each array and get one value as a result. I am doing this in the next .then of the chain. In this .then I invoke countCategoryResults() on every array so that each one is reduced to a single value.

.then((mainData) => mainData.map(continent => {
	// continent.categories.count = countCategoryResults(continent.categories.count);
	continent.categories = continent.categories.map(category => {
		category.count = countCategoryResults(category.count);
		return category;
	}); 
	return continent;
}))

Results I am getting

This is what one category object looks like

Schermafdruk 2019-11-15 11 58 15

When the .then function has run, it returns this

Schermafdruk 2019-11-15 11 58 22

countCategoryResults() function

function countCategoryResults(results) { return results.reduce((a, b) => a + b, 0); }
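A short usage sketch with made-up numbers shows what the reduce does. The function is repeated so the example runs on its own:

```javascript
function countCategoryResults(results) {
	// Start at 0 and add every value in the array to the running total
	return results.reduce((a, b) => a + b, 0);
}

console.log(countCategoryResults([12, 30, 8])); // 50
console.log(countCategoryResults([])); // 0, because of the start value
```

The start value of 0 matters here: without it, reduce throws a TypeError on an empty array, and some continent and category combinations return no results.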

Making global objectCount of every continent

I wanted to have a global count in the continent object. I needed to add up the count of every category in continent.categories and return that as a new property of continent. This is what that looks like:

.then((mainData) => mainData.map(continent => {
	let sum = 0;
	for (let i = 0; i < continent.categories.length; i++) {
		sum = continent.categories[i].count + sum;

	}
	return {
		categories: continent.categories,
		uri: continent.uri,
		count: sum
	};
}))

Results of .then function

It will put the totalCount into the sum

Schermafdruk 2019-11-15 12 23 47

And here the value of sum is stored as count in the continent object that is returned

Schermafdruk 2019-11-15 12 24 00
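The same total can also be computed with reduce instead of a for loop, in line with the countCategoryResults() function. This is a sketch with made-up counts, not a replacement that was used in the project:

```javascript
// Hypothetical continent with two categories that already have a count
const continent = {
	uri: 'https://example.org/continent/1',
	categories: [
		{ uri: 'https://example.org/category/1', count: 50 },
		{ uri: 'https://example.org/category/2', count: 150 }
	]
};

// Add up the count of every category, starting at 0
const total = continent.categories.reduce((sum, category) => sum + category.count, 0);

// Copy the continent and add the total as count
const withTotal = { ...continent, count: total };
console.log(withTotal.count); // 200
```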

Calculating the percentage of every category in a continent.

This is one of the last .thens in the chain. Here I want to calculate the percentage of each category count and add it as a new property to every object in continent.categories. This is of course nested, so I first loop over the categories to calculate each share (the category count divided by the continent count, so a fraction between 0 and 1) and then map over them to build the final objects.

.then((mainData) => mainData.map(continent => {
	for (let i = 0; i < continent.categories.length; i++) {
		continent.categories[i].percentage = continent.categories[i].count / continent.count;
	}	
	continent.categories = continent.categories.map(categories => {
		return {
			uri: categories.uri,
			count: categories.count,
			percentage: categories.percentage
		};
		
	});
	return continent;
}))

Results

When the for loop is running it will look like this

Schermafdruk 2019-11-15 12 42 25

And when the percentage has been added, the final data structure will look like this

Schermafdruk 2019-11-15 12 42 36
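The calculation can be sketched as one function with made-up counts. addPercentages is a hypothetical name, and the guard for a continent without any objects is an assumption that the original code does not have:

```javascript
// Hypothetical helper: adds the share of every category to a continent
function addPercentages(continent) {
	return {
		...continent,
		categories: continent.categories.map(category => ({
			...category,
			// Fraction between 0 and 1; multiply by 100 for a percentage figure.
			// A continent with count 0 would otherwise give NaN (0 / 0).
			percentage: continent.count === 0 ? 0 : category.count / continent.count
		}))
	};
}

const result = addPercentages({
	uri: 'https://example.org/continent/1',
	count: 200,
	categories: [
		{ uri: 'https://example.org/category/1', count: 50 },
		{ uri: 'https://example.org/category/2', count: 150 }
	]
});

console.log(result.categories.map(category => category.percentage)); // [ 0.25, 0.75 ]
```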

Sending it to D3.js

To get it into my D3 code I needed to make a new object with only two values in it: the name (the URI for now) and the value (the percentage). I then return this so that I can link it to the data variable that is used for the D3 visualization.

.then((mainData) => mainData.map(continent => {
	// Return one array per continent with only the two values the chart needs.
	// map builds the array directly, so no extra array and push are needed.
	return continent.categories.map(category => {
		return {
			axis: category.uri,
			value: category.percentage
		};
	});
}))

The data structure I end up with looks like this

Schermafdruk 2019-11-15 12 56 56

But the data structure I need to load into D3 looks like this

Schermafdruk 2019-11-15 12 51 51