cleaning - NathanNeelis/functional-programming GitHub Wiki
Cleaning data
My goal is to clean raw data into usable arrays.
At the start, I used the survey data from the datavis course.
Later on, I will start cleaning data from the RDW APIs.
Content
Survey data column: Aantal glazen water per dag
Survey data column: Kleur ogen
RDW data: ...
Cleaning RDW data and data from other APIs will be added later on.
Survey data column: Aantal glazen water per dag
Because my coding skills are still a bit rusty, I started off with an easy data column to clean.
This column holds the number of glasses of water each participant drinks per day.
Original column information
- 93 datapoints
- string data
To do list
- remove all empty keys
- convert strings to number / integer data.
Remove all empty keys
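To be able to use the data, I want to remove all empty entries (empty strings) from the array.
The function below does this with the filter() method:
function removeEmptySlots(arr) {
  // Keep only the entries that are not an empty string
  const cleanData = arr.filter(value => value !== "");
  return cleanData;
}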
Convert strings to integers
To work with this data, numbers are more useful than strings.
The function below converts every string to a number with the unary plus operator (+x):
function stringToNumbers(arr) {
  // The unary plus turns a numeric string such as "3" into the number 3
  const newCleanData = arr.map(x => +x);
  return newCleanData;
}
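To see what the unary plus does with different kinds of input, here is a small sketch. The sample values are made up for illustration and are not taken from the survey.

```javascript
// Made-up sample inputs, not taken from the survey data
const samples = ['3', '2.5', ' 4 ', '', 'veel'];

// The unary plus converts each string to a number
const converted = samples.map(x => +x);

console.log(converted); // [ 3, 2.5, 4, 0, NaN ]
```

Two things stand out. An empty string becomes 0, which is a good reason to remove empty entries before converting. Text that is not numeric becomes NaN, and decimals stay decimals, so the result is a number but not necessarily an integer.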
New clean column information
- 92 datapoints
- numbers
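The two steps for this column can be chained into one small pipeline. The sketch below repeats both functions so it runs on its own, and adds a hypothetical third step, removeNotANumber, that is not part of my original code. The sample column is made up.

```javascript
// Same two steps as above, repeated here so the sketch runs on its own
const removeEmptySlots = arr => arr.filter(value => value !== '');
const stringToNumbers = arr => arr.map(x => +x);

// Hypothetical extra step: drop anything that did not convert to a number
const removeNotANumber = arr => arr.filter(value => !Number.isNaN(value));

// Made-up sample column, not the real survey answers
const rawColumn = ['2', '', '8', 'veel', '1.5'];

const cleanColumn = removeNotANumber(stringToNumbers(removeEmptySlots(rawColumn)));

console.log(cleanColumn); // [ 2, 8, 1.5 ]
```

Each step takes an array and returns a new one, so the order of the steps is easy to read from the inside out.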
Survey data column: Oog kleur
To challenge myself a bit further, I chose to clean the eye color data column.
I want every eye color to be a hex color code.
Original column information
- 93 datapoints
- string data
- Most entries are already a hex color code (for example #d2691e)
To do list
- Convert all input to lowercase
- Convert all color names to hex color codes
- Remove all spaces in-between strings
- Check if there is a hashtag in front of the code
Convert all input to lowercase
Before I can start converting color names to hex color codes, I want all color names to be lowercase.
The function below converts every entry in the array to lowercase:
function toLowerCase(arr) {
  const newCleanData = arr.map(x => x.toLowerCase());
  return newCleanData;
}
Convert all color names to hex color codes
To get a uniform array of hex color codes, I have to convert all color names to color codes.
I checked on a website (see resources) which color name pairs with which hex color code.
Not all of these color names are present in the original data, but with future data in mind I included all of the main colors.
I then converted the names with the .replace() method, as shown in the code below:
function replaceColorNamesToHexcolors(arr) {
  const cleanData = arr.map(
    x => {
      return x
        .replace(/blauw/, '#0000FF')
        .replace(/blue/, '#0000FF')
        .replace(/groen/, '#008000')
        .replace(/green/, '#008000')
        .replace(/bruin/, '#A52A2A')
        .replace(/brown/, '#A52A2A')
        .replace(/rood/, '#FF0000')
        .replace(/red/, '#FF0000')
        .replace(/roze/, '#FFC0CB')
        .replace(/pink/, '#FFC0CB')
        .replace(/oranje/, '#FFA500')
        .replace(/orange/, '#FFA500')
        .replace(/geel/, '#FFFF00')
        .replace(/yellow/, '#FFFF00')
        .replace(/paars/, '#800080')
        .replace(/purple/, '#800080')
        .replace(/grijs/, '#808080')
        .replace(/gray/, '#808080')
        .replace(/wit/, '#FFFFFF')
        .replace(/white/, '#FFFFFF');
    });
  return cleanData; // Array with color names converted to hex colors.
}
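As an alternative to the long chain of .replace() calls, the same name-to-code pairs could live in a lookup object. The sketch below is one possible way to do that, not the code I actually used. The names colorNameToHex, colorNamePattern and replaceColorNamesWithLookup are made up for this example.

```javascript
// Lookup table with the same Dutch and English names and hex codes as above
const colorNameToHex = {
  blauw: '#0000FF', blue: '#0000FF',
  groen: '#008000', green: '#008000',
  bruin: '#A52A2A', brown: '#A52A2A',
  rood: '#FF0000', red: '#FF0000',
  roze: '#FFC0CB', pink: '#FFC0CB',
  oranje: '#FFA500', orange: '#FFA500',
  geel: '#FFFF00', yellow: '#FFFF00',
  paars: '#800080', purple: '#800080',
  grijs: '#808080', gray: '#808080',
  wit: '#FFFFFF', white: '#FFFFFF'
};

// One pattern that matches any known name: /blauw|blue|groen|.../g
const colorNamePattern = new RegExp(Object.keys(colorNameToHex).join('|'), 'g');

function replaceColorNamesWithLookup(arr) {
  // The callback receives the matched name and returns its hex code
  return arr.map(value =>
    value.replace(colorNamePattern, name => colorNameToHex[name])
  );
}

// Made-up sample input, already lowercased
const converted = replaceColorNamesWithLookup(['blauw', 'green', '#d2691e']);
console.log(converted); // [ '#0000FF', '#008000', '#d2691e' ]
```

Adding a color then only means adding one entry to the object. One difference to keep in mind: because of the g flag this version replaces every occurrence of a name in a string, while a chain of .replace() calls with non-global patterns replaces only the first occurrence of each name.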
Resource: https://stackoverflow.com/questions/953311/replace-string-in-javascript-array
Resource: https://stackoverflow.com/questions/7990879/how-to-combine-str-replace-expressions-in-javascript
Resource: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
Resource: Color data https://htmlcolorcodes.com/color-names/
Remove all spaces in strings
Some data points had a space between the # and the color code.
To turn these entries into valid color codes, I have to remove the spaces.
function removeSpaces(arr) {
  // The g flag removes every space, not only the first one
  const cleanData = arr.map(value => value.replace(/ /g, ''));
  return cleanData;
}
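One detail of .replace() matters when removing spaces: a regular expression without the g flag replaces only the first match. The sketch below shows the difference on a made-up entry that contains two spaces.

```javascript
// Made-up entry with two spaces, not taken from the survey data
const entry = '# d2 691e';

const firstOnly = entry.replace(/ /, '');   // removes only the first space
const allSpaces = entry.replace(/ /g, '');  // removes every space

console.log(firstOnly); // #d2 691e
console.log(allSpaces); // #d2691e
```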
Check if there is a hashtag in front of the code
A few data entries contain just the color code, without the hashtag.
To make sure these are valid color codes, I have to add a hashtag in front of the code.
I got a bit stuck here because I wanted to do this with indexOf(), but concluded that this wasn't going to work.
I took a sneak peek at the code of my support group member Marco and saw that he was using the charAt() method. That was my breakthrough.
function hexCheck(arr) { // Check if array items start with #
  const cleanData = [...arr]; // Copy the array so the input is not changed
  for (let i = 0; i < cleanData.length; i++) {
    if (cleanData[i].charAt(0) !== '#') { // If the first char is not a #
      cleanData[i] = '#' + cleanData[i]; // add the # in front of the string
    }
  }
  return cleanData; // return array with added #
}
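The same check can also be written in the style of the other cleaning functions, with map() instead of a loop. This is a sketch of an alternative, not the code I used. The name addMissingHash is made up, and startsWith() stands in for charAt(0).

```javascript
function addMissingHash(arr) {
  // Return a new array; entries that already start with # are kept as they are
  return arr.map(value => (value.startsWith('#') ? value : '#' + value));
}

// Made-up sample input
const input = ['d2691e', '#0000FF'];
const output = addMissingHash(input);

console.log(output); // [ '#d2691e', '#0000FF' ]
console.log(input);  // [ 'd2691e', '#0000FF' ]
```

Because map() builds a new array, the original input stays untouched, which makes it easier to compare the data before and after cleaning.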
New data information
After all this cleaning I still have 93 data points, but 2 of them are still not valid.
One is a combination of a color code and a color name.
The other is an RGB value.
At this point, the course advances toward the RDW datasets, and for now I decided to move on.
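For anyone picking this up later, the remaining invalid entries could be found with a pattern check. The sketch below assumes that a valid entry is a # followed by exactly six hex digits, so shorthand codes such as #fff would be flagged as well. The name findInvalidHexColors and the sample values are made up and only resemble the two kinds of invalid entries described above.

```javascript
// A # followed by exactly six hex digits, upper or lower case
const hexColorPattern = /^#[0-9a-f]{6}$/i;

function findInvalidHexColors(arr) {
  // Keep only the entries that do NOT look like a six-digit hex color
  return arr.filter(value => !hexColorPattern.test(value));
}

// Made-up sample values, not the real survey entries
const sample = ['#d2691e', '#0000FF', '#8b4513 bruin', '#rgb(139,69,19)'];
const invalid = findInvalidHexColors(sample);

console.log(invalid); // [ '#8b4513 bruin', '#rgb(139,69,19)' ]
```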