Data Cleaning - MartijnKeesmaat/frontend-data GitHub Wiki

Functional principles

On the Functional programming in Javascript page I look at what functional programming is and how to apply it.

For this assignment, I used the definition from FreeCodeCamp:

  1. Isolated functions - there is no dependence on the state of the program, which includes global variables that are subject to change
  2. Pure functions - the same input always gives the same output
  3. Functions with limited side effects - any changes, or mutations, to the state of the program outside the function are carefully controlled
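
As a small illustration of the first two principles, the sketch below contrasts a function that depends on changeable global state with a pure, isolated version. The names `prefix`, `addPrefixImpure` and `addPrefix` are hypothetical and not part of the assignment code.

```javascript
// Hypothetical example, not part of the assignment code.
let prefix = '#'; // global state that can change at any time

// Impure: the result depends on the current value of `prefix`.
const addPrefixImpure = value => prefix + value;

// Pure and isolated: everything the function needs is passed in.
const addPrefix = (value, prefixChar) => prefixChar + value;

const before = addPrefixImpure('fff'); // '#fff'
prefix = '0x';
const after = addPrefixImpure('fff'); // '0xfff', same input but a different output

const pureBefore = addPrefix('fff', '#'); // '#fff'
const pureAfter = addPrefix('fff', '#'); // '#fff', always the same
```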

What is the goal of the data cleaning exercise?

Current

The values in the original dataset were filled in in 5 different ways: the four formats below, plus empty values.

  • #000000
  • #fff
  • 000000
  • bruin (a Dutch color name, "brown")

Result

The goal is to end up with an array that contains only valid hex colors: a hash sign followed by either 3 or 6 characters, each a digit from 0-9 or a letter from a-f in lowercase or uppercase. If a value meets these conditions but misses the hash, we add it. If the value is a color name, we check whether that name is stored in our object and replace the value with the matching hex.

Empty values

If there is an empty value, we replace it with null. These values are not removed, since each value comes from a student: if empty values were removed, the order of the colors would no longer match the order of the students.
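
To make the ordering argument concrete, here is a minimal sketch with made-up sample data (`students` and `answers` are hypothetical). It compares replacing an empty answer with null against filtering it out.

```javascript
// Hypothetical sample data: one answer per student, in survey order.
const students = ['Anna', 'Ben', 'Chris'];
const answers = ['#795548', '', '#4caf50'];

// Replacing the empty answer with null keeps the indexes aligned.
const withNulls = answers.map(answer => (answer === '' ? null : answer));

// Filtering it out shifts every later answer one position forward.
const filtered = answers.filter(answer => answer !== '');

console.log(students[2], withNulls[2]); // Chris #4caf50
console.log(students[1], filtered[1]); // Ben #4caf50 (wrong student)
```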

1 - Fetch and handle the data

The first step is to retrieve the data from a local JSON file. The survey data was exported from Excel and then converted to JSON. The simplest way of doing this at the time was using an external site. After I implemented this approach, Laurens showed a way to do this with JavaScript.

fetch('data.json')
  .then(response => response.json())
  .then(json => handleData(json));

const handleData = data => {
  ...
};
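
The fetch call above assumes the request always succeeds. As a hedged sketch, the variant below also checks the HTTP status, since fetch only rejects on network errors. The names `loadJson` and `fetchFn` are hypothetical; `fetchFn` is only there so the function can be tried with a stub instead of a real request.

```javascript
// Sketch only: `loadJson` and `fetchFn` are hypothetical names.
// `fetchFn` defaults to the global fetch but can be swapped for a stub.
const loadJson = (url, fetchFn = fetch) =>
  fetchFn(url).then(response => {
    // fetch does not reject on HTTP errors like 404, so check the status explicitly
    if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
    return response.json();
  });

// Usage in the browser:
// loadJson('data.json').then(handleData).catch(error => console.error(error));
```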

2 - Only look at the eye color row

This JSON contains all the survey data. For this exercise, I only need the eye color row. Here I map over the original array and retrieve the eye color value from each entry. The map method returns a new array, which is good since we don't mutate the original array. The function also has no side effects because it only returns a value and doesn't change anything else.

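// Defined first, since a const arrow function can't be called before its declaration
const getRow = (data, query) => data.map(el => el[query]);

// Inside handleData, where data is the parsed JSON
const query = 'Kleur ogen (HEX code bijv.#ff5733)';
const eyeColors = getRow(data, query);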
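
A minimal usage sketch, assuming the JSON is an array with one object per student. The `sample` array and its `exampleColumn` key are made up for illustration; only the eye color key comes from the real data.

```javascript
const getRow = (data, query) => data.map(el => el[query]);

// Hypothetical sample of the survey JSON; `exampleColumn` is invented for this sketch.
const query = 'Kleur ogen (HEX code bijv.#ff5733)';
const sample = [
  { [query]: '#795548', exampleColumn: 'a' },
  { [query]: 'bruin', exampleColumn: 'b' }
];

const eyeColors = getRow(sample, query);
console.log(eyeColors); // [ '#795548', 'bruin' ]
console.log(sample.length); // 2, the original array is untouched
```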

3 - Replace empty values

Once we have the eye color array, I replace the empty values with null. You could check whether the length is higher than 0, meaning there is a value. What I did here is check whether the value has at least 3 characters. You can do this since hex codes need at least 3 characters. Again, no mutation or side effects.

const filterEmpties = arr => arr.map(color => (color.length >= 3 ? color : null));
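
The check above assumes every value is a string. Below is a minimal sketch of a more defensive variant, assuming empty cells could also arrive as null, undefined or whitespace. The name `replaceEmpties` is hypothetical, and treating every non-string value as empty is an assumption of this sketch, not something the dataset confirms.

```javascript
// Hypothetical helper: returns null for anything that is not a non-blank string.
const replaceEmpties = arr =>
  arr.map(value => {
    if (typeof value !== 'string') return null; // assumption: non-strings count as empty
    const trimmed = value.trim();
    return trimmed.length > 0 ? trimmed : null;
  });

const cleaned = replaceEmpties(['#fff', '', '   ', null, undefined, ' bruin ']);
console.log(cleaned);
// [ '#fff', null, null, null, null, 'bruin' ]
```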

4 - Convert color names to hex

This function is still a bit experimental and not dynamic yet: it only checks the first three entries of the color object. The idea is that it compares the string with the values in an object, which pairs each color name with its hex value. Some people typed in a string like "Groen, grijs", so I couldn't check whether the value equals the color name and used includes instead.

// Check each color name and return hex
const convertColorNamesToHex = (arr, colors) => {
  return arr.map(currentColor => {
    const str = currentColor.toString().toLowerCase();
    // TODO make dynamic
    if (str.includes(colors[0].name)) return colors[0].hex;
    if (str.includes(colors[1].name)) return colors[1].hex;
    if (str.includes(colors[2].name)) return colors[2].hex;
    else return currentColor;
  });
};

The object with color names

const validColors = [
  { name: 'bruin', hex: '#795548' },
  { name: 'groen', hex: '#4caf50' },
  { name: 'grijs', hex: '#607d8b' }
  // TODO add more colors
];
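
One possible way to resolve the "make dynamic" TODO is sketched below, using find so that every entry in the list is checked. The name `convertColorNamesToHexDynamic` is hypothetical. As with the original, the order of the list decides which color wins when a string contains several names.

```javascript
const validColors = [
  { name: 'bruin', hex: '#795548' },
  { name: 'groen', hex: '#4caf50' },
  { name: 'grijs', hex: '#607d8b' }
];

// Sketch of a dynamic lookup: use the first entry whose name occurs in the string.
const convertColorNamesToHexDynamic = (arr, colors) =>
  arr.map(currentColor => {
    // String() also accepts null and undefined, unlike calling toString() on them
    const str = String(currentColor).toLowerCase();
    const match = colors.find(color => str.includes(color.name));
    return match ? match.hex : currentColor;
  });

const converted = convertColorNamesToHexDynamic(['Bruin', 'Groen, grijs', '#fff', ''], validColors);
console.log(converted);
// [ '#795548', '#4caf50', '#fff', '' ]
```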

5 - Add a hash if it misses

This function checks if the value is a valid hex. At first, I wrote an unspecific regex that checked if the value contained 3 or 6 numbers and/or letters. Then I realized that hex values only have letters from A to F, which you can express in a regex as [a-f]. Hex values can have both lowercase and uppercase letters as well as numbers from 0-9. [A-Fa-f0-9] checks for one character within those conditions. [A-Fa-f0-9]{6} checks for 6 such characters. ([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3}) checks for either 6 or 3 such characters.

By using one regex with the hash and one without, we can tell apart hex values that are valid and those that are valid but miss the hash. If it's a valid hex, we return it. If it misses the hash, we add it. If it doesn't check out, we return null.

This function could be broken up into two functions since it does 2 things. However, sometimes making more single-use functions results in more code and is harder to read.

const addMissingHex = colorArr => {
  return colorArr.map(i => {
    // 3 or 6 char HEX, A-F lower+upper or 0-9; ^ and $ make sure the whole value matches
    const validHex = RegExp('^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$');
    const missingHashHex = RegExp('^([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$');

    // if the hash is missing, add it; if the hex is already valid, return it as is
    if (missingHashHex.test(i)) return `#${i}`;
    if (validHex.test(i)) return i;
    return null;
  });
};
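
For comparison, here is a sketch of what the split into two single-purpose functions could look like. All names (`hexPattern`, `isHex`, `ensureHash`, `normalizeHexColors`) are hypothetical, and the single regex with an optional hash is a swapped-in alternative to the two regexes above, not the approach used in the assignment.

```javascript
// Sketch with hypothetical names; the hash is optional and the pattern is anchored on both ends.
const hexPattern = /^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;

// Does one thing: decide whether a value is a hex color, with or without hash.
const isHex = value => typeof value === 'string' && hexPattern.test(value);

// Does one thing: make sure the value starts with a hash.
const ensureHash = value => (value.startsWith('#') ? value : `#${value}`);

const normalizeHexColors = colorArr =>
  colorArr.map(value => (isHex(value) ? ensureHash(value) : null));

const normalized = normalizeHexColors(['#000000', '#fff', '000000', 'bruin', null, '#ffff']);
console.log(normalized);
// [ '#000000', '#fff', '#000000', null, null, null ]
```

Whether this reads better than the single function is a judgment call: it is easier to test each part, but there is more code to follow.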

Complete script

fetch('data.json')
  .then(response => response.json())
  .then(json => handleData(json));

const validColors = [
  { name: 'bruin', hex: '#795548' },
  { name: 'groen', hex: '#4caf50' },
  { name: 'grijs', hex: '#607d8b' }
  // TODO add more colors
];

const handleData = data => {
  const query = 'Kleur ogen (HEX code bijv.#ff5733)';
  const eyeColors = getRow(data, query);
  const convertedColors = convertColorNamesToHex(eyeColors, validColors);
  const validEyeColors = addMissingHex(filterEmpties(convertedColors));
  renderDOM(validEyeColors);
};

// Only get the correct row
const getRow = (data, query) => data.map(el => el[query]);

// Replace empties with null
const filterEmpties = arr => arr.map(color => (color.length >= 3 ? color : null));

// Check each color name and return hex
const convertColorNamesToHex = (arr, colors) => {
  return arr.map(currentColor => {
    const str = currentColor.toString().toLowerCase();
    // TODO make dynamic
    if (str.includes(colors[0].name)) return colors[0].hex;
    if (str.includes(colors[1].name)) return colors[1].hex;
    if (str.includes(colors[2].name)) return colors[2].hex;
    else return currentColor;
  });
};

const addMissingHex = colorArr => {
  return colorArr.map(i => {
    // 3 or 6 char HEX, A-F lower+upper or 0-9; ^ and $ make sure the whole value matches
    const validHex = RegExp('^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$');
    const missingHashHex = RegExp('^([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$');

    // if the hash is missing, add it; if the hex is already valid, return it as is
    if (missingHashHex.test(i)) return `#${i}`;
    if (validHex.test(i)) return i;
    return null;
  });
};

const renderDOM = colorArr => {
  colorArr.forEach(color => {
    // create new el for each item
    const el = document.createElement('div');

    // Add attr
    el.classList.add('color-square');
    el.style.backgroundColor = color;
    el.textContent = color;

    // Append to body
    document.body.appendChild(el);
  });
};
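
Because every cleaning step takes an array and returns a new one, the steps could also be chained with a small helper instead of nesting the calls. The sketch below is only an illustration: `pipe` and the simplified stand-in steps (`lowerCase`, `emptiesToNull`, `addHash`) are hypothetical and not the functions from the script above.

```javascript
// Hypothetical helper: runs the given functions from left to right.
const pipe = (...fns) => input => fns.reduce((acc, fn) => fn(acc), input);

// Simplified stand-ins for the cleaning steps, only for this sketch.
const lowerCase = arr => arr.map(value => value.toLowerCase());
const emptiesToNull = arr => arr.map(value => (value.length >= 3 ? value : null));
const addHash = arr =>
  arr.map(value =>
    value !== null && /^[a-f0-9]{3}([a-f0-9]{3})?$/.test(value) ? `#${value}` : value
  );

const clean = pipe(lowerCase, emptiesToNull, addHash);

const result = clean(['FFF', '', '#000000', 'abcdef']);
console.log(result);
// [ '#fff', null, '#000000', '#abcdef' ]
```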

// pro review: wow.... super, man. nice work - Marc "swagmeister" Kunst