3 Process - kylebot0/functional-programming GitHub Wiki

Intro

On Monday we had an introduction to cleaning data. This matters because D3 doesn't accept just any kind of data; it needs to be clean. We also had a short introduction to D3, but what is D3?

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.

Next to that we also got to meet Rik, who specializes in collections and their data. He wanted to see a visualisation of the materials that are used the most throughout the collection. That sounded like a great idea, so I decided to stick with it. I also found a useful diagram type to visualize it with.

Concept

To view the concept and its different versions, check here.

Data cleaning

After we chose what kind of data we were going to use, I saw that my data didn't need a lot of cleaning. So I had to do an exercise in which I cleaned up some data. You can find more on that in: 4 Exercises.

Visualisation

First I did some research on what I wanted to accomplish: I wanted to get data about the materials. For that I needed some kind of query that retrieves all of that data. For more detail on the query, check: 5 Query. I first had to run this query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT  ?materialLabel (COUNT(?materialBroad3) AS ?countMaterialLabel) 
WHERE {
  ?cho dct:medium ?medium .
  ?medium skos:broader ?materialBroad .
  ?materialBroad skos:broader ?materialBroad2 .
  ?materialBroad2 skos:broader ?materialBroad3 .
  ?materialBroad3 skos:prefLabel ?materialLabel .
  #FILTER(REGEX (?materialLabel, "Materiaal en techniek")) . 
}
GROUP BY ?materialLabel
ORDER BY DESC(?countMaterialLabel)
LIMIT 10000

Using the count function

(COUNT(?materialBroad3) AS ?countMaterialLabel)

I get the counts for each type of material in the database, which I figured was good. But then I wanted a narrower search, and I saw that the counts were off for the narrower query. Sometimes it gave me a count of just 1, or a count of 1,300,000, which is the entire collection. So I figured that for a normal pie chart I would just need the broad search.
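As a rough illustration of what such a count does, the sketch below tallies labels in plain JavaScript. The `rows` array and the `countByLabel` helper are made up for this example; in the project the counting happens inside the SPARQL endpoint.

```javascript
// Hypothetical rows: one entry per object/material match, before counting.
const rows = [
  { materialLabel: "hout" },
  { materialLabel: "fotopapier" },
  { materialLabel: "hout" },
  { materialLabel: "hout" }
];

// Count how often each label occurs, then sort from most to least used.
function countByLabel(items) {
  const counts = new Map();
  items.forEach(item => {
    counts.set(item.materialLabel, (counts.get(item.materialLabel) || 0) + 1);
  });
  return [...counts.entries()]
    .map(([materialLabel, count]) => ({ materialLabel, count }))
    .sort((a, b) => b.count - a.count);
}

console.log(countByLabel(rows));
```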

Next steps

I changed my mind after some thinking: I wanted it to be something I was really proud of. So I looked up some sunburst diagrams, which are a more evolved version of a pie chart, and thought they looked cool, yet functional. The first step was to make the visualisation with some fake data. I looked up some tutorials and found this. It explained step by step how to make the diagram. I thought it would be pretty difficult at first, but it turned out to be not that hard. I also didn't want to reinvent the wheel, so I got a lot of inspiration from here.

Structure

To make the sunburst diagram, D3 expects a certain structure: a normal object, or JSON file, with a top node and child nodes nested under it. Below is an example of what it looks like.

{
  "name": "TOPICS",
  "children": [
    {
      "name": "Topic A",
      "children": [
        {
          "name": "Sub A1",
          "count": 4
        },
        {
          "name": "Sub A2",
          "count": 4
        }
      ]
    },
    {
      "name": "Topic B",
      "children": [
        {
          "name": "Sub B1",
          "count": 3
        },
        {
          "name": "Sub B2",
          "count": 3
        },
        {
          "name": "Sub B3",
          "count": 3
        }
      ]
    },
    {
      "name": "Topic C",
      "children": [
        {
          "name": "Sub A1",
          "count": 4
        },
        {
          "name": "Sub A2",
          "count": 4
        }
      ]
    }
  ]
}

It expects a name and a count value for every node, which was convenient because the query I'd written returns both of those values. But there was a problem: the data returned by the query didn't look like this, so I had to clean it up. The original data actually looked like this:

{
        "medium": {
          "type": "uri",
          "value": "https://hdl.handle.net/20.500.11840/termmaster26637"
        },
        "materialLabel": {
          "type": "literal",
          "value": "hout"
        },
        "countMaterialLabel": {
          "type": "typed-literal",
          "datatype": "http://www.w3.org/2001/XMLSchema#integer",
          "value": "94520"
        }
      },
      {
        "medium": {
          "type": "uri",
          "value": "https://hdl.handle.net/20.500.11840/termmaster26983"
        },
        "materialLabel": {
          "type": "literal",
          "value": "fotopapier"
        },
        "countMaterialLabel": {
          "type": "typed-literal",
          "datatype": "http://www.w3.org/2001/XMLSchema#integer",
          "value": "83418"
        }
      },

function changeJsonParent(results) {
  // Top node that will hold every material as a child
  const newArray = [{ name: "Materials", children: [] }];
  results.forEach(e => {
    const currentObject = {
      uri: e.medium.value, // used later to query the narrower terms
      name: e.materialLabel.value,
      //   value: e.countMaterialLabel.value,
      children: []
    };
    newArray[0].children.push(currentObject);
  });
  return newArray;
}

To change the data I wrote this function. It makes a newArray with the values from the original query. The newArray contains a top node with the name Materials and an array of children, which starts out empty. With forEach I loop over every result, take the values from the query, and push them onto the top node's children. I then return newArray to the fetch chain.
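If the counts turn out to be usable later, the same conversion could carry them along. The sketch below is a variation, not the function used in the project: `toHierarchy` and the sample `bindings` are illustrative, and it assumes every binding has the `medium`, `materialLabel` and `countMaterialLabel` fields shown above. The SPARQL JSON results give the count as a string, so it is converted with `Number`.

```javascript
// Sample bindings in the shape returned by the query (values from above).
const bindings = [
  {
    medium: { value: "https://hdl.handle.net/20.500.11840/termmaster26637" },
    materialLabel: { value: "hout" },
    countMaterialLabel: { value: "94520" }
  },
  {
    medium: { value: "https://hdl.handle.net/20.500.11840/termmaster26983" },
    materialLabel: { value: "fotopapier" },
    countMaterialLabel: { value: "83418" }
  }
];

// Hypothetical variant of changeJsonParent that also keeps the count.
function toHierarchy(results) {
  return [
    {
      name: "Materials",
      children: results.map(e => ({
        uri: e.medium.value,
        name: e.materialLabel.value,
        value: Number(e.countMaterialLabel.value), // "94520" becomes 94520
        children: []
      }))
    }
  ];
}

console.log(JSON.stringify(toHierarchy(bindings), null, 2));
```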

D3 SVG

I then pass all of the data from the changeJsonParent function to the makeSVG function as nodeData. A bit of this code is from https://bl.ocks.org/denjn5/e1cdbbe586ac31747b4a304f8f86efa5, because I couldn't find any other sunburst examples with code that I fully understood.

function makeSVG(nodeData) {
  console.log(nodeData);
  const width = screen.width;
  const height = screen.height / 1.3 ;
  const radius = Math.min(width, height) / 2;
  const color = d3.scaleOrdinal(
    // d3.schemeSet3
    d3.quantize(d3.interpolateRainbow, nodeData[0].children.length + 1)
  );

I also declare some constants at the top of this function, like width and height, so that I can use them later in the function. The color is something I will get to later, because it's quite complicated.

Next I append a group to my svg. I translate it to the center of the svg, so that the circle is drawn around the middle.

const g = d3
    .select("svg")
    .attr("width", width)
    .attr("height", height)
    .append("g")
    .attr("transform", "translate(" + width / 2 + "," + height / 2 + ")");

To check that the hierarchy of my data is correct and uses the right structure, I use the piece of code below. First I create the partition layout, which calculates what size every slice is going to be. This is a built-in function of D3. Next I define the root of my nodeData. I use d3.hierarchy() to set the root and build the hierarchy, and then I sum up all of the values. A best practice is to give the count values only to the children and not to the parent, because the sum adds the children's values to their parent. Because my counts aren't correct yet, I'll still have to look at this.

  // Data structure
  const partition = d3.partition().size([2 * Math.PI, radius]);

  // Find data root and sets hierarchy
  const root = d3.hierarchy(nodeData[0]).sum(d => {
    return d.value;
  });
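
To see why the counts belong on the children, here is a plain JavaScript sketch of the summing idea: a node's total is its own value plus the totals of its children, so a count stored on a parent would be added on top of its children's counts. The `sumValues` helper and the sample tree are made up for illustration; this is not D3's implementation.

```javascript
// Recursively store on each node: own value + the totals of all children.
function sumValues(node) {
  const own = Number(node.value) || 0; // a missing value counts as 0
  const fromChildren = (node.children || []).reduce(
    (total, child) => total + sumValues(child),
    0
  );
  node.total = own + fromChildren;
  return node.total;
}

// Hypothetical tree with counts on the leaves only.
const tree = {
  name: "Materials",
  children: [
    { name: "hout", value: 3, children: [] },
    {
      name: "papier",
      children: [
        { name: "fotopapier", value: 2 },
        { name: "karton", value: 1 }
      ]
    }
  ]
};

sumValues(tree);
console.log(tree.total); // 6
```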

Then I partition the root with:

partition(root)
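
Roughly speaking, the partition layout gives every node a slice of the circle in proportion to its value. The sketch below shows only that idea in plain JavaScript, for one level of children. `assignAngles` and the sample values are hypothetical, and D3's own layout also handles depth, padding and rounding.

```javascript
// Divide the range [startAngle, endAngle] over the children by value.
function assignAngles(children, startAngle = 0, endAngle = 2 * Math.PI) {
  const total = children.reduce((sum, child) => sum + child.value, 0);
  let angle = startAngle;
  return children.map(child => {
    const span =
      total === 0 ? 0 : ((endAngle - startAngle) * child.value) / total;
    const slice = { name: child.name, x0: angle, x1: angle + span };
    angle += span; // the next slice starts where this one ends
    return slice;
  });
}

const slices = assignAngles([
  { name: "hout", value: 3 },
  { name: "papier", value: 1 }
]);

// hout gets three quarters of the circle, papier the remaining quarter.
console.log(slices);
```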

I also had to draw the paths for the nodes of my data, which I do with the code below. root.descendants() returns a flat array of all nodes, each with the depth and height properties that d3.hierarchy() gave it, which I use to make my sunburst diagram. For every node I append a group, and to every group I append a path.

g.selectAll("g")
    .data(root.descendants())
    .enter()
    .append("g")
    .attr("class", "node")
    .append("path")

To give color to my diagram I use this style. It fills every path based on the first-level node it belongs to, so a parent and all of its children share a color, and a different branch gets a new color.

.style("fill", d => {
      // Get color of children
      while (d.depth > 1) d = d.parent;
      return color(d.data.name);
    })
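
The same lookup can be tried without D3. In this sketch `topLevelAncestor` and `colorFor` are hypothetical helpers, the palette is arbitrary and the nodes are made up. It only shows that every node ends up with the color of the first-level material it belongs to.

```javascript
const palette = ["#e41a1c", "#377eb8", "#4daf4a"];
const assigned = new Map();

// Hand out palette colors in order of first appearance.
function colorFor(name) {
  if (!assigned.has(name)) {
    assigned.set(name, palette[assigned.size % palette.length]);
  }
  return assigned.get(name);
}

// Walk up the tree until we reach a node directly under the root.
function topLevelAncestor(node) {
  let current = node;
  while (current.depth > 1) current = current.parent;
  return current;
}

const root = { name: "Materials", depth: 0, parent: null };
const hout = { name: "hout", depth: 1, parent: root };
const eikenhout = { name: "eikenhout", depth: 2, parent: hout };
const papier = { name: "papier", depth: 1, parent: root };

const houtColor = colorFor(topLevelAncestor(hout).name);
const eikenhoutColor = colorFor(topLevelAncestor(eikenhout).name);
const papierColor = colorFor(topLevelAncestor(papier).name);

console.log(houtColor === eikenhoutColor); // true
console.log(houtColor === papierColor); // false
```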

Children

An important part of a sunburst is of course the children that connect to a parent. I get them with a narrower query. In the previous JSON change I added a uri and a children array to every node. Now I want to query that uri and add the children returned by that query.

function changeJsonChildren(broadArray) {
  broadArray[0].children.forEach(item => {
    // Ask for the narrower terms of this material, looked up by its uri
    const queryNarrow = `
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

        SELECT  ?materialNarrow ?materialLabel (COUNT(?materialLabel) AS ?countMaterialLabel) 
        WHERE {
        VALUES ?term  {<${item.uri}>}
        ?term skos:narrower ?materialNarrow .
        #?materialNarrow skos:narrower ?materialNarrow2 .
        ?materialNarrow skos:prefLabel ?materialLabel .
        }
        GROUP BY ?materialNarrow ?materialLabel
        ORDER BY DESC(?countMaterialLabel)
        LIMIT 100
`;
    // url is the SPARQL endpoint, defined elsewhere in the project
    fetch(url + "?query=" + encodeURIComponent(queryNarrow) + "&format=json")
      .then(res => res.json())
      .then(json => {
        json.results.bindings.forEach(e => {
          // Each narrower term becomes a child of the current material
          item.children.push({
            uri: e.materialNarrow.value,
            name: e.materialLabel.value,
            // value: e.countMaterialLabel.value,
            children: []
          });
        });
      });
  });
  // The fetches are asynchronous: the children are pushed in later,
  // after this function has already returned.
}

I do almost the same thing as in the original changeJsonParent function, except that I fetch once for every parent and add the results as its children. I may want to add more levels of children in the future, so I still give every child an empty array of children.
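
The request URL for one material can be put together as in the sketch below. `buildNarrowQueryUrl` is a hypothetical helper and `https://example.org/sparql` is a placeholder endpoint, since the real `url` is defined elsewhere in the project. The query is a trimmed version of the one above, without the count.

```javascript
// Build the GET url that asks for the narrower terms of one material.
function buildNarrowQueryUrl(endpoint, uri) {
  const query = `
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?materialNarrow ?materialLabel
    WHERE {
      VALUES ?term { <${uri}> }
      ?term skos:narrower ?materialNarrow .
      ?materialNarrow skos:prefLabel ?materialLabel .
    }
    LIMIT 100`;
  // encodeURIComponent escapes characters such as <, >, # and spaces
  return endpoint + "?query=" + encodeURIComponent(query) + "&format=json";
}

const requestUrl = buildNarrowQueryUrl(
  "https://example.org/sparql", // placeholder endpoint (assumption)
  "https://hdl.handle.net/20.500.11840/termmaster26637"
);
console.log(requestUrl);
```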

Async / Promises

The only problem I'm running into is that the fetches are async. In the future I want to solve that properly, but for now I just use a timeout because I ran out of time.

.then(narrowArray => {
      setTimeout(() => {
        changeJsonChildrenOfChildren(narrowArray);
      }, 1000);
      return narrowArray;
    })
    .then(data => {
      setTimeout(() => {
        console.dir(data);
        makeSVG(data);
      }, 4000);
    });
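
One way to drop the timeouts later would be to collect the promises and wait for all of them with `Promise.all`. The sketch below is not the project's code: it assumes a hypothetical `fetchChildren(uri)` that resolves to an array of child nodes, and here that function is a stub returning fixed data so the example runs without a network.

```javascript
// Stub standing in for the real fetch of narrower terms (hypothetical).
function fetchChildren(uri) {
  const fake = { "uri:hout": [{ name: "eikenhout", children: [] }] };
  return Promise.resolve(fake[uri] || []);
}

// Start one request per material and wait until all of them are done.
function addChildren(broadArray) {
  const requests = broadArray[0].children.map(item =>
    fetchChildren(item.uri).then(children => {
      item.children = children;
    })
  );
  // Resolve with the filled tree once every request has finished.
  return Promise.all(requests).then(() => broadArray);
}

const tree = [
  {
    name: "Materials",
    children: [
      { uri: "uri:hout", name: "hout", children: [] },
      { uri: "uri:papier", name: "papier", children: [] }
    ]
  }
];

const ready = addChildren(tree);
ready.then(data => console.log(JSON.stringify(data)));
```

With a function like this, the drawing step could be chained with `.then(makeSVG)` instead of waiting a fixed number of seconds.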