Timeline - JulianThijssen/TelegraafES GitHub Wiki

Timeline

The timeline shown on the search engine is generated in Python with Matplotlib.

When a user enters a search query, the engine consults Elasticsearch, which returns a number of results. We iterate over these results, write the year of each result's date to a file, and then run a Python program called timeline.py.
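The step of writing years to a file could look roughly like the sketch below. It assumes the results arrive as an Elasticsearch-style response dictionary (hits under `hits.hits`, fields under `_source`). The field name `date`, the file name `years.txt`, and both function names are illustrative assumptions, not taken from the project.

```python
import re

# First standalone four-digit number in a date string, e.g. "1918-11-11".
# The actual date format in the index is an assumption.
YEAR_PATTERN = re.compile(r"\b(\d{4})\b")


def extract_years(response, date_field="date"):
    """Return the year of every hit that carries a parseable date."""
    years = []
    for hit in response.get("hits", {}).get("hits", []):
        value = str(hit.get("_source", {}).get(date_field, ""))
        match = YEAR_PATTERN.search(value)
        if match:  # hits without a usable date are skipped
            years.append(int(match.group(1)))
    return years


def write_year_file(years, path="years.txt"):
    """Write one year per line so the plotting script can read it back."""
    with open(path, "w", encoding="utf-8") as handle:
        for year in years:
            handle.write(f"{year}\n")
```

Skipping hits without a parseable date keeps the year file clean, at the cost of silently leaving those documents out of the timeline.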

This program reads the year file and tallies how many times each year occurs in it. It then generates a plot that shows each year and the number of hits for that year. The image is saved to a file and loaded into the web page as the page is being built.
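A minimal sketch of what a script like timeline.py might do is shown here. It is not the project's actual code: the file names, the function names, and the choice of a bar chart are assumptions. The Matplotlib calls themselves (`subplots`, `bar`, `savefig`) are standard.

```python
from collections import Counter


def tally_years(path="years.txt"):
    """Count how often each year occurs in the year file (one year per line)."""
    with open(path, encoding="utf-8") as handle:
        return Counter(int(line) for line in handle if line.strip())


def plot_timeline(counts, out_path="timeline.png"):
    """Draw hits per year as a bar chart and save it as an image."""
    # Imported here so that tallying also works where Matplotlib is missing.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, suitable for a web server
    import matplotlib.pyplot as plt

    years = sorted(counts)
    fig, ax = plt.subplots(figsize=(8, 2.5))
    ax.bar(years, [counts[year] for year in years], width=1.0)
    ax.set_xlabel("Year")
    ax.set_ylabel("Hits")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)  # release the figure's memory between requests
```

Calling `plot_timeline(tally_years())` would then read `years.txt` and write `timeline.png`, which the page can reference.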

Timeline image

This approach seems fairly robust: it generates the image quickly and, for most queries, does not cause any significant delay in page loading. Matplotlib also offers a lot of functionality for designing how the graph should look.

A downside is that if the query returns a very large number of documents, plotting the timeline may slow down page loading. The timeline functionality might then need to be disabled or confined to the top k results.
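One way to express both fallbacks is sketched below, assuming the years arrive in ranking order. The function name, the default `k`, and the `disable_above` threshold are hypothetical and would need tuning against real queries.

```python
from itertools import islice


def years_for_timeline(ranked_years, total_hits, k=1000, disable_above=None):
    """Return the years to plot, or None when the timeline should be skipped.

    ranked_years: iterable of years, best-ranked result first (assumed order).
    total_hits:   total number of documents the query matched.
    """
    if disable_above is not None and total_hits > disable_above:
        return None  # caller renders the page without a timeline
    # islice stops after k items, so a long result stream is not fully consumed.
    return list(islice(ranked_years, k))
```

Note that confining the plot to the top k results changes what it shows: the timeline then reflects only the best-ranked documents, not the full result set.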

Overall, we are happy with the quality of the timeline: it is very obvious which periods of time produce the most results. However, the exact year can be hard to read off the plot.