Welcome to the "Hadoop practical work" repository! - gregoiremassot/TPs-Hadoop GitHub Wiki

Installation guide: here

What is it?

It is a Java Hadoop program that takes a graph of web pages as input and returns the PageRank of each page. A page's PageRank depends on the relative number of hyperlinks pointing to it. The program reproduces the historical Google algorithm in a simplified way.
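To make the idea concrete, here is a minimal sketch of the PageRank iteration in plain Java, without the Hadoop libraries. It is not the code from this repository: the class name `PageRankSketch`, the damping factor of 0.85, the three-page example graph and the number of iterations are all assumptions chosen for illustration. The sketch also assumes that every page appears as a key in the link map and simply skips pages with no outgoing links.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only, not the repository's Hadoop job.
final class PageRankSketch {

    // Assumed damping factor; 0.85 is the value commonly quoted for PageRank.
    static final double DAMPING = 0.85;

    // One iteration: each page shares its current rank equally among its outgoing links.
    static Map<String, Double> iterate(Map<String, List<String>> links,
                                       Map<String, Double> ranks) {
        int n = links.size();
        Map<String, Double> next = new HashMap<>();
        // Every page starts with the "random jump" share.
        for (String page : links.keySet()) {
            next.put(page, (1.0 - DAMPING) / n);
        }
        // "Map"-like step: emit rank / outDegree to each target.
        // "Reduce"-like step: sum the contributions per target (done by merge).
        for (Map.Entry<String, List<String>> entry : links.entrySet()) {
            List<String> targets = entry.getValue();
            if (targets.isEmpty()) {
                continue; // pages without outgoing links are ignored in this sketch
            }
            double share = ranks.get(entry.getKey()) / targets.size();
            for (String target : targets) {
                next.merge(target, DAMPING * share, Double::sum);
            }
        }
        return next;
    }

    // Start from a uniform rank and apply a fixed number of iterations.
    static Map<String, Double> pageRank(Map<String, List<String>> links, int iterations) {
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) {
            ranks.put(page, 1.0 / links.size());
        }
        for (int i = 0; i < iterations; i++) {
            ranks = iterate(links, ranks);
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Made-up graph: A links to B and C, B links to C, C links to A.
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", Arrays.asList("B", "C"));
        links.put("B", Arrays.asList("C"));
        links.put("C", Arrays.asList("A"));
        pageRank(links, 20).forEach(
                (page, rank) -> System.out.printf("%s %.4f%n", page, rank));
    }
}
```

In this made-up graph, page C receives links from both A and B, so it should end up with a higher rank than B, which is linked only from A. In a Hadoop job, the two commented steps would typically be split between a mapper and a reducer, with one job run per iteration.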

Hadoop is an open-source framework for storing and processing very large data sets. It became very popular a few years ago with the growing importance of Big Data analytics.

With which tools?

The program is written in Java with the Hadoop libraries. I wrote extra shell scripts to make running the program easier.

The tests were carried out on the Cloudera CDH 5.4 virtual machine on VirtualBox.

For what occasion?

This code was the homework for the "Hadoop" part of the Big Data course I attended at the Ecole des Mines. I wrote the short Java program in November 2015.