Project Overview - achalsoni81/Jarvis- GitHub Wiki

What has been done?

The following fundamental tasks of the system have been accomplished:

  • Creation of indices and metadata through Hadoop to allow low-latency random point lookups
    • Done generically, so that data can be sorted by different attributes
  • Use of metadata to optimize range-scan queries
  • Basic write functionality has been implemented
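To illustrate how per-block metadata can serve both point lookups and range scans, here is a minimal Python sketch. It assumes records are sorted by a caller-chosen attribute and split into fixed-size blocks, each tagged with its minimum and maximum key. `SortedIndex`, `BlockMeta`, and `block_size` are hypothetical names, and the in-memory lists stand in for the index files Hadoop would produce; this is not the project's actual index format.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class BlockMeta:
    # Hypothetical per-block metadata: the smallest and largest key in the block.
    min_key: Any
    max_key: Any


class SortedIndex:
    """Hypothetical index: records sorted by any attribute, split into fixed-size blocks."""

    def __init__(self, records: List[Record], key: Callable[[Record], Any], block_size: int = 4):
        ordered = sorted(records, key=key)
        self.blocks: List[List[Tuple[Any, Record]]] = [
            [(key(r), r) for r in ordered[i:i + block_size]]
            for i in range(0, len(ordered), block_size)
        ]
        self.meta = [BlockMeta(b[0][0], b[-1][0]) for b in self.blocks]
        self.max_keys = [m.max_key for m in self.meta]
        self.blocks_read = 0  # counts blocks actually scanned, to show pruning

    @staticmethod
    def _scan_block(block: List[Tuple[Any, Record]], lo: Any, hi: Any) -> List[Record]:
        # Binary search inside one sorted block for keys in [lo, hi].
        keys = [k for k, _ in block]
        return [r for _, r in block[bisect_left(keys, lo):bisect_right(keys, hi)]]

    def point_lookup(self, k: Any) -> List[Record]:
        # The first block whose max_key >= k is the first one that can contain k;
        # keep going while later blocks still start at or below k (duplicate keys).
        i = bisect_left(self.max_keys, k)
        out: List[Record] = []
        while i < len(self.blocks) and self.meta[i].min_key <= k:
            self.blocks_read += 1
            out.extend(self._scan_block(self.blocks[i], k, k))
            i += 1
        return out

    def range_scan(self, lo: Any, hi: Any) -> List[Record]:
        out: List[Record] = []
        for meta, block in zip(self.meta, self.blocks):
            # Metadata check: skip blocks whose [min_key, max_key] cannot overlap [lo, hi].
            if meta.max_key < lo or meta.min_key > hi:
                continue
            self.blocks_read += 1
            out.extend(self._scan_block(block, lo, hi))
        return out


records = [{"id": i, "ts": 100 - i} for i in range(10)]
by_id = SortedIndex(records, key=lambda r: r["id"])
print(by_id.range_scan(5, 6))  # [{'id': 5, 'ts': 95}, {'id': 6, 'ts': 94}]
print(by_id.blocks_read)       # 1
```

Because the sort attribute is passed in as `key`, the same class can build one index by `id` and another by `ts`, in the spirit of sorting data by different attributes. The range scan reads only the one block whose key range overlaps `[5, 6]`.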

What is left to be done?

Most important tasks

  • Continue developing the write functionality, including:
    • High performance for both streaming and batch writes
    • A program that manages how data is split between the frontend and Hadoop, triggering compactions when appropriate
    • Persisting data to local disk on the frontend machine while making it immediately available for processing by Hadoop (in effect, streaming into Hadoop without actually sending the data over the network)
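One way to picture the planned write path is a small buffer that accepts streaming and batch writes, flushes them into sorted local segments, and compacts those segments into a form handed off to Hadoop, while reads see a single merged view. The Python sketch below assumes that layering; `FrontendWriteBuffer`, `flush_threshold`, and `max_local_segments` are hypothetical names, and plain lists stand in for local disk files and HDFS. It is not the project's design.

```python
from bisect import bisect_left
from typing import Any, Dict, Iterable, List, Optional, Tuple

Segment = List[Tuple[Any, Any]]  # sorted (key, value) pairs


class FrontendWriteBuffer:
    """Hypothetical frontend write path: memtable -> local segments -> compacted Hadoop segments."""

    def __init__(self, flush_threshold: int = 3, max_local_segments: int = 2):
        self.flush_threshold = flush_threshold        # assumed flush trigger (entries)
        self.max_local_segments = max_local_segments  # assumed compaction trigger (segments)
        self.memtable: Dict[Any, Any] = {}
        self.local_segments: List[Segment] = []   # stand-in for files on the frontend's local disk
        self.hadoop_segments: List[Segment] = []  # stand-in for data handed off to Hadoop

    def write(self, key: Any, value: Any) -> None:
        # Streaming path: one record at a time.
        self.memtable[key] = value
        self._maybe_flush()

    def write_batch(self, items: Iterable[Tuple[Any, Any]]) -> None:
        # Batch path: apply all records, then check thresholds once.
        self.memtable.update(items)
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if len(self.memtable) >= self.flush_threshold:
            self.local_segments.append(sorted(self.memtable.items()))
            self.memtable = {}
            if len(self.local_segments) >= self.max_local_segments:
                self._compact()

    def _compact(self) -> None:
        # Merge local segments oldest-first so newer values overwrite older ones,
        # then hand the single sorted result off to the Hadoop side.
        merged: Dict[Any, Any] = {}
        for segment in self.local_segments:
            merged.update(segment)
        self.hadoop_segments.append(sorted(merged.items()))
        self.local_segments = []

    def read(self, key: Any) -> Optional[Any]:
        # Unified view: newest data wins, regardless of where it currently lives.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.hadoop_segments + self.local_segments):
            keys = [k for k, _ in segment]
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None


buf = FrontendWriteBuffer()
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    buf.write(k, v)
buf.write_batch([("a", 10), ("d", 4), ("e", 5)])
print(buf.read("a"))             # 10
print(len(buf.hadoop_segments))  # 1
```

The count-based thresholds are placeholders; a real policy could instead weigh bytes written, segment age, or load on the Hadoop cluster.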