milestone - Gukie/building-recommend GitHub Wiki

version 1 (completed on 2017.11.08)

Architecture: JSoup + MySQL + Spring microservices

  • crawl data from websites (JSoup)
  • store it in the DB (MySQL)
  • generate an Excel report (Apache POI)
  • send an email to the recipient (Gmail OAuth2)

version 2

This version is mainly for learning Big Data technologies. The architecture changes to: MongoDB + Redis + ELK

  • MongoDB (stores the original data)
  • ELK (indexes the data)
  • Redis (stores the ES index)

The whole process might look like the following:

  • crawl the original data into MongoDB
  • build aggregated data from MongoDB and index it into ES
  • store the ES index in Redis to speed up searching
  • store the already existing data in Redis rather than in memory
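
The last two steps of the process above can be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual code: the class `CrawlPipelineSketch` and its methods are hypothetical, plain in-memory collections stand in for MongoDB and Redis, and a `Function` stands in for an ES query. In a real setup these would be replaced by the corresponding clients.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: in-memory collections stand in for MongoDB and Redis,
// and a Function stands in for an ES query.
class CrawlPipelineSketch {
    private final Map<String, String> rawStore = new HashMap<>();          // stand-in for MongoDB
    private final Set<String> seenIds = new HashSet<>();                   // stand-in for a Redis set
    private final Map<String, List<String>> searchCache = new HashMap<>(); // stand-in for Redis
    private final Function<String, List<String>> searchBackend;            // stand-in for ES
    private int backendCalls = 0;

    CrawlPipelineSketch(Function<String, List<String>> searchBackend) {
        this.searchBackend = searchBackend;
    }

    // Stores a crawled document only if its id has not been seen before.
    boolean storeIfNew(String id, String rawDocument) {
        if (!seenIds.add(id)) {
            return false; // already crawled, skip it
        }
        rawStore.put(id, rawDocument);
        return true;
    }

    // Cache-aside lookup: answer from the cache when possible, otherwise
    // query the backend once and remember the result.
    List<String> search(String query) {
        List<String> cached = searchCache.get(query);
        if (cached != null) {
            return cached;
        }
        backendCalls++;
        List<String> result = searchBackend.apply(query);
        searchCache.put(query, result);
        return result;
    }

    int backendCalls() {
        return backendCalls;
    }

    public static void main(String[] args) {
        CrawlPipelineSketch pipeline =
                new CrawlPipelineSketch(q -> Arrays.asList(q + "-building-1", q + "-building-2"));

        System.out.println(pipeline.storeIfNew("b1", "raw page 1"));       // first time: stored
        System.out.println(pipeline.storeIfNew("b1", "raw page 1 again")); // duplicate: skipped

        pipeline.search("plateA"); // goes to the backend
        pipeline.search("plateA"); // served from the cache
        System.out.println("backend calls: " + pipeline.backendCalls());
    }
}
```

Keeping the "already seen" ids outside the crawler process is what lets the check survive restarts, which is presumably the motivation for moving that data from memory into Redis.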

1. Design:

ELK will have 3 collections:

  • collection1: aggregated building data with the average price in each plate
  • collection2: trimmed data from MongoDB (maybe the whole data)
  • collection3: flagged data from MongoDB
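
As an illustration of what collection1 might hold, the sketch below computes the average price per plate in plain Java. The `Building` class and its `plate` and `price` fields are assumptions made for this example; the real document shape in MongoDB is not specified here, and in practice the aggregation could equally be done by MongoDB itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PlateAverages {
    // Hypothetical record shape: the field names are assumptions.
    static final class Building {
        final String plate;
        final double price;

        Building(String plate, double price) {
            this.plate = plate;
            this.price = price;
        }
    }

    // Groups buildings by plate and averages their prices.
    // A TreeMap keeps the plates in sorted order for stable output.
    static Map<String, Double> averagePriceByPlate(List<Building> buildings) {
        return buildings.stream().collect(Collectors.groupingBy(
                (Building b) -> b.plate,
                TreeMap::new,
                Collectors.averagingDouble((Building b) -> b.price)));
    }

    public static void main(String[] args) {
        List<Building> buildings = Arrays.asList(
                new Building("A", 100.0),
                new Building("A", 200.0),
                new Building("B", 50.0));

        // Each entry corresponds to one aggregated document to index.
        System.out.println(averagePriceByPlate(buildings)); // prints {A=150.0, B=50.0}
    }
}
```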