# Question Answers
# What would you change in your architecture to cope with the load?
- Divide the load of 100,000 pages into equal shards: 100,000 / 13. Why 13? Because the alphabet has 26 letters and each cron handles page names starting with 2 letters.
- I will create 13 crons running simultaneously.
- Each cron handles the fan pages whose names start with its two assigned letters, in sequence.
- For example, the 1st cron will handle all pages starting with A and B, the 2nd will handle C and D, and the rest follow likewise up to the 13th cron with Y and Z.
- Use MongoDB.
- Right now I have used MySQL, considering a small application and a limited number of fan pages.
- If we have to deal with more than 100,000 fan pages, with crons inserting into the DB every 10 minutes, it becomes a big-data problem.
- And MongoDB is a very good choice for storing such a huge amount of data and doing big-data analysis.
- The idea is again the same as above: create 13 documents, each holding the fan pages whose names start with two letters, in sequence.
- For example, the 1st document will hold all pages starting with A and B, the 2nd C and D, and the rest follow likewise up to the 13th document with Y and Z. A sketch of this letter-based shard assignment follows this list.
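A minimal sketch of the letter-based shard assignment in Python, assuming the 13 A-B through Y-Z shards described above; the function name `shard_for_page` is my own illustration, not part of the project:

```python
# Map a fan-page name to one of the 13 shards (crons / MongoDB documents).
# Shard 0 covers A-B, shard 1 covers C-D, ..., shard 12 covers Y-Z.
def shard_for_page(page_name: str) -> int:
    first = page_name.strip().upper()[0]
    if not "A" <= first <= "Z":
        return 12  # illustrative fallback for non-alphabetic names
    return (ord(first) - ord("A")) // 2

assert shard_for_page("Avengers") == 0    # A, B -> shard 0
assert shard_for_page("coca-cola") == 1   # C, D -> shard 1
assert shard_for_page("Zara") == 12       # Y, Z -> shard 12
```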
# What kind of other possible problems would you think of?
I will try to answer in the context of the above answer.
- A possible problem: what if all 100,000 pages start with the same letter? That would be the worst case.
- I would add a condition to handle this case: cap each cron and each document at roughly 100,000 / 13 ≈ 7,700 pages, to maintain and divide the load evenly.
- For FB fan pages I have used the FB Graph API. It requires an "access_token", and the life of a long-lived/extended "access_token" is about 60 days.
- So, before it expires we need to re-acquire the "access_token"; a sketch of such an expiry check follows this list.
- Likewise, if we are using any other social network's API, we need to keep that token updated as well.
- We are inserting a lot of data every 10 minutes, and a MongoDB document has a size limit of 16 MB per document, so we need to handle failed insertions (e.g. by rolling over to a new document, as sketched below).
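A minimal sketch of the "re-acquire before expiry" step, assuming we store each token together with the date it was acquired. The Graph API's `fb_exchange_token` grant is its documented way to obtain a long-lived token, but everything else here (the 50-day refresh margin, the function and parameter names) is my own illustration:

```python
import datetime
import requests  # any HTTP client would do

GRAPH_URL = "https://graph.facebook.com/oauth/access_token"
REFRESH_AFTER_DAYS = 50  # refresh well before the ~60-day expiry

def refresh_if_needed(token: str, acquired_on: datetime.date,
                      app_id: str, app_secret: str) -> str:
    """Return a fresh long-lived access_token, re-acquiring it if it is old."""
    age_days = (datetime.date.today() - acquired_on).days
    if age_days < REFRESH_AFTER_DAYS:
        return token  # still fresh enough
    resp = requests.get(GRAPH_URL, params={
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": token,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]
```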
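And a minimal sketch of handling the 16 MB document limit, assuming pymongo and per-shard documents as above: when a `$push` would exceed the limit, roll over to a new document for that shard. The collection and field names are hypothetical:

```python
from pymongo import MongoClient
from pymongo.errors import DocumentTooLarge, WriteError

client = MongoClient()  # assumed local MongoDB; adjust the URI as needed
db = client["fanpages"]

def append_stats(shard: int, stats: dict) -> None:
    """Push one 10-minute stats record into the shard's current document;
    on hitting the 16 MB BSON limit, roll over to a fresh document."""
    seq = (db["shard_seq"].find_one({"_id": shard}) or {"seq": 0})["seq"]
    doc_id = f"shard-{shard}-{seq}"
    try:
        db["page_stats"].update_one({"_id": doc_id},
                                    {"$push": {"records": stats}},
                                    upsert=True)
    except (DocumentTooLarge, WriteError):
        # Current document is full: bump the sequence and retry.
        db["shard_seq"].update_one({"_id": shard},
                                   {"$inc": {"seq": 1}}, upsert=True)
        append_stats(shard, stats)
```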
# How would you propose to control data quality?
Well, we are depending on the respective API calls.
- To manage data quality we need to test the application and the crons manually as well as with automated tests. We need to cover all possible failure scenarios in the automated/unit test scripts.
- Back-and-forth data verification: cross-check what the API returned against what was stored in the DB (a sketch follows this list).
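A minimal sketch of that cross-check, assuming one stored row and a fresh Graph API response for the same page; `fan_count` is a real Graph API page field, but the helper and the choice of fields are my own illustration:

```python
def verify_record(stored: dict, api_response: dict) -> list:
    """Compare a stored row against a fresh API response for the same page.
    Returns the list of mismatched fields (empty list = consistent data)."""
    return [field for field in ("name", "fan_count")
            if stored.get(field) != api_response.get(field)]

# Example: flag a page whose stored fan count drifted from the API.
stored_row = {"name": "Some Page", "fan_count": 1200}
fresh = {"name": "Some Page", "fan_count": 1230}
assert verify_record(stored_row, fresh) == ["fan_count"]
```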