#Team Aurora - Project Milestone 4: Asynchronous Services and Experiment Catalogs
##Team Members:
- Kushal Sheth (kmsheth)
- Pratik Sanghvi (psanghvi)
- Srikanth Srinivas Holavanahalli (sriksrin)
- Vikrant Kaushal (vkaushal)
##Instructions Manual: This page details Milestone 4 and explains how to run and execute the microservices on the Mesos cluster. Jobs are submitted asynchronously to the Mesos computing cluster, and the user is able to monitor the progress of their job request. We have also listed the tools and methods we used for this milestone.
##Prerequisites:
- Multiple EC2 instances
- Valid Google account
##Tools used:
- Java: Maven, Jersey for REST APIs, Thrift API
- Python: Flask
- CI/CD: Travis CI, AWS S3, AWS CodeDeploy, AWS EC2
- Database: PostgreSQL
##Aim: In continuation of the project, this milestone focuses on running the services on the Mesos cluster and returning the results to the users. NOTE: Each request submitted from the UI has a ‘requestId’ that uniquely identifies it.
##Design Questions:
###1] What are the job states on the Mesos cluster?
- A request can be in one of the following job states at any given time (see the sketch after this list):
- PENDING
- ASSIGNED
- STARTING
- RUNNING
- FINISHED/FAILED
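These states describe the scheduler-side lifecycle of a submitted job. Below is a minimal sketch of how they could be mirrored on our side so the job record in PostgreSQL can store them; the enum name and the `isTerminal()` helper are our own illustration, not part of the Aurora/Mesos API.

```java
// Illustrative enum mirroring the cluster-side job states; not an Aurora/Mesos class.
public enum ClusterJobState {
    PENDING, ASSIGNED, STARTING, RUNNING, FINISHED, FAILED;

    /** A job in a terminal state will not change again, so polling can stop. */
    public boolean isTerminal() {
        return this == FINISHED || this == FAILED;
    }
}
```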
###2] How do you monitor jobs on the Mesos cluster?
- For every new job we create a unique requestId, and a jobId is generated on the basis of this requestId.
- The ForecastTriggerWorker submits this job to the Mesos cluster.
- The APIGateway, on the basis of the requestId, generates the jobId and fetches the details of the job using getJobDetails() (a sketch of the requestId-to-jobId mapping follows this list).
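A minimal sketch of the requestId-to-jobId mapping: the class and `getJobDetails()` body below are hypothetical stand-ins for the gateway's own code, but the naming convention matches the ‘job_aurora’ + requestId job name used by the ForecastTriggerWorker (see the Forecast Trigger section below).

```java
// Hypothetical helper illustrating how the APIGateway can derive the job id from the requestId.
public final class JobIdMapper {

    /** The Aurora job name/id is derived directly from the request id. */
    public static String toJobId(String requestId) {
        return "job_aurora" + requestId;
    }

    /** Stand-in for the gateway's lookup of task details for a submitted request. */
    public static String getJobDetails(String requestId) {
        String jobId = toJobId(requestId);
        // ...query the scheduler (see the API Gateway section) for the tasks of this jobId...
        return jobId;
    }
}
```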
###3] How do you reliably record state changes as the job progresses through the microservices?
- Whenever a user submits a new job, it goes through the pipeline of microservices, and at each service we update the job progress in the database (a sketch follows this list).
- The user needs to refresh the browser to view the continuing status changes of the job.
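A minimal sketch of the per-service status update, assuming a PostgreSQL table `job_status(request_id, service, status, updated_at)` with a unique constraint on (request_id, service); the table, column names, and connection handling are assumptions for illustration, not the exact schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Each microservice in the pipeline can record its progress for a request like this.
public class JobStatusRecorder {
    private static final String UPSERT =
        "INSERT INTO job_status (request_id, service, status, updated_at) " +
        "VALUES (?, ?, ?, now()) " +
        "ON CONFLICT (request_id, service) DO UPDATE SET status = EXCLUDED.status, updated_at = now()";

    public static void recordStatus(String jdbcUrl, String requestId, String service, String status)
            throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);          // assumed URL with credentials
             PreparedStatement ps = conn.prepareStatement(UPSERT)) {
            ps.setString(1, requestId);
            ps.setString(2, service);
            ps.setString(3, status);
            ps.executeUpdate();
        }
    }
}
```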
###4] What happens if the microservice that submits the job crashes? How do you recover?
- In our architecture, we have 3 EC2 instances, and each instance runs the ForecastTriggerWorker microservice. If this microservice crashes on one of these instances, the same microservice will automatically be triggered on a different instance through RabbitMQ (see the consumer sketch after this list).
- This is because the crashed microservice won't be able to send the job-completion acknowledgement, so the job/message will be redelivered to a different instance.
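A minimal sketch of this recovery behavior with the RabbitMQ Java client, using manual acknowledgements: if the worker dies before `basicAck`, the broker requeues the message and another instance's consumer picks it up. The queue name, broker host, and `submitToMesos()` helper are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class ForecastTriggerConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq-host");                       // assumption: broker host
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.queueDeclare("forecast-trigger-jobs", true, false, false, null); // assumed queue name
            channel.basicQos(1);                                 // at most one unacked job per worker
            DeliverCallback onDeliver = (tag, delivery) -> {
                String requestId = new String(delivery.getBody(), StandardCharsets.UTF_8);
                submitToMesos(requestId);                        // hypothetical job submission
                // Ack only after successful submission; a crash before this point leaves the
                // message unacked, so RabbitMQ redelivers it to a consumer on another instance.
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            };
            channel.basicConsume("forecast-trigger-jobs", false, onDeliver, tag -> { });
            Thread.currentThread().join();                       // keep the consumer alive
        }
    }

    private static void submitToMesos(String requestId) { /* hypothetical submission logic */ }
}
```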
###5] How do you model the metadata about a submission state and expose it to the user?
- For each job, we display all the tasks with their taskID and taskExecutionStatus in the UI.
- We also display the GIF file to the user on completion of the job.
###6] How do you manage user-selected resubmissions?
- Each job is mapped to a ‘requestId’, so if a job is resubmitted, a new task is created against that ‘requestId’.
##UPDATES TO ARCHITECTURE: We used thrift.exe with api.thrift as input to generate all the Aurora Thrift client classes.
###1] Forecast Trigger:
- The job is created using the Aurora Thrift client with the following entities:
  - Role: ‘team-aurora’
  - Env: ‘devel’
  - Job-name: ‘job_aurora’ + requestId
  - CPU: 0.25
  - RAM: 200 MB
  - DISK: 500 MB
- Using the above configuration and Docker, we create two processes.
- To submit the new job we use the ‘AuroraSchedulerManagerClient’ class (a submission sketch follows this list).
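A minimal sketch of the job submission, assuming the classes generated from api.thrift expose JobKey, TaskConfig, JobConfiguration, and an AuroraSchedulerManager client (the ‘AuroraSchedulerManagerClient’ above). Field and method names may differ slightly depending on the Thrift/Aurora version used for generation, the scheduler endpoint is an assumption, and the Docker/two-process executor configuration is omitted.

```java
import org.apache.aurora.gen.*;                       // or wherever the generated Thrift classes live
import org.apache.thrift.protocol.TJSONProtocol;
import org.apache.thrift.transport.THttpClient;

public class AuroraJobSubmitter {
    public Response submit(String requestId) throws Exception {
        THttpClient transport = new THttpClient("http://aurora-scheduler:8081/api"); // assumed endpoint
        AuroraSchedulerManager.Client client =
            new AuroraSchedulerManager.Client(new TJSONProtocol(transport));

        JobKey key = new JobKey()
            .setRole("team-aurora")
            .setEnvironment("devel")
            .setName("job_aurora" + requestId);

        TaskConfig task = new TaskConfig()
            .setJob(key)
            .setNumCpus(0.25)
            .setRamMb(200)
            .setDiskMb(500)
            .setIsService(false);

        JobConfiguration job = new JobConfiguration()
            .setKey(key)
            .setTaskConfig(task)
            .setInstanceCount(1);

        // Depending on the Aurora version, createJob may take additional arguments (e.g. a lock/session).
        return client.createJob(job);
    }
}
```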
###2] API Gateway:
- We use the ‘ReadOnlySchedulerClient’ class of the Thrift API to get the task statuses (see the sketch after this list).
- Since each request is mapped to a unique job, we use the ‘requestId’ to map all the tasks.
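A minimal sketch of the status lookup through the read-only scheduler client, assuming the generated classes follow Aurora's api.thrift (TaskQuery, ScheduledTask, ScheduleStatus); the scheduler endpoint is an assumption and names may differ slightly in the generated code.

```java
import java.util.Collections;
import org.apache.aurora.gen.*;                       // or wherever the generated Thrift classes live
import org.apache.thrift.protocol.TJSONProtocol;
import org.apache.thrift.transport.THttpClient;

public class TaskStatusFetcher {
    public void printTaskStatuses(String requestId) throws Exception {
        THttpClient transport = new THttpClient("http://aurora-scheduler:8081/api"); // assumed endpoint
        ReadOnlyScheduler.Client client = new ReadOnlyScheduler.Client(new TJSONProtocol(transport));

        // The job name encodes the requestId, so one query returns all tasks for that request.
        JobKey key = new JobKey()
            .setRole("team-aurora")
            .setEnvironment("devel")
            .setName("job_aurora" + requestId);
        TaskQuery query = new TaskQuery().setJobKeys(Collections.singleton(key));

        Response response = client.getTasksStatus(query);
        for (ScheduledTask task : response.getResult().getScheduleStatusResult().getTasks()) {
            System.out.println(task.getAssignedTask().getTaskId() + " -> " + task.getStatus());
        }
    }
}
```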
##How to RUN:
##Step 1: Go to https://github.com/airavata-courses/TeamAurora. In the branches menu, open the feature-apigateway branch. Edit the ‘Edit Me.txt’ file inside the branch and commit the change. This triggers Travis CI, which builds the service and uploads the zip folder to S3. Along with this, AWS CodeDeploy is configured to deploy the service on the EC2 instance.
Once Travis CI has successfully built the service and AWS CodeDeploy has finished deploying/starting the Docker containers on the EC2 instance, go to the dataingestorworker branch, make a change to ‘Edit Me.txt’, and commit it. This will trigger Travis CI.
After a successful build, and once AWS CodeDeploy has finished deploying/starting the Docker containers on all 3 EC2 instances, go to the Feature-StormDetector branch, make a change to ‘Edit Me.txt’, and commit it. This will trigger Travis CI.
After a successful build, and once AWS CodeDeploy has finished deploying/starting the Docker containers on all 3 EC2 instances, go to the Feature-StormClustering branch, make a change to ‘Edit Me.txt’, and commit it. This will trigger Travis CI.
After a successful build, and once AWS CodeDeploy has finished deploying/starting the Docker containers on all 3 EC2 instances, go to the forecasttriggerworker branch, make a change to ‘Edit Me.txt’, and commit it. This will trigger Travis CI.
The Docker container for each service is deployed on its assigned EC2 instance as per the deployment group: the APIGateway is deployed on one instance, while all the other services are deployed on the other instances.
###NOTE: DO NOT MAKE A CHANGE TO ‘Edit Me.txt’ UNTIL THE PREVIOUS SERVICE HAS BEEN SUCCESSFULLY BUILT AND DEPLOYED BY TRAVIS AND CODEDEPLOY.
##Step 2: At the end of this cycle, the entire system is up and running on the AWS EC2 instances. You can visit the service at: http://ec2-35-161-35-175.us-west-2.compute.amazonaws.com:8081/apigateway/jsp/login.jsp
The user credentials for authentication are:
- Username: Teamaurora
- Password: Teamaurora
Alternatively, the user can use “Sign in with Google”.
Services will run based on the load on each instance.
##Testing: 1]. In the browser, to monitor the job, go to the ‘Existing Jobs’ tab and look for the status of the request. When the status shows ‘Completed’, the request has been successfully submitted to the Mesos cluster.
2]. Once the status shows ‘Completed’, click on the job link of the corresponding request. The link will take you to the status page of the Mesos computing cluster. Here the status will show one of the following:
- RUNNING
- FINISHED
- FAILED
3]. Once the status changes to FINISHED, an Output link will be generated. Click on this link, and a GIF will be downloaded.