Guzzle Development Plan Q1 2020 - ja-guzzle/guzzle_docs GitHub Wiki

Boot camp

  • Plans for training / boot camps / ideas: we have to get these going; the attendees already know the data well
  • For 1 to 2 days, practical/hands-on, so get them to set up Guzzle from scratch
  • Understanding the core architecture
  • Set up Guzzle piece by piece and know what each thing does (Databricks)
  • Role of each service
  • How a job gets submitted (using spark-submit, and a REST call for Databricks)
  • How the shared file system is the interface through which the API/Guzzle VM talks to the workers/cluster
  • How binaries are structured
  • How the YAMLs are structured and the rationale behind it
  • Reading Guzzle logs; driver logs are also useful to know what's going on within Spark (we also don't get the full stack trace) - we can consider bringing ERROR entries from Spark into our logs
  • Unix fundamentals: process architecture, networking, and the file system (permissions, mounts, etc.); services/daemons; yum/packaging; standard Unix shell scripting (for the above topics, plus standalone utilities like grep, sed, etc.)
  • Troubleshooting in Spark: Spark UI / jobs / stages
  • A relevant video which nicely explains the Spark architecture: https://www.youtube.com/watch?v=fyTiJLKEzME
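To make the job-submission path above concrete, here is a minimal sketch of building a request body for the Databricks Jobs REST API (`POST /api/2.0/jobs/runs/submit`). The jar path, main class, and cluster sizing below are placeholder assumptions for illustration, not Guzzle's actual values:

```python
import json

# Sketch: build the JSON body that a Guzzle-style launcher could POST to
# the Databricks Jobs API (runs/submit). Field names follow the public
# Jobs API; the jar location and entry-point class are hypothetical.
def build_runs_submit_payload(job_name, jar_path, main_class, params):
    return {
        "run_name": job_name,
        "new_cluster": {
            "spark_version": "6.4.x-scala2.11",  # example runtime version
            "node_type_id": "Standard_DS3_v2",   # example Azure node type
            "num_workers": 2,
        },
        "libraries": [{"jar": jar_path}],
        "spark_jar_task": {
            "main_class_name": main_class,
            "parameters": params,
        },
    }

payload = build_runs_submit_payload(
    "guzzle-ingest-demo",
    "dbfs:/guzzle/bin/guzzle-assembly.jar",   # hypothetical binary location
    "com.example.guzzle.Launcher",            # hypothetical entry point
    ["--job", "ingest_customers", "--env", "dev"],
)
body = json.dumps(payload)
# An actual submission would POST `body` to
# https://<workspace>/api/2.0/jobs/runs/submit with a bearer token.
```

The same job on a self-managed cluster would go through spark-submit with the equivalent jar, class, and parameters instead of a REST call.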
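The idea of bringing Spark ERROR entries into Guzzle's own logs could look like the following sketch, which pulls ERROR lines plus their stack-trace continuations out of a driver log. The log lines shown are invented examples in the default log4j layout; real driver logs may differ:

```python
# Sketch: extract ERROR entries (and their stack-trace continuation
# lines) from a Spark driver log so they can be surfaced in Guzzle logs.
def extract_errors(lines):
    errors, capturing = [], False
    for line in lines:
        if " ERROR " in line:
            capturing = True
            errors.append(line)
        elif capturing and (line.startswith("\t") or line.startswith("Caused by")):
            errors.append(line)   # stack-trace continuation line
        else:
            capturing = False
    return errors

driver_log = [
    "20/01/15 10:02:01 INFO DAGScheduler: Job 3 finished",
    "20/01/15 10:02:05 ERROR Executor: Exception in task 0.0",
    "\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:310)",
    "Caused by: java.io.FileNotFoundException: /mnt/raw/cust.csv",
    "20/01/15 10:02:06 INFO TaskSetManager: Starting task 1.0",
]
print(extract_errors(driver_log))   # keeps the ERROR line and its 2 trace lines
```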

Resource plan

  • Parth V: put him on SP and let him learn how to use Guzzle (60-80% on SP) - 2 to 3 months
  • Parth Pandya: PowerBI reports and dashboard for Runtime Audit and DQ/Recon modules reports
    • Recon and constraint check reports have generic column names, with the actual column names stored as values - we have to see how the PBI report can present this well: show stats and drill-down
  • Gopal (PQE): SP oversight and Guzzle R&D - 0-80% (no more test automation for Gopal)
  • Parth M: 100% after the ASB + Lease setup is done
  • Phu - Test Automation
  • Total 6-7 man-months for Guzzle R&D over the next 3 months
  • Other resources that can be considered: Arpan, the BE guy, Naveen

Guzzle UI - Current and Planned

  • UI: how fast we can crank it out
    • A UI guy to come in and do the cosmetic work; the actual work to execute those recommendations/changes can be planned over the next 3 months - Mehul (200 hours over 6 months)
    • The left and top nav don't change
    • We can make it better in terms of look and feel / color and interaction - but data entry etc. remain the same
    • On the right-hand side we can get some ideas
    • The wizard for different sections of ingestion; how to launch validation
    • Pipeline: how to make it simple to drag/drop the jobs
    • Expression editor
    • How to help people make fewer mistakes
    • Type-ahead search on column names, functions, etc. when typing expressions
    • Making parameters easier to use
  • Current UI (we should try it out and give some feedback):
    • Mostly functional, except for triggering jobs; the config editor and the search on the left side are working
    • Data store / environment screens are still in progress
    • Where to show the job logs and past runs
    • Top search is not working
    • Upgrade steps - primarily for the sample tables
    • Have to find the placement for:
      • Batch / context and batch runs
      • File upload module
      • Admin module (for adding users etc.)
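The type-ahead search for column names and functions in the expression editor could be sketched as a simple prefix lookup over a sorted list of known names (a minimal sketch; the column names below are illustrative only):

```python
import bisect

# Sketch: given what the user has typed so far, suggest every known
# column/function name that starts with that prefix. Using a sorted list
# and bisect keeps each lookup O(log n) plus the size of the result.
def suggest(prefix, sorted_names):
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = bisect.bisect_right(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]

columns = sorted(["account_id", "account_name", "amount", "batch_id", "currency"])
print(suggest("acc", columns))   # ['account_id', 'account_name']
```

In the real editor the candidate list would be built from the dataset schema and the supported function library, and refreshed on each keystroke.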

Breakdown of tasks

  • UI 1.5 - current leg - 1 man-month to wrap it up so that it's fully functional
  • UI 2.0 (color scheme, making the config editors better, etc.) - 2.5 man-months
  • Security (some tweaks) - 1 to 2 man-months
    • We hash the user ID in with the password (as a salt) so that the same password does not produce the same hash
    • For the actual encryption of the source and target endpoints (one of which is to use
  • Marketplace offer - including integration with Azure Monitor etc. - 1 man-month
  • Billing integration - 1 man-month
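The password-hashing tweak above could be sketched as follows, mixing the user ID plus a random salt into the hash so that two users with the same password never share a stored hash. PBKDF2 is one standard choice; the iteration count and field layout are assumptions, not Guzzle's current scheme:

```python
import hashlib, os

# Sketch: per-user salted hashing. The salt combines the user ID with
# random bytes, so identical passwords yield different stored hashes.
def hash_password(user_id, password):
    salt = user_id.encode() + os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest          # store both alongside the user record

def verify_password(password, salt, digest):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000) == digest

salt_a, hash_a = hash_password("alice", "s3cret")
salt_b, hash_b = hash_password("bob", "s3cret")
assert hash_a != hash_b                      # same password, different hashes
assert verify_password("s3cret", salt_a, hash_a)
```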

Architectural Changes

  • Bring in a new web server to replace Node (which we use currently)
    • Push state (any URL that is not found is served by index.html)
    • Netty - a library used for HTTP
    • Lightweight
    • Externalize things like the API host name, ports, and SSL (parameters to specify)
    • SSL is possible, using PKCS12 as the bundled keystore
    • Performance is OK (async socket I/O, 2 cores / 100K) - one thread serves 100K connections (up to the ulimit), same as Node
  • Get rid of Elasticsearch: by parsing the job YMLs we get the listing by job types and names, and the application cache (for tags) - that cache is prepared directly from the YMLs
  • Shared file system - only in the future
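The Elasticsearch replacement above could be sketched as an in-memory cache built directly from the parsed job YMLs. Assume each YML has already been loaded (e.g. with PyYAML) into a dict; the field names and job entries below are hypothetical:

```python
# Sketch: index parsed job YMLs by type and by tag, so the UI listing and
# tag search read from this cache instead of querying Elasticsearch.
jobs = [
    {"name": "ingest_customers", "type": "ingestion", "tags": ["daily", "crm"]},
    {"name": "recon_gl", "type": "recon", "tags": ["daily"]},
    {"name": "dq_customers", "type": "dq", "tags": ["crm"]},
]

def build_cache(jobs):
    by_type, by_tag = {}, {}
    for job in jobs:
        by_type.setdefault(job["type"], []).append(job["name"])
        for tag in job.get("tags", []):
            by_tag.setdefault(tag, []).append(job["name"])
    return by_type, by_tag

by_type, by_tag = build_cache(jobs)
print(by_type["ingestion"])   # ['ingest_customers']
print(by_tag["crm"])          # ['ingest_customers', 'dq_customers']
```

The cache would be rebuilt whenever the YMLs on the shared file system change, which keeps the listing consistent with the configs without a separate search service to operate.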