Guzzle Roadmap Q2 and Q3 2020 - ja-guzzle/guzzle_docs GitHub Wiki

UI 2.0

Key Theme

  1. Do multiple things in parallel - edit multiple jobs and pipelines
  2. More intuitive and searchable drop-downs
  3. Provide feedback - whether it's an error or a sampling of data based on the source config; testing the connections created
  4. Shortcuts to do various things: clone jobs, run jobs, download YML, create a new datastore (or connection)
  5. Easy-to-understand terminology
     | Old term          | New term    |
     |-------------------|-------------|
     | Logical Endpoint  | Datastore   |
     | Physical Endpoint | Connection  |
     | Environment       | Environment |
     | Job               | Activity    |
     | Job Group         | Pipeline    |
     | Context           | Batch       |

Common patterns

  1. No plus buttons
  2. Boilerplate text (labels) externalized
  3. Consistent design for the "..." action button
  4. Consistent fonts, colors, and look and feel
  5. Tooltips are already externalized
  6. Remain in context - we don't want to lose changes, so prompt the user.
  7. Switching between Monitor and Author: when someone goes to Monitor, they should be able to come back to Author and still see their unsaved jobs. If someone closes the browser while in the Monitor section, we should force them to go back to Author or discard the changes (a "discard" / "go to unsaved" button).
  8. A single click should select the item and take the user to the next page or target UI (sometimes we need to force a double click, and I will advise which one). Where it is a single click, we usually don't need a separate select and OK button. Example: in the current UI 2.0 Data Source UI, when you select a tech you have to select and then click OK; this should be a simple single click. If there are multiple actions, show "..." so one can see the other actions.
  9. Right-click actions are achieved through the three dots ("...")
  10. Organize the images, CSS, and other resources by theme, so that when we support additional themes we can simply load the CSS and images there and enable the options from the UI
  11. Back button handling
  12. Switching the endpoint technology (or any other option that sometimes causes things to be lost) should prompt that changes will be lost
  13. Hide irrelevant options - example: for template-based processing we don't support framework columns like surrogate key
  14. Searchable drop-downs everywhere. Even when creating a datastore, one can search the list of technologies we support
  15. Metadata interdependence (since we don't maintain a dependency graph) - this has to be handled case by case, but it is required. Example: a batch references pipelines; a pipeline references activities; activities reference datastores. Sometimes these references go missing (deleted) or the YML is broken. The handling is as follows: for missing references or broken YML (in cases where we look through the YML to get the tech for a datastore, etc.), we show the referenced object in a different color (boxed in red, or in red font stating "missing YML" or "corrupted YML" depending on the situation) so that one can fix it or switch it. If we can't load the UI because we can't make out the tech of the dependent object, ideally we infer it from the current YML of the job in question. In the worst case we show the YML editor directly; the other option is to include the tech as part of the activity (which is a major change).
  16. Broken YML - this can happen for many reasons, such as an upgrade or changes made from the backend. When such an object is opened and the UI can't load it, we should show the YML editor straight off instead of the blank UI or blank page shown currently. (Example: http://spatest1.southeastasia.cloudapp.azure.com:8082/app/author/datastore/lo_lfs)
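
The missing-reference check in item 15 could be sketched roughly as below. This is only an illustration under assumptions: the function name, the object names, and the in-memory reference map are all hypothetical, and in practice the map would be built by parsing the YML files.

```python
from typing import Dict, List, Set, Tuple


def find_missing_references(
    references: Dict[str, List[str]],
    existing: Set[str],
) -> List[Tuple[str, str]]:
    """Return (referrer, missing_target) pairs for references to absent objects.

    `references` maps an object name (batch, pipeline, activity) to the names
    it refers to; `existing` is the set of objects whose YML loaded cleanly.
    Both shapes are assumptions made for this sketch.
    """
    missing = []
    for referrer, targets in sorted(references.items()):
        for target in targets:
            if target not in existing:
                # The UI could render this target boxed in red.
                missing.append((referrer, target))
    return missing


# Hypothetical repository: batch -> pipeline -> activities -> datastore
refs = {
    "batch_daily": ["pipeline_sales"],
    "pipeline_sales": ["activity_load", "activity_gone"],
    "activity_load": ["datastore_lfs"],
}
loaded = {"batch_daily", "pipeline_sales", "activity_load"}
broken = find_missing_references(refs, loaded)
```

Each pair in the result names the object to highlight and the reference that could not be resolved, which is enough for the UI to decide between the red marker and falling back to the YML editor.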

Major changes

  1. Layout
  2. Color/fonts
  3. Missing UI : Azure Synapse, Azure SQL
  4. Uploads - Remove it for now
  5. Consistency - No more plus button and other things
  6. Cleanup: Google Analytics, Kafka, Google and Salesforce endpoints
  7. Admin UI
  8. Sampling, make it more seamless
  9. Missing UI - Specific JDBC Drivers UI

Architecture Changes and Marketplace and Admin modules

  1. Simplified marketplace offering as described here: https://github.com/ja-guzzle/docs/-/wikis/Design/Azure-Marketplace-Offering-Gen-2
  2. Third-party licences - I want us to put them in a folder and give links (the way Ambari shows them). We also find issues where third-party components inject the wrong log4j (Matillion keeps them in one of its folders); we can see how Dataiku handles it
  3. Extension folder to keep custom libs
  4. Custom Pushstate server
  5. Ability to configure Guzzle shared storage / restructure Guzzle home
  6. Include third-party licences
  7. Make sure it is complete, all the flows are supported, and it is resilient
  8. Restart from UI
  9. Monitoring API logs
  10. Upgrade from UI
  11. Deploy Guzzle from UI
  12. Ability to change the Guzzle Repository
  13. User Management

Runtime audit

  1. Simple and streamlined UI to monitor jobs, drill down into the details, and get a view of what is pending in a given pipeline and its stages.

Security

  • Security group roles - we bring the roles from 1.0 and enhance where needed
  • Username used for salting the passwords when hashing with bcrypt - so that identical passwords cannot be identified
  • File permissions in the Guzzle VM, and running the Guzzle process using an account which does not have sudo access
  • Guzzle Edit UI - does it bring credentials to the frontend? Follow standard practice followed by rest. I am fine with bringing them encrypted, but if even that can be avoided we should avoid it, as such credentials are only required by the Guzzle VM or DB jobs
  • Password changing feature for native users
  • When creating external users, we should not allow them to have a native password or support one. Basically, if native login is disabled, the login option should not even show up
  • Third-party library vulnerabilities - we can do some scans
  • Guzzle VM without any public IP (outbound only to download upgrades - again follow best practices)
  • Other security in Guzzle VM, push state server, penetration testing
  • Best practices on securing Azure resources - we want to make sure all the Azure resources are within a VNet to prevent any outbound traffic from the VM to the internet
  • Better way to access blob storage from the Guzzle VM - we are using blobfuse; can we change that to use Managed Identity directly, key vaults, or another option?
  • Since the marketplace offering is going to be simple now, can we allow people to upload a certificate or provide key vault details to enable SSL for Guzzle web?
  • Ability to upload private keys for remote shell access
  • Ability to change the password
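
The salting item above can be illustrated with a small sketch. Two caveats: bcrypt already generates a random salt for every hash, so identical passwords produce different hashes without any extra salting; and the code below uses PBKDF2 from the Python standard library as a stand-in for bcrypt so that it runs without third-party packages. The function names and the iteration count are assumptions, not Guzzle's implementation.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(
    username: str, password: str, salt: Optional[bytes] = None
) -> Tuple[bytes, bytes]:
    """Return (salt, digest); the random salt is combined with the username."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        salt + username.encode("utf-8"),
        ITERATIONS,
    )
    return salt, digest


def verify_password(
    username: str, password: str, salt: bytes, expected: bytes
) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(username, password, salt)
    return hmac.compare_digest(digest, expected)


# Two users with the same password end up with different stored digests.
alice_salt, alice_digest = hash_password("alice", "secret")
bob_salt, bob_digest = hash_password("bob", "secret")
```

Storing the salt next to the digest is what makes verification possible later; only the password itself is never stored.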

Documentation

  1. Workflow and approach for maintaining documentation
  2. Bootcamps (as per https://github.com/ja-guzzle/docs/-/wikis/Release-Plan/Guzzle-Devt-Plan-Q2-2020/edit)
  3. Cheat sheet / Guzzle handbook (a simple two-pager giving the essence of the Guzzle architecture) - similar to this one: https://github.com/Cyb3rWard0g/HELK/raw/master/resources/papers/7-steps-for-a-developer-to-learn-apache-spark.pdf
  4. Core Guzzle documentation made up of: Concept Guide, How-to Videos, and Best Practices Guide (some urgent ones are ADF Guzzle integration)
  5. Document the upgrade process to UI 2.0 (do this incrementally, as we will keep forgetting what needs to be done for the upgrade)
  6. Whiteboarding video tool

Integration

  1. Whole ADF API revamp, including a status API
  2. How we call Guzzle from the API in a blocking fashion - is there a better way, such as a timeout or a callback? (We can create a custom activity in ADF)
  3. Call ADF pipelines in Guzzle job groups using an External job (similar to how ADF sync was built)
  4. Run notebooks from Guzzle (should be straightforward, as it is similar to what we do for submitted Guzzle jobs)
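
One alternative to a blocking call, as raised in item 2, is to submit the job and then poll a status endpoint with a timeout. The sketch below shows only the polling loop. The status values, the function name, and the `get_status` callable are hypothetical, since the status API does not exist yet.

```python
import time
from typing import Callable

TERMINAL_STATES = {"SUCCESS", "FAILED"}  # hypothetical status values


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(poll_interval_s)


# Usage with a stubbed status source standing in for a real status API call.
_states = iter(["RUNNING", "RUNNING", "SUCCESS"])
final_status = wait_for_job(
    lambda: next(_states), timeout_s=60, poll_interval_s=0, sleep=lambda s: None
)
```

A caller such as an ADF custom activity would pass a function that queries the status API. Making `sleep` and `clock` parameters keeps the loop easy to test.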

Future Roadmap

  • Export/import of Guzzle configs as zip file (to streamline deployment)
  • Postgres Support
  • Suppressing Streaming module is outdated (quite independent)
  • Use key vaults and DB secrets as much as possible to store credentials, access keys, and client secrets: read the details from Azure Key Vault when accessing from the VM, and use database secrets when accessing from Spark jobs
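
The export/import item at the top of this list could look roughly like the following sketch, which uses only the Python standard library. It assumes the configs are `.yml` files under a single directory; the function names and the directory layout are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def export_configs(config_dir: Path, archive_path: Path) -> int:
    """Zip every .yml file under config_dir, keeping relative paths.

    Returns the number of files written to the archive.
    """
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(config_dir.rglob("*.yml")):
            zf.write(path, path.relative_to(config_dir).as_posix())
            count += 1
    return count


def import_configs(archive_path: Path, target_dir: Path) -> List[str]:
    """Extract the archive into target_dir and return the extracted names."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        return sorted(zf.namelist())
```

Keeping paths relative to the config directory means the same archive can be unpacked into a different Guzzle home on the target environment.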

UI Longterm (Future)

  1. Formula editor for Glob/Groovy/SQL
  2. Providing a list of tables and files to browse
  3. Able to see effective mapping
  4. Missing UI - Scheduler
  5. Security and Keystore support
  6. Wizard to build ingestion jobs (replication jobs for multiple tables) or cloning entire source schema