Matillion


Product Intro

Notes

  1. They have SAs with 15+ years' experience in support and pre-sales, not just rookies or salespeople
  2. Three different solutions: a. Snowflake b. Matillion ...
  3. One third of the business comes via partners
  4. It is available on all the clouds: AWS, Google, Azure, and Snowflake
  5. Marketplace deployment for Matillion: a. An EC2 instance on your own virtual private cloud b. It inherits the same compliance you already have for your Azure resources and firewall rules c. And connectivity to on-premise d. Whatever IAM instance applies e. After that you just take ...
  6. ETL or ELT - Matillion largely uses the MPP of Snowflake to ingest and reshape data (see the sketch after this list): a. ETL - stream it through to S3 b. Copy the data c. Push-down technology - SQL statements d. How about data pumps?
  7. Has a feature of
  8. Each of the components gives you immediate feedback as you manipulate the data - you can make changes easily and it prevents a long regression cycle
  9. No charge for any of the connectors to different apps
  10. Dynamic - it is SQL-based at its most basic level
  11. They do support Python and shell scripts running on the EC2 instance - these are good for setting up data parameters and other manipulation
  12. Iterator - i.e. looping, which helps you walk through files in a folder or loop through records in a table and do some work on each
  13. They have the same constructs - where you can do the actual activities like copying data, and also call other jobs and do all the routing on success and failure - basically the orchestration
  14. API to invoke the jobs from a third-party scheduler; it can also send the logs to a third-party log system
  15. Direct Git integration - and has good branching
  16. You can drop notes onto the logic and generate the relevant documents.. And each of the notes is ...
  17. It does not have lineage.. Rather they have generated documentation (see Open Questions below)
  18. They have a concept of "stages" - which are more to host the data lake / raw data; you reach out to it as an external table (see the external table sketch after this list)
  19. VARIANT data type which stores JSON documents - we can simply flatten this and use it for joins with other tables easily (see the flatten example after this list)
  20. Data lineage allows you to go back to the data..
  21. They have security - user groups can have granular access and manage projects
  22. OAuth - you can configure OAuth credentials and they will refresh the tokens
  23. OpenID - can use other systems..; can configure an external LDAP and map METL groups to LDAP groups
  24. With the Snowflake product, METL does not support the Glue catalog
  25. Migration accelerator - for processing and ingestion jobs there is no easy way to migrate the jobs
  26. Questions from others: a. DMS - is used for log reading.. the CDC product is limited to Postgres, MySQL, SQL Server and Oracle - use DMS to generate the files out b. Fivetran - does not allow manipulation of data
  27. METL seems to have a similar variable feature to ours
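
A minimal sketch of the stage-then-push-down pattern from note 6, assuming Snowflake as the target warehouse; the bucket, stage, and table names are invented for illustration and are not from the demo:

```sql
-- Hypothetical names throughout; this illustrates the pattern, not Matillion's generated SQL.

-- 1. Land the extracted files in cloud storage and expose them via a stage.
CREATE OR REPLACE STAGE raw_orders_stage
  URL = 's3://example-bucket/raw/orders/'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- 2. Bulk-load the staged files into a staging table using Snowflake's MPP COPY.
COPY INTO stg_orders FROM @raw_orders_stage;

-- 3. Push-down transformation: a plain SQL statement executed inside Snowflake.
INSERT INTO dw_orders (order_id, customer_id, order_total)
SELECT order_id, customer_id, SUM(line_amount)
FROM stg_orders
GROUP BY order_id, customer_id;
```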
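
For note 18, a sketch of reaching out to raw data-lake files through a stage as an external table; the stage path, file format, and columns are assumptions:

```sql
-- Illustrative only: an external table over Parquet files sitting in a data-lake stage.
CREATE OR REPLACE EXTERNAL TABLE ext_clickstream (
  event_ts TIMESTAMP AS (VALUE:event_ts::TIMESTAMP),
  user_id  STRING    AS (VALUE:user_id::STRING)
)
LOCATION = @datalake_stage/clickstream/
FILE_FORMAT = (TYPE = PARQUET);

-- The raw files can then be queried and joined without loading them first.
SELECT user_id, COUNT(*) AS events
FROM ext_clickstream
GROUP BY user_id;
```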
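
And for note 19, a small example of flattening a VARIANT (JSON) column so it can be joined like an ordinary table; table and field names are invented:

```sql
-- Illustrative names; payload is assumed to hold a JSON document with an "items" array.
CREATE OR REPLACE TABLE raw_events (payload VARIANT);

SELECT
  e.payload:order_id::NUMBER AS order_id,
  item.value:sku::STRING     AS sku,
  item.value:qty::NUMBER     AS qty
FROM raw_events e,
     LATERAL FLATTEN(INPUT => e.payload:items) item;
```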

Open Questions

  1. What does "concurrent user" mean here? And how is it measured?
  2. Is Matillion for business analysts or power users?
  3. Any difference between the different cloud offerings? Or is it all the same?
  4. 72-hour test drive - what does it consist of?
  5. Does it come as an image or a container?
  6. What kind of optimization does it come with for bulk loads?
  7. APIs - like Salesforce - they have views and calcs - how can it take these efficiently..?
  8. How does it differ from Fivetran? Fivetran has transformation, but only with SQL
  9. Governed data - usually it's more complicated and you may not want to leave it to the business analyst
  10. All the replication functionality is very strong and handles schema drift etc. in the case of Fivetran
  11. You have to manage this staged data - the life cycle etc. - yourself, I assume, before it gets used in push-down
  12. Does it auto-create the table? - it is there, but a little explicit and not at runtime
  13. Does it do automatic column binding? - yes it does
  14. What is the "use grid variable" option on some screens?
  15. What is the push-down technology - just SQL alone?
  16. Staging tables have to be loaded before they can be used in SQL and further operations - can it operate directly on files?
  17. What targets do you support for push-down apart from Snowflake? I guess they support Redshift (they have a separate product for Redshift)
  18. The generated SQL is coded/crafted in the program - can you do any optimization like adding hints? It does not seem so. They may have some template generator
  19. When you run those calcs, is it on sample data? I assume it cycles through Snowflake. If it's an aggregate then I assume it's going through everything
  20. Security - how tight is it? Once you spin up the EC2 instance, what additional things/resources do you need on Azure?
  21. I assume it keeps its internal repo in a DB
  22. Bulk generation - what is the processing..?
  23. Snowflake supports copy operations for Avro, CSV, Parquet - but can you enforce validation? (see the COPY sketch after this list)
  24. What is the load on that EC2 instance - is it just orchestrating or does it do the compute?
  25. Is there a scheduler? - it seems they have one
  26. How does the on-premise connector work for DB2 etc.? We have to set up VPNs, I assume, as it does not support an on-premise gateway..
  27. API query - is this JSON? Does it go via the staging tables again? And can you manipulate it - via Snowflake or a native API?
  28. Basic and advanced mode - does advanced mode settle to SQL only?
  29. Don't you support ADLS Gen2?
  30. What is the support for Spark?
  31. What is the support for streaming?
  32. Which Git repos do you support?
  33. Documentation is generated instead of lineage and has the table structure - I assume it's point-in-time
  34. So when should you use an external table vs. a staging table? Staging is a Snowflake concept
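
On question 23: Snowflake's COPY command itself has validation options, independent of whatever Matillion exposes; a sketch with invented table and stage names:

```sql
-- Dry run: report the rows that would be rejected, without loading anything.
COPY INTO stg_orders
  FROM @raw_orders_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  VALIDATION_MODE = RETURN_ERRORS;

-- Actual load: abort the whole statement if any record fails to parse.
COPY INTO stg_orders
  FROM @raw_orders_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  ON_ERROR = ABORT_STATEMENT;
```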

Trying the tool

General feedback

  1. Installation is a simple VM - but the thing does not work
  2. Has basic UX issues - how can they have the default user as "azure-user"? Someone will enter their AD account
  3. Logout is on the project menu
  4. Git flow uses a locally managed Git repo which is then synced with the remote Git repo - much more complicated
  5. Built using Java, running on Tomcat
  6. Free licence does not work (screenshot)
  7. It's a drop-in like Guzzle; the updates are available from the UI (screenshot)
  8. Can restart the services and download the logs
  9. Manage users and do various admin activities from the UI (screenshot)
  10. UI is not very smart/responsive

Design patterns

  1. Jobs are similar to ADF pipelines - basically a series of components which are sequenced to achieve a certain task. Two types of jobs: Orchestration (to load data to staging) and Transformation (to process and load into the target). There are different components for each type of job
  2. Forces you to bring data to staging and then do the transformation. It generates smart INSERT INTO ... SELECT or MERGE INTO ... statements (see the sketch after this list)
  3. Automatic schema binding, auto-creation of tables, and other features appear to be there
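
A minimal sketch of the kind of push-down statement point 2 describes; in practice the tool generates this SQL, and the table and column names below are invented:

```sql
-- Upsert a staged extract into the target table entirely inside the warehouse.
MERGE INTO dw_customers AS tgt
USING stg_customers AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
  UPDATE SET name = src.name, email = src.email
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, email)
  VALUES (src.customer_id, src.name, src.email);
```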

Licence

  1. Billing is simply via Azure. As long as that VM is up, they charge you
  2. All jobs run via the Matillion VM - hence it has to be up for the jobs to run
  3. The larger the VM, the more you pay, with a cap of xxx and on the number of users. There is BYOL which lets you get some special one
  4. They have three offerings: for Snowflake, BigQuery and Redshift

References

https://bigquery-support.matillion.com/s/article/2694747
https://bigquery-support.matillion.com/s/article/2679765
