Matillion
- Product Intro
- Notes
- Open Questions
- Trying the tool
- General feedback
- Design patterns
- Licence
- References
- They have SAs (solution architects) with 15+ years of experience in support and pre-sales, not just rookies or salespeople
- Three different solutions, one per target warehouse: Snowflake, Redshift, and BigQuery (see the offerings note at the end)
- One third of their business comes via partners
- It is on all the clouds: AWS, Google, and Azure, plus Snowflake
- Marketplace deployment for Matillion: a. runs as an EC2 instance in your own virtual private cloud, b. it inherits the same compliance you already have for your Azure resources (firewall rules etc.), c. plus connectivity to on-premise, d. uses the IAM of the instance, e. post that, you just take ...
- ETL or ELT: Matillion largely uses the MPP engine (Snowflake) to ingest and reshape data (see the sketch after this list): a. extract: stream the data through to S3, b. COPY the data into the warehouse, c. push-down technology: the transformations are SQL statements, d. how about data pumps?..
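A minimal sketch of that load-then-push-down (ELT) flow, assuming the snowflake-connector-python package; account, stage, file, and table names are all hypothetical:

```python
# Sketch of the ELT pattern above: land the file, COPY it in bulk, then
# push the transformation down as SQL. All names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",
    warehouse="ETL_WH", database="RAW", schema="STAGING",
)
cur = conn.cursor()

# 1. Land the raw file in a stage (an S3 external stage works the same way).
#    PUT gzip-compresses the file by default.
cur.execute("PUT file:///data/orders.csv @my_stage")

# 2. Bulk-load it with COPY, the step the load components wrap.
cur.execute("""
    COPY INTO staging_orders FROM @my_stage/orders.csv.gz
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# 3. Push the reshaping down as plain SQL; Snowflake does the compute.
cur.execute("""
    INSERT INTO analytics.orders_daily
    SELECT order_date, SUM(amount)
    FROM staging_orders
    GROUP BY order_date
""")
conn.close()
```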
- Has a feature of ...
- Each component gives you immediate feedback as you manipulate the data, so you can make changes easily and avoid a long regression cycle
- No extra charge for any of the connectors to different apps
- Dynamic behaviour is SQL-based and very basic..
- They do support Python and shell scripts running on the EC2 instance; these are good for setting up data parameters and other manipulation
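For illustration, the kind of script this refers to might look like the sketch below. `context.updateVariable` is the hook Matillion documents for writing job variables from a Python Script component, but treat the exact API, and the variable name used here, as assumptions to verify:

```python
# Sketch of a Matillion Python Script component deriving a job parameter
# at run time. This only runs inside Matillion, where the `context` object
# is injected; the variable name "load_date" is hypothetical.
from datetime import date, timedelta

# Compute yesterday's date as the load window for downstream components.
load_date = (date.today() - timedelta(days=1)).isoformat()

# Write the value back to the (hypothetical) job variable "load_date".
context.updateVariable("load_date", load_date)
```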
- Iterators, i.e. looping, so it can help you walk through files in a folder, or loop through records in a table and do some work with each (plain-Python equivalent sketched below)
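The plain-Python equivalent of that iterator idea, just to pin down the concept; the folder path is hypothetical:

```python
# Plain-Python equivalent of the file iterator concept: walk the files in
# a folder and do some work per file. In Matillion the loop body would be
# the nested job the iterator component invokes. Path is hypothetical.
from pathlib import Path

for path in sorted(Path("/data/incoming").glob("*.csv")):
    print(f"processing {path.name}")  # placeholder for the per-file work
```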
- They have the same construct, where you can do the actual activities like copying data, call other jobs, and do all the routing on success and failure; basically the orchestration
- There is an API to invoke jobs from a third-party scheduler; it can also send the logs to a third-party log system (see the sketch below)
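A hedged sketch of invoking a job from an external scheduler. The URL follows the /rest/v1/... job-run pattern of Matillion's v1 REST API, but verify the exact path against their docs; host, group, project, version, job, and credentials are all hypothetical:

```python
# Kick off a Matillion job from a third-party scheduler over REST.
# URL pattern is based on Matillion's v1 API; verify before relying on it.
import requests

MATILLION = "https://matillion.example.com"  # hypothetical host
url = (f"{MATILLION}/rest/v1/group/name/Analytics"
       "/project/name/Sales/version/name/default"
       "/job/name/load_orders/run")

resp = requests.post(url,
                     params={"environmentName": "prod"},
                     auth=("api_user", "api_password"),  # hypothetical creds
                     timeout=30)
resp.raise_for_status()
print(resp.json())  # the response carries an execution id you can poll
```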
- Direct Git integration, with good branching
- Can you drop a note onto the logic and generate the relevant documents?.. And each of the notes is ...
- It does not have lineage; rather, they have generated documentation (see the documentation note under the questions)
- They have a concept of "stages", which are more for hosting the data lake / raw data; you reach out to it as an external table (see the sketch below)
- There is a VARIANT data type which stores JSON documents; we can simply flatten this and use it easily for joins with other tables
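A combined sketch of the two notes above, in standard Snowflake SQL issued from Python: an external table over staged data-lake files, then a VARIANT column flattened into joinable rows. Every name, URL, and credential is hypothetical:

```python
# (1) Stage + external table over raw lake files; (2) VARIANT JSON
# flattened into rows. Standard Snowflake SQL; all names hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="etl_user",
                                   password="...", database="RAW")
cur = conn.cursor()

# (1) A stage pointing at the data lake; an external table queries the
# files in place, without loading them.
cur.execute("""
    CREATE STAGE IF NOT EXISTS lake_stage
    URL = 's3://my-datalake/events/'
    CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
""")
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS ext_events
    WITH LOCATION = @lake_stage
    FILE_FORMAT = (TYPE = PARQUET)
""")

# (2) External tables expose a VALUE column of type VARIANT; LATERAL
# FLATTEN turns a nested JSON array into rows that join like any table.
cur.execute("""
    SELECT e.value:order_id::STRING AS order_id,
           i.value:sku::STRING      AS sku,
           i.value:qty::NUMBER      AS qty
    FROM ext_events e,
         LATERAL FLATTEN(input => e.value:items) i
""")
```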
- Data lineage allows you to trace back to the data..
- They have security where user groups can have granular access and manage projects
- OAuth: you can configure OAuth credentials, and it will refresh the tokens
- OpenID: can use other systems..; you can configure external LDAP and map METL groups to LDAP groups
- With the Snowflake product, METL does not support the Glue catalog
- Migration accelerator: for processing and ingestion jobs, there is no easy way to migrate the jobs
- Questions from others: a. DMS is used for log reading; the CDC product is limited to Postgres, MySQL, SQL Server, and Oracle, and uses DMS to generate the files out, b. Fivetran does not allow manipulation of data
- METL seems to have a variable feature similar to ours
- What does "concurrent user" mean here, and how is it measured?
- Is Matillion for business analysts or power users?
- Any difference between the different cloud offerings, or is it all the same?
- 72-hour test drive: what does it consist of?
- Does it come as an image or a container?
- What kind of optimization does it come with for bulk loads?
- APIs like Salesforce have views and calculations; how can it take those efficiently?..
- How does it differ from Fivetran? Fivetran has transformations, but only with SQL
- Governed data is usually more complicated, and usually you may not want to leave it to the business analyst
- Replication is very strong in Fivetran's case, and it handles schema drift etc.
- You have to manage this staged data (the life cycle etc.) yourself, I assume, before it gets used in push-down
- Does it auto-create tables? It is there, but somewhat explicit and not at runtime
- Does it do auto column binding? Yes, it does
- What is the "use" grid variable in some screens.
- What is the push-down technology: just SQL alone?
- Staging tables have to be loaded before they can be used in SQL and further operations; can it operate directly on files?
- What targets do you support for push-down apart from Snowflake? I guess they support Redshift (they have a separate product for Redshift)
- The generated SQL is coded/crafted in the program, and optimizations like adding hints do not seem to be possible. They may have some template generator
- When you run those calcs, is it on sample data? I assume it cycles through Snowflake; if it is an aggregate, then I assume it goes through everything
- Security: how tight is it? Once you spin up the EC2 instance, what additional things/resources do you need on Azure?
- I assume it keeps its internal repo in a database
- Bulk generation: what is the processing?..
- Snowflake supports COPY operations for Avro, CSV, Parquet, etc., but then can you enforce validation?.. (see the sketch below)
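For reference, plain Snowflake COPY does have validation knobs, whether or not Matillion surfaces them; a sketch with hypothetical names:

```python
# Snowflake COPY validation options; whether Matillion exposes these is
# the open question above. Stage, file, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="etl_user",
                                   password="...", database="RAW")
cur = conn.cursor()

# Dry run: report the rows that would fail, without loading anything.
cur.execute("""
    COPY INTO staging_orders FROM @my_stage/orders.csv
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    VALIDATION_MODE = RETURN_ERRORS
""")

# Real load: abort the whole statement on the first bad record.
cur.execute("""
    COPY INTO staging_orders FROM @my_stage/orders.csv
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    ON_ERROR = ABORT_STATEMENT
""")
```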
- What is the load on that EC2 instance: just orchestrating, or does it do the compute?
- Is there a scheduler? It seems they have one
- How do the on-premise connectors work for DB2 etc.? We have to set up VPNs, I assume, as it does not support an on-premise gateway..
- API query: is this JSON? Does it again go via the staging tables, and can it be manipulated via Snowflake or a native API?
- Basic and advanced modes: advanced mode just settles to SQL only..
- Don't you support ADLS Gen2?
- What is the support for Spark?
- What is the support for streaming?
- Which Git repos do you support?
- Documentation is generated instead of lineage, and it includes the table structure; I assume it is point-in-time
- So where should you use an external table versus a staging table? Staging is a Snowflake concept
- Installation is a simple VM, but the thing does not work
- Has basic UX issues: how can they have the default user as "azure-user"? Someone will enter their AD account
- Logout is on the project menu
- The Git flow uses a locally managed Git repo which then syncs with the remote Git repo; overly complicated
- Built using Java, running on Tomcat
- The free licence does not work
- It is a drop-in like guzzle; updates are available from the UI
- You can restart the services and download the logs
- You can manage users and do various admin activities from the UI
- The UI is not very smart/responsive
- Jobs are similar to ADF pipelines: basically a series of components sequenced to achieve a certain task. Two types of jobs: Orchestration (to load data to staging) and Transformation (to process and load into the target). There are different components for each type of job
- It forces you to bring data to staging and then do the ETL. It generates smart INSERT INTO ... SELECT or MERGE INTO ... statements (illustrated below)
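A hand-written illustration of that generated push-down style; this is not Matillion's actual output, and all object names are hypothetical:

```python
# Illustration of the MERGE INTO push-down pattern: upsert from a staging
# table into the target entirely inside Snowflake. Not Matillion's real
# generated SQL; all names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="etl_user",
                                   password="...", database="ANALYTICS")
cur = conn.cursor()

cur.execute("""
    MERGE INTO dim_customer AS t
    USING staging_customer AS s
      ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET t.name = s.name, t.city = s.city
    WHEN NOT MATCHED THEN INSERT (customer_id, name, city)
                          VALUES (s.customer_id, s.name, s.city)
""")
```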
- Automatic schema binding, auto-creation of tables, and other such features appear to be there
- Billing is simply via Azure; as long as that VM is up, they charge you
- All jobs run via the Matillion VM, hence it has to be up for the jobs to run
- The larger the VM, the more you pay, with a cap of xxx and on the number of users. There is BYOL, which lets you get some special arrangement
- They have three offerings: for Snowflake, BigQuery, and Redshift
- https://bigquery-support.matillion.com/s/article/2694747
- https://bigquery-support.matillion.com/s/article/2679765