Matillion


Product Intro

Notes

  1. They have SAs with 15+ years' experience in support and pre-sales, not just rookies or salespeople
  2. Three different solutions: a. Snowflake b. Matillion ...
  3. One third of the business comes via partners
  4. It is available on all the clouds: AWS, Google, Azure, and Snowflake
  5. Marketplace deployment for Matillion: a. An EC2 instance on your own virtual private cloud b. It inherits the same compliance you already have for your Azure resources and firewall rules c. And connectivity to on-premise d. Whatever IAM instance applies e. After that you just take ...
  6. ETL or ELT - Matillion largely uses the MPP of Snowflake to ingest and reshape data (see the sketch after this list): a. ETL - stream it through to S3 b. Copy the data c. Push-down technology - SQL statements d. How about data pumps?
  7. Has a feature of
  8. Each of the components gives you immediate feedback as you manipulate the data - you can make changes easily and it prevents a long regression cycle
  9. No charge for any of the connectors to different apps
  10. Dynamic - it is SQL-based at its most basic level
  11. They do support Python and shell scripts running on the EC2 instance - these are good for setting up data parameters and other manipulation
  12. Iterator - i.e. looping, which helps you walk through files in a folder or loop through records in a table and do some work on each
  13. They have the same constructs - where you can do the actual activities like copying data, and also call other jobs and do all the routing on success and failure - basically the orchestration
  14. API to invoke the jobs from a third-party scheduler; it can also send the logs to a third-party log system
  15. Direct Git integration - and has good branching
  16. You can drop notes onto the logic and generate the relevant documents.. And each of the notes is ...
  17. It does not have lineage.. Rather they have generated documentation (see Open Questions below)
  18. They have a concept of "stages" - which are more to host the data lake / raw data; you reach out to it as an external table (see the external table sketch after this list)
  19. VARIANT data type which stores JSON documents - we can simply flatten this and use it for joins with other tables easily (see the flatten example after this list)
  20. Data lineage allows you to go back to the data..
  21. They have security - user groups can have granular access and manage projects
  22. OAuth - you can configure OAuth credentials and they will refresh the tokens
  23. OpenID - can use other systems..; can configure an external LDAP and map METL groups to LDAP groups
  24. With the Snowflake product, METL does not support the Glue catalog
  25. Migration accelerator - for processing and ingestion jobs there is no easy way to migrate the jobs
  26. Questions from others: a. DMS - is used for log reading.. the CDC product is limited to Postgres, MySQL, SQL Server and Oracle - use DMS to generate the files out b. Fivetran - does not allow manipulation of data
  27. METL seems to have a similar variable feature to ours
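
A minimal sketch of the stage-then-push-down pattern from note 6, assuming Snowflake as the target warehouse; the bucket, stage, and table names are invented for illustration and are not from the demo:

```sql
-- Hypothetical names throughout; this illustrates the pattern, not Matillion's generated SQL.

-- 1. Land the extracted files in cloud storage and expose them via a stage.
CREATE OR REPLACE STAGE raw_orders_stage
  URL = 's3://example-bucket/raw/orders/'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- 2. Bulk-load the staged files into a staging table using Snowflake's MPP COPY.
COPY INTO stg_orders FROM @raw_orders_stage;

-- 3. Push-down transformation: a plain SQL statement executed inside Snowflake.
INSERT INTO dw_orders (order_id, customer_id, order_total)
SELECT order_id, customer_id, SUM(line_amount)
FROM stg_orders
GROUP BY order_id, customer_id;
```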
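
For note 18, a sketch of reaching out to raw data-lake files through a stage as an external table; the stage path, file format, and columns are assumptions:

```sql
-- Illustrative only: an external table over Parquet files sitting in a data-lake stage.
CREATE OR REPLACE EXTERNAL TABLE ext_clickstream (
  event_ts TIMESTAMP AS (VALUE:event_ts::TIMESTAMP),
  user_id  STRING    AS (VALUE:user_id::STRING)
)
LOCATION = @datalake_stage/clickstream/
FILE_FORMAT = (TYPE = PARQUET);

-- The raw files can then be queried and joined without loading them first.
SELECT user_id, COUNT(*) AS events
FROM ext_clickstream
GROUP BY user_id;
```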
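
And for note 19, a small example of flattening a VARIANT (JSON) column so it can be joined like an ordinary table; table and field names are invented:

```sql
-- Illustrative names; payload is assumed to hold a JSON document with an "items" array.
CREATE OR REPLACE TABLE raw_events (payload VARIANT);

SELECT
  e.payload:order_id::NUMBER AS order_id,
  item.value:sku::STRING     AS sku,
  item.value:qty::NUMBER     AS qty
FROM raw_events e,
     LATERAL FLATTEN(INPUT => e.payload:items) item;
```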

Open Questions

  1. What does "concurrent user" mean here? And how is it measured?
  2. Is Matillion for business analysts or power users?
  3. Any difference between the different cloud offerings? Or is it all the same?
  4. 72-hour test drive - what does it consist of?
  5. Does it come as an image or a container?
  6. What kind of optimization does it come with for bulk loads?
  7. APIs - like Salesforce - they have views and calcs - how can it take these efficiently..?
  8. How does it differ from Fivetran? Fivetran has transformation, but only with SQL
  9. Governed data - usually it's more complicated and you may not want to leave it to the business analyst
  10. All the replication functionality is very strong and handles schema drift etc. in the case of Fivetran
  11. You have to manage this staged data - the life cycle etc. - yourself, I assume, before it gets used in push-down
  12. Does it auto-create the table? - it is there, but a little explicit and not at runtime
  13. Does it do automatic column binding? - yes it does
  14. What is the "use grid variable" option on some screens?
  15. What is the push-down technology - just SQL alone?
  16. Staging tables have to be loaded before they can be used in SQL and further operations - can it operate directly on files?
  17. What targets do you support for push-down apart from Snowflake? I guess they support Redshift (they have a separate product for Redshift)
  18. The generated SQL is coded/crafted in the program - can you do any optimization like adding hints? It does not seem so. They may have some template generator
  19. When you run those calcs, is it on sample data? I assume it cycles through Snowflake. If it's an aggregate then I assume it's going through everything
  20. Security - how tight is it? Once you spin up the EC2 instance, what additional things/resources do you need on Azure?
  21. I assume it keeps its internal repo in a DB
  22. Bulk generation - what is the processing..?
  23. Snowflake supports copy operations for Avro, CSV, Parquet - but can you enforce validation? (see the COPY sketch after this list)
  24. What is the load on that EC2 instance - is it just orchestrating or does it do the compute?
  25. Is there a scheduler? - it seems they have one
  26. How does the on-premise connector work for DB2 etc.? We have to set up VPNs, I assume, as it does not support an on-premise gateway..
  27. API query - is this JSON? Does it go via the staging tables again? And can you manipulate it - via Snowflake or a native API?
  28. Basic and advanced mode - does advanced mode settle to SQL only?
  29. Don't you support ADLS Gen2?
  30. What is the support for Spark?
  31. What is the support for streaming?
  32. Which Git repos do you support?
  33. Documentation is generated instead of lineage and has the table structure - I assume it's point-in-time
  34. So when should you use an external table vs. a staging table? Staging is a Snowflake concept
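
On question 23: Snowflake's COPY command itself has validation options, independent of whatever Matillion exposes; a sketch with invented table and stage names:

```sql
-- Dry run: report the rows that would be rejected, without loading anything.
COPY INTO stg_orders
  FROM @raw_orders_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  VALIDATION_MODE = RETURN_ERRORS;

-- Actual load: abort the whole statement if any record fails to parse.
COPY INTO stg_orders
  FROM @raw_orders_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  ON_ERROR = ABORT_STATEMENT;
```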

Trying the tool

General feedback

  1. Installation is a simple VM - but the thing does not work
  2. Has basic UX issues - how can they have the default user as "azure-user"? Someone will enter their AD account
  3. Logout is on the project menu
  4. Git flow uses a locally managed Git repo which is then synced with the remote Git repo - much more complicated
  5. Built using Java, running on Tomcat
  6. Free licence does not work (screenshot)
  7. It's a drop-in like Guzzle; the updates are available from the UI (screenshot)
  8. Can restart the services and download the logs
  9. Manage users and do various admin activities from the UI (screenshot)
  10. UI is not very smart/responsive

Design patterns

  1. Jobs are similar to ADF pipelines - basically a series of components which are sequenced to achieve a certain task. Two types of jobs: Orchestration (to load data to staging) and Transformation (to process and load into the target). There are different components for each type of job
  2. Forces you to bring data to staging and then do the transformation. It generates smart INSERT INTO ... SELECT or MERGE INTO ... statements (see the sketch after this list)
  3. Automatic schema binding, auto-creation of tables, and other features appear to be there
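
A minimal sketch of the kind of push-down statement point 2 describes; in practice the tool generates this SQL, and the table and column names below are invented:

```sql
-- Upsert a staged extract into the target table entirely inside the warehouse.
MERGE INTO dw_customers AS tgt
USING stg_customers AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
  UPDATE SET name = src.name, email = src.email
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, email)
  VALUES (src.customer_id, src.name, src.email);
```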

Licence

  1. Billing is simply via Azure. As long as that VM is up, they charge you
  2. All jobs run via the Matillion VM - hence it has to be up for the jobs to run
  3. The larger the VM, the more you pay, with a cap of xxx and on the number of users. There is BYOL which lets you get some special one
  4. They have three offerings: for Snowflake, BigQuery and Redshift

References

https://bigquery-support.matillion.com/s/article/2694747
https://bigquery-support.matillion.com/s/article/2679765
