CData
- Overview
- Data sync
- Licence and Support
- Licence Model
- Licence of DataSync
- Support
- Open Questions
- Next actions (for Hemanta)
- Next actions (for Devt Team)
- OAuth for CData Drivers
- References
- JDBC Connectors
- They have many products
- CData JDBC drivers (https://www.cdata.com/jdbc) are standalone and can be used in any app, ETL tool or BI tool. Each licence has to be bought separately.
- Data Sync is a full-blown product which syncs data from any of the sources for which they sell JDBC connectors into a target sink - which can be lake storage or a DB (on-prem or cloud)
- Can be installed on Windows (https://www.cdata.com/download/download.aspx?sku=ASPE-A&type=free&file=free/ASPE-A/setup.exe ) or via a Java installer on Linux
- Does end-to-end orchestration of replication - can schedule, does incremental loads, allows you to select the objects to sync, etc.
- Licensing is usually based on the machines where the connectors or Data Sync are running. In most use cases an individual machine or a dedicated integration server runs the connector and is counted for the licence.
- They also support running connectors from containers and ephemeral clusters - we have to understand more on this.
- They have been selling a lot to real end users (the way people buy Tableau online and start using it) - people buy it and use it in Excel or desktop reporting tools like Tableau
- The licence goes by cores/CPUs/activations in most cases
- You buy a base licence (usually 999) plus connectors. Sinks and sources are both counted as connectors
- A pair of connectors costs you 1K. So if you have 10, you pay roughly 5K
- Each unique connection of a given type is counted separately - for example, if I set up 5 LinkedIn connections with 5 different accounts, it is counted as 5 and not 1 (so by actual connections set up, not by connector type)
- They seem to be strict on licensing, so it is not trust-based but appears to be enforced in the UI and other places.
- They throttle you by the number of CPUs that you buy a licence for - so if you have bought a connector or Data Sync, you can't just run it on a huge server and cater to a wide user base/consumption
- When you use it for an enterprise use case like loading to a cloud data lake, you have to use the Enterprise edition. If you are using it to simply load a local database, you can live with the Standard edition
- Enterprise support - lets you call them. Guess they may give you hot fixes (yet to be confirmed)
- Standard support - email only, first come first served. The SLA may be slow
- Does premium support give you the privilege of asking for quick fixes?
- How does the licence work on containers and ephemeral workloads like a DB Spark cluster?
- How does the licence vary if the CPU count goes from 4 to 32? Is it proportionate?
- OEM agreement with Ryan
- Reseller agreement so that we don't have to ask for a quote every time
- Technical Deep dive
- Videos and Documentation
Try out Data Sync and see if it is usable. Below is the order of priority:
- Install, setup, configuration, managing and monitoring
- Installation on Linux:
- Download and extract the package
- Go to the webserver directory and run java -jar start.jar in a terminal.
- Wait for it to start and access the UI on localhost at the port mentioned in the terminal (8181)
- Setup and Configurations:
- Go to the About page.
- Provide the license key or activate the free trial.
- How easy is it to set up a new source and sink connection, including OAuth/callbacks? If you change the sink to a new target, does it reload historical data, etc.?
- The OAuth flow is handled automatically, including refreshing the access token.
- You need to provide the necessary connection details to connect to the data source. Follow this video: https://www.youtube.com/watch?v=4r2z5E-d3VQ
- For every source-to-sink transfer, a new job is created with source and sink configs. Once a job is created, its source and sink cannot be changed. A new job has to be created for each different combination.
- Do all the source and sink connectors have consistent setup and config steps, or is it all over the place?
- Yes, the steps are consistent.
- Security - whatever we can verify (for example: running the portal on SSL, any user/password authorization for using the portal, how it stores OAuth tokens and user passwords for sources and sinks)
- We can enable TLS for the Java servlet that hosts the application so that the application is accessed securely.
- The server login password is stored in plain text initially. We have to apply our own hashing methods to store passwords securely (a sketch follows this list).
- We can configure TLS certificates to securely encrypt connections to data sources.
- We can also restrict the IP addresses that can access the application.
- We can also set up a proxy-based firewall for some connections. Three firewall types are available: TUNNEL, SOCKS4, SOCKS5
- We can also provide an SSL certificate for some connections individually.
- Some connections also have a proxy configuration.
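Since the server login password is stored in plain text out of the box, we would have to add our own hashing on top. Below is a minimal sketch of what that could look like using the JDK's built-in PBKDF2 support; the iteration count, salt size and where the resulting hash gets stored are our own assumptions, not anything CData prescribes.

```java
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHasher {
    // Assumed parameters - tune for the actual deployment.
    private static final int ITERATIONS = 120_000;
    private static final int KEY_LENGTH_BITS = 256;
    private static final int SALT_BYTES = 16;

    /** Returns "salt:hash" (both Base64) to store instead of the plain-text password. */
    public static String hash(char[] password) throws Exception {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LENGTH_BITS);
        byte[] hash = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return Base64.getEncoder().encodeToString(salt) + ":"
                + Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash("admin-password".toCharArray()));
    }
}
```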
- Fidelity of the cloning - our benchmark is Fivetran and Stitch - primarily data types, and whether there are cases (given the way it does incremental loads) where we lose data
- Throughput and performance
- Writing 175,240 records from a local MySQL database to a local CSV file took almost 15 minutes (roughly 195 records/second).
- Scheduling capabilities
- Minute
- Hourly
- Daily
- Weekly
- Monthly
- Cron expressions
- Can also be triggered through the API (see the sketch below).
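As a rough illustration, triggering a job over the API from Java could look like the sketch below. The endpoint path (/api.rsc/executeJob), the JSON body shape and the x-cdata-authtoken header name are assumptions from memory and need to be verified against the Data Sync API documentation; only the port (8181) comes from our local install.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerSyncJob {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint, header and job name - check against the Data Sync REST API docs.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8181/api.rsc/executeJob"))
                .header("Content-Type", "application/json")
                .header("x-cdata-authtoken", "<api-token-from-settings>")
                .POST(HttpRequest.BodyPublishers.ofString("{\"JobName\": \"mysql_to_csv\"}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```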
- Notification and logs/instrumentation
- A mail server can be configured to send notifications by email.
- For each job an email address and a subject can be configured.
- An option to send an email only when an error occurs is also available.
- Each job-run log is saved at a configured path. The log level for each job can also be configured. Additionally, the log file can be downloaded.
- Handling of nested (or multi-structured) data from JSON or APIs
- Converts nested JSON to dotted notation (see the sketch after this list).
- For example, if the JSON is {"Company": {"employee": {"name": "john"}}}, it will be converted to Company.employee.name
- Arrays are converted to strings.
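To make the dotted-notation behaviour concrete, here is an illustrative flattener (our own sketch using Jackson, not CData's code) that reproduces what is described above: nested objects become dotted column names and arrays are kept as their JSON string form.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class DottedFlattener {
    /** Flattens nested objects into "a.b.c" keys; arrays are stringified. */
    static void flatten(String prefix, JsonNode node, Map<String, String> out) {
        if (node.isObject()) {
            node.fields().forEachRemaining(e ->
                    flatten(prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey(),
                            e.getValue(), out));
        } else if (node.isArray()) {
            out.put(prefix, node.toString());   // arrays become strings
        } else {
            out.put(prefix, node.asText());
        }
    }

    public static void main(String[] args) throws Exception {
        JsonNode root = new ObjectMapper()
                .readTree("{\"Company\": {\"employee\": {\"name\": \"john\"}}}");
        Map<String, String> columns = new LinkedHashMap<>();
        flatten("", root, columns);
        System.out.println(columns);   // prints {Company.employee.name=john}
    }
}
```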
- Does it provide enough levers to configure source and target behavior (select columns/tables, keep snapshots on the target rather than simply merge/overwrite, data type overrides, force full refresh)?
NOTES:
- All replications are incremental
- Data Sync is like an ingestion module.
- It cannot be used to read data and then process that data.
- We can only copy from one place to another and apply some custom transformations.
- Does not refresh the connection once it is established.
- Connected to Elasticsearch and checked the tables in the Data Sync app. Inserted a new index in Elasticsearch and refreshed the connection in Data Sync, but it did not show the new table. You have to restart for the change to take effect.
- I am able to query it, though it does not show up in the detected table list.
OAuth apps - For applications that require an OAuth flow, we have tested the LinkedIn driver. CData has a pre-registered OAuth app on LinkedIn, but there is also an option to register our own OAuth app on LinkedIn for use with the CData driver, so that we can specify our own callback URL for accepting the tokens. Whether to register one OAuth app per Guzzle instance, or one centralized Guzzle instance on the cloud which always accepts the callback and forwards it to the relevant Guzzle instance, is something that needs to be evaluated. This is the same issue that we already have with other OAuth integrations from Guzzle, which we need to test.
Overall, the CData drivers have a more consistent interface compared to HDP.
- Our Call with them : https://youtu.be/_Aea-6W2rUU
- https://www.youtube.com/watch?v=vbjvEBhLLdg
Initial testing of CData using DBVis
I tried the Facebook and LinkedIn drivers. Quite easy to use. They are quite straightforward and thorough. The documentation is clear. We have to go deeper in terms of:
- Fidelity – both for metadata, data
- Performance
- Logging
- Security
It is easy to test and configure. The drivers, I assume, will keep changing as the schema gets expanded and new entities get supported by the API. Most of these drivers expose views (and not tables). Some of them need a filter like "Ads" to query - again, I don't understand the Facebook constructs well.
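For reference, a minimal sketch of what querying one of these views looks like over plain JDBC (the same thing DBVis does under the hood). The jdbc:linkedin: URL prefix, the OAuth connection properties and the Companies view name are assumptions and should be checked against the driver's own documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LinkedInDriverSmokeTest {
    public static void main(String[] args) throws Exception {
        // Assumed URL format and property names - verify against the CData LinkedIn driver docs.
        String url = "jdbc:linkedin:InitiateOAuth=GETANDREFRESH;"
                + "OAuthClientId=<client-id>;OAuthClientSecret=<client-secret>;"
                + "CallbackURL=http://localhost:33333;";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // "Companies" is a hypothetical view name; these drivers mostly expose views, not tables.
             ResultSet rs = stmt.executeQuery("SELECT * FROM Companies LIMIT 10")) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= cols; i++) {
                    System.out.print(rs.getMetaData().getColumnName(i) + "=" + rs.getString(i) + "  ");
                }
                System.out.println();
            }
        }
    }
}
```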
These drivers are two-way, and hence they also provide stored procedures for write-back.
Below is for LinkedIn:
So, net-net – we can ask them for pricing –
-- Discussion with CDATA
Hemanta,
Please find my answers below:
- We can accommodate any method of tracking you would prefer. We have a licensing mechanism that will allow us to track the individual machine activations. In this scenario, you would need one license key per driver (i.e. one for Salesforce and one for Twitter, etc.). If you are able to track usage, we have a single “OEMKey” for all drivers that you append to the connection string which then has a code check that validates that your application is calling the driver.
- Correct, we ask for them annually. If you would prefer to report more frequently, then we can accommodate that as well.
- The license fees are annual. We can structure perpetual deployment pricing, as well, for a higher price point per deployment.
- The OEM fee is converted to a credit every year
- This will cover both on-premise and cloud deployments.
For a POC, we can accommodate testing of up to (5) drivers for 60 days. We have found it very helpful to evaluate Salesforce, BigQuery, and MongoDB. Salesforce is a representative web API; BigQuery will show our performance on large cloud data sets, and MongoDB is a good source to explore how we handle nested structures. Additionally, if there are any specific data sources you are interested in or any integrations you have already done that you can compare our drivers to, those are also good options. Overall, it is up to you which five you select.
Let me know which ones you are interested in and I will generate evaluation licenses.
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
From: Hemanta Banerjee [mailto:[email protected]] Sent: Thursday, April 25, 2019 9:03 AM To: Ryan Lee Subject: Re: OEM CData JDBC drivers
Hi Ryan
Thanks for the proposal. Very clear. Couple of quick questions
- Is there is a license key for each deployment ? Or do we need to track on our end .
- The utilisation report will need to be submitted only annually ?
- The fee is perpetual license ?
- We will get credit of the OEM fee every year or is it a first year incentive ?
- Does it matter if SaaS deployment ? Even for SaaS deployment we will be charging and enabling for specific data sources only; so from my perspective no difference.
- Thanks for including the 60 day trial. Very critical for the POC and evaluations for our customers
There will be a lot of questions on the technical front so wanted to see if we can evaluate 5-10 drivers for performance and also ease of embedding. Can we get a 60 day trial key.
Thanks Hemanta
On 25 Apr 2019, at 5:15 PM, Ryan Lee [email protected] wrote:
Hi Hemanta,
Attached is our proposal for an OEM partnership with access to all CData JDBC drivers. For your reference, the OEM fee access 5 – 10 drivers is $25,000/year, so I selected the all access option for you.
Once you have had a chance to review, please let me know and we can schedule a time to discuss. If you would like to speak today, I am available today at 9am and 11am Eastern Time. Otherwise, I am on PTO tomorrow but am available all next week. Regards,
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
From: Ryan Lee [mailto:[email protected]] Sent: Tuesday, April 23, 2019 6:50 PM To: 'Hemanta Banerjee' Subject: RE: OEM CData JDBC drivers
Thanks, Hemanta. I will work up some preliminary pricing tomorrow and reach out to you to discuss.
Regards,
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
From: Hemanta Banerjee [mailto:[email protected]] Sent: Tuesday, April 23, 2019 1:45 AM To: Ryan Lee Subject: Re: OEM CData JDBC drivers
Hi Ryan
Please see attached. Once you have had a chance to review lets discuss. I am at GMT+8.
Thanks
Hemanta Banerjee| Co-Founder M: +65.8139.0140 | www.justanalytics.com Click here to schedule a meeting
From: Ryan Lee [email protected] Date: Tuesday, 23 April 2019 at 3:14 AM To: Hemanta Banerjee [email protected] Subject: RE: OEM CData JDBC drivers
Hi Hemanta,
Thank you for your interest in CData. I manage our OEM Partnerships and will be happy to assist you.
For our OEM partnerships, we have a two-pronged approach:
- Technical: Assist your team in evaluating 2-3 JDBC drivers. You can find our full list of drivers here: www.cdata.com/jdbc
- Action Item: Select the data sources you are interested in testing, and I will generate licenses.
- Business: Customize our OEM licensing/pricing model to your solution
- Action item: Complete the attached OEM form
Regards,
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
-----Original Message----- From: Hemanta Banerjee [mailto:[email protected]] Sent: Sunday, April 21, 2019 12:55 AM To: [email protected] Subject: OEM CData JDBC drivers
Hi
I am looking to OEM CData drivers for my application. We would need a server side license as it is a BI application and we would need JDBC access from the web server to the remote databases. Can you please advice the next steps and how I can evaluate the drivers for my application.
Thanks Hemanta