CData
- Overview
- Data sync
- Licence and Support
- Licence Model
- Licence of DataSync
- Support
- Open Questions
- Next actions (for Hemanta)
- Next actions (for Devt Team)
- OAuth for CData Drivers
- References
- JDBC Connectors
- They have many products
- CData JDBC drivers (https://www.cdata.com/jdbc) are standalone and can be used in any app, ETL tool or BI tool. Each licence has to be bought separately.
- Data Sync is a full-blown product which syncs data from any of the sources for which they sell JDBC connectors into a target sink - which can be lake storage or a DB (on-prem or cloud)
- Can be installed on Windows (https://www.cdata.com/download/download.aspx?sku=ASPE-A&type=free&file=free/ASPE-A/setup.exe ) or via a Java installer on Linux
- Does end-to-end orchestration of replication - can schedule, does incremental loads, allows you to select the objects to sync, etc.
- Licensing is usually based on the machines where the connectors or Data Sync are running. In most use cases an individual machine or a dedicated integration server runs the connector and is counted for the licence.
- They also support running connectors from containers and ephemeral clusters - we have to understand more on this.
- They have been selling a lot to real end users (the way people buy Tableau online and start using it) - people buy it and use it in Excel or desktop reporting tools like Tableau
- The licence goes by cores/CPUs/activations in most cases
- You buy a base licence (usually 999) plus connectors. Sinks and sources are both counted as connectors
- A pair of connectors costs you 1K. So if you have 10, you pay roughly 5K
- Each unique connection of a given type is counted separately - for example, if I set up 5 LinkedIn connections with 5 different accounts, it is counted as 5 and not 1 (so by actual connections set up, not by connector type)
- They seem to be strict on licensing, so it is not trust-based but appears to be enforced in the UI and other places.
- They throttle you by the number of CPUs that you buy a licence for - so if you have bought a connector or Data Sync, you can't just run it on a huge server and cater to a wide user base/consumption
- When you use it for an enterprise use case like loading to a cloud data lake, you have to use the Enterprise edition. If you are using it to simply load a local database, you can live with the Standard edition
- Enterprise support - lets you call them. Guess they may give you hot fixes (yet to be confirmed)
- Standard support - email only, first come first served. The SLA may be slow
- Does premium support give you the privilege of asking for quick fixes?
- How does the licence work on containers and ephemeral workloads like a DB Spark cluster?
- How does the licence vary if the CPU count goes from 4 to 32? Is it proportionate?
- OEM agreement with Ryan
- Reseller agreement so that we don't have to ask for a quote every time
- Technical Deep dive
- Videos and Documentation
Try out Data Sync and see if it is usable. Below is the order of priority:
- Install, setup, configuration, managing and monitoring
- Installation on Linux:
- Download and extract the package
- Go to the webserver directory and run java -jar start.jar in a terminal.
- Wait for it to start and access the UI on localhost at the port mentioned in the terminal (8181)
- Setup and Configurations:
- Go to the About page.
- Provide the license key or activate the free trial.
- How easy is it to set up a new source and sink connection, including OAuth/callbacks? If you change the sink to a new target, does it reload historical data, etc.?
- The OAuth flow is handled automatically, including refreshing the access token.
- You need to provide the necessary connection details to connect to the data source. Follow this video: https://www.youtube.com/watch?v=4r2z5E-d3VQ
- For every source-to-sink transfer, a new job is created with source and sink configs. Once a job is created, its source and sink cannot be changed. A new job has to be created for each different combination.
- Do all the source and sink connectors have consistent setup and config steps, or is it all over the place?
- Yes, the steps are consistent.
- Security - whatever we can verify (for example: running the portal on SSL, any user/password authorization for using the portal, how it stores OAuth tokens and user passwords for sources and sinks)
- We can enable TLS for the Java servlet that hosts the application so that the application is accessed securely.
- The server login password is stored in plain text initially. We have to apply our own hashing methods to store passwords securely (a sketch follows this list).
- We can configure TLS certificates to securely encrypt connections to data sources.
- We can also restrict the IP addresses that can access the application.
- We can also set up a proxy-based firewall for some connections. Three firewall types are available: TUNNEL, SOCKS4, SOCKS5
- We can also provide an SSL certificate for some connections individually.
- Some connections also have a proxy configuration.
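Since the server login password is stored in plain text out of the box, we would have to add our own hashing on top. Below is a minimal sketch of what that could look like using the JDK's built-in PBKDF2 support; the iteration count, salt size and where the resulting hash gets stored are our own assumptions, not anything CData prescribes.

```java
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHasher {
    // Assumed parameters - tune for the actual deployment.
    private static final int ITERATIONS = 120_000;
    private static final int KEY_LENGTH_BITS = 256;
    private static final int SALT_BYTES = 16;

    /** Returns "salt:hash" (both Base64) to store instead of the plain-text password. */
    public static String hash(char[] password) throws Exception {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LENGTH_BITS);
        byte[] hash = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return Base64.getEncoder().encodeToString(salt) + ":"
                + Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash("admin-password".toCharArray()));
    }
}
```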
- Fidelity of the cloning - our benchmark is Fivetran and Stitch - primarily data types, and whether there are cases (given the way it does incremental loads) where we lose data
- Throughput and performance
- Writing 175,240 records from a local MySQL database to a local CSV file took almost 15 minutes (roughly 195 records/second).
- Scheduling capabilities
- Minute
- Hourly
- Daily
- Weekly
- Monthly
- Cron expressions
- Can also be triggered through the API (see the sketch below).
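As a rough illustration, triggering a job over the API from Java could look like the sketch below. The endpoint path (/api.rsc/executeJob), the JSON body shape and the x-cdata-authtoken header name are assumptions from memory and need to be verified against the Data Sync API documentation; only the port (8181) comes from our local install.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerSyncJob {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint, header and job name - check against the Data Sync REST API docs.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8181/api.rsc/executeJob"))
                .header("Content-Type", "application/json")
                .header("x-cdata-authtoken", "<api-token-from-settings>")
                .POST(HttpRequest.BodyPublishers.ofString("{\"JobName\": \"mysql_to_csv\"}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```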
- Notification and logs/instrumentation
- A mail server can be configured to send notifications by email.
- For each job an email address and a subject can be configured.
- An option to send an email only when an error occurs is also available.
- Each job-run log is saved at a configured path. The log level for each job can also be configured. Additionally, the log file can be downloaded.
- Handling of nested (or multi-structured) data from JSON or APIs
- Converts nested JSON to dotted notation (see the sketch after this list).
- For example, if the JSON is {"Company": {"employee": {"name": "john"}}}, it will be converted to Company.employee.name
- Arrays are converted to strings.
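To make the dotted-notation behaviour concrete, here is an illustrative flattener (our own sketch using Jackson, not CData's code) that reproduces what is described above: nested objects become dotted column names and arrays are kept as their JSON string form.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class DottedFlattener {
    /** Flattens nested objects into "a.b.c" keys; arrays are stringified. */
    static void flatten(String prefix, JsonNode node, Map<String, String> out) {
        if (node.isObject()) {
            node.fields().forEachRemaining(e ->
                    flatten(prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey(),
                            e.getValue(), out));
        } else if (node.isArray()) {
            out.put(prefix, node.toString());   // arrays become strings
        } else {
            out.put(prefix, node.asText());
        }
    }

    public static void main(String[] args) throws Exception {
        JsonNode root = new ObjectMapper()
                .readTree("{\"Company\": {\"employee\": {\"name\": \"john\"}}}");
        Map<String, String> columns = new LinkedHashMap<>();
        flatten("", root, columns);
        System.out.println(columns);   // prints {Company.employee.name=john}
    }
}
```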
- Does it provide enough levers to configure source and target behavior (select columns/tables, keep snapshots on the target rather than simply merge/overwrite, data type overrides, force full refresh)?
NOTES:
- All replications are incremental
- Data Sync is like an ingestion module.
- It cannot be used to read data and then process that data.
- We can only copy from one place to another and apply some custom transformations.
- Does not refresh the connection once it is established.
- Connected to Elasticsearch and checked the tables in the Data Sync app. Inserted a new index in Elasticsearch and refreshed the connection in Data Sync, but it did not show the new table. You have to restart for the change to take effect.
- I am able to query it, though it does not show up in the detected table list.
OAuth apps - For applications that require an OAuth flow, we have tested the LinkedIn driver. CData has a pre-registered OAuth app on LinkedIn, but there is also an option to register our own OAuth app on LinkedIn for use with the CData driver, so that we can specify our own callback URL for accepting the tokens. Whether to register one OAuth app per Guzzle instance, or one centralized Guzzle instance on the cloud which always accepts the callback and forwards it to the relevant Guzzle instance, is something that needs to be evaluated. This is the same issue that we already have with other OAuth integrations from Guzzle, which we need to test.
Overall, the CData drivers have a more consistent interface compared to HDP.
- Our Call with them : https://youtu.be/_Aea-6W2rUU
- https://www.youtube.com/watch?v=vbjvEBhLLdg
Initial testing of CData using DBVis
I tried the Facebook and LinkedIn drivers. Quite easy to use. They are quite straightforward and thorough. The documentation is clear. We have to go deeper in terms of:
- Fidelity – both for metadata, data
- Performance
- Logging
- Security
It is easy to test and configure. The drivers, I assume, will keep changing as the schema gets expanded and new entities get supported by the API. Most of these drivers expose views (and not tables). Some of them need a filter like "Ads" to query - again, I don't understand the Facebook constructs well.
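For reference, a minimal sketch of what querying one of these views looks like over plain JDBC (the same thing DBVis does under the hood). The jdbc:linkedin: URL prefix, the OAuth connection properties and the Companies view name are assumptions and should be checked against the driver's own documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LinkedInDriverSmokeTest {
    public static void main(String[] args) throws Exception {
        // Assumed URL format and property names - verify against the CData LinkedIn driver docs.
        String url = "jdbc:linkedin:InitiateOAuth=GETANDREFRESH;"
                + "OAuthClientId=<client-id>;OAuthClientSecret=<client-secret>;"
                + "CallbackURL=http://localhost:33333;";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // "Companies" is a hypothetical view name; these drivers mostly expose views, not tables.
             ResultSet rs = stmt.executeQuery("SELECT * FROM Companies LIMIT 10")) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= cols; i++) {
                    System.out.print(rs.getMetaData().getColumnName(i) + "=" + rs.getString(i) + "  ");
                }
                System.out.println();
            }
        }
    }
}
```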
These drivers are two-way, and hence they also provide stored procedures for write-back.
Below is for LinkedIn:
So, net-net – we can ask them for pricing –
-- Discussion with CDATA
Hemanta,
Please find my answers below:
- We can accommodate any method of tracking you would prefer. We have a licensing mechanism that will allow us to track the individual machine activations. In this scenario, you would need one license key per driver (i.e. one for Salesforce and one for Twitter, etc.). If you are able to track usage, we have a single “OEMKey” for all drivers that you append to the connection string which then has a code check that validates that your application is calling the driver.
- Correct, we ask for them annually. If you would prefer to report more frequently, then we can accommodate that as well.
- The license fees are annual. We can structure perpetual deployment pricing, as well, for a higher price point per deployment.
- The OEM fee is converted to a credit every year
- This will cover both on-premise and cloud deployments.
For a POC, we can accommodate testing of up to (5) drivers for 60 days. We have found it very helpful to evaluate Salesforce, BigQuery, and MongoDB. Salesforce is a representative web API; BigQuery will show our performance on large cloud data sets, and MongoDB is a good source to explore how we handle nested structures. Additionally, if there are any specific data sources you are interested in or any integrations you have already done that you can compare our drivers to, those are also good options. Overall, it is up to you which five you select.
Let me know which ones you are interested in and I will generate evaluation licenses.
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
From: Hemanta Banerjee [mailto:[email protected]] Sent: Thursday, April 25, 2019 9:03 AM To: Ryan Lee Subject: Re: OEM CData JDBC drivers
Hi Ryan
Thanks for the proposal. Very clear. Couple of quick questions
- Is there is a license key for each deployment ? Or do we need to track on our end .
- The utilisation report will need to be submitted only annually ?
- The fee is perpetual license ?
- We will get credit of the OEM fee every year or is it a first year incentive ?
- Does it matter if SaaS deployment ? Even for SaaS deployment we will be charging and enabling for specific data sources only; so from my perspective no difference.
- Thanks for including the 60 day trial. Very critical for the POC and evaluations for our customers
There will be a lot of questions on the technical front so wanted to see if we can evaluate 5-10 drivers for performance and also ease of embedding. Can we get a 60 day trial key.
Thanks Hemanta
On 25 Apr 2019, at 5:15 PM, Ryan Lee [email protected] wrote:
Hi Hemanta,
Attached is our proposal for an OEM partnership with access to all CData JDBC drivers. For your reference, the OEM fee access 5 – 10 drivers is $25,000/year, so I selected the all access option for you.
Once you have had a chance to review, please let me know and we can schedule a time to discuss. If you would like to speak today, I am available today at 9am and 11am Eastern Time. Otherwise, I am on PTO tomorrow but am available all next week. Regards,
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
From: Ryan Lee [mailto:[email protected]] Sent: Tuesday, April 23, 2019 6:50 PM To: 'Hemanta Banerjee' Subject: RE: OEM CData JDBC drivers
Thanks, Hemanta. I will work up some preliminary pricing tomorrow and reach out to you to discuss.
Regards,
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
From: Hemanta Banerjee [mailto:[email protected]] Sent: Tuesday, April 23, 2019 1:45 AM To: Ryan Lee Subject: Re: OEM CData JDBC drivers
Hi Ryan
Please see attached. Once you have had a chance to review lets discuss. I am at GMT+8.
Thanks
Hemanta Banerjee| Co-Founder M: +65.8139.0140 | www.justanalytics.com Click here to schedule a meeting
From: Ryan Lee [email protected] Date: Tuesday, 23 April 2019 at 3:14 AM To: Hemanta Banerjee [email protected] Subject: RE: OEM CData JDBC drivers
Hi Hemanta,
Thank you for your interest in CData. I manage our OEM Partnerships and will be happy to assist you.
For our OEM partnerships, we have a two-pronged approach:
- Technical: Assist your team in evaluating 2-3 JDBC drivers. You can find our full list of drivers here: www.cdata.com/jdbc
- Action Item: Select the data sources you are interested in testing, and I will generate licenses.
- Business: Customize our OEM licensing/pricing model to your solution
- Action item: Complete the attached OEM form
Regards,
--Ryan Lee | DIRECTOR, OEM SALES
CDATA SOFTWARE | ww.cdata.com
[email protected] | 919-928-5214 x105
Book a time with me: www.calendly.com/ryanlcdata
-----Original Message----- From: Hemanta Banerjee [mailto:[email protected]] Sent: Sunday, April 21, 2019 12:55 AM To: [email protected] Subject: OEM CData JDBC drivers
Hi
I am looking to OEM CData drivers for my application. We would need a server side license as it is a BI application and we would need JDBC access from the web server to the remote databases. Can you please advice the next steps and how I can evaluate the drivers for my application.
Thanks Hemanta