ETL ESB - sgml/signature GitHub Wiki
History
| Decade | Equivalent to Airbyte | Why it fits | Active/Inactive | Open-Source Alternatives at the Time |
|---|---|---|---|---|
| 1960s | Custom COBOL/Assembler batch jobs | Early data integration was hand-coded batch processes moving records between mainframes and tape. | Inactive (legacy only, no active development) | None (integration was bespoke, proprietary mainframe code) |
| 1970s | IBM Information Management System (IMS) and Customer Information Control System (CICS) batch utilities | Enterprises used IBM's IMS databases and CICS transaction systems with utilities to extract and load data. | Partially Active (IMS/CICS still exist, but batch ETL utilities are legacy) | None (open-source movement hadn't reached enterprise ETL yet) |
| 1980s | SAS Data Integration / early ETL utilities | SAS and similar tools offered reusable scripts for extraction and transformation. | Active (SAS still maintained, though niche) | Early Unix shell scripting, awk/sed pipelines (community-driven but not formal ETL tools) |
| 1990s | Informatica PowerCenter (Informatica founded 1993), DataStage (1997, later acquired by IBM) | Commercial ETL platforms matured, providing graphical interfaces and connectors. | Active (still maintained, enterprise use) | Perl, Bash, and awk scripts used for DIY ETL; no formal open-source ETL platforms yet |
| 2000s | Talend Open Studio (2006), Pentaho Kettle (2001) | Open-source ETL tools appeared, democratizing integration. | Active (Talend and Pentaho still maintained, though less dominant) | Talend, Pentaho Kettle (a.k.a. PDI), CloverETL (community edition) |
| 2010s | Apache NiFi (2014), Singer.io (2017), Fivetran (2012) | Cloud-native and open-source pipelines emerged; Singer's tap/target spec resembles Airbyte's connectors. | Active (NiFi, Fivetran, Singer maintained, though Singer less active) | Apache NiFi, Singer.io, Apache Sqoop, Luigi, Airflow (for orchestration) |
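
The tap/target idea mentioned in the 2010s row can be sketched in a few lines. In the Singer spec, a tap writes JSON messages of type `SCHEMA`, `RECORD`, and `STATE` to stdout, one per line, and a target reads them from stdin. The stream name `users`, the bookmark key `users_last_id`, and the helper functions below are hypothetical, chosen only for illustration; the sketch also assumes rows arrive sorted by `id`.

```python
import io
import json
import sys


def emit(message, out=sys.stdout):
    """Write one Singer message as a single JSON line."""
    out.write(json.dumps(message) + "\n")


def run_tap(rows, state, out=sys.stdout):
    """Toy tap: emit SCHEMA, then RECORDs newer than the bookmark, then STATE."""
    emit({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        },
        "key_properties": ["id"],
    }, out)
    last_id = state.get("users_last_id", 0)
    for row in rows:
        if row["id"] > last_id:  # incremental sync: skip rows already delivered
            emit({"type": "RECORD", "stream": "users", "record": row}, out)
            last_id = row["id"]
    emit({"type": "STATE", "value": {"users_last_id": last_id}}, out)


def run_target(lines):
    """Toy target: collect records per stream and return them with the final state."""
    loaded, state = {}, {}
    for line in lines:
        msg = json.loads(line)
        if msg["type"] == "RECORD":
            loaded.setdefault(msg["stream"], []).append(msg["record"])
        elif msg["type"] == "STATE":
            state = msg["value"]
    return loaded, state
```

Because the two sides only share a line-oriented JSON stream, a real tap and target are usually separate processes joined by a shell pipe; here `run_tap` can write into an `io.StringIO` buffer whose lines are then passed to `run_target`.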
Non-HTTP Transports
| Alternative Submission Method | Transport Mechanism | Example Free Server (Pi 1 default‑ready) | GitHub URL | License |
|---|---|---|---|---|
| FTP uploads | File Transfer Protocol | vsftpd | https://github.com/vsftpd/vsftpd | GPL-2.0 |
| SFTP uploads | SSH File Transfer Protocol | Dropbear (SSH server; SFTP needs a separate sftp-server binary) | https://github.com/mkj/dropbear | MIT-style |
| Raw TCP sockets | Custom socket protocols | Netcat (listen mode) | https://github.com/openbsd/src/tree/master/usr.bin/nc | BSD |
| UDP datagrams | Lightweight transport | BIND (DNS server) | https://github.com/isc-projects/bind9 | MPL-2.0 |
| Bluetooth data transfer | Short‑range wireless | BlueZ stack | https://github.com/bluez/bluez | GPL-2.0 |
| Email submission | SMTP transport | Exim | https://github.com/Exim/exim | GPL-2.0 |
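
As an illustration of the "Raw TCP sockets" row, the following is a minimal sketch of a line-based submission listener and client using only the Python standard library. The wire format (one UTF-8 payload per line, answered with `ACK`) is an assumption made for this example, not a protocol defined by any of the servers listed above, and the names `SubmissionHandler`, `submit`, and `start_server` are hypothetical.

```python
import socket
import socketserver
import threading

received = []  # submissions collected by the handler (in memory, for the sketch)


class SubmissionHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One submission per line; reply to each with a short acknowledgment.
        for raw in self.rfile:
            line = raw.decode("utf-8").rstrip("\r\n")
            if line:
                received.append(line)
                self.wfile.write(b"ACK\n")


def submit(host, port, payload):
    """Send one newline-terminated payload and return the server's reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload.encode("utf-8") + b"\n")
        return conn.makefile("rb").readline().decode("utf-8").strip()


def start_server(host="127.0.0.1", port=0):
    """Start the listener on a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), SubmissionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server()` and then `submit(*server.server_address, "order-123")` exercises the round trip locally. A plain socket like this carries no encryption or authentication, so it only suits trusted networks unless it is wrapped in TLS or tunneled over SSH.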
EDI
Open Source EDI Projects (Non-JS)

| Language | Project | Description |
|---|---|---|
| Python | pyx12 | Parses and validates X12 healthcare transaction sets |
| Java | Smooks | Transforms X12 into XML/JSON for healthcare flows |
| Python | bots-edi | EDI translator with mapping and routing |

Healthcare Transaction Sets

| Transaction Set | Name | Purpose |
|---|---|---|
| 837 | Claims | Patient billing and insurance claims |
| 835 | Remittance | Electronic remittance advice for payments |
| 834 | Enrollment | Benefit enrollment and member updates |

Shared Capabilities Across Projects

- Parse X12 segments (ISA, GS, ST...)
- Validate compliance (997/999 acknowledgments)
- Map EDI → internal models (JSON, CSV, DB)
- Support batch transport (SFTP, AS2, TCP/IP, WebDAV)
- Transform data (XML, XSLT)
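
The first shared capability, parsing X12 segments, can be sketched without any library. This is a minimal illustration, not how pyx12, Smooks, or Bots implement it. It relies on the ISA header being fixed-width (106 characters including the terminator), so the element separator can be read from the 4th character and the segment terminator from the 106th. The function names `parse_x12` and `transaction_sets` are hypothetical.

```python
def parse_x12(text):
    """Split an X12 interchange into segments, each a list of elements.

    Delimiters are read from the fixed-width ISA header: the element
    separator is the 4th character and the segment terminator the 106th.
    """
    if not text.startswith("ISA") or len(text) < 106:
        raise ValueError("not an X12 interchange: missing or short ISA header")
    element_sep = text[3]
    segment_term = text[105]
    segments = []
    for raw in text.split(segment_term):
        raw = raw.strip()  # tolerate newlines after the terminator
        if raw:
            segments.append(raw.split(element_sep))
    return segments


def transaction_sets(segments):
    """Return (ST01, ST02) for each transaction set, checking the SE01 count."""
    found, start = [], None
    for index, seg in enumerate(segments):
        if seg[0] == "ST":
            start = index
        elif seg[0] == "SE" and start is not None:
            declared = int(seg[1])
            actual = index - start + 1  # SE01 counts ST through SE inclusive
            if declared != actual:
                raise ValueError(f"SE01 says {declared} segments, found {actual}")
            found.append((segments[start][1], segments[start][2]))
            start = None
    return found
```

For an interchange containing a single `ST*837*0001` set, `transaction_sets(parse_x12(text))` yields one pair holding the set identifier and its control number. Full compliance checking (the 997/999 acknowledgment flow) needs the implementation guides for each transaction set and is well beyond this sketch.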
Comparison of Open Source Service Bus Implementations
| Feature | Cadence | Titanoboa |
|---|---|---|
| Language | Go | Clojure (JVM-based) |
| Workflow Model | Event-driven workflow execution | Low-code workflow orchestration |
| Database Support | MySQL, PostgreSQL, Cassandra | Any relational DB via JDBC |
| Security Mechanisms | TLS encryption, authentication via IAM | User authentication, token security, role-based access |
| Scalability | Highly scalable via microservices | Modular, scales based on workflow complexity |
| Fault Tolerance | Durable execution with automatic retries | Workflow recovery and rollback mechanisms |
| State Management | Built-in state persistence | Supports external state persistence |
| Memory Requirements | Lightweight (~MBs) | Scales dynamically (~MBs-GBs) |
| Use Case | Distributed workflow engine for async tasks | Low-code service bus for orchestrating workflows |
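
The "Fault Tolerance" and "State Management" rows describe durable execution: steps are retried on failure and progress is persisted so a restarted workflow resumes where it stopped. The sketch below shows that general idea with a JSON checkpoint file. It is not the Cadence or Titanoboa API; `run_workflow`, its parameters, and the checkpoint layout are all hypothetical.

```python
import json
import os
import time


def run_workflow(steps, state_path, max_attempts=3, base_delay=0.0):
    """Run named steps in order, checkpointing after each one.

    `steps` is a list of (name, callable) pairs; each callable takes the
    shared context dict and returns a dict of updates to merge into it.
    Completed steps are recorded in `state_path`, so a rerun after a
    crash resumes from the first unfinished step.
    """
    state = {"done": [], "context": {}}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            state = json.load(fh)

    for name, step in steps:
        if name in state["done"]:
            continue  # finished in an earlier run
        for attempt in range(1, max_attempts + 1):
            try:
                updates = step(state["context"]) or {}
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted; earlier checkpoints stay on disk
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        state["context"].update(updates)
        state["done"].append(name)
        # Write to a temp file, then rename, so a crash mid-write
        # does not corrupt the previous checkpoint.
        tmp = state_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp, state_path)
    return state["context"]
```

Production engines persist state in a database (the "Database Support" row) and track per-activity history rather than a single file, but the resume-and-retry loop has the same shape. Steps should be idempotent, since a crash between finishing a step and writing its checkpoint causes that step to run again.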
References
https://www.freecodecamp.org/news/sqlalchemy-makes-etl-magically-easy-ab2bd0df928/
https://dev.to/zchtodd/sqlalchemy-performance-anti-patterns-and-their-fixes-4bmm
https://news.ycombinator.com/item?id=19098246
https://hakibenita.com/fast-load-data-python-postgresql
https://docs.konghq.com/hub/kong-inc/openid-connect/support/
https://www.ibm.com/docs/en/datapower-gateway/10.5.x?topic=gateway-programming-model-gatewayscript