# architecture data lakehouse

It is possible to run typical data warehousing workloads (SQL queries and business analytics) directly on the files in the data lake, alongside data science/machine learning workloads. This architecture is known as a data lakehouse. It requires a query execution engine and a metadata layer (e.g., for schema management) that extend the data lake's file storage. Apache Hive, Spark SQL, Presto, and Trino are examples of query engines that take this approach.
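As a minimal sketch of the idea, a lakehouse engine such as Spark SQL can register raw files in the lake as a table and then run ordinary warehouse-style SQL over them. The table name, columns, and storage path below are hypothetical, not from the source:

```sql
-- Register Parquet files already sitting in the data lake as a table.
-- The engine's metadata layer records the schema and the file location
-- (path and table name here are made up for illustration).
CREATE TABLE IF NOT EXISTS sales
USING parquet
LOCATION 's3://my-data-lake/sales/';

-- A typical data-warehousing query, executed directly on the lake's files
-- rather than on data copied into a separate warehouse.
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;
```

The point is that no separate load step into a dedicated warehouse is needed: the metadata layer maps the table onto the existing files, and the query engine reads them in place.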