Data Lake - KeynesYouDigIt/Knowledge GitHub Wiki

A data lake is a system or repository that stores data in its original, unprocessed format (think object blobs, raw files, "we found it like this" format). Data in the lake is often subsequently transformed and used for tasks such as reporting, visualization, advanced analytics, and machine learning. Traditionally, data from the lake is transformed and loaded into a Data Warehouse.
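To make the "land it raw, transform it later" flow concrete, here is a minimal sketch. It assumes a local temp directory standing in for the lake and an in-memory SQLite database standing in for the warehouse. The file names, the `orders` table, and the helpers `land_raw`, `load_orders`, and `demo` are all made up for illustration; a real setup would use object storage and a proper warehouse.

```python
import csv
import io
import json
import sqlite3
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, name: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte, with no parsing or cleanup."""
    path = lake_dir / name
    path.write_bytes(payload)
    return path


def load_orders(lake_dir: Path, conn: sqlite3.Connection) -> int:
    """Read raw order files from the lake, normalize them, and load one table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
    rows = []
    for path in sorted(lake_dir.iterdir()):
        if path.suffix == ".json":
            # Hypothetical format: one JSON object per line.
            text = path.read_text(encoding="utf-8")
            records = [json.loads(line) for line in text.splitlines() if line.strip()]
        elif path.suffix == ".csv":
            text = path.read_text(encoding="utf-8")
            records = list(csv.DictReader(io.StringIO(text)))
        else:
            continue  # other raw formats stay in the lake untouched
        # The transform step: coerce every record to the same typed shape.
        rows.extend((int(r["order_id"]), float(r["amount"])) for r in records)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def demo() -> tuple[int, float]:
    """Land three raw files, then load the two that hold orders."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        land_raw(lake, "orders_a.json",
                 b'{"order_id": 1, "amount": "19.50"}\n{"order_id": 2, "amount": 5}\n')
        land_raw(lake, "orders_b.csv", b"order_id,amount\n3,7.25\n")
        land_raw(lake, "scan.pdf", b"%PDF-1.4 ...")  # kept as-is, never parsed
        conn = sqlite3.connect(":memory:")
        count = load_orders(lake, conn)
        total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
        conn.close()
        return count, total
```

Calling `demo()` lands three files and loads only the two that contain orders. The lake keeps everything as it arrived, including the inconsistent `amount` types, and the cleanup happens only on the way into the warehouse table.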

A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
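As a rough illustration of those categories, the sketch below groups object keys by file extension. The extension-to-category mapping is an assumption that follows the examples above, and `inventory` is a hypothetical helper. Structured data is left out because a relational export has no single conventional extension.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping, based on the examples in the paragraph above.
CATEGORY_BY_SUFFIX = {
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".mp3": "binary",
    ".mp4": "binary",
}


def inventory(keys: list[str]) -> dict[str, list[str]]:
    """Group lake object keys by a coarse category inferred from the suffix."""
    groups = defaultdict(list)
    for key in keys:
        suffix = PurePosixPath(key).suffix.lower()
        groups[CATEGORY_BY_SUFFIX.get(suffix, "unknown")].append(key)
    return dict(groups)
```

A growing "unknown" bucket in an inventory like this is one cheap early signal that nobody knows what is in the lake anymore.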

Poorly managed data lakes have been facetiously called data swamps.

Tools for implementation

AWS S3 and Lake Formation, Snowflake, Azure Data Lake Storage, Qubole

The Databricks "lakehouse" - https://databricks.com/discover/data-lakes/introduction

See more at 1.3-Storage-and-retrieval#data-warehousing