Lake Formation - YakDriver/fardvag GitHub Wiki

NOTE: AWS consistently writes "Lake Formation" (two words, with a space) rather than "LakeFormation".

LFTagResource

From AWS

From Community

  • Feature Request Issue #9700
  • Lake Formation Service Client #9701
  • Lake Formation DataLake settings #13250
  • Lake Formation resource #13267
  • Lake Formation permissions #13396

Lake Formation Nouns

  1. Data Lake = S3 path
  2. Data Catalog
  3. Blueprint
  4. Workflow
  5. Write Permission on Data Catalog, S3
  6. Manage Permission on Data Catalog, S3
  7. Athena to query data
  8. Redshift Spectrum to query data

Lake Formation Verbs

The following are the general steps to create and use a data lake:

  1. Register an Amazon Simple Storage Service (Amazon S3) path as a data lake.
  2. Grant Lake Formation permissions to write to the Data Catalog and to Amazon S3 locations in the data lake.
  3. Create a database to organize the metadata tables in the Data Catalog.
  4. Use a blueprint to create a workflow. Run the workflow to ingest data from a data source.
  5. Set up your Lake Formation permissions to allow others to manage data in the Data Catalog and the data lake.
  6. Set up Amazon Athena to query the data that you imported into your Amazon S3 data lake.
  7. For some data store types, set up Amazon Redshift Spectrum to query the data that you imported into your Amazon S3 data lake.
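As a rough illustration, steps 1 through 3 above might map onto AWS API calls as sketched below. This is a minimal sketch, not a definitive setup: it assumes the boto3 `lakeformation` client operations `register_resource` and `grant_permissions` and the `glue` client operation `create_database`. The helper names (`register_request`, `write_grants`, `database_request`, `run_setup`) and every ARN and database name passed to them are hypothetical. The clients are passed in, so the block itself makes no AWS calls.

```python
from typing import Any, Dict, List


def register_request(bucket_arn: str) -> Dict[str, Any]:
    """Step 1: parameters for lakeformation.register_resource."""
    # Assumption: the service-linked role is used to access the location.
    return {"ResourceArn": bucket_arn, "UseServiceLinkedRole": True}


def write_grants(principal_arn: str, bucket_arn: str) -> List[Dict[str, Any]]:
    """Step 2: parameters for two lakeformation.grant_permissions calls."""
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    return [
        # Allow the principal to create databases in the Data Catalog.
        {
            "Principal": principal,
            "Resource": {"Catalog": {}},
            "Permissions": ["CREATE_DATABASE"],
        },
        # Allow the principal to point catalog resources at the S3 location.
        {
            "Principal": principal,
            "Resource": {"DataLocation": {"ResourceArn": bucket_arn}},
            "Permissions": ["DATA_LOCATION_ACCESS"],
        },
    ]


def database_request(name: str) -> Dict[str, Any]:
    """Step 3: parameters for glue.create_database."""
    return {"DatabaseInput": {"Name": name}}


def run_setup(lf_client, glue_client, bucket_arn: str,
              principal_arn: str, database: str) -> None:
    """Send steps 1-3 through the supplied (hypothetical) clients."""
    lf_client.register_resource(**register_request(bucket_arn))
    for grant in write_grants(principal_arn, bucket_arn):
        lf_client.grant_permissions(**grant)
    glue_client.create_database(**database_request(database))
```

With boto3 installed and credentials configured, `run_setup(boto3.client("lakeformation"), boto3.client("glue"), ...)` would be one way to call it. Steps 4 through 7 (blueprints, workflows, Athena, Redshift Spectrum) are not covered by this sketch.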

Integrations

AWS Service: How Integrated

  • AWS Glue: AWS Glue and Lake Formation share the same Data Catalog. For console operations (such as viewing a list of tables) and all API operations, AWS Glue users can access only the databases and tables on which they have Lake Formation permissions. AWS Glue does not support Lake Formation column permissions.
  • Amazon Athena: When Amazon Athena users select the AWS Glue catalog in the query editor, they can query only the databases, tables, and columns that they have Lake Formation permissions on. Queries using manifests are not supported. In addition to principals who authenticate with Athena through AWS Identity and Access Management (IAM), Lake Formation supports Athena users who connect through the JDBC or ODBC driver and authenticate through SAML. Supported SAML providers include Okta and Microsoft Active Directory Federation Service (AD FS). For more information, see "Using Lake Formation and the Athena JDBC and ODBC Drivers for Federated Access to Athena" in the Amazon Athena User Guide.
  • Amazon Redshift Spectrum: When Amazon Redshift users create an external schema on a database in the AWS Glue catalog, they can query only the tables and columns in that schema on which they have Lake Formation permissions. Queries using manifests are not supported.
  • Amazon EMR: Lake Formation permissions are enforced when Apache Spark applications are submitted using Apache Zeppelin or EMR Notebooks.
  • Amazon QuickSight Enterprise Edition: When an Amazon QuickSight Enterprise Edition user queries a dataset in an Amazon S3 location that is registered with Lake Formation, the user must have the Lake Formation SELECT permission on the data.
  • AWS Glue DataBrew: To create a dataset for an Amazon S3 location that is registered with Lake Formation, the DataBrew principal must have the DESCRIBE Lake Formation permission on the corresponding AWS Glue Data Catalog table. To access data in the dataset, the DataBrew principal must have the SELECT Lake Formation permission on the table.
  • AWS KMS: Lake Formation also works with AWS Key Management Service (AWS KMS) to enable you to more easily set up these integrated services to encrypt and decrypt data in Amazon Simple Storage Service (Amazon S3) locations.
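
Several of the integrations above come down to a principal holding DESCRIBE or SELECT on a Data Catalog table. As a hedged sketch, the DataBrew requirement could be expressed as parameters for the boto3 `lakeformation` client's `grant_permissions` operation. The helper name `table_grant` and all ARNs, database names, and table names are hypothetical.

```python
from typing import Any, Dict, List


def table_grant(principal_arn: str, database: str, table: str,
                permissions: List[str]) -> Dict[str, Any]:
    """Build grant_permissions parameters for one Data Catalog table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


# DESCRIBE lets the principal create the dataset; SELECT lets it read the data.
databrew_grant = table_grant(
    "arn:aws:iam::111122223333:role/ExampleDataBrewRole",  # hypothetical role
    "sales",    # hypothetical database
    "orders",   # hypothetical table
    ["DESCRIBE", "SELECT"],
)
```

The resulting dictionary could then be passed as `client.grant_permissions(**databrew_grant)`. For QuickSight, which needs only SELECT per the notes above, the same helper would take `["SELECT"]`.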