Snowflake Objects - gablesiak/learning-path GitHub Wiki

Snowflake Objects

1.Warehouses

-cluster of compute resources

-provides the required resources (CPU, memory, temp stage)

-used to perform operations like: SELECT, DELETE, INSERT, UPDATE, COPY INTO) -sizes XS – XXL, determine number of clusters

-auto suspend and auto resume – for billing optimization

-scalable

How to create: Snowflake->Warehouses -> Create -> Choose Name&Size – other settings optional

Example:

CREATE OR REPLACE WAREHOUSE my_wh WITH WAREHOUSE_SIZE='X-LARGE';

2.Roles

-Defines which access can be/is granted.

-Roles are assigned to users

-Hierarchy can be created -> assigning roles to other roles

-Inheritance of accesses accordingly to hierarchy

-Best practice – not to mix account management privileges and entity specific privileges in the same role

-No concept of a super-user

ORGADMIN:

-can create accounts

-can view accounts

-can view usage information

ACCOUNTADMIN:

-top-level role

-SYSADMIN+SECURITYADMIN

SECURITYADMIN:

-can manage any object

-can create, monitor and view users/roles

USERADMIN:

-access management only

SYSADMIN:

-create warehouses/databases

3.Databases

-logical containers

-all data is maintained in databases

-Each database consists 1 or more schemas

-no limits for number of databases, schemas or objects that can be created

4.Schemas

-logical groups of database objects (such as tables and views)

-each schema belongs to single database

-database+schema=namespace

5.Tables

Table types:

Temporary tables:

-non permanent, transistory data

-only exist within the session in which they were created

-persist only for the reminder of the session

-not visible for other users or sessions

-after session ending – table is removed, not recoverable

-overwriting existing table is enabled

-after creation it cannot be converted to any other table type

Transient Tables:

-persist until removed

-available for all users with proper access

-similar to permanent tables

-key difference: not having fail-safe period.

-designed for transitory data, that needs to be maintained beyond each session

-do not require the same protection and recovery as permanent

-after creation it cannot be converted to any other table type

Permanent Tables:

-existing until dropped

-time travel allowed – 1 to 90 days recovery https://docs.snowflake.com/en/sql-reference/sql/create-table.html

Time travel:

-availability to retrieve older versions of objects up to 90 days in the past

-ability to undrop objects

-increases storage costs

-if not wanted – set retention period to 0 days

-Usage: for automated testing, debugging

Fail-safe:

-begins where time travel ends

-7 days – cannot configure it

-last resort in the event of damage

-better not to use for retrieving historical data – expensive

6.Views

-allows the result of a query to be accessed as if it’s table

-purposes: combining, segregation, protecting data

Types of views:

Non-materialized: -generically refers to all types of views

-named definition of query

-results are not stored for future use

-performance is slower than with materialized views

-most common type of view

-use case: selecting some columns from table or specific data range, joining 2 or more tables

Materialized:

-behaves more like a table

-results are stored – similar to results fo table

-allows faster access – good when results are often in use

-should be used when query contain small number of rows/columns

-view’s base table doesn’t change frequently

Secure Views:

-Can be both materialized and non-materialized

-improved data privacy and data sharing

-performance impacts

-useful in case of sensitive data

Recursive Views:

-non-materialized only

-the view can refer to itself

-really don’t know what to put here and what’s the purpose of that kind of view

Alter view– modifies the properties for existing view:

-renaming view,

-converting to secure view

-comments for view

7.Constraints

Available funcionalities:

-Unique primary/foreign keys

-Not null columns

-Named constraints

-single- and multicolumn constraints

Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.

8.Table clustering keys

-subset of columns in table that are explicitly designated to co-locate data in the same micro-partitions

-useful for very large tables

-use case: queries are running slower than expected

-improves scan efficiency in queries

-better column compression

9.Stored Procedures

-similar to functions

-created once, can be executed many times

-created with CREATE PROCEDURE, executed with CALL

-returns a single value

-helps with error handling

-improvement of dynamics in SQL statement creation

-it helps in access delegating. Can empower user to run the procedure, even if without it user don’t have enough privileges.

10.Sequences

-used to generate unique numer accross sessions and statements.

-can be used to generate values for primary key/any column that requires unique value

11.Stages/File formats

-locations used to store data.

-can include directory table. It stores a catalog of staged files in cloud storage

Types of Stages:

Internal stage:

-stores data files internally within Snowflake

-can be permanent or temporary

-before ingesting data into a table in Snowflake data has to be loaded into the stage with PUT command

-then loading into table using COPY INTO command

-unloading: GET to load into stage, COPY INTO to export

Further divided to

1.User Stages

-tied to specific user

-each user has default stage created

-not able to modify

-files in one user stage cannot be accessed by other user.

-referring to user stages with @~

2.Table Stages

-tied to specific user.

-when table is created => table stage created automatically

-not able to modify

-table stage for particular table cannot be accessed through another table

-referring with @%

3.Internal Named Stages

-more flexibility

-all operations that can be performed on object can be performed on internal stages as well.

-needs to be created manually

-format options can be specified

-referring with ~

External Stage:

-references data files stored outside of snowflake. It can be Amazon S3 buckets, GC Storage buckets, Azure containers

-storage location can be private/protected/public

-loading/unloading data can by directly done using COPY INTO

-GET PUT – not supported

12.Pipes/Snowpipes

-enables loading data from files as soon as they’re available in a stage.

-data can be loaded in micro-batches, making it available for user within minutes, rather than manually executing COPY statements

-data loaded accordingly to the COPY statements defined in reference

-COPY statements identifies the source location of the data files and a target table.

-All data types are supported

details in separate file

13.External tables

-data is stored in external stage

-store file-level metadata about files (filename, version, relations)

-it allows querying data from files in external stage, as if it were inside a database.

-can access any format supported by COPY INTO.

-read only

-can be used for query/join operations

-views based on external tables can be created

14.Data shares

-secure data sharing enables sharing objects in a database in your account with other accounts

-objects that can be shared: tables, external tables, secure views and materialized views, secure UDFs

-sharing done via snowflake service layer. Not copied or transferred between accounts.

-provider creates a share of a database, specify objects to share.

-consumer side – read only database is created from share. Access configuration – the same as usual

15.Tasks

-can execute single SQL statement/call to a stored procedure

-can be combined with table streams

-part of snowflake data pipeline framework

-can be linked in a tree of tasks with dependencies on other tasks

-Use cases: near real time views, refreshing objects,

-AccountAdmin role can only grant the execute task privilege to a role

RESOURCES QUICK ACCESS

Official Snowflake Docs https://docs.snowflake.com/en/index.html

Snowflake/SQL blog by data engineer https://www.alisa-in.tech/

Stages - blog https://www.analyticsvidhya.com/blog/2021/07/demystifying-stages-in-snowflake/

How to use snowflake tasks https://www.youtube.com/watch?v=XW4SJ8ssJ2k