Snowflake Objects - gablesiak/learning-path GitHub Wiki
Snowflake Objects
1.Warehouses
-cluster of compute resources
-provides the required resources (CPU, memory, temp stage)
-used to perform operations like: SELECT, DELETE, INSERT, UPDATE, COPY INTO) -sizes XS – XXL, determine number of clusters
-auto suspend and auto resume – for billing optimization
-scalable
How to create: Snowflake->Warehouses -> Create -> Choose Name&Size – other settings optional
Example:
CREATE OR REPLACE WAREHOUSE my_wh WITH WAREHOUSE_SIZE='X-LARGE';
2.Roles
-Defines which access can be/is granted.
-Roles are assigned to users
-Hierarchy can be created -> assigning roles to other roles
-Inheritance of accesses accordingly to hierarchy
-Best practice – not to mix account management privileges and entity specific privileges in the same role
-No concept of a super-user
ORGADMIN:
-can create accounts
-can view accounts
-can view usage information
ACCOUNTADMIN:
-top-level role
-SYSADMIN+SECURITYADMIN
SECURITYADMIN:
-can manage any object
-can create, monitor and view users/roles
USERADMIN:
-access management only
SYSADMIN:
-create warehouses/databases
3.Databases
-logical containers
-all data is maintained in databases
-Each database consists 1 or more schemas
-no limits for number of databases, schemas or objects that can be created
4.Schemas
-logical groups of database objects (such as tables and views)
-each schema belongs to single database
-database+schema=namespace
5.Tables
Table types:
Temporary tables:
-non permanent, transistory data
-only exist within the session in which they were created
-persist only for the reminder of the session
-not visible for other users or sessions
-after session ending – table is removed, not recoverable
-overwriting existing table is enabled
-after creation it cannot be converted to any other table type
Transient Tables:
-persist until removed
-available for all users with proper access
-similar to permanent tables
-key difference: not having fail-safe period.
-designed for transitory data, that needs to be maintained beyond each session
-do not require the same protection and recovery as permanent
-after creation it cannot be converted to any other table type
Permanent Tables:
-existing until dropped
-time travel allowed – 1 to 90 days recovery https://docs.snowflake.com/en/sql-reference/sql/create-table.html
Time travel:
-availability to retrieve older versions of objects up to 90 days in the past
-ability to undrop objects
-increases storage costs
-if not wanted – set retention period to 0 days
-Usage: for automated testing, debugging
Fail-safe:
-begins where time travel ends
-7 days – cannot configure it
-last resort in the event of damage
-better not to use for retrieving historical data – expensive
6.Views
-allows the result of a query to be accessed as if it’s table
-purposes: combining, segregation, protecting data
Types of views:
Non-materialized: -generically refers to all types of views
-named definition of query
-results are not stored for future use
-performance is slower than with materialized views
-most common type of view
-use case: selecting some columns from table or specific data range, joining 2 or more tables
Materialized:
-behaves more like a table
-results are stored – similar to results fo table
-allows faster access – good when results are often in use
-should be used when query contain small number of rows/columns
-view’s base table doesn’t change frequently
Secure Views:
-Can be both materialized and non-materialized
-improved data privacy and data sharing
-performance impacts
-useful in case of sensitive data
Recursive Views:
-non-materialized only
-the view can refer to itself
-really don’t know what to put here and what’s the purpose of that kind of view
Alter view– modifies the properties for existing view:
-renaming view,
-converting to secure view
-comments for view
7.Constraints
Available funcionalities:
-Unique primary/foreign keys
-Not null columns
-Named constraints
-single- and multicolumn constraints
Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.
8.Table clustering keys
-subset of columns in table that are explicitly designated to co-locate data in the same micro-partitions
-useful for very large tables
-use case: queries are running slower than expected
-improves scan efficiency in queries
-better column compression
9.Stored Procedures
-similar to functions
-created once, can be executed many times
-created with CREATE PROCEDURE, executed with CALL
-returns a single value
-helps with error handling
-improvement of dynamics in SQL statement creation
-it helps in access delegating. Can empower user to run the procedure, even if without it user don’t have enough privileges.
10.Sequences
-used to generate unique numer accross sessions and statements.
-can be used to generate values for primary key/any column that requires unique value
11.Stages/File formats
-locations used to store data.
-can include directory table. It stores a catalog of staged files in cloud storage
Types of Stages:
Internal stage:
-stores data files internally within Snowflake
-can be permanent or temporary
-before ingesting data into a table in Snowflake data has to be loaded into the stage with PUT command
-then loading into table using COPY INTO command
-unloading: GET to load into stage, COPY INTO to export
Further divided to
1.User Stages
-tied to specific user
-each user has default stage created
-not able to modify
-files in one user stage cannot be accessed by other user.
-referring to user stages with @~
2.Table Stages
-tied to specific user.
-when table is created => table stage created automatically
-not able to modify
-table stage for particular table cannot be accessed through another table
-referring with @%
3.Internal Named Stages
-more flexibility
-all operations that can be performed on object can be performed on internal stages as well.
-needs to be created manually
-format options can be specified
-referring with ~
External Stage:
-references data files stored outside of snowflake. It can be Amazon S3 buckets, GC Storage buckets, Azure containers
-storage location can be private/protected/public
-loading/unloading data can by directly done using COPY INTO
-GET PUT – not supported
12.Pipes/Snowpipes
-enables loading data from files as soon as they’re available in a stage.
-data can be loaded in micro-batches, making it available for user within minutes, rather than manually executing COPY statements
-data loaded accordingly to the COPY statements defined in reference
-COPY statements identifies the source location of the data files and a target table.
-All data types are supported
details in separate file
13.External tables
-data is stored in external stage
-store file-level metadata about files (filename, version, relations)
-it allows querying data from files in external stage, as if it were inside a database.
-can access any format supported by COPY INTO.
-read only
-can be used for query/join operations
-views based on external tables can be created
14.Data shares
-secure data sharing enables sharing objects in a database in your account with other accounts
-objects that can be shared: tables, external tables, secure views and materialized views, secure UDFs
-sharing done via snowflake service layer. Not copied or transferred between accounts.
-provider creates a share of a database, specify objects to share.
-consumer side – read only database is created from share. Access configuration – the same as usual
15.Tasks
-can execute single SQL statement/call to a stored procedure
-can be combined with table streams
-part of snowflake data pipeline framework
-can be linked in a tree of tasks with dependencies on other tasks
-Use cases: near real time views, refreshing objects,
-AccountAdmin role can only grant the execute task privilege to a role
RESOURCES QUICK ACCESS
Official Snowflake Docs https://docs.snowflake.com/en/index.html
Snowflake/SQL blog by data engineer https://www.alisa-in.tech/
Stages - blog https://www.analyticsvidhya.com/blog/2021/07/demystifying-stages-in-snowflake/
How to use snowflake tasks https://www.youtube.com/watch?v=XW4SJ8ssJ2k