HP Vertica Tips, Tricks, and Best Practices
Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of my current or past employers.
The longer you defer these best practices, the more work you incur in the future
Donations for these insights will be accepted in the form of good quality wine
note: the following is subject to change as Vertica continues to add features version over version (e.g., Grant Inherited Privileges)
This guidance was developed from my years of experience as:
- the Manager of the Twitter Analytics Data Warehouse team, where we had one of the biggest Vertica Data Warehouses in the world (Volume)
- the VP of Big Data at Playtika, the world's leading Social Gaming company, with a lot of Data Variety and Velocity
Hardware will fail, so plan accordingly
Design your cluster to be rack diverse!
Ensure your node/rack layout matches your projection segmentation (i.e., SEGMENTED BY ... ALL NODES KSAFE 1;)
note: I personally think 3 racks is the minimum # for a Vertica cluster
Layout your nodes like the following and the cluster will still be available, even if you lose a rack!
Rack1 - Node1 (buddy projection for Node6) Node4 (buddy projection for Node3)
Rack2 - Node2 (buddy projection for Node1) Node5 (buddy projection for Node4)
Rack3 - Node3 (buddy projection for Node2) Node6 (buddy projection for Node5)
Eliminate one of the racks above and you will notice that every primary projection, or its buddy projection, accounts for all 6 node projections. Pretty cool.
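To sanity-check this, recent Vertica versions provide v_monitor.critical_nodes, which lists any node whose loss would force a shutdown (a sketch; availability varies by version):
select * from v_monitor.critical_nodes; -- ideally empty on a healthy, K-safe, rack-diverse cluster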
Nodes
select * from nodes; -- check node state
- It is critical to get your 'down' node out of 'failure' and 'recovery' state and back to an 'up' status! The further you get behind, the harder it is to catch up on your primary and buddy projection writes (i.e., physical storage)
- While attempting to recover from a node failure, have a method to pause all writes (e.g., ETL) to your database, or the recovering node may never get a chance to recover
- We have proven, time and again, that once you pause writes, the node generally recovers very quickly (unless there is an epoch issue). You can watch its progress with the query sketched below.
- Consider having a stand-by node as a hot spare in case you have serious hardware issues.
note: During a node failure, no matter how many nodes you have in your cluster, your cluster performance will be ~50% of normal.
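While a node is recovering, you can watch its progress with something like the following (a sketch; the available columns vary by version):
select * from v_monitor.recovery_status; -- one row per node showing recovery phase/progress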
The database cluster will shut down:
- If a node goes down and one of its neighbor nodes also goes down (a buddy projection issue)
- Regardless of rack diversity, if 50% of your nodes are down, Vertica shuts down the database. 3 racks are better than 2 :-)
Vertica, like most Big Data Data Warehouse solutions, does not have a truly viable backup/restore or replication model.
- Just try restoring your database backup and see how far behind you get on processing your data.
- You could do a continuous copy cluster, but you likely have other services (e.g., ETL, MySQL, cron, ...) running in your Data Center that will also be affected by the same serious hardware failures that took down your Vertica cluster nodes.
For the reasons above, I highly recommend Multi Data Center Data Redundancy
If anyone is interested, I would be happy to discuss my solution, which improves upon earlier approaches
Capacity
You will need more disk capacity than you think
RAID 10 reduces your capacity by ~50%, and Vertica "needs" (likes to have) 40% free space; plus you will need working space for temp tables when doing things like dedup and table optimization.
audit() returns the raw, uncompressed bytes counted against your Vertica license, but the actual storage space on disk can be significantly smaller with ENCODING RLE (see Optimize-Tables-Not-Queries).
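For example, to see the licensed (raw) size of a single table, which you can then compare to the on-disk bytes from the sample query below (the table name is just an example):
select audit('schemaA.factTable'); -- raw, uncompressed bytes counted against the license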
Here is a sample query to check the bytes stored on disk.
select projection_schema, anchor_table_name
, to_char(sum(row_count), '999,999,999,999,999') as row_count
, to_char(ROUND(sum(ros_used_bytes) / (1024*1024*1024), 2), '999,999.99') as ros_used_GB
from projection_storage
where projection_schema in ('schemaA', 'schemaB', 'schemaC')
group by 1, 2 order by 1, 4 desc limit 20;
I personally like a lot of 1.2TB drives that are 7.2K RPM (10K RPM really doesn’t help)
At Playtika, our nodes have 24 x 1.2TB 10K RPM drives and at Twitter we had 50 x 1TB 7.2K RPM drives per node
note: ksafe=2 performance is terrible. RAID 10 and ksafe=1 is all you need to protect against data loss
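To confirm what your cluster is actually running with (a sketch; these columns exist in recent versions of v_monitor.system):
select designed_fault_tolerance, current_fault_tolerance from v_monitor.system;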
Load Balancing and Maintenance
All access via database connections should use a CNAME and not a specific hostname!
Define a CNAME that uses round-robin DNS across the cluster hostnames. For example, a JDBC connection like:
jdbc:vertica://VERTICA-DW-PROD-PRIV-JFK:5433/
instead of
jdbc:vertica://VERTICA-01-DW-PROD-PRIV-JFK:5433/
- This will avoid dogpiling on a specific host. Vertica executes queries on an initiator node, so you need to distribute the resource utilization across your cluster nodes
- This allows you to transparently redirect users to another cluster for maintenance or "fail-over"
- It also allows an easy way to disable all access to the cluster (from Talend, Tableau, cron, ad-hoc connections, ...). This becomes important when you want to shut down a cluster or bring back a down node (see above).
Performance tips
Optimize your tables! (Optimize-Tables-Not-Queries)
If you don’t optimize your tables in the beginning, you will be forced to move all your data from your original table to the new optimized table. Been there, done that, don't want to keep doing it!
Query Performance - An easy win!
As the distribution of data in tables changes over time, statistics on those tables need to be updated!
This was just one Twitter table, where every query would scan 88 Billion rows. Ugh!
Query Plan Before analyze_statistics
STORAGE ACCESS for a [Cost: 6M, Rows: 88B (NO STATISTICS)]
After analyze_statistics()
STORAGE ACCESS for a [Cost: 665, Rows: 2]
select analyze_statistics('schema.tableName'); -- JUST DO IT!
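The plan fragments above come from EXPLAIN. Run it before and after analyze_statistics to verify the optimizer picked up the new stats (the predicate column is just an example):
explain select count(*) from schema.tableName where date_id = 20150101; -- look for (NO STATISTICS) in the output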
Misc
- Use DIRECT on INSERT and COPY when inserting > 100MB of data. Skip the WOS and go straight to the ROS (see the sketch after this list)
- For large files, do not use stdin on COPY commands, as the pipe will likely get broken
- Vertica is a write once, read many store. It was designed for high-bandwidth, high-latency queries. MySQL was designed for low-latency, high-concurrency query requests. Vertica is not MySQL. Vertica query requests go into a queue (the default is 50). Lots of recurring ad-hoc queries overwhelm the query request queue, so:
- Do system monitoring on the O/S, file system, etc., instead of using high-frequency Vertica queries
- Create aggregate tables and get your BI, BA, and Data Scientists to query the AGG tables instead of running their massive agg queries over the fact/raw tables (over and over again)!!!
- note: Always use a date predicate and a LIMIT clause on your queries
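A quick sketch of the DIRECT load and the date-predicate/LIMIT habit from the list above (the schema, table, column, and file path are made up for illustration):
COPY schemaA.factTable FROM '/data/factTable.csv' DELIMITER ',' DIRECT; -- bypass the WOS for big loads
select user_id, count(*) from schemaA.factTable
where date_id >= 20150101 -- date predicate
group by 1 order by 2 desc limit 100; -- and a LIMIT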
Good Practices
Add these columns to each table
- batch_id: a unique identifier for the rows that were inserted from your ETL (PSTL). This is very handy when cleaning up data or supporting idempotent operations.
It's always good to:
- know when a row was inserted
- store the date day in an integer column
- partition your tables by month (not day) since Vertica limits the number of ROS containers to ~1024. Vertica Merge partition does not work. ← That is a period.
For example:
CREATE TABLE jackg.test (
    anInt INT NOT NULL
    , insert_ts timestamp NOT NULL DEFAULT now()
    , date_id INT NOT NULL DEFAULT to_char(now(), 'YYYYMMDD')::integer
) PARTITION BY ((date_id / 100)::int);
Avoid Deletes
- Vertica only marks records as deleted. This means you are wasting disk space until a compaction happens (Purge)
select purge_table('schema.tableName');
Avoid UPDATEs
- Vertica is a write once, read many optimized store.
- An UPDATE is a DELETE followed by an INSERT. There is no update in place. Use techniques such as creating a new table from the existing table instead of updating it
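A minimal sketch of the rewrite-instead-of-UPDATE pattern (table and column names are hypothetical):
CREATE TABLE schemaA.myTable_new AS
select anInt, now() as insert_ts, date_id -- apply your 'update' transformation in the select list
from schemaA.myTable;
ALTER TABLE schemaA.myTable RENAME TO myTable_old;
ALTER TABLE schemaA.myTable_new RENAME TO myTable;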
Make your jobs idempotent!!! (an operation that is safe to repeat)
Don't have multiple versions of your schemas. If your DBA suggests that you keep a version of your schema matching each version of your ETL, find a new DBA.
Set up recurring maintenance jobs
Every time you add a table you need to add grants. Automate schema.table grants in a job that runs every 10min.
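For example (role and schema names are placeholders):
GRANT USAGE ON SCHEMA schemaA TO analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA schemaA TO analyst_role; -- supported in recent Vertica versions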
Do not disable (or set a low value for) the cache on a SEQUENCE, or your INSERTs will take forever.
In fact, avoid sequences if possible. Do the autoincrement in your ETL (PSTL).
Running ANALYZE_CONSTRAINTS is too much overhead, since it runs over the entire table every time.
Instead, run a select … where date_id >= the last time you checked the table … group by … having count(*) > 1
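For example, a dedup check on a hypothetical table whose logical unique key is (user_id, event_id):
select user_id, event_id, count(*)
from schemaA.events
where date_id >= 20150101 -- only the data since your last check
group by 1, 2
having count(*) > 1
limit 100;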
tempspace issues
You are only treating the symptom if you keep increasing the user pool tempspace size
Although completely non-obvious, the following error may be a tempspace issue: "ERROR: Join inner did not fit in memory"
check the temp disk space available on your nodes (sketched below)
create a maintenance job to move the dc_% tables data to a different schema and reduce the data collector retention policy
set session tempspacecap none; -- only if you are superuser and desperate :-)
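A couple of sketches for the checks above (the data collector component name is only an example):
select node_name, storage_path, disk_space_free_mb
from v_monitor.disk_storage
where storage_usage ilike '%TEMP%'; -- temp space free per node
select set_data_collector_policy('ResourceAcquisitions', '1000', '10000'); -- component, memory KB, disk KB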
Query Execution
-- Currently executing queries
select session_id, statement_id, user_name, current_statement, last_statement, transaction_description, *
from sessions -- NOTE that if statement_id is null then the query is not actually running
where last_statement is not null order by 2 desc, 3 asc, 1;
-- Query requests that are/have been executed
select * from v_monitor.query_requests;
-- Two ways to stop/kill running sessions. Note that this is not always immediate.
select close_session('????'); -- select * from sessions
select interrupt_statement('????', 1); -- session_id, statement_id