nodeSegUDFgen ReadMe - jackghm/Vertica GitHub Wiki

Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of my current or past employers.

Vertica Table Optimization tools

nodeSegUDFgen.scala

Overview

This program is used to create 2 Vertica SQL UDFs

These UDFs are used to determine which node a record resides on

These UDFs are also used for the highly performant, node affinitized/parallelized reads and/or writes in InsSelParallel.sh

Note that a Vertica re-balance will occur if you add or remove nodes in your cluster, which will necessitate recreating these UDFs

PreReq

[Downloading and running Scala] (http://www.scala-lang.org/documentation/getting-started.html)

This program requires an input file that contains the output from the Vertica get_projection_segments() function

get_projection_segments() generates the projection segment ranges for each node in your Vertica cluster

You can use any segmented table projection to generate the file. An example vsql command

-- get a projection name

e.g., $ /opt/vertica/bin/vsql -d databaseName -At -U dbadmin -w redacted -h hostNameOrIP -c "select projection_name, projection_schema, anchor_table_name from projections where projection_schema ilike 'schemaName' and anchor_table_name ilike 'tableName';"

-- get the segmentation range from the table projection

$ /opt/vertica/bin/vsql -d databaseName -At -U dbadmin -w redacted -h hostNameOrIP -c "select get_projection_segments('schema.tableName_b0');" -o "/tmp/get_projection_segments.out"

-- get node names and Private IP address used to ssh too (note that this is only needed for the InsSelParallel.sh

$ /opt/vertica/bin/vsql -h hostNme -U jackg -w <redacted> -d pstl -At -c "select b.node_name, a.ip_address from v_monitor.network_interfaces a join v_catalog.nodes b on (a.node_name = b.node_name) where b.node_state = 'UP' and a.interface = 'eno16777736' and a.ip_address_family = 'ipv4' order by a.node_name asc;" -o /tmp/NodeIP.out

Compile & Run

To compile and run this code from the command line pass the output file names from above, as the parameters to nodeSegUDFgen.scala

$ scala nodeSegUDFgen.scala pathTo_get_projection_segments_OutputFileName NodeIP.out

e.g., sudo scala nodeSegUDFgen.scala "/tmp/get_projection_segments.out" "/tmp/NodeIP.out"

OR

To run this scala code from the within the Scala REPL

scala> :load nodeSegUDFgen.scala "/tmp/get_projection_segments.out" "/tmp/NodeIP.out"

The script output should be a file containing your SQL UDF code.

The output filename is the name of your input file with ".sql" appended to the name.

e.g., /tmp/get_projection_segments.out.sql

Creating the User-Defined SQL functions

NOTE: the create function SQL commands must be run using vsql not dBVizualizer if autocommit is set on

$ /opt/vertica/bin/vsql -U dbadmin -w redacted -h hostNameOrIP -f "/tmp/get_projection_segments.out.sql"

Verify & Test the UDFs

Verify the UDFs have been installed

SELECT * FROM USER_FUNCTIONS where function_name ilike 'segmentation%';

To see the distribution of data for a given table projection

select test.segmentationNode(test.segmentation(hash(columnList))) as NodeName , count(*) as nodeRows from schemaName.tableName group by 1 order by 1;

⚠️ **GitHub.com Fallback** ⚠️