nodeSegUDFgen ReadMe - jackghm/Vertica GitHub Wiki
Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of my current or past employers.
- code can be found in this gist https://gist.github.com/jackghm/b2ea9d5536755288ca5f48cd182731b1
- This only needs to be run one time
- You must rerun this program if you add/remove/re-balance nodes in your Vertica cluster
This program is used to create 2 Vertica SQL UDFs
These UDFs are used to determine which node a record resides on
These UDFs are also used for the highly performant, node affinitized/parallelized reads and/or writes in InsSelParallel.sh
Note that a Vertica re-balance will occur if you add or remove nodes in your cluster, which will necessitate recreating these UDFs
[Downloading and running Scala] (http://www.scala-lang.org/documentation/getting-started.html)
This program requires an input file that contains the output from the Vertica get_projection_segments()
function
get_projection_segments()
generates the projection segment ranges for each node in your Vertica cluster
You can use any segmented table projection to generate the file. An example vsql command
-- get a projection name
e.g., $ /opt/vertica/bin/vsql -d databaseName -At -U dbadmin -w redacted -h hostNameOrIP -c "select projection_name, projection_schema, anchor_table_name from projections where projection_schema ilike 'schemaName' and anchor_table_name ilike 'tableName';"
-- get the segmentation range from the table projection
$ /opt/vertica/bin/vsql -d databaseName -At -U dbadmin -w redacted -h hostNameOrIP -c "select get_projection_segments('schema.tableName_b0');" -o "/tmp/get_projection_segments.out"
-- get node names and Private IP address used to ssh too (note that this is only needed for the InsSelParallel.sh
$ /opt/vertica/bin/vsql -h hostNme -U jackg -w <redacted> -d pstl -At -c "select b.node_name, a.ip_address from v_monitor.network_interfaces a join v_catalog.nodes b on (a.node_name = b.node_name) where b.node_state = 'UP' and a.interface = 'eno16777736' and a.ip_address_family = 'ipv4' order by a.node_name asc;" -o /tmp/NodeIP.out
To compile and run this code from the command line
pass the output file names from above, as the parameters to nodeSegUDFgen.scala
$ scala nodeSegUDFgen.scala pathTo_get_projection_segments_OutputFileName NodeIP.out
e.g., sudo scala nodeSegUDFgen.scala "/tmp/get_projection_segments.out" "/tmp/NodeIP.out"
OR
To run this scala code from the within the Scala REPL
scala> :load nodeSegUDFgen.scala "/tmp/get_projection_segments.out" "/tmp/NodeIP.out"
The script output should be a file containing your SQL UDF code.
The output filename is the name of your input file with ".sql" appended to the name.
e.g., /tmp/get_projection_segments.out.sql
NOTE: the create function
SQL commands must be run using vsql not dBVizualizer if autocommit is set on
$ /opt/vertica/bin/vsql -U dbadmin -w redacted -h hostNameOrIP -f "/tmp/get_projection_segments.out.sql"
Verify the UDFs have been installed
SELECT * FROM USER_FUNCTIONS where function_name ilike 'segmentation%';
To see the distribution of data for a given table projection
select test.segmentationNode(test.segmentation(hash(columnList))) as NodeName , count(*) as nodeRows from schemaName.tableName group by 1 order by 1;