ESGFNode|GridFTPMetrics - ESGF/esgf.github.io GitHub Wiki

Wiki Reorganisation
This page has been classified for reorganisation. It has been given the category MOVE.
The content of this page will be revised and moved to one or more other pages in the new wiki structure.

ESG Usage Parser Introduction and Requirements

This page describes the esg_usage_parser program, which pulls GridFTP usage statistics from a log file and inserts them into the ESG-defined database schema. It covers installation, configuration, testing, other technical details, and some troubleshooting.

The following packages are required on your system for a fully functional installation and configuration:

  • PostgreSQL
  • Cron
  • Perl
  • The Perl DBI interface, together with its PostgreSQL driver (DBD::Pg)

    To install DBI and DBD::Pg on RHEL:

    yum install perl-DBD-Pg

    Or, alternatively, using CPAN:

    perl -MCPAN -e shell
    cpan> install DBI
    cpan> install DBD::Pg

Starting GridFTP to log usage data

NOTE: Please make sure all requirements listed in the previous section are met before continuing with this section.

Once GridFTP is patched, we can instruct it to log usage information (required for the esg_usage_parser program to operate) by starting the server with some additional parameters.

Typically, you might start the GridFTP server as follows:

$GLOBUS_LOCATION/sbin/globus-gridftp-server -p PORT

For the usage data to be logged properly and in the required format, the GridFTP server must now be started as follows:

GLOBUS_USAGE_DEBUG=MESSAGES,/PATH/TO/USAGELOG $GLOBUS_LOCATION/sbin/globus-gridftp-server -p PORT -usage-stats-id PORT -usage-stats-target localhost:0\!all

The GLOBUS_USAGE_DEBUG environment variable specifies which usage information is logged and to which file. /PATH/TO/USAGELOG must be an absolute path to the file where the usage information should be written. The remaining parameters enable the specific usage logging that esg_usage_parser needs; the backslash in localhost:0\!all only stops an interactive bash shell from treating ! as a history expansion.
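For anyone starting the server from a script rather than an interactive shell, the following Python sketch builds the same environment and argument list as the command above. build_gridftp_command is a hypothetical helper (not part of Globus or esg_usage_parser); it only mirrors the flags shown on this page.

```python
import os


def build_gridftp_command(globus_location, port, usage_log):
    """Return (env, argv) equivalent to the usage-logging command line above.

    Hypothetical helper: it mirrors the documented flags and does not start anything.
    """
    if not os.path.isabs(usage_log):
        raise ValueError("usage_log must be an absolute path")
    env = dict(os.environ)
    # Which usage information to log, and the file it is written to.
    env["GLOBUS_USAGE_DEBUG"] = "MESSAGES," + usage_log
    argv = [
        os.path.join(globus_location, "sbin", "globus-gridftp-server"),
        "-p", str(port),
        "-usage-stats-id", str(port),
        # No shell is involved here, so "!" needs no backslash escape.
        "-usage-stats-target", "localhost:0!all",
    ]
    return env, argv
```

Assuming the server binary exists under that location, the result can be passed to subprocess.Popen(argv, env=env). Because an argument list is passed directly rather than through a shell, no quoting or escaping is needed. The /usr/local/gt-current location and port 20202 used in such a call would just be illustrative values borrowed from the examples on this page.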

Once the server is started and has handled some transfers, you should be able to inspect the contents of the /PATH/TO/USAGELOG file. It should resemble the following:

... snip ...
==========SENDING USAGE INFO: localhost:0==(length: 569)===
....................L?"HHOSTNAME=vm-125-67.ci.uchicago.edu START=20100715145919.888004 END=20100715145920.48168 VER="3.17 (gcc64dbg, 1245879812-78) [Globus Toolkit 4.2.1]" BUFFER=87380 BLOCK=262144 NBYTES=107100 STREAMS=1 STRIPES=1 TYPE=RETR CODE=226 FILE=/usr/lib/libgssrpc.so.4.0 CLIENTIP=98.215.66.22 DATAIP=98.215.66.22 USER=neillm USERDN=/O=Grid/OU=GlobusTest2/OU=simpleCA-vm-125-66.ci.uchicago.edu/CN=https://esg.ucar.edu/myopenid/rootAdmin CONFID=20202 DSI=file SCHEMA=gsiftp APP=globus-url-copy APPVER="5.2 (gcc64dbg, 1258045430-1) [Globus Toolkit 4.3.0-HEAD]"
=========================================================

==========SENDING USAGE INFO: localhost:0==(length: 564)===
....................L?"HHOSTNAME=vm-125-67.ci.uchicago.edu START=20100715145920.99916 END=20100715145920.585767 VER="3.17 (gcc64dbg, 1245879812-78) [Globus Toolkit 4.2.1]" BUFFER=87380 BLOCK=262144 NBYTES=1306252 STREAMS=1 STRIPES=1 TYPE=RETR CODE=226 FILE=/usr/lib/libnss3.so CLIENTIP=98.215.66.22 DATAIP=98.215.66.22 USER=neillm USERDN=/O=Grid/OU=GlobusTest2/OU=simpleCA-vm-125-66.ci.uchicago.edu/CN=https://esg.ucar.edu/myopenid/rootAdmin CONFID=20202 DSI=file SCHEMA=gsiftp APP=globus-url-copy APPVER="5.2 (gcc64dbg, 1258045430-1) [Globus Toolkit 4.3.0-HEAD]"
=========================================================
... snip ...
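The esg_usage_parser source is not reproduced on this page, so the following is only a Python sketch of how one packet line like those above could be split into its KEY=value fields. parse_usage_packet is a hypothetical name, and the sketch assumes, as in the sample, that the fields start at HOSTNAME= and that any value containing spaces (VER, APPVER) is double-quoted.

```python
import re

# One KEY=value field: the value is a double-quoted string or a run of non-space characters.
FIELD_RE = re.compile(r'([A-Z]+)=("[^"]*"|\S+)')


def parse_usage_packet(line):
    """Return the KEY=value fields of one usage-log packet line as a dict.

    Returns None for lines without a HOSTNAME= field, such as the ===== separators.
    """
    start = line.find("HOSTNAME=")
    if start < 0:
        return None
    fields = {}
    # Parsing starts at HOSTNAME=, skipping the binary header bytes before it.
    for key, value in FIELD_RE.findall(line[start:]):
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # strip the surrounding double quotes
        fields[key] = value
    return fields
```

For the first sample packet, the returned dict has FILE set to /usr/lib/libgssrpc.so.4.0 and VER set to 3.17 (gcc64dbg, 1245879812-78) [Globus Toolkit 4.2.1].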

Installing the esg_usage_parser Package

NOTE: Please make sure all requirements listed in the previous sections are met before continuing with this section.

UPDATE: This project now lives in ESG's SVN repository. Please check it out from the following URL instead of downloading the older web version (kept online for reference):

svn checkout http://www2-pcmdi.llnl.gov/svn/repository/gridftp/esg_usage_parser

If you downloaded the archive, unpack it and make sure the script is executable as follows:

# unpack the archive
neillm@boiler:/tmp$ tar -xzf esg_up-NNN.tar.gz
neillm@boiler:/tmp$ cd esg_up
neillm@boiler:/tmp/esg_up$ ls
esg_usage_parser  esg_usage_parser.conf.sample  usage-log
# make sure the script is executable
neillm@boiler:/tmp/esg_up$ chmod a+x esg_usage_parser

Because esg_usage_parser is a Perl program, no compilation or build step is needed. Whether you run it from the current directory or move it to a more suitable location (e.g. /usr/local/bin), the program is now installed and ready to run on your system.

Configuring the esg_usage_parser Program

The esg_usage_parser program requires a configuration file. By default, it looks for /etc/esg_usage_parser.conf, but you can override this location (recommended) by setting the ESG_USAGE_PARSER_CONF environment variable.

The package you downloaded includes a sample configuration file, esg_usage_parser.conf.sample, which this example uses.

In the shell, run the following command to point the program to the sample configuration file:

export ESG_USAGE_PARSER_CONF=`pwd`/esg_usage_parser.conf.sample

Then, open the sample configuration file in your favorite editor. The contents should resemble the following:

# esg_usage_parser configuration file
#

# Required Database Variables
DBNAME=dbname
DBHOST=dbhost
DBPORT=5432
DBUSER=dbuser
DBPASS=dbpass

# Usage log file (emitted from GridFTP)
USAGEFILE=usage-log

# Default tmp file is "/tmp/__up_tmpfile"
TMPFILE=/tmp/__up_tmpfile

# Default of 0 emits no debugging output of the usage data
DEBUG=0

# Default of 0 writes to DB; set to 1 to disable DB writes
NODBWRITE=1
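As a rough illustration of the lookup described above, here is a Python sketch that resolves the file location (ESG_USAGE_PARSER_CONF if set, else /etc/esg_usage_parser.conf) and reads KEY=VALUE lines. It assumes the file contains only comments, blank lines and plain KEY=VALUE assignments, as in the sample; how the real Perl program treats anything else is not documented here, and the function names are hypothetical.

```python
import os

DEFAULT_CONF = "/etc/esg_usage_parser.conf"


def config_path(environ=None):
    """Return the config location: ESG_USAGE_PARSER_CONF if set, else the default."""
    env = os.environ if environ is None else environ
    return env.get("ESG_USAGE_PARSER_CONF", DEFAULT_CONF)


def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and full-line # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            config[key.strip()] = value.strip()
    return config


def load_config(environ=None):
    """Read and parse the configuration file (raises OSError if it cannot be opened)."""
    with open(config_path(environ)) as f:
        return parse_config(f.read())
```

Applied to the sample file above, parse_config yields DBPORT "5432", NODBWRITE "1" and TMPFILE "/tmp/__up_tmpfile", while the comment lines are ignored.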

Configuration Variables Explained

  • DBNAME should be set to the name of the database to write to.
  • DBHOST should be set to the hostname of the machine the database runs on.
  • DBPORT should be set to the port the database listens on.
  • DBUSER should be set to a username with access to the ESG access_logging table.
  • DBPASS should be set to the password for that user.
  • USAGEFILE should be set to the location of the usage log generated by GridFTP (the /PATH/TO/USAGELOG file given in GLOBUS_USAGE_DEBUG). This is the file that is parsed and broken into the ESG database fields.
  • TMPFILE should be set to a temporary location. This file holds the timestamp of the last packet analyzed, so that the same packet data is not written to the database more than once: on each run, packets that were already recorded are skipped. Removing this file manually disables this check for the next run, so the whole log is processed again and packets that were already written may be inserted as duplicates.
  • DEBUG should normally be set to 0. If set to 1, verbose debug information about each usage packet is written to stdout.
  • NODBWRITE should be set to 0 for normal operation, so that all analyzed packets are written to the database. For testing, set it to 1 and no data is written to the database. Note that the sample configuration file sets it to 1.
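Because TMPFILE stores a timestamp, START and END values have to be compared correctly. The sample log above suggests that the digits after the dot are a microsecond count without zero padding: in the second packet, START=20100715145920.99916 and END=20100715145920.585767, so reading .99916 as a decimal fraction would make the transfer end before it started, whereas reading it as 99916 microseconds gives a duration of about 0.49 s. The Python sketch below encodes that assumption. parse_usage_timestamp is hypothetical, and which timestamp esg_usage_parser actually stores in TMPFILE is not stated on this page.

```python
from datetime import datetime, timedelta


def parse_usage_timestamp(stamp):
    """Convert a START/END value such as '20100715145919.888004' to a datetime.

    Assumption (inferred from the sample log): the digits after the dot are a
    microsecond count without zero padding, so '.99916' is 99916 microseconds.
    """
    whole, _, frac = stamp.partition(".")
    base = datetime.strptime(whole, "%Y%m%d%H%M%S")
    return base + timedelta(microseconds=int(frac or "0"))
```

Under this reading, the first sample packet (END=20100715145920.48168) also finishes before the second one starts, which is consistent with sequential transfers.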

Testing the esg_usage_parser Program

Once you have updated the configuration file with the proper database parameters (good form, though not strictly required for this test), run a test that does not write to the database to make sure your environment is properly configured. To do this, make sure NODBWRITE is set to 1 and point the USAGEFILE variable at the usage-log file included in the package directory.

Then run the esg_usage_parser program. A proper test run should look like this:

[root@vm-125-67 sbin]# ./esg_usage_parser
---------------- Configuration ----------------
CONFIG   : current config = /usr/local/gt-current/sbin/config
DBNAME   : option is set to esgtest
DBHOST   : option is set to localhost
DBPORT   : option is set to 5432
DBUSER   : option is set to esgtest
DBPASS   : option is set to 
NODBWRITE: option is set to 1
DEBUG    : option is set to 0
TMPFILE  : option is set to /tmp/__up_tmpfile
USAGEFILE: option is set to usage-log
---------------- Configuration ----------------
Processed 211 packets, Skipped 0 packets, Failed to process 211 packets

In this case, 211 packets are processed and all 211 are reported as failures. This is expected: with NODBWRITE set to 1, a "failure" simply means the packet was not written to the database. Skipped packets, when they appear, are packets in the log that were already written to the database on an earlier run. To see this, change the test so that it writes data to the database: set NODBWRITE to 0 (assuming the database options are now set correctly) and re-run the command above.

[root@vm-125-67 sbin]# ./esg_usage_parser
---------------- Configuration ----------------
CONFIG   : current config = /usr/local/gt-current/sbin/config
DBNAME   : option is set to esgtest
DBHOST   : option is set to localhost
DBPORT   : option is set to 5432
DBUSER   : option is set to esgtest
DBPASS   : option is set to 
NODBWRITE: option is set to 0
DEBUG    : option is set to 0
TMPFILE  : option is set to /tmp/__up_tmpfile
USAGEFILE: option is set to usage-log
---------------- Configuration ----------------
Processed 211 packets, Skipped 0 packets, Failed to process 0 packets

This time all packets were written to the database and no failures occurred. If you run the program again, you should see the following output:

[root@vm-125-67 sbin]# ./esg_usage_parser
---------------- Configuration ----------------
CONFIG   : current config = /usr/local/gt-current/sbin/config
DBNAME   : option is set to esgtest
DBHOST   : option is set to localhost
DBPORT   : option is set to 5432
DBUSER   : option is set to esgtest
DBPASS   : option is set to 
NODBWRITE: option is set to 0
DEBUG    : option is set to 0
TMPFILE  : option is set to /tmp/__up_tmpfile
USAGEFILE: option is set to usage-log
---------------- Configuration ----------------
Processed 0 packets, Skipped 211 packets, Failed to process 0 packets

This time, all of the packets are skipped. Their data is already in the database, so they are not written again as duplicates.
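The counters in the three runs are consistent with the logic sketched below in Python. It is an inference from the output, not the program's code: packets at or before the TMPFILE timestamp are skipped, every other packet counts as processed, and a processed packet that is not written (including every packet when NODBWRITE=1) counts as failed. The stored timestamp is advanced only for packets actually written; otherwise the second run would have reported skipped packets. process_packets and its parameters are hypothetical.

```python
def process_packets(stamps, last_written, nodbwrite, write):
    """Return (processed, skipped, failed, new_last_written) for one run.

    stamps: one comparable timestamp per packet, in log order
    last_written: timestamp read from TMPFILE, or None if there is none
    nodbwrite: True when NODBWRITE=1 (nothing is written)
    write: callable that stores one packet and returns True on success
    """
    processed = skipped = failed = 0
    for stamp in stamps:
        if last_written is not None and stamp <= last_written:
            skipped += 1  # already written to the database on an earlier run
            continue
        processed += 1
        if nodbwrite or not write(stamp):
            failed += 1  # processed, but not written to the database
        else:
            last_written = stamp  # advance only past packets actually stored
    return processed, skipped, failed, last_written
```

With 211 packets, a first call with no stored timestamp and nodbwrite=True returns (211, 0, 211, None); with nodbwrite=False it returns 211 processed, 0 skipped and 0 failed; calling it again with the returned timestamp gives 0 processed and 211 skipped, matching the three runs above. How the real program handles a write that fails partway through a run is not documented here; in this sketch, a later successful write moves the timestamp past the failed packet, so that packet would not be retried.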

Checking the Database contents

If you are logged in to the database with psql, you can query the packet data directly. Here are some simple queries to get an idea:

esgtest=> select count(*) from access_logging;
 count
-------
   211
(1 row)

esgtest=> select * from access_logging limit 2;
  id  |                 user_id                 | email |                        url                         | file_id | remote_addr  |   user_agent    | service_type | batch_update_time | date_fetched | success | duration
------+-----------------------------------------+-------+----------------------------------------------------+---------+--------------+-----------------+--------------+-------------------+--------------+---------+----------
 1056 | https://esg.ucar.edu/myopenid/rootAdmin |       | gsiftp://vm-125-67.ci.uchicago.edu:20202/etc/group |         | 98.215.66.22 | globus-url-copy | gsiftp       |        1280348138 |   1281899479 | t       |   51.089
 1057 | https://esg.ucar.edu/myopenid/rootAdmin |       | gsiftp://vm-125-67.ci.uchicago.edu:20202/usr/lib   |         | 98.215.66.22 | globus-url-copy | gsiftp       |        1280348138 |   1281902310 | t       |  115.624
(2 rows)

esgtest=> delete from access_logging where id > -1;
DELETE 211

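NOTE: The last command deletes every row in access_logging whose id is greater than -1, which in this test is all of the inserted test data. Because it does not distinguish test rows from real ones, use it only to clean up after testing and never on a database that holds actual ESG metrics.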
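Comparing the two sample rows with the sample usage packets suggests how some packet fields map onto access_logging columns: url combines SCHEMA, HOSTNAME, CONFID (which appears to be the -usage-stats-id value, set to the server port in the launch command) and FILE, and user_id is the part of USERDN after /CN=. The Python sketch below records only that inferred mapping; it is not taken from the esg_usage_parser source, and to_access_logging_row is hypothetical. remote_addr could equally come from DATAIP (both are identical in the sample), and email, file_id, the time columns and duration are omitted because their derivation is not shown here.

```python
def to_access_logging_row(packet):
    """Map parsed packet fields to some access_logging columns.

    The mapping is inferred from the sample rows on this page, not from the
    esg_usage_parser source.
    """
    dn = packet["USERDN"]
    # In the sample rows, user_id is the part of USERDN after "/CN=".
    user_id = dn.split("/CN=", 1)[1] if "/CN=" in dn else dn
    return {
        "user_id": user_id,
        "url": "%s://%s:%s%s" % (packet["SCHEMA"], packet["HOSTNAME"],
                                 packet["CONFID"], packet["FILE"]),
        "remote_addr": packet["CLIENTIP"],
        "user_agent": packet["APP"],
        "service_type": packet["SCHEMA"],
        # FTP reply code 226 means the transfer completed successfully.
        "success": packet["CODE"] == "226",
    }
```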

Configuring the ESG Usage Parser with Cron

Assuming a suitable implementation of Cron is installed on your system, use the crontab -e command to edit the active cron entries.

neillm@boiler:$ crontab -e

To run the esg_usage_parser script every 10 minutes, add the following line once the editor has started:

*/10 * * * * /PATH/TO/esg_usage_parser

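The */10 in the minute field tells Cron to execute the script every 10 minutes. Cron jobs do not inherit variables exported in your interactive shell, so if your configuration file is not at /etc/esg_usage_parser.conf, set ESG_USAGE_PARSER_CONF in the entry itself (for example, */10 * * * * ESG_USAGE_PARSER_CONF=/PATH/TO/esg_usage_parser.conf /PATH/TO/esg_usage_parser). Likewise, because the job typically runs from your home directory rather than the package directory, USAGEFILE should be an absolute path.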

To check that the job was entered correctly, list the active cron entries:

neillm@boiler:$ crontab -l
# m h  dom mon dow   command
*/10 * * * * /PATH/TO/esg_usage_parser

Troubleshooting / FAQ

  1. I see the following when trying to run esg_usage_parser:

[root@vm-125-67 sbin]# ./esg_usage_parser
install_driver(Pg) failed: Can't locate DBD/Pg.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at (eval 3) line 3.
Perhaps the DBD::Pg perl module hasn't been fully installed,
or perhaps the capitalisation of 'Pg' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Proxy, Sponge.
 at esg_usage_parser line 154

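Answer: Install the DBD::Pg Perl module (the PostgreSQL driver for DBI) for your platform, for example with yum install perl-DBD-Pg on RHEL, or with install DBD::Pg from the CPAN shell.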

  2. I see the following when trying to run esg_usage_parser:

[root@vm-125-67 sbin]# ./esg_usage_parser
DBI connect('dbname=DBNAME;host=HOSTNAME','USERNAME',...) failed: could not connect to server: Connection refused
    Is the server running on host "localhost" and accepting TCP/IP connections on port 5432? at esg_usage_parser line 154

Answer: Make sure your PostgreSQL server is running and accepting TCP/IP connections on the host and port set in DBHOST and DBPORT (5432 is the PostgreSQL default), or update the configuration to match the port the server actually uses.

  3. I see the following when trying to run esg_usage_parser:

Can't open /etc/esg_usage_parser.conf : No such file or directory at esg_usage_parser line 29.

Answer: Either create the config file at /etc/esg_usage_parser.conf (the default location), or set the environment variable ESG_USAGE_PARSER_CONF to a suitable value before running the program.

For example:

ESG_USAGE_PARSER_CONF=/tmp/config ./esg_usage_parser
# OR
export ESG_USAGE_PARSER_CONF=/tmp/config
./esg_usage_parser

  4. I see the following when trying to run esg_usage_parser:

DBI connect('dbname=dbname;host=dbhost;port=5432','dbuser',...) failed: could not translate host name "dbhost" to address: Name or service not known at ./esg_usage_parser line NNN

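Answer: Make sure the configuration file contains the correct database settings (DBHOST, DBPORT, DBUSER, DBPASS). In this example, the values are still the placeholders dbname, dbhost and dbuser from the sample configuration file.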
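The "Connection refused" and "could not translate host name" errors above are both connectivity problems, and they can be told apart before running the parser. The Python sketch below checks whether the configured DBHOST resolves and whether anything accepts TCP connections on DBPORT. check_db_endpoint is a hypothetical helper; it does not log in, so it cannot detect a wrong DBUSER, DBPASS or DBNAME.

```python
import socket


def check_db_endpoint(host, port, timeout=3.0):
    """Return 'unresolvable', 'refused', 'unreachable' or 'ok' for host:port."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"  # matches "could not translate host name"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"  # something accepted a TCP connection
    except ConnectionRefusedError:
        return "refused"  # host is up, but nothing listens on that port
    except OSError:
        return "unreachable"  # timeout, no route to host, and similar errors
```

For example, on a network where the name dbhost does not resolve, check_db_endpoint("dbhost", 5432) returns "unresolvable", pointing at the placeholder left in the configuration file rather than at PostgreSQL itself.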