HTTP - UlricE/pen GitHub Wiki

pen -l pen.log -p pen.pid lbhost:80 host1:80 host2:80

If pen is running on one of the web servers, it might seem like a good idea to simply use an alternative port for the web server process, reusing the IP address. Unfortunately, that doesn't work very well. Look at this (simplified) example:

sh-2.05# pen lbhost:80 lbhost:8080
sh-2.05# telnet lbhost 80
Trying 127.0.0.1...
Connected to lbhost.
Escape character is '^]'.
GET /bb
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://lbhost:8080/bb/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.14 Server at lbhost Port 8080</ADDRESS>
</BODY></HTML>
Connection closed by foreign host.

The redirect points at port 8080, so the client will attempt to contact the web server directly. That may not be possible, depending on firewall configuration, and is certainly not desirable, since it bypasses pen and defeats any load balancing.
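A quick way to spot this problem is to compare the port in a redirect's target URL with the port the client originally connected to. The following is only a sketch and not part of pen: the helper names url_port and bypasses_pen are hypothetical, and only plain http:// URLs without IPv6 literals are handled.

```shell
#!/bin/sh
# Print the port named in an http:// URL, defaulting to 80 when none is given.
# Hypothetical helper, not part of pen.
url_port() {
    hostport=${1#http://}      # strip the scheme
    hostport=${hostport%%/*}   # strip the path, if any
    case "$hostport" in
        *:*) echo "${hostport##*:}" ;;
        *)   echo 80 ;;
    esac
}

# Succeed if the redirect target ($1) uses a different port than the one
# the client connected to ($2), i.e. the client would bypass pen.
bypasses_pen() {
    [ "$(url_port "$1")" != "$2" ]
}

# The redirect from the transcript above, checked against port 80.
if bypasses_pen "http://lbhost:8080/bb/" 80; then
    echo "redirect points away from the load balancer"
fi
```

With the URL from the transcript, the check succeeds because the redirect names port 8080 while the client connected to port 80.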

The solution is to bind two addresses to the server running pen, and use one address for pen and the other for the web server. Like this:

pen address1:80 address2:80 server2:80

Here, address1 and address2 refer to the same server, while server2 refers to another server.

The programs penlog and penlogd are used to combine the web server logs into a single file which can be used to calculate statistics. Penlog runs on each of the web servers. It reads log entries from stdin and sends them over the network to the host running penlogd. For Apache, this is accomplished by adding a line similar to this to httpd.conf:

CustomLog "|/usr/local/bin/penlog loghost 10000" common

For other web servers, the procedure is different. If the server cannot write its logs to a pipe, this kludge may actually work:

tail -f /path/to/logfile | penlog loghost 10000

The command line to pen must also be altered to indicate that the logs should go to the penlogd server rather than a file. This is accomplished using the

-l loghost:10000

option.
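Combined with the invocation at the top of this page, the resulting command line might look like the one printed by the sketch below. The helper name pen_command is hypothetical; the host names and pid file are simply those used in the earlier example, and the script only prints the command rather than starting pen.

```shell
#!/bin/sh
# Build a pen command line that logs to penlogd instead of a file.
# pen_command is a hypothetical helper, not part of pen.
# $1 = host running penlogd, $2 = port penlogd listens on
pen_command() {
    echo "pen -l $1:$2 -p pen.pid lbhost:80 host1:80 host2:80"
}

# Print the command for review; run it by hand once it looks right.
pen_command loghost 10000
```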

Alternatively, if pen logs to a file (the -l pen.log option), that file can be used afterwards to combine the web server logs into a single file which can be used to calculate statistics. Example:

mergelogs -p pen.log \
    10.0.0.1:access_log.host1 10.0.0.2:access_log.host2 \
    > access_log

The program mergelogs is distributed with pen. Use matching versions of pen and mergelogs to ensure that the log file format is compatible. 10.0.0.1 and 10.0.0.2 are the IP addresses of host1 and host2. The log files access_log.host1 and access_log.host2 are Apache access log files in combined or common format. The resulting access_log is in the same format as the input files.

If the log file will be used to calculate visitor statistics, you probably want host names rather than IP addresses. This can be accomplished by forcing the web server to do hostname lookups on the clients. This harms performance since the lookups are slow.

A better solution is to use a separate program to process the log file. One such program is webresolve, which is usually run from the splitwr script to perform many lookups in parallel. Example:

splitwr access_log > access_log.resolved

Webresolve is available from http://siag.nu/webresolve/.
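Before resolving, it can be useful to know how many distinct client addresses a log contains, since that roughly bounds the number of lookups needed. The sketch below relies only on the fact that the first field of a common or combined format entry is the client host. The helper name unique_clients and the sample entries are made up for illustration.

```shell
#!/bin/sh
# Count distinct client addresses in a common/combined format access log.
# The first whitespace-separated field of each entry is the client host.
# unique_clients is a hypothetical helper, not part of pen or webresolve.
unique_clients() {
    awk '{ print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

# Demonstration on a small, made-up log file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.1.1.1 - - [01/Jan/2002:00:00:01 +0100] "GET / HTTP/1.0" 200 1024
10.1.1.2 - - [01/Jan/2002:00:00:02 +0100] "GET /bb/ HTTP/1.0" 200 2048
10.1.1.1 - - [01/Jan/2002:00:00:03 +0100] "GET /a.png HTTP/1.0" 200 512
EOF
unique_clients "$sample"   # two distinct addresses in the sample
rm -f "$sample"
```

Running it against the merged access_log rather than the sample file gives the figure for the real data.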
