Tipimaid

What is Tipimaid?

Tipimaid is a collection of tools that ease the handling of Apache logs. For example, you can split and rotate logs by Apache virtual host and have an arbitrary script (e.g. gzip) executed whenever a file is rotated. Other components of Tipimaid allow logging over a network to a single "logging computer" (i.e. the logs of all your Apaches end up on one computer, where tipimaid processes them). Throughout, tipimaid does its best to keep each logfile chronologically sorted (and you may set what tipimaid's "best" is!).

Why should I want to use it?

The Tipimaid collection comprises four components: tipimaid itself (rotates/splits logfiles and executes something when a file is rotated), tipimaid_sender, tipimaid_server and tipimaid_mergelogs. The latter three are used if you want to send logs across a network to one logging computer. In this case the Apache servers employ tipimaid_sender as a piped logger (tipimaid_sender takes data from stdin and sends it over the network). On the logging computer, tipimaid_server listens for incoming connections: it accepts socket connections and outputs the transmitted lines on stdout. Piping tipimaid_server's output to tipimaid is most certainly what you want, but if you don't trust these tools you might want to pipe it through tee first...

You didn't explain tipimaid_mergelogs!

Patience, my young padawan...

What is tipimaid able to do and how can I use it?

Tipimaid itself can simply be used as a piped logger in Apache if you have Apache's %t in your logfile (i.e. dates/times are in this format: [10/Oct/2000:13:55:36 -0700]), which should be the case for almost all Apaches out there. If you use virtual hosts (i.e. you host multiple domains in one Apache), please make sure that the virtual host column (%v) is the first column of the logfile (this is the default in the vhost_combined logging format).

That said, let's assume you want the logs of each domain in their own directory and one logfile per day. In this case you would use tipimaid with this filename pattern as its first parameter: /var/log/apache2/%v/%Y-%m-%d_access.log. You may use all directives listed at the bottom of http://docs.python.org/library/datetime.html#datetime.datetime. This pattern creates a subdirectory for each virtual host under /var/log/apache2, and the logfiles get names like 2009-02-16_access.log. Logfiles are rotated automatically when a log entry arrives for a new day. If you instead use something like /var/log/apache2/%v/%H-%M_access.log (%H is the hour in 24-hour format, %M is the minute), you get a new logfile every minute. It's up to you.
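
To make the filename pattern more concrete, here is a minimal sketch of how a pattern could be expanded for a single log line. It is not tipimaid's actual code: the function name expand_pattern is made up for illustration, the sketch is written for Python 3, and it assumes the timestamp is taken from the log line exactly as written (without any timezone conversion).

```python
import re
from datetime import datetime

# Matches Apache's %t field, e.g. [16/Feb/2009:13:55:36 -0700]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def expand_pattern(pattern, logline):
    """Return the logfile path for one log line (hypothetical helper).

    Assumes the virtual host (%v) is the first column of the line.
    """
    vhost = logline.split(" ", 1)[0]
    match = TIMESTAMP_RE.search(logline)
    if match is None:
        raise ValueError("no %t timestamp found in log line")
    stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
    # %v is not a strftime directive, so substitute it before calling strftime.
    return stamp.strftime(pattern.replace("%v", vhost))


line = 'www.example.com 127.0.0.1 - - [16/Feb/2009:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'
print(expand_pattern("/var/log/apache2/%v/%Y-%m-%d_access.log", line))
```

With the daily pattern, every line of the same virtual host and day maps to the same path; a line whose path differs from the previous one is what would trigger a rotation in this sketch.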

As tipimaid is written in Python, you need Python >= 2.3.

What about the other options?

-z, --continuous-gzip
This option creates continuously gzipped logfiles: instead of writing the logfile in plaintext and gzipping it once it is finished, tipimaid writes the lines directly into a gzipped file. As gzip needs a checksum at the end of the file (which, of course, is not there while the file is still being written), you might have problems using zless/zcat on a file that is not finished yet. But if you are really low on hard-disk space, this is a good option ;-) If you use this option you have to set the compression level, from 1 (fast, big files) to 9 (slow, maximum compression).
-u, --utcrotate
This one is easy: normally tipimaid uses the local time to determine whether it should rotate a logfile. If you specify --utcrotate, tipimaid uses UTC instead.
-s, --symlink
As directories full of logfiles can get cluttered, this option instructs tipimaid to create symlinks pointing at the most recent logfile (if you split according to virtual hosts, you get one symlink per virtual host; if you don't, you get only one symlink). To enable this you need to specify a symlink pattern, which is similar to the filename pattern except that %v is the only directive allowed. So if you give -s /var/log/apache2/symlinks/%v-access.log you would have all symlinks in one handy directory, while -s /var/log/apache2/%v/access.log would create many symlinks (all called access.log), each in the directory of its virtual host.
-x, --execute
With this option you may specify a script or any kind of executable which is executed with the full absolute path of an "old" logfile as its first and only argument after it is rotated. Please note that you may only specify the executable here (so, e.g. "gzip" would be OK) but not an executable with parameters (so e.g. "gzip -9" wouldn't be OK). But you can always write a bash script which executes "gzip -9 [argument]". Thus you are free to do anything which can be done in a bash script, e.g. you can process your log with webalizer or awstats and gzip it afterwards. Or you choose to have your logfile read to you by your favorite festival-frontend... or you simply delete your logfile or...
-t, --threads
This option specifies the number of simultaneous threads for the execution of tasks given by -x (You don't want to have 200 gzip-processes at the same time, do you?) Default is 3 threads.
-b, --buffertime
Specifies the buffertime in seconds, a feature which is only relevant if you use tipimaid for logs which have been sent over a network.
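
Since -x accepts only a bare executable, any parameters have to live in a wrapper script. The text above suggests a bash script; the following is a minimal sketch of the same idea in Python 3, roughly equivalent to "gzip -9 [argument]". The function name compress_rotated_log and the script itself are illustrative assumptions, not part of Tipimaid.

```python
#!/usr/bin/env python3
import gzip
import os
import shutil
import sys


def compress_rotated_log(path, level=9):
    """Gzip `path` at the given level and remove the original file."""
    target = path + ".gz"
    with open(path, "rb") as source, gzip.open(target, "wb", compresslevel=level) as sink:
        shutil.copyfileobj(source, sink)
    os.remove(path)
    return target


# tipimaid passes the absolute path of the rotated logfile as the only argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    compress_rotated_log(sys.argv[1])
```

Saved as, say, /usr/local/bin/gzip9.py (a hypothetical path) and made executable, it could be given to tipimaid as -x /usr/local/bin/gzip9.py.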

What is tipimaid_sender?

tipimaid_sender takes log lines from stdin and sends them across a network. You have to specify the ip/hostname and the port of the target computer. The best way to use tipimaid_sender is as Apache's piped logger. To do so, you would have to add a line like this to your Apache config file:

CustomLog "|/path/to/tipimaid_sender.py 10.10.10.1 40000" vhost_combined

which would send the logs to 10.10.10.1, port 40000 in vhost_combined logging format (in this format, the virtual host is written into the first column - just where tipimaid will expect it). Furthermore you have to set values for the -b/--buffertime and -p/--backuppath options. "Buffertime" is, once again, a duration in seconds (more about this later) and the backuppath is a directory where local recovery logs can be stored. tipimaid_sender tests whether it can access this directory and if it has appropriate rights for it.

If bad things happen, e.g. a connection loss, so that the sender cannot talk to its server anymore, several features come into play that were added just for this case. In tipimaid, the -b/--buffertime option specifies how long (in seconds) loglines are held in memory before they are written to disk.

Please set the buffertime value of tipimaid_sender to exactly the same as the value for buffertime in tipimaid itself!

In addition, all lines which are buffered like this are also sorted. So, if you have two Apaches serving the same virtual host and both send their logs to one logging computer, you don't have to fear latency if you set buffertime to, say, a minute. If computer1 sends loglines for 10:30:10, 10:30:14 and 10:30:18 and computer2 has loglines for 10:30:12 and 10:30:13, but for some reason these two hits arrive later (though not later than 60 seconds), this is no problem at all: they are simply written away, neatly sorted, after they have been in the buffer for 60 seconds.

But back to the bad things. Of course you can set buffertime to 10 or even 30 minutes. If the connection between the Apache computer and the logging computer dies, tipimaid_sender buffers the loglines in memory and periodically checks whether it can re-establish the connection. If this succeeds and the oldest loglines buffered by tipimaid_sender are not older than buffertime, they are simply sent to the logging computer, because they can still be sorted into the buffer there and everything is fine. If, however, the connection loss lasts longer than buffertime, the loglines which are too old are written to a local recovery log ("local" means the Apache computer here). To keep you notified, tipimaid_sender reports this step in Apache's error log. In any case, tipimaid_sender keeps trying to re-establish the connection to its server and will send any new loglines.
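
The sorting behaviour on the logging computer (lines are held for buffertime seconds and then written in timestamp order) can be sketched with a heap. This is an illustration only, written for Python 3: the class name SortingBuffer is made up, timestamps are plain numbers of seconds, and it assumes a line is released once its own timestamp is at least buffertime seconds old, which may differ from what tipimaid really does.

```python
import heapq


class SortingBuffer:
    """Hold log lines for `buffertime` seconds and release them sorted."""

    def __init__(self, buffertime):
        self.buffertime = buffertime
        self._heap = []
        self._counter = 0  # tie-breaker: keeps arrival order for equal timestamps

    def add(self, timestamp, line):
        heapq.heappush(self._heap, (timestamp, self._counter, line))
        self._counter += 1

    def flush(self, now):
        """Return all lines that are at least `buffertime` seconds old, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now - self.buffertime:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


buffer = SortingBuffer(buffertime=60)
# computer1's lines arrive first, computer2's lines arrive late
for second, line in [(10, "c1 10:30:10"), (14, "c1 10:30:14"), (18, "c1 10:30:18"),
                     (12, "c2 10:30:12"), (13, "c2 10:30:13")]:
    buffer.add(second, line)
print(buffer.flush(now=80))
```

In this run the two late lines from computer2 come out between computer1's first and second line, because nothing is written before it has spent the full buffertime in the heap.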

If the really bad stuff happened (you had a connection loss which lasted longer than buffertime), you now have two logs: one processed logfile on the logging computer and a recovery log on the Apache computer. Both are sorted, so we can use tipimaid_mergelogs which, well, merges (already sorted) logs into one log. If you use virtual hosts, you might want to pipe the recovery log through tipimaid first (to split the entries) and use tipimaid_mergelogs afterwards.

That was too much text - what were tipimaid_sender's options?

-b, --buffertime
This option tells tipimaid_sender what the buffertime of its tipimaid (running at the logging server) is.
-p, --backuppath
You have to name a directory as a backup-directory. If really bad things happen, then tipimaid_sender will start to save its log locally in this directory. tipimaid_sender will try to write something to this directory right after it has been started, so make sure that this directory exists and is accessible by the apache-user.
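
How the two options work together after a reconnect can be sketched as follows: lines younger than buffertime are resent, older ones go to a recovery file under the backup path. This is a hedged illustration for Python 3, not tipimaid_sender's code; the function name split_after_reconnect and the file name recovery.log are assumptions, and timestamps are plain numbers of seconds.

```python
import os


def split_after_reconnect(buffered, now, buffertime, backuppath):
    """Decide what to do with lines buffered during a connection loss.

    `buffered` is a list of (timestamp, line) pairs in arrival order.
    Returns the lines that may still be sent to the logging computer.
    """
    resend = []
    too_old = []
    for timestamp, line in buffered:
        if now - timestamp <= buffertime:
            resend.append(line)
        else:
            too_old.append(line)
    if too_old:
        # "recovery.log" is a made-up name; the real file naming is not documented here.
        recovery_file = os.path.join(backuppath, "recovery.log")
        with open(recovery_file, "a") as handle:
            handle.writelines(too_old)
    return resend
```

Because the buffered lines arrive in order, the recovery file written this way stays sorted, which is what tipimaid_mergelogs relies on later.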

Does tipimaid_sender encrypt its data?

No. (But if you really want to use tipimaid outside of your intranet, you might want to establish an SSH tunnel...)

What is tipimaid_server?

tipimaid_server listens on a port and redirects everything it receives to stdout. To start tipimaid_server you have to specify a port number, for example by running

/path/to/tipimaid_server.py 40000
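
The basic idea (accept a connection, copy whatever arrives to stdout) can be sketched in a few lines of Python 3. This is not tipimaid_server's implementation: the function names are made up, only a single connection is handled, and the bytes are relayed unchanged, whereas the wire format that tipimaid_sender really uses is not described in this document.

```python
import socket
import sys


def make_listener(port, host="0.0.0.0"):
    """Create a TCP socket listening on host:port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    return listener


def relay_connection(listener, out):
    """Accept one connection and copy everything it sends to `out`."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer closed the connection
                break
            out.write(chunk)
            out.flush()
```

A call like relay_connection(make_listener(40000), sys.stdout.buffer) would print the lines of the first client that connects to port 40000; a real server would keep accepting connections and serve several senders at once.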

Are there options?

Yes:

-n, --netcat-compatible
With this switch you put tipimaid_server into netcat-compatible mode. If you don't want to use tipimaid_sender (maybe you don't trust me, maybe you don't want to look at tipimaid_sender's source code, maybe you don't have Python installed), Netcat (http://netcat.sourceforge.net/) may be used instead. If you use netcat, however, tipimaid_server must know about it (i.e. you have to set -n).

Of course, netcat is stupid as far as logs are concerned - it won't try to resend logs after a connection loss or buffer them locally. So, actually you shouldn't want to use it.

How do I use tipimaid_server with tipimaid?

You simply pipe the server's output to tipimaid, like

/path/to/tipimaid_server.py 40000 | /path/to/tipimaid.py [tipimaid's options...]

tipimaid_mergelogs

Let's say you used tipimaid to send logs over a network and really bad things happened: tipimaid_sender lost the connection to the logging computer for more than buffertime seconds. Even if tipimaid_sender could establish the connection again, some loglines were written to a local recovery file (local at the Apache computer) and were not sent over the network.

Of course you want to have one complete log file and that's why we have tipimaid_mergelogs. Unfortunately, you have to wait until a new logfile is started (i.e. if you have daily rotation of logfiles you have to wait until tomorrow). Then you copy the finished logfile which has been written by tipimaid and the recovery file(s) written by tipimaid_sender to one directory and start

/path/to/tipimaid_mergelogs.py [tipimaid_logfile] [recovery_file_1] [recovery_file_2] ... > my_new_logfile.log

and that's it.
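
Merging logs that are already sorted does not require re-sorting everything; a k-way merge is enough. The following Python 3 sketch shows the technique with heapq.merge. It is an illustration, not the code of tipimaid_mergelogs, and the names timestamp_of and merge_sorted_logs are hypothetical.

```python
import heapq
import re
from datetime import datetime

# Matches Apache's %t field including the UTC offset
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def timestamp_of(line):
    """Parse the %t field of a log line into a timezone-aware datetime."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        raise ValueError("no %t timestamp found in log line")
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_sorted_logs(*sources):
    """Lazily merge several already-sorted iterables of log lines."""
    return heapq.merge(*sources, key=timestamp_of)
```

Since open file objects iterate over their lines, something like merge_sorted_logs(open("tipimaid.log"), open("recovery.log")) (file names made up) yields the combined lines one at a time without loading either file completely into memory.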

How would I use tipimaid on one computer for local splitting/rotating/doing_of_things (no sending of logs across a network)?

Assume we have one Apache which serves several virtual hosts. We want all logs to be in

/var/log/apache2/logs/[domain]

and logfiles should have names like

2009-02-18_access.log

and they should be gzipped after they were rotated.

The place to go is Apache's log configuration file. We choose a LogFormat like vhost_combined which has %v in its first column (this format is defined as

LogFormat "%v %h %l %u %t \"%r\" %>s %b \
\"%{Referer}i\" \"%{User-Agent}i\""                     vhost_combined

in Apache's mod_log_config.conf).

Then, according to http://httpd.apache.org/docs/2.2/logs.html#piped, you specify tipimaid as your piped logger (assuming you have copied tipimaid.py to /usr/local/bin and run chmod +x on it) by writing

CustomLog "|/usr/local/bin/tipimaid.py /var/log/apache2/logs/%v/%Y-%m-%d_access.log -x gzip" vhost_combined

(Re)start your Apache and there you go...

How would I use tipimaid on a serverfarm for splitting/rotating/doing_of_things?

So, let's say we have 3 computers: apache1, apache2 and logger. apache1 and apache2 host the same domains (maybe because they sit behind a load balancer), and logger has the IP 10.10.10.1.

Setting up logger

We want to send our logs across a network, so we need tipimaid_server to receive the logs. Furthermore we need tipimaid itself for splitting/rotating/doing_of_things. First we pick a port, like port 40000. This port should be free and you should check whether it is free by looking at the output of netstat -a | grep 40000.
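
If you prefer a scripted check over reading netstat output, one option is to try binding to the port: if the bind fails, something is already using it. This is a small Python 3 sketch with a made-up function name (port_is_free); it is not part of Tipimaid.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

Note that the result only describes the moment of the check; another process could still grab the port before tipimaid_server is started.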

Then we have to think about tipimaid's parameters. Let's say we want a 10 minute buffer (i.e. -b 600), want our logs named according to /var/log/apache2/logs/%v/%Y-%m-%d_access.log and wish to have our logs gzipped by two parallel threads (i.e. -x gzip -t 2).

So our tipimaid commandline would look like

/path/to/tipimaid.py /var/log/apache2/logs/%v/%Y-%m-%d_access.log -b 600 -x gzip -t 2

and our tipimaid_server commandline is simply

/path/to/tipimaid_server.py 40000

As tipimaid_server outputs the logfile-lines it receives on stdout, we can simply pipe its output to tipimaid, so for a short test-setup we would write

/path/to/tipimaid_server.py 40000 | /path/to/tipimaid.py /var/log/apache2/logs/%v/%Y-%m-%d_access.log -b 600 -x gzip -t 2

However, if you want to run tipimaid properly and you think that nohup is just ugly, you might want to think about using supervisord (http://supervisord.org/) which allows you to monitor your background services and supports nicer logging and automatic restarts if something goes wrong.

Setting up apache1 and apache2

With tipimaid_server running, we can configure the senders.

As we already know that we have to talk to port 40000 on logger and have a buffertime of 10 minutes (600 seconds), the only thing left to decide is a local directory (local at apache1 and apache2) where tipimaid_sender may write temporary recovery files. So, let's say that /var/log/recovery is such a directory.

Then our command for tipimaid_sender would be

/path/to/tipimaid_sender.py 10.10.10.1 40000 -b 600 -p /var/log/recovery

and as we want to use Apache's piped logging, we have to write this line to Apache's .conf file:

CustomLog "|/path/to/tipimaid_sender.py 10.10.10.1 40000 -b 600 -p /var/log/recovery" vhost_combined

After this, reload your Apache config and after 10 minutes you should see growing logfiles at logger.