Table of Contents:
|
The
Writing Apache Modules with Perl and C
book can be purchased online from O'Reilly
and
Amazon.com.
|
|
Your corrections of the technical and grammatical
errors are very welcome. You are encouraged to help me
improve this guide. If you have something to contribute
please send it
directly to me.
|
Let's talk first about things that bother most web (and non-web) programmers. The bothering things are warning and errors reported by Perl. We are going to learn how to take the best out of both, by turning this obvious to the newbie programmer enemies into our best friends.
[ TOC ]
You have just installed this new CGI script and when you try it out you see the grey screen of death saying ``Internal Server Error''... Or even worse you have a script running on a production server for a long time without problems, when the same grey screen starts to show up occasionally for no apparent reason.
How can we find out what the problem is?
First problem:
You have been coding in Perl for years, and whenever an error occurred in the past it was displayed in the same terminal window that you started the script from. But when you work with a webserver there is no terminal to show you the errors, since the server in most cases has no terminal to send the error messages to.
Actually, the error messages don't disappear, they end up in the
error_log file. It is located in the directory specified by the
ErrorLog directive in httpd.conf. The default setting is generally:
ErrorLog /usr/local/apache/logs/error_log |
So whenever you see "Internal Server Error" it's time to look at this file.
First problem solved!
There are cases when errors don't go to the error_log file. For example, some errors go to the httpd process' STDERR. If you haven't redirected httpd's STDERR then the messages are printed to the console (tty, terminal) from which you executed the httpd. This happens when the server didn't get as far as opening the error_log file for writing before it needed to write an error message.
For example, if you have entered a non-existent directory path in your
ErrorLog directive, the error message will be printed to STDERR. If the error
happens when the server executes a PerlRequire or
PerlModule directive you might also see output sent to STDERR.
You are probably wondering where all the errors go when you are running the
server in single process mode (httpd -X). They go to STDERR. This is because the error logging for all the httpd
children is normally done by the parent httpd. When httpd runs in single
process mode, it has no parent httpd process to perform all the logging.
The output to the terminal includes all the status messages that normally
go to the error_log file.
Finally with a PerlLogHandler you can take away from Apache its control of the error logging process for
all HTTP transactions. If you do this, then you are responsible for
generating and storing the error messages. You can do whatever you like
with the information, (including throwing it away -- don't do it!) and,
depending on how you implement you LogHandler, the ErrorLog directive may have no effect. But you can also do something at this handler
and then return
DECLINED status, so the default Apache LogHandler will do the work as usual.
Second problem:
The usefulness of the error message depends to some extent on the programmer's coding style. An uninformative message might not help you to spot and fix the error.
For example, let's take a function which opens a file passed to it as a parameter. It does nothing else with the file. Here's our first version of the code:
my $r = shift;
$r->send_http_header('text/plain');
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die;
}
open_file("/tmp/test.txt");
|
Let's assume that /tmp/test.txt doesn't exist so the open() will fail to open the file. When
we call this script from our browser, the browser returns an "internal error" message and we see the following error appended to error_log:
Died at /home/httpd/perl/test.pl line 9. |
We can use the hint Perl kindly gave to us to find where in the code the
die() was called. However, we still don't know what filename
was passed to this subroutine to cause the program termination.
If we have only one function call as in the example above, the task of
finding the problematic filename will be trivial. Now let's add two more
open_file() function calls and assume that among the three
files only /tmp/test2.txt exists:
open_file("/tmp/test.txt");
open_file("/tmp/test2.txt");
open_file("/tmp/test3.txt");
|
When you execute the above call, you will see the same error message twice:
Died at /home/httpd/perl/test.pl line 9. Died at /home/httpd/perl/test.pl line 9. |
Based on this error message, can you tell what files your program failed to
open? Probably not. Let's fix it by passing the name of the file to
die():
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die "failed to open $filename";
}
open_file("/tmp/test.txt");
|
When we execute the above code, we see:
failed to open /tmp/test.txt at /home/httpd/perl/test.pl line 9. |
which makes a big difference.
By the way, if you append a newline to the end of the message you pass to
die(), Perl won't report the line number the error has
happened at, so if you code:
open FILE, $filename or die "failed to open a file\n"; |
The error message will be:
failed to open a file |
Which gives you very little to go on. It's very hard to debug with such uninformative error messages.
The warn() function, a kinder sister of die(),
which logs the message but doesn't cause program termination, behaves in
the same way. If you add a newline to the end of the message, the line
number warn() was called at won't be logged, otherwise it
will.
You might want to use warn() instead of die() if
the failure isn't critical. Consider the following code:
if(open FILE, $filename){
# do something with file
} else {
warn "failed to open $filename";
}
# more code here...
|
Now we've improved our code, by reporting the names of the problematic
files, but we still don't know the reason for the failure. Let's try to
improve the warn() example. The -r operator tests whether the file is readable:
if(-r $filename){
open FILE, $filename;
# do something with file
} else {
warn "Couldn't open $filename - doesn't exist or is not readable";
}
|
Now if we cannot read the file we do not even try to open it. But we still see a warning in error_log:
Couldn't open /tmp/test.txt - doesn't exist or is not readable at /home/httpd/perl/test.pl line 9. |
The warning tells us the reason for the failure, so we don't have to go to the code and check what it was trying to do with the file.
It could be quite a coding overhead to explain all the possible failure
reasons that way, but why reinvent the wheel? We already have the reason
for the failure stored in the $! variable. Let's go back to the open_file() function:
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die "failed to open $filename: $!";
}
open_file("/tmp/test.txt");
|
This time, if open() fails we see:
failed to open /tmp/test.txt: No such file or directory at /home/httpd/perl/test.pl line 9. |
Now we have all the information we need to debug these problems: we know
what line of code triggered die(), we know what file we were
trying to open, and last but not least we know the reason, given to us
through Perl's $! variable.
Now let's create the file /tmp/test.txt.
% touch /tmp/test.txt |
When we execute the latest version of the code, we see:
failed to open /tmp/test.txt: Permission denied at /home/httpd/perl/test.pl line 9. |
Here we see a different reason: we created a file that doesn't belong to the user which the server runs as (usually nobody). It does not have permission to read the file.
Now you can see that it's much easier to debug your code if you validate
the return values of the system calls, and properly code arguments to
die() and warn() calls. The open()
function is just one of the many system calls perl provides to your
convenience.
So now you can code and debug CGI scripts and modules as easily as if they were plain Perl scripts that you execute from a shell.
Second problem solved!
[ TOC ]
It's a good idea to keep it open all the time in a dedicated terminal with the help of tail -f or less -S, whichever you prefer (the latter allows you to page around the file, search etc.)
% tail -f /usr/local/apache/logs/error_log |
or
% less -S /usr/local/apache/logs/error_log |
So you will see all the errors and warning as they happen.
Another tip is to create a shell alias, to make it easier to execute the above command. In tcsh you would do something like this:
% alias err "tail -f /usr/local/apache/logs/error_log" |
For bash users the command is:
% alias err='tail -f /var/log/apache/error.log' |
and from now on in the shell you set the alias in, executing
% err |
will call tail -f /usr/local/apache/logs/error_log. Since you want this alias to be available to you all the time, you should put it into your .tcshrc file or its equivalent. For bash users this is .bashrc, or you can put it in /etc/profile for use by all users.
If you cannot access your error_log file because you are unable to telnet to your machine (generally the case with some ISPs who provide user CGI support but no telnet access), you might want to use a CGI script I wrote to fetch the latest lines from the file (with a bonus of colored output for easier reading). You might need to ask your ISP to install this script for general use. See Watching the error_log file without telneting to the server .
[ TOC ]
Just like errors, Perl's mandatory warnings go to the error_log file, if the they are enabled. Of course you have enabled them in your development server, haven't you?
The code you write lives a dual life. In the first life it's being written, tested, debugged, improved, tested, debugged, rewritten, tested, debugged. In the second life it's just used.
A significant part of the script's first life is spent on the developer's machine. The other part is spent on the production server where the creature is supposed to be perfect.
So when you develop the code you want all the help in the world to help you spot possible problems, and that's where enabling warnings is a must. Whenever you see an error or warning in the error_log, you want to get rid of it. That's very important.
Why?
If there are warnings, your code is not clean. If they are waved away, expect them to come back on the production server in the form of errors, when it's too late.
If each invocation of a script generates more than about five lines of warnings, it will be very hard to catch real problems. You just can't see them among all the other warnings which you used to think were unimportant.
On the other hand, on a production server, you really want to turn warnings off. And there are good reasons for that:
There is no added value in having the same warning showing up, again and again, triggered by thousands of script invocations. If your code isn't very clean and generates even a single warning per script invocation, on the heavily loaded server you will end up with a huge error_log file in a short time.
The warning elimination phase is supposed to be a part of the development process, and should be done before the code goes live.
In any Perl script, not just under mod_perl, enabling runtime warnings has a performance impact.
mod_perl gives you a very simple solution to this warnings saga, don't enable warnings in the scripts unless you really have to. Let mod_perl control this mode globally. All you need to do is put the directive
PerlWarn On |
in httpd.conf on your development machine and the directive
PerlWarn Off |
on the live box.
If there is a piece of code that generates warnings and you want to disable
them only in this code, you can do that too. The Perl special variable $^W allows you dynamically to turn on and off warnings mode. So just put the
code into a block, and disable the warnings in the scope of this block. The
original value of $^W will be restored upon exit from the block.
{
local $^W=0;
# some code that generates innocuous warnings
}
|
Unless you have a really good reason, for your own sake the advice is avoid this technique.
Don't forget the local() operand! If you do, setting $^W will affect all the requests handled by the Apache child that changed this variable. And
for all the scripts it executes, not just the one which changed $^W!
The diagnostics pragma can shed more light on errors and warnings, as you will see in a
moment.
[ TOC ]
This module extends the terse diagnostics normally emitted by both the Perl
compiler and the Perl interpreter, augmenting them with the more verbose
and endearing descriptions found in the perldiag manpage. Like the other pragmata, it affects the compilation phase of your
scripts as well as the execution phase.
To use in your program as a pragma, merely invoke
use diagnostics; |
at or near the start of your program. This also turns on -w mode.
This pragma is especially useful when you are new to perl, and want a better explanation of the errors and warnings. It's also helpful when you encounter some warning you've never seen before, e.g. when a new warning has been introduced in an upgraded version of Perl.
You may not want to leave diagnostics mode On for your production server. For each warning, diagnostics mode generates ten times more output than warnings mode. If your code
generates warnings, with the diagnostics pragma you will use disk space much faster.
diagnostics mode adds a large performance overhead in comparison with just having
warnings mode On. You can see the benchmark results in the section 'Code Profiling Techniques'.
[ TOC ]
Sometimes a httpd process might hang in the middle of processing a request, either because there is a bug in your code (e.g. the code is stuck in a while loop), it gets blocked by some system call or because of a resource deadlock) or for some other reason. In order to fix the problem we need to learn what circumstances the process hangs in (detection), so we can reproduce the problem and after than to discover why there is problem (diagnostics).
[ TOC ]
Sometimes you can find a process hanging because of some kind of the system
problem. For example if the processes was doing some disk IO operation it
might get stuck in uninterruptible sleep ('D' disk wait in ps(1) report, 'U' in top(1)) which indicates that either something is broken in
your kernel or that you're using NFS. Or and you cannot kill -9 this process.
Another process that cannot be killed with kill -9 is a zombie process ('Z' disk wait in ps(1) report, <defunc> in top(1)), in which case the process is already dead and
Apache didn't wait on it properly.
In the case of disk wait you can actually get the wait channel from ps(1) and look it up in your kernel symbol table
to find out what resource it was waiting on. It might point the way to what
component of the system was misbehaving if the problem occurred frequently.
[ TOC ]
Deadlock is the situation where, for example, two processes, say X and Y, need two resources, A and B to continue. X holds onto A and Y holds onto B. There is no possibility for Y to continue before X releases A. But X cannot release A before it gets Y.
Look at the following example. Your process has to gain a lock on some
resource (e.g. a file) before it continues. So it makes an attempt, and if
that fails it sleep()s for a second and increments a counter:
until(gain_lock()){
$tries++;
sleep 1;
}
|
Because there are many processes competing for this resource, or perhaps
because there is a deadlock, gain_lock() always fails. The
process is hung.
Another situation that you may very often encounter is exclusive lock starvation. Generally there are two lock types in use: SHARED locks, which allow many processes to perform READ operations simultaneously, and EXCLUSIVE locks. The latter permits access only by a single process and so makes a safe WRITE operation possible.
You can lock any kind of resource, although in our examples we will talk about files.
If there is a READ lock request, it is granted as soon as the file becomes unlocked or immediately if it is already READ locked. The lock status becomes READ on success.
If there is a WRITE lock request, it is granted as soon as the file becomes unlocked. Lock status becomes WRITE on success.
Normally it is the WRITE lock request which is the most important. If the file is being READ locked, a process that requests to write will poll until there are no reading or writing process left. However, lots of processes can successfully read the file, since they do not block each other from doing so. This means that a process that wants to write to the file (first obtaining an exclusive lock) never gets a chance to squeeze in. The following diagram represents a possible scenario where everybody can read but no one can write:
[-p1-] [--p1--]
[--p2--]
[---------p3---------]
[------p4-----]
[--p5--] [----p5----]
|
Let's look at some real code and see it in action. The following script
imports flock() related parameters from the Fcntl module, and opens a file that will be locked. It then defines and sets two
variables: $lock_type and $lock_type_verbose. These are set to
LOCK_EX and EX respectively if the first command line argument ($ARGV[0]) is defined and equal to <EM>w</EM>. This indicates that this
process will try to gain a <EM>WRITE</EM> (exclusive) lock. Otherwise the
two are set to <CODE>LOCK_SH</CODE> and <SH for a SHARED (read) lock.
Once the variables are set, we enter the infinite while(1) loop that attempts to lock the file by the mode set in $lock_type. It report success and the type of lock that was gained, then it sleeps
for a random period between 0 and 9 seconds and unlocks the file. The loop
then starts from the beginning.
lock.pl
-------------------
#!/usr/bin/perl -w
use Fcntl qw(:flock);
$lock = "/tmp/lock";
open LOCK, ">$lock" or die "Cannot open $lock for writing: $!";
my $lock_type = LOCK_SH;
my $lock_type_verbose = 'SH';
if (defined $ARGV[0] and $ARGV[0] eq 'w'){
$lock_type = LOCK_EX;
$lock_type_verbose = 'EX';
}
while(1){
flock LOCK,$lock_type;
# start of critical section
print "$$: $lock_type_verbose\n";
sleep int(rand(10));
# end of critical section
flock LOCK, LOCK_UN;
}
close LOCK;
|
It's very easy to see WRITE process starvation if you spawn a few of the above scripts simultaneously. Start the first few as READ processes and then start one WRITE process like this:
% ./lock.pl r & ; ./lock.pl r & ; ./lock.pl r & ; ./lock.pl w & |
You see something like:
24233: SH 24232: SH 24232: SH 24233: SH 24232: SH 24233: SH 24231: SH 24231: SH 24231: SH |
and not a single EX line... When you kill off the reading processes, then the write process
will gain its lock. Note that as this is a rough example, I used the
sleep() function. To simulate a real situation you need to use
the Time::HiRes module, which allows you to choose more precise intervals to sleep.
The interval between lock and unlock is called a Critical Section, which should be kept as short as possible (in terms of the time taken to execute the code, and not in terms of the number of lines of code). As you just saw, a single sleep statement can make the critical section long.
To summarize, if you have a script that uses both READ and WRITE locks and the critical section isn't very short, the writing process might be starved. After a while a browser that initiated this request will timeout the connection and abort the request, but it's much more likely that user will press the Stop or Reload button before that happens. Since the process in question is just waiting, there is no way for Apache to know that the request was aborted. It will hang until the lock is gained. Only when a write to a client's broken connection is attempted will Apache terminate the script.
[ TOC ]
It's not so easy to detect hanging processes. There is no way you can tell
how long the request is taking to process by using plain system utilities
like ps() and top(). The reason is that each
Apache process serves many requests without quitting. System utilities can
tell how long the process has been running since its creation, but this
information is useless in our case, since Apache processes normally run for
extended periods.
However there are a few approaches that can help to detect a hanging process.
If the process hangs and demands lots of resources it's quite easy to spot
it by using the top() utility. You will see the same process
show up in the first few lines of the automatically refreshed report. But
often the hanging process uses few resources, e.g. when waiting for some
event to happen.
Another easy case is when some process thrashes the error_log, writing millions of error messages there. Generally this process uses
lots of resources and is also easily spotted by using top().
There are other tools that report the status of Apache processes.
Both tools provide counters of processed requests per Apache process.
You can watch the report for a few minutes, and try to spot any process which has the same number of processed requests while its status is 'W' (waiting). This means that it has hung.
But if you have fifty processes, it can be quite hard to spot such a process. Apache::Watchdog::RunAway is a hanging processes monitor and terminator that implements this feature and should be used to solve this kind of problem.
If you've got a real problem, and the processes hang one after the other,
the time will come when the number of hanging processes is equal to the
value of MaxClients. This means that no more processes will be spawned. As far as the users
are concerned your server is down. It is easy to detect this situation,
attempt to resolve it and notify the administrator using a simple crontab
watchdog that requests some very light script periodically. (See
Monitoring the Server. A watchdog.)
In the watchdog you set a timeout appropriate for your service, which may be anything from a few seconds to a few minutes. If the server fails to respond before the timeout expires, the watchdog has spotted trouble and attempts to restart the server. After a restart an email report is sent to the administrator saying that there was a problem and whether or not the restart was successful.
If you get such reports constantly something is wrong with your web service and you should revise your code. Note that it's possible that your server is being overloaded by more requests that it can handle, so the requests are being queued and not processed for a while, which triggers the watchdog's alarm. If this is a case you may need to add more servers or more memory, or perhaps split your single machine across a cluster of machines.
[ TOC ]
Given the process id (PID), there are three ways to find out where the server is hanging.
Deploying the Perl calls tracing mechanism. This will allow to spot the location of the Perl code that has triggered the problem.
Using the system calls tracing utilities, like strace(1) or
truss(1). This approach reveals low level details about a
potential misbehavior of some part of the system.
Using an interactive debugger, like gdb(1). When the process
is stuck, and you don't know what it was doing just before it has got
stuck, with gdb you can attach to this process and print its calls stack,
to reveal where the last call was made from. Just like with strace or truss
you see the system call trace and not the Perl calls.
[ TOC ]
To see where an httpd is ``spinning'', try adding this to your script or a startup file:
use Carp ();
$SIG{'USR2'} = sub {
Carp::confess("caught SIGUSR2!");
};
|
The above code asigns a signal handler for the USR2 signal. This signal has been chosen because it's least likely to be used by
the other parts of the server.
We check the registered signal handlers with help of Apache::Status. What we see at http://localhost/perl-status?sig is :
USR2 = \&MyStartUp::__ANON__ |
MyStartUp is the name of the package I've used in mine
startup.pl.
After applying this server configuration, let's use this simple code
example, where sleep(10000) will emulate a hanging process:
debug/perl_trace.pl
-------------------
$|=1;
print "Content-type:text/plain\r\n\r\n";
print "[$$] Going to sleep\n";
hanging_sub();
sub hanging_sub {sleep 10000;}
|
We execute the above script as
http://localhost/perl/debug/perl_trace.pl, we have used $|=1;
and printed the PID with $$ to learn what process ID we want to work with.
No we issue the command line, using the PID we have just saw being printed to the browser's window:
% kill -USR2 PID |
And watch this showing up at the error_log file:
caught SIGUSR2!
at /home/httpd/perl/startup/startup.pl line 32
MyStartUp::__ANON__('USR2') called
at /home/httpd/perl/debug/perl_trace.pl line 5
Apache::ROOT::perl::debug::perl_trace_2epl::hanging_sub() called
at /home/httpd/perl/debug/perl_trace.pl line 4
Apache::ROOT::perl::debug::perl_trace_2epl::handler('Apache=SCALAR(0x8309d08)')
called
at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm
line 140
eval {...} called
at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm
line 140
Apache::Registry::handler('Apache=SCALAR(0x8309d08)') called
at PerlHandler subroutine `Apache::Registry::handler' line 0
eval {...} called
at PerlHandler subroutine `Apache::Registry::handler' line 0
|
We can clearly see that the process ``hangs'' in the code executed at line
5 of the /home/httpd/perl/debug/perl_trace.pl script, and it was called by the hanging_sub() routine defined
at line 4.
[ TOC ]
Depending on the operating system you should have one of the truss
or strace utilities available. The usage is simple:
% truss -p PID |
or
% strace -p PID |
Replace PID with the process number you want to check on.
Let's write a program that hangs, and deploy strace to find the point it hangs at:
hangme.pl
---------
$|=1;
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
while(1){
$i++;
sleep 1;
}
|
The reason this simple code hangs is obvious. It never breaks from the while loop. As you have noticed, it prints the PID of the current process to the browser. Of course in a real situation you cannot use the same trick. In the previous section I have presented a few ways to detect the runaway processes and their PIDs.
I save the above code in a file and execute it from the browser. Note that
I've made STDOUT unbuffered with $|=1; so I will immediately see the process ID. Once the script is requested, the
script prints the process PID and obviously hangs. So we press the 'Stop'
button, but the process continues to hang in this code. Isn't apache
supposed to detect the broken connection and abort the request? Yes and No,
you will understand soon what's really happening.
First let's attach to the process and see what it's doing. I use the PID the script printed to the browser, which is 10045 in this case:
% strace -p 10045 [...truncated identical output...] SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0 SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0 SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0 nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0 time([940973834]) = 940973834 time([940973834]) = 940973834 [...truncated the identical output...] |
It isn't what we expected to see, is it? These are some system calls we don't see in our little example. What we actually see is how Perl translates our code into system calls. Since we know that our code hangs in this snippet:
while(1){
$i++;
sleep 1;
}
|
We "easily" figure out that the first three system calls implement the $i++, while the other other three are responsible for the
sleep 1 call.
Generally the situation is the reverse of our example. You detect the hanging process, you attach to it and watch the trace of calls it does (or the last few commands if the process is hanging waiting for something, e.g. when blocking on a file lock request). From watching the trace you figure out what it's actually doing, and probably find the corresponding lines in your Perl code. For example let's see how one process "hangs" while requesting an exclusive lock on a file exclusively locked by another process:
excl_lock.pl
---------
use Fcntl qw(:flock);
use Symbol;
if ( fork() ) {
my $fh = gensym;
open $fh, ">/tmp/lock" or die "cannot open /tmp/lock $!";
print "$$: I'm going to obtain the lock\n";
flock $fh, LOCK_EX;
print "$$: I've got the lock\n";
sleep 20;
close $fh;
} else {
my $fh = gensym;
open $fh, ">/tmp/lock" or die "cannot open /tmp/lock $!";
print "$$: I'm going to obtain the lock\n";
flock $fh, LOCK_EX;
print "$$: I've got the lock\n";
sleep 20;
close $fh;
}
|
The code is simple. The process executing the code forks a second process, and both do the same thing: generate a unique symbol to be used as a file handler, open the lock file for writing using the generated symbol, lock the file in exclusive mode, sleep for 20 seconds (pretending to do some lengthy operation) and close the lock file, which also unlocks the file.
The gensym function is imported from the Symbol module. The
Fcntl module provides us with a symbolic constant LOCK_EX. This is imported via the :flock tag, which imports this and other flock() constants.
The code used by both processes is identical, therefore we cannot predict
which one will get its hands on the lock file and succeed in locking it
first, so we add print() statements to find the PID of the
process blocking (waiting to get the lock) on a lock request.
When the above code executed from the command line, we see that one of the processes gets the lock:
% ./excl_lock.pl 3038: I'm going to obtain the lock 3038: I've got the lock 3037: I'm going to obtain the lock |
Here we see that process 3037 is blocking, so we attach to it:
% strace -p 3037 about to attach c10 flock(3, LOCK_EX |
It's clear from the above trace, that the process waits for an exclusive lock. (Note, that the missing closing parentnheses is not a typo!)
As you become familiar with watching the traces of different processes, you will understand what is happening more easily.
[ TOC ]
Another approach to see a trace of the running code is to use a debugger
such as gdb (the GNU debugger). It's supposed to work on any platform which supports
the GNU development tools. Its purpose is to allow you to see what is going
on inside a program while it executes, or what it was doing at the moment it crashed.
To trace the execution of a process, gdb needs to know the process id (PID) and the path to the binary that the
process is executing. For Perl code it's /usr/bin/perl (or whatever is the path to your Perl), for httpd processes it will be the
path to your httpd executable.
Here are a few examples using gdb.
Let's go back to our last locking example, execute it as before and attach to the process that didn't get the lock:
% gdb /usr/bin/perl 3037 |
After starting the debugger we execute the where command to see the trace:
(gdb) where
#0 0x40131781 in __flock ()
#1 0x80a5421 in Perl_pp_flock ()
#2 0x80b148d in Perl_runops_standard ()
#3 0x80592b8 in perl_run ()
#4 0x805782f in main ()
#5 0x400a6cb3 in __libc_start_main (main=0x80577c0 <main>, argc=2,
argv=0xbffff7f4, init=0x8056af4 <_init>, fini=0x80b14fc <_fini>,
rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffff7ec)
at ../sysdeps/generic/libc-start.c:78
|
That's not what we expected to see and now it's a different trace.
#0 tells us the most recent call that was executed, which is a C language
flock() implementation. But the previous call (#1) isn't print(), as we would expect, but a higher level of
Perl's internal flock(). If we follow the trace of calls what
we actually see is an Opcodes tree, which can be better presented as:
__libc_start_main
main ()
perl_run ()
Perl_runops_standard ()
Perl_pp_flock ()
__flock ()
|
So I would say that it's less useful than strace, since if there are several flock()s it's almost impossible
to know which of them was called. This problem is solved by strace, which shows the sequence of the system calls executed. Using this
sequence we can locate the corresponding lines in the code.
(META: the above is wrong - you can ask to display the previous command executed by the program (not gdb)! What is it?)
When you attach to a running process with debugger, the program stops
executing and control of the program is passed to the debugger. You can
continue the normal program run with the continue command or execute it step by step with the next and step commands which you type at the gdb prompt. (next steps over any function calls in the line, while step steps into them).
C/C++ debuggers are a very large topic and beyond the scope of this
document, but the gdb man page is quite good and you can try info
gdb as well. You might also want to check the ddd (Data Display Debbuger) which provides a visual interface to gdb and other debuggers. It even knows how to debug Perl programs!
For completeness, let's see the gdb trace of the httpd process that's still
hanging in the while(1) loop of the first example in this section:
% gdb /usr/local/apache/bin/httpd 1005
(gdb) where
#0 0x4014a861 in __libc_nanosleep ()
#1 0x4014a7ed in __sleep (seconds=1) at ../sysdeps/unix/sysv/linux/sleep.c:78
#2 0x8122c01 in Perl_pp_sleep ()
#3 0x812b25d in Perl_runops_standard ()
#4 0x80d3721 in perl_call_sv ()
#5 0x807a46b in perl_call_handler ()
#6 0x8079e35 in perl_run_stacked_handlers ()
#7 0x8078d6d in perl_handler ()
#8 0x8091e43 in ap_invoke_handler ()
#9 0x80a5109 in ap_some_auth_required ()
#10 0x80a516c in ap_process_request ()
#11 0x809cb2e in ap_child_terminate ()
#12 0x809cd6c in ap_child_terminate ()
#13 0x809ce19 in ap_child_terminate ()
#14 0x809d446 in ap_child_terminate ()
#15 0x809dbc3 in main ()
#16 0x400d3cb3 in __libc_start_main (main=0x809d88c <main>, argc=1,
argv=0xbffff7e4, init=0x80606f8 <_init>, fini=0x812b33c <_fini>,
rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffff7dc)
at ../sysdeps/generic/libc-start.c:78
|
As before we can see a complete trace of the last executed call.
As you have noticed, I still haven't explained why the process hanging in
the while(1) loop isn't aborted by Apache. The next section covers this.
[ TOC ]
When a user presses a STOP or RELOAD button, Apache could detect this via the SIGPIPE signal (Broken pipe). It could then halt the script execution and perform
all the cleanup stuff it has to do. But the SIGPIPE will be triggered only when the process attempts to send some data to the
client browser via the broken connection. If the script is doing some
lengthy operation, without writing anything to the client, it won't be
stopped until that operation is completed and an attempt is made to send at
least one character the client.
Apache >= 1.3.6 does not catch SIGPIPE anymore, and modperl can do the job much better.
Since Apache version 1.3.6:
$r->print returns true on success, false on failure (broken connection).
If you want a similar to the old SIGPIPE behaviour, simply configure:
PerlFixupHandler Apache::SIG |
When Apache's SIGPIPE handler is used, Perl may be left in the middle of it's eval context,
causing bizarre errors during subsequent requests are handled by that
child. When Apache::SIG is used, it installs a different SIGPIPE handler which rewinds the context to make sure Perl is back to normal
state, preventing these bizarre errors.
[ TOC ]
Let's use the knowledge we have acquired to trace the execution of the code and see all the events as they happen.
Let's take a little script that obviously ``hangs'' the server:
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$i++;
sleep 1;
}
|
The script gets a request object $r by shift()ing it from the @_
argument list passed by the handler() subroutine. (This magic
is done by Apache::Registry). Then the script sends a Content-type
header, telling the client that we are going to send some plain text.
We print out a single line telling us the id of the process that handles this request, which we need to know in order to run the tracing utility. Then we flush Apache's buffer. (If we don't flush the buffer we will never see the line printed. That's because our output is shorter than the buffer size and the script intentionally hangs, so the buffer won't be auto-flushed as the script hangs at the end.)
Then we enter an infinite loop, which just increments a dummy variable and sleeps for a second.
Running strace -p PID, where PID is the process ID as printed to the browser, we see the following output
printed every second:
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0 SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0 SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0 nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0 time([941281947]) = 941281947 time([941281947]) = 941281947 |
Let's leave strace running and press the STOP button. Did anything change? No, the same trace printed every second. Which
means that Apache didn't detect the broken pipe.
Let's try to write a NULL \0 character to the client so the broken pipe will be detected as soon the Stop button is pressed:
while(1){
$r->print("\0");
last if $r->connection->aborted;
$i++;
sleep 1;
}
|
We add a print() statement to print a NULL character and then
we check whether the connection was aborted. If it was, we break from the
loop.
We run this script and strace on it as before, but we see that it still doesn't work. The trouble is we aren't flushing the buffer. After printing the NULL, add $r->rflush():
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$r->print("\0");
$r->rflush;
last if $r->connection->aborted;
$i++;
sleep 1;
}
|
Watch strace's output on the running process and then press the
Stop button, you will see:
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0
nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
time([941284358]) = 941284358
write(4, "\0", 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
select(5, [4], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
time(NULL) = 941284358
write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81
gettimeofday({941284359, 39113}, NULL) = 0
times({tms_utime=9, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 41551400
close(4) = 0
SYS_174(0xa, 0xbffff4e0, 0xbffff454, 0x8, 0xa) = 0
SYS_174(0xe, 0xbffff46c, 0xbffff3e0, 0x8, 0xe) = 0
fcntl(18, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}
|
Apache detects the broken pipe as you see from this snippet:
write(4, "\0", 1) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) --- |
Then it stops the script and does all the cleanup work, like access logging:
write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81 |
In the access_log file we can see the file descriptor of the logfile in this process (17).
Let's see how can we make the code more general-purpose:
Apache::SIG helps us, use this configuration setting in
httpd.conf:
PerlFixupHandler Apache::SIG |
Now the following script doesn't need to check for aborted connections.
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$r->print("\0");
$r->rflush;
$i++;
sleep 1;
}
|
Apache::SIG installs the SIGPIPE handler, which stops the script's execution for us when it sees the broken
pipe. This setting affects all processes of course.
If you would like to log when a request was cancelled by a SIGPIPE in your
Apache access_log, you must define a custom LogFormat in your httpd.conf, like so:
PerlFixupHandler Apache::SIG
LogFormat "%h %l %u %t \"%r\" %s %b %{SIGPIPE}e"
|
If the server has noticed that the request was cancelled via a
SIGPIPE, then the log line will end with 1, otherwise it will just be a dash.
[ TOC ]
This is a critical issue with aborted scripts.
What happens to locked resources? Will they be freed or not? If not, scripts using these resources and the same locking scheme will hang, waiting for this resource to be freed.
Under mod_cgi this was a problem only if you happened to use external lock
files for lock indication, instead of using flock(). If the
script was aborted between the lock and the unlock code, and you didn't
bother to write cleanup code to remove old dead locks then you were in big
trouble.
With mod_cgi you can create an END block, and put the cleanup code there:
END{
# some code that ensures that locks are removed
}
|
When the script is aborted, Apache will run the END blocks.
If you use flock() things are much simpler, since all opened files will be closed. When the
file is closed, the lock is removed as well and all the locked resources
will be freed. There are systems where flock(2) is
unavailable, and for those you can use Perl's emulation of this function.
With mod_perl things are more complex. Because the processes don't exit
after processing a request, files won't be closed unless you explicitly
close() them or reopen with the open() call,
which first closes a file. Let's see what problems we might encounter, and
possible solutions for them.
[ TOC ]
First I want to make a little detour to discuss the "critical section" issue.
Let's start with a resource locking scheme. A schematic representation of a proper locking technique is as follows:
1. lock a resource
<critical section starts>
2. do something with the resource
<critical section ends>
3. unlock the resource
|
If the locking is exclusive, only one process can hold the resource at any given time, which means that all the other processes will have to wait, and this code snippet becomes a so called bottleneck. That's why the section of the code where the resource is locked is called critical and you must make it as short as possible.
In a shared locking scheme, where many processes can concurrently access the resource, if there are processes that sometimes want to get an exclusive lock it's also important to keep the critical section as short as possible.
The code below uses a shared lock, but has a poorly-designed critical section:
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "filename" or die "$!";
flock $fh, LOCK_SH;
# start critical section
seek $fh, 0, 0;
my @lines = <$fh>;
for(@lines){
print if /foo/;
}
# end critical section
close $fh; # close unlocks the file
|
The code opens the file for reading, locks and rewinds to the start, reads all the lines from the file and prints out the lines that contain the string foo. Note that the file remains open and locked while the loop executes.
We can optimize the critical section this way:
Once the file has been read, we have all the information we need from it. The loop might take some time to complete. We don't need the file to be open while the loop executes, because we don't access it inside the loop. If we close the file before we start the loop, we will allow other processes to have access to the file if they need it, instead of blocking them for no reason.
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "filename" or die "$!";
flock $fh, LOCK_SH;
# start critical section
seek $fh, 0, 0;
my @lines = <$fh>;
# end critical section
close $fh; # close unlocks the file
for(@lines){
print if /foo/;
}
|
This is another very similar script, but now using an exclusive lock. It reads in a file and writes it back, prepending a number of new text lines to the head of the file.
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# start critical section
seek $fh, 0, 0;
my @add_lines =
(
qq{Complete documentation for Perl, including FAQ lists,\n},
qq{should be found on this system using `man perl' or\n},
qq{`perldoc perl'. If you have access to the Internet, point\n},
qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
);
my @lines = (@add_lines, <$fh>);
seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
# end critical section
close $fh; # close unlocks the file
|
First let's see how the code works. I will discuss why I use the
Symbol module to generate the file handles in the next section.
Since we want to read the file, modify and write it back, without anyone
else changing it on the way, we open it for read and write with the help of +>> and lock it with an exclusive lock. You cannot safely accomplish this task
by opening the file first for read and then reopening for write, since
another process might change the file between the two. (You could get away
with +<, see
perldoc -f open or the perlfunc manpage for more information about the open() function.)
Next the code prepares the lines of text it wants to prepend to the head of
the file, and assigns them and the content of the file to the
@lines array. Now we have our data ready to be written back to the file, so we
seek() to the start of the file and truncate() it
to zero size. In our example the file always grows, so in this case there
is no need to truncate it, but if there was a chance that the file might
shrink then truncating would be necessary. However it's good practice to
use truncate(), as you never know what changes your code might
undergo in the future. The truncate() operation does not carry
any significant performance penalty. Finally we write the data back to the
file and close it, which unlocks it as well.
Did you notice that we created the text lines to be prepended as close to the place of usage as possible? This is good ``locality of code'' style, but it makes the critical section longer. In such cases you should sacrifice style, in order to make the critical section as short as possible. An improved version of this script with a shorter critical section looks like this:
use Fcntl qw(:flock);
use Symbol;
my @lines =
(
qq{Complete documentation for Perl, including FAQ lists,\n},
qq{should be found on this system using `man perl' or\n},
qq{`perldoc perl'. If you have access to the Internet, point\n},
qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
);
my $fh = gensym;
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# start critical section
seek $fh, 0, 0;
push @lines, <$fh>;
seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
# end critical section
close $fh; # close unlocks the file
|
There are two important differences. Firstly, we prepare the text lines to
be prepended before the file is locked. Secondly, instead of creating a new array and copying
lines from one array to another, we append the file directly to the @lines array.
[ TOC ]
Let's get back to the main issue of this section, which is safe resource locking.
Unless you use the Apache::PerlRun handler that does the cleanup for you, if you don't make a habit of closing
all the files that you open you will encounter lots of problems. If you
open a file but don't close it, you will have file descriptor leakage.
Since the number of file descriptors available to you is finite, at some
point you will run out of them and your service will fail.
This is bad, but you can live with it until you run out of file descriptors
(which will happen much faster on a heavily used server). But this is
nothing compared to the trouble you will give yourself if you lock, but
forget to unlock or close your locked files. Since close()
always unlocks the file, you don't have to unlock files explicitly.
But a locked file will stay locked after your code has terminated!
Any other process requesting a lock on the same file (or resource) will wait indefinitely for it to become unlocked. Since this will not happen until the server reboots, all these processes will hang.
Here is an example of such a terrible mistake:
open IN, "+>>filename" or die "$!"; flock IN, LOCK_EX; # do something # quit without closing and unlocking the file |
Is this safe code? No - we forgot to close the file.
So let's add the close():
open IN, "+>>filename" or die "$!";
flock IN, LOCK_EX;
# start critical section
# do something
# end critical section
# close and unlock the file
close IN;
|
Is it safe code now? Unfortunately it is not. There is a chance that the
user may abort the request (for example by pressing his browser's
Stop or Reload buttons) during the critical section. The script will be aborted before it
has had a chance to close() the file, which is just as bad as
if we forgot to close it.
There are a few approaches we can take to solving this problem. If you are
running under Apache::Registry and friends, the END
block will perform the cleanup work for you. You might use END in the same way for scripts running under mod_cgi, or in plain Perl
scripts. Just add the cleanup code to this block and you are safe. Since
under mod_perl the END blocks will not be executed after the completion of a request, but only
when an Apache child process exits, then if you are writing your own
handlers you will need to use the register_cleanup() function
to supply cleanup code similar to that used in END blocks instead of using END blocks. We will see a few examples later.
Of course, if the same child executes the same section of code of the same
script, the open() call on the same file handle will first
close() the file. But this will happen only if it's the same
filehandle, which is correct if you use the scalar variable like
IN, OUT. As you will see in a moment if you use Symbol or <IO::*> modules, a unique filehandle will be generated every time
-- you get a file desriptor leakage and the file will be not unlocked in
case it was locked.
On Linux OS you can use the lsof(1) utility to list open files
and the processes who have opened them. On FreeBSD you would use the
fstat(1) utility.
Now I want to show you a much easier safe locking solution.
Although it might not be obvious, the problem we have encountered is
actually the fact that file handles like IN are global variables. If we could make them lexically scoped, all our
worries would go away. You know that lexically scoped (with the
my() operand) variables are automatically destroyed when they
go out of scope. When the program quits, all the lexical variables will be
destroyed since they only have file scope. When a variable holding an
opened file descriptor is destroyed, the file will automatically be closed
and unlocked.
So if you use this technique to work with files, you even don't have to explicitly close the files! (Of course if you recall the critical section discussion above, you will still want to make sure that you close them as soon as possible.) In addition to the benefits of safe file handling, having the file handlers lexically scoped will protect you from names collisions. If you use global file handles, e.g when you have to open more than one file, you always have to make sure you don't use the same filehandle name somewhere else in the code in case it might still be associated with an open file. To emphasize the risk of collisions think of subroutine that opens a file for you:
sub open_file{
my $filename = shift;
open FILE, ">$filename" or die "$!";
return \*FILE;
}
|
my $fh1 = open_file("/tmp/x");
my $fh2 = open_file("/tmp/y");
print $fh1 "X";
print $fh2 "Y";
|
This code doesn't do what you think it should do. Instead of writing the
character X to /tmp/x file and Y to /tmp/y, what you see after running this script is that /tmp/x is empty and /tmp/y
contains a XY string. Why is that? Because you have used the same global variable FILE twice, and when you called open_file() for a second time it
opened a different file using the same variable. Since
open_file() always returns a reference to the same global file
handle variable, both $fh1 and $fh2 point to it.
There is another way. As you saw earlier we can generate unique, lexically
scoped file handles with the Symbol module.
Symbol::gensym() creates an anonymous glob and returns a reference to it. Such a glob
reference can be used as a file or directory handle. Here is how you can
use it:
use Symbol; my $fh = gensym; open $fh, "+>>filename" or die "$!"; flock $fh, LOCK_EX; # do something |
Now the file will be always unlocked after processing the request.
Instead of using close(), you might use a block:
use Symbol;
{
my $fh = gensym;
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# do something
}
# the file will be automatically closed and unlocked at this point
|
But this is perhaps not so obvious to the reader of the code, so you might
want to avoid this last technique and put in an explicit
close().
You can also use the IO::* modules, such as IO::File or
IO::Dir. These are much bigger than the <Symbol> module, and worth using for files or directories only if you are
already using them for the other features which they provide. As a matter
of fact, these modules use the Symbol module themselves. Here are some examples of their use:
use IO::File;
my $fh = IO::File->new(">filename");
# the rest is as before
|
and:
use IO::Dir;
my $dh = IO::Dir->new("dirname");
|
Under perl 5.6 Symbol.pm-like functionality is a built-in feature, so you can do:
open my $fh, "> filename"; |
and $fh will be automatically vivified as a valid filehandle, so you don't need to
use the Symbol module anymore.
[ TOC ]
Finally, let's look at the case where we need special clean up code. As you have seen, we solved the problem of accidentally leaving file handles lying around by lexically scoping them. There are however, situations where you absolutely must write cleanup code. A tied dbm file is a good example.
A reminder: a dbm file is a simple database, which allows you to store
pairs of keys and values in it. As of this writing, Berkeley DB is the most
advanced dbm implementation, it allows you to store key/value pairs using
the HASH, BTREE and RECNO algorithms. The BerkeleyDB
module provides a Perl interface to Berkeley DB versions 2 and 3, while the DB_File module handles the older Berkeley DB, version 1. Refer to the DB_File man page for more information.
With the help of the TIE interface, working with dbm files is very simple because they are represented in Perl as simple hash variables. They behave almost exactly like hashes.
In order to access a dbm file you have to tie it first:
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;
my $filename = "/tmp/mydb";
my %hash;
tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
|
The first argument to tie() is the hash variable to which we
want the dbm file to be tied. The remaining arguments are: the name of the
module that provides the interface (DB_File in this case); the name of our dbm file; Fcntl flags; file permissions; and
finally the interface method to be used (DB_HASH, DB_BTREE or DB_RECNO).
From now on we use %hash to read from and write to the dbm file, like this:
$hash{foo} = "Larry Wall";
my $name = $hash{foo};
|
The only wrinkle is that when we modify the hash (by assigning some values
to it) the changes are not written immediately to the file. They are cached
to improve performance. The cache buffers are flushed in the following
circumstances: when they become full; when the sync() method
is called on the database handle; and when the hash is untied (closed). So
be aware that if the program quits abnormally, the dbm file might be
corrupted.
To untie the dbm file, simply call:
untie %hash; |
To gain the access to the sync() method, you should retrieve
the database handle which is returned by the tie() method:
my $dbh = tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
|
Now you can flush the cache with:
$hash{foo} = "Larry Wall";
$dbh->sync;
|
Important: If you have saved a copy of the object returned from
tie(), the underlying database file will not be closed until
both the tied variable is untied and all copies of the saved object are
destroyed.
We do this as follows:
undef $dbh; untie %hash; |
Of course, you have to lock the dbm file exactly like any other resource if some script modifies its contents. Refer to Locking dbm handlers for more information.
Okay, enough introduction, let's get to the point. Since both %hash
and $dbh are lexically scoped variables, they will always be destroyed, even if you
forgot to untie() them or if the request was aborted before
the untie() function was called.
Suppose that you want to have the benefit of mod_perl's persistent global
variables in each process and to use this feature to create persistent dbm
hashes. You tie() them only once per Apache child process,
thus saving the time which it would take to tie() and
untie() them for each client request. Assuming that you
remember that you must flush the cache buffers (with the
sync() method) when you modify the hash that represents the
dbm file, the idea is a good one. Let's code it...
We declare $dbh and %hash as global variables, then pull in the
Fcntl module and import the symbols we are going to use. Actually we need only LOCK_EX from the tags provided by :flock. We pull in the DB_File and Symbol modules:
use strict;
use vars qw($dbh %hash);
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;
use Symbol;
# Send the I<Content-type> header of plain text type and tell the
# user the PID of the process that's serving the request:
my $r = shift;
$r->send_http_header('text/plain');
$r->print("PID $$\n");
# The location of the dbm file and its lock file:
my $filename = "/tmp/mydb";
my $lockfile = "$filename.lock";
# Generate a unique anonymous glob, store it in a lexically scoped
# variable C<$fh>, and lock the file. This in turn advisory locks
# the dbm file (which will be safely tied):
my $fh = gensym;
open $fh, ">$lockfile" or die "Cannot open $lockfile: $!";
flock $fh, LOCK_EX;
|
# Other copies of this script which wish to access the following # code have to acquire the lock file first. Since it's an exclusive # lock, only one copy of the script will be able to tie the dbm # file. |
$dbh ||= tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
|
This code snippet demands some explanation.
$a ||= $b; |
is the same as:
$a = $a || $b; |
The boolean test || (logical OR) doesn't care about undefined values, since undef is false in Perl. So what it does is this:
If $a is true, leave it unmodified. Otherwise test $b.
If $b true, assign the value of $b to $a.
If $b is false, $a stays undefined.
Note that 0 and "" (the empty string) are both defined, but they are false values! Refer to the perlop(1) manpage for more information
about the || operator.
Back to our tie() snippet. For each mod_perl process, when
this code is executed for the first time, the $dbh variable is undefined. Therefore the right-hand part of the statement will
be executed, tie()ing the dbm file. On every subsequent
invocation of the code by that same process, $dbh will contain a database handle. This is considered by Perl to be a true value, so the tie() call will not be executed, eliminating the
overhead of the call to tie().
Now we fill the dbm file with random key/value pairs. Each invocation of
the code will either generate a new key/value pair or, if an existing key
is returned by rand(), override an old one.
$hash{int rand 10} = (qw(a b c d))[int rand 4];
$dbh->sync();
|
The most important part of the code is to flush the modifications to the dbm.
# unlock the db close $fh; |
Now it's safe to unlock the dbm file. Please refer to Locking dbm handlers to learn why you should use a dbm's file descriptor to lock itself. To cut a long story short, if you don't you may corrupt your dbm file.
After we leave the critical section, we can take our time and print out the current contents of the dbm file.
# print the contents of the the dbm file
print map {"$_ => $hash{$_}\n"} sort keys %hash;
|
Here is the same code with fewer comments:
use strict;
use vars qw($dbh %hash);
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;
use Symbol;
my $r = shift;
$r->send_http_header('text/plain');
$r->print("PID $$\n");
my $filename = "/tmp/mydb";
my $lockfile = "$filename.lock";
my $fh = gensym;
open $fh, ">$lockfile" or die "Cannot open $lockfile: $!";
# must lock the db file before opening it
flock $fh, LOCK_EX;
$dbh ||= tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
# fill the dbmfile with random key/value pairs
$hash{int rand 10} = (qw(a b c d))[int rand 4];
# sync the DB
$dbh->sync();
# unlock the db
close $fh;
# print the contents of the the dbm file
print map {"$_ => $hash{$_}\n"} sort keys %hash;
|
Well, if you run this code, you pretty soon figure out that this code
doesn't do what we thought it would. What happens is that each process
keeps its own copy of the %hash and modifies it. When the process calls the sync() method, the
dbm file is updated with the contents of the private %hash of this process. If a request happens to be served by a process that hasn't
yet tie()d the %hash, the dbm file will be initialized to the value of the %hash used by the process that last called sync() on the dbm file.
But if it is handled by a process that has already tied the %hash, it won't read the contents of the dbm file but will use its private value
of the
%hash instead.
In reality things are even more complicated. The above scenario is true
only when the hash file is smaller than the buffer size of the dbm file.
When it becomes bigger than the buffer, its contents are flushed. When you
do keys %hash, all the keys will be brought from the dbm file which causes the process
to read the values saved by the previous sync() calls and
automatic flushes caused by buffer overflow.
Which creates a whole big mess with the data and makes the whole idea is useless.
But if you have followed me this far, let's see what else is wrong with
this code. It's the sync() call. If the script somehow stops
before sync() is called, the dbm will be unlocked because $fh
is lexically scoped. But it won't be properly sync()ed, which
at some point will corrupt the dbm file.
The solution is simple. Write an END block to sync the file:
END{
# make sure that the DB is flushed
$dbh->sync();
}
|
Under mod_perl, the above will work only for Apache::Registry
scripts. Otherwise execution of the END block will be postponed until the process terminates. If you write a
handler in the Perl API use the register_cleanup() method instead. It accepts a reference to a subroutine as an argument:
$r->register_cleanup(sub { $dbh->sync() });
|
Even better would be to check whether the client connection has been aborted. If you don't check, the cleanup code will always be executed and for normally terminated scripts this may not be what you want:
$r->register_cleanup(
# make sure that the DB is flushed
sub{
$dbh->sync() if Apache->request->connection->aborted();
}
);
|
So in the case of END block usage you would use:
END{
# make sure that the DB is flushed
$dbh->sync() if Apache->request->connection->aborted();
}
|
Note that if you use register_cleanup() it should be called at the beginning of the script, or as soon as the
variables you want to use in this code become available. If you use it at
the end of the script, and the script happens to be aborted before this
code is reached, there will be no cleanup performed.
For example CGI.pm registers the cleanup subroutine in its new() method:
sub new {
# code snipped
if ($MOD_PERL) {
Apache->request->register_cleanup(\&CGI::_reset_globals);
undef $NPH;
}
# more code snipped
}
|
There is another way to register a section of cleanup code for Perl API
handlers. You may use PerlCleanupHandler in the configuration file, like this:
<Location /foo>
SetHandler perl-script
PerlHandler Apache::MyModule
PerlCleanupHandler Apache::MyModule::cleanup()
Options ExecCGI
</Location>
|
Apache::MyModule::cleanup() performs the cleanup, obviously.
[ TOC ]
A similar situation to Pressed Stop button disease happens when the browser times out the connection (is it about 2 minutes?).
There are cases when your script is about to perform a very long operation
and there is a chance that its duration will be longer than the client's
timeout. One example is database interaction, where the DB engine hangs or
needs a long time to return the results. If this is the case, use $SIG{ALRM} to prevent the timeouts:
$timeout = 10; # seconds
eval {
local $SIG{ALRM} =
sub { die "Sorry timed out. Please try again\n" };
alarm $timeout;
... db stuff ...
alarm 0;
};
die $@ if $@;
|
It was recently discovered that local $SIG{'ALRM'} does not restore the original underlying C handler. This was fixed in
mod_perl 1.19_01 (CVS version). As a matter of fact none of the
local $SIG{FOO} signals restores the original C handler - read
Debugging Signal Handlers ($SIG{FOO}) for a debug technique and a possible workaround.
[ TOC ]
Your server is up and running, but something appears to be wrong. You want to see the numbers to tune your code or server configuration. You just want to know what's really going on inside the server.
How do you do it?
There are a few tools that allow you to look inside the server.
[ TOC ]
This is a very useful module. It lets you watch what happens to the Perl parts of the server. You can see the size of all subroutines and variables, variable dumps, lexical information, OPcode trees, and more.
[ TOC ]
This configuration enables the Apache::Status module with its minimum feature set. Add this to httpd.conf:
<Location /perl-status>
SetHandler perl-script
PerlHandler Apache::Status
order deny,allow
#deny from all
#allow from
</Location>
|
If you are going to use Apache::Status it's important to put it as the first module in the start-up file, or in httpd.conf:
# startup.pl use Apache::Status (); use Apache::Registry (); use Apache::DBI (); |
If you don't put Apache::Status before Apache::DBI, you won't get the Apache::DBI menu entry in the status. For more about
Apache::DBI see Persistent DB Connections.
[ TOC ]
There are several variables which you can use to modify the behaviour of Apache::Status.
This single directive will enable all of the options described below.
When you are browsing symbol tables, you can view the values of your
arrays, hashes and scalars with Data::Dumper.
With this option On and the Apache::Peek module installed, functions and variables can be viewed in Devel::Peek style.
With this option On and the B::LexInfo module installed, subroutine lexical variable information can be viewed.
With this option On and B::Deparse version 0.59 or higher (included in Perl 5.005_59+), subroutines can be
``deparsed''.
Options can be passed to B::Deparse::new like so:
PerlSetVar StatusDeparseOptions "-p -sC" |
See the B::Deparse manpage for details.
With this option On, text-based op tree graphs of subroutines can be
displayed, thanks to B::Terse.
With this option On and the B::TerseSize module installed, text-based op tree graphs of subroutines and their size
can be displayed. See the B::TerseSize docs for more info.
With this option On and the B::TerseSize module installed, ``Memory Usage'' will be added to the Apache::Status main menu. This option is disabled by default, as it can be rather cpu
intensive to summarize memory usage for the entire server. It is strongly
suggested that this option only be used with a development server running
in -X mode, as the results will be cached.
Remember to preload B::TerseSize with:
PerlModule B::Terse |
When StatusDumper (see above) is enabled, another link "OP Tree
Graph" will be present with the dump if this configuration variable is set to On.
This requires the B module (part of the Perl compiler kit) and the
B::Graph module version 0.03 or higher to be installed along with the `dot' program.
Dot is part of the graph visualization toolkit from AT&T: http://www.research.att.com/sw/tools/graphviz/.
WARNING: Some graphs may produce very large images, and some graphs may
produce no image if B::Graph's output is incorrect.
There is more information about Apache::Status in its manpage.
[ TOC ]
Assuming that your mod_perl server listens on port 81, fetch http://www.myserver.com:81/perl-status
Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 process 187138, running since Thu Nov 19 09:50:33 1998 |
Below all the sections are links when you view them through /perl-status
Signal Handlers Enabled mod_perl Hooks PerlRequire'd Files Environment Perl Section Configuration Loaded Modules Perl Configuration ISA Tree Inheritance Tree Compiled Registry Scripts Symbol Table Dump |
Let's follow, for example, PerlRequire'd Files. We see:
PerlRequire Location /home/perl/apache-startup.pl /home/perl/apache-startup.pl |
From some menus you can move deeper to peek into the internals of the server, to see the values of the global variables in the packages, to see the cached scripts and modules, and much more. Just click around...
[ TOC ]
Sometimes when you fetch /perl-status and look at the Compiled
Registry Scripts you see no listing of scripts at all. This is correct: Apache::Status shows the registry scripts compiled in the httpd child which is serving
your request for /perl-status. If the child has not yet compiled the script you are asking for,
/perl-status will just show you the main menu.
[ TOC ]
The Status module allows a server administrator to find out how well the server is performing. An HTML page is presented that gives the current server statistics in an easily readable form. If required, given a compatible browser this page can be automatically refreshed. Another page gives a simple machine-readable list of the current server state.
This Apache module is written in C. It is compiled by default, so all you have to do to use it is enable it in your configuration file:
<Location /status>
SetHandler server-status
</Location>
|
For security reasons you will probably want to limit access to it. If you have installed Apache according to the instructions you will find a prepared configuration section in httpd.conf: to enable use of the mod_status module, just uncomment it.
ExtendedStatus On
<Location /status>
SetHandler server-status
order deny,allow
deny from all
allow from localhost
</Location>
|
You can now access server statistics by using a Web browser to access the page http://localhost/status (as long as your server recognizes localhost:).
The details given by mod_status are:
[ TOC ]
This module is covered in the section ``Apache::* Modules''
[ TOC ]
See Sometimes it Works Sometimes it does Not
[ TOC ]
When the code doesn't perform as expected, either never or just sometimes, we say that the code needs debugging. There are several levels of debugging complexity.
The basic level is when Perl terminates the program during the compilation phase, before it tries to run the resulting byte-code. This usually happens because there are syntax errors in the code, or perhaps a module is missing. Sometimes it takes quite an effort to solve these problems, since code that uses Apache CORE modules generally won't compile when executed from the shell. We will learn how to solve syntax problems in mod_perl code quite easily.
Once the program compiles and begins to run, there might be logical
problems, when the program doesn't do what you thought you had programmed
it to do. These are somewhat harder to solve, especially when there is a
lot of code to be inspected and reviewed, but it's just a matter of time.
Perl can help a lot, for example to locate typos, when we enable warnings.
For example, if you wanted to compare two numbers, but you omitted the
second '=' character so that you had something like if $yes = 1 instead of if $yes == 1, it warns us about the missing '='.
The next level is when the program does what it's expected to do most of
the time, but occasionally misbehaves. Often you find that
print() statements or the Perl debugger can help, but
inspection of the code generally doesn't. Often it's quite easy to debug
with print(), but sometimes typing the debug messages can
become very tedious. That's where the Perl debugger comes into its own.
While print() statements always work, running the perl
debugger for CGI scripts might be quite a challenge. But with the right
knowledge and tools handy the debug process becomes much easier.
Unfortunately there is no one easy way to debug your programs, as the
debugging depends entirely on your code. It can be a nightmare to debug
really complex code, but as your style matures you can learn ways to write
simpler code that is easier to debug. You will probably find that when you
write simpler clearer code it does not need so much debugging in the first
place.
One of the most difficult cases to debug, is when the process just terminates in the middle of processing a request and dumps core. Often when there is a bug the program tries to access a memory area that doesn't belong to it. The operating system halts the process, tidies up and dumps core (it creates a file called core in the current directory of the process that was running). This is something that you rarely see with plain perl scripts, but it can easily happen if you use modules written in C or C++ and something goes wrong with them. Occasionally you will come across a bug in mod_perl itself (mod_perl is written in C), that was in a deep slumber before your code awakened it.
In the following sections we will go through in detail each of the problems presented, thoroughly discuss them and present a few techniques to solve them.
[ TOC ]
While developing code we often make syntax mistakes, like forgetting to put a comma in a list, or a semicolon at the end of a statement.
(Even after a block, where a semicolon is not required, it may be better to put one in: there is a chance that you will add more code later, and when you do you might forget to add the now required semicolon. Similarly, more items might be added later to a list; unlike many other languages, Perl has no problem when you end a list with a redundant comma.)
One approach to locating syntactically incorrect code is to execute the
script from the shell with the -c flag. This tells Perl to check the syntax but not to run the code
(actually, it will execute
BEGIN, END blocks, and use() calls, because these are considered as occurring outside the execution of
your program). (Note also that Perl 5.6.0 has introduced a new special
variable, $^C, which is set to true when perl is run with the -c flag; this provides an opportunity to have some further control over BEGIN and
END blocks during syntax checking.) Also it's a good idea to add the -w switch to enable warnings:
perl -cw test.pl |
If there are errors in the code, Perl will report the errors, and tell you at which line numbers in your script the errors were found.
The next step is to execute the script, since in addition to syntax errors there may be run time errors. These are the errors that cause the "Internal Server Error" page when executed from a browser. With plain CGI scripts it's the same as running plain Perl scripts -- just execute them and see that they work.
The whole thing is quite different with scripts that use Apache::* modules which can be used only from within the mod_perl server environment. These scripts rely on other code, and an environment which isn't available when you attempt to execute the script from the shell. There is no Apache request object available to the code when it is executed from the shell.
If you have a problem when using Apache::* modules, you can make a request to the script from a browser and watch the
errors and warnings as they are logged to the error_log file. Alternatively you can use the Apache::FakeRequest module.
[ TOC ]
Apache::FakeRequest is used to set up an empty Apache request object that can be used for
debugging. The Apache::FakeRequest
methods just set internal variables with the same names as the methods and
return the value of the internal variables. Initial values for methods can
be specified when the object is created. The print method prints to STDOUT.
Subroutines for Apache constants are also defined so that you can use
Apache::Constants while debugging, although the values of the constants are hard-coded rather
than extracted from the Apache source code.
Let's write a very simple module, which prints "OK" to the client's browser:
package Apache::Example;
use Apache::Constants;
sub handler{
my $r = shift;
$r->send_http_header('text/plain');
print "You are OK ", $r->get_remote_host, "\n";
return OK;
}
1;
|
You cannot debug this module unless you configure the server to run it, by calling its handler from somewhere. So for example you could put in httpd.conf:
<Location /ex> SetHandler perl-script PerlHandler Apache::Example </Location>
Then after restarting the server you could start a browser, request the location http://localhost/ex and examine the output. Tedious, no?
But with the help of Apache::FakeRequest you can write a little script that will emulate a request and return the
output.
#!/usr/bin/perl |
use Apache::FakeRequest ();
use Apache::Example ();
my $r = Apache::FakeRequest->new('get_remote_host'=>'www.foo.com');
Apache::Example::handler($r);
|
when you execute the script from the command line, you will see the following output:
You are OK www.foo.com |
[ TOC ]
Perl has no problem with the line numbers and file names for modules that
are read from disk in the normal way, but modules that are compiled via
eval such as Apache::Registry and Apache::PerlRun
confuse it.
META: Isn't PERL_MARK_WHERE=1 is a default now?
If you compile with the experimental PERL_MARK_WHERE=1, then even for this kind of module Perl will show you almost the exact line which triggered the error.
There are compiler directives to reset its counter to some value that you decide. You can always pepper your code with these to help you locate the problem. At the beginning of the line you could write something of the form:
#line nnn label |
For example:
#line 298 myscript.pl or #line 890 some_label_to_be_used_in_the_error_message |
The '#' must be in the first column, so if you cut and paste from this text you must remember to remove any leading white space.
The label is optional - the filename of the script will be used by default. This directive sets the line number of the following line, not the line the directive is on. You can use a little script to stuff every N lines of your code with these directives, but then you will have to remember to rerun this script every time you add or remove code lines. The script:
#!/usr/bin/perl
# Puts Perl line markers in a Perl program for debugging purposes.
# Also takes out old line markers.
die "No filename to process.\n" unless @ARGV;
my $filename = shift;
my $lines = 100;
open IN, $filename or die "Cannot open file: $filename: $!\n";
open OUT, ">$filename.marked"
or die "Cannot open file: $filename.marked: $!\n";
my $counter = 1;
while (<IN>) {
print OUT "#line $counter\n" unless $counter++ % $lines;
next if /^#line /;
print OUT $_;
}
close OUT;
close IN;
chmod 0755, "$filename.marked";
|
Another way of narrowing down the area to be searched is to move most of the code into a separate modules. This ensures that the line number will be reported correctly.
To have a complete trace of calls add:
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;
|
[ TOC ]
The universal debugging tool across nearly all platforms and programming
languages is printf() or the equivalent output function. This
can send data to the console, a file, an application window and so on. In
perl we generally use the print() function. With an idea of
where and when the bug is triggered, a developer can insert
print() statements in the source code to examine the value of
data at certain stages of execution.
However, it is rather difficult to anticipate all possible directions a
program might take and what data to suspect of causing trouble. In
addition, inline debugging code tends to add bloat and degrade the
performance of an application and can also make the code harder to read and
maintain. And you have to comment out or remove the debugging
print() calls when you think that you have solved the problem.
But if later you discover that you need to debug the same code again, you
need at best to uncomment the debugging code lines or, at worst, to write
them again from scratch.
Let's see a few examples where we use print() to debug some
problem. In one of my applications I wrote a function that returns the date
that was one week ago. Here it is:
print "Content-type: text/plain\r\n\r\n";
print "A week ago the date was ",date_a_week_ago(),"\n";
# return a date one week ago as a string in format: MM/DD/YYYY
####################
sub date_a_week_ago{
my @month_len = (31,28,31,30,31,30,31,31,30,31,30,31);
my ($day,$month,$year) = (localtime)[3..5];
for (my $j = 0; $j < 7; $j++) {
$day--;
if ($day == 0) {
$month--;
if ($month == 0) {
$year--;
$month = 12;
}
# there are 29 days in February in a leap year
$month_len[1] =
(($year % 4 or $year % 100 == 0) and $year % 400 )
? 28 : 29;
# set $day to be the last day of the previous month
$day = $month_len[$month - 1];
} # end of if ($day == 0)
} # end of for ($i = 0;$i < 7;$i++)
return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
}
|
This code is pretty straightforward. We get today's date and subtract one from the value of the day we get, updating the month and the year on the way if boundaries are being crossed (end of month, end of year). If we do it seven times in loop then at the end we should get a date that was a week ago.
Note that since locatime() returns the year as a value of
current_four_digits_format_year-1900 (which means that we don't have a century boundary to worry about) then if
we are in the middle of the first week of the year 2000, the value of year
returned by localtime() will be 100 and not 0 as you might mistakenly assume. So when the code does $year-- it becomes 99 and not
-1. At the end we add 1900 to get back the correct four-digit year format.
(This is all correct as long as you don't go to the years prior to 1900)
Also note that we have to account for leap years where there are 29 days in February. For the other months we have prepared an array containing the month lengths.
Now when we run this code and check the result, we see that something is
wrong. For example, if today is 10/23/1999 we expect the above code to print 10/16/1999. In fact it prints 09/16/1999, which means that we have lost a month. The above code is buggy!
Let's put a few debug print() statements in the code, near the
$month variable:
sub date_a_week_ago{
my @month_len = (31,28,31,30,31,30,31,31,30,31,30,31);
my ($day,$month,$year) = (localtime)[3..5];
print "[set] month : $month\n"; # DEBUG
for (my $j = 0; $j < 7; $j++) {
$day--;
if ($day == 0) {
$month--;
if ($month == 0) {
$year--;
$month = 12;
}
print "[loop $i] month : $month\n"; # DEBUG
# there are 29 days in February in a leap year
$month_len[1] =
(($year % 4 or $year % 100 == 0) and $year % 400 )
? 28 : 29;
# set $day to be the last day of the previous month
$day = $month_len[$month - 1];
} # end of if ($day == 0)
} # end of for ($i = 0;$i < 7;$i++)
return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
}
|
When we run it we see:
[set] month : 9 |
It is supposed to be the number of the current month (10), but actually it is not. We have spotted a bug, since the only code that
sets the
$month variable consists of a call to localtime(). So did we find a
bug in Perl? let's look at the manpage of the localtime()
function: % perldoc -f localtime Converts a time as returned by the time
function to a 9-element array with the time analyzed for the local time
zone. Typically used as follows:
# 0 1 2 3 4 5 6 7 8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
localtime(time);
|
All array elements are numeric, and come straight out of a struct tm. In particular this means that C<$mon> has the range C<0..11> and C<$wday> has the range C<0..6> with Sunday as day C<0>. Also, C<$year> is the number of years since 1900, that is, C<$year> is C<123> in year 2023, and I<not> simply the last two digits of the year. If you assume it is, then you create non-Y2K-compliant programs--and you wouldn't want to do that, would you? [more info snipped] |
Which reveals to us that if we want to count months from 1 to 12 and not 0
to 11 we are supposed to increment the value of $month. Among other interesting facts about locatime() we also see
an explanation of
$year, which as I've mentioned before is set to the number of years since 1900.
We have found the bug in our code and learned new things about
localtime(). To correct the above code we just increment the
month after we call localtime():
my ($day,$month,$year) = (localtime)[3..5];
$month++;
|
Now let's see some code including conditional and loop statements.
for my $i (1..31)
if( $day > 20) {
}
|
META: continue (unfinished)!!!
[ TOC ]
Sometimes you need to peek into complex data structures, and trying to
print them out can be tricky. That's where Data::Dumper comes to our rescue. For example if we create this complex data structure:
$data =
{
array => [qw(a b c d)],
hash => {
foo => "oof",
bar => "rab",
},
};
|
How do we print it out? Very easily:
use Data::Dumper; print Dumper \$data; |