Debugging workers - gregtampa/PHP-Daemon GitHub Wiki

Debugging Workers

Version 2.0

Introduction

You can probably skip this section if you're experienced in building long running applications and are familiar with common multi-processing techniques.

For many of you, building an app with PHP Simple Daemon (PHPSD) that includes background workers may be your first experience building a long-running, multi-process application. This is a quite different experience than building normal PHP-based websites where state is only shared between requests in cookies, sessions and databases. This difference is compounded because you're not just building a long-running application. If you're building a daemon you're quite often building a piece of infrastructure. First, we should cover some basics:

When you run an application from the commandline (using, for example, php Examples/PrimeNumbers/run.php) you are not running a daemon. You are running a PHP application at the commandline. It will run within your shell, and if you close your shell or get disconnected, the application will die. To run the app as a daemon, you have to pass-in that option as php Examples/PrimeNumbers/run.php -d. What this actually does is start the application in your shell, just like before, but during the startup process the application forks itself, creating an exact replica of itself. Right after the fork, the newly forked process, which is not tied to your shell, continues the startup and begins running your code. The original process immediately exits and the shell returns to a command prompt. The best way to monitor what's happening with your app in daemon mode is to just tail -f the log file.
The most important difference between a long-running server app like those built with PHPSD and normal PHP web apps is the event loop. In a normal PHP web app, your code starts at line 1 of index.php and continues until it reaches the end of your script. Long-running apps (all of them, from your web browser to web servers to operating systems) stay running because of an infinite 'while(true) { ... }' loop. Your application basically runs inside this loop. To shut down the application, you break from the event loop. In PHPSD you don't have to worry about creating or breaking the event loop. It's done in the base class and can be manipulated in your code using the techniques described in the Getting Started guide.
PHP does not expose any ability to create [threads](http://en.wikipedia.org/wiki/Thread_(computing\)) or do multi-threaded programming. This is the primary reason PHP is not optimal for server-side applications like those built with PHPSD. However, sometimes you need to use PHP even if it's not optimal. Maybe you have a large library of PHP code, or you need to share the database models your web apps use, etc. So nothing built with PHPSD will ever be "multi-threaded."

However, the ability to create background tasks is vital to a server app that needs to keep its main event loop responsive and not blocked on things like number crunching, database access, or API usage. This is done using multiple processes, available to you using the Task API and Worker API. Forking a background process is computationally expensive, and forking too many too quickly can easily create a Denial Of Service situation on the server. For example, it's a bad idea to use the Task API every iteration of a 1 second loop interval. Actual capacity varies based on server and OS version, but you can probably assume that forking more than once every 30 seconds is a bad idea, and more than once a minute is questionable.
When you fork, you are creating a replica of all the memory, the variables, the call stack, all the application state. That's important: Memory is not shared between processes, it's duplicated. Processes run isolated from one another. In our case, when we create worker processes, we need to coordinate and control these processes so they can all work together. The OS gives us several tools for this inter-process communication (IPC). You can read about the tools PHPSD uses for IPC in the How Workers Work guide.

Monitoring and Testing

As you build your applications, and before you realize you even have a defect, you need to be able to run the application, manipulate it, and watch its output. I use a few techniques and best-practices to accomplish this during development:

We use the $this->log() method to log events, inputs and outputs. A tail -f on the log file to watch in realtime and analysis afterwards gives you a wealth of actual performance data from your application. Your app-level log messages are interspersed with those written by the library. They all include important context information like the date and time, the process ID (PID) that logged the event, the worker it was running (if any), and the message. It's written in a column format that is easy for humans to read and also, as fixed-width columns, it's easy for you to process the logs and extract whatever information you need.

Also helpful is the PHP Error Log. If you use the error handlers provided with PHPSD, you may see that in the PHP Error Log, the PID is logged along with the message, filename and line number. Also, a stack trace is logged when fatal-errors happen.

If you happen to write any useful scripts for log manipulation, put them in the /scripts directory and send a Pull request!
You can very easily use custom signals to manipulate your daemon. You can see the Daemon.php script in the Examples/PrimeNumbers application for an example. By using $this->on(ON_SIGNAL, ...); you can easily create your own signal handlers that do anything you'd like. From the command line, sending a signal is done using kill -[signal] [pid]. So to send a SIGUSR2 signal to your daemon you'd first find the PID (most easily by a ps ux | grep php or by looking at the log file that you have open in a tail -f) and, supposing the PID is 9876, you'd run kill -12 9876.

You can see how we use signals in the PrimeNumbers application to trigger jobs to be sent to a worker. Note that when we receive a signal, we set a flag, and that flag is checked the next time your execute() method is called (In that case it's once every 3 seconds). The reason we don't just call the worker right inside the closure we pass to $this->on() is because if we were to do that, the signal handler wouldn't actually be returned until you're finished with the work. In PHP signal handlers are non re-entrant. For that reason, we want to do as little as possible inside the signal handler itself.
In all applications, sending a SIGUSR1 (kill -10) signal will print a block of current runtime details to the log file. When using the Worker API, each worker in your application will also respond to the signal and dump current runtime stats for each method and its cumulative error counts.

Here is an example from the PrimeNumbers application:

---------------------------------------------------------------------------------------------------
Worker Runtime Statistics
---------------------------------------------------------------------------------------------------
   Method Duration      Status           Mean     Median      Count
   ================================================================
   primes_among         Called        0.02893    0.00000        285
   primes_among         Running       0.00106    0.00086        285
   sieve                Called        0.00010    0.00000         96
   sieve                Running       3.36055    2.92625         96

   Error Type      Count
   =====================
   Identifier          0
   Corruption          0
   Catchall            0

It's also helpful to invest some time as you develop your daemon to create your own scripts and tooling that you can use to accomplish monitoring and interaction. Some basic examples are included in the /scripts directory. For example, if you use the run scripts there to run the application you can then also use the signal_console.php app to easily send signals without having to lookup the current PID.

Remember, this is a piece of your infrastructure. Investments in scripts that will help you develop and run the daemon in production is time well spent.

The Debug Console

Monitoring and interaction exists to help you spot defects. So what do you do when you've found one? At first, you start by using normal debugging techniques. Maybe you run it under XDebug. But that only works for the parent process -- not helpful when dealing with workers. And then you start tracing: Adding $this->log() and echo statements here and there. Then you run the daemon (like PrimeNumbers) with 2 workers created that each have 2-3 processes and you break down in tears. Debugging multi-threaded and multi-process applications is notoriously difficult. To make the task easier PHPSD ships with an integrated debug console.

Setting Up Your Environment

Before I start the debug console, I arrange 5 terminal windows or panes on my screen. You can see my preferred setup in the attached image but you should find the combination of tools and visual arrangement that makes you most productive.

Debug Windows

You can read more about the signal console and shm console in the Debug Tools documentation.

The Debug Console

The debug console works by inserting break points in several places:

Before a worker process is forked.
At the point where a worker function or method is called.
At each point where a message is passed to or from a worker process:
- When a message is sent from the daemon to a worker to call a method in the worker
- When a message is sent from the worker back to the daemon that it has begun running the call.
- When a message is sent from the worker back to the daemon that the call is complete.
After the receiving process decodes each of those messages.

At each break point a prompt is displayed, this prompt was displayed when the ExampleWorkers daemon called the PrimeNumbers::sieve() method.

[PrimeNumbers 5027 D] [2] Call to PrimeNumbers->sieve() >

This prompt can be read as:

A breakpoint in the PrimeNumbers worker.
In the 5027 PID
In the Daemon process (W is used to denote a breakpoint in a worker process)
The call ID is [2]. Note: Call IDs will be reused after some time. Also note that "2" is actually the first call.
The description of the breakpoint is a Call to PrimeNumbers->sieve()

Many commands are available at the prompt. When you enter help they will be displayed.

Informative Options

help Show the help
types Show a table that will help you decipher what Message types and Call Status types mean.
status Show information about the current process.

Breakpoint Flow Control Options

y Continue to the next breakpoint
n Interrupt at this breakpoint (Will raise an exception).

Information and Introspection Options

show N You can use the Call ID (2) from the prompt to display any shared memory that exists for this call. In most cases this will be empty, as shared memory is copied into the local process as soon as possible. But it will let you see the call object that's passed between the Daemon and the Workers.
show local N This will show you the call object in the process memory.
trace This will display a current stack trace
show args This will display any args sent to the current method. This is used to see internal worker state, to see the args sent to a worker call, use the show local N command.

Debugging Options

indent Y|N When indent is turned-on (default) it will group related debug messages in columns so you can easily read them.
end This will turn-off the debugger but keep the daemon running.
skip This will turn-off the break-point you're currently at for the duration of the debugging session.
skipfor N This will turn-off ALL break-points for N seconds as you specify. I find this very useful to run the daemon normally for several minutes and then enter a debugging session.

Daemon Manipulation

signal N Will send the N signal to the daemon process
eval [code] Will eval whatever code you pass-in in the current scope.
call [method] Will call a worker method. Only available if the current worker is an Object (like PrimeNumbers in the ExampleWorkers daemon).
cleanipc This will clear any pending messages and remove any call objects in shared memory. This will require confirmation. It should not be used on a production server because it could clear resources used by other applications.
shutdown Initiate a graceful shutdown of the daemon (it will shut down at the end of the current iteration of the event loop)
kill Kill all running daemon processes - Useful if there's runaway forking or some other error condition and you need to stop it.

Using the Console

It's important to understand that the console itself is not running in its own process. You can only issue commands when your daemon is at a breakpoint.

In this way it's useful to add a simple signal handler to your daemon, and use the signal console as a way to manipulate your daemon when it's not at a breakpoint.

You can repeat the last command you sent by pressing the "up" arrow and submitting. It will display the ASCII "up" character, not the last command you types, but the daemon interprets that correctly. You can only use this to replay the last command, though: A command history is not kept the way you're familiar with at a bash prompt.