Laravel Queues in Production

The Laravel queue and event system is one of the most powerful features of the framework and indispensable if you want to build a modern web application.

The queue provides a performant means to run background tasks that don’t block your HTTP lifecycle, allowing you to serve a responsive application to your customers. Typically, you would use the queue to send emails, collect metrics, or generate reports, among a number of other tasks that are time consuming, prone to failure, or can simply run asynchronously without impeding the user’s workflow.
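
For example, a minimal queued job might look like the following sketch (SendWeeklyReport and its contents are illustrative, not from a real codebase):

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

// Hypothetical job: build the report off the request path so the HTTP
// response can return immediately.
class SendWeeklyReport implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function handle()
    {
        // Generate and email the report here.
    }
}

// From a controller: push the job onto the queue and return right away;
// a worker picks it up in the background.
SendWeeklyReport::dispatch();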

Having now used the Laravel event system to queue up listeners as well as background jobs for the last few years, I’ve written up some gotchas and best practices for others to learn from.

Use Multiple Queues and Prioritize Them

By default, Laravel assigns all jobs and event listeners to the default queue. This is typically fine for a low-volume web application with quick jobs or event listeners.

However, sometimes you will need to prioritize jobs on the queue so that they are executed before others.

As an example, let’s consider a project planning board like Jira or Asana. When a user is logged into the application, we want to push events down the websocket using the queue. Similarly, the queue is also used to send emails to notify users of changes to the project card/issue.

In the above example, the websocket queue should take priority over emails: the user is currently logged into the application and wants real-time updates while interacting with the project planning board, rather than jumping back and forth between the application and their email client to find out every time a co-worker has made an edit or an update.

There are a couple of ways to accomplish this in Laravel. You can dispatch the job like so:

WebsocketMessenger::dispatch()->onQueue('websocket');

Or set the public $queue property on the class itself:

use Illuminate\Contracts\Queue\ShouldQueue;

class WebsocketMessenger implements ShouldQueue
{
    /**
     * The name of the queue the job should be sent to.
     *
     * @var string|null
     */
    public $queue = 'websocket';
}

I would strongly recommend using class constants as lightweight enums for both of the above examples so that you can rename a queue in the future with a single-line change.

/**
 * Class Queues
 *
 * References which queue a queueable class belongs on.
 *
 * @package App\Utils\Enums
 */
abstract class Queues
{
    /**
     * The queue for websocket events/jobs.
     *
     * @var string
     */
    const WEBSOCKET = 'websocket';
}

WebsocketMessenger::dispatch()->onQueue(Queues::WEBSOCKET);

You then prioritize the queues in the queue:work command like so:

php artisan queue:work --queue=websocket,email,default

The worker will process everything in the websocket queue before attempting to process the email queue, and finally anything on the default queue.

Starvation

The above command has a potential problem with resource starvation: if your websocket queue is always busy processing events/jobs, the emails may never send because they cannot get a free executor slot.

This can be difficult to spot in production without correct monitoring, and may take the form of your Redis instance eventually exhausting all of its memory and throwing an error in your logs like the following.

Laravel-queue-worker INFO   OOM command not allowed when used memory > 'maxmemory'.

As an aside, if you don’t have production alarms set up for Redis memory utilization, you should.

If you choose not to use Laravel Horizon (which will prevent the starvation problem) or some other solution that automatically scales your queue workers based on queue size, you can work around this problem by editing your supervisor config as follows to guarantee there is always at least one executor running for each queue.

[program:laravel-worker-queue]
process_name=%(program_name)s_%(process_num)02d
command=php artisan queue:work --queue=websocket,email,default --daemon
numprocs=4

[program:laravel-worker-websocket]
process_name=%(program_name)s_%(process_num)02d
command=php artisan queue:work --queue=websocket --daemon
numprocs=1

[program:laravel-worker-email]
process_name=%(program_name)s_%(process_num)02d
command=php artisan queue:work --queue=email --daemon
numprocs=1

[program:laravel-worker-default]
process_name=%(program_name)s_%(process_num)02d
command=php artisan queue:work --queue=default --daemon
numprocs=1

I’ve ignored additional configuration parameters in the supervisor config as those will be specific to your system.

Supervisor

If you are using supervisor to wrap php artisan queue:work, as suggested in the Laravel documentation, you need to be careful to monitor your production system for supervisor failures.

A simple way to do this is to have your logging service send out a notification any time it finds a string like “gave up” or “entered FATAL state, too many start retries too quickly”.

Failure to catch this error can result in your job queue not being processed.

Admittedly this is an exceptionally rare occurrence, but I’ve seen it happen when our multi-availability-zone database had to fail over from one zone to the other, dropping all Laravel database connections for a couple of seconds. During those few seconds, supervisor attempted to restart the process and eventually gave up. The database was offline for less than 5 seconds, an almost inconsequential amount of time, but we didn’t detect the supervisor failure for another couple of hours due to poor job queue telemetry (since rectified).

To save yourself a similar headache, you can wrap your job queue in a bash script that delays the restarts.
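
A minimal sketch of such a wrapper (the ten-second delay is an arbitrary choice; make it long enough to comfortably outlast your failover window):

#!/usr/bin/env bash
# queue-wrapper.sh: delay each (re)start so that a brief outage, such as
# a database failover, does not burn through supervisor's start retries.
sleep 10
exec php artisan queue:work --queue=default --daemon

You would then point supervisor’s command directive at this script instead of invoking artisan directly.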

Docker

It might be tempting to install supervisor within your docker image and have it manage the Laravel queue.

Don’t do that.

Docker works best when it runs a single isolated process. It’s much more difficult to catch a misbehaving supervisor process that fans out multiple queue workers when it is running inside a docker container.

Your container orchestration system will bring the container back up if the queue runner exits, and you have the added benefit of being able to precisely specify the memory and CPU allocation for each job queue process within Kubernetes to better solve the bin packing problem.

Instead of using supervisor, you are better off setting up a CMD instruction in your Dockerfile like the following.

CMD php artisan queue:work --queue=default --sleep=1 --tries=5 --daemon

Furthermore, I would strongly recommend profiling your jobs so you can better set memory and CPU limits on your Kubernetes resources.
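
For reference, each container in a Kubernetes deployment can declare requests and limits like the following (the values here are purely illustrative; derive yours from profiling):

resources:
  requests:
    cpu: 250m
    memory: 128Mi
  limits:
    memory: 256Mi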

Having statsd collect memory_get_peak_usage(true) over time will allow you to make better use of your server resources.
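
As a sketch, you can emit that gauge over statsd’s plain-text UDP protocol without any client library (the host, port, and metric name here are assumptions for illustration):

// Report the worker's peak memory usage as a statsd gauge.
// statsd's wire format is simply "metric:value|type" over UDP.
$peakBytes = memory_get_peak_usage(true);

$socket = fsockopen('udp://127.0.0.1', 8125, $errno, $errstr);
if ($socket !== false) {
    fwrite($socket, "queue.worker.peak_memory:{$peakBytes}|g");
    fclose($socket);
}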

Sharing Cache and Queue Instances (Don’t)

If you are using Redis for both your cache and queue, you should run them on separate connections and instances.

The first reason is partially outlined above. If you get into a scenario where you have process starvation on the job queue, you may run out of Redis memory, causing both the cache and queue to become inoperable.

The second reason is that, for some unforeseen reason, you may one day need to flush your application cache. Perhaps there are keys that are not expiring and are causing issues, or a bug in your code is writing to the cache at an extreme rate. With a separate instance, you can safely flush the cache without losing unprocessed jobs.

In short, you will get better isolation and performance.
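
As a sketch of the wiring in a stock Laravel install (the host names are placeholders), define a second connection in config/database.php and point the cache store at it, so that queue traffic keeps its own instance:

// config/database.php
'redis' => [

    // Queue traffic stays on this instance.
    'default' => [
        'host' => env('REDIS_HOST', 'redis-queue.internal'),
        'port' => env('REDIS_PORT', 6379),
        'database' => 0,
    ],

    // Cache traffic gets its own instance, safe to flush.
    'cache' => [
        'host' => env('REDIS_CACHE_HOST', 'redis-cache.internal'),
        'port' => env('REDIS_CACHE_PORT', 6379),
        'database' => 0,
    ],
],

// config/cache.php: point the redis store at the 'cache' connection.
'redis' => [
    'driver' => 'redis',
    'connection' => 'cache',
],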

Config Changes

If you are running a multi-tenant system, you might be dynamically loading different config values within the job queue classes that you are currently executing.

This can be a huge source of unexpected bugs because Laravel won’t reboot the framework while running the queue in --daemon mode, and thus won’t reset the config to its default values the way it would during a normal HTTP request/response lifecycle.

You have two solutions here:

  1. Manually reset the config before the job ends. However, you have to be careful about exceptions being thrown prior to the “reset”, which would skip your reset block and cause further state issues (see the sketch after this list).
  2. Don’t run the queue in --daemon mode, which incurs a performance hit because the framework reboots for every job.
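
For option 1, a try/finally block guards the reset against exceptions thrown mid-job (the tenant property and config key here are hypothetical):

public function handle()
{
    $original = config('services.mailer.key');

    try {
        // Hypothetical per-tenant override for the duration of this job.
        config(['services.mailer.key' => $this->tenant->mailerKey]);

        $this->process();
    } finally {
        // Runs even when process() throws, so the long-lived worker never
        // leaks this tenant's config into the next job it picks up.
        config(['services.mailer.key' => $original]);
    }
}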

Logging

Here is a simple hack that I find makes it easier to debug system issues.

By default, Laravel will store failed jobs in the database; however, you may find it easier to dig through the logs for this information, so you can place the following code in the boot method of your AppServiceProvider.

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Queue;

/**
 * Log failed queue events so that we have more system visibility.
 *
 * @link https://laravel.com/docs/5.8/queues#dealing-with-failed-jobs
 */
Queue::failing(function (JobFailed $event) {
    \Log::error('Queue Failure', [
        'connectionName' => $event->connectionName,
        'queue' => $event->job->getQueue(),
        'exceptionMessage' => $event->exception->getMessage(),
        'exception' => $event->exception->getTrace(),
    ]);
});

Be careful about logging personally identifiable information with the above solution, especially if you are forwarding your logs to an external logging provider.

Conclusion

I would encourage everyone to give the Laravel Queue and Horizon documentation a thorough read.

The queue system is surprisingly powerful and can be used in high throughput environments.

Reach out if you have any questions.