Well, after having written a huge anti-spam system, it is now time to solve the reverse problem: sending out huge amounts of email. Only kidding, but the idea of scaling email sending using PHP is quite interesting.
The reason this has been relevant in the last two weeks is twofold. First off, my slow and sometimes painful rewrite of mtrack has got to the point of looking at email distribution. Along with this, I have a project that needs to distribute press releases and track responses. Since both projects now use the same underlying component framework (Pman.Core and Pman.Base), it seemed like an ideal time to write some generic code that can solve both issues.
Classic mailing, you press, we send...
I've forgotten how many times I've written code that sends out email. Pretty much all of it tends to be of the variant where the user of the web application presses a button, then the backend code generates one or many emails and sends them out, most frequently using SMTP to the localhost mailserver.
In most cases this works fine. You might run into trouble if your local mailserver is down or busy, but for the most part it's a very reliable way to send out fewer than 10 emails in one go.
Queues and bulk sending
One of my associates makes a tiny amount of money by offering a service that sends out newsletters and news about bars and restaurants. To do this he bought a commercial PHP package, which I occasionally have the annoying task of debugging and maintaining. What is interesting about this package is the methods it uses to send out email. Basically, once you have prepared a mailout and selected who it goes to, it creates records in a table that goes something like this:
User X | Mailout Y
123 | 34
124 | 34
There are two methods to then send these mailouts. The first is via the web interface, which uses a bit of ajax refresh to keep loading the page and send out a number of emails in one go (e.g. 10 at a time). The second is the cron version, which periodically runs and tries to send out all the mails in that table.
This method always sends to the localhost mailserver, and lets that sort out the bounces, queuing, retries etc. It has a tendency to be very slow, and to use up a huge amount of memory when sending out huge volumes of email. Most of it gets stuck in the mailserver queue, and the spammer has no real idea if the end users have received it. If the mailserver gets stuck or blocked, the messages can often sit in the queue until they expire 2 days later, by which time the event at the bar may have already occurred.
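As a rough illustration of the "10 at a time" idea, here is a minimal sketch of one batch pass over a queue. It is not the commercial package's code: the in-memory array standing in for the table, the function name sendBatch and the $send callback are all hypothetical.

```php
<?php
// Hypothetical in-memory stand-in for the (user, mailout) queue table.
$queue = [
    ['user_id' => 123, 'mailout_id' => 34],
    ['user_id' => 124, 'mailout_id' => 34],
    ['user_id' => 125, 'mailout_id' => 34],
];

/**
 * Take up to $batchSize rows off the front of the queue and hand each one
 * to $send. Returns the number of rows processed in this pass.
 */
function sendBatch(array &$queue, int $batchSize, callable $send): int
{
    // array_splice removes the rows from $queue and returns them.
    $batch = array_splice($queue, 0, $batchSize);
    foreach ($batch as $row) {
        $send($row['user_id'], $row['mailout_id']);
    }
    return count($batch);
}

$sent = [];
$send = function (int $userId, int $mailoutId) use (&$sent): void {
    // A real sender would hand the message to the local mailserver here;
    // this sketch only records what would have been sent.
    $sent[] = "$userId:$mailoutId";
};

// Each page refresh (or cron run) would make one call like this.
$processed = sendBatch($queue, 2, $send);
```

Note that nothing here knows whether the mail was actually delivered, which is exactly the weakness described above.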
The MTrack way
I'm not sure if I mentioned it before, but I was intrigued by the method used by mtrack when I first saw it. For those unaware of what mtrack is, it's an issue tracker based on trac. One of its jobs is to send out emails to anyone 'watching' a project, bug etc.
From my understanding of what mtrack was doing (the original code has long been forgotten and removed now), it set up a 'watch' list, e.g. Brian is watching Project X, and Fred is watching Issue 12.
When Issue 12 changed, or someone committed something to Project X, no actual email was sent at that moment. This obviously removed a failure point on the commit or bug update, and if you had hundreds of people watching an issue (like on Launchpad, for example), this would prevent the server hanging while it was busy sending all the emails.
The unfortunate downside was that a cron job was required to make the notifications work. This cron job had to hunt down all the changes that had occurred and cross-reference them with all the people who may have been watching those issues. The code for this was mind-blowingly complex, and I suspect it was a port of the original trac code.
As somebody who basically looks at really complex conditional code and wonders 'is that really the best way to do this', I really had to come up with an alternative.
Bulk mailing done right....
So to solve my issues with mtrack and the other project, I devised a system that was both simple and fast at the same time. Here's the lowdown.
First off, both the Mtrack and mailout systems generate the distribution list when the web application user pushes the button. So for Mtrack, somebody updates the ticket (adding a comment, for example), and the controller for the ticket page basically does a few things:
a) If you have modified the ticket owner (or developer), make sure they are on the 'watch list' or subscribers.
b) Ask the watch list (the dataobject code for core_watch) to generate a list of people to notify (in our core_notify table), and make sure we do not send an email to the person filling in the form (as he knows he just did that and does not need to be reminded).
For the other mailout system, it also just generates rows in a notify table. Since the database table for the distribution targets is different in that application, we actually have a separate table called XXXX_notify, and using the joys of DB_DataObject and object orientation, that class just extends the core_notify class. From what I remember, the only bit of code in that class is var $__table = 'XXXX_notify', since the links.ini handles the reference table data.
And now for the really cool part, sending the mails out. Obviously this is done via cron jobs (so as not to disrupt the user interface). The backend consists of two parts (pretty much how a mailserver works). The first is the queue runner. This basically runs through the notify table and makes a big list of what to send out. It uses the ensureSingle() feature of HTML_FlexyFramework to ensure only one instance of the queue runner can be running at once.
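To make step b) a bit more concrete, here is a minimal sketch of turning a watch list into notify rows while skipping the person who made the change. It uses plain arrays rather than DB_DataObject, and the function name buildNotifyRows and the column names person_id, ticket_id and sent are assumptions for illustration, not the real core_watch / core_notify schema.

```php
<?php
/**
 * Build notify rows for everyone watching $ticketId, skipping $authorId.
 * $watches is a list of ['person_id' => int, 'ticket_id' => int] rows
 * (a hypothetical stand-in for the core_watch table).
 */
function buildNotifyRows(array $watches, int $ticketId, int $authorId): array
{
    $rows = [];
    foreach ($watches as $watch) {
        if ($watch['ticket_id'] !== $ticketId) {
            continue; // watching something else
        }
        if ($watch['person_id'] === $authorId) {
            continue; // the author already knows what they just did
        }
        // Key by person so a duplicate watch produces only one row.
        $rows[$watch['person_id']] = [
            'person_id' => $watch['person_id'],
            'ticket_id' => $ticketId,
            'sent'      => null, // filled in later, once the mail goes out
        ];
    }
    return array_values($rows);
}

$watches = [
    ['person_id' => 1, 'ticket_id' => 12],
    ['person_id' => 2, 'ticket_id' => 12],
    ['person_id' => 3, 'ticket_id' => 12],
    ['person_id' => 4, 'ticket_id' => 13],
];

// Person 2 edits ticket 12, so only persons 1 and 3 get a notify row.
$notify = buildNotifyRows($watches, 12, 2);
```

Rows like these are what the queue runner later picks up.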
Then, rather than sending each email sequentially, it basically proc_open's a new PHP process to send each email. This enables the queue to send many emails concurrently, rather than relying on a single pipeline. The code monitors these subprocesses, and ensures that only a fixed number are running at the same time. We do not want to look too much like a spammer to our ISP.
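The general shape of that capped process pool can be sketched as follows. This is not the actual Pman code: the function name runPool, the $maxRunning parameter and the dummy child command are assumptions, and it needs PHP 7.4+ for the array form of the proc_open command. Each child here just exits with status 0 where the real one would send a single email.

```php
<?php
/**
 * Run one child process per job, keeping at most $maxRunning alive at once.
 * $jobs maps a job id to a command given as an array (PHP 7.4+).
 * Returns a map of job id => exit code (-1 if the child could not start).
 */
function runPool(array $jobs, int $maxRunning): array
{
    $pending   = $jobs;
    $running   = [];
    $exitCodes = [];

    while ($pending || $running) {
        // Top up the pool until the cap is reached.
        while ($pending && count($running) < $maxRunning) {
            $id  = array_key_first($pending);
            $cmd = $pending[$id];
            unset($pending[$id]);

            // Empty descriptor spec: we open no pipes to the child.
            $proc = proc_open($cmd, [], $pipes);
            if ($proc === false) {
                $exitCodes[$id] = -1;
                continue;
            }
            $running[$id] = $proc;
        }

        // Check each child; record the exit code the first time we see
        // that it has stopped, since later status calls may not report it.
        foreach ($running as $id => $proc) {
            $status = proc_get_status($proc);
            if (!$status['running']) {
                $exitCodes[$id] = $status['exitcode'];
                proc_close($proc);
                unset($running[$id]);
            }
        }

        usleep(50000); // avoid a busy loop while children are working
    }

    return $exitCodes;
}

// Hypothetical notify ids; a real child would send one email per id.
$jobs = [];
foreach ([101, 102, 103, 104, 105] as $notifyId) {
    $jobs[$notifyId] = [PHP_BINARY, '-r', 'exit(0);'];
}

$results = runPool($jobs, 2);
```

Polling with proc_get_status keeps the parent simple; the cap is what stops a large queue from spawning hundreds of processes at once.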
Now to actually test all this....