Your message dated: Thu, 08 May 1997 01:16:19 +0200
> >The Internet side is typically about half the total volume of the AOL
> >mail system as a whole.
> Internal mail from AOL user 1 to AOL user 2 doesn't go through SMTP and I
> don't see how it is relevant to this discussion, other than conveniently
> providing the 23M of deliveries you were missing :-)
The problem is that the Internet side of the equation is not the
limiting factor here. If everything else goes smoothly, the Internet
side can *easily* handle that kind of load, without breaking a sweat.
I'm confident it could handle twice or even three times that load
without too much difficulty, if there weren't problems elsewhere.
But, there are problems elsewhere, and to protect the mail system
as a whole, we have to use the Internet side to aggressively filter
out the illegitimate mail (since virtually all illegitimate mail comes
from the Internet, and illegitimate mail as a whole poses more of a
risk to the system due to certain aspects of its nature).
> > Now, how many million messages did you receive yesterday?
> Receive, not that many. There are the bounces of course, but like any
> other large mailing list shop, we receive a lot less than we send.
You use the measure you want, we'll use the measure we want.
> imagine AOL is the opposite and receives a lot more than it sends.
Typically, we receive about twice as much as we send. That's
still a boatload of mail that we send, but since the expansion
factor per outbound email message is low (on average, about two
recipients per message), we don't get your economies of scale with
hundreds or thousands of recipients from a single message all served
by the same set of MXes, etc....
Nevertheless, our outbound system is not the problem here.
> Anyway, I think what you really want to know is the number of SMTP
> transactions that we've made, regardless of the recipient count, right?
Not exactly. Since virtually all mail that is sent is
transmitted as soon as it is received, it makes
very little use of connection caching.
Since much of the expense of SMTP transactions is setting up and
tearing down the connection (otherwise connection caching wouldn't
be an issue), connection caching typically only comes into play
when someone is delivering a large number of previously queued
messages, perhaps from a mailing list.
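To make that concrete, here is a toy cost model (the cost units are invented for illustration, not measured figures from any real mail system) showing why caching only pays off when many queued messages go to the same host:

```python
# Toy cost model for SMTP deliveries to a single host.
# SETUP_COST and PER_MESSAGE_COST are invented illustrative units.
SETUP_COST = 5        # TCP handshake, banner, HELO, QUIT
PER_MESSAGE_COST = 1  # MAIL FROM / RCPT TO / DATA exchange

def delivery_cost(messages, cached):
    """Total cost of delivering `messages` messages to one host,
    with or without connection caching."""
    connections = 1 if cached else messages
    return connections * SETUP_COST + messages * PER_MESSAGE_COST

# Interactive one-off mail: one message, caching buys nothing.
print(delivery_cost(1, cached=False))    # 6
print(delivery_cost(1, cached=True))     # 6

# A queued mailing-list run to one set of MXes: setup cost dominates
# without caching, so reusing the connection is a big win.
print(delivery_cost(100, cached=False))  # 600
print(delivery_cost(100, cached=True))   # 105
```

The crossover is immediate: caching helps exactly in proportion to how many messages share a destination, which is why it matters for queued mailing-list traffic and barely at all for mail delivered one message at a time.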
Since you've got extremely large economies of scale due to
large numbers of recipients typically served by a set of MXes,
and you've surely optimized your delivery to make maximum use of
connection caching, the number of SMTP connections you make
is probably quite small: considerably smaller than the
total number of message envelopes delivered, which is in turn
considerably smaller than the total number of recipients.
      Making SMTP connections is relatively cheap. Since you can
choose whether or not to fork off another queue runner (or, if
you pre-fork a set of worker processes, whether any of them are
currently idle), you can control how much load sending messages
places on your system.
You can also control how long you wait for various sorts of things
to happen before you time out, so that you deliver large quantities
of mail to fast servers in a very short period of time, while slower
servers end up getting relegated to the bottom of the list.
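One way to implement that "fast servers first, slow servers to the bottom" policy is simply to order the delivery queue by each destination's recent response time. A minimal sketch, where the host names, timings, and threshold are all made up for illustration:

```python
def order_destinations(hosts, last_response_secs, slow_threshold=30.0):
    """Return hosts with responsive ones first; destinations whose last
    delivery took longer than `slow_threshold` seconds (or that timed
    out, recorded as float('inf')) are relegated to the bottom."""
    fast = [h for h in hosts if last_response_secs.get(h, 0.0) <= slow_threshold]
    slow = [h for h in hosts if last_response_secs.get(h, 0.0) > slow_threshold]
    # Within each group, try the quickest hosts first.
    key = lambda h: last_response_secs.get(h, 0.0)
    return sorted(fast, key=key) + sorted(slow, key=key)

timings = {"mx1.example.com": 0.4, "mx2.example.com": 120.0,
           "mx3.example.com": float("inf"), "mx4.example.com": 2.5}
print(order_destinations(sorted(timings), timings))
# ['mx1.example.com', 'mx4.example.com', 'mx2.example.com', 'mx3.example.com']
```

Combined with per-connection timeouts, this keeps the bulk of the queue draining to responsive servers while the stragglers wait their turn.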
Since receiving mail is inherently interrupt-driven, and you
can't force the other end to make use of connection caching (you can
only sit there and wait to accept multiple messages per connection
if the other end chooses to send them that way), what I want to
count is the total number of SMTP connections you receive per day.
Everything else is superfluous.
      Programming something like delivering large quantities of mail
out of a queue is relatively easy, since you don't have to accept
connections (or refuse them, if the system load is too high) and
then hand them off to child/worker processes. That accept-and-dispatch
mechanism is inherently fork/exec in style, but it can be programmed
(with no small amount of difficulty) in a pre-fork/worker-process style.
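A rough illustration of that pre-fork/worker style, using Python's multiprocessing pool as a stand-in for pre-forked SMTP workers, with integers standing in for accepted connections (none of this is meant to resemble the actual gateway code):

```python
import multiprocessing as mp

def handle_connection(conn_id):
    # Stand-in for servicing one accepted SMTP connection.
    return "handled %d" % conn_id

if __name__ == "__main__":
    # Pre-fork a fixed set of workers once, instead of fork/exec'ing a
    # new process per connection; the pool dispatches each incoming
    # "connection" to whichever worker is currently idle.
    with mp.Pool(processes=4) as pool:
        results = pool.map(handle_connection, range(8))
    print(results)
```

The design win is that the per-connection cost drops from a full fork/exec to a cheap hand-off to an already-running process, at the price of considerably trickier dispatch logic.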
      However, that is at least as hard as, if not harder than,
writing a program to solve an inherently recursive problem in a
non-recursive manner. At least, programmers doing that sort of
work have an extensive body of pre-existing work that shows how to
use stacks to simulate recursion, so that there's relatively little
"new" stuff that has to be done to "unroll" an inherently recursive
process into an iterative one.
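For what it's worth, the standard trick looks like this: an explicit stack of pending work replaces the call stack, "unrolling" an inherently recursive traversal into a loop (the tree shape below is just an illustration):

```python
def sum_tree_recursive(node):
    """Sum a (value, children) tree the natural, recursive way."""
    value, children = node
    return value + sum(sum_tree_recursive(child) for child in children)

def sum_tree_iterative(node):
    """Same computation unrolled into iteration: an explicit stack
    of pending nodes simulates the recursive call stack."""
    total, stack = 0, [node]
    while stack:
        value, children = stack.pop()
        total += value
        stack.extend(children)
    return total

tree = (1, [(2, []), (3, [(4, [])])])
print(sum_tree_recursive(tree))  # 10
print(sum_tree_iterative(tree))  # 10
```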
> I have no intention of either giving up my day job, moving to the US or
> joining AOL, nor do I see any reason why this would be necessary in order
> to accomplish the stated goals. Nevertheless, I was making a serious
> business proposal.
I will pass on to the mail systems development management that
you want to re-write the AOL Internet mail gateway system using
LSMTP. If that's something we can do in parallel with our other
efforts (and without a great deal of support required to teach you
how the back end works), then they might be willing to listen.
However, I am not in development, and applying development
solutions to operational problems is not a method I have available
to me. Only the development folks can decide whether or not that
is a solution they can support (however, I think it unlikely, given
how thinly they're already stretched).
We did previously look at using PMDF as the basis for our gateway
system, but rejected it once we realized what the API was, and the
amount of programming that would be required on our part to get the
messages out of their proprietary internal database and into ours.
If we're going to do that level of programming anyway, we might as
well write the thing from scratch.
> I see that Matt Korn is still your VP of Operations, so it
> looks like I may be preaching to the choir :-)
He keeps remarking on occasion how much mail could be handled by
VM SMTP, but he's changed his tune a bit since we found a bug in that
code with regard to the way it handles MX RRs. Especially since
we were forced to work that out the hard way, as no one at IBM
was willing to work directly with us, and the IBM customers they
were willing to work with didn't have enough information about the
problem to describe it sufficiently well.
After we'd sufficiently "black-boxed" the thing, and worked with
these customers applying multiple rounds of "Okay, we've installed
this patch to our mainframe, does it work now? No....", we finally
got that one worked out.
We've also pointed out to him how expensive mainframes still
are, how much power, space, and cooling they require, and how many
of them we'd require to do the job. Besides, we'd be replacing one
sort of mainframe with another (as a part of the overall system),
and we know that mainframes inherently do not scale to the size of
the operation we have today, much less where we need to be.
Brad Knowles MIME/PGP: KnowlesB @
Senior Unix Administrator <http://www.his.com/~brad/>