Great Circle Associates List-Managers
(June 1998)

Subject: Spam Filtering and Messy Details.
From: "Ronald F. Guilmette" <rfg @ monkeys . com>
Date: Tue, 09 Jun 1998 14:58:38 -0700
To: spamtools @ abuse . net, List-Managers @ GreatCircle . COM

[This message is being cross-posted to both the spamtools mailing list
 and also to the List-Managers mailing lists, since it really has a
 lot of relevance to both audiences.]

On the spamtools mailing list, Hugh Browner wrote:

>In theory that is correct, but the reality tends to be messier...

I think that's a nice capsule summary of the majority of what we folks
here attempt to do vis a vis filtering.  Time and time again we see that
the devil is in the details.

For example, the first and perhaps most onerous detail that most of us
have encountered when trying to filter out ``bulk'' junk E-mail is that
it is (or at least can be, in the hands of a real professional spammer),
for all intents and purposes, indistinguishable from legitimate opt-in
mailing list traffic.

I feel sure that I am not alone among the subscribers to this list in
having spent more than a little time mentally wrestling with this rather
large impediment to effective spam filtering.

Today, I would like to say a word or two about that, and throw out what
may perhaps be a new idea and see what comments it might elicit.

Think about this for a second... The simplest way to filter out all junk
E-mail on any given system which supports a substantial number of different
and independent E-mail accounts/user-IDs would be to write a little routine
in your MTA (or add a front end for your MTA) which simply keeps track of
the number of messages incoming from a given IP address within the past N
minutes (where `N' is some modest number like 30, 60, 120, or 180).  If the
number of ``recent'' incoming messages from a given IP address exceeds some
fixed limit (say for example 10% or 20% of the total number of independent
E-mail accounts on your system), then you have your MTA simply block all
further port 25 connects from the given IP address for a period of five
days (after which time most kinds of SMTP clients that might have been
attempting to send mail to your system will have given up in disgust).
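
As a rough sketch of the counting scheme just described (not a real MTA
hook), something like the following would do.  The class name, method name,
and defaults are all made up for illustration; the window, the limit, and
the five-day block are the parameters from the paragraph above.

```python
import time
from collections import defaultdict, deque


class BulkSenderTracker:
    """Counts recent messages per sending IP and blocks heavy senders.

    Hypothetical helper: the names and defaults here are assumptions,
    not part of any existing MTA.
    """

    def __init__(self, num_accounts, window_minutes=60, fraction=0.10,
                 block_days=5):
        self.window = window_minutes * 60                   # N minutes, in seconds
        self.limit = max(1, int(num_accounts * fraction))   # e.g. 10% of accounts
        self.block_seconds = block_days * 24 * 3600
        self.recent = defaultdict(deque)   # ip -> arrival times inside the window
        self.blocked_until = {}            # ip -> time at which the block expires

    def allow(self, ip, now=None):
        """Return True if a message from `ip` should be accepted right now."""
        now = time.time() if now is None else now
        if self.blocked_until.get(ip, 0) > now:
            return False                   # still inside the block period
        self.blocked_until.pop(ip, None)   # any earlier block has expired
        times = self.recent[ip]
        while times and times[0] <= now - self.window:
            times.popleft()                # forget arrivals older than the window
        times.append(now)
        if len(times) > self.limit:
            # Limit exceeded: refuse this message and start the block.
            self.blocked_until[ip] = now + self.block_seconds
            times.clear()
            return False
        return True
```

With 100 accounts and a 10% limit, the first ten messages from one address
inside the window are accepted, and the eleventh triggers the block.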

This is a near perfect solution to the E-mail spam problem _except_ that
it implies that you _are_ going to accept that first group of spams to that
first 10% or 20% of your user base (which is probably acceptable if you
consider that you are buying protection for the other 80 or 90%) _and_ 
except for the fact that this strategy immediately causes you to lose
essentially all traffic to your users from legitimate opt-in mailing
lists.  (Obviously, this latter bit of collateral damage is certainly
_not_ acceptable to the majority of mail administrators and end users.)

So is there a way to make this work?  I think that the answer is ``perhaps''.

Imagine the following scenario for the evolution of the Internet and for
the evolution of legitimate opt-in mailing lists.

Some party (or parties) declares itself to be a sort-of central clearing
house for information on legitimate mailing lists.  Let's just call it
`'.  That party operates some sort of a server which allows
other sites to obtain (either via push or via pull, whatever seems to make
the most sense) a full and current list of legitimate opt-in mailing lists
_and_ their associated *public encryption keys*.  (Now stay with me here.
It isn't as bad as you think, and I'm *not* going to propose any sort of
exorbitantly complex Rube Goldberg mechanism... just something simple
based upon public-key encryption technology.)

So anyway, now we have this hypothetical non-profit organization which is
maintaining a server that provides lists of legit opt-in mailing lists and
their associated public keys.  Other sites can ``subscribe'' to this service
and they will receive a full updated dump of the legitimate mailing lists
data base, say, every night, perhaps via ordinary E-mail (push) or perhaps
via some other mechanism.  (These are just implementation details, so let's
not even dwell on the specifics of how this information gets distributed
to participating sites.  That can be worked out later.)

Now, each participating legitimate opt-in mailing list, in exchange for the
privilege of being listed as such by this ``clearing house'' organization
(and by its data base server) must agree to the following simple conditions:

     o	Subscriptions to the list in question are done on a strictly
	opt-in basis, and adequate records are kept at the home site of
	the mailing list to verify that this is indeed the case when and
	if any questions arise regarding individual subscriptions.

     o	All messages distributed by the mailing list in question must carry
	one additional new header.  The name selected for this header isn't
	terribly important, so let's just call it the `X-Opt-In:' header.

	For each message distributed by/from the mailing list in question,
	the X-Opt-In: header would contain two argument fields, i.e.:

	    X-Opt-In: <official-list-handle> <encrypted-message-timestamp>

That's the sum total of the requirements for participating mailing lists.

Now, some elaboration...

The <official-list-handle> would be a globally unique designator for the
list in question.  It would be a simple string of letters, digits and hyphens
issued to the list admin by the central clearing house at the time the list
is registered with the central clearing house.  This mailing list handle
would be very similar in both spirit and function to a NIC handle... i.e. a
unique designator for the list which would remain the same even if the list
in question is relocated to a different server or if the envelope return
address on the outgoing list messages must change in some way.

The <encrypted-message-timestamp> would be just the string in the Date:
header of the outgoing message, but encrypted with the *private key* which
belongs to the list owner/administrator.  (This private key would have to
be kept secret by the list administrator.)
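
To make the header generation concrete, here is a toy sketch of encrypting
the Date: string with a private key and building the X-Opt-In: line.  It
uses textbook RSA with no padding, purely because that is easy to show with
nothing but the Python standard library; the scheme above does not say
which public-key algorithm would be used.  The key below is built from two
publicly known Mersenne primes, so it offers no security at all, and the
function names and the EXAMPLE-LIST-1 handle are likewise invented for
illustration.

```python
# Toy key material, ASSUMED for illustration only: two well-known Mersenne
# primes.  A real key pair must be randomly generated and the private
# exponent kept secret by the list administrator.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def encrypt_timestamp(date_header, d=D, n=N):
    """Encrypt the Date: string with the private key; return hex text."""
    m = int.from_bytes(date_header.encode("ascii"), "big")
    if m >= n:
        raise ValueError("Date string too long for this toy key")
    return format(pow(m, d, n), "x")


def decrypt_timestamp(token, e=E, n=N):
    """Recover the Date: string from the hex token using the public key."""
    m = pow(int(token, 16), e, n)
    raw = m.to_bytes((m.bit_length() + 7) // 8, "big")
    return raw.decode("ascii", errors="replace")


def opt_in_header(list_handle, date_header):
    """Build the full header line: X-Opt-In: <handle> <encrypted-timestamp>."""
    return f"X-Opt-In: {list_handle} {encrypt_timestamp(date_header)}"


if __name__ == "__main__":
    date = "Tue, 09 Jun 1998 14:58:38 -0700"
    header = opt_in_header("EXAMPLE-LIST-1", date)   # hypothetical handle
    print(header)
    print(decrypt_timestamp(header.split()[2]) == date)
```

A mailing list package would call something like opt_in_header() once per
outgoing message, using the same string it places in the Date: header.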

On the receiving end, sites which receive mail to some large number of their
users (say 20 or more) at about the same time from a given IP address would
check the message after the first 20 for the presence of a valid X-Opt-In:
header.  If that is not present, or if such a header is present but the
decryption of the <encrypted-message-timestamp> part of the X-Opt-In: header
(using the public key known to be associated with the relevant mailing list)
fails to match the string in the Date: header, then the message is rejected,
hopefully at the SMTP/MTA level, _before_ it even gets written to disk on
the receiving system.
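
The receiving-side decision could be sketched along these lines.  This
assumes the headers are already parsed into a dictionary, that the
clearing-house data is available locally as a mapping from list handle to
an RSA-style public key (exponent, modulus), and that the encrypted
timestamp is carried as hexadecimal text.  None of those details are fixed
by the proposal, and the function names are hypothetical.

```python
THRESHOLD = 20   # recipients reached from one IP before the header is required


def decrypt_with_public_key(token, public_key):
    """Return the decrypted ASCII string, or None if the token is malformed."""
    e, n = public_key
    try:
        m = pow(int(token, 16), e, n)
    except ValueError:
        return None                      # token was not valid hexadecimal
    raw = m.to_bytes((m.bit_length() + 7) // 8, "big")
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None                      # garbage: wrong key or forged token


def accept_message(headers, recipients_so_far, registry):
    """Decide whether to accept one more message from a sending IP.

    headers           -- dict of header name -> value for this message
    recipients_so_far -- local users this IP has already mailed recently
    registry          -- dict of list handle -> (e, n) public key, i.e. the
                         local copy of the clearing-house data (assumed shape)
    """
    if recipients_so_far < THRESHOLD:
        return True                      # below the bulk threshold: no check
    date = headers.get("Date")
    fields = headers.get("X-Opt-In", "").split()
    if not date or len(fields) != 2:
        return False                     # bulk mail without a usable header
    handle, token = fields
    public_key = registry.get(handle)
    if public_key is None:
        return False                     # handle unknown to the clearing house
    return decrypt_with_public_key(token, public_key) == date
```

A rejection here would ideally be turned into an SMTP-level refusal, so
that the message never reaches the disk on the receiving system.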

So anyway, that's the idea in a nutshell.  This would completely thwart mass
spamming from illegitimate (opt-out) mailing lists.  Spammers would still be
able to spam, but their spam messages would not even be seen by the vast
majority of their intended recipients.  (Imagine if the spammer could only
spam a maximum of 20 people on each of,,,, etc., etc.  This would pretty much take all of the fun and all
of the potential profit out of E-mail spamming, and I think that after a while
they would all just give up.)

Potential issues/problems:

     o	All of the mechanism for generating and attaching the new X-Opt-In: 
	header would have to be integrated into all of the most commonly
	used mailing list administration packages.  It would have to be
	pretty much of a nearly-no-brainer for the mailing lists admins to
	use this new (optional) feature.  Basically, the mailing list
	admin packages would have to allow the list admins to just enter
	their list handles and their private keys into their mailing list
	configuration files, and then the list management packages would
	have to automatically do all the rest of the work.

     o	Where do you get a nice public-key encryption mechanism that *isn't*
	going to cost people money in the form of royalties?  (If mailing
	list admins have to pay _anything_ for this new feature then they
	will just balk and they will _never_ buy into it or implement it.)

	It turns out that this isn't such a big problem as you might think.
	RSA Data Security owns patents on one form of public-key encryption
	technology and based on the reports that I have seen, they tend
	strongly towards the butthead end of the spectrum when it comes to
	being aggressively litigious in trying to enforce their somewhat
	questionable patents in this area, but fortunately, there's a whole
	'nother implementation of public-key encryption technology that's
	available royalty-free, thanks to the U.S. Guberment, so that could
	be used instead. (I'm already using this alternative stuff in my own
	commercial package, and I ain't gonna be payin' anybody a dime for
	the privilege.)

     o	Fear and loathing (or, alternatively, fear uncertainty and doubt, aka
	the FUD factor).

	Essentially all of the major traditions and mechanisms of the Internet 
	are designed to _avoid_ reliance upon any sort of central authority
	whenever possible and at all costs.  In those rare instances where
	there has seemed to be no viable alternative to having some central
	coordinating authority (e.g. in the case of the domain name space
	and/or the IP address space) people have generally reacted with a
	combination of fear and loathing (with the emphasis clearly on
	loathing) to the ways in which the powers of the central authority
	are (or are not) exercised.  The result is often great raging debates
	and ongoing drawn-out conflicts.

	The central mailing list information clearing house would likewise
	and inevitably become the subject of heated debate and, most likely,
	raging hostility.  Spammers would of course represent the first wave
	of hostility towards such an entity/organization, but that is not
	something that I personally would give a rat's ass about.  (Let 'em
	complain.  Who cares?)  More serious issues would inevitably be raised
	with respect to the decisions and judgement calls that this organiza-
	tion would have to make regarding which lists are and which lists are
	not ``legitimate'' opt-in mailing lists.  But I believe that in the
	case of this kind of a central authority, the exercise and potential
	abuses of power would be entirely less of a concern than in the cases
	of (say) the Internic or ARIN/RIPE/APNIC because sites could easily
	get the same sorts of information from other, competing providers
	of this same sort of service.  In short, I believe that it
	would be far more likely for this sort of an ``information service''
	to evolve to the point where there would be perhaps three or more
	major independent providers of this information... exactly like there
	are now three major credit bureaus... than it would be for something
	like an Internic or an ARIN to get some meaningful competition from
	alternative providers of the type of information that _they_ provide.

	In short, I am not too worried about this postulated ``central clearing
	house'' for mailing list information becoming an abusive, egotistical,
	unresponsive monopoly in the same way that (some say) the Internic has
	done.  It just seems to me that such an evolution/outcome would be
	much less likely in this case.  Still, there are those who will decry
	_any_ kind of singular authority if it has any significant impact upon
	the day-to-day functioning of the Internet, so whoever runs this thing
	would need to have a VERY thick skin.

So?  Any comments?

Let's say that I set this all up next week.  Just suppose.  Can I see a show
of hands of how many of you mailing list administrators would actually sign
up, get an official handle for your mailing list and get a public/private
encryption key pair?

If nobody on the planet thinks this is a good idea, then I guess I'll drop
the whole thing.  But this _is_ a viable, complete, and long-term solution
to the whole bulk spam problem.  If only legit people can successfully send
out
E-mail in bulk, then the days of spam (and spammers) will be gone forever.

-- Ron Guilmette, Roseville, California ---------- E-Scrub Technologies, Inc.
-- Deadbolt(tm) Personal E-Mail Filter demo:
-- Wpoison (web harvester poisoning) - demo:
