Problems with inbound email – source IP addresses

Once “the Internet” discovers a new SMTP service, it will be hammered by the spammers' botnets.

History Lesson

In the Old Times, every email was wanted, and every connection that was made was precious. That was true even of spam, which was invented in 1978 (by a salesman). Then in 1994, large-scale spam was invented (by a lawyer, of course; because there were no laws on the subject, no laws were being broken) ... and by 1996 the idea of 'reputation' for a server had emerged, along with the first blacklists of IP addresses.

Of course, as any infosec practitioner should be able to tell you, maintaining blacklists is fundamentally impractical. Unfortunately for the Internet, maintaining whitelists of trusted senders was also impractical, as more and more companies sprang up on the Internet running their own mailservers.

Some of these new mailservers were configured in the old, default, co-operative manner – they didn't care where a message came from, only about trying to deliver it. That position was no longer tenable: such servers were shamed as “Open Relays”, and the damage to their reputation meant that no-one else would accept their messages ... and when companies' reputations are threatened, they sometimes fight back. Some of the blacklist operators went out of business as a result.

In the early 2000s, email content analysis became an increasingly popular approach, but it did not stem the rising tide of spam. Laws were passed, and enforcement in many countries caught up with some egregious offenders; but as spam is effectively free to send, and some users still receive, read and interact with it (for a multitude of reasons), it continues. I'm not going to discuss content analysis here; we don't have time for it.

As we move into the early 2020s, spam-sending techniques have shifted from abusing the pre-existing and broadly legitimate email infrastructure to sending via 'botnets' – vast collections of computers and other internet-connected devices that have been compromised by attackers and are used, without their real owners' permission, to attack the rest of the Internet.

The other main sending technique is more insidious – email accounts belonging to real people are compromised, generally via phishing. The compromised accounts are then used to send outbound email – sometimes targeted at the user's existing contacts, but sometimes simply as a large-scale attack on unrelated destinations.

Types of spam

A non-exhaustive selection of spam types ...

Many people are somewhat wary of email from an unknown sender, but much more trusting of email that claims to come from an existing correspondent. When spam is sent from compromised accounts, that trust is even easier to exploit.

Techniques for defence

IP blacklisting / reputation scoring

Blacklisting IP addresses, discussed above, attempts to enumerate 'known to be bad' sources. Unfortunately this doesn't help you with 'not yet known' sources – although treating the absence of a 'known to be good' signal as a warning sign might sound helpful.

However, reputation lists suffer from some issues. The IP address space is large for IPv4 (2^32 addresses) and immense for IPv6 (2^128). Your blacklist can therefore grow to an unmanageable size very quickly, and you're left deciding when you can remove entries from it – and every removal only increases the size of the 'not yet known' category.
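A simple way to bound the list is to let entries expire. The sketch below is a minimal, hypothetical Python illustration (the class name, `ttl` and the injectable `clock` are assumptions, not any particular product's design): a listing lapses after `ttl` seconds, at which point the address quietly rejoins the 'not yet known' category.

```python
import ipaddress
import time
from typing import Callable, Dict, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class ExpiringBlacklist:
    """Hypothetical IP blacklist whose entries lapse after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self._expires: Dict[IPAddress, float] = {}

    def add(self, ip: str) -> None:
        # Listing (or re-listing) an address pushes its expiry time forward.
        self._expires[ipaddress.ip_address(ip)] = self.clock() + self.ttl

    def is_listed(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        expiry = self._expires.get(addr)
        if expiry is None:
            return False  # 'not yet known': the blacklist says nothing useful
        if expiry <= self.clock():
            del self._expires[addr]  # aged out: back into 'not yet known'
            return False
        return True

    def purge(self) -> int:
        """Remove every expired entry and return how many were dropped."""
        now = self.clock()
        stale = [addr for addr, expiry in self._expires.items() if expiry <= now]
        for addr in stale:
            del self._expires[addr]
        return len(stale)

    def __len__(self) -> int:
        return len(self._expires)


# Usage with a fake clock (addresses are from the documentation ranges).
fake_now = [0.0]
blacklist = ExpiringBlacklist(ttl=3600, clock=lambda: fake_now[0])
blacklist.add("192.0.2.10")
print(blacklist.is_listed("192.0.2.10"))  # True
fake_now[0] = 7200.0
print(blacklist.is_listed("192.0.2.10"))  # False: the listing has lapsed
```

The `ttl` is exactly the trade-off described above: a longer value keeps the list bigger, while a shorter one lets known offenders back into the 'not yet known' pool sooner.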

Ideally, your blacklist should be a co-operative effort with other people listing IPs that they have found to be bad, in order to reduce the 'not yet known' group. In practice, commercial providers running anti-malware services aggregate their observations and create these lists, but they tend not to sell them independently from their services.
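Co-operative lists of this kind have traditionally been published over DNS itself (DNSBLs, described in RFC 5782): the receiving server reverses the connecting address, appends the list's zone, and treats an answer in 127.0.0.0/8 as 'listed'. The sketch below assumes that convention; `dnsbl.example.org` is a placeholder zone, not a real list, and real lists each document their own return codes and usage terms. The `resolve` parameter is a hypothetical hook so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for `ip` in the blacklist `zone`."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # 192.0.2.10 -> 10.2.0.192.<zone>
        labels = reversed(addr.exploded.split("."))
    else:
        # IPv6: one label per hex nibble of the full address, least significant first.
        labels = reversed(addr.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the list publishes an A record in 127.0.0.0/8 for `ip`."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # NXDOMAIN means 'not listed'; note this also fails open on DNS errors.
        return False
    return ipaddress.ip_address(answer) in ipaddress.ip_network("127.0.0.0/8")


print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
# -> 10.2.0.192.dnsbl.example.org
```

A lookup like this costs one DNS query per connection, which is why it can run before any message content is accepted.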

The whitelist approach has its own deficiencies, which only SPF has attempted to allay. One unexpected problem is that so many companies have outsourced their email handling to a small number of global suppliers that “everyone is using gmail”, which means you have to assume some compromised accounts will be sending spam from the very sources you wanted to whitelist.
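To see why a shared provider undermines this, here is a deliberately tiny subset of SPF evaluation (RFC 7208) in Python. It handles only the `ip4:`, `ip6:` and `all` mechanisms and skips `include:`, `a`, `mx`, `redirect=` and macros, which all need further DNS lookups; the record and addresses below are hypothetical documentation values. If a provider authorises its whole sending range, every customer on that range, including a compromised one, gets the same 'pass'.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip_result(record: str, client_ip: str) -> str:
    """Evaluate only the ip4:/ip6:/all terms of an SPF record for `client_ip`."""
    addr = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if network.version == addr.version and addr in network:
                return QUALIFIERS[qualifier]
        elif mechanism == "all":
            return QUALIFIERS[qualifier]
        # Anything else (include:, a, mx, redirect=, ...) is ignored in this sketch.
    return "neutral"  # no mechanism matched


provider_record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all"
print(spf_ip_result(provider_record, "192.0.2.77"))    # pass, whoever the account is
print(spf_ip_result(provider_record, "198.51.100.5"))  # fail
```

SPF answers "is this server allowed to send for this domain?", not "is this particular account behaving?", which is why it cannot separate a provider's good customers from its compromised ones.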

It ends up being a bit like the Battle of Wits from The Princess Bride ...

All I have to do is divine from what I know of your IP address. Are you the sort of person who would send spam from his own server, or from his enemy's? Now, a clever man would send the spam from his own server, because he would know that only a great fool would accept a connection from an unknown sender. I'm not a great fool, so I can clearly not accept the IP address in front of you. But you must have known I was not a great fool; you would have counted on it, so I can clearly not accept the IP address in front of me.

IP greylisting

It has been observed that the software used by botnets to send email doesn't tend to have the same approach to 'reliability' as a normal mailserver – if there is a problem in delivery, botnets tend to just assume the worst and move on to the next target, whereas a legitimate email server will hang on to the message and retry repeatedly over time, giving the far end the opportunity to fix their presumably broken servers.

As such, if you start every conversation with a previously unseen server by immediately refusing it with a temporary failure (an SMTP 4xx response), you are relying on legitimate mailservers to retry later (when you will accept their connection) and on botnets to give up on you and not come back.
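A minimal in-memory sketch of IP-keyed greylisting, assuming an injectable clock and illustrative parameters (`min_delay` of 5 minutes and `entry_ttl` of 4 hours are assumptions; real deployments choose their own). First contact from an address gets a temporary failure, and a retry from the same address after `min_delay` is accepted. The class name and reply text are hypothetical; any 4xx reply tells a well-behaved sender to try again later.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

TEMPFAIL = (451, "4.7.1 Greylisted, please try again later")  # illustrative 4xx reply


@dataclass
class GreylistEntry:
    first_seen: float
    last_seen: float


class IPGreylist:
    """Hypothetical greylist keyed on the connecting IP address only."""

    def __init__(self, min_delay: float = 300.0, entry_ttl: float = 4 * 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_delay = min_delay  # how long a new sender must wait before retrying
        self.entry_ttl = entry_ttl  # forget addresses idle for longer than this
        self.clock = clock
        self._entries: Dict[str, GreylistEntry] = {}

    def check(self, ip: str) -> Tuple[int, str]:
        now = self.clock()
        entry = self._entries.get(ip)
        if entry is None or now - entry.last_seen > self.entry_ttl:
            # Previously unseen (or forgotten) address: record it and refuse for now.
            self._entries[ip] = GreylistEntry(first_seen=now, last_seen=now)
            return TEMPFAIL
        entry.last_seen = now
        if now - entry.first_seen < self.min_delay:
            return TEMPFAIL  # retried too quickly; keep refusing
        return (250, "OK")  # a patient retry from the same address gets through


fake_now = [0.0]
greylist = IPGreylist(clock=lambda: fake_now[0])
print(greylist.check("192.0.2.25")[0])  # 451: first contact
fake_now[0] = 600.0
print(greylist.check("192.0.2.25")[0])  # 250: retried after min_delay
```

The check happens before any message content is transferred, so a botnet that never retries costs the receiver almost nothing. This sketch never sweeps old entries, so a real implementation would also need periodic cleanup.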

There are two failures associated with this. First, large services like gmail maintain a pool of many outbound email servers, and the next retry will probably not come from the same IP address anyway. So you have to reject your way through most of their servers before any get accepted, and if your greylist is itself time-bounded (to prevent it growing uncontrollably) you might never be able to reliably accept email from such a source.
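The pool problem can be made concrete with a small simulation. It assumes, purely for illustration, that the sender rotates round-robin through `pool_size` outbound addresses and retries every `retry_interval` seconds, and that the greylist forgets an address `entry_ttl` seconds after first seeing it. Under those assumptions, delivery only succeeds if the same address comes round again before its entry expires, roughly when `pool_size * retry_interval <= entry_ttl`. All names and numbers here are hypothetical.

```python
from typing import Dict, Optional


def attempts_until_accepted(pool_size: int, retry_interval: float,
                            min_delay: float, entry_ttl: float,
                            max_attempts: int = 1000) -> Optional[int]:
    """Simulate a sender rotating through a pool of IPs against an IP greylist.

    Returns the 1-based attempt number that is accepted, or None if no attempt
    within `max_attempts` gets through.
    """
    first_seen: Dict[int, float] = {}  # pool index -> time the greylist first saw it
    for attempt in range(max_attempts):
        now = attempt * retry_interval
        ip = attempt % pool_size  # assumption: strict round-robin through the pool
        seen = first_seen.get(ip)
        if seen is not None and now - seen > entry_ttl:
            seen = None  # the greylist has forgotten this address
        if seen is None:
            first_seen[ip] = now  # new to the greylist: temporary failure
            continue
        if now - seen >= min_delay:
            return attempt + 1  # same address, waited long enough: accepted
    return None


# One server retrying every 10 minutes vs. pools of 10 and 30 servers,
# against a greylist with a 5-minute delay and a 4-hour memory.
for pool in (1, 10, 30):
    print(pool, attempts_until_accepted(pool, 600, 300, 4 * 3600))
# 1 2
# 10 11
# 30 None
```

With 30 servers cycling every 10 minutes, each address returns after 5 hours, by which time the 4-hour greylist has already forgotten it, so nothing is ever accepted.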

The other failure is that although that one single message might have been avoided, the botnet operators simply don't care, and your email address will stay on their lists, and they'll still attempt all their other deliveries to you anyway.

The final issue affects everyone – timeliness of delivery. Greylisting doesn't let you specify when a message should be retried; that is up to the sending server. Many people are getting used to the idea that email is fast – go to a website, fail to log in, fill in the 'forgotten password' form, receive the email with the link to re-enable access ... wait. Retry. Wait again. Curse. Go do something else ... and eventually the reset email turns up, but you're busy now with a different task ... this time-scale isn't controllable by you, and the sending server doesn't tell you what it will be doing. The users suffer.
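The delay the user sees is set by the sender's retry schedule, not by the greylist. In this small sketch, the function name and the retry schedules are hypothetical. Given the times (in seconds after the first attempt) at which a sender retries from the same address, the message is accepted at the first retry that falls after the greylist's `min_delay`.

```python
from typing import Optional, Sequence


def greylist_delivery_delay(retry_offsets: Sequence[float],
                            min_delay: float) -> Optional[float]:
    """Seconds from first attempt to acceptance, or None if the sender gives up.

    `retry_offsets` is chosen by the *sending* server; the receiver only picks
    `min_delay`. Assumes every retry comes from the same IP address.
    """
    for offset in sorted(retry_offsets):
        if offset >= min_delay:
            return offset
    return None


five_minutes = 300
print(greylist_delivery_delay([60, 360, 1800], five_minutes))  # 360
print(greylist_delivery_delay([1800, 3600], five_minutes))     # 1800
print(greylist_delivery_delay([], five_minutes))               # None: never retried
```

Even with a five-minute greylist, the second sender's message waits half an hour, and the receiver has no way to shorten that.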

SPF, DKIM, DMARC, DNSSEC

These technologies all try to provide ways to 'prove' that a given connection is legitimate, and that you as the receiving server should accept the message it carries. They all need some form of mutually agreed “authority”, and here they rely on the Domain Name System, which seems a reasonable choice, because without the DNS you can't really make sense of email addresses anyway. But the sheer amount of spam in the world shows how much 'commercial' or 'criminal' value email has, so you need to make sure that the data in your DNS entries is secure against tampering, and for that you need DNSSEC. Uptake of DNSSEC over the past 6 years hasn't been terribly widespread, though, so you can't take the position of rejecting email from domains that don't use it ...

And they're all basically forms of content analysis, so they're out-of-scope for this blog post, which is long enough already!

Conclusions so far ...

Because there is so much spam being sent, an email server needs low-cost techniques to reduce as much of the load as possible. Rejecting a connection to your email server before you have to process the content is valuable.

IP reputation lists are mostly commercial now, and generally only available if you use the associated product to handle your email.

IP Greylisting is still reasonably effective, but burdens the end-user with unpredictable delays to inbound messages.

Effectiveness – based on my observations running two large email sites over the last 10 years or so, both techniques (reputation lists and greylisting) are broadly 80% effective, but it is difficult to combine either with a commercial product without increasing complexity.