Fixing exim's content scanning
Exim is a very flexible MTA, and even though I consider myself a Postfix guy,
after 7 months I've gotta say I appreciate its flexibility and power.
Particularly interesting is the built-in Perl support, which allows you to
define any kind of function, even one that changes Exim's config/behavior at
runtime. One of the things I use it for is some custom content filtering.
It's outside the scope of this article to explain Exim's ACLs, and the
documentation has got an entire
chapter about them, altho it's not necessary to read it to understand the rest
of this post.
Introducing the case scenario
From my exim4.conf:
...
acl_smtp_data = acl_check_data
...
begin acl
...
acl_check_data:
...
deny demime = *
     condition = ${perl{checkContent}\
                 {/var/spool/exim4/scan/$message_id}}
     message = Invalid content
...
The acl_smtp_data ACL is where you want to do content inspection; it's the
common place to hook up antispam and antivirus. In English the above means:
unpack the message, check if it contains forbidden content and, in case it
does, deny it, logging an "Invalid content" message.
Demime is what unpacks the message into a format suitable for scanning. From the manual:
"The demime condition unpacks MIME containers in the message. It detects errors in MIME containers and can match file extensions found in the message against a list. Using this facility produces files containing the unpacked MIME parts of the message in the temporary scan directory."
Then the checkContent perl function is executed, with /var/spool/exim4/scan/$message_id passed as a parameter. That path is where demime unpacks messages. If the function returns 1 the message is denied, otherwise the next ACL statement is considered.
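To make the contract concrete, here is a minimal sketch of what such a function could look like. It is not the actual checkContent from my setup: the @FORBIDDEN list and the pattern in it are made up for illustration, and the per-file type test discussed later is left out. It only assumes what is described above: the function gets the scan directory as its argument and returns 1 to deny, 0 otherwise.

```perl
use strict;
use warnings;

# Hypothetical list of forbidden patterns, purely illustrative.
my @FORBIDDEN = (qr/forbidden phrase/i);

# Return 1 if any regular file directly under $dir contains a forbidden
# pattern, 0 otherwise. Exim treats the returned "1" as a true condition.
sub checkContent {
    my ($dir) = @_;

    # If the directory cannot be read there is nothing to scan: accept.
    opendir(my $dh, $dir) or return 0;
    my @files = grep { -f $_ } map { "$dir/$_" } readdir($dh);
    closedir($dh);

    for my $file (@files) {
        open(my $fh, '<', $file) or next;   # skip unreadable files
        while (my $line = <$fh>) {
            for my $re (@FORBIDDEN) {
                if ($line =~ $re) {
                    close($fh);
                    return 1;               # forbidden content found: deny
                }
            }
        }
        close($fh);
    }
    return 0;                               # nothing found: carry on
}
```

Returning 0 on an unreadable directory is a design choice of this sketch (fail open); a stricter setup might prefer to deny in that case.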
The problem[s]
Content inspection wasn't working properly with HTML messages, so I started looking into it and found a bug, which I promptly fixed. Since the bug was in the perl function, I had the problem of telling exim to reload the file without killing all the current connections. The init script supports reload, so I used it and got back a bunch of errors... the joy of missing semicolons, but at least that confirmed the perl file was actually read and reloaded. Or so I thought. Because of the evidence of the file being re-read, it took me a while to decide that maybe exim was lying to me. Bottom line: a restart was necessary, even tho exim is supposed to start a new perl interpreter for each process, so I'm still not entirely sure why that was the case.
But uncovering this problem brought to light another issue: headers weren't checked, Subject being the easiest one to slip forbidden content through. Subject is passed as part of the DATA phase of the SMTP session, so I was already in the right place. Looking at the code I noticed an if checking the files' MIME type before executing checkContent() on them, and wondered if that might have been the problem. Unfortunately I managed to find absolutely 0 documentation on demime and how it works, so all I was left with were the source code and strace. Two hrs of strace later I had an .eml and a .com file, the former containing the whole email, headers included, and the latter only the body of the message. And I knew that the .eml was never checked.
The if was checking for files whose type matched "text", which is reasonable to avoid scanning binary files we aren't interested in. So I used the file utility to see what the system thought the .eml file was, and got text/mail as expected. But then, if that's the MIME type, the if would have matched and the file would have been checked!
Knowing how much perl loves to do things in its own creative way, I went to check what this File::MMagic module was doing, and surprise!
NAME
    File::MMagic - Guess file type
SYNOPSIS
    use File::MMagic;
    use FileHandle;
    $mm = new File::MMagic; # use internal magic file
    # $mm = File::MMagic->new('/etc/magic'); # use external magic file
    # $mm = File::MMagic->new('/usr/share/etc/magic'); # if you use Debian
    $res = $mm->checktype_filename("/somewhere/unknown/file");
So by default the module uses its own internal magic file rather than the
system-wide one!
Brilliant. According to perl that was a message/rfc822, which might even
be more correct from some points of view, but that isn't the problem. The
problem is having something non-standard defined as the default.
Fixed that, fixed the headers checking.
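For illustration, one way to make the type test immune to this kind of disagreement is to accept both answers. This is only a sketch and not necessarily the fix I applied; isScannableType is a hypothetical helper name, and the two accepted types are simply the ones mentioned in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a file with the given MIME type
# should be handed to the content check.
sub isScannableType {
    my ($type) = @_;
    return 0 unless defined $type;
    return 1 if $type =~ m{^text/};          # what the original test was after
    return 1 if $type eq 'message/rfc822';   # what File::MMagic said about the .eml
    return 0;                                # everything else is skipped
}
```

The type itself would still come from $mm->checktype_filename($file), with $mm built either from the internal magic file or from an external one such as '/etc/magic', as shown in the synopsis above.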