Using the Server-side Spam Filtering Service

by Clevin Wong

Spam, or junk mail, refers to unsolicited commercial e-mail (UCE) and unsolicited bulk e-mail (UBE), such as unsolicited advertisements. Spam is now widespread on the Internet and causes problems for most e-mail users. Although e-mail companies and standards bodies are working on new ways to deal with the problem, receiving junk mail is still unavoidable with today's e-mail standards and technology. All parties are doing their best to cut down the amount of spam:

    • Governments have begun enacting laws to ban spam originating within their territories.
       
    • Software vendors have built mail filters into some e-mail clients, such as Outlook Express and Eudora.
       
    • Tools are available for e-mail service providers to build server-side mail filters.
       
    • Users are aware of the problem and are taking precautions to reduce the chance of their e-mail addresses falling into the hands of spammers.

To help users deal with the annoying problem of spam, the Computing Services Centre (CSC) launched the Server-side Junk Mail Filtering service on the staff e-mail server on 24 May 2004. A server-side junk mail filter is a special kind of e-mail tool that lets users discard unwanted messages, i.e. junk or spam. It runs on the mail server and acts according to a set of user-defined filter rules.

There are three main types of filter rules in this service that can help you filter junk mail:

  1. Spam Auto-Filtering
     

    How It Works
     
    1. A message comes into the central e-mail server from an "outside source".
       
    2. On the incoming mail gateway, the e-mail is scanned by spam-detection software. To minimize the chance of losing legitimate mail, and to let individuals decide what they consider spam, the server does not drop any mail. Instead, it tags each e-mail's headers with a "spam level", a number representing the likelihood that the e-mail is spam.
       
      The "spam level" is indicated in the "X-Spam-Level" header line of an e-mail as shown in the example below:
       
      Return-path: <n.987.29123456@classylvialters.com>
      Received: from conversion-daemon.cityu.edu.hk by cityu.edu.hk
          (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003))
          id <0HZ500901GR600@cityu.edu.hk> (original mail from n.987.29123456@classylvialters.com)
          for cctest@cityu.edu.hk; Thu, 10 Jun 2004 23:30:48 +0800 (CST)
      Received: from donald.cityu.edu.hk (donald.cityu.edu.hk [144.214.2.113])
          by cityu.edu.hk (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003))
          with ESMTP id <0HZ50038ZHR227@cityu.edu.hk> for cctest@cityu.edu.hk; Thu,
          10 Jun 2004 23:30:39 +0800 (CST)
      Received: from amy.cityu.edu.hk (amy.cityu.edu.hk [144.214.5.79])
          by donald.cityu.edu.hk (8.12.11/8.12.11) with ESMTP id i5BFTLPI022570 for
          <cctest@mail.cityu.edu.hk>; Thu, 10 Jun 2004 23:30:38 +0800 (HKT)
      Received: from conversion-daemon.mailgw1.cityu.edu.hk by mailgw1.cityu.edu.hk
          (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003))
          id <0HZ500I01HFZ4N@mailgw1.cityu.edu.hk>
          (original mail from n.987.29123456@classylvialters.com)
          for cctest@cityu.edu.hk; Thu, 10 Jun 2004 23:29:22 +0800 (CST)
      Received: from mail121.pareatingleep.com
          (mail121.pareatingleep.com [66.114.250.121])
          by mailgw1.cityu.edu.hk (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8
          2003)) with SMTP id <0HZ500LE5HOQPA@mailgw1.cityu.edu.hk> for
          cctest@cityu.edu.hk; Thu, 10 Jun 2004 23:29:20 +0800 (CST)
      Date: Thu, 10 Jun 2004 22:30:11 -0700
      From: "|:Liquid:|:Nutrition:|" <robert.hastings@classylvialters.com>
      Subject: 30 day supply at no charge - great tasting liquid nutrition.
      To: cctest@cityu.edu.hk
      Message-id: <20040610223123.lmpinzvwdo@classylvialters.com>
      MIME-version: 1.0
      Content-type: multipart/alternative; boundary="Boundary_(ID_nV2caK4KzSNLsKEQs8wlMA)"
      X-Spam-Level: xxxxxxxx (8.518)
      X-Spam-Tests: date_in_past_06_12, excuse_1, html_30_40, html_font_face_bad,
          html_font_face_odd, html_image_only_10, html_message, html_web_bugs,
          marketing_partners, no_cost, supplies_limited
      Original-recipient: rfc822;cctest@cityu.edu.hk

       
      A higher value (e.g. 10) means that the e-mail is more likely to be spam; a lower value (e.g. 0) means that it is less likely to be spam. The sample message above has a spam level of 8.518.
       
    3. If you enable this function, you need to choose a personal spam threshold, which is an integer from 3 to 10. A lower number (e.g. 3) filters more mail, while a higher number (e.g. 10) filters less. The system automatically moves any e-mail whose spam level is greater than or equal to your threshold into the AUTO-PURGE folder. Messages in the AUTO-PURGE folder are automatically purged after 30 days. Only one filter of this type can be added.
     
  2. Set up Personal Whitelist. In addition to Spam Auto-Filtering, you can set up a whitelist to specify the e-mail that you always accept.
     
    A Whitelist filter delivers a matching incoming message to the INBOX folder when one of its header lines, such as "From:", "To:" or "Subject:", contains, exactly matches, or wildcard-matches a given phrase. All matching is case-insensitive. Wildcard matching uses the characters "*" and "?": "*" matches zero or more characters, while "?" matches exactly one character. For example, "f*t" matches "ft", "fit", "foot" or "flight", while "f?t" matches "fit" or "fat". The phrase must not contain double-byte characters (e.g. Chinese). More than one filter of this type can be added.
     
  3. Set up Personal Blacklist. You can also set up a blacklist to specify the e-mail that you always reject.
     

    A Blacklist filter either discards matching incoming messages or moves them into the AUTO-PURGE folder. The header-line matching rules are the same as those described in (2) above. Messages in the AUTO-PURGE folder are automatically purged after 30 days. More than one filter of this type can be added.
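
To make the matching rules concrete, the tests behind the three filter types can be sketched in Python. This is a minimal illustration under stated assumptions, not the CSC's implementation: the function names are hypothetical, the spam level is assumed to be read from the number in parentheses of the "X-Spam-Level" header (as in the sample headers), a wildcard phrase is assumed to have to match the whole header value, and the double-byte rule is approximated by rejecting non-ASCII phrases.

```python
import re

# Hypothetical sketch of the three header tests; NOT the CSC's implementation.

_SCORE_RE = re.compile(r"\(([-+]?\d+(?:\.\d+)?)\)")


def spam_level(x_spam_level: str) -> float:
    """Read the score from an X-Spam-Level value such as 'xxxxxxxx (8.518)'.

    Assumes the exact score appears in parentheses; if it does not, falls
    back to counting the leading 'x' characters.
    """
    match = _SCORE_RE.search(x_spam_level)
    if match:
        return float(match.group(1))
    return float(len(re.match(r"x*", x_spam_level.strip()).group()))


def is_auto_filtered(level: float, threshold: int) -> bool:
    """Spam Auto-Filtering: filter when level >= the personal threshold (3-10)."""
    if not 3 <= threshold <= 10:
        raise ValueError("spam threshold must be an integer from 3 to 10")
    return level >= threshold


def _wildcard_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern where '*' = zero or more chars and '?' = one char."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def phrase_matches(header_value: str, phrase: str, mode: str) -> bool:
    """Whitelist/blacklist test on one header line; always case-insensitive."""
    if not phrase.isascii():
        # Approximates the "no double-byte characters" rule by requiring ASCII.
        raise ValueError("phrase must not contain double-byte characters")
    if mode == "contains":
        return phrase.casefold() in header_value.casefold()
    if mode == "exact":
        return phrase.casefold() == header_value.casefold()
    if mode == "wildcard":
        # Assumption: the pattern must match the whole header value.
        return _wildcard_regex(phrase).fullmatch(header_value) is not None
    raise ValueError(f"unknown match mode: {mode!r}")
```

For example, phrase_matches("flight", "f*t", "wildcard") returns True, while phrase_matches("foot", "f?t", "wildcard") returns False because "?" stands for exactly one character.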

If you enable spam filtering, you should review the contents of your AUTO-PURGE folder regularly to make sure that legitimate e-mail messages have not been mistakenly filtered as spam. You should also empty the AUTO-PURGE folder regularly so that filtered messages do not use up your e-mail quota.

Note that each mail filter is assigned an order number, starting from 1 for the highest priority, and you can move a filter up or down in this order. Whitelists and blacklists may each contain multiple filter rules, but the Spam Auto-Filtering rule can only be enabled or removed, and it is always the last one in the filtering order. Filters are applied to incoming messages from top to bottom; once a filter matches a message, no lower-priority filters are checked against it. Unused filters can be deleted through the same Web interface.
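
The ordering rules amount to a first-match-wins scan. The following Python sketch is a hypothetical model, not the service's code: rules are plain objects pairing a test with an action label ("INBOX" for a whitelist hit, "DISCARD" or "AUTO-PURGE" for a blacklist hit), the Spam Auto-Filtering rule is appended last when enabled, the spam level is assumed to be the number in parentheses of "X-Spam-Level", and messages that match no rule are assumed to go to INBOX as usual.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical model of the filter order; not the service's actual code.
Headers = Dict[str, str]


@dataclass
class Rule:
    name: str
    matches: Callable[[Headers], bool]
    action: str  # illustrative labels: "INBOX", "AUTO-PURGE" or "DISCARD"


def _spam_score(headers: Headers) -> float:
    """Assumes the exact score is the number in parentheses of X-Spam-Level."""
    m = re.search(r"\(([-+]?\d+(?:\.\d+)?)\)", headers.get("X-Spam-Level", ""))
    return float(m.group(1)) if m else 0.0


def build_chain(personal: List[Rule], threshold: Optional[int]) -> List[Rule]:
    """Keep whitelist/blacklist rules in user order (order number 1 first) and
    append the Spam Auto-Filtering rule, if enabled, at the very end."""
    chain = list(personal)
    if threshold is not None:
        chain.append(Rule("spam auto-filter",
                          lambda h: _spam_score(h) >= threshold,
                          "AUTO-PURGE"))
    return chain


def route(headers: Headers, chain: List[Rule]) -> str:
    """Apply rules top to bottom and stop at the first one that matches."""
    for rule in chain:
        if rule.matches(headers):
            return rule.action
    return "INBOX"  # assumption: unmatched mail is delivered normally


# Example data (hypothetical): a whitelist rule ranked above a blacklist rule.
rules = [
    Rule("whitelist: newsletter",
         lambda h: "newsletter@example.org" in h.get("From", "").lower(), "INBOX"),
    Rule("blacklist: nutrition",
         lambda h: "nutrition" in h.get("Subject", "").lower(), "DISCARD"),
]
chain = build_chain(rules, threshold=5)

spam = {"From": "robert.hastings@classylvialters.com",
        "Subject": "30 day supply at no charge - great tasting liquid nutrition.",
        "X-Spam-Level": "xxxxxxxx (8.518)"}
wanted = {"From": "Newsletter@example.org", "Subject": "Monthly news",
          "X-Spam-Level": "xxxxxxxx (8.2)"}
print(route(spam, chain))    # DISCARD: the blacklist rule fires before the spam rule
print(route(wanted, chain))  # INBOX: the whitelist rule matches first
```

Because the whitelist rule sits above the Spam Auto-Filtering rule, a whitelisted sender is delivered even when its spam level exceeds the threshold.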

In the current phase, the junk mail filter service is available to all staff; it may be extended to other users in the next phase. For details of the service, please refer to the online user guide. If you have any enquiries about these settings, please contact the CSC Help Desk.