SpamAssassin: A practical guide to integration and configuration
Ebook, 602 pages, 3 hours


About this ebook

Written specifically for busy network and system administrators, the book is a detailed and practical guide to implementing the right antispam solution for your network and your business requirements. You'll go from a detailed walkthrough of initial setup to advanced configuration options like Bayesian filtering, listing, rewriting, and rules. The book shows how to optimize SpamAssassin for all major mail servers and clients. If you are a network or system administrator and you're either using or evaluating SpamAssassin, this book will increase your understanding and transform your productivity.
Language: English
Release date: Sep 27, 2004
ISBN: 9781847190062
Author

Alistair McDonald

Alistair McDonald is a freelance IT consultant based in the UK. He has worked in IT for over 15 years and specializes in C++ and Perl development and IT infrastructure management. He is a strong advocate of open source, and has strong cross-platform skills. He prefers vim over vi, emacs over Xemacs or vim, and bash over ksh or csh. He is very much a family man and spends as much time as possible with his family enjoying life.


    Table of Contents

    SpamAssassin

    Credits

    About the Author

    About the Reviewers

    Introduction

    What This Book Covers

    What You Need for Using This Book

    Conventions

    Reader Feedback

    Customer Support

    Downloading the Example Code for the Book

    Errata

    Questions

    1. Introducing Spam

    Defining Spam

    Definitions

    The History of Spam

    Spammers

    The Costs of Spam

    Costs to the Spammer

    Costs to the Recipient

    Spam and the Law

    Summary

    2. Spam and Anti-Spam Techniques

    Spamming Techniques

    Open Relay Exploitation

    Collecting Email Addresses

    Hiding Content

    Statistical Filter Poisoning

    Unique Email Generation

    Trojanned Machines

    Anti-Spam Techniques

    Keyword Filters

    Open Relay Blacklists (ORBLs)

    ISP Complaints

    Statistical Filters

    Email Header Analysis

    Non-Spam Content Tests

    Whitelists

    Email Content Databases

    Sender Validation Systems

    Sender Policy Framework (SPF)

    Spam Filtering Services

    Collect and Forward

    Collect and Return

    Send and Forward

    Choosing an Anti-Spam Service Provider

    ISP-Provided Services

    Anti-Spam Tools

    SpamAssassin

    How SpamAssassin Works

    Easy to Use

    Techniques Used by SpamAssassin

    Summary

    3. Open Relays

    Email Delivery

    Open Relay Tests

    Automated Open Relay Testers

    Manual Open Relay Testing

    MTA Configuration

    Sendmail

    Sendmail Versions 8.9 and Above

    Sendmail Versions Below 8.9

    Postfix

    The mynetworks Configuration Directive

    The relay_domains Configuration Directive

    Exim

    Exim Configuration Parameters

    qmail

    Summary

    4. Protecting Email Addresses

    Websites

    Alternative Character Representations

    JavaScript

    Usenet

    Trojan Software

    Mailing Lists and Archives

    Registration for Websites

    Tracking Email Address Usage

    Sendmail Plus Technique

    Rogue Employees

    Employees

    Business Cards and Promotional Material

    How Spammers Verify Email Addresses

    Web Bugs

    Summary

    5. Detecting Spam

    Content Tests

    Header Tests

    DNS-Based Blacklists

    Statistical Tests

    Message Recognition

    URL Recognition

    Examining Headers

    Faked Headers

    Reporting Spammers

    Valid Bulk Email Delivery

    Summary

    6. Installing SpamAssassin

    Building from Source

    Prerequisites

    Checking Current Configuration

    Installing Perl

    Installing CPAN

    Testing for a C Compiler

    Using CPAN

    Installing by Hand

    Resolving Build Failures

    Packaged Distributions

    RPM

    Debian

    Gentoo

    Other Formats

    Windows

    Verifying the Installation

    Upgrading

    Uninstalling

    Uninstalling from Source

    Using CPANPLUS

    Other Packages

    Uninstalling on Windows

    SpamAssassin Components

    Executables

    Perl Modules

    Documentation

    Summary

    7. Configuration Files

    Configuration Files

    Standard Configuration

    Site-Wide Configuration

    User-Specific Configuration

    Rule Files

    Rules

    Scores

    Summary

    8. Using SpamAssassin

    SpamAssassin as a Daemon

    Creating a User Account

    SpamAssassin and Procmail

    Testing for Procmail

    Obtaining and Installing Procmail

    Configuring Procmail

    MTA Configuration

    sendmail

    Postfix

    Exim

    qmail

    Configuring User Accounts

    Site-Wide Procmail Usage

    Integrating SpamAssassin into the MTA

    Sendmail

    Sendmail Milter Support

    MIMEDefang

    Postfix

    Exim

    qmail

    Testing and Troubleshooting

    Check the MTA

    Further Diagnosis

    Rejecting Spam

    Summary

    9. Bayesian Filtering

    Scoring

    Training

    Confirming Operation

    Filter Training

    User Involvement

    Local Users

    Unlearning

    Auto-learn Thresholds

    Bayesian Database Files

    Removing a Bayesian Database

    Sharing a Bayesian Database

    Disabling Bayesian Filtering

    Summary

    10. Look and Feel

    Headers

    Changing Headers

    Creating Headers

    Removing Headers

    Reports

    Enabling and Disabling Reports

    Changing Reports

    Subject Rewriting

    Summary

    11. Network Tests

    RBLs

    SURBLs

    SpamAssassin 2.63

    Vipul's Razor

    Installing Razor

    Configuring Razor

    Configuring SpamAssassin

    Testing Razor

    Pyzor

    Installing Pyzor

    Configuring Pyzor

    Configuring SpamAssassin

    Testing Pyzor

    Pyzor Headers

    DCC

    Installing DCC

    Configuring SpamAssassin

    Testing DCC

    DCC Headers

    Spamtraps

    Choosing a Spamtrap Address

    Baiting the Spamtrap

    Configuring the Email Account

    Summary

    12. Rules

    Writing Rules

    Rules Performance

    Meta Rules

    Writing Positive Rules

    Examples of Positive Rules

    Rawbody Rules

    Using a Corpus to Test Rules and Scoring

    Corpus Development

    The Public Corpus

    Testing SpamAssassin on a Corpus

    Examining Hit Frequencies

    Using Other Rulesets

    Summary

    13. Improving Filtering

    Whitelists and Blacklists

    Manual Whitelisting and Blacklisting

    Whitelisting Domains

    The Auto-Whitelist

    Resolving Incorrect Classifications

    Examining Messages

    Changing the Spam Threshold

    Re-weighting Test Scores

    Increasing the Score of Spam Emails

    Coping with False Positives

    Bayesian Unlearning and Relearning

    Character Sets and Languages

    Disallowing Languages

    Disallowing Character Sets

    Summary

    14. Performance

    Bottlenecks

    Memory

    CPUs

    Disk I/O

    Network I/O

    Determining Bottlenecks

    Performance Improvement Methodology

    Using the SpamAssassin Daemon

    Integrating SpamAssassin into the MTA

    Omitting Messages

    Large Messages

    Disabling Tests

    Running Network-Based Tests First

    Razor, Pyzor, and DCC

    Using Additional Machines

    Faster File Locking

    Using SQL

    Requirements

    MySQL

    Configuration

    Spamd with SQL

    SQL for User Preferences

    Adding New User Preferences

    Displaying User Preferences

    Altering User Preferences

    Deleting User Preferences

    Testing if SQL User Preferences Are Being Used

    Preference Precedence

    SQL for Bayesian Databases

    Testing if the SQL Bayesian Database Is Being Used

    The Auto-Whitelist Database

    Testing if the SQL Auto-Whitelist Database Is Being Used

    Summary

    15. Housekeeping and Reporting

    Separating Levels of Spam

    Detecting When SpamAssassin Fails

    Spam and Ham Reports

    Spam Counter

    Keeping Statistics Over a Period of Time

    Determining SpamAssassin Processing Time

    Summary

    16. Building an Anti-Spam Gateway

    Choosing a PC Platform

    Choosing a Linux Distribution

    Installing Linux

    Configuring Postfix

    Accepting Email for the Domain

    Mail for the root User

    Basic Spam Filtering with Postfix

    Forwarding Email to the Original Email Server

    Reloading Postfix

    Testing Postfix

    Installing Amavisd-new

    Installation from Package

    Installing Prerequisites

    Installing from Source

    Creating a User Account for Amavisd-new

    Configuring Amavisd-new

    Configuring Postfix to Run Amavisd-new

    Configuring External Services

    Firewall Configuration

    Backups

    Testing

    Going Live

    Summary

    17. Email Clients

    General Configuration Rules

    Microsoft Outlook

    Microsoft Outlook Express

    Mozilla Thunderbird

    Qualcomm Eudora

    Summary

    18. Choosing Other Spam Tools

    Spam Policies

    Evaluating Spam Filters

    Configuring the Second Filter

    Using a Single Machine

    Using Separate Machines

    Sendmail

    Postfix

    Exim

    qmail

    Other Techniques

    Greylisting

    SPF

    Sender Validation

    Summary

    A. Glossary

    Index

    SpamAssassin

    Alistair McDonald


    SpamAssassin

    A Practical Guide to Configuration, Customization, and Integration

    Copyright © 2004 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, Packt Publishing, nor its dealers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First edition: September 2004

    Published by Packt Publishing Ltd. 32 Lincoln Road Olton Birmingham, B27 6PA, UK.

    ISBN 1-904811-12-4

    www.packtpub.com

    Cover Design by www.visionwt.com

    Credits

    Author

    Alistair McDonald

    Additional Material

    Chris Santerre

    Technical Reviewers

    Kevin Peuhkurinen

    Chris Santerre

    Commissioning Editor

    Louay Fatoohi

    Technical Editors*

    Deepa Aswani

    Ashutosh Pande

    Layout*

    Ashutosh Pande

    Indexers*

    Niranjan Jahagirdar

    Ashutosh Pande

    Proofreader

    Chris Smith

    Cover Designer

    Helen Wood

    * Services Provided by Editorialindia.com

    About the Author

    Alistair McDonald is the founder and Managing Director of InRevo Ltd, an IT consultancy based in Berkshire, UK. He worked for several large corporations before founding InRevo in 1994. The company offers security, email, and other IT consultancy, as well as bespoke development.

    Alistair is a developer specializing in C++ and Perl. When first introduced to Perl, he described it as a whole new level of flexibility. Alistair got involved with the role of email administrator for one of InRevo’s clients, and subsequently honed his skills setting up servers for InRevo.

    Alistair lists his favorite open-source projects as GNU Emacs, the Linux kernel, the Gentoo Linux distribution, the Perl language, SpamAssassin, and Postfix. He is also a big fan of xplanet and xscreensaver for eye candy.

    Alistair is very much a family man, and enjoys spending time with his wife and two children in and around Berkshire, where they have lived for the past ten years.

    I can recall getting my first spam email. This was in the mid nineties, when CompuServe provided Internet email addresses for the first time. I had heard of spam, but not experienced it. Strangely, that first spam made me feel that I’d come of age in the Internet, but the second, third and fourth spams soon made me realize what an inconvenience spam was. Back then, I did not realize how much spam would affect the Internet, and how much effort would be put into solving it. I guarded future email addresses until I started using SpamAssassin.

    I hope that this book assists fellow system administrators to install and configure SpamAssassin. It really is a great solution to spam and takes very little time to set up.

    Writing this book has not been a solo effort, and several people deserve special mentions.

    First, my wife Louise, who has put in many long nights critically examining drafts and improving my use of English, while single-handedly bringing up two very lively children. Despite her attempts to eradicate all commas from the text, one or two may remain.

    Several friends and colleagues have commented on draft chapters and contributed ideas and inspiration. I would like to take this opportunity to thank them publicly for their efforts. They are: Paul Serjeant, Ian Haycox, Colin Jenkins, and Jamie O’Shaughnessey.

    During the writing of this book, I had the misfortune to spend much of my time away from home. This was made bearable as much of the time was spent with my parents. I’d like to thank them for looking after me so well, and I’d also like to apologize for being such an appalling and antisocial house guest at times.

    Of course, there are many more people to thank. All the SpamAssassin developers, past and present, should be congratulated for creating such an effective tool. Their work is based on the many developers of the Perl language, another great free software project. Hats off to all of you for your hard work and ingenuity.

    Finally, a big thank you to the Trade Router team, for all their inspirational comments. Keep having five a day!

    I wrote this book on a Dell laptop running Gentoo Linux. I used VMware to install a total of seven different virtual machines for testing: four separate Gentoo configurations for Sendmail, Postfix, Exim, and qmail, a Windows 2000 installation, a Red Hat 9 installation, and a Debian installation, installed from the wonderful Knoppix CD.

    This book is dedicated to my children, Imogen and Keir— So lively during the day, and so peaceful at night.

    About the Reviewers

    Kevin Peuhkurinen lives in rural Ontario and works as a network security analyst for a financial institution in Toronto, where his incessant Open Source evangelism often annoys his co-workers. When not fighting spam he likes to ride large motorcycles and go lure coursing with his two Irish Wolfhounds. He can be reached on the SpamAssassin-users mailing list and is always happy to help out others.

    Chris Santerre is a System Administrator working in Providence, Rhode Island. He started the SpamAssassin Rule Emporium (SARE) at www.rulesemporium.com, which hosts custom rulesets for SpamAssassin. He created a ruleset called BigEvil that looks for known spammer URLs in an email. He is also a content provider for www.SURBL.org. Chris continues to work with the SARE ‘ninjas’ to update SARE rules for SpamAssassin, and keep it as fresh as possible. He also encourages everyone to go see a live professional ice hockey game!

    Introduction

    SpamAssassin is an open-source spam detector. It is considered the best of breed, and is used by many large organizations and also as the basis for commercial services and products.

    SpamAssassin is free to download, install, and use, and is very customizable, configurable, and scalable to large architectures. It can be installed in one afternoon, but rewards further time spent on improving the detection rate.

    This book provides a complete guide to the installation, configuration, and customization of SpamAssassin. It also discusses the history of spam and the various techniques used to combat it. It includes detailed instructions for the most popular Mail Transport Agents (MTAs): Sendmail, Postfix, Exim, and qmail. It also includes details on installing SpamAssassin on Windows, and adding a separate spam filter to an existing email infrastructure, such as Microsoft Exchange.

    Most spam detection systems use only one or two methods of detecting spam. SpamAssassin uses many, and is extensible, allowing users to develop their own rules to identify spam. New techniques to identify spam, such as Sender Policy Framework (SPF), can be added to SpamAssassin by developing them as modules. Users or System Administrators can configure almost every aspect of SpamAssassin, leading to exceptional success rates in detecting spam.

    SpamAssassin is Open Source, which means that the program code is freely available for others to examine and modify. SpamAssassin is developed, documented, and supported by a team of volunteers who give their time freely.

    What This Book Covers

    This book has three main areas or sections. The first section discusses spam, spammers, and anti-spam techniques. The second section discusses SpamAssassin basics, including obtaining, installing, and configuring SpamAssassin. The final section describes techniques to improve the spam detection of SpamAssassin, and to improve the performance of a SpamAssassin installation.

    Chapter 1 introduces spam and provides some definitions of terms used in this book. Chapter 2 discusses various spam detection techniques used by spam detection engines and the techniques developed by spammers to subvert them.

    Chapter 3 discusses open relays, historically the source of much spam, and includes information on how to check that an existing email server cannot be abused by spammers. It also describes how to rectify an MTA that is acting as an open relay. Chapter 4 describes how spammers collect email addresses and provides solutions to publish email addresses on websites without making them targets for spam. Chapter 5 discusses the mechanics of detecting spam.

    Chapter 6 gives detailed instructions on how to install SpamAssassin on Unix, Linux, and Windows platforms, including obtaining and installing any prerequisite packages that SpamAssassin requires.

    Chapter 7 provides a brief run through the SpamAssassin configuration files, and provides a foundation for the remaining chapters. Chapter 8 discusses how to integrate SpamAssassin with the MTA, or invoke it using procmail. A variety of strategies are discussed, to suit the needs of different organizations.

    Chapter 9 covers the use of SpamAssassin’s Bayesian filter, a tool that learns from spam emails and can improve detection rates dramatically.

    SpamAssassin is incredibly flexible, and Chapter 10 discusses how SpamAssassin can alter emails to mark them as spam. Chapter 11 covers adding external Network Tests which utilize databases of known spam emails to improve spam detection rates.

    Chapter 12 provides a description of SpamAssassin’s rules, and describes how rules can be written, tested, and scored.

    Chapter 13 covers methods to improve the detection rate of SpamAssassin, including whitelists and blacklists.

    Chapter 14 describes how to improve the performance of a SpamAssassin installation.

    Chapter 15 describes some useful reports and utilities that an administrator can use to streamline the running of a SpamAssassin installation.

    Chapter 16 has a complete description of how to create a spam filtering gateway—this covers installing Linux and SpamAssassin, and configuring them all to filter email and forward the non-spam (or ‘ham’) to the existing email server.

    Chapter 17 describes how to configure several major email clients to filter email based on the tags that SpamAssassin places in emails.

    Finally, Chapter 18 discusses the advantages, disadvantages, and options available when adding an additional spam filter to an existing SpamAssassin installation.
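As a rough illustration of the tag-based filtering described for Chapter 17, the sketch below checks a message for a spam tag from the command line. It is not taken from the book: it assumes SpamAssassin's default `X-Spam-Flag: YES` header is present on tagged messages, and the function name `is_tagged_spam` and the sample message are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes tagged messages carry an "X-Spam-Flag: YES" header.

# is_tagged_spam: read one email message on stdin; succeed if the header
# block (everything before the first blank line) contains the spam flag.
is_tagged_spam() {
    # sed stops after the first blank line, so body text is never examined;
    # grep then looks for the flag, ignoring case.
    sed '/^$/q' | grep -i '^X-Spam-Flag:[[:space:]]*YES' >/dev/null
}

# Hypothetical sample message used only for demonstration.
sample='From: someone@example.com
Subject: hello
X-Spam-Flag: YES

Body text'

if printf '%s\n' "$sample" | is_tagged_spam; then
    echo "spam"
else
    echo "ham"
fi
```

An email client rule does essentially the same thing: it matches on a header added by the filter rather than re-examining the message content.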

    What You Need for Using This Book

    SpamAssassin and all the tools it uses are available for download from the Internet. Perl, the main prerequisite, is included in all major Linux distributions and available for most Unix-like operating systems. It can be downloaded from http://www.perl.org/get.html. The Perl CPAN module is normally used to install SpamAssassin; all that is required is an Internet connection.
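The CPAN-based installation mentioned above can be sketched as a small shell check. This is only an illustration under stated assumptions, not the book's own procedure: it assumes `perl` is on the PATH, the helper names `have_module` and `cpan_install_command` are hypothetical, and the install command is printed rather than run.

```shell
#!/bin/sh
# Sketch only: report whether Perl can load Mail::SpamAssassin and, if not,
# print the CPAN command that would install it.

# have_module NAME: succeed if perl is present and can load module NAME.
have_module() {
    command -v perl >/dev/null 2>&1 && perl -M"$1" -e 1 >/dev/null 2>&1
}

# cpan_install_command NAME: print (do not run) the CPAN install command.
cpan_install_command() {
    printf "perl -MCPAN -e 'install %s'\n" "$1"
}

module="Mail::SpamAssassin"
if have_module "$module"; then
    echo "$module is already installed"
else
    echo "$module not found; to install it, run:"
    cpan_install_command "$module"
fi
```

Printing the command instead of running it keeps the check safe to run on a production mail server; the actual installation needs an Internet connection and usually root privileges.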

    This book covers integrating with four of the most popular MTAs: Sendmail, Postfix, Exim, and qmail. MTA integration is only a small part of this book, and most of this book will be relevant no matter which MTA is in use. SpamAssassin can be integrated with most MTAs.

    Conventions

    In this book you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

    There are three styles for code listings. Code words within text are shown as follows: Rather than get the contents of myFile with the getContents() method, we construct a new CmsXmlControlFile object.

    If we have a block of code, it will be set as follows:

    #!/usr/bin/perl -w
    # spamlogfileparser.pl - parse /var/log/messages and calculate statistics
    use strict;

    # declare variables
    my (@ham, @spam, %seen);

    When we wish to draw your attention to a particular part of a code block, the relevant lines will be made bold:

    #!/bin/sh
    # check_process.sh - check that a process is running
    RECIPIENT=postmaster@mycompany.com

    if [ "$1" = "" ]; then

    New terms and important words are introduced in a bold-type font. Words that you see on the screen—in menus or dialog boxes, for example—appear in the text as follows: Clicking the Next button moves you to the next screen.

    Tip

    Tips, suggestions, or important notes appear in a box like this.

    Any command-line input and output is written as follows:

    mysql> create table books (name char(100), author char(50));
    Query OK, 0 rows affected (0.03 sec)

    Reader Feedback

    Feedback from our readers is always welcome. Let us know what you think about this book, what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

    To send us general feedback, simply drop an e-mail to <feedback@packtpub.com>, making sure to mention the book title in the subject of your message.

    If there is a book that you need and would like to see us publish, please send us a note in the Suggest a title form on www.packtpub.com or e-mail .

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

    Customer Support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the Example Code for the Book

    Visit http://www.packtpub.com/support, and select this book from the list of titles to download any example code or extra resources for this book. The code files available for download will then be displayed.

    Note

    The downloadable files contain instructions on how to use them.

    Errata

    Although we have taken every care to ensure the accuracy of our contents, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in text or code—we would be grateful if you would report this to us. By doing this you can save other readers from frustration, and also help to improve subsequent versions of this book.

    If you find any errata, report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the Submit Errata link, and entering the details of your errata. Once your errata have been verified, your submission will be accepted and the errata added to the list of existing errata. The existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

    Questions

    You can contact us at <questions@packtpub.com> if you are having a problem with some aspect of the book, and we will do our best to address it.

    Chapter 1. Introducing Spam

    Spam is an often-used term, but as with many terms, it means different things to different people. This chapter defines the term 'spam' as used in this book and reviews its history. By examining the economics and costs involved with spam, we will explain why spam has become so invasive to modern computing. Finally, we will describe the current legal position against spam.

    Defining Spam

    Spam, in computing terms, means something unwanted. It has normally been used to refer to unwanted email or Usenet messages, and it is now also being used to refer to unwanted Instant Messenger (IM) and telephone Short Message Service (SMS) messages. Spam email is unwanted, uninvited, and inevitably promotes something for sale. Often the terms junk email, Unsolicited Bulk Email (UBE), or Unsolicited Commercial Email (UCE) are used to refer to spam email. Spam generally promotes Internet-based sales, but it occasionally promotes telephone-based or other methods of sales too.

    People who specialize in sending spam are called spammers. Companies pay spammers to send emails on their behalf, and the spammers have developed a range of computerized tools and techniques to send these messages. Spammers also run their own online businesses and market them using spam email.

    The term 'spam email' generally excludes email from known sources, however unwanted the content is. One example of this would be an endless list of jokes sent from acquaintances. Email viruses, trojan horses, and other malware (short for malicious software) are not normally categorized as spam either, although they share some common
