Half Way There....
Now you have a script that will teach Spam Assassin to recognize spam. But Spam Assassin won't activate the BAYES rules until it has learned enough of both kinds of mail: by default, 200 SPAM and 200 'ham' (non-spam) messages.
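If you have shell access, `sa-learn --dump magic` prints how many spam and ham messages the Bayes database has learned so far. Here is a rough sketch of reading those counts in Python. The sample text and its numbers are made up for illustration, the function names are my own, and the 200/200 thresholds assume the stock `bayes_min_spam_num` and `bayes_min_ham_num` settings, so adjust them if your setup differs.

```python
# Made-up sample in the shape printed by `sa-learn --dump magic`.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        162          0  non-token data: nspam
0.000          0         95          0  non-token data: nham
0.000          0      21040          0  non-token data: ntokens
"""


def learned_counts(dump_text):
    """Pull the nspam and nham counters out of the dump text."""
    counts = {}
    for line in dump_text.splitlines():
        parts = line.split()
        # Counter lines look like: score, 0, VALUE, 0, "non-token", "data:", NAME
        if len(parts) >= 7 and parts[4:6] == ["non-token", "data:"]:
            if parts[6] in ("nspam", "nham"):
                counts[parts[6]] = int(parts[2])
    return counts


def bayes_ready(counts, min_spam=200, min_ham=200):
    """True once both counters reach the (assumed default) thresholds."""
    return (counts.get("nspam", 0) >= min_spam
            and counts.get("nham", 0) >= min_ham)


counts = learned_counts(SAMPLE_DUMP)
print(counts, "Bayes active:", bayes_ready(counts))
```

With the sample above, neither counter has reached 200 yet, so the BAYES rules would still be switched off.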
The best way to compile that many messages is to have each user 'pre check' their email with HORDE before downloading messages to their computer.
To do this you must disable 'auto-checking' in your mail program (Outlook Express, etc.). Then open webmail for your account. Open the inbox, then select Folders. Create 2 new folders - SPAM and HAM. You must use those folder names exactly, because those are the names the script searches for.
Now, when you find a message that IS spam in your inbox, MOVE it to the SPAM folder.
And copy a bunch of your 'good' mail messages to the HAM folder. Copy is the best thing here, because the script will purge that folder after each run.
Of course, with HORDE you can look at the contents of these folders. They should have a similar number of messages in them when you start the process. As it runs, however, the spam folder will continue to contain older spam messages. The reason for this is that in the event Cpanel upgrades Spam Assassin, or your bayes-database gets corrupted for any other reason, you want to have a library of about 500 spam messages to 'relearn'. You should go through the SPAM folder every month or so and delete the oldest messages once you have 500 in the folder.
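The monthly clean-up boils down to "keep the newest 500, drop the rest". Here is a small sketch of that selection logic. It only decides which messages would go and deletes nothing; the function name and the (name, timestamp) input format are my own assumptions, since how you list the folder depends on how your host stores mail.

```python
def messages_to_delete(messages, keep=500):
    """Return the names of the oldest messages beyond the newest `keep`.

    `messages` is a list of (name, unix_timestamp) pairs.
    """
    newest_first = sorted(messages, key=lambda m: m[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]


# Tiny example with keep=4 instead of 500 so the result is easy to see.
folder = [
    ("msg-a", 1000), ("msg-b", 5000), ("msg-c", 3000),
    ("msg-d", 2000), ("msg-e", 6000), ("msg-f", 4000),
]
print(messages_to_delete(folder, keep=4))
```

In the example the two oldest messages (msg-a and msg-d) are the ones picked for deletion; a folder with 500 or fewer messages returns an empty list.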
Personally, I try to have 40 HAM's in the folder each time the script runs on my domain - 3 times per week.
You can accumulate MORE spam by modifying the "autodelete" rule from the last section.
If you remove that rule and instead tell the mail filter to forward ALL MESSAGES scoring over 10 to a separate EMAIL address (mine is 'mailtrap'), you can log into the mailtrap account every couple of days and move its messages into its own SPAM folder. That helps SA learn from REAL spam...
If you get a FALSE POSITIVE - meaning a message that scored as spam but is actually good mail - make sure you copy it into the HAM folder.
Setting the Cron Job
Click the CRON JOB icon in Cpanel.
Click STANDARD mode. You'll see a screen like this:
Enter your mail address in the 'mail to' box
Set up your CRON job to run every couple of days at a certain time. I set mine up for NOON because a significant portion of SPAM seems to arrive between 11 pm and 10 am, and I want to get it while it's fresh. So, if you set yours up to run 3 times a week (hold the CTRL key to select multiple days) your screen will look like this:
Press the 'save crontab' button. NOTE - see the path in the command box? That is the path to the learning script. The part that is wiped out there is my 'userID' for Cpanel, so replace it with whatever yours is.
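For anyone who prefers typing the entry by hand instead of using STANDARD mode, the same schedule is a single crontab line: minute, hour, day of month, month, day of week, then the command. Here is a small sketch that builds that line. The weekdays (Monday, Wednesday, Friday) and the script path are placeholders of mine, not the real values from my setup, so substitute your own.

```python
def cron_line(hour, weekdays, command, minute=0):
    """Build a crontab line that runs `command` at hour:minute on the given weekdays.

    Weekdays use cron numbering: 0 = Sunday ... 6 = Saturday.
    """
    if not 0 <= hour <= 23 or not 0 <= minute <= 59:
        raise ValueError("hour must be 0-23 and minute 0-59")
    if not weekdays or any(not 0 <= d <= 6 for d in weekdays):
        raise ValueError("weekdays must be numbers from 0 to 6")
    days = ",".join(str(d) for d in sorted(set(weekdays)))
    return f"{minute} {hour} * * {days} {command}"


# Hypothetical path: replace USERID and the script name with your own.
print(cron_line(12, [1, 3, 5], "/home/USERID/learn-script"))
```

This prints `0 12 * * 1,3,5 /home/USERID/learn-script`, which means noon on three days of the week.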
You're done.
If you've followed these instructions and mimicked my installation exactly, you'll have an email from the 'cron daemon' 3 times per week, telling you how many messages it processed:
Code:
Learning SPAM
Processing /home/youraccount/mail/domain-name/john/SPAM
Learned from 13 message(s) (110 message(s) examined).
Processing /home/youraccount/mail/domain-name/mailtrap/SPAM
Learned from 62 message(s) (138 message(s) examined).
Learning HAM
Processing /home/youraccount/mail/domain-name/john/HAM
Learned from 17 message(s) (20 message(s) examined).
Done
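If you want to keep an eye on these reports over time, the counts are easy to pull out. Here is a minimal sketch that totals the "learned" and "examined" numbers per section, using the report above as input. The function name is my own. Note that "examined" is usually much bigger than "learned" because sa-learn skips messages it has already learned, which is exactly what you'd expect with older spam left sitting in the SPAM folder.

```python
import re

REPORT = """\
Learning SPAM
Processing /home/youraccount/mail/domain-name/john/SPAM
Learned from 13 message(s) (110 message(s) examined).
Processing /home/youraccount/mail/domain-name/mailtrap/SPAM
Learned from 62 message(s) (138 message(s) examined).
Learning HAM
Processing /home/youraccount/mail/domain-name/john/HAM
Learned from 17 message(s) (20 message(s) examined).
Done
"""

LEARNED_RE = re.compile(
    r"Learned from (\d+) message\(s\) \((\d+) message\(s\) examined\)\."
)


def summarize(report):
    """Total the learned/examined counts under each 'Learning X' heading."""
    totals = {}
    section = None
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Learning "):
            section = line.split(None, 1)[1]
            totals[section] = {"learned": 0, "examined": 0}
            continue
        match = LEARNED_RE.match(line)
        if match and section is not None:
            totals[section]["learned"] += int(match.group(1))
            totals[section]["examined"] += int(match.group(2))
    return totals


print(summarize(REPORT))
```

For the report above this gives 75 learned out of 248 examined for SPAM, and 17 out of 20 for HAM.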
And as Bayes kicks in, you'll start seeing stuff like this in the message header...
Code:
Content analysis details:   (6.4 points, 4.5 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.5 RCVD_NUMERIC_HELO      Received: contains a numeric HELO
 0.2 HTML_MESSAGE           BODY: HTML included in message
 1.7 BAYES_80               BODY: Bayesian spam probability is 80 to 90%
                            [score: 0.8364]
 3.0 FORGED_RCVD_HELO       Received: contains a forged HELO
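Notice - the BAYES rule added 1.7 of the 6.4 points. Without it this message would have scored 4.7, only just over the 4.5 needed to be flagged; with it, the message is marked with room to spare. We're succeeding in marking up MORE SPAM!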
Thanks for listening... I hope you all enjoyed your lesson in 'how to make Spam Assassin work for you'.