MICROSOFT SPAM FILTER SOFTWARE
In an experiment, AlgorithmWatch sent a few hundred emails to 10 email inboxes at Gmail, Yahoo, Outlook, GMX and LaPoste (the last two are used by millions of people in Germany and France, respectively). All accounts were created specifically for the experiment. The results, which are available online, show that Microsoft Outlook considers the following as spam:
- An internship application from a Nigerian student. The same email with the word “Nigeria” removed was delivered to the inbox.
- A description of a sex education program. The same email was delivered to the inbox after all instances of “sex” were removed, but leaving even a single instance sent it to the spam folder.
- An excerpt from a speech by Joe Biden on student debt. Removing the words “loan”, “investment” and “billion” from a similar email resulted in its delivery to the inbox.
Spam detectors at other providers did not display the same behavior. Outlook was the only provider where we could identify the words that triggered the spam filter. It is unlikely that an Outlook engineer wrote an explicit rule to mark any message that contains “Nigeria” as spam. Instead, a machine learning algorithm probably identified “Nigeria” as a strong discriminator between spam and non-spam messages. Microsoft does not make the training data set of its spam filter available to researchers.
SpamAssassin is a spam filter developed by the Apache Software Foundation. It is widely used by organizations that maintain their own email servers. Unlike most commercial offerings, SpamAssassin’s code is open-source and can be reviewed. While SpamAssassin’s rules change daily, its default configuration files single out words like “Ivory Coast”, “Nigeria” or “Nigerian government” as spammy. The phrase “Oprah!”, a reference to the African-American entertainer Oprah Winfrey, is listed as potentially spammy, though the rule is currently inactive.
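How a learned filter, rather than a hand-written rule, could come to treat a word such as “Nigeria” as a spam signal can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the training messages are invented, the function names (`doc_counts`, `log_odds`, `score`) are hypothetical, and the smoothed log-odds scoring is a textbook naive Bayes-style technique, not Microsoft’s method, which is not public.

```python
import math
from collections import Counter

# Toy, invented training data. This is NOT Microsoft's data set,
# which is not available to researchers.
SPAM = [
    "urgent transfer nigeria prince needs your bank details",
    "claim your loan nigeria government fund today",
    "investment offer billion dollar transfer",
]
HAM = [
    "meeting notes attached for the project review",
    "please find my internship application attached",
    "lunch on friday to discuss the project",
]


def doc_counts(messages):
    """Count, for each token, how many messages contain it."""
    counts = Counter()
    for message in messages:
        counts.update(set(message.split()))
    return counts


def log_odds(token, spam=SPAM, ham=HAM, alpha=1.0):
    """Smoothed log-odds that a message containing `token` is spam.

    `alpha` is add-one style smoothing so unseen tokens score 0
    instead of causing a division by zero.
    """
    spam_counts, ham_counts = doc_counts(spam), doc_counts(ham)
    p_spam = (spam_counts[token] + alpha) / (len(spam) + 2 * alpha)
    p_ham = (ham_counts[token] + alpha) / (len(ham) + 2 * alpha)
    return math.log(p_spam / p_ham)


def score(message):
    """Sum token log-odds; a positive total leans towards spam."""
    return sum(log_odds(token) for token in set(message.lower().split()))


if __name__ == "__main__":
    # Mimics the word-removal test: same message, with and without one word.
    for text in ("internship in nigeria", "internship in"):
        label = "spam" if score(text) > 0 else "inbox"
        print(f"{text!r}: {label}")
```

In this toy data, “nigeria” appears only in spam messages, so it receives a positive weight without anyone having written a rule about it, and removing that single word is enough to flip the outcome. A production filter would use far more data and features, but the same dynamic could explain the behavior observed in the experiment.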