Tuesday, May 7, 2013

Spam: Stupid Pointless Annoying Messages?

I meant to post this as part of the post on AI, but I felt that some of this information fell outside the scope of that post. So, why a post on spam? Well, it's something all of us have run into in some way, and it makes a wonderful case study, as you are about to find out.

I’m sure most of you have at one time or another looked at your inbox, found the countless spam messages, and thought “why?”. What could these spammers, who go to such extreme lengths as compromising other people’s computers to add to their botnets, possibly hope to gain? Well, like most things, it boils down to money. Garry Pejski has an interesting DEFCON presentation on his time as a malware developer. Some reputable company, say GM or IKEA, will want to advertise its product or service in the hopes of attracting more customers. Another reputable middleman company, say Google, creates ads for these companies for a price and offers others a cut of this money if they display the ads on their website or product. Here’s the problem: some unscrupulous individuals look to capitalize on these offers by getting as many people as possible to view the ads, whether or not those people would ever buy any of the products.
There are other reasons why spammers do what they do. Some could be trying to perform a phishing attack. But here’s a more interesting case, also coming out of DEFCON, by Grant Jordan on the use of spam to affect the stock market. A spammer buys a stock, spams others to buy that stock, and then sells when the price hits its peak. Some interesting things brought up in this study: similar-looking ads came from the same spammer and would usually perform about the same. Also, due to the advent of spam-catching techniques, text-based spam never worked.
Spam filtering is actually a highly intellectual field of computer science. It relies on machine learning, and there are two ways it can be performed: using classification in the case of supervised learning, or using clustering in the case of unsupervised learning. Both use features of a message (the tokens that occur, the number of capital letters) to add it to a group with similar features. In the case of classification, training data provides labeled examples of the groups (spam and non-spam), while clustering uses the input data itself to define the groups and may only require the number of clusters to start the algorithm.
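To make the difference between the two approaches concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption and not taken from any real spam filter: the four toy messages, the helper names (`features`, `NaiveBayes`, `kmeans`), the choice of naive Bayes as the classifier, and the choice of k-means as the clustering algorithm. The classifier learns from labeled examples, while the clustering routine only gets the feature vectors and the number of clusters.

```python
import math
from collections import Counter

# Toy data, made up purely for illustration.
MESSAGES = [
    ("WIN FREE MONEY NOW", "spam"),
    ("BUY this STOCK NOW", "spam"),
    ("lunch at noon tomorrow", "ham"),
    ("notes from the meeting", "ham"),
]

def features(message):
    """Return the two features mentioned above: tokens and capital-letter count."""
    tokens = message.lower().split()
    capitals = sum(1 for ch in message if ch.isupper())
    return tokens, capitals

class NaiveBayes:
    """Supervised: needs labeled training messages (spam / ham)."""
    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, message, label):
        tokens, _ = features(message)
        self.token_counts[label].update(tokens)
        self.message_counts[label] += 1

    def classify(self, message):
        tokens, _ = features(message)
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label, counts in self.token_counts.items():
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.message_counts[label] / total_messages)
            total_tokens = sum(counts.values())
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

def kmeans(points, k, iterations=10):
    """Unsupervised: only needs the points and the number of clusters k."""
    centroids = [points[i] for i in range(k)]  # simple deterministic start
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]

# Classification: train on the labeled examples, then label new messages.
nb = NaiveBayes()
for text, label in MESSAGES:
    nb.train(text, label)
prediction_spam = nb.classify("free money now")
prediction_ham = nb.classify("meeting at noon")

# Clustering: ignore the labels and group on (capital letters, token count).
points = [(features(text)[1], len(features(text)[0])) for text, _ in MESSAGES]
labels = kmeans(points, k=2)

print(prediction_spam, prediction_ham, labels)
```

Note that the clustering step never sees the words "spam" or "ham". It only reports which messages ended up in the same group, and it is up to a person (or a later step) to decide which group is the spam one. With only four hand-picked messages this is a toy, so don't read anything into how cleanly it separates them.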
