Spam Detection Anyone?

This paper from the MSN research team, published in 2004, offers some insight into what MSN thinks spam is and how they planned to combat it…

Spam, Damn Spam and Statistics (PDF)

Most if not all of the SEO-generated pages exist solely to
(mis)lead a search engine into directing traffic towards the
“optimized” site; in other words, the SEO-generated pages
are intended only for the search engine, and are completely
useless to human visitors.

I’ve linked to it from the research papers section as well.


  1. I read through a paper by MSN and UCLA at the 2006 WWW conference in Edinburgh – some “interesting” results:

    Blogged about some of the results here:

    As with any paper, it’s all down to speculation about what actually hits the SERPs in the end, but there are some interesting results, like the research labelling 70% of .biz domains as spam (and 37% of .us domains).


  2. DG

    It really is almost impossible to tell what ideas get implemented. I find the papers useful for those instances when someone says, ‘they can’t do that’, or ‘that would never work’, and then the methodology is found in a paper.

    I wonder what percentage of triple hyphenated domains are spam? ;)

  3. It’s kinda like forum speculation really – you just need to take what you want from it and see what works, although patents, etc. tend to have a little more basis in reality! ;)

    Funnily enough I just saw some health campaign advertised on TV the other day using a quadruple hyphenated domain (slogan) as the main URL – big business catching up…with 2002. :)


  4. WordPress Trackback Spam!!!
    I have installed plugins that prevent comment spam, but they don't block trackback spam. I've been spammed via trackbacks by many
    MFA websites, most probably all from the same network, yet they are not actually linking to me on their sites. May I
    know how they do it, and how I can stop it without disabling trackbacks?
    Thanks, and I'm using WordPress.

