Google algorithm change squashes code geek 'webspam'
More Stackoverflow, please
By Cade Metz • In Media • At 19:16 GMT 28th January 2011
Google has rolled out an update to its search algorithms designed to reduce "webspam", aka "the junk you see in search results when websites try to cheat their way into higher positions in search results or otherwise violate search engine quality guidelines".
In short, says Google principal engineer and search quality guru Matt Cutts, the company's search engine will give preference to sites that generate original content over sites that lift content from elsewhere. Google is pushing back against so-called content farms – at least a little. The algorithm change affects a relatively small number of search results. According to Cutts, searchers will "notice" the change on less than 0.5 per cent of queries.
A week ago, in response to several stories complaining of Google search spaminess, Matt Cutts unloaded a blog post defending the company's search engine. "According to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness," he said. "Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English."
But at the same time, Cutts acknowledged a "slight uptick" in spam in recent months, and he said that Google was "evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content." And on Friday, with a post to his personal blog, Cutts announced that this change went live earlier in the week.
He said that the change would affect about two per cent of all Google search queries, but that users would actually notice something on less than 0.5 per cent of queries. "It's a pretty targeted launch," he said. "The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content."
In a post to Hacker News, Cutts mentions two programming-centric queries where the change comes into play: "pass json body to spring mvc" and "aws s3 emr pig". Apparently, both were giving preference to a site called efreedom that has copied content from stackoverflow.com, rather than promoting the original stackoverflow links. And now they don't.
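Spotting that one page merely copies another is, at its simplest, a near-duplicate-text problem. Here's a toy sketch – emphatically not Google's actual signal, and the example strings are invented – using word shingles (overlapping n-grams) and Jaccard similarity, a standard textbook approach to duplicate detection:

```python
# Toy near-duplicate detection via word shingles and Jaccard similarity.
# This is an illustrative sketch, NOT Google's actual ranking signal.

def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical page texts for illustration only.
original = "how do I pass a json body to a spring mvc controller method"
scraped = "how do I pass a json body to a spring mvc controller method"
unrelated = "configuring an aws s3 bucket as input for an emr pig job"

print(jaccard(shingles(original), shingles(scraped)))    # 1.0 - verbatim copy
print(jaccard(shingles(original), shingles(unrelated)))  # 0.0 - nothing shared
```

A real system would work at web scale with tricks like MinHash to avoid comparing every pair of pages, and a verbatim scrape is the easy case – the hard part, as the "low levels of original content" wording suggests, is catching lightly rewritten copies.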
"An example would be that stackoverflow.com will tend to rank higher than sites that just reuse stackoverflow.com's content," Cutts said. "Note that the algorithmic change isn't specific to stackoverflow.com though." But he did not give other examples. ®