
Google's news algorithm serves up penis pills

This story has: Maths, Google, 'fake' pharma, explicit screenshots...

By Jude Karabus, 12 Jun 2017

+Comment Our Monday here at The Reg's London offices has been cheered no end by Google News, which has been spitting out odd pharmaceutical-related "journalism" throughout the day.

Over the past few days*, we've received several emails from readers about the issue, and we managed to snap a few explicit screenshots ourselves as the catchy, SEO-sweetened titles flashed across Mountain View's news aggregator. They included "Penis after taking levitra professional – Levitra professional mail order telephone number" and "Cialis jelly review – Cialis oral jelly review".

The Google-Translate-esque headlines seemingly all emanate from the website of The Boyne City Gazette, but in fact redirect to various Canadian pharmacy sites and gambling sites. An expert told us that the redirect is implemented in WordPress. We have asked the website for comment.

At the time of publication, the dodgy links were dominating the "Health" section of Google's News aggregator in the US.

In its support pages, the market-dominating search-'n'-ads colossus tells publishers:

In general, Google News aims to promote original journalism, as well as to expose users to diverse perspectives.

It confirms that no "human editors [are] selecting stories or deciding which ones deserve top placement".

Ranking in Google News, apparently, is determined by a number of factors, including:

  • freshness of content
  • diversity of content
  • rich textual content
  • originality of content

It also, like PageRank, looks at several technical factors – including clear conceptual page hierarchy and easy-to-crawl links.

Google last updated its Search Ranking algorithm, PageRank, in April, partly as a reaction to complaints that the search giant was promoting "fake news" in its search results. We don't know when the Google News algo was last updated, although many SEO "gurus" feverishly speculate about this. Google filed this patent in 2012, but we don't know how much it has moved on since then.

Some critics, like mathematician Dr Hannah Fry, have spoken of the tension between algorithms "behind closed doors" and the people behind the data they sift.

How did Penis news-gate happen?

Readers can still visit the Boyne City Gazette's website directly without being redirected to drug and casino sites, it seems.

Techies at The Reg commented: "The way this seems to work is… once every while if you go on that page you get directed to a spam site." Apparently, the trick is "if your referrer is Google News, redirect to spam site."

$ curl -vs -H 'Referer: https://news.google.com/' 'http://boynegazette.com/?f16ru=1802997573'
*   Trying 50.62.120.1...
* Connected to boynegazette.com (50.62.120.1) port 80 (#0)
> GET /?f16ru=1802997573 HTTP/1.1
> Host: boynegazette.com
> User-Agent: curl/7.43.0
> Accept: */*
> Referer: https://news.google.com/
>
< HTTP/1.1 200 OK
< Date: Mon, 12 Jun 2017 13:52:31 GMT
< Server: Apache
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Connection #0 to host boynegazette.com left intact
<html><head><style>html, body, div, iframe {margin:0;padding:0;height:100%;}iframe {display:block;width:100%;border:none;}</style></head><body><div><iframe src="http://shop.medcom.top/search.html?key=wellbutrin xl"></iframe></div></body></html>

Without the 'Google News' referrer, their configuration bits take over and simply redirect to www:

$ curl -vs -H 'Referer: -' 'http://boynegazette.com/?f16ru=1802997573'
*   Trying 50.62.120.1...
* Connected to boynegazette.com (50.62.120.1) port 80 (#0)
> GET /?f16ru=1802997573 HTTP/1.1
> Host: boynegazette.com
> User-Agent: curl/7.43.0
> Accept: */*
> Referer: -
>
< HTTP/1.1 301 Moved Permanently
< Date: Mon, 12 Jun 2017 13:58:17 GMT
< Server: Apache
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Set-Cookie: PHPSESSID=cg531kpfkorbet1hcu2m1s6iu3; path=/
< Location: http://www.boynegazette.com/?f16ru=1802997573
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8
<
* Connection #0 to host boynegazette.com left intact
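
The giveaway in the first response above is a 200 OK whose entire body is an iframe pointing at somebody else's domain. A minimal Python sketch of how one might flag that pattern in a fetched page follows. The function name `offsite_iframes` and the "same host or subdomain" rule are our own assumptions for illustration, not anything the Gazette, Google or our techies actually run.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def offsite_iframes(body, page_host):
    """Return iframe URLs in `body` whose host is neither `page_host`
    nor one of its subdomains. Hypothetical helper, for illustration only."""
    collector = _IframeCollector()
    collector.feed(body)
    offsite = []
    for src in collector.sources:
        host = urlparse(src).hostname  # None for relative URLs
        if host and host != page_host and not host.endswith("." + page_host):
            offsite.append(src)
    return offsite
```

Fed the body returned to the Google News referrer, with `boynegazette.com` as `page_host`, this would report the shop.medcom.top iframe. A page that only frames its own www host would come back clean.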

Alphabet Inc – the multibillion-dollar ads-flinging, services-shilling host, software developer, AI proponent and hardware-maker – was built on algorithms, a search algorithm to be specific, once the Stanford University research project of PhD candidates Messrs Larry Page and Sergey Brin. The algorithm uses the link structure of the web to calculate a "quality" ranking for each page, which is then reflected in results the search engine spits out. It also relies on the authority of some of the source links to create the ranking. The Google News algorithm is understood to be a separate beast to PageRank.
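
For the curious, the link-structure idea can be sketched in a few lines of Python. This is the textbook power-iteration form of PageRank, not whatever Google runs in production today. The damping factor of 0.85, the iteration count and the function name `pagerank` are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to roughly 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its outlinks,
                # so a link from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank
```

In a toy web where pages "a" and "b" both link to "c", and "c" links back to "a", the page with two inbound links comes out on top, and "a" outranks "b" because its single inbound link comes from the best-ranked page.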

We've asked the Boyne City Gazette for comment and will update if we hear more.

+Comment

While rude headlines on Google News are just an amusing anecdote to a discerning reader like you – who gets their news from websites they know to be reliable (eh, eh?), rather than Facebook, the front page of YouTube or Alphabet's news aggregator – they should give us all some food for thought.

Alphabet Inc is, of course, punting an ever-increasing stack of tools and services: cloud services, office tools and software. It is also now heavily pushing its machine learning and AI, like the rest of Silicon Valley, with projects like the Cloud Vision API, which classifies and tags photos automatically by detecting your face using an AI-powered visual search tool. It's also helping you block ads on its tremendously popular browser, Chrome.

You could say that clever algorithms, indexing, crawling and search are core competencies of the firm. This sort of hijacking shouldn't be able to work, and definitely not for hours on end.

It's the AGM in just 11-and-a-bit months. Is it time to call in the nuns? ®

*Hattip to the readers who alerted us to the issue, Rob, Thomas and K. At least one says he has been seeing them for weeks but most of the readers wrote in yesterday.

The Register - Independent news and views for the tech community. Part of Situation Publishing