Roses are red, violets are blue, fake-news-detecting AI is fake news, too
Humanity's bulls*** is too much for software
Analysis The viral spread of fake news and “alternative facts” has rocked Western politics. Oxford Dictionaries chose “post-truth” as its word of 2016, and when a society is scolded by a dictionary wielding a hyphenated word, you know you've collectively screwed up.
“The concept of post-truth has been in existence for the past decade, but Oxford Dictionaries has seen a spike in frequency this year in the context of the EU referendum in the United Kingdom and the presidential election in the United States. It has also become associated with a particular noun, in the phrase ‘post-truth politics’,” the Brit word wizards tutted.
Yes, there have always been dodgy facts on the internet, and in newspapers read daily by millions. However, misinformation toward the end of 2016 was spreading at an alarming rate, thanks to the greasy tubes of social networks, SEO-doped Macedonian teens, and electorates dying to soak up words that reinforced their political worldview.
Who do we turn to, to end this scourge? Artificial intelligence, right?
Trapped in a perpetual cycle of hype, machine intelligence has been heralded as the miracle cure for society’s woes: cancer, climate change, inequality, crime, you name it. Get a bunch of data, fire up the GPUs, and use deep learning. Voila!
Superintelligent machines needed, please apply here
Dean Pomerleau and Delip Rao, AI tech entrepreneurs, thought so when they launched the Fake News Challenge (FNC), a contest that encourages AI researchers to invent algorithms that can filter out clickbait and fabrications from streams of news articles.
Initially, Pomerleau and Rao thought the winning software in their challenge would be able to detect and highlight baseless assertions all by itself with no human intervention. “I made a casual bet with my machine learning friends, and thought it’d be trivial to apply the same techniques used in spam filtering and detecting bogus websites for fake news,” Pomerleau told The Register. “I came into [the Fake News Challenge] naively.”
After chatting to more machine-learning experts and journalists, the pair realized identifying deceptive editorial copy was a murky business.
There are simple facts that can be easily verified – such as the height of the Statue of Liberty and the name of the UK Prime Minister. Then there are truths that are harder to prove, such as whether or not something was an accident, or if two leaders really were friends or had secretly fallen out. There are truths that require anonymous sources who need protecting, and there are truths that are covered up and officially denied.
It is difficult for even humans to assess what is real and what isn't, let alone machines: how many people fall for the Borowitz Report in the New Yorker every week, for example? Training machines to pick out complex truths from fiction would be an arduous task, considering there isn't a clean database with a complete list of verified facts.
The system would have to trawl through the entire internet to gain enough knowledge and wisdom to be able to label news as legit or made up. “It would need a very subtle understanding and reasoning of the world to arrive at a conclusion,” said Rao.
Zachary Lipton, a machine learning researcher at the University of California, San Diego, was highly critical of the first version of the contest. Building software to spit out a “boolean fakeness indicator” – a 1 or 0 for a true or false news article – and a confidence score for each URL, would be “problematic,” Lipton wrote in a blog post.
Pomerleau and Rao have since changed their minds, and now believe a fully automated truth-labelling system is “virtually impossible” with today's AI and natural-language processing. Building a supervised classifier able to tell right from wrong would require artificial general intelligence, or even superintelligence, the duo told The Register.
The second version of the competition calls for code that can perform “stance detection” instead. Claims in headlines are tested against the contents of a story. You give the headline and the text beneath to an algorithm, and the output should be one of four categories:
- Agrees: The body text agrees with the headline.
- Disagrees: The body text disagrees with the headline.
- Discusses: The body text discusses the same topic as the headline, but does not take a position.
- Unrelated: The body text discusses a different topic than the headline.

Stance detection will allow human fact checkers to surface stories that hold evidence for or against a given claim, so they can judge the accuracy of information quickly.
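To get a feel for the coarsest part of the task, here's a toy sketch that separates "unrelated" headline–body pairs from related ones using simple word overlap. The heuristic and the threshold are illustrative assumptions only; a real FNC entry would use a trained model to choose between all four labels.

```python
def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set("".join(c if c.isalpha() else " " for c in text.lower()).split())

def stance_guess(headline, body, threshold=0.2):
    """Return 'unrelated' if headline and body share too few words.

    Otherwise return 'discusses' as a stand-in for the finer-grained
    agree/disagree/discusses decision a trained classifier would make.
    """
    head = tokenize(headline)
    if not head:
        return "unrelated"
    overlap = len(head & tokenize(body)) / len(head)
    return "unrelated" if overlap < threshold else "discusses"

print(stance_guess("Statue of Liberty is 93 metres tall",
                   "The Statue of Liberty stands 93 metres from ground to torch."))
print(stance_guess("Statue of Liberty is 93 metres tall",
                   "Quarterly earnings beat analyst expectations."))
```

Even this crude filter shows why "unrelated" is considered the easy bucket: it needs no understanding of the claim at all, only topical similarity.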
The AI that can do this with the highest degree of accuracy is the winner.
It’s important to note that the winning program won't solve the fake news problem, Lipton said. But it might help to lighten the load on fact checkers, or at least steer readers away from clickbait. “[It’s] better to start with [something] modest but concrete [rather] than magical and infeasible. I think [stance detection] is a strong move in the right direction. It’s also a good opportunity to identify a community of talented researchers committed to worthwhile causes,” he told us.
The number of teams registering for the FNC has shot up since a training dataset was released earlier this month. It’s gone from 72 to 206 coding crews in just under two weeks. A cash prize is on offer although the exact figure is yet to be confirmed, as Pomerleau and Rao are looking for sponsors willing to contribute financially.
What could you do with fifty grand, though?
Another group tackling the same problem is Full Fact, an independent fact-checking organization in the UK.
Armed with a €50,000 grant from Google’s Digital News Initiative, Full Fact was one of three teams to win funding for tackling fake news.
“Fact checking is difficult. Everyone thinks it’s a matter of yes or no, but it’s not that simple. It’s complex, it requires a lot of nuance – something that computers aren’t good at,” said Mevan Babakar, digital products manager at Full Fact.
The human fact checkers over at Full Fact don’t label information simply as true or false either. Evidence supporting and undermining a claim is laid out, and it’s up to the public to make up their own minds.
There are levels of complexity to the problem, Babakar explained. “For example, something like population numbers can be checked against data – that’s easy for computers and could be automated. But for claims like ‘the NHS is in crisis’ – that requires interpreting different datasets and meanings, so it’s not something a computer can do.”
“Human fact checkers come with a body of experience; they know the methodology behind the data and they know its limits,” she added.
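The "easy" end of the spectrum Babakar describes can be sketched in a few lines: compare a numeric claim against a reference dataset. The figures, the function name, and the tolerance below are illustrative assumptions, not Full Fact's actual data or methodology.

```python
# Hypothetical reference data -- not real Full Fact figures.
REFERENCE_POPULATIONS = {
    "united kingdom": 67_000_000,
    "france": 68_000_000,
}

def check_population_claim(country, claimed, tolerance=0.05):
    """Label a numeric claim against the reference dataset.

    Returns 'supported' if the claimed figure is within `tolerance`
    (a relative fraction) of the reference, 'contradicted' if not,
    and 'unverifiable' if we hold no data for that country.
    """
    actual = REFERENCE_POPULATIONS.get(country.lower())
    if actual is None:
        return "unverifiable"
    relative_error = abs(claimed - actual) / actual
    return "supported" if relative_error <= tolerance else "contradicted"

print(check_population_claim("United Kingdom", 66_500_000))
print(check_population_claim("United Kingdom", 80_000_000))
print(check_population_claim("Narnia", 1_000))
```

The hard cases Babakar mentions – "the NHS is in crisis" – have no reference row to look up, which is exactly where the lookup approach stops working.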
Full Fact is turning away from the glitz and glamour of AI and machine learning, and is instead focusing on customizing Apache Solr, an open-source search engine, with APIs that will collate information on repeated claims made across the internet and on television.
The team uses natural language processing with search patterns and queries that monitor the spread of information and locate the primary source needed to judge the accuracy of information.
The search engine will power two Full Fact tools: Trends and Live. Trends works similarly to Google Trends: it reveals who is repeating inaccurate claims, allowing Full Fact to quickly trace the spread of misinformation and ask the journalists responsible for the errors to make corrections. Meanwhile, Live lets the charity flag claims and fact-check them in real time during parliamentary debates or on TV.
“People are much more attuned to misinformation now, because they’ve seen so much of it in such a short amount of time,” Babakar said. But unlike the Fake News Challenge, Full Fact is looking for short-term remedies that can be rolled out in the next six months.
“Machine learning and AI over-promise but under-deliver. It’s something I believe doesn’t require a five-year PhD in machine learning or neural nets. Some of the technology is pretty basic and we are already seeing it work in Trends and Live.”
“It’s an interesting space, though,” Babakar said. “I wonder how machines will deal with things like satire. One person’s satire can be another person’s fake news.” ®