Lies, damn lies and election polls: Why GE2015 pundits fluffed the numbers so badly
The lessons of shaping a mathematical 'reality'
Whatever you may think about the outcome of last Thursday’s General Election, there is one issue on which public, politicians and pundits alike seem to be broadly united: how badly the opinion pollsters fared. They got it very wrong!
Egregiously so, according to the editor of the Market Oracle, an online financial forecasting service. Nadeem Walayat wrote: “The Guardian and the rest of the mainstream press have effectively wasted tens of millions of £'s on worthless opinion polls.”
Is this fair? Whose fault is it that we have all spent the last five weeks absolutely convinced that the future was multi-party? And does it really matter?
Let’s start with what the polls were telling us. In that key pre-election month, no significant poll was forecasting anything other than a hung parliament. Precise figures varied from poll to poll, but the average was fairly consistent. No party was pulling ahead dramatically in percentage terms, so no party was on course for an overall majority.
Take the ICM poll of polls, reported in The Guardian on the morning of the election and based on a weighted average of all constituency-level polls, national surveys and polling in the regions. This put Tories and Labour within a whisker of one another, around the 34 per cent mark.
From that, The Guardian suggested a dead heat, with both parties ending up on 273 seats, the SNP capturing 52, and the Lib Dems hanging on to 27.
There were, in fact, two quite separate failures which have been elided into one by the popular press – and two separate explanations. The first relates to share of vote.
| Party | Predicted votes | Actual votes |
Table 1: The ICM Poll of Polls vote forecast (6 May 2015) vs. actual votes cast (7 May 2015)
The figures above and just below are far less wrong than you might suppose. UKIP, Lib Dem and Other were all respectably close to the actual result. Neither the Lib Dems nor the Greens surprised us by taking 50 per cent of the vote on the day. If such a proposition sounds fanciful, it helps us focus on what we might consider a truly inaccurate forecast.
Chart: ICM Poll of Polls vote forecast (6 May 2015) vs. actual votes cast (7 May 2015)
The real problem lies in the Labour/Tory gap: the polls had the two parties neck and neck, yet on the day the Tories finished some six and a half percentage points ahead.
Lies, damn lies and statistics
There is of course a “statistical” explanation for this. As every polling organisation will patiently explain many times over, forecasts don’t tell you precisely what the result will be. They are subject to a “margin of error”. This, in turn, varies according to how many people you have polled and how confident you want to be that the true result falls within it.
That is why the small print of many polls includes an explanatory note to the effect that the forecast is “plus or minus x per cent with 95 per cent confidence”. That last bit is critical. It indicates that the pollster is 95 per cent confident, assuming that nothing has gone wrong with the polling process, that the true result lies within the margins they just told you about. Clear? In non-statistical speak, they have just told you that one time in 20, the real result will fall outside those margins.
As for the size of the margin, that, too, varies with the number of people polled. For a survey of around 1,000 voters, the 95 per cent margin of error comes out at roughly three percentage points either way, which rather neatly matches the error we got. Just add three points to the Tory forecast – and subtract three points from the Labour one.
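For the curious, the textbook arithmetic fits in a few lines of Python. This is a minimal sketch that assumes a simple random sample, which real polls are not (they use quotas and weighting, which tends to widen the real uncertainty), and `margin_of_error` is our own illustrative name rather than any pollster's tool. The standard error of a share p measured from n respondents is √(p(1 − p)/n); the 95 per cent margin is about 1.96 times that.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a share p estimated from a simple
    random sample of n respondents. z = 1.96 corresponds to roughly
    95 per cent confidence. Assumes simple random sampling."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error

# 0.34 is the roughly 34 per cent share the poll of polls gave each main party;
# the sample sizes match the poll sizes mentioned in the article.
for n in (1_000, 4_000, 10_000):
    moe = margin_of_error(0.34, n)
    print(f"n={n:>6}: +/- {moe * 100:.1f} points")
```

At a 34 per cent share this gives roughly 2.9 points for 1,000 respondents, 1.5 for 4,000 and 0.9 for 10,000: quadrupling the sample only halves the margin.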
Statistics to seats
The second failure relates to the number of seats relative to votes. In a non-proportional system, as the leaders of the smaller parties have repeatedly pointed out, a party’s share of seats reflects its share of votes only imperfectly. Parties whose vote is spread most widely across the country fare worst, which is why the SNP, with support concentrated in one corner of the UK, managed to return 56 MPs with 1,454,436 votes (25,972 votes per MP), while UKIP, which drew its support far more widely, returned just one MP with 3.8 million votes (see below).
| Party | Votes total | Per MP |
Table 2: Votes cast and votes per MP elected
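The votes-per-MP arithmetic is simple division. Here is a minimal Python sketch using the two parties quoted above. The UKIP total is the rounded “3.8 million” from the text, and `votes_per_mp` is a hypothetical helper name.

```python
import math

# Votes and seats quoted in the article; UKIP's total is rounded to 3.8 million.
results = {
    "SNP": (1_454_436, 56),
    "UKIP": (3_800_000, 1),
}

def votes_per_mp(votes: int, seats: int) -> float:
    """Votes cast per MP elected; infinite if the party won no seats."""
    if seats == 0:
        return math.inf
    return votes / seats

for party, (votes, seats) in results.items():
    print(f"{party}: {votes_per_mp(votes, seats):,.0f} votes per MP")

# How many times more votes each UKIP MP "cost" than each SNP MP.
ratio = votes_per_mp(*results["UKIP"]) / votes_per_mp(*results["SNP"])
print(f"A UKIP MP took about {ratio:.0f} times as many votes as an SNP MP")
```

On these figures, each UKIP MP took roughly 146 times as many votes as each SNP MP.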
Here, the central issue is the number of seats won by the Lib Dems. In terms of their share of the vote, the polls were not far out; where they erred significantly was in converting votes into seats.
Table 3: ICM Poll of Polls seats forecast (6 May 2015) vs. exit poll and actual seats gained (7 May 2015)
Chart: ICM Poll of Polls seats forecast (6 May 2015) vs. exit poll and actual seats gained (7 May 2015)
How credible are these explanations?
Across several dozen surveys over a period of many weeks, every major polling organisation came to the same conclusion: a dead heat. Many individual polls sampled more than 1,000 people. Between 4 and 6 May alone, the Survation/Daily Mirror survey polled 4,000 electors, and YouGov/The Sun surveyed over 10,000.
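With such consistency, and such large samples, the margin-of-error argument cannot stand: random sampling error would scatter the polls on either side of the true result, not push them all the same way.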
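To see why, here is a rough Python sketch of the odds. It assumes (unrealistically, which is the point) that each poll is an independent simple random sample, and that the true Tory share was about 37 per cent. The helper name `prob_understate` is hypothetical. It uses the normal approximation to estimate how often a single poll would understate that share by three points or more, then multiplies the chances together for several independent polls.

```python
import math

def prob_understate(shortfall: float, p: float, n: int) -> float:
    """Probability that a simple random sample of n respondents understates
    a true share p by at least `shortfall` (0.03 = three points), using the
    normal approximation to the sampling distribution."""
    standard_error = math.sqrt(p * (1 - p) / n)
    z = shortfall / standard_error
    # Tail probability of a standard normal, P(Z >= z), via the complementary
    # error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumed true Conservative share of about 37 per cent.
single = prob_understate(0.03, 0.37, 1_000)
print(f"One poll of 1,000 missing by 3+ points: {single:.3f}")

# For independent polls, the chance that all k miss in the same direction
# is the product of the individual chances.
for k in (2, 5, 10):
    print(f"{k} independent polls all missing: {single ** k:.2e}")

print(f"One poll of 10,000 missing: {prob_understate(0.03, 0.37, 10_000):.2e}")
```

A single poll of 1,000 misses that badly about once in 40 attempts. For ten independent polls to all do so is vanishingly unlikely, and for a poll of 10,000 even one such miss is. Real polls share methods and weighting schemes, so their errors are correlated, which is exactly where the next section points.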
Systemic bias? Perhaps – but it's not intentional
As several commenters have already observed, this consistency of error suggests that something “systemic” was going on. That is, there was something about the way the polls were conducted that skewed the forecast.
Over the years, a range of factors have been identified as having the potential to skew results, from the age/gender/class profile of those polled, to the way they are polled (face-to-face or by phone) and even the order in which questions are asked.
Pollsters therefore apply correction factors. Young people tend to vote less than older people, so pollsters upweight the responses of the latter and downweight the former. People from lower social classes also tend to vote less, so a correction is applied there, too.
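Here is a minimal Python sketch of such a correction. It weights each response by a likelihood-to-vote factor for the respondent’s age group. Both the responses and the weights are invented for illustration. Real pollsters use many more variables (class, region, past vote) and their own, often unpublished, schemes. The helper `shares` is hypothetical.

```python
from collections import defaultdict

# Invented raw responses: (age group, stated vote). Illustrative only.
responses = [
    ("18-34", "Lab"), ("18-34", "Lab"), ("18-34", "Con"), ("18-34", "Green"),
    ("35-54", "Con"), ("35-54", "Lab"), ("35-54", "Con"), ("35-54", "LD"),
    ("55+", "Con"), ("55+", "Con"), ("55+", "Lab"), ("55+", "UKIP"),
]

# Invented likelihood-to-vote weights: younger respondents count for less.
turnout_weight = {"18-34": 0.5, "35-54": 0.7, "55+": 0.8}

def shares(responses, weights=None):
    """Vote shares, optionally weighting each response by its group's weight."""
    totals = defaultdict(float)
    for group, party in responses:
        totals[party] += 1.0 if weights is None else weights[group]
    grand_total = sum(totals.values())
    return {party: total / grand_total for party, total in totals.items()}

raw = shares(responses)
weighted = shares(responses, turnout_weight)
for party in sorted(raw):
    print(f"{party:6} raw {raw[party]:.1%}  weighted {weighted[party]:.1%}")
```

In this toy sample, downweighting the under-35s nudges the Conservative share up from roughly 42 to 44 per cent, and Labour’s down from roughly 33 to 31. Small tweaks to the weights move the headline numbers.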
Polling organisations have identified what they consider to be the most significant sources of skew and developed a range of corrections to apply to their raw data. The problem? First, the nature and direction of the correction required can only be fully understood after the event, so every election forecast is built with one eye on the rear-view mirror.
Worse, as the output of your predictive tool comes to depend less on the data going into it and more on the transformations applied to that data – the “fudge factors” – the integrity of the tool itself begins to be called into question.
In this case, one of the largest fudge factors was that of the “shy Tory”: first identified in the 1992 general election, this is the voter unwilling to admit to voting Conservative. Not only are they unprepared to admit this publicly; they also fib to the pollsters.
Attempts are regularly made to compensate for this effect. Yet even this appears to have been insufficient on this occasion.
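One way of compensating, sketched below in Python, is to hand a fraction of the “don’t knows” back to the party they say they backed last time. The counts and the fraction are invented for illustration. `reallocate` is a hypothetical helper, not any pollster’s published method.

```python
# Invented poll: respondents who named a party...
decided = {"Con": 330, "Lab": 340, "LD": 80, "UKIP": 130, "Other": 120}
# ...and "don't knows", grouped by how they say they voted last time.
dont_knows_by_past_vote = {"Con": 60, "Lab": 30, "LD": 40}

def reallocate(decided, dont_knows, fraction):
    """Give `fraction` of each past-vote group of don't knows back to that
    party, then return vote shares. The fraction is a tunable assumption."""
    adjusted = dict(decided)
    for party, count in dont_knows.items():
        adjusted[party] = adjusted.get(party, 0) + fraction * count
    total = sum(adjusted.values())
    return {party: votes / total for party, votes in adjusted.items()}

for fraction in (0.0, 0.5):
    result = reallocate(decided, dont_knows_by_past_vote, fraction)
    print(f"fraction={fraction}: Con {result['Con']:.1%}, Lab {result['Lab']:.1%}")
```

In this made-up poll, Labour leads 34 to 33 on the raw numbers, while reallocating half of the don’t knows puts the Conservatives narrowly ahead. The whole result hinges on a fraction that can only be calibrated against past elections.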
Telling the truth
Perhaps that gets us closer to the heart of the problem, which is simply that a chunk of voters – perhaps a growing chunk – just aren’t telling the truth any more.
Some may simply not know how they are going to vote in advance. Anecdotal evidence suggests that there was a lot of last minute agonising in the polling booth. Most polls recognise this and do their best to compensate, allocating “don’t knows” according to previously tried and trusted formulae.
And then there is that other thing, the social embarrassment factor, or a very British reluctance to own that one is about to do something rather disreputable. Like have a second slice of pie.
Is it possible that as well as the shy Tory, this last election saw the emergence of the “let’s give Clegg a good kicking” factor?
For while shy Toryness would just about explain the discrepancy between forecast and exit poll, it does not explain the discrepancy between Lib Dem support and seats. The analysis is yet to be done, but it seems plausible that, far from benefiting from local strength, the Lib Dems may turn out to have suffered where they had sitting MPs, with the public more inclined to vote against them to “teach them a lesson.”
That many may now feel they took this a bit far is suggested by signs of a Lib Dem bounceback just days after the election: many, many electors reportedly returning to the fold (too late!) and a rise in party membership of more than 4,000 since the weekend.
Behind all of the above, of course, is the endearing fallacy that voters vote for rational and knowable reasons. Perhaps many do: but in a system where the “noise” – a shift in votes of just one or two percentage points either way – can have a major impact on the outcome, perhaps the truth is that we can never have the level of accuracy that newspapers and assorted columnists pretend to offer.
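A toy Python sketch shows how little it takes. It assumes a simple uniform-swing model, in which every seat moves by the same amount, applied to ten invented Conservative-Labour marginals. `con_seats` is a hypothetical helper, and the margins are not real constituency data.

```python
# Invented Con-minus-Lab margins (percentage points) in ten marginal seats.
# Positive values mean the Conservatives currently lead.
margins = [-4.5, -3.0, -2.2, -1.5, -0.8, 0.4, 1.1, 1.9, 2.6, 3.8]

def con_seats(margins, swing):
    """Seats the Conservatives win after a uniform swing of `swing` points
    from Labour to Conservative. A swing of s means Con +s and Lab -s,
    so each seat's Con-minus-Lab margin changes by 2*s."""
    return sum(1 for m in margins if m + 2 * swing > 0)

for swing in (-2, -1, 0, 1, 2):
    print(f"swing {swing:+}: Con wins {con_seats(margins, swing)} of {len(margins)}")
```

Here a one-point swing either way moves the Conservatives from five of the ten seats to two or seven, and a two-point swing gives them nine.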
Does it matter?
In one sense, no. We get the government we vote for.
But in another very real sense, polls do matter. They feed on themselves, influencing the campaign and, critically, influencing the things that politicians promise and therefore nail themselves to during the campaign. The last-minute panic in the Scottish referendum over the possibility of a Yes vote led to promises that are now directly impacting the future of Scotland.
We cannot now know what promises might have been made, or not made, had the polls provided a truer reflection in advance of how the voters were going to vote.
Polls also have a direct and potentially toxic impact upon our finances. In the days before the election, UK share prices tumbled and the pound took a major hit against other currencies. All this righted itself the day after, as the markets, in typical fashion, overshot, allowing Tory voters to celebrate not just an election victory but some very bankable gains as well.
Still, that may have cost some pension funds – and therefore individual people – some very real money.
In the end, the answer is probably never going to be precisely knowable. Polling organisations will improve their fudge factors, and they won’t make the same mistakes next time. They’ll just make different ones instead.
Whereas we, the electorate, will make exactly the same error we always do: we will believe that for once, the polls have got it right – and we will be cross when we discover they haven’t. ®