
In the 2016 United States presidential election, state opinion polls underestimated Mr. Trump by an average of 6.9%, more than the margin of error. Against this background, the paper discusses the shortcomings of the existing methods and supports the view that a new polling method is needed where “opinions” are concerned. Graphs show that the maximum error did not occur at the expected value, nor did the data align with a normal bell-shaped distribution. Vulnerabilities arise when fact-based statistical analysis is combined with feeling-based opinions. Basic statistical equations do not cover feeling-based factors, i.e. biases, truthfulness, competency, nonresponse rates, etc. A comprehensive pollster accuracy study showed that the most widely used pollsters had significant biases favoring Democrats over Republicans. The 2016 polling failures illustrate deficiencies in the existing approach, supporting the view that a new methodology is needed, such as Statistical Error Analysis, On-Line Methodology, and others.

In 1824 Andrew Jackson ran against John Quincy Adams for president. A newspaper conducted a straw poll of about 500 people and Andrew Jackson received 2 out of every 3 votes. The newspaper proclaimed that Andrew Jackson would win. When Jackson received the most popular votes^{1}, the era of the “straw poll” began. In 1936, Literary Digest^{2} mailed out 10 million questionnaires and 2.3 million people responded. They predicted that Alfred Landon would win in a landslide over Franklin D Roosevelt. At the same time, George Gallup conducted another poll where he sent out trained interviewers to demographically representative samples or quotas. Mr. Gallup’s survey correctly projected a Roosevelt victory. This ended the widespread use of straw polls in favor of quota sampling.

Then came the election of 1948. Gallup™, Ropers™ and Crossley™ all proclaimed that Dewey would defeat Truman. When Harry Truman won the election, the newspapers blamed the quota sampling technique. One little-known study at the University of Michigan conducted a poll based on a “random probability” sampling method. This poll projected that Truman would win. This ushered in the statistical methodology used today.

The opinion polls relating to Trump were off about the same amount that brought down quota sampling in the 1948 Dewey-Truman fiasco. The following is a general discussion on basic statistics and sampling methods.

Historically, Empires collected data that helped in making important decisions. In the 5th century BC, the Athenians calculated the height of walls by counting the number of bricks in the wall. The generals found that repeating the count several times allowed them to determine the most frequent brick numbers, which were then used to calculate the height of the ladders necessary to scale the walls.

Mathematicians became involved in calculating and formulating probabilities based on the simple coin flip. The frequency of heads and tails were plotted on a graph. Since there were only two choices the majority of the flips fell at the mean. However, the flips also showed lower percentage events such as three heads in a row or five tails in a row, etc. This resulted in a bell-shaped curve with the most frequent numbers located at its peak and the less frequent numbers tapering away from the center.
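The coin-flip curve described above can be reproduced exactly rather than by experiment. The following is a minimal sketch, assuming a fair coin and a series of 10 flips (both choices are illustrative and not taken from the paper); the function name `heads_distribution` is hypothetical.

```python
from math import comb


def heads_distribution(n_flips: int) -> list[float]:
    """Exact probability of k heads in n_flips fair coin flips, for k = 0..n_flips."""
    total_outcomes = 2 ** n_flips
    return [comb(n_flips, k) / total_outcomes for k in range(n_flips + 1)]


# Print a rough text histogram: the bars peak at 5 heads and taper symmetrically.
probabilities = heads_distribution(10)
for k, p in enumerate(probabilities):
    print(f"{k:2d} heads: {p:.3f} {'#' * round(p * 100)}")
```

The most likely count sits at the center, and rare runs such as all heads or all tails form the thin tails of the bell shape.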

As shown in

The centerline of a normal distribution is located at zero standard deviations. The interval extending one standard deviation from the center in each direction contains 68.3% of all potential numbers. Two standard deviations contain 95.4% of the numbers, and three standard deviations contain 99.7%.
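The percentages quoted above can be checked with the standard identity that, for a normal variable, the probability of falling within k standard deviations of the mean is erf(k/√2). This is a small sketch using only the Python standard library; the function name `coverage` is an illustrative choice.

```python
from math import erf, sqrt


def coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {coverage(k) * 100:.1f}%")
# prints 68.3%, 95.4% and 99.7%
```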

This is the total probability that the real number is contained within a particular area under the normal distribution curve. It does not mean that the selected number has a 95% chance of being correct. It only suggests that there is a 95% chance that the real number is contained within ± two standard deviations from the mean. The two are vastly different. The probability of a number being the correct number could be 1 in 100 and still be within the 95% total probability.

A margin of error is the amount of error expected in a survey. A smaller margin of error corresponds to a tighter concentration of the probability density around the mean. Given the desired margin, the minimum number of random selections needed to narrow in on the real number can be calculated. The lower the margin of error, the larger the number of random samples required.

Calculations for determining the size of the random sample are simple when based on the normal distribution. Two major factors are the margin of error and the confidence level. For large populations, there are published tables and automatic calculators (Smith, 2017). For example, assuming the worst-case situation, if a 95% confidence level and a margin of error of ±3% are selected, then the calculated size of the sample is 1,068 people. These variables (95% confidence and ±3% margin) are used by most pollsters in political elections.
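The 1,068 figure can be reproduced with the standard textbook formula for a proportion in a large population, n = z²·p(1 − p)/e², where the worst case is p = 0.5. The sketch below assumes that formula; the cited calculators may apply additional corrections, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value of the standard normal for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Minimum sample size for a large population; p = 0.5 is the worst case."""
    z = z_score(confidence)
    return ceil(z * z * p * (1 - p) / margin ** 2)


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Margin of error obtained from a simple random sample of size n."""
    return z_score(confidence) * sqrt(p * (1 - p) / n)


print(sample_size(0.03, 0.95))  # 1068
print(f"{margin_of_error(1068) * 100:.2f}%")  # roughly 3%
```

Halving the margin of error roughly quadruples the required sample, which is why pollsters rarely aim for margins much below ±3%.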

A core principle in statistics is to select a random sample. Of course, that is easier said than done. For small samples, a unique number is assigned to each member in the domain, and then a random number generator selects the participants. In large populations, pollsters equate the general population to telephone owners. They randomly generate telephone numbers, and an interviewer asks questions of those who respond. Although landlines dominated telephone usage between 1950 and 2000, cell phones now reach a majority of households. However, cell phones have their own demographic and other complications (Blumberg & Luke, 2007).
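The small-sample procedure described above (number every member, then let a random number generator pick) can be sketched as follows. The member names, the population of 200 and the seed are made up for illustration only.

```python
import random
from typing import Optional


def simple_random_sample(population: list[str], k: int, seed: Optional[int] = None) -> list[str]:
    """Assign each member a unique number, then draw k distinct numbers at random."""
    rng = random.Random(seed)
    numbered = dict(enumerate(population, start=1))  # unique number -> member
    chosen_numbers = rng.sample(sorted(numbered), k)  # sampling without replacement
    return [numbered[number] for number in chosen_numbers]


# Hypothetical domain of 200 members; the seed only makes the draw repeatable.
members = [f"member_{i:03d}" for i in range(1, 201)]
print(simple_random_sample(members, 5, seed=42))
```

Every member has the same chance of selection here, which is exactly the property that becomes hard to guarantee once the "domain" is approximated by telephone owners.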

On-line sampling includes direct communications between the pollster and the respondents via smartphones and computers. One polling company predicted a Trump victory using on-line direct communications (Jomeh & Lauter, 2016). Merely increasing the sample size does not improve the accuracy, as shown by the 1936 Literary Digest study.

^{3}, as set forth in Appendix A, lists opinion poll results from the 50 states taken immediately before the 2016 election. The difference between the actual vote percent and the polling projection percent is the polling error. A positive error means that the poll underestimated the actual vote, and a negative value indicates that the poll overestimated the vote. The polling error data in

The polls underestimated both Clinton (3%) and Trump (6.9%). The average absolute^{4} polling error for Clinton was 3.8% and for Trump was 6.9%. The fact that Trump’s absolute polling error and his average polling error were the same indicates that there were almost no polls that overestimated his support.
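The distinction between the average (signed) error and the average absolute error can be illustrated with a few rows. The sketch below uses only four Trump rows copied from Appendix A, so its averages are illustrative and are not the 50-state figures quoted above; the helper name `polling_error` is hypothetical.

```python
from statistics import mean

# (state, poll %, actual %) for Trump, copied from four rows of Appendix A.
rows = [
    ("Alabama", 53.0, 62.9),
    ("Arizona", 46.3, 49.5),
    ("Minnesota", 45.8, 45.4),
    ("Nevada", 45.8, 45.5),
]


def polling_error(poll: float, actual: float) -> float:
    """Actual vote minus poll projection; positive means the poll underestimated."""
    return round(actual - poll, 1)


errors = [polling_error(poll, actual) for _, poll, actual in rows]
signed_mean = mean(errors)                    # overestimates cancel underestimates
absolute_mean = mean(abs(e) for e in errors)  # every miss counts, whatever its sign

print(errors)
print(f"average error: {signed_mean:.2f}, average absolute error: {absolute_mean:.2f}")
```

The two averages coincide only when no error is negative, which is why near-equal values indicate that almost every poll missed in the same direction.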

The state polling errors relating to Trump varied from −0.4% (MN) to +17.1% (ND) with an average underestimating error of 6.9%. His standard deviation was 4.54. The standard deviation is a measure of the variations in the data.

The first dotted vertical line at point 0 represents where the errors on the positive side (underestimated) equal those errors on the negative side (overestimated). This would be the expected result from a neutral and unbiased poll. If the poll acted in a statistical manner, it would follow a bell-shape^{5} similar to the one superimposed on the graph.

In

the polls seriously underestimated Trump’s actual votes. The individual points also failed to follow a bell-shaped curve, suggesting that factors other than statistics probably affected the polling results.

Although the Clinton state polls’ mean (average) was 3%, her individual state polling errors varied from −4.6% (SD) to +15.9% (VT). The Clinton standard deviation was 3.9.

Data similar to that set forth in

However, the individual data points appeared more comparable with a bell-shaped curve. Plots for Kerry and McCain are representative and are shown in

The election of 2004 had an average polling error that underestimated both Kerry (2.1%) and Bush (2.7%). Both had higher state polling errors ranging from −2.5% to +12.1% (Kerry) and −2.1% to +9.4% (Bush). The average absolute polling errors were 2.5% (Kerry) and 2.9% (Bush). The standard deviation was 2.9 for Kerry and 2.4 for Bush.

In 2008, the polls underestimated Obama (2.4%) and McCain (2%). Both candidates had higher state differences ranging from −4.5% to +10.5% (Obama) and −5.5% to +7.6% (McCain). The average absolute polling error was 3% for Obama and 2.6% for McCain. The standard deviation for Obama was 2.6 and 2.7 for McCain.

In the presidential election of 2012, the polls underestimated Obama by 3% and Romney by 1.5%, although both had higher state deviations, i.e. −3% to +9.6% (Obama) and −6.2% to +12.6% (Romney). The average absolute polling error was 3.3% for Obama and 2.5% for Romney. The standard deviation varied from 2.7 for Obama to 3.5 for Romney.

The data and plots suggest that the polls consistently underestimated both candidates for every election. These polls (excluding Trump) showed a mean error between 1.6% and 3% with an average of 2.4%. The Trump opinion polling errors were 2.8 times higher than the average (2.4%) and considerably higher than the margin of error.

The data identify four significant polling problems, i.e. the polls consistently underestimated the candidates’ actual performance; a substantial variation appeared between state polls; the standard deviation increased each year; and an unexpectedly large polling error occurred relating to Trump.

Statistics and probabilities are founded on “facts”. The result of flipping a coin, heads or tails, is a fact. Each flip produces a head or a tail, and each flip can be counted. These are ascertainable and indisputable. Throwing a die results in a number, and each throw can be counted. Cutting a deck of cards is countable, and each cut will result in a particular card. These facts have a few things in common:

They are certain; they do not change, and they are verifiable.

“Who are you going to vote for?” is an opinion based on a person’s state of mind for a future event. It is essentially a feeling that cannot be physically or objectively measured and is far from certain. This opinion can change multiple times before the survey is completed. In addition, a responder can lie; and there is no way for the observer to know or correct it. A responder must also understand the question, whereas a fact does not depend on the competence or incompetence of a person. Therefore, treating opinions as facts is a fundamental error.

Marrying fact-based statistics and feeling-based opinion polls seems incompatible, if not bizarre. But the results show that the merger has been somewhat successful. For example, prior to 2016, all but one of the presidential pre-election polls^{6} since Truman have been within the margin of error (NCPP, 2017). This success rate is a partial verification of this merger. However, proving cause-and-effect is far more difficult. For example, there is no proof that the prior successes were a result of statistics as opposed to the expertise of the pollsters.

One company conducted a study^{7} (Pollster Accuracy Study) involving 370 different pollsters (Silver et al., 2016). This study was done before the 2016 election. It showed that the accuracy varied from 1.2% polling error to 23.8%. Some pollsters were within the margin of error 100% of the time (116 companies) and others (42 companies) were always outside the margin of error. Some pollsters (28) never called a race correctly while others (154) had a 100% success rate. One pollster with a 100% success rate called 465 races correctly but only received a C minus rating. This suggests that the poll accuracy was related to the expertise of the pollster as opposed to mathematics.

Exit polls are not the same as opinion polls. An exit poll asks people how they actually voted. This is much closer to a fact than an opinion, although it is subject to lying, etc. The exit polling data for the 2016 election showed an average absolute exit polling error of 2% for Clinton and 2.8% for Trump. Both were well within the margin of error. In contrast, the average absolute pre-election opinion polling errors (Trump 6.9% and Clinton 3.8%) were both considerably higher than projected. This raises a question as to why the two polling results (opinion and exit) were so different in the same wildly contentious election. Historical elections since Truman also show that exit polls have been more accurate, which indicates that the 2016 gap was not an anomaly.

The Pollster Accuracy Study and the exit poll/opinion poll comparison provide a cogent argument that opinion poll accuracy is more related to the pollster’s expertise as opposed to mathematics.

In 2000, the percentage of people opting not to respond to polling inquiries was 72% (Kennedy & Deane, 2017). This increased to 76% in 2004, and then to 84% by 2008. By 2012 it rose to 91% and stayed at that level for the 2016 elections. Many experts questioned whether a random sample could be obtained when 91 percent of the population is excluded. Some assigned a “nonresponse bias” to the polling survey. Others ignored the nonresponse rate, contending that the statistical distribution of the whole is the same as the statistical distribution of the portion. Studies investigating response rates as they affect poll results could not find a reliable connection (Kennedy & Deane, 2017; Groves & Peytcheva, 2008).

In

On the other hand, the standard deviation as shown in

The fact that one cannot prove a connection between polling accuracy and nonresponse rates does not mean that a connection does not exist. The mathematics of random sampling reveals that a relationship must exist, i.e. a 100% nonresponse rate means no random sample.

Some of the state polling data included in

It is possible that the state polling data were skewed by the presence of third-party candidates (Cassino, 2016; Smith, 2016). During the pre-election polling period, third-party candidates provide a convenient way to protest. Protest votes are usually not a factor at election time. Some pre-election polls limit the choices to the main candidates. However, this invoked strong criticism and some lawsuits. To avoid this, many pollsters present results for both, i.e. one poll for a two-party race and one that includes significant third-party candidates. The presence of third-party candidates provides a good explanation for the opinion polls consistently underestimating the major candidates.

There are many types of biases that can affect opinion polls. This is supported by the Pollster Accuracy Study that showed extremely divergent results depending on who did the survey.

This bias would apply to all elements of the polling methodology, i.e. sample selection, interviewer bias, question bias, response bias, weighting bias, etc. The underestimate/overestimate polling errors suggest a bias, particularly when they go outside the margin of error. In the 2016 election, Trump was underestimated^{8} by 6.9%, which was more than the margin of error. An error of this size suggests a fundamental flaw (equating opinion with fact) or severe bias in the polling.

The Pollster Accuracy Study indicated that 74 pollsters leaned toward the Democrats and 27 leaned toward the Republicans, a ratio of 2.7 Democrat-leaning pollsters for each Republican-leaning pollster. The difference in the amount of “mean reverted bias” associated with these pollsters was even greater, i.e. the Democrat-leaning pollsters had a total bias of 63 with an average of 0.84/pollster, whereas the Republican-leaning pollsters had a total bias of 7 with an average of 0.25/pollster. Hence, not only did far more pollsters favor the Democrats, but the degree of bias per pollster was also much greater. This accuracy study was done months before the election (updated Aug 5, 2016). A review of the Pollster Accuracy Study indicates that more than 60% of all polling entities are associated with universities, and donors in academia are 90% Democrat (Kiersz & Walker, 2014), which may be one of the reasons why the Democrat leaning is much higher.

The results of the 2016 election indicate that something was seriously flawed, and a review of the pollsters used in Pollster Accuracy Study exposed a major bias component, both in number and amount.

There is a difference between a sampling bias and a sampling error. The sampling error is a methodology-related issue that should be included within the margin of error. The sampling bias is based on a conscious or unconscious sample selection. This point is illustrated by agricultural workers, who may not be reachable during the working season. There were 3.2 million farm/ranch workers in 2012 (USDA, 2014), and farmers lean to the Republican Party by approximately 80% based on the number of donors (Kiersz & Walker, 2014). This indicates that telephone surveys of farmers may result in significant under-sampling. A similar analysis would need to be done for the Mining Industry (90% Republican), Construction Industry (65%), Oil & Gas (70%) and Real Estate (60%). An offsetting analysis would have to be done on those favoring Democrats, i.e. Entertainment (90%), Academia (90%), Newsprint (85%), On-line Computer Services (70%), Legal (70%), and Pharmaceuticals (65%). To determine if sampling bias existed with the data in

The hypothesis is that the publication of the opinion poll directly affects the turnout and voter preferences, thereby potentially making the poll a self-fulfilling prophecy. However, studies show mixed results. There is evidence that the party gaining in the polls will benefit from a bandwagon-type effect (Dahlgaard et al., 2016). A study of voters exposed and unexposed to the opinion polls showed no statistical difference (Knappen, 2014). One study is not sufficient to support or negate a relationship.

Another potential problem is SEME (Search Engine Manipulation Effect) by computer search companies (Bing, Google, DuckDuckGo, Chrome, etc.) (Epstein & Robertson, 2017). Technically, SEME is not a deficiency with the statistical polling methodology. It is more of a direct attack on the voting process. The Epstein et al. study indicates that SEME is much stronger than a bias and could possibly qualify as an interference with the election process itself (Epstein, 2019). Although SEME is a serious problem, it is not analyzed or discussed further in this article.

Solutions to problems associated with “opinions” in statistical polling methods are not fully studied in this article. These are issues that should be addressed by the professional organizations such as AAPOR (American Association for Public Opinion Research) and the various governmental entities that regulate this area.

A new methodology was discovered while trying to compress polling data. It is based on using error data rather than polling data. The error data is the difference between polling data and actual data. This methodology is covered in a patent application publication (Nelson, 2019). It had a 92% accuracy in predicting the outcome in the last 4 elections. It had a 100% accuracy in predicting a Trump victory in the 2016 election.

On-Line methods include indirect and unknown communications, i.e. sampling without the knowledge of the person. This includes searching social media and Twitter™ accounts for word frequency (Lampos & Cohn, 2013), the ratio of positive words to negative words (O’Connor et al., 2010), tweet counts (Tavabi, 2015), etc. Businesses have successfully used on-line search queries, word mining, and credit transactions for years. These methods have been used to accurately calculate public health events, i.e. contagious diseases (Ginsberg et al., 2009), hurricanes (Vlachos et al., 2004), earthquakes (Sakaki et al., 2010), etc.
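As a rough illustration of one of the indirect measures mentioned above, the ratio of positive to negative words can be computed by counting word frequencies against two word lists. This is only a toy sketch: the word lists, the sample messages and the function name are invented for illustration and are not taken from any of the cited studies, which rely on far larger lexicons and data sets.

```python
import re
from collections import Counter

# Hypothetical miniature lexicons, chosen only for this example.
POSITIVE_WORDS = {"great", "win", "strong", "love"}
NEGATIVE_WORDS = {"bad", "lose", "weak", "hate"}


def sentiment_ratio(messages: list[str]) -> float:
    """Ratio of positive-word occurrences to negative-word occurrences."""
    word_counts = Counter(
        word for message in messages for word in re.findall(r"[a-z']+", message.lower())
    )
    positive = sum(word_counts[word] for word in POSITIVE_WORDS)
    negative = sum(word_counts[word] for word in NEGATIVE_WORDS)
    return positive / negative if negative else float("inf")


sample_messages = ["Great rally, strong crowd", "Bad debate", "Love it, great"]
print(sentiment_ratio(sample_messages))  # 4 positive words, 1 negative word -> 4.0
```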

On-line nonprobability surveys are becoming more popular and more accurate (Kennedy & Caumont, 2016; AAPOR, 2013). In the 2012 presidential election, the on-line polls outperformed both telephone and live polls (Silver, 2012).

A major concern exists with on-line polling. There will be entities that could use computer programs to plant words, tweets, queries, etc. in a manner to influence the poll outcome. Programs attempting to detect these intrusions can be circumvented, and are usually not prepared until after the intrusion becomes known. In addition, on-line polling may be affected by countries and entities outside the United States.

In the past, every time that a polling scheme was overwhelmingly wrong, a new methodology was adopted. This occurred in 1936 in the election of Franklin Roosevelt, where Landon was the polling favorite, in 1948 when Dewey was favored over Truman, and again in 2016, where almost all polls projected Clinton would win over Trump. In the 2016 election, Mr. Trump was underestimated by 6.9%, more than the margin of error. Graphs show that the maximum error did not occur at the expected value, nor did the data align with a normal bell-shaped distribution. Major vulnerabilities arise when fact-based statistical analysis is combined with feeling-based opinions. Basic statistical equations do not cover feeling-based factors, i.e. biases, truthfulness, competency, nonresponse rates, etc. A comprehensive pollster accuracy study showed that the most widely used pollsters had significant biases favoring Democrats over Republicans. The 2016 polling failures illustrate deficiencies in the existing approach, supporting the view that a new polling methodology is needed.

The author declares no employment or grant conflicts of interest regarding the publication of this paper. The author is the same as the patent applicant referenced in (Nelson, 2019).

Nelson, M. D. (2019). Opinion Polls and Statistics Conflict—Time for a Change? Open Journal of Political Science, 9, 652-668. https://doi.org/10.4236/ojps.2019.94040

| State | Poll: Clinton (%) | Poll: Trump (%) | Actual: Clinton (%) | Actual: Trump (%) | Polling Error: Clinton (%) | Polling Error: Trump (%) |
|---|---|---|---|---|---|---|
| Alabama | 31 | 53 | 34.6 | 62.9 | 3.6 | 9.9 |
| Alaska | 34 | 37 | 37.7 | 52.9 | 3.7 | 15.9 |
| Arizona | 42.3 | 46.3 | 45.4 | 49.5 | 3.1 | 3.2 |
| Arkansas | 32.8 | 53.2 | 33.8 | 60.4 | 1 | 7.2 |
| California | 54.3 | 32 | 61.6 | 32.8 | 7.3 | 0.8 |
| Colorado | 43.3 | 40.4 | 47.2 | 44.4 | 3.9 | 4 |
| Connecticut | 47.5 | 38.2 | 54.5 | 41.2 | 7 | 3 |
| Delaware | 46.5 | 31 | 53.4 | 41.9 | 6.9 | 10.9 |
| Florida | 46.4 | 46.6 | 47.8 | 49.1 | 1.4 | 2.5 |
| Georgia | 44.4 | 49.2 | 45.6 | 51.3 | 1.2 | 2.1 |
| Hawaii | 50.5 | 28 | 62.3 | 30.1 | 11.8 | 2.1 |
| Idaho | 26 | 50 | 27.6 | 59.2 | 1.6 | 9.2 |
| Illinois | 49 | 37.5 | 55.4 | 39.4 | 6.4 | 1.9 |
| Indiana | 38.3 | 49 | 37.9 | 57.2 | −0.4 | 8.2 |
| Iowa | 41.3 | 44.3 | 42.2 | 51.8 | 0.9 | 7.5 |
| Kansas | 34.6 | 48 | 36.2 | 57.2 | 1.6 | 9.2 |
| Kentucky | 36.5 | 51.5 | 32.7 | 62.5 | −3.8 | 11 |
| Louisiana | 35.6 | 48.8 | 38.4 | 58.1 | 2.8 | 9.3 |
| Maine | 44 | 39.5 | 47.9 | 45.2 | 3.9 | 5.7 |
| Maryland | 60.3 | 26.6 | 60.5 | 35.3 | 0.2 | 8.7 |
| Massachusetts | 55.7 | 26.3 | 60.8 | 33.5 | 5.1 | 7.2 |
| Michigan | 45.4 | 42 | 47.3 | 47.6 | 1.9 | 5.6 |
| Minnesota | 39.6 | 45.8 | 46.9 | 45.4 | 7.3 | −0.4 |
| Mississippi | 41 | 50 | 39.7 | 58.3 | −1.3 | 8.3 |
| Missouri | 39.3 | 50.3 | 38 | 57.1 | −1.3 | 6.8 |
| Montana | 31.5 | 46.5 | 36 | 56.5 | 4.5 | 10 |
| Nebraska | 29 | 56 | 34 | 60.3 | 5 | 4.3 |
| Nevada | 45 | 45.8 | 47.9 | 45.5 | 2.9 | −0.3 |
| New Hampshire | 43.3 | 42.7 | 47.6 | 47.2 | 4.3 | 4.5 |
| New Jersey | 48.7 | 37 | 55 | 41.8 | 6.3 | 4.8 |
| New Mexico | 45.3 | 40.3 | 48.3 | 40 | 3 | −0.3 |
| New York | 50.3 | 31.3 | 58.8 | 37.5 | 8.5 | 6.2 |
| North Carolina | 45.5 | 46.5 | 46.7 | 50.5 | 1.2 | 4 |
| North Dakota | 29 | 47 | 27.8 | 64.1 | −1.2 | 17.1 |
| Ohio | 42.3 | 45.8 | 43.5 | 52.1 | 1.2 | 6.3 |
| Oklahoma | 32.5 | 52 | 28.9 | 65.3 | −3.6 | 13.3 |
| Oregon | 44 | 36 | 51.7 | 41.1 | 7.7 | 5.1 |
| Pennsylvania | 46.2 | 44.3 | 47.6 | 48.8 | 1.4 | 4.5 |
| Rhode Island | 48 | 36.5 | 55.4 | 39.8 | 7.4 | 3.3 |
| South Carolina | 38 | 44.2 | 40.8 | 54.9 | 2.8 | 10.7 |
| South Dakota | 36.3 | 47 | 31.7 | 61.5 | −4.6 | 14.5 |
| Tennessee | 35.5 | 47 | 34.9 | 61.1 | −0.6 | 14.1 |
| Texas | 38 | 50 | 43.4 | 52.6 | 5.4 | 2.6 |
| Utah | 27 | 37.4 | 27.8 | 45.9 | 0.8 | 8.5 |
| Vermont | 45.2 | 20.5 | 61.1 | 32.6 | 15.9 | 12.1 |
| Virginia | 47.3 | 42.3 | 49.9 | 45 | 2.6 | 2.7 |
| Washington | 50.3 | 36 | 54.4 | 38.2 | 4.1 | 2.2 |
| West Virginia | 30.5 | 53 | 26.5 | 68.7 | −4 | 15.7 |
| Wisconsin | 46.8 | 40.3 | 47.9 | 46.9 | 1.1 | 6.6 |
| Wyoming | 20 | 56.3 | 22.5 | 70 | 2.5 | 13.7 |
| Average | | | | | 3 | 6.9 |