Lost in translation: Why pollsters go wrong so often

- The conversational method of data collection such surveys use has inherent limitations. The accuracy of results often depends on the sociopolitical setting of the interview and the interviewer’s ability to analyse responses.
The recently concluded elections in Haryana and Jammu and Kashmir came with their usual collection of opinion and exit polls. The election results gave us one more example of how these polls can be at variance with what people actually choose.
There is a tendency among scholars to believe that this problem applies only to political and opinion polls. It is important to realize, however, that the reasons for some of these failures lie in the very nature of conversational data collection, where data is gathered through a structured process of conversation between a data collector and respondent.
As most sample surveys and India’s population census rely on conversational methods, it is useful to examine the nature of errors that occur in this process.
First, these conversations involve two parties: the interviewer and the respondent. They come into the discussion with very different perspectives and backgrounds. One source of error arises from the fact that the two parties often do not speak the same ‘language.’
By language, we do not mean merely linguistic considerations, but also the nuances of understanding. Take, for example, the question, ‘are you working?’ Economists and statisticians understand ‘work’ to be any effort that yields pay or profit.
However, in general conversation, the understanding of what constitutes ‘work’ can be very different (for example, it may also encompass notions of ‘decency’ and ‘adequacy’). Thus, young individuals preparing for government jobs may describe themselves as unemployed, even though they may be earning from part-time tuitions.
In such cases, survey organizations lay emphasis on the interviewer’s ability to parse the language of the respondent, making the accuracy of results dependent on the quality of the interviewers’ training.
A similar problem arises when trying to measure income. Even trained economists and accountants can struggle to define what ‘income’ is for a variety of work settings (consider, for example, today’s evolving conversation on what constitutes income for a social media influencer).
Moreover, it is important to understand that these conversations do not occur in a vacuum. They occur usually in a social setting where even quiet bystanders exert their influence on the outcome of the dialogue.
For instance, women and people belonging to non-dominant communities may hesitate to express their views in a public space. Just as the respondent’s identity matters, so does the setting and identity of the interviewer.
The responses garnered can be impacted by factors such as gender, the interviewer’s perceived objective and their political/institutional affiliations, among others.
Professor Vikas Kumar of Azim Premji University has analysed the power of such influences in the conduct of the census, observing that outcomes were shaped by the sociopolitical settings in which these conversations took place.
Professor Kumar showed that the Census results in Jammu and Kashmir in 2001 and 2011 and in Nagaland in 2001 were influenced by political mobilization around the survey itself.
In addition to the conversational dimension, there are other challenges. The logistics of data collection pose constraints. Until recently, the National Sample Survey excluded large parts of the Northeast as they were difficult to reach. Even today, parts of the Andaman Islands are formally excluded from conversational data collection to protect particularly vulnerable tribal groups.
Likewise, there is the difficulty of conducting a conversation with the very affluent—not just the super-rich (as popularly conceived), but also occupants of posh gated colonies. Rising non-response rates pose a challenge to surveyors all over the world.
There is a view that technology can be used to mask the public character of conversations and increase respondent openness. However, attempts at anonymization by using technologies such as telephones or the internet create their own sources of bias.
For instance, access to households may not equal access to all members of that household. Further, the proliferation of telemarketing, spam calls and scams has led to a decrease in people’s willingness to participate in telephonic conversations with unknown third parties. Similar challenges confront the use of the internet for surveys.
As the ability to conduct conversational surveys has improved, the resultant explosion of surveys by a variety of agencies, both public and private, has created its own issues.
To illustrate, there is an apocryphal story that villagers in Palampur were the subjects of so many socioeconomic studies and had become so adept at responding that they would ask visiting scholars, “Aap questions MPhil ke liye poochh rahe ho, ya PhD ke liye?” (are you asking for an MPhil project or for a PhD?) and tailor their answers accordingly.
Understanding the limitations of conversational data collection allows a better interpretation of outcomes. We must also recognize the value of non-conversational data, whose scale and scope has increased enormously in recent years.
This includes sources such as administrative data (for example, data collected by administrative agencies in the course of their work) and transactional data (which includes data from e-commerce platforms and payment interfaces such as UPI, FASTag, etc).
Commercial entities quickly realized the value of this data and use it extensively, but there continues to be a reluctance on the part of governments and policy analysts to engage with it.
To a certain extent, this is an unfortunate legacy of our early success in using conversational data generated by sample surveys to fill the data gaps created by a weak post-colonial administrative structure.
Developments over the last few decades in both state capacity and technology have made the biases of this historical legacy obsolete. It is time for us to revisit the weights we assign different forms of data used in social analyses.
