Minding the gaps in India's data infrastructure

24 Oct 2019

The national discourse can ill afford to be hijacked by poor-quality data.

Last week, demographers from around the world gathered in Delhi to mark 25 years of the National Family Health Surveys (NFHS). It was both a celebratory and a sombre moment. Policymakers and researchers celebrated the tremendous achievements of four rounds of the NFHS since 1992-93; these have provided data on Indian families and allowed for the development and evaluation of public policies regarding population, health, education and the empowerment of women. It was also heartening to see the political commitment towards ensuring the continuation of this outstanding survey programme at regular and predictable intervals. Nonetheless, a single concern permeated the two-day conference. Can India’s existing data infrastructure support high-quality data collection, or are we staring at a precipice where deteriorating data quality will lead evidence-based policy development astray?

Presentations on contraceptive use by Dr. Amy Tsui, Professor at Johns Hopkins University, and Dr. Santanu Pramanik, Deputy Director, National Council of Applied Economic Research (NCAER)-National Data Innovation Centre, highlighted the difficulties in obtaining reliable, high-quality data. Between 2005-06 and 2015-16, the total fertility rate (TFR) declined from 2.68 to 2.18 births per woman. However, instead of being accompanied by increased contraceptive use, as would normally be expected, the decline coincided with a fall in contraceptive use from 56.3% to 53.5%. Using different approaches, both Prof. Tsui and Dr. Pramanik came to the same conclusion: this aberration must be attributed, at least partially, to the declining quality of contraceptive use data in NFHS-4.

Most past discussions of data quality have erupted when politically sensitive results on topics such as GDP growth or poverty rates were released, and partisan bickering leaves little room to think about data collection systems. A retrospective look at how an outstanding research programme such as the NFHS has changed over time, along with the nation it chronicles, and at the emerging challenges facing the NFHS and other data collection efforts, provides an opportunity to examine the overall challenges facing our data infrastructure in a constructive manner.

As Pravin Srivastava, Chief Statistician of India, noted at the NFHS conference, there is an amazing greed for data in modern India. This greed ranges from wanting to evaluate the success of Poshan Abhiyaan (the nutrition programme) to measuring changes in the aspirational districts. However, he also noted that the once-vaunted Indian statistical infrastructure is crumbling and is not able to fulfil even its traditional tasks, let alone meet these new demands.

Being realistic

I would like to submit that every government over the past two decades has been complicit in this neglect. If we are to move towards developing a more robust data infrastructure, subscribing to the following core principles may be a good start. First, set realistic goals and use creative strategies. In order to obtain data at the district level, the sample size grew from about one lakh households in NFHS-3 to over six lakh in NFHS-4. At that time, the National Statistical Commission had expressed a concern that such an expansion might reduce data quality. There was a fair amount of agreement among the participants at the NFHS conference that this concern may have been prescient. The government’s need for district-level estimates of various health and population parameters is legitimate, but do we need to rely on household surveys to obtain them? With a variety of small area estimation techniques available for pooling data from diverse sources to obtain robust estimates at the district level, it may make sense for us to think of alternatives. It may also make sense to ensure that the required local government directory identifiers are recorded in every government data source, including the Census, the Sample Registration System and Ayushman Bharat payment systems, so that these data can be pooled and leveraged.
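
As a rough illustration of the pooling idea, the sketch below blends a district's direct survey estimate with a "synthetic" estimate built from other sources, leaning on the survey only to the extent that its sampling variance is small. It follows the general shape of an area-level shrinkage estimator. The class name, the function name, the figures and the assumption that the between-district variance is known in advance are all hypothetical; a real small area estimation exercise would estimate that variance from data.

```python
from dataclasses import dataclass


@dataclass
class DistrictInput:
    """Hypothetical inputs for one district (all values are illustrative)."""
    name: str
    direct_estimate: float     # estimate from the household survey alone
    sampling_variance: float   # variance of the direct estimate
    synthetic_estimate: float  # prediction pooled from other data sources


def composite_estimate(district: DistrictInput,
                       between_district_variance: float) -> float:
    """Blend the direct and synthetic estimates with a shrinkage weight.

    The weight on the direct estimate is
        between_district_variance / (between_district_variance + sampling_variance),
    so a noisy survey estimate is pulled further toward the synthetic one.
    Assumption: between_district_variance is supplied, not estimated.
    """
    if district.sampling_variance < 0 or between_district_variance < 0:
        raise ValueError("variances must be non-negative")
    total = between_district_variance + district.sampling_variance
    if total == 0:
        # No uncertainty of either kind: the direct estimate is used as is.
        return district.direct_estimate
    weight = between_district_variance / total
    return (weight * district.direct_estimate
            + (1 - weight) * district.synthetic_estimate)


if __name__ == "__main__":
    # Two made-up districts with the same direct estimate but different precision.
    noisy = DistrictInput("District A", 0.60, 0.0004, 0.50)
    precise = DistrictInput("District B", 0.60, 0.0001, 0.50)
    for d in (noisy, precise):
        print(d.name, round(composite_estimate(d, 0.0004), 3))
```

In this made-up example the district with the larger sampling variance ends up closer to the synthetic value, which is the trade-off that makes pooling attractive when district samples are thin.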

Ensuring quality

Second, adapt to the changing institutional and technological environment for data collection. Veterans of the Indian statistical system blame deteriorating data quality on the move from regular employees to contract investigators at the National Sample Survey and on the use of for-profit data collection agencies in the NFHS. For better or worse, that train has left the station. Rising government salaries, combined with the increased technological needs of modern data collection systems, make it difficult to rely on veteran investigators in the civil services to meet all of the government's data needs. However, if we are going to rely on outside data collectors, what do we need to do to ensure quality? Some of the initiatives undertaken by the Ministry of Statistics and Programme Implementation to develop training programmes for investigators offer a welcome improvement but stop far short of the radical restructuring of data collection oversight that is needed.

I have enormous empathy for field investigators. They work under difficult conditions and are sometimes employed by for-profit agencies that require unrealistically high levels of output. Nonetheless, these are the data that guide the policies affecting millions of Indians, and they must be faithfully collected. Where interviewers make a mistake, they must be retrained. Where agencies impose an unrealistic workload, they must be checked. However, discovering mistakes only after data collection has been completed leaves it far too late to take any corrective steps. Concurrent monitoring using technology-enabled procedures, such as random voice recording of interviews, judicious back checks, and evaluation of agency and interviewer performance on parameters such as skipped sections, inconsistent data and consistent misreporting, may be needed to ensure quality. Academician Dr. Leela Visaria noted the declining role of State population research centres in NFHS data collection. It may be worth investigating whether they can be involved in quality monitoring.
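
The kind of concurrent monitoring described above could be sketched roughly as follows: compute each interviewer's rate of skipped sections while fieldwork is under way, flag those above a cut-off for retraining, and draw a random subset of completed interviews for back checks. The record layout, function names and threshold are hypothetical and chosen only for illustration; an actual monitoring system would track many more indicators.

```python
import random
from collections import defaultdict


def interviewer_skip_rates(interviews):
    """Return the share of sections skipped, per interviewer.

    `interviews` is a list of dicts with the hypothetical keys
    'interviewer', 'sections_total' and 'sections_skipped'.
    """
    skipped = defaultdict(int)
    total = defaultdict(int)
    for row in interviews:
        skipped[row["interviewer"]] += row["sections_skipped"]
        total[row["interviewer"]] += row["sections_total"]
    return {who: skipped[who] / total[who] for who in total if total[who] > 0}


def flag_interviewers(rates, threshold):
    """List interviewers whose skip rate exceeds the (assumed) threshold."""
    return sorted(who for who, rate in rates.items() if rate > threshold)


def sample_back_checks(interview_ids, fraction, seed):
    """Pick a reproducible random subset of interviews for back checks."""
    ids = list(interview_ids)
    if not ids:
        return []
    count = min(len(ids), max(1, round(len(ids) * fraction)))
    rng = random.Random(seed)  # seeded so supervisors can reproduce the draw
    return sorted(rng.sample(ids, count))


if __name__ == "__main__":
    # Made-up fieldwork records for two interviewers.
    records = [
        {"interviewer": "I01", "sections_total": 10, "sections_skipped": 1},
        {"interviewer": "I01", "sections_total": 10, "sections_skipped": 0},
        {"interviewer": "I02", "sections_total": 10, "sections_skipped": 4},
        {"interviewer": "I02", "sections_total": 10, "sections_skipped": 3},
    ]
    rates = interviewer_skip_rates(records)
    print(flag_interviewers(rates, threshold=0.2))
    print(sample_back_checks(range(100), fraction=0.05, seed=2019))
```

Running such checks daily rather than at the end of fieldwork is what allows retraining to happen while the survey is still in the field.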

Need for exclusive units

Third, establish research units exclusively focused on data collection and research design. At one time, innovative research on the NSS was undertaken by an associated unit at the Indian Statistical Institute in Kolkata. Since the dissolution of this association, very little research on data collection techniques has taken place in India. We know little about whether men or women are better respondents for data on household consumption expenditure. Nor do we know the extent of the discrepancy in employment data between a direct response from women in the household and a proxy response via the household head. Do Likert scales that ask individuals to rate their health status in five categories work well in India, or do Indian respondents avoid choosing extreme categories? How does the presence of other people bias responses on contraceptive use? And does it have an equal impact on reported pill use as it does on sterilisation?

While research on data collection methods has stagnated, research methodologies have changed phenomenally. Telephone surveys via random digit dialling and the selection of respondents using voter lists are increasingly emerging as low-cost ways of collecting data. However, we know little about the representativeness of such samples. Are men or women more likely to respond to telephone surveys? Are migrants from other States well represented on the voter list?

Unless we pay systematic attention to the data infrastructure, we are likely to have the national discourse hijacked by poor-quality data, as has happened in the past with the measurement of poverty or inconsistent data on GDP.

Sonalde Desai is Professor of Sociology, University of Maryland, and Professor and Centre Director, NCAER-National Data Innovation Centre. The views expressed are personal.

Published in: The Hindu, October 24, 2019