Opinion: Poonam Munjal and Palash Baruah.
Data accuracy. To improve the data system, the NSO is now the nodal agency for all statistical activities.
The National Sample Survey, under the Ministry of Statistics and Programme Implementation, has been conducting large-scale sample surveys on an all-India basis since the 1950s.
These surveys are the most important and reliable sources of data on several subjects including employment, household consumption, housing conditions, health, education, village amenities, and domestic tourism.
The reference period for these surveys is a crucial aspect of the entire data collection system. The reference period refers to the time period for which the data are collected. It can vary based on the type of data, the methodology used for collection, and the purpose of the data. The reference period can be daily, weekly, monthly, quarterly, or yearly.
The survey period, on the other hand, is the period during which fieldwork is carried out. Clearly, the reference period and the survey period are two different concepts. The time lag between the survey period and the reference period can lead to confusion and misinterpretation of data. To take a specific survey, the annually conducted Periodic Labour Force Surveys (PLFS) are the principal source of data on the employment, unemployment and underemployment situation in the country.
The survey period of any PLFS, say PLFS 2019-20, is July 2019 to June 2020, coinciding with the agricultural year. The reference period for the usual status of employment is the last 365 days preceding the date of the survey. Therefore, for a respondent who was surveyed on, say, July 1, 2019, the information on his/her employment pertains to the period July 1, 2018 to June 30, 2019.
And for a respondent who was surveyed on, say, June 30, 2020, the same information pertains to the period July 1, 2019 to June 30, 2020. This makes a total of 730 days for which data on employment get collected, whereas, for all practical purposes, the data collected in this round of the survey are referred to as those for 2019-20.
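The arithmetic behind the 730-day figure can be checked with a short sketch. It is only an illustration: the function name `reference_window` is hypothetical, and it assumes that "the last 365 days preceding the date of the survey" means a window ending on the day before the interview.

```python
from datetime import date, timedelta

def reference_window(interview_date, days=365):
    """Return (first_day, last_day) of the `days` days before the interview.

    Assumption: the window ends on the day before the interview date.
    """
    first_day = interview_date - timedelta(days=days)
    last_day = interview_date - timedelta(days=1)
    return first_day, last_day

# Survey period of PLFS 2019-20, as stated in the article.
survey_start = date(2019, 7, 1)
survey_end = date(2020, 6, 30)

# Earliest day any respondent reports on, and the latest such day.
earliest, _ = reference_window(survey_start)
_, latest = reference_window(survey_end)

# Inclusive count of calendar days touched by at least one respondent.
total_days = (latest - earliest).days + 1

print(earliest, latest, total_days)
```

Under this strict day count the last respondent's window closes on June 29, 2020, a day earlier than the rounded date quoted above, and the combined span still comes to 730 days.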
Whether data users take these data to refer to the financial year or the agricultural year (which is what they are supposed to refer to) is another issue.
Recall lapse, of course, is an even bigger issue, especially in the case of expenditure-related surveys, like health, education, and particularly domestic tourism. Across the world, the reference periods are of much shorter duration. The International Labour Organization (ILO) recommends a reference period of one week for labour force surveys, and the United Nations recommends a reference period of one month for household surveys.
For instance, in the United States, the Bureau of Labor Statistics considers the calendar week that contains the 12th day of the month. Labour force surveys in OECD countries consider the last four weeks as a reference period. The Australian Bureau of Statistics considers the week before the date of the interview as the reference period. The PLFS also, of course, collects information on the current weekly work status besides the usual work status, and the reference period for the current weekly status is the last seven days. Short reference periods can reduce the risk of errors or misinterpretation.
Respondents are more likely to provide accurate and reliable information about their activities and behaviours if the reference period is short. In contrast, longer reference periods may result in respondents’ recall lapse leading to inaccurate data.
Also, shorter reference periods can capture changes in behaviour and activities that occur due to seasonal changes. The issue of recall lapse is of much greater concern in the case of expenditure surveys, such as the Domestic Tourism Survey. This survey enquires whether the household has completed an overnight trip in the last 365 days. If yes, then the expenditure details of all of those trips are recorded. That makes the total period two years, as described earlier.
Further, an overnight trip is defined as the movement outside the usual environment for the duration ranging from less than 12 hours in two consecutive calendar days to six months. Hence, a trip that started about a year-and-a-half prior to the date of the survey also gets recorded, along with its details on expenditure. The information collected, therefore, is for about two-and-a-half years.
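The two-and-a-half-year figure follows from adding three spans together, as the rough sketch below shows. The constant names and the helper `coverage_years` are hypothetical, and two assumptions are made: that fieldwork is spread over one year, as in the PLFS example, and that six months is taken as roughly 182 days.

```python
SURVEY_DAYS = 365     # assumption: fieldwork spread over one year
RECALL_DAYS = 365     # trips completed in the last 365 days are recorded
MAX_TRIP_DAYS = 182   # assumption: six months taken as roughly 182 days

def coverage_years(survey_days, recall_days, max_trip_days):
    """Approximate span, in years, from the earliest possible trip start
    to the last interview of the survey round."""
    return (survey_days + recall_days + max_trip_days) / 365

span = coverage_years(SURVEY_DAYS, RECALL_DAYS, MAX_TRIP_DAYS)
print(round(span, 1))
```

Setting the maximum trip length to zero gives two years, the survey period plus the recall window; the extra half-year comes entirely from trips of up to six months.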
Both recall lapse and misinterpretation of the reference period for tourism expenditure are quite a possibility in such a case. India’s data system also faces challenges related to data quality, timeliness, and coverage. The quality of data can deteriorate due to under-reporting, measurement errors, and sampling biases. Timeliness is an issue in India, as data are often released with a lag.
Coverage can also be a challenge, as some population groups are under-represented in the data.
To address these challenges, India has undertaken several initiatives to improve its data system. In 2019, the government established the National Statistical Office (NSO) as the nodal agency for all statistical activities in the country.
The NSO aims to improve the quality, timeliness, and coverage of data. The NSO has also introduced several reforms to improve data collection and analysis. For example, the NSO has developed a computer-assisted personal interviewing (CAPI) system for data collection, which has improved the accuracy and speed of data collection.
Going forward, the NSO could potentially leverage machine learning and artificial intelligence (AI) technologies to enhance its data processing and analysis capabilities. However, the reference period may also need to be revisited and aligned with international best practices. This will enhance the quality and usefulness of primary data collection in the country.
Munjal is Professor, and Baruah is Associate Fellow, at the National Council of Applied Economic Research. Views expressed are personal.