NSSO jobs data discredited as it challenges carefully cultivated narratives

The two-part article authored by the CEO of NITI Aayog (Business Standard February 6 and 7) raising questions on the report of the National Sample Survey Organization (NSSO) on employment-unemployment strikes at the fundamentals of this widely respected institution and the well-established process followed by it. The provocation for discrediting the NSSO arose because the data collected by NSSO in its latest labour force survey purportedly contradicted certain carefully cultivated narratives. Unfortunately, this has raised questions on the independence of statistical agencies. This short note covers two broad issues. The first concerns the questions raised on the methodology used by NSSO in its newly introduced Periodic Labour Force Survey (PLFS). The second is how the objectivity of the official statistical system, nurtured over the years, is itself being questioned in this effort to protect the dominant narrative.


Labelling the PLFS estimates in question “half-baked” not only reflects a lack of confidence in the large section of Indian Statistical Service officials engaged in designing and conducting the survey and processing its data, but also a sheer disregard for the Standing Committee entrusted with the task of overseeing the statistical and operational aspects of the survey. The Standing Committee consists of reputed survey statisticians and other experts and is headed by Prof SP Mukherjee, one of the few senior statisticians of eminence with long experience in the country at present. The article in effect casts aspersions on the large-scale sample survey procedures followed by the NSSO. These procedures have evolved gradually over more than six decades and are held in high esteem internationally. They are rigorous and embedded in the institutional setup of the organisation. They are, in fact, applications of sampling theory designed to suit the specific objectives of the survey. The theory dictates the estimation procedure, that is, the algebraic formulas for deriving the estimates from the collected data, and this procedure is invariably drawn up well before the fieldwork for data collection is completed. This leaves no room for external experts, even those in the National Statistical Commission (NSC), to modify the estimates derived from the collected data. Modification, if any, is permitted only to the commentary on the estimates in the survey report. Thus, the question of approving or disapproving a report because the estimates do not conform to the expectations, hunches or gut feelings of individuals, groups or institutions does not arise.


On comparability


A survey is conducted to produce hundreds of estimates on different aspects of the study population. Whether or not the main results of two large-scale sample surveys on a given subject are comparable depends mainly on (i) the concepts, definitions and reference periods adopted for the surveys; (ii) how the sample of households is drawn; and (iii) how closely the set procedures are followed in fieldwork.


As for factor (i), there is virtually no difference between the last employment and unemployment survey conducted in 2011-12 and the PLFS. Factor (ii) can also be disregarded for the surveys in question. For its household surveys, the NSSO has been using basically the same sampling design over the years, with some fine-tuning every year aimed at improving the accuracy of important estimates. In the PLFS the main departure from the usual practice was that of repeated visits to urban households in the sample, with no basic change in the sampling procedure. The fine-tuning is not known to have brought about any significant change in the accuracy of the estimates and thus does not make the results of the two surveys “not comparable”. The main outcome of a labour force survey is undeniably the estimates of employment and unemployment rates. Whether these estimates are comparable depends mainly on the bias caused by factor (iii), particularly in field operations. The issues relating to the existing field conditions raised in the article are indeed most pertinent, but the answers provided can unfortunately at best be called presumptuous.


On reliability and sample size


Here the author falls into the error, common among non-statisticians, of believing that the accuracy of the estimates is determined by the sampling fraction, which is the ratio of sample size to population size, such as “3 out of 1,000” cited in the article. Though counterintuitive, the theoretically established fact is that the accuracy actually depends on the sample size, with the sampling fraction having virtually no role to play as long as it is small, say, under 1 per cent. This implies that a sample of 55,000 households drawn from a population of 200 million households produces results about as reliable as a sample of the same size drawn from a population of only 2 million households. Like the other household surveys of NSSO, the PLFS is designed to provide reliable estimates at the state level. The minimum sample size worked out for estimating a ratio reliably applies to all the states, whether as large as Uttar Pradesh or as small as Goa. The sample size on which the national-level estimates of the PLFS are based is, in fact, far larger than the minimum sample size required to produce reliable estimates of the unemployment rate.
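The point about sample size and sampling fraction can be checked with a few lines of arithmetic. The sketch below is an illustration only: it assumes simple random sampling without replacement (the PLFS itself uses a more elaborate stratified multi-stage design) and a hypothetical true unemployment rate of 6 per cent, and it uses the textbook standard error of a proportion with the finite population correction. The function name and the figures other than 55,000, 200 million and 2 million are our own assumptions.

```python
import math


def standard_error(p, n, population):
    """Standard error of an estimated proportion under simple random
    sampling without replacement, including the finite population
    correction (population - n) / (population - 1)."""
    fpc = (population - n) / (population - 1)
    return math.sqrt(p * (1 - p) / n * fpc)


p = 0.06      # hypothetical true unemployment rate, for illustration only
n = 55_000    # sample size mentioned in the text

se_large = standard_error(p, n, 200_000_000)          # 200 million households
se_small = standard_error(p, n, 2_000_000)            # 2 million households
se_double_n = standard_error(p, 2 * n, 200_000_000)   # same population, twice the sample

print(f"SE, population 200 million: {se_large:.6f}")
print(f"SE, population 2 million:   {se_small:.6f}")
print(f"SE, sample doubled:         {se_double_n:.6f}")
```

Under these assumptions the two standard errors differ by less than 2 per cent although the sampling fractions differ a hundredfold, whereas doubling the sample size cuts the standard error by roughly 30 per cent.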


Further, the author, noticing the higher-than-proportionate allocation made in the sample for households with members educated above a certain level, jumps to the conclusion that the data reported by these households will have a disproportionately large effect on the estimates. This is a common misapprehension among those unfamiliar with survey sampling methods — actually this does not happen because when some population groups are over-represented in the sample, correspondingly low blow-up factors are used in the estimation formula for households from these groups.
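A small worked example may help show why over-representation in the sample does not distort the estimate. The sketch below is ours and purely hypothetical: it assumes a population of 10,000 households split into two groups, with the smaller, more educated group deliberately over-sampled, and it takes the blow-up factor of each sampled household to be the group's population size divided by the group's sample size. None of the numbers comes from the PLFS.

```python
# Hypothetical population: (group size, households sampled, unemployed in sample)
groups = {
    "educated": {"population": 1_000, "sampled": 200, "unemployed": 40},
    "others":   {"population": 9_000, "sampled": 200, "unemployed": 10},
}

# Naive estimate: ignore the design and pool the sample as it is.
total_sampled = sum(g["sampled"] for g in groups.values())
total_unemployed = sum(g["unemployed"] for g in groups.values())
unweighted_rate = total_unemployed / total_sampled

# Design-based estimate: each sampled household is multiplied by its
# blow-up factor, which is low for the over-sampled group.
weighted_unemployed = 0.0
weighted_total = 0.0
for g in groups.values():
    blow_up = g["population"] / g["sampled"]   # 5 for educated, 45 for others
    weighted_unemployed += g["unemployed"] * blow_up
    weighted_total += g["sampled"] * blow_up
weighted_rate = weighted_unemployed / weighted_total

print(f"Unweighted rate: {unweighted_rate:.3f}")
print(f"Weighted rate:   {weighted_rate:.3f}")
```

In this illustration the pooled sample would suggest a rate of 12.5 per cent, while the weighted estimate is 6.5 per cent, which is what the group rates of 20 and 5 per cent imply for a population that is 10 per cent educated.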


The issue of survey estimates of the population being lower than the known population figures has been examined by experts in the past and is well known to NSSO data users. It is one of the reasons why the NSSO provides only estimates of rates and ratios, which are known to be free from this problem. In fact, experts at the Indian Statistical Institute, Kolkata have already developed a procedure to calibrate the survey estimates with the population data that will address this problem.
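Why rates and ratios escape this problem can be seen in a toy calculation. The sketch below is not the calibration procedure developed at the Indian Statistical Institute, whose details are not given here. It is a minimal illustration, with made-up weights and a made-up population total, of the simplest case in which the shortfall is assumed to be uniform across households: scaling every blow-up factor by the same constant corrects the estimated total but leaves the rate unchanged.

```python
def unemployment_rate(weights, unemployed):
    """Weighted share of households flagged as unemployed."""
    weighted_unemployed = sum(w for w, u in zip(weights, unemployed) if u)
    return weighted_unemployed / sum(weights)


def calibrate(weights, known_total):
    """Scale all weights by one common factor so they add up to a
    known population total (assumes a uniform shortfall)."""
    factor = known_total / sum(weights)
    return [w * factor for w in weights]


# Hypothetical survey weights that add up to 240, against a known total of 300.
weights = [40.0, 40.0, 50.0, 50.0, 60.0]
unemployed = [True, False, False, False, False]
known_total = 300.0

calibrated = calibrate(weights, known_total)
rate_before = unemployment_rate(weights, unemployed)
rate_after = unemployment_rate(calibrated, unemployed)

print(f"Estimated total before: {sum(weights):.0f}, after: {sum(calibrated):.0f}")
print(f"Rate before: {rate_before:.4f}, after: {rate_after:.4f}")
```

If the shortfall differs between population groups, a single common factor is no longer enough, which is presumably where a fuller calibration procedure comes in.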


Having said this, the need for increasing sample sizes to produce estimates at lower levels of aggregation, such as regions, districts and population groups, cannot be overemphasised. The manpower resources of the entire national statistical system, including the NSSO, have remained the same since the eighties. They need to be urgently augmented, commensurate with the increase in population and the expansion of economic activities, and the reliance on temporary investigators minimised. While technology can address the question of data recording, transmission and processing, the actual survey data collection remains a task best performed through personal interviews with the respondents.


(Next: Autonomy of statistical agencies)


Mohanan was a member of the National Statistical Commission and resigned his position recently. Kar is a survey statistician and member of the Standing Committee for Labour Force surveys that guided the Periodic Labour Force Survey. He is currently associated with the ISI Kolkata.
