Primitive explanations of primitive assumptions

Prof. Pulak Ghosh and Dr. Soumya Kanti Ghosh have provided explanations for what I had called primitive assumptions in their work on payrolls data. I had pointed out to them four areas where their assumptions were primitive. These were (1) the assumption of a 50 per cent "haircut", (2) the assumption of a 25 per cent "drop out" rate, (3) the selection of the age band 18-25 years in the case of EPFO and (4) the selection of a different band, viz. 18-22 years, in the case of ESIC.

Nevertheless, I welcomed the use of a payrolls database and even exhorted the government to use the work of the authors. But the assumptions are not OK even if tweaking them does not alter the results. Here's why...

A "haircut" of 50 per cent used by the authors possibly means that they have assumed a 50 per cent overlap between the different databases they used. I call this a primitive assumption unless the authors have an explanation. The authors' response is that any other level of haircut may be used and it will make no difference to the outcome. So the assumption remains unexplained, and its level is now said to be irrelevant as well. It therefore continues to be primitive and is now also of little use -- with respect to the desired outcome, I suppose.

My second question was whether a "drop out" rate of 25 per cent meant a labour participation rate (LPR) of 75 per cent for the age group 18-25 years. If yes, is this not too high? Their answer is that the figure is based on discussions. But the age-wise LPR is known to be very low in the age group 18-25, and even the best LPR in India, which is for the middle-aged, is around 60 per cent. If their study picks 75 per cent from discussions rather than from empirical work, then it is worse than primitive, because it chooses not to use the established wisdom of the inverted-U curve of the LPR.

My third question is about their cherry-picking of the EPFO accounts of only those people who are between 18 and 25 years of age. There is no explanation for this choice. This is the age group that sees the maximum number of new job-seekers and also the maximum job turnover. But it is primitive to assume that all of these are new jobs. Youngsters don't get only "new" jobs. They get replacement jobs as well - replacements for retirements and for job turnover.

Indrajit Gupta, in an article in Business Standard on January 25, mentioned that, according to a survey, 70 per cent of students quit their first job after graduating within a year. The probability of a student quitting a first job within two years is obviously even higher. Thus, even if students take their first job anytime between 18 and 23 years of age, it is very likely that almost all of them will have held two jobs by the time they are 25. Therefore, double counting of jobs within the age group chosen by the authors is almost a certainty. In fact, people can move through more than two jobs before they reach 26 years of age.
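The arithmetic behind this can be sketched roughly as follows. This is only an illustration and not a calculation made by the survey or by the authors. The function name is hypothetical, and the sketch assumes that the 70 per cent quit rate applies independently in each year and that everyone who leaves a first job takes up another job on a payroll.

```python
def prob_left_first_job(annual_quit_rate: float, years: int) -> float:
    """Probability of having left the first job within `years` years.

    Assumption (not from the survey): the same quit rate applies
    independently in every year.
    """
    if not 0.0 <= annual_quit_rate <= 1.0:
        raise ValueError("annual_quit_rate must lie between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - (1.0 - annual_quit_rate) ** years


# The 70 per cent figure is the one cited in the article.
within_one_year = prob_left_first_job(0.70, 1)   # about 0.70
within_two_years = prob_left_first_job(0.70, 2)  # about 0.91

# Assumption: every person who leaves a first job takes another payroll job.
# Then each person holds, on average, at least this many jobs in two years.
min_jobs_per_person = 1.0 + within_two_years     # about 1.91

print(f"Left first job within 1 year:  {within_one_year:.2f}")
print(f"Left first job within 2 years: {within_two_years:.2f}")
print(f"Minimum jobs held per person:  {min_jobs_per_person:.2f}")
```

Under these assumptions, a count of jobs taken by this age group would come close to twice the number of people who entered the workforce.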

Ghosh and Ghosh have revealed that there is a cluster around 22 years of age. They claim that 22 years is the best age for the first job. But it is unlikely that this cluster would be bigger than all the first jobs taken till age 21. And if the 22-year-olds change jobs by age 25 or later, as the authors explain, then why would the 18-to-21-year-olds remain in their first jobs without any change till 25? Evidently, a cluster around 22 years does not explain anything. The explanations are as primitive as the assumptions were in the first place.

The authors are silent on the difference in the chosen age groups for the EPFO database (18-25) and ESIC (18-22).

It appears that the authors have not done any de-duplication of overlapping records or adjustments for bulk additions from small firms. Instead, they made some assumptions which they believe will automatically overcome problems of duplication. This is a very primitive approach.

Finally, their stance that privileged access to data does not matter is untenable. Replication is the essence of all research today. Results that cannot be scrutinised and replicated by independent referees (independent of the authors and sponsors) do not carry any credibility. And to clarify, all CMIE databases, even at the record-level, are available for a subscription and we institutionally support replication in research.

 
Every Tuesday, Business Standard brings you CMIE’s Consumer Sentiments Index and Unemployment Rate, the only weekly estimates of such data. The sample size is bigger than that surveyed by the National Sample Survey Organisation.

Methodology

Consumer sentiment indices and the unemployment rate are generated from CMIE's Consumer Pyramids survey machinery. The weekly estimates are based on a sample size of about 6,500 households and about 17,000 individuals who are more than 14 years of age. The sample changes every week but repeats after 16 weeks, with a scheduled replenishment and enhancement every year. The overall sample size run over a wave of 16 weeks is 158,624 households. The sample design uses multi-stratification to select primary sampling units and simple random selection of the ultimate sampling units, which are the households.

The Consumer Sentiment index is based on responses to five questions on the lines of the Surveys of Consumers conducted by the University of Michigan in the US. The five questions seek a household's views on its well-being compared to a year earlier, its expectation of its well-being a year later, its view regarding the economic conditions in the coming one year, its view regarding the general trend of the economy over the next five years, and finally its view on whether this is a good time to buy consumer durables.

The unemployment rate is computed on a current daily basis. A person is considered unemployed if she states that she is unemployed, is willing to work and is actively looking for a job. Labour force is the sum of all unemployed and employed persons above the age of 14 years. The unemployment rate is the ratio of the unemployed to the total labour force.
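The definitions above can be sketched in a few lines of Python. This is a minimal, unweighted illustration. The `Respondent` record and its field names are hypothetical, and the sketch does not reproduce the survey-weighted estimation behind the published numbers.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """Hypothetical survey record; field names are illustrative only."""
    age: int
    employed: bool
    says_unemployed: bool = False
    willing_to_work: bool = False
    actively_looking: bool = False


def is_unemployed(person: Respondent) -> bool:
    # Unemployed: states she is unemployed, is willing to work
    # and is actively looking for a job.
    return (not person.employed
            and person.says_unemployed
            and person.willing_to_work
            and person.actively_looking)


def unemployment_rate(people: list[Respondent]) -> float:
    """Unweighted ratio of the unemployed to the labour force (age above 14)."""
    eligible = [p for p in people if p.age > 14]
    unemployed = sum(is_unemployed(p) for p in eligible)
    employed = sum(p.employed for p in eligible)
    labour_force = unemployed + employed
    if labour_force == 0:
        raise ValueError("labour force is empty")
    return unemployed / labour_force


# Made-up respondents, for illustration only.
sample = [
    Respondent(age=30, employed=True),
    Respondent(age=45, employed=True),
    Respondent(age=22, employed=False, says_unemployed=True,
               willing_to_work=True, actively_looking=True),
    # Not looking for a job, so outside the labour force.
    Respondent(age=19, employed=False, says_unemployed=True,
               willing_to_work=True, actively_looking=False),
    # Aged 14 or below, so excluded altogether.
    Respondent(age=12, employed=True),
]

rate = unemployment_rate(sample)
print(f"Unemployment rate: {rate:.1%}")
```

Note that a person who is not looking for a job is left out of both the numerator and the denominator, so the rate depends on who counts as being in the labour force.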

All estimations are made using Thomas Lumley's R package, survey. For full details on methodology, please visit CMIE India Unemployment data and CMIE India Consumer Sentiment.

The creation of these indices and their public dissemination are supported by BSE. The University of Michigan is a partner in the creation of the consumer sentiment indices.