Fudging Aadhaar

The Supreme Court’s interim order of March 13 appears incongruous with its earlier unanimous verdict on the right to privacy and with the Constitutional right to equality. The deadline to link Aadhaar with bank accounts and phones has been extended indefinitely, but no such extension applies to notifications of welfare schemes for the poor, such as old age pensions and MGNREGA. In a nutshell, Aadhaar will be voluntary for the rich while remaining mandatory for the poor to access welfare schemes that are critical for survival. In other words, sadly, the right to privacy crucially hinges on the class of a citizen. In essence, it reinstates the Orwellian cliche that some of us, readers of this newspaper for example, are more equal than others.

Be that as it may, this differential access to “freedom” implies that we can happily choose to file our taxes without Aadhaar. There is a small catch though. For some inexplicable reason, no one can file their returns without an Aadhaar number. As such, even conscientious taxpayers who do not have Aadhaar are reportedly being forced to provide an arbitrary 12-digit number as a proxy for Aadhaar.

Anecdotal evidence suggests that taxpayers are trying their hand at imagining a random Aadhaar number. The intention is not to suggest that anyone should do so but, in this context, it is instructive to learn about Benford’s law (also called the first digit law). In many data sets where the numbers vary across orders of magnitude, the first digit of most data points is small. For example, consider a 250-page book. There are 111 pages whose page numbers begin with the digit 1 (pages 1, 10-19 and 100-199) and 62 pages beginning with the digit 2 (pages 2, 20-29 and 200-250), while only 11 pages begin with the digit 9 (pages 9 and 90-99). As another example, consider the number of people in various age-groups. As a proportion of the population, chances are there are more people with ages that start with the digit 1 than with the digit 9.
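The page-number example can be checked by brute force. The following is a minimal Python sketch; the function name leading_digit_counts is only illustrative and is not part of any standard tool.

```python
from collections import Counter

def leading_digit_counts(last_page):
    """Count how many page numbers 1..last_page begin with each digit."""
    return Counter(int(str(page)[0]) for page in range(1, last_page + 1))

# Tally the first digits of the page numbers in a 250-page book.
counts = leading_digit_counts(250)
for digit in range(1, 10):
    print(digit, counts[digit])
```

Changing 250 to another page count shows how strongly the tally depends on where the numbering stops, which is why the law is stated for data that span several orders of magnitude.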

At first, Benford’s law appears counterintuitive: if the first digit in a data set were truly random, each digit from 1 to 9 should have the same chance, i.e., 1 out of 9 (11.1 per cent), of being the first digit. However, the frequency distribution of leading digits in many empirical data sets indicates otherwise. In particular, this holds true for datasets that grow exponentially, such as bacterial colony data, populations of cities and, not to forget, income tax data. It is empirically observed that, on average, the first digit is 1 in about 30 per cent of the cases, 2 in about 17.5 per cent, and 3 in about 12.5 per cent. The share keeps declining systematically as the first digit moves further away from 1, and 9 is the first digit in only about 5 per cent of all data points. This is a consequence of such data being non-uniformly spread on the original scale but uniformly distributed on a logarithmic scale, which is used to rescale data that grow by orders of magnitude. Very simply, the whole-number part of a base-10 logarithm (log) tells us the number of digits after the first digit in a number. So log 10 = 1 and log 1000 = 3.
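The percentages quoted here follow from the usual statement of Benford’s law, which puts the expected share of numbers with first digit d at log10(1 + 1/d). That formula is not spelled out in the text above, so the short Python sketch below should be read as an illustration under that assumption; the name benford_probability is made up for the purpose.

```python
import math

def benford_probability(digit):
    """Expected share of numbers whose first digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

# Print the expected share for each possible first digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine shares add up to 100 per cent, since the terms log10((d + 1)/d) telescope to log10(10) = 1.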

This phenomenon was first highlighted by the astronomer Simon Newcomb in 1881, when he observed that the initial pages of the logarithm tables were yellower and more smudged than the later pages. This led him to conjecture that numbers beginning with the digit 1 were looked up more often than numbers beginning with higher digits. The same principle was later tested and verified across several datasets by the physicist Frank Benford of GE Research in 1938. He tested the hypothesis by looking at surface areas of rivers, US population, natural scientific constants, numbers in Reader’s Digest magazine, street addresses of hundreds of people, etc.

In the 1990s, the researcher Mark Nigrini used Benford’s law to track accounting frauds. Owing to a policy threshold of $100,000, a fraudster wrote several cheques to himself just below this threshold, i.e., with 9 as the first digit of the cheque amount. This obvious departure from the expectation that about 5 per cent of numbers begin with the digit 9 was a red flag for accounting fraud. The law has been similarly used to look at fudged numbers in income tax returns. In fact, evidence based on this law is legally admissible in criminal cases in the USA.
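To see how such a red flag might be raised in practice, here is a rough Python sketch. It is not Nigrini’s actual procedure: it uses a plain chi-square comparison of observed first digits against the log10(1 + 1/d) shares, and all the data are synthetic (powers of 2 stand in for honest amounts, and 200 made-up cheques just under 100,000 stand in for the fraud). The function names are hypothetical.

```python
import math
from collections import Counter

# 5 per cent critical value of the chi-square distribution, 8 degrees of freedom.
CHI2_CRITICAL_8DF_5PCT = 15.507

def first_digit(n):
    """Leading digit of a non-zero integer."""
    return int(str(abs(n))[0])

def benford_chi_square(values):
    """Chi-square distance between observed first digits and Benford's shares."""
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

# Synthetic "honest" data: exponentially growing amounts (an assumption).
honest = [2 ** k for k in range(1000)]
# Synthetic "padded" data: the same amounts plus 200 cheques just below 100,000.
padded = honest + [99_000 + 5 * i for i in range(200)]

for name, data in [("honest", honest), ("padded", padded)]:
    stat = benford_chi_square(data)
    verdict = "suspicious" if stat > CHI2_CRITICAL_8DF_5PCT else "consistent with Benford"
    print(name, round(stat, 2), verdict)
```

A large statistic only says that the first digits look unusual. It is a prompt for an auditor to look closer, not proof of fraud.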

However, Benford’s law does not apply to data that have a clear maximum and/or data that have been assigned. Consequently, this law will not apply to the oft-compared twin (quite incorrectly so) of Aadhaar: Social Security Numbers (SSN) in the USA. The nine-digit SSNs are not randomly generated but have a well-defined structure of assignment.

Given that Aadhaar is an 11-digit random number (the 12th digit is technically not random), will Aadhaar numbers also follow Benford’s law? Given the strong data security and data protection laws in India, you may be caught simply because you sprinkled ones and nines equally in your fudged Aadhaar number and income taxes in an attempt to appear random.

