Benford’s Law, also known as the law of anomalous numbers or the first-digit law, states that in many real-life sets of numerical data, the leading significant digit is likely to be small. In datasets that obey the law, the digit 1 leads about 30% of the time, while 9 leads less than 5% of the time. Under a uniform distribution, by contrast, each of the nine digits would lead about 11.1% of the time. The law applies to a wide variety of datasets, including Twitter follower counts, population numbers, and electricity bills.
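For readers who want the exact percentages: Benford’s Law gives the probability of a leading digit d as log10(1 + 1/d). The snippet below is a minimal sketch of that formula in Python; the function name `benford_expected` is ours, chosen for illustration.

```python
import math

def benford_expected():
    """Return the Benford probability for each leading digit 1 through 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_expected()

# Print the expected share for each digit.
# Digit 1 comes out at roughly 30.1% and digit 9 at roughly 4.6%.
for digit, p in probs.items():
    print(f"{digit}: {p:.1%}")
```

The nine probabilities sum to 1, since the terms telescope to log10(10).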
At Prodigal we deal with a lot of data, and we thought it would be interesting to see whether any of our datasets follow this law. The first candidate was a fairly obvious one: debt amounts.
The law largely holds for debt amounts, with the digit 1 being an outlier.
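A check like this boils down to pulling the first significant digit out of every amount and counting how often each digit occurs. Here is a rough sketch of that step. The helper names and the sample amounts are made up for illustration and are not our production code or real data; the sketch also assumes every value is a finite number.

```python
from collections import Counter

def leading_digit(value):
    """Return the first significant digit (1-9), or None for zero."""
    if value == 0:
        return None
    # Scientific notation always starts with the first significant digit,
    # e.g. 0.0042 -> "4.2000000000000000e-03".
    return int(f"{abs(value):.16e}"[0])

def leading_digit_shares(values):
    """Return the share of values that start with each digit 1 through 9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Hypothetical debt amounts, for illustration only. Zero is skipped.
sample_amounts = [1250.00, 180.75, 2399.10, 1040.00, 97.20, 315.00, 1999.99, 0.0]
shares = leading_digit_shares(sample_amounts)
for digit, share in shares.items():
    print(f"{digit}: {share:.1%}")
```

With a real dataset, these shares are what you would plot next to the Benford percentages.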
However, the debt amount owed is just part of the story, so we extended the analysis to the bread and butter of what we do at Prodigal: speech analytics and NLP. We computed the “Active Talk Time” for the millions of calls we process. Active Talk Time is the time the borrower and agent actually spent talking, excluding transfers and silence.
We observe that the distribution of digits does follow the percentages laid out by Benford’s Law.
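One way to put a number on "follows the percentages" is a Pearson chi-square statistic comparing observed leading-digit counts against the Benford expectation. The sketch below is one possible approach, not necessarily the exact test we ran. The function name and both datasets are synthetic stand-ins for real talk times, and with millions of records a chi-square test will flag even tiny deviations, so treat the threshold as a rough guide.

```python
import math
from collections import Counter

def benford_chi_square(values):
    """Pearson chi-square statistic of leading digits against Benford's Law."""
    # First significant digit of each non-zero value, via scientific notation.
    digits = [int(f"{abs(v):.16e}"[0]) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Standard 5% critical value for chi-square with 8 degrees of freedom
# (nine digit categories minus one).
CRITICAL_5PCT_DF8 = 15.507

# Synthetic durations in seconds, evenly spaced on a log scale.
# Log-uniform values are a textbook case of Benford-conforming data.
benford_like = [10 ** (k / 1000) for k in range(1000)]

# Synthetic durations whose leading digits are all equally common.
uniform_digits = [float(d) for d in range(1, 10)] * 100

stat_benford = benford_chi_square(benford_like)
stat_uniform = benford_chi_square(uniform_digits)
print(f"log-uniform sample: {stat_benford:.3f}")
print(f"uniform-digit sample: {stat_uniform:.3f}")
```

A statistic below the critical value means the sample is consistent with Benford’s Law at the 5% level; the uniform-digit sample lands far above it.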
In today’s world, where there is a glut of data, it is imperative that we stay conscious of its veracity. All of our business insights and data models are built on data, and if the underlying data is tainted, the insights are worthless. Benford’s Law is one of the many statistical techniques we use at Prodigal to check the sanity and truthfulness of our data.
This post was originally written by our data scientist, Akshat Vaidya.