Benford's law - PHBS/MLF GitHub Wiki
Background
Benford's Law, also known as the Law of First Digits or the Phenomenon of Significant Digits, is the observation that the leading digits of numbers in many real-world data sets, drawn from a wide variety of sources, are not uniformly distributed. Instead, 1 is the most frequent leading digit, followed by 2, then 3, and so on, with frequencies decreasing steadily down to 9.
Figure: The distribution of first digits, according to Benford's law. Each bar represents a digit, and the height of the bar is the percentage of numbers that start with that digit. (Source: Wikipedia)
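Benford's law gives an explicit formula for these frequencies. The probability that the leading digit is d is P(d) = log10(1 + 1/d) for d = 1, ..., 9, so about 30.1% of numbers start with 1 and only about 4.6% start with 9. The Python sketch below is a minimal illustration, not code from this project. It computes these expected shares and the observed leading-digit shares of a list of numbers. The helper names `benford_expected`, `first_digit` and `first_digit_shares` are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d = 1..9 under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit (1-9) of x, or None for 0, NaN or infinity."""
    if isinstance(x, int):
        # Python ints are exact, so the first character of the decimal string is the digit.
        return int(str(abs(x))[0]) if x != 0 else None
    x = float(x)
    if x == 0 or not math.isfinite(x):
        return None
    # 17 significant digits identify any double exactly, so rounding cannot change the leading digit.
    return int(f"{abs(x):.16e}"[0])


def first_digit_shares(values):
    """Observed share of each leading digit 1-9 among the usable (non-zero, finite) values."""
    digits = [d for d in (first_digit(v) for v in values) if d is not None]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts[d] / n if n else 0.0 for d in range(1, 10)}


if __name__ == "__main__":
    # Powers of two are a classic example of a sequence that follows Benford's law.
    observed = first_digit_shares([2 ** n for n in range(1, 1001)])
    expected = benford_expected()
    for d in range(1, 10):
        print(f"{d}: observed {observed[d]:.3f}  expected {expected[d]:.3f}")
```

Zeros and non-finite values are skipped because they have no significant digit. Integers are handled through their exact decimal string so that very large values do not lose precision.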
Project goals
Benford's law has been used to detect financial fraud and other crime, because fabricated or manipulated numbers often fail to follow Benford's law. The goal of this project is to explore how Benford's law and machine learning can be combined to further improve detection accuracy. See the references below.
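A common way to turn Benford's law into a screening test is to measure how far a set of reported figures departs from the expected first-digit shares. Two standard measures are Pearson's chi-square statistic (8 degrees of freedom for the nine digits, with a 5% critical value of about 15.51) and the mean absolute deviation (MAD) between observed and expected shares. One simple way to bring in machine learning, assumed here for illustration and not taken from the cited studies, is to compute these deviations per entity (for example per account or per company) and use them as input features for a classifier. The Python sketch below does this. The names `leading_digit`, `benford_features`, `feature_vector`, `EXPECTED` and `CHI2_CRIT_8DF_5PCT`, and the feature layout, are hypothetical.

```python
import math
from collections import Counter

# Expected Benford share of each leading digit d = 1..9.
EXPECTED = [math.log10(1 + 1 / d) for d in range(1, 10)]
# Chi-square critical value for 8 degrees of freedom at alpha = 0.05.
CHI2_CRIT_8DF_5PCT = 15.507


def leading_digit(x):
    """First significant digit of a non-zero finite number, else None."""
    if isinstance(x, int):
        return int(str(abs(x))[0]) if x != 0 else None
    x = float(x)
    if x == 0 or not math.isfinite(x):
        return None
    return int(f"{abs(x):.16e}"[0])


def benford_features(values):
    """Hypothetical per-entity features: 9 share deviations, chi-square, MAD, sample size."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable non-zero values")
    counts = Counter(digits)
    observed = [counts[d] / n for d in range(1, 10)]
    deviations = [o - e for o, e in zip(observed, EXPECTED)]
    # Pearson chi-square on counts: sum of (O - E)^2 / E, with E = n * expected share.
    chi2 = sum((counts[d] - n * e) ** 2 / (n * e) for d, e in zip(range(1, 10), EXPECTED))
    # Mean absolute deviation between observed and expected shares.
    mad = sum(abs(v) for v in deviations) / 9
    return {"deviations": deviations, "chi2": chi2, "mad": mad, "n": n}


def feature_vector(values):
    """Flatten the features into a fixed-length list (12 numbers) for a tabular classifier."""
    f = benford_features(values)
    return f["deviations"] + [f["chi2"], f["mad"], float(f["n"])]
```

For example, the integers 1 to 9999 have uniformly distributed leading digits (1,111 of each), so their chi-square statistic far exceeds the critical value, while the first 1,000 powers of two stay well below it. Stacking `feature_vector` outputs for many labelled entities would give a feature matrix for a model such as logistic regression or a random forest; that step is omitted to keep the sketch dependency-free. A large deviation only flags figures for closer review and is not proof of fraud. Chi-square also becomes very sensitive on large samples, which is one reason MAD is often reported alongside it.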
References
- Benford's law: Wikipedia
- Koch, C., & Okamura, K. (2020). Benford's Law and COVID-19 reporting. Economics Letters, 196, 109573. Media and politicians have cast doubt on China's reported COVID-19 case data; the authors find that Chinese confirmed infections match the distribution expected under Benford's Law and are similar to those seen in the U.S. and Italy.
- Badal-Valero, E., Alvarez-Jareño, J. A., & Pavía, J. M. (2018). Combining Benford’s Law and machine learning to detect money laundering. An actual Spanish court case. Forensic Science International, 282, 24–34.
- Bauer, J., & Groß, J. (2011). Difficulties Detecting Fraud? The Use of Benford's Law on Regression Tables. Jahrbücher für Nationalökonomie und Statistik, 231(5–6).
- Towards Data Science article on what Benford's law is and why it matters for data science: https://towardsdatascience.com/what-is-benfords-law-and-why-is-it-important-for-data-science-312cb8b61048
Figure: First-digit distribution of pre-lockdown confirmed COVID-19 case counts in Chinese provinces, U.S. states, and Italian regions. (Source: Koch, C., & Okamura, K. (2020))