21 September 2021
Bhattacharya is being supervised by Edo Roos Lindgreen and assisted by fellow researcher Ana Micovic. His research draws on 10-K reports, the annual reports that publicly traded US companies are required to file. Whereas traditional fraud detection relies mainly on quantitative analysis, the ABS researcher is applying a machine-learning model to get at the deeper meaning of texts through contextual information.
'Bookkeeping fraud is a worldwide problem that inflicts major damage on financial markets,' the researcher says. 'Well-known examples are the 2001 Enron scandal and the 2002 WorldCom scandal. Companies are of course thoroughly vetted if there’s a suspicion that something is wrong, but the challenge is knowing which companies to focus on. There are thousands of publicly traded companies, with perhaps just a dozen where there’s a serious problem in any given year. Which ones should you target? To find a satisfactory answer, we are working on an AI model that should perform better than the existing benchmark models from the financial literature.'
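The targeting problem described here can be made concrete with some back-of-the-envelope arithmetic. The sketch below is purely illustrative: the 4,000 companies, the 12 fraud cases, and the detector's hit rate and false-alarm rate are assumed round numbers, not figures from the research, and `screening_outcome` is a hypothetical helper.

```python
def screening_outcome(n_companies, n_fraud, recall, false_positive_rate):
    """Return (expected number of flagged companies, share of flags that are real fraud)."""
    true_hits = recall * n_fraud                                   # fraud cases caught
    false_alarms = false_positive_rate * (n_companies - n_fraud)   # honest firms flagged
    flagged = true_hits + false_alarms
    precision = true_hits / flagged if flagged else 0.0
    return flagged, precision


# Assumed numbers: 4,000 listed companies, 12 fraud cases, and a detector that
# catches 90% of fraud cases while wrongly flagging 5% of honest companies.
flagged, precision = screening_outcome(4000, 12, recall=0.9, false_positive_rate=0.05)
print(f"{flagged:.0f} companies flagged, {precision:.1%} of them actually fraudulent")
```

Under these assumptions, only about 1 in 20 flagged companies would actually be fraudulent, which suggests why even modest gains over a benchmark model can matter when fraud is this rare.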
Bhattacharya, who has Indian citizenship, previously worked as a data scientist for Razorthink and as a risk analyst for McKinsey. 'One of the reasons for going to the Amsterdam Business School is that I was looking for more freedom to do research, especially on machine-learning algorithms and their application to company audits. The world of accountancy offers numerous challenges where machine learning will provide a solution, including fraud detection.'
Signals pointing to bookkeeping fraud can be found not only in financial figures but also in business texts. 'Fraudulent managers are more inclined to use certain formulations and are influenced by the context in which they’re writing.' Bhattacharya illustrates how this works with an example: 'Suppose you regularly exchange emails with 2 colleagues but, one day, you see a message without immediately knowing if it came from colleague A or colleague B. In this case, you’ll probably still be able to tell who sent it by looking at the writing style. That’s simply how the brain works.'
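The email analogy can be sketched in a few lines of code. This is a toy illustration only, not the method used in the research: it compares character three-letter sequences with cosine similarity, a much simpler technique than BERT. The sample emails are invented, and `style_profile`, `cosine_similarity` and `guess_sender` are hypothetical helper names.

```python
from collections import Counter
import math
import re


def style_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased, whitespace-normalised text."""
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_sender(message: str, known: dict[str, str]) -> str:
    """Return the known author whose writing profile is closest to the message."""
    profile = style_profile(message)
    return max(known, key=lambda name: cosine_similarity(profile, style_profile(known[name])))


# Invented sample emails, purely for illustration.
known_emails = {
    "colleague A": "Hi!! quick one, can u send me the slides asap? thx!! cheers",
    "colleague B": "Dear all, please find attached the minutes of the meeting. Kind regards",
}
unsigned = "hi!! can u check the numbers asap? thx!!"
print(guess_sender(unsigned, known_emails))
```

Character sequences pick up habits such as punctuation and abbreviations, which is roughly what a reader notices at a glance. A contextual model like BERT goes further by taking the meaning of the surrounding words into account.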
In his research, Bhattacharya is using natural language processing (NLP), a branch of artificial intelligence, today driven largely by machine learning, that enables computers to process and interpret written and spoken language. He is using BERT, a language model originally developed by Google AI. 'It understands English and is able to learn the context of words and passages very quickly. It has been trained on billions of texts. If you take a text about the earmarking of funds, for instance, the model can predict what lines are most likely to come next. We are continuing to train the model on specific 10-K texts so that it can recognise fraud.'
One of the downsides of models like BERT is that they are computationally intensive. 'When you train such a model on a new data set, it involves the optimisation of millions of parameters. So you need a lot of computer power. Fortunately, UvA gives access to the LISA supercomputer. But even with this high-end equipment, it still takes 2 to 3 days to calculate the results of any given experiment. This makes the research very time-consuming.'
Apart from technical complexity, another disadvantage of using machine learning to detect accounting fraud is the lack of explainability. 'Why does the model do what it does? There’s no clear indication of what the model sees and does. This poses a challenge in a strictly regulated environment with substantial economic interests such as the world of finance. So we really have to show that our model is better than the existing benchmarks by getting some concrete results.'
In any event, Bhattacharya is convinced that the model adds value. 'We’ve seen that it works and are very confident that our research will benefit auditors, shareholders and monitoring institutions in identifying and investigating fraudulent companies. We’ll have a paper ready for publication in the near future although right now it’s still a work in progress. Ultimately, we want to put the model on the market but, even then, it’ll require a great deal of fine-tuning and optimisation. Also, the world of accountancy is not all that familiar with machine learning. I hope that, in the coming years, I’ll be able to continue to play a role here by removing any obstacles and by exploring and encouraging the use of advanced models. There are simply numerous opportunities in a whole range of areas, with fraud detection being just one example.'
Aside from his research as a PhD student at the Amsterdam Business School, Indranil Bhattacharya is a Kaggle Expert who occasionally competes in code competitions.