Detecting Healthcare Fraud using Machine Learning

As the elderly population grows, so do the medical costs of caring for it. Medicare provides insurance to those 65 and older to help with the financial burden of healthcare. Medicare costs about $588 billion and is expected to increase by 18% in the next decade. The NHCAA estimates that healthcare fraud may account for as much as 10% of the nation’s total healthcare spend; applied to Medicare, that would be as much as $58.8 billion. Fraudulent claims include patient abuse or neglect as well as billing for services that were not received. By using publicly available claims data, machine learning can help detect fraud in the Medicare system and reduce the cost to taxpayers.

Machine learning is a subset of artificial intelligence that can find a fraudulent needle in the haystack by applying continuous learning algorithms. Each time the algorithm is right about a fraudulent transaction, that outcome is fed back into the model, making it smarter. The same happens when the algorithm is wrong.

Using unsupervised machine learning on publicly available datasets is a growing trend with great potential. The publicly available Medicare claims data has 37 million cases. For supervised models, labeling is an essential step because it affects both the quality of the data and the performance of the model. Researchers have created fraud and non-fraud labels by matching the claims data against other publicly available resources, such as the National Provider Identifier and the List of Excluded Individuals and Entities database. The 37 million cases can then be reduced to under 4 million, which are run through the machine learning algorithm to help identify fraudulent providers.
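The labeling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ProviderRecord` fields, the `label_providers` helper, and the sample NPIs are all made up for the example, and real work would load the actual claims file and the List of Excluded Individuals and Entities instead. The idea is simply that a provider is labeled as fraud when its National Provider Identifier appears on the exclusion list.

```python
from dataclasses import dataclass


@dataclass
class ProviderRecord:
    # Hypothetical, simplified view of one provider in the claims data.
    npi: str            # National Provider Identifier
    total_claims: int
    total_paid: float


def label_providers(providers, excluded_npis):
    """Pair each record with a label: 1 if its NPI is on the exclusion list, else 0."""
    excluded = set(excluded_npis)  # set lookup keeps the match fast on large files
    return [(p, 1 if p.npi in excluded else 0) for p in providers]


# Made-up sample data, not real providers.
providers = [
    ProviderRecord("1000000001", 120, 15400.0),
    ProviderRecord("1000000002", 45, 3900.5),
    ProviderRecord("1000000003", 980, 210000.0),
]
excluded_npis = ["1000000003", "1999999999"]

labeled = label_providers(providers, excluded_npis)
labels = [label for _, label in labeled]
print(labels)
```

An exclusion list entry that matches no provider in the claims data, like the second made-up NPI here, is simply ignored.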

For example, unsupervised machine learning has been used successfully on Florida’s Medicare data to detect anomalies in Medicare payments using regression techniques and Bayesian modeling. Decision tree and logistic regression models trained on class distributions produced by random undersampling have also shown promising results. Initial results indicate that keeping more non-fraud cases helps the model learn better and separate fraud from non-fraud cases more accurately.
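Random undersampling itself is simple to sketch. The snippet below is a minimal illustration, not the method used in any particular study: the function name `random_undersample`, its `ratio` parameter, and the synthetic data are assumptions made for the example. It keeps every fraud case and a random sample of the non-fraud cases, with `ratio` setting how many non-fraud cases are kept per fraud case.

```python
import random


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep all fraud rows (label 1) and a random subset of non-fraud rows (label 0).

    ratio is the number of non-fraud rows to keep per fraud row.
    """
    rng = random.Random(seed)  # seeded so the sample is repeatable
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    clean_idx = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(clean_idx), int(round(ratio * len(fraud_idx))))
    kept = fraud_idx + rng.sample(clean_idx, n_keep)
    rng.shuffle(kept)  # avoid having all fraud rows at the front
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Synthetic, heavily imbalanced data: 10 fraud cases out of 1,000.
all_labels = [1 if i < 10 else 0 for i in range(1000)]
all_rows = [[float(i)] for i in range(1000)]

# Keep 4 non-fraud cases for every fraud case.
X_bal, y_bal = random_undersample(all_rows, all_labels, ratio=4.0)
print(len(y_bal), sum(y_bal))
```

The balanced rows and labels could then be passed to a decision tree or logistic regression model. Trying several values of `ratio` is one way to explore the observation that keeping more non-fraud cases can help.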

Using machine learning to detect fraud is game-changing. Machine learning allows humans to be notified early in a fraud attempt, stopping losses sooner. Continuously monitoring publicly available data can go a long way toward minimizing fraudulent claims and shortening the time it takes to prosecute criminals.

#BigData #MachineLearning #AI #Healthcare