- Author | Tangwei Hung
Fraud Runs Rampant! Building Anti-Fraud Strategy with Neo4j Graph Data Science
Introduction
Why does the current anti-fraud strategy fail to identify all fraud incidents?
Fraud detection models based on machine learning and analytics have been widely adopted, but most data science models leave out a significant factor: the network structure.
Any kind of data can be represented as a graph. Picture the account holders and their information as a graph: the network structure may reveal that multiple account holders share the same phone number or other personally identifiable information (PII), which suggests possible synthetic identity fraud. Without graph algorithms, it is extremely difficult to spot such traces of fraud in a massive network of countless account holders.
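The idea of spotting shared identifiers can be illustrated with a minimal Python sketch. It is not the article's Neo4j workflow: the records, the names `shared_identifiers` and `linked_groups`, and the sample values are all hypothetical and exist only to show how shared PII links account holders into groups.

```python
from collections import defaultdict

# Hypothetical sample data: (account_holder, identifier_type, identifier_value).
RECORDS = [
    ("alice", "phone", "555-0100"),
    ("bob", "phone", "555-0100"),
    ("bob", "ssn", "123-45-6789"),
    ("carol", "ssn", "123-45-6789"),
    ("dave", "phone", "555-0199"),
]


def shared_identifiers(records):
    """Map each identifier to its holders, keeping only identifiers with 2+ holders."""
    holders_by_identifier = defaultdict(set)
    for holder, id_type, id_value in records:
        holders_by_identifier[(id_type, id_value)].add(holder)
    return {
        identifier: sorted(holders)
        for identifier, holders in holders_by_identifier.items()
        if len(holders) > 1
    }


def linked_groups(records):
    """Group holders connected through any chain of shared identifiers."""
    # Build an undirected holder-to-holder adjacency from the shared identifiers.
    neighbors = defaultdict(set)
    for holders in shared_identifiers(records).values():
        for holder in holders:
            neighbors[holder].update(h for h in holders if h != holder)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects everyone reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


SHARED = shared_identifiers(RECORDS)
GROUPS = linked_groups(RECORDS)
print(SHARED)
print(GROUPS)
```

In this toy data, alice and carol never share an identifier directly, yet both are linked through bob. That indirect link is the kind of pattern a row-by-row check on tabular data tends to miss.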
Unlike tabular data, which is typically arranged in rows and columns, a graph data structure represents the complex relationships in the data directly, which makes the network structure easier to analyze.
Fraud Detection with Graph Data Science
We can improve prediction accuracy through graph analysis and graph feature engineering. With a graph database, we can engineer features from connection-related metrics, such as the number of relationships between nodes, the number of potential triangles, or the number of neighbors. Community detection algorithms, for instance, highlight the cluster structure of the data (much like clustering) and can help identify fraudulent behavior.
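To make these connection-related metrics concrete, here is a small pure-Python sketch that computes a degree, a triangle count, and a group label per node. It does not use the Graph Data Science library. The edge list is hypothetical, and connected components are used as a simple stand-in for a real community detection algorithm.

```python
from itertools import combinations

# Hypothetical undirected edges between account nodes.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def triangle_count(adjacency, node):
    """Count pairs of neighbors of `node` that are also connected to each other."""
    return sum(
        1
        for u, v in combinations(sorted(adjacency[node]), 2)
        if v in adjacency[u]
    )


def component_ids(adjacency):
    """Label each node with the smallest node name in its connected component.

    This is a stand-in for community detection, not a community algorithm.
    """
    labels = {}
    for start in sorted(adjacency):
        if start in labels:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in labels:
                continue
            labels[node] = start
            stack.extend(n for n in adjacency[node] if n not in labels)
    return labels


ADJACENCY = build_adjacency(EDGES)
COMPONENTS = component_ids(ADJACENCY)

# One feature row per node: these are the values a model could consume.
FEATURES = {
    node: {
        "degree": len(ADJACENCY[node]),
        "triangles": triangle_count(ADJACENCY, node),
        "component": COMPONENTS[node],
    }
    for node in sorted(ADJACENCY)
}

for node, row in FEATURES.items():
    print(node, row)
```

Each row in `FEATURES` is an ordinary set of numeric or categorical values, so it can sit next to any other per-account feature.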
Graph data can therefore support fraud detection without changing the machine learning system itself. Put simply, we add more graph features during feature engineering.
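As a rough illustration of "adding graph features" to an existing pipeline, the sketch below merges precomputed graph features into existing tabular rows. All account IDs, column names, and values are hypothetical, and the zero defaults for accounts without graph data are an assumption made for this example.

```python
# Hypothetical tabular features already used by an existing model.
TABULAR = {
    "acct-1": {"amount_avg": 120.0, "txn_count": 14},
    "acct-2": {"amount_avg": 9800.0, "txn_count": 3},
    "acct-3": {"amount_avg": 45.5, "txn_count": 60},
}

# Hypothetical graph features computed separately for some of the accounts.
GRAPH = {
    "acct-1": {"degree": 2, "triangles": 0},
    "acct-2": {"degree": 7, "triangles": 4},
}

# Assumption: an account missing from the graph gets zero-valued graph features.
GRAPH_DEFAULTS = {"degree": 0, "triangles": 0}


def add_graph_features(tabular, graph, defaults):
    """Return new rows that extend each tabular row with graph feature columns."""
    enriched = {}
    for account, row in tabular.items():
        # Later mappings win: defaults first, then real graph values if present.
        enriched[account] = {**row, **defaults, **graph.get(account, {})}
    return enriched


ENRICHED = add_graph_features(TABULAR, GRAPH, GRAPH_DEFAULTS)
for account, row in ENRICHED.items():
    print(account, row)
```

The model and training code stay as they are. Only the feature table gains extra columns.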
Prerequisites
- Neo4j 4.0+
- Graph Data Science Library (Neo4j GDS 1.5+)
- APOC Library (Neo4j APOC 3.5+)
Dataset
PaySim
Published by Lopez-Rojas, Elmir, and Axelsson, PaySim is a financial dataset that combines an agent-based model with anonymized data from real-life transaction scenarios at mobile payment providers.
The PaySim dataset models a mobile payment network that involves clients, merchants, and banks. Clients can make mobile payments through the network and deposit money into it, for example by topping up.
Think of it as something like Apple Pay, except that you can also deposit money through merchants.
Agent Types:
- Clients
  Clients are unique accounts controlled by real end users in the mobile payment network.
  - Some clients are fraudsters who manipulate the network and other clients for their own benefit.
  - Some clients are mules who move funds around and then leave the network.
  - Most clients are ordinary people with no suspicious activity.
- Merchants
  - Merchants act as a gateway to the network, allowing funds to flow in and out of it.
  - Like traditional vendors, merchants provide goods or services in exchange for money in the network.
- Banks
  Banks serve as the targets of debit transactions.
Transactions:
- CashIn: Clients transfer funds to the network through merchants.
- CashOut: Clients transfer funds out of the network through merchants.
- Debit: Clients transfer funds to banks.
- Transfer: A client remits money to another client.
- Payment: Clients pay merchants for goods or services.
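The agent and transaction types above can be sketched as a small Python data model. This is an illustration only, not PaySim's actual schema: the class and field names are hypothetical, the counterparty mapping is simply read off the transaction list, and the rule that amounts must be positive is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class AgentType(Enum):
    CLIENT = "Client"
    MERCHANT = "Merchant"
    BANK = "Bank"


class TransactionType(Enum):
    CASH_IN = "CashIn"
    CASH_OUT = "CashOut"
    DEBIT = "Debit"
    TRANSFER = "Transfer"
    PAYMENT = "Payment"


# Which agent type sits on the other side of the client for each transaction,
# as described in the list above.
COUNTERPARTY = {
    TransactionType.CASH_IN: AgentType.MERCHANT,
    TransactionType.CASH_OUT: AgentType.MERCHANT,
    TransactionType.DEBIT: AgentType.BANK,
    TransactionType.TRANSFER: AgentType.CLIENT,
    TransactionType.PAYMENT: AgentType.MERCHANT,
}


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record shape; field names are not taken from PaySim."""

    kind: TransactionType
    client: str
    counterparty: str
    counterparty_type: AgentType
    amount: float


def is_valid(txn):
    """Check the counterparty type; positive amounts are an assumption here."""
    return txn.amount > 0 and COUNTERPARTY[txn.kind] is txn.counterparty_type


DEBIT_TO_BANK = Transaction(
    TransactionType.DEBIT, "client-1", "bank-1", AgentType.BANK, 50.0
)
TRANSFER_TO_MERCHANT = Transaction(
    TransactionType.TRANSFER, "client-1", "merchant-9", AgentType.MERCHANT, 50.0
)

print(is_valid(DEBIT_TO_BANK))
print(is_valid(TRANSFER_TO_MERCHANT))
```

In a graph, each agent would typically become a node and each transaction a relationship or intermediate node. The mapping in `COUNTERPARTY` shows which node types each transaction type can connect.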