- Author | Tangwei Hung
Combining Neo4j with Machine Learning to Predict Financial Fraud
Machine Learning for Fraudster Prediction
- Add labels to the fraudsters
- Add more properties as features (feature engineering)
- Use FastRP for node embedding
- Train and test the model
- Make predictions
1. Label the fraudsters
Machine learning needs labeled data for training, but PaySim does not provide fraudster labels for clients. We therefore assume that clients who have taken part in more than 10 fraudulent transactions are likely to be fraudsters, and we label them as fraudsters. We also label the suspects identified in the previous chapter as fraudsters.
To obtain non-fraudster labels, we use the Louvain algorithm to group clients into communities. Clients with a fraudster probability of less than 0.065 are labeled as non-fraudsters.
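The labeling rules can be sketched in plain Python. The article does not define how the fraudster probability is computed, so this sketch assumes it is the share of labeled fraudsters among the members of each client's Louvain community. It also assumes the community ids come from an existing Louvain run instead of implementing Louvain itself. The function and argument names are hypothetical.

```python
from collections import defaultdict

FRAUD_TX_THRESHOLD = 10        # "more than 10 fraudulent transactions" (from the article)
NON_FRAUD_PROB_CUTOFF = 0.065  # fraudster-probability cutoff (from the article)


def label_clients(fraud_tx_counts, suspects, community_of):
    """Hypothetical helper: return {client: 1 fraudster | 0 non-fraudster}.

    fraud_tx_counts: client -> number of fraudulent transactions it took part in
    suspects:        clients flagged as suspects in the previous chapter
    community_of:    client -> community id from a Louvain run (computed elsewhere)
    Clients that match neither rule are left out, i.e. they stay unlabeled.
    """
    fraudsters = {c for c, n in fraud_tx_counts.items() if n > FRAUD_TX_THRESHOLD}
    fraudsters |= set(suspects)

    # Assumption: "fraudster probability" = share of fraudsters in the client's community.
    members = defaultdict(int)
    fraud_members = defaultdict(int)
    for client, community in community_of.items():
        members[community] += 1
        if client in fraudsters:
            fraud_members[community] += 1

    labels = {c: 1 for c in fraudsters}
    for client, community in community_of.items():
        if client in fraudsters:
            continue
        if fraud_members[community] / members[community] < NON_FRAUD_PROB_CUTOFF:
            labels[client] = 0
    return labels
```

Clients in communities above the cutoff that are not fraudsters themselves remain unlabeled; these are the clients the trained model scores at the end.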
2. Add more features
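The summary at the end of this article names the features added in this step: Degree, PageRank, and Triangle Count from GDS. To show what each one measures, here is a plain-Python sketch on a small undirected graph stored as a dict of neighbour sets. It is not the GDS API; the helper names are hypothetical, and GDS may scale PageRank scores differently (this version normalizes them to sum to 1).

```python
# Hypothetical plain-Python illustration of the three graph features; not GDS code.

def degree(adj):
    """Number of neighbours of each node."""
    return {node: len(neigh) for node, neigh in adj.items()}


def triangle_count(adj):
    """Number of triangles each node belongs to: pairs of its neighbours that are also linked."""
    counts = {}
    for node, neigh in adj.items():
        nbrs = list(neigh)
        counts[node] = sum(
            1
            for i in range(len(nbrs))
            for j in range(i + 1, len(nbrs))
            if nbrs[j] in adj[nbrs[i]]
        )
    return counts


def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank, normalized so the scores sum to 1."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in adj}
        for node, neigh in adj.items():
            if neigh:
                share = damping * rank[node] / len(neigh)
                for m in neigh:
                    new[m] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for m in adj:
                    new[m] += damping * rank[node] / n
        rank = new
    return rank


def graph_features(adj):
    """Combine the three scores into one feature vector per node."""
    deg, pr, tri = degree(adj), pagerank(adj), triangle_count(adj)
    return {node: [deg[node], pr[node], tri[node]] for node in adj}
```

For a triangle `a-b-c` with an extra node `d` attached to `c`, node `c` gets the highest degree and PageRank, while `a`, `b`, and `c` each sit in one triangle and `d` in none.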
3. Use FastRP for node embedding
A node embedding algorithm computes a low-dimensional vector for each node in the graph. These vectors, called embeddings, can be used as features for machine learning.
Fast Random Projection (FastRP) is a node embedding algorithm from the family of random projection algorithms. These rely on the Johnson-Lindenstrauss lemma: n vectors of any dimension can be projected into O(log(n)) dimensions while approximately preserving the pairwise distances between them.
Reference: Fast Random Projection — Neo4j Graph Data Science
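To make the idea concrete, here is a simplified FastRP sketch in plain Python, loosely modelled on the algorithm described in the reference. Each node gets a sparse random vector (an Achlioptas-style projection, assumed here), the vectors are repeatedly averaged over neighbours and L2-normalized, and the final embedding is a weighted sum of the intermediate results. It omits GDS options such as degree-based normalization and property-based initial vectors; the parameter names only loosely echo GDS configuration and the defaults are illustrative. It is not the GDS implementation.

```python
import math
import random

# Simplified, hypothetical FastRP sketch; not the GDS implementation.


def sparse_random_matrix(n_nodes, dim, seed=42, s=3):
    """Achlioptas-style sparse projection: +sqrt(s) w.p. 1/(2s), -sqrt(s) w.p. 1/(2s), else 0."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_nodes):
        row = []
        for _ in range(dim):
            r = rng.random()
            if r < 1 / (2 * s):
                row.append(math.sqrt(s))
            elif r < 1 / s:
                row.append(-math.sqrt(s))
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def average_neighbours(adj, vectors):
    """One propagation step (D^-1 A X): each node takes the mean of its neighbours' rows."""
    dim = len(vectors[0])
    out = []
    for neighbours in adj:
        acc = [0.0] * dim
        for m in neighbours:
            for k in range(dim):
                acc[k] += vectors[m][k]
        out.append([x / len(neighbours) for x in acc] if neighbours else acc)
    return out


def l2_normalize(vectors):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    result = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        result.append([x / norm for x in v] if norm > 0 else list(v))
    return result


def fastrp(adj, dim=8, iteration_weights=(0.0, 1.0, 1.0), self_influence=0.0, seed=42):
    """Embedding = self_influence * R + sum_i w_i * normalize((D^-1 A)^i R)."""
    current = sparse_random_matrix(len(adj), dim, seed)
    embedding = [[self_influence * x for x in row] for row in current]
    for weight in iteration_weights:
        current = l2_normalize(average_neighbours(adj, current))
        for node, row in enumerate(current):
            for k in range(dim):
                embedding[node][k] += weight * row[k]
    return embedding
```

With `adj = [[2, 3], [2, 3], [0, 1, 4], [0, 1], [2]]`, nodes 0 and 1 share the same neighbours, so `fastrp(adj)` gives them identical embeddings even though their initial random vectors differ. Structurally similar clients therefore end up close together, which is what makes the embeddings useful to a classifier.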
4. Train and test the model
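The article does not say which classifier was trained. As a hypothetical stand-in, here is a minimal logistic regression trained with batch gradient descent in plain Python, together with a shuffled train/test split and an accuracy check. In practice the input rows would be the graph features and FastRP embeddings of the labeled clients; all names here are illustrative.

```python
import math
import random

# Hypothetical stand-in classifier; the article does not name the model it used.


def train_test_split(rows, labels, test_ratio=0.2, seed=7):
    """Shuffle indices once, then hold out round(n * test_ratio) rows for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_ratio)
    train, test = idx[:cut], idx[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j, x in enumerate(xi):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """1 if the predicted fraud probability reaches the threshold, else 0."""
    return int(sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= threshold)


def accuracy(weights, bias, X, y):
    """Share of rows whose predicted label matches the true label."""
    hits = sum(predict(weights, bias, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)
```

A typical flow is `X_tr, y_tr, X_te, y_te = train_test_split(rows, labels)`, then `w, b = fit_logistic(X_tr, y_tr)` and `accuracy(w, b, X_te, y_te)`. Because fraudsters are rare, accuracy alone can look good even for a model that never flags anyone, so precision and recall on the fraud class are worth checking as well.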
5. Make predictions
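This step applies the trained model only to clients that received no label in step 1. A minimal sketch, assuming a logistic model whose weights and bias have already been learned; `predict_unlabeled` and its arguments are hypothetical names, not from the article.

```python
import math


def predict_unlabeled(features, labels, weights, bias, threshold=0.5):
    """Hypothetical helper: {client: (fraud_probability, predicted_label)} for unlabeled clients."""
    predictions = {}
    for client, x in features.items():
        if client in labels:
            continue  # already labeled in step 1; nothing to predict
        z = sum(w * v for w, v in zip(weights, x)) + bias
        # Numerically stable logistic function.
        p = 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))
        predictions[client] = (p, int(p >= threshold))
    return predictions
```

Returning the probability alongside the label lets analysts rank unlabeled clients by risk instead of relying on a single 0.5 cutoff.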
Let's Summarize the Second Part: What Have We Done?
- We label clients as fraudsters or non-fraudsters, using the Louvain algorithm to identify non-fraudsters.
- Feature engineering: we use Degree, PageRank, and Triangle Count in GDS to add more features to the nodes.
- Node embedding with the FastRP algorithm
- Train and test the model
- Use the trained model to predict labels for the unlabeled clients in the graph
The major difference between machine learning with Neo4j and classic machine learning is that we can use Neo4j graph algorithms, such as centrality or node embedding, to add features derived from the graph. In short, we improve the accuracy of the model by introducing graph-based features in feature engineering.