- Author | Tangwei Hung
Financial Fraud Detection: Leveraging Neo4j Graph Database to Identify Potential Fraudsters
![How to launch an anti-fraud strategy with Neo4j graph data](https://blog.tpisoftware.com/wp-content/uploads/2021/12/neo4J詐騙_1600x900-03.png)
How Do We Detect Potential First-Party Financial Fraudsters with Graph Database?
![How do we identify potential first-party financial fraudsters with a graph database?](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-04.png)
Explore Data
Stats:
![Step one: explore our PaySim dataset](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-05.png)
List all node labels and their relative frequencies:
![List all nodes and their relative frequencies](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-06.png)
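As a rough illustration of what "relative frequency" means here, the following plain-Python sketch counts how often each node label occurs and divides by the total number of nodes. It is not the query from the screenshot; the `label_frequencies` helper and the toy label list are assumptions made for this example.

```python
from collections import Counter


def label_frequencies(node_labels):
    """Return {label: share of all nodes} for a list of node labels."""
    counts = Counter(node_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy data; the real PaySim graph is far larger.
labels = [
    "Client", "Client",
    "Transaction", "Transaction", "Transaction", "Transaction",
    "Bank", "Merchant",
]
freq = label_frequencies(labels)

for label, share in sorted(freq.items(), key=lambda item: -item[1]):
    print(f"{label}: {share:.1%}")
```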
![Is there anything interesting about the transactions themselves?](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-07.png)
First-Party Fraud
First-party fraud occurs when an individual or a group provides false information or a false identity when applying for financial services.
First-party fraud includes synthetic identity fraud, which McKinsey describes as the fastest-growing type of fraud. In synthetic identity fraud, a fraudster combines real and fake information to create a new identity. It now accounts for roughly 80% of credit card fraud losses, a heavy cost for financial institutions.
Now let’s catch these fraudulent accounts with the following steps:
- Identify clients who share the same personally identifiable information (PII).
- Identify client clusters sharing PII using Community Detection algorithms.
- Identify similar clients within each cluster based on the shared PII using Pairwise Similarity algorithms.
- Calculate a fraud score for the clients in each cluster using Centrality algorithms.
- Use these scores to flag potential fraudsters.
1. Identify clients who share personally identifiable information (PII)
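![Find pairs of clients who share PII](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-08.png)
![Create a new relationship linking clients who share PII, and store the number of shared PII items as a property of that relationship](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-09.png)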
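The idea behind this step can be sketched in plain Python: invert the client-to-PII mapping, then count how many identifiers each pair of clients has in common. This is only an illustration under assumed data; the `shared_pii_counts` helper, the client names, and the identifier strings are hypothetical and do not come from the PaySim dataset.

```python
from collections import defaultdict
from itertools import combinations


def shared_pii_counts(client_pii):
    """Map each client pair to the number of PII values they share."""
    # Invert the mapping: PII value -> set of clients that use it.
    holders = defaultdict(set)
    for client, identifiers in client_pii.items():
        for identifier in identifiers:
            holders[identifier].add(client)

    # Every pair of clients holding the same value shares one more item.
    pair_counts = defaultdict(int)
    for clients in holders.values():
        for a, b in combinations(sorted(clients), 2):
            pair_counts[(a, b)] += 1
    return dict(pair_counts)


# Hypothetical clients and identifiers.
client_pii = {
    "alice": {"ssn:111", "email:a@x.test", "phone:555-0100"},
    "bob": {"ssn:111", "email:b@x.test", "phone:555-0100"},
    "carol": {"ssn:333", "email:b@x.test", "phone:555-0199"},
    "dave": {"ssn:444", "email:d@x.test", "phone:555-0142"},
}
shared = shared_pii_counts(client_pii)
print(shared)
```

Each key in `shared` plays the role of the new relationship between two clients, and the value corresponds to the shared-PII count stored as its property.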
2. Identify client clusters sharing PII
Run the Community Detection algorithms in the Neo4j Graph Data Science (GDS) library to identify client clusters sharing PII.
We use Weakly Connected Components (WCC) to group clients into clusters, where all nodes in the same cluster form a connected component.
WCC analyzes the graph and identifies its components. A component is a set of nodes and relationships in which every node can be reached from every other node by traversal. The components are weakly connected because the algorithm ignores the direction of the relationships.
WCC is typically run early in an analysis to understand the structure of the graph.
Reference: Weakly Connected Components — Neo4j Graph Data Science
![A graph with three components](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-10.png)
![Identify client clusters sharing PII](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-11.png)
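To make the notion of a weakly connected component concrete, here is a minimal union-find sketch in plain Python. It is not the GDS implementation; the function name and the toy graph are assumptions, and every edge endpoint is assumed to appear in `nodes`. Edge direction is deliberately ignored, which is what makes the components "weak".

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, treating every edge as undirected."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    components = {}
    for node in nodes:
        components.setdefault(find(node), set()).add(node)
    return list(components.values())


# Hypothetical clients linked by shared-PII relationships.
nodes = ["alice", "bob", "carol", "dave", "erin", "frank"]
edges = [("alice", "bob"), ("carol", "bob"), ("erin", "dave")]
clusters = weakly_connected_components(nodes, edges)
print(sorted(sorted(cluster) for cluster in clusters))
```

In this toy graph, `carol` points at `bob` and `alice` points at `bob`, yet all three end up in one cluster because direction does not matter.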
3. Identify similar clients in client clusters
Run the GDS Pairwise Similarity algorithms to filter clients in the client clusters based on similarity.
Node Similarity compares nodes based on the nodes they are connected to, which helps us find similar nodes. It uses the Jaccard similarity coefficient to score a pair of nodes: the number of neighbors the two nodes share divided by the total number of distinct neighbors of either node (the size of the intersection over the size of the union).
Reference: Node Similarity — Neo4j Graph Data Science
![Find similar clients within the client clusters](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-12.png)
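The Jaccard score itself is simple to compute by hand. The sketch below, in plain Python, scores every pair of clients by the overlap of their PII neighbors; the helper names, the `cutoff` parameter, and the toy data are assumptions for illustration and are not the GDS API.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(neighbors, cutoff=0.0):
    """Score every pair of nodes and keep those scoring above `cutoff`."""
    result = {}
    for a, b in combinations(sorted(neighbors), 2):
        score = jaccard(neighbors[a], neighbors[b])
        if score > cutoff:
            result[(a, b)] = score
    return result


# Hypothetical clients and the PII nodes each one is connected to.
neighbors = {
    "alice": {"ssn1", "email1", "phone1"},
    "bob": {"ssn1", "email2", "phone1"},
    "carol": {"ssn3", "email2", "phone3"},
}
pairs = similar_pairs(neighbors)
print(pairs)
```

Here `alice` and `bob` share two of four distinct identifiers, while `bob` and `carol` share one of five; `alice` and `carol` share nothing and are dropped.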
4. Calculate the Fraud Score
![Calculate the fraud score](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-13.png)
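One way to read a centrality-based fraud score is as weighted degree centrality: each client's score is the sum of the similarity weights on its relationships. The plain-Python sketch below assumes that interpretation; the `weighted_degree` helper and the similarity values are hypothetical, and the exact GDS configuration used in the screenshot may differ.

```python
from collections import defaultdict


def weighted_degree(similarity_edges):
    """Sum the weights of all edges touching each node."""
    scores = defaultdict(float)
    for (a, b), weight in similarity_edges.items():
        scores[a] += weight
        scores[b] += weight
    return dict(scores)


# Hypothetical similarity scores between pairs of clients.
similarity_edges = {
    ("alice", "bob"): 0.5,
    ("bob", "carol"): 0.2,
    ("dave", "erin"): 1.0,
}
fraud_score = weighted_degree(similarity_edges)

for client, score in sorted(fraud_score.items(), key=lambda item: -item[1]):
    print(f"{client}: {score:.2f}")
```

A client that is highly similar to many other clients accumulates a high score, which is the intuition behind treating it as more suspicious.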
5. Add labels to potential fraudsters
![Label potential fraudsters](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-14.png)
![Six clusters each contain a small number of clients (yellow nodes) that appear to share PII such as SSN, email, and phone number (purple nodes)](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-15.png)
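Labeling comes down to choosing a cutoff on the fraud score. The text does not state which cutoff is used, so the sketch below assumes a nearest-rank percentile threshold purely for illustration; the `flag_above_percentile` helper, the `percent` parameter, and the scores are hypothetical.

```python
def flag_above_percentile(scores, percent):
    """Return the clients whose score is strictly above the given percentile.

    `percent` is an integer from 1 to 100; the threshold is picked with
    the nearest-rank method.
    """
    if not scores:
        return set()
    ordered = sorted(scores.values())
    # Ceiling of percent * n / 100 using integer arithmetic only.
    rank = max(1, -(-percent * len(ordered) // 100))
    threshold = ordered[rank - 1]
    return {client for client, score in scores.items() if score > threshold}


# Hypothetical fraud scores; the 60th percentile is an assumed cutoff.
scores = {"alice": 0.5, "bob": 0.7, "carol": 0.2, "dave": 1.0, "frank": 0.0}
flagged = flag_above_percentile(scores, percent=60)
print(sorted(flagged))
```

On a real dataset a much higher percentile would usually be chosen so that only a small fraction of clients is labeled.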
Identify 2nd-Level Fraudsters
Find out who is related to these fraudster clusters.
![Find people connected to these fraud clusters](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-16.png)
Create new relationships
![Create new relationships](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-17.png)
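A plain-Python sketch of how such relationships could be derived from raw transfers follows. It assumes that an edge is kept only when exactly one endpoint is an already-flagged first-party fraudster and that amounts are summed per sender-receiver pair; both choices, along with the `build_transfer_edges` helper and the sample transfers, are assumptions for this example rather than the query in the screenshot.

```python
from collections import defaultdict


def build_transfer_edges(transfers, first_party_fraudsters):
    """Aggregate transfers between a flagged client and an unflagged one."""
    totals = defaultdict(float)
    for sender, receiver, amount in transfers:
        sender_flagged = sender in first_party_fraudsters
        receiver_flagged = receiver in first_party_fraudsters
        # Keep the transfer only if exactly one side is already flagged.
        if sender_flagged != receiver_flagged:
            totals[(sender, receiver)] += amount
    return dict(totals)


# Hypothetical transfers as (sender, receiver, amount).
transfers = [
    ("bob", "gina", 100.0),
    ("bob", "gina", 50.0),
    ("hank", "dave", 20.0),
    ("gina", "hank", 75.0),   # neither side flagged: ignored
    ("bob", "dave", 10.0),    # both sides flagged: ignored
]
transfer_to = build_transfer_edges(transfers, {"bob", "dave"})
print(transfer_to)
```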
Find 2nd-Level Fraudsters
Identify clients who may have colluded with first-party fraudsters but have not yet been identified as potential first-party fraudsters.
Our assumption is that clients who make transfer transactions with first-party fraudsters can be suspected of being 2nd-level fraudsters.
Let's use the TRANSFER_TO relationships we just created and perform the following steps to identify these clients:
- Use the Community Detection algorithm (WCC) to identify the client networks related to first-party fraudsters.
- Use the Centrality algorithm (PageRank) to calculate a fraud score.
- Identify suspects with relatively high PageRank scores and mark them as 2nd-level fraudsters.
![Use WCC to find the client networks related to first-party fraudsters](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-18.png)
![Use centrality to calculate an influence score](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-19.png)
![Find suspects with relatively high PageRank scores and label them as 2nd-level fraudsters](https://blog.tpisoftware.com/wp-content/uploads/2021/12/Fraud-Detection-Neo4j-20.png)
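To show how PageRank ranks clients in a transfer network, here is a textbook power-iteration sketch in plain Python. It is not the GDS implementation, so absolute scores will differ from GDS output (this version normalizes ranks to sum to 1 and spreads the rank of nodes without outgoing edges evenly). The damping factor, iteration count, and toy edges are assumptions.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over directed edges."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical TRANSFER_TO edges as (sender, receiver).
nodes = ["bob", "dave", "gina", "hank", "ivy"]
edges = [("bob", "gina"), ("dave", "gina"), ("hank", "gina"), ("ivy", "hank")]
rank = pagerank(nodes, edges)

top_suspect = max(rank, key=rank.get)
print(top_suspect, round(rank[top_suspect], 3))
```

In this toy network `gina` receives transfers from three clients and therefore ends up with the highest score, which is the kind of client the article marks as a 2nd-level suspect.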
Let's Summarize the First Part: What Did We Find Out?
To sum up, we used GDS to analyze financial transaction data:
- We used the WCC, Node Similarity, and Degree Centrality algorithms to identify potential first-party fraudsters.
- We identified the 2nd-level fraudsters associated with the first-party fraudsters using the new TRANSFER_TO relationships together with the WCC and PageRank algorithms.
- We added labels to these suspects in the current network.