Fraud Runs Rampant! Building Anti-Fraud Strategy with Neo4j Graph Data Science

How can Neo4j graph data kick-start an anti-fraud strategy?

Introduction

Why does the current anti-fraud strategy fail to identify all fraud incidents?

Fraud detection models based on machine learning and analytics have been widely adopted, but most of these data science models leave out a significant factor: the network structure.

Many kinds of data can be represented as a graph. Picture account holders and their information as a graph: the network structure may reveal that multiple account holders share the same phone number or other personally identifiable information (PII), which suggests possible synthetic identity fraud. Without graph algorithms, such traces of fraud are extremely difficult to spot in a massive network of countless account holders.
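To make the shared-identifier idea concrete, here is a minimal sketch in plain Python. It is not the Neo4j GDS implementation; it only illustrates the connected-components idea of linking account holders that share any identifier. The function name `shared_identifier_groups`, the holder names, and the identifier values are all hypothetical sample data.

```python
from collections import defaultdict


def shared_identifier_groups(holders):
    """Return groups of account holders linked by at least one shared identifier.

    `holders` maps a holder name to the set of identifiers (phone, email, ...)
    on file. Holders are linked transitively: if A shares a phone with B and
    B shares an ID number with C, all three end up in one group.
    """
    # Index: identifier -> holders that use it.
    by_identifier = defaultdict(set)
    for holder, identifiers in holders.items():
        for identifier in identifiers:
            by_identifier[identifier].add(holder)

    # Union-find over holders.
    parent = {holder: holder for holder in holders}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every holder sharing an identifier joins the same component.
    for linked in by_identifier.values():
        first, *rest = sorted(linked)
        for other in rest:
            union(first, other)

    groups = defaultdict(set)
    for holder in holders:
        groups[find(holder)].add(holder)

    # Only groups with more than one holder are worth a closer look.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data, for illustration only.
holders = {
    "alice": {"phone:555-0100", "email:alice@example.com"},
    "bob": {"phone:555-0100", "ssn:000-00-0001"},
    "carol": {"ssn:000-00-0001"},
    "dave": {"phone:555-0199"},
}
suspicious = shared_identifier_groups(holders)
```

In this sample, alice and bob share a phone number and bob and carol share an ID number, so the three form one group, while dave stays unlinked. A shared identifier is only a signal worth investigating, not proof of fraud.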

Unlike tabular data, which is typically arranged in rows and columns, a graph data structure can represent complex relationships in the data, which makes the network structure easier to analyze.

Graph database
Source: Neo4j

Fraud Detection with Graph Data Science

We can improve prediction accuracy through graph analysis and graph feature engineering. With a graph database, we can engineer graph features from connection-related metrics, such as the number of relationships between nodes, potential triangles, or neighbors. Community detection algorithms, for instance, highlight the cluster structure of the data (similar to clustering), which helps identify fraudulent behavior.
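As a rough illustration of the connection-related metrics mentioned above, the following plain-Python sketch computes two per-node features, degree and triangle count, from an undirected edge list. It stands in for what a graph library would compute at scale and is not the GDS API; the function name `graph_features` and the sample edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def graph_features(edges):
    """Compute degree and triangle count for every node of an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)

    features = {}
    for node, nbrs in neighbors.items():
        # A triangle through `node` exists for each pair of its neighbors
        # that are also connected to each other.
        triangles = sum(
            1 for u, v in combinations(sorted(nbrs), 2) if v in neighbors[u]
        )
        features[node] = {"degree": len(nbrs), "triangles": triangles}
    return features


# Hypothetical sample graph: a triangle a-b-c plus a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
features = graph_features(edges)
```

Each entry in `features` could then be joined to the existing per-account feature table as extra columns, leaving the downstream model unchanged.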

Graph data can therefore support fraud detection without changing the machine learning system. Simply put, we add more graph features during feature engineering.

With graph data science, more fraud can be detected without changing the machine learning system
Source: Neo4j-Financial-Fraud-Detection-GDS-white-paper

Prerequisites

  • Neo4j 4.0+
  • Graph Data Science Library (Neo4j GDS 1.5+)
  • APOC Library (Neo4j APOC 3.5+)

Dataset

PaySim

Published by Lopez-Rojas, Elmir, and Axelsson, PaySim is a financial dataset that combines an agent-based model with anonymized data from real-life transaction scenarios at mobile payment providers.

The PaySim dataset involves clients, banks, and merchants. Clients can make mobile payments through the network and deposit money into it, for example by topping up.

Think of it as Apple Pay with the added ability to make deposits through merchants.

Agent Types:
The following are the three main types of agents in the graph network.
- Clients

Clients are unique accounts controlled by real end users in the mobile payment network.

  • Some clients are fraudsters manipulating the network and other clients for their own benefit.
  • Some clients are mules who move funds around and then leave the network.
  • Most clients are ordinary people without suspicious activities.
- Merchants
Merchants represent vendors or businesses interacting with clients in the network.
  • Merchants act as a gateway to the network, which allows the inflow and outflow of funds in the network.
  • Like traditional vendors, merchants provide goods or services in exchange for money in the network.
- Banks

Banks serve as the recipients of Debit transactions.

Transactions:
Transactions are the only way for clients to interact with other agents. In fact, clients are the only agents that execute transactions.
The following are the five possible types of transactions:
  • CashIn: Clients transfer funds to the network through merchants.
  • CashOut: Clients transfer funds out of the network through merchants.
  • Debit: Clients transfer funds to banks.
  • Transfer: A client remits money to another client.
  • Payment: Clients pay merchants for goods or services.
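The agent and transaction types above can be captured in a small data model. The sketch below is only an illustration of the rules as listed (clients always initiate, and each transaction type has one kind of counterparty); it is not PaySim's own code, and the names `AgentType`, `TransactionType`, `COUNTERPARTY`, `Transaction`, and `is_valid` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class AgentType(Enum):
    CLIENT = "Client"
    MERCHANT = "Merchant"
    BANK = "Bank"


class TransactionType(Enum):
    CASH_IN = "CashIn"
    CASH_OUT = "CashOut"
    DEBIT = "Debit"
    TRANSFER = "Transfer"
    PAYMENT = "Payment"


# Which kind of agent sits on the other side of each transaction type,
# following the list above.
COUNTERPARTY = {
    TransactionType.CASH_IN: AgentType.MERCHANT,
    TransactionType.CASH_OUT: AgentType.MERCHANT,
    TransactionType.DEBIT: AgentType.BANK,
    TransactionType.TRANSFER: AgentType.CLIENT,
    TransactionType.PAYMENT: AgentType.MERCHANT,
}


@dataclass(frozen=True)
class Transaction:
    kind: TransactionType
    sender: AgentType
    receiver: AgentType


def is_valid(tx):
    """Check that a client initiates and the counterparty matches the type."""
    return tx.sender is AgentType.CLIENT and tx.receiver is COUNTERPARTY[tx.kind]


# A client-to-client transfer is consistent with the rules.
tx_ok = Transaction(TransactionType.TRANSFER, AgentType.CLIENT, AgentType.CLIENT)
# A Debit aimed at a merchant is not, since Debit targets banks.
tx_bad = Transaction(TransactionType.DEBIT, AgentType.CLIENT, AgentType.MERCHANT)
```

A check like `is_valid` can serve as a quick sanity test when loading the dataset into a graph, before any relationships are created.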