[Day 202] Setting up a Graph Convolutional Network model to detect fraudulent credit card transactions

 Hello :)
Today is Day 202!


A quick summary of today:
  • finally got a GNN to work
  • learned how to use Mage for data streaming pipelines


I started today where I ended yesterday: trying to create some kind of graph neural network to predict whether a transaction is fraudulent or not.

TL;DR (as it is ~3:20am, another late night)

I ended up using PyTorch Geometric's homogeneous Data class, and the resulting data looks something like:

Data(x=[4290, 18], edge_index=[2, 23278], y=[4290], train_mask=[4290], test_mask=[4290])

The preprocessing involves undersampling the majority class, so we end up with a balanced dataset.
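To make the undersampling step concrete, here is a minimal sketch of random undersampling in plain Python. It is not my exact preprocessing code: the function name `undersample`, the toy labels, and the fixed seed are all made up for illustration.

```python
import random
from collections import Counter


def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Cut every class down to the size of the rarest class.
    n_keep = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_keep))
    return sorted(kept)


# Toy labels (hypothetical): 1 = fraud (rare), 0 = legitimate (common).
labels = [0] * 95 + [1] * 5
kept = undersample(labels)
balanced = Counter(labels[i] for i in kept)
print(balanced)
```

The kept indices can then be used to select the matching rows of the feature table before building the graph.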

The dataset has the following number of edges and nodes:

Neo4j is nice. 

The model I found that works (at least for now, version 0.1) is:

After splitting data into train and test, the best model so far achieved the following results:

Accuracy: 0.8833, Precision: 0.8151, Recall: 0.9909, F1: 0.8945
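For reference, this is how these four metrics relate to each other for a binary classifier. The sketch below is plain Python with made-up labels (not my real predictions), and `binary_metrics` is just an illustrative helper name.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example (hypothetical): every fraud case is caught, with one false alarm.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The toy example has the same shape as the results above: recall close to 1 with lower precision means the model catches nearly all fraud but also flags some legitimate transactions.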

Today I experimented with creating the training pipeline, but nothing is final yet. These are just experiments.


On another note, after getting a model to work, I decided to check out Mage's streaming pipelines.

I set up Kafka services in Docker Compose and, with a Python script, started sending sample data to try to access it through Mage.
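A producer script along these lines could look like the sketch below. It is not the exact script I ran: it assumes the kafka-python package, a broker on localhost:9092, a topic called "transactions", and made-up sample rows, all of which are placeholders.

```python
import json
import time


def encode_transaction(tx):
    """Serialize one transaction dict to UTF-8 JSON bytes for Kafka."""
    return json.dumps(tx).encode("utf-8")


def send_transactions(producer, topic, transactions, delay_s=0.0):
    """Send each transaction to `topic`; `producer` needs send() and flush()."""
    for tx in transactions:
        producer.send(topic, value=encode_transaction(tx))
        time.sleep(delay_s)  # optional pause to imitate a live stream
    producer.flush()  # block until buffered messages are delivered
    return len(transactions)


def make_producer(bootstrap_servers="localhost:9092"):
    """Create a real producer (assumes `pip install kafka-python`)."""
    from kafka import KafkaProducer

    return KafkaProducer(bootstrap_servers=bootstrap_servers)


# Hypothetical sample rows; the real column names are not shown here.
SAMPLE_TRANSACTIONS = [
    {"transaction_id": 1, "amount": 12.5},
    {"transaction_id": 2, "amount": 980.0},
]

# With the Kafka containers running, sending the sample data would be:
# send_transactions(make_producer(), "transactions", SAMPLE_TRANSACTIONS, delay_s=1.0)
```

Passing the producer in as an argument keeps the sending logic easy to try without a broker, since any object with send() and flush() will do.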

On the Mage side it is quite simple: a Kafka data_loader block and a Python transformer block read the messages from the Kafka stream.
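A transformer block for this can be very small. The sketch below is an approximation rather than my exact block: it assumes the messages arrive as a list of dicts, and it assumes the decorator import path used by Mage's block templates. The fallback decorator is only there so the snippet also runs outside Mage.

```python
from typing import Dict, List

try:
    # Import path as used in Mage's block templates (assumption).
    from mage_ai.data_preparation.decorators import transformer
except ImportError:
    # Fallback no-op decorator so the sketch runs without Mage installed.
    def transformer(func):
        return func


@transformer
def transform(messages: List[Dict], *args, **kwargs):
    """Print each message from the Kafka batch and pass the batch downstream."""
    for message in messages:
        print(message)
    return messages
```

Later on, the print call is where the model inference step could go, with the predictions returned instead of the raw messages.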

When I start running this pipeline we see:

Nice ^^ at least I know we can use Mage for the real-time inference pipeline.

That is all for today!

See you tomorrow :)
