[Day 200] Kukmin Bank AI competition project idea

 Hello :)
Today is Day 200!


A quick summary of today:
  • creating a project plan, timeline and learning about graph DBs


My lab mate and I considered two project ideas. First I will walk through the plan for the one we chose, and then cover the other one, for which I made a simple demo.

Project idea we chose: Real-time fraud analysis using graph data and graph database

My lab mate said he trusts me to choose the technologies for our project and to set up a plan to follow:

Training Pipeline

Data Collection and Preprocessing:

  • Raw dataset obtained from Kaggle containing transaction data.
  • Raw data is preprocessed before going into the database.

Storage in ArangoDB:

  • Transactions are stored in ArangoDB, a multi-model database with native graph support.
  • Customers and merchants are represented as nodes, and transactions as edges.
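To make the node/edge split concrete, here is a minimal sketch of how flat transaction rows could be turned into vertex and edge documents. The `_key`, `_from` and `_to` fields follow ArangoDB's document conventions; the collection names (`customers`, `merchants`), the column names and the sample rows are my assumptions, not the actual Kaggle schema.

```python
def to_graph_documents(transactions):
    """Split flat transaction rows into vertex and edge documents."""
    customers, merchants, edges = {}, {}, []
    for tx in transactions:
        c_key, m_key = tx["customer_id"], tx["merchant_id"]
        # Dicts keyed by id deduplicate customers and merchants seen many times.
        customers[c_key] = {"_key": c_key}
        merchants[m_key] = {"_key": m_key}
        # An edge references its endpoints as "<collection>/<key>".
        edges.append({
            "_key": tx["tx_id"],
            "_from": f"customers/{c_key}",
            "_to": f"merchants/{m_key}",
            "amount": tx["amount"],
            "is_fraud": tx["is_fraud"],
        })
    return list(customers.values()), list(merchants.values()), edges


# Hypothetical sample rows, only for illustration.
sample = [
    {"tx_id": "t1", "customer_id": "c1", "merchant_id": "m1", "amount": 12.5, "is_fraud": 0},
    {"tx_id": "t2", "customer_id": "c1", "merchant_id": "m2", "amount": 990.0, "is_fraud": 1},
    {"tx_id": "t3", "customer_id": "c2", "merchant_id": "m1", "amount": 40.0, "is_fraud": 0},
]
customers, merchants, edges = to_graph_documents(sample)
print(len(customers), len(merchants), len(edges))
```

The three lists could then be bulk-inserted into two vertex collections and one edge collection through a Python driver.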

Graph Neural Network with PyG:

  • PyTorch Geometric (PyG) is used to build and train a Graph Neural Network (GNN) for fraud detection on the graph data.
  • The model is trained on a batch of the dataset and saved for inference.
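Before any GNN can be trained, the graph has to be converted into the layout PyG works with: integer node ids and an `edge_index` of shape [2, num_edges]. Below is a rough sketch of that conversion in plain Python, assuming edges exported from the database carry `_from`, `_to` and an `is_fraud` label (the field names and sample edges are assumptions).

```python
def build_edge_index(edges):
    """Map node names to integer ids and build a [2, num_edges] edge list."""
    node_index = {}
    sources, targets, labels = [], [], []
    for edge in edges:
        # Assign the next free integer id to every node seen for the first time.
        for node in (edge["_from"], edge["_to"]):
            node_index.setdefault(node, len(node_index))
        sources.append(node_index[edge["_from"]])
        targets.append(node_index[edge["_to"]])
        labels.append(edge["is_fraud"])
    return node_index, [sources, targets], labels


# Hypothetical sample edges, only for illustration.
sample_edges = [
    {"_from": "customers/c1", "_to": "merchants/m1", "is_fraud": 0},
    {"_from": "customers/c1", "_to": "merchants/m2", "is_fraud": 1},
    {"_from": "customers/c2", "_to": "merchants/m1", "is_fraud": 0},
]
node_index, edge_index, edge_labels = build_edge_index(sample_edges)
print(edge_index)
```

Wrapped with `torch.tensor(edge_index, dtype=torch.long)`, the result can be passed as `edge_index` to a `torch_geometric.data.Data` object, with the labels attached as a per-edge target.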

Real-time inference pipeline

Simulated Data Streaming:

  • A live transaction environment with continuous data flow is emulated by replaying the full downloaded Kaggle dataset.

Data Ingestion with Kafka:

  • Kafka handles the real-time streaming of transaction data, ensuring efficient data flow and availability.

Storage in ArangoDB:

  • Real-time transactions are stored in ArangoDB, maintaining the graph structure.

Real-Time Fraud Detection:

  • New transactions are analysed in real time as they are streamed and stored.
  • The saved GNN model is loaded to perform inference on them and flag fraudulent transactions.
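The flow of the inference pipeline can be sketched without any infrastructure. In the toy version below a generator stands in for the Kafka topic, a list stands in for ArangoDB, and `score` is a hypothetical threshold rule standing in for the saved GNN. None of this is the real setup, it only shows the order of the steps.

```python
import json


def simulate_stream(rows):
    """Yield each row as a JSON-encoded message, as a producer would publish it."""
    for row in rows:
        yield json.dumps(row).encode("utf-8")


def score(tx):
    """Hypothetical stand-in for the saved GNN: flags unusually large amounts."""
    return 1.0 if tx["amount"] > 500 else 0.0


def consume(messages, store, threshold=0.5):
    """Decode each message, store it, score it, and collect flagged ids."""
    alerts = []
    for message in messages:
        tx = json.loads(message.decode("utf-8"))
        store.append(tx)  # stand-in for writing the new edge to the graph DB
        if score(tx) >= threshold:
            alerts.append(tx["tx_id"])
    return alerts


# Hypothetical sample rows, only for illustration.
rows = [
    {"tx_id": "t1", "amount": 12.5},
    {"tx_id": "t2", "amount": 990.0},
    {"tx_id": "t3", "amount": 40.0},
]
stored = []
alerts = consume(simulate_stream(rows), stored)
print(alerts)
```

In the real pipeline the generator would be replaced by a Kafka consumer, and `score` by a forward pass of the trained model.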

Also, in Excel, I used a project timeline template to create our schedule.
The competition started on Wednesday and the submission deadline is the 11th of Aug, so we are starting today. The first task is finding a decent dataset with a decent number of transactions.

One of the new things for me is using a graph DB. I saw there are Neo4j and ArangoDB, and I decided to use ArangoDB because the Python setup was easier. While playing with it, I loaded sample transactions and used the DB's web app (similar to Postgres' pgAdmin) to get a visualisation, which is limited because there are too many nodes and edges.

I am really looking forward to this project and working with my lab mate - Jae-hyeok Choi.  


As for the idea that we dismissed - banking voice assistant

I was up for both, but my lab mate preferred the first one, and looking back now I am glad, haha, because it is so exciting.

Nevertheless, setting up a voice assistant demo was surprisingly easy. I set it up using Hugging Face Spaces. Here is the link.

The ~30 lines of code above take voice -> turn it into text -> have a language model answer the text -> transform the answer into speech and return it to the user. To try the demo, one needs an OpenAI API key. For this simple voice assistant to become a banking one, I need to use the LM in a RAG app that talks to a DB. Given that I made db2chat, it should not be that difficult. But this is a project for later.
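The voice -> text -> answer -> speech chain is just three functions composed in order. Here is a tiny sketch of that structure, not the demo's actual code: `make_assistant` is a hypothetical helper and the three lambdas are stubs where real speech-to-text, language model and text-to-speech calls would go.

```python
def make_assistant(transcribe, answer, synthesize):
    """Compose speech-to-text, a language model and text-to-speech into one call."""
    def assistant(audio_in):
        text = transcribe(audio_in)   # voice -> text
        reply = answer(text)          # text -> answer
        return synthesize(reply)      # answer -> speech
    return assistant


# Stubs that treat bytes as "audio", only for illustration.
demo = make_assistant(
    transcribe=lambda audio: audio.decode("utf-8"),
    answer=lambda text: f"You asked: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
result = demo(b"what is my balance")
print(result)
```

Turning this into a banking assistant would mostly mean swapping `answer` for a RAG step that queries a DB, while the other two stages stay the same.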


That is all for today!

See you tomorrow :) 
