50+ days of Machine Learning

Posts

Showing posts from July 22, 2024

[Day 203] Starting LLM zoomcamp module 4 - Monitoring

7/22/2024 11:29:00 pm

Hello :) Today is Day 203! A quick summary of today: meeting with my lab mate about the KB AI competition project; starting LLM zoomcamp module 4.

Today we met to talk about Neo4j and about doing EDA for the Kookmin Bank AI competition. We talked about the benefits of Docker and docker-compose, and how to use Neo4j. Our next step is EDA. This might sound a bit odd, given that yesterday I already created a GNN, but there I just used the raw data without much feature engineering, because I wanted to make sure I could get even a simple GNN to work. My lab mate suggested splitting the dataset's columns in half and seeing what in the dataset needs to change. There are 22 columns in total, so we split them down the middle. The full list of columns is below, half of which are assigned to me:

trans_date_trans_time, cc_num, merchant, category, amt, first, last, gender, street, city, state, zip, lat, long, city_pop, job, dob, trans_num, unix_time, merch_lat, merch_long, is_fraud

On another note, today I cov...
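As a rough sketch of the "split the columns in half" step, here is a standard-library-only Python helper that divides the 22 column names down the middle and gives a first quick profile of each column (how many values are empty and how many are distinct). The function names `split_in_half` and `profile_columns` and the toy CSV in the usage note are hypothetical; the real EDA would run on the actual transaction data, most likely with pandas.

```python
import csv
import io

# The 22 column names listed above, in the order given in the post.
COLUMNS = [
    "trans_date_trans_time", "cc_num", "merchant", "category", "amt",
    "first", "last", "gender", "street", "city", "state", "zip",
    "lat", "long", "city_pop", "job", "dob", "trans_num", "unix_time",
    "merch_lat", "merch_long", "is_fraud",
]


def split_in_half(columns):
    """Split a list of column names down the middle into two halves."""
    mid = len(columns) // 2
    return columns[:mid], columns[mid:]


def profile_columns(csv_text, columns):
    """Per-column summary: number of empty values and number of distinct values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for col in columns:
        values = [row[col] for row in rows]
        summary[col] = {
            "missing": sum(1 for v in values if v == ""),
            "distinct": len(set(values)),
        }
    return summary
```

With 22 columns, `split_in_half(COLUMNS)` gives two halves of 11: the first runs from `trans_date_trans_time` to `state`, the second from `zip` to `is_fraud`. Columns with very few distinct values (like `gender` or `is_fraud`) are candidates for categorical encoding, while ones that are almost all distinct (like `trans_num`) are probably identifiers rather than features.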

[Day 202] Setting up a Graph Convolutional Network model to detect fraudulent credit card transactions

7/22/2024 03:24:00 am

Hello :) Today is Day 202! A quick summary of today: finally got a GNN to work; learned how to use Mage for data streaming pipelines.

I picked up where I left off yesterday: trying to create some kind of graph neural network to predict whether a transaction is fraudulent. TL;DR (it is ~3:20am, another late night): I ended up using PyTorch Geometric's homogeneous Data class, and the resulting data looks something like this:

Data(x=[4290, 18], edge_index=[2, 23278], y=[4290], train_mask=[4290], test_mask=[4290])

The preprocessing undersamples the majority class, so we end up with a balanced dataset. The dataset has the following number of edges and nodes. Neo4j is nice. The model I found that works (at least for now, version 0.1) is: After splitting the data into train and test sets, the best model so far achieved the following results:

Accuracy: 0.8833, Precision: 0.8151, Recall: 0.9909, F1: 0.8945

Today I experimented with creating the training pipeline, but nothing...
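The undersampling step can be sketched like this: keep every sample of the minority (fraud) class and randomly draw the same number of samples from the majority class. The post doesn't say exactly how the majority class was sampled, so this assumes plain random undersampling; the function name `undersample` and the fixed seed are hypothetical.

```python
import random


def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`.

    Every sample of the smallest class is kept, and the same number of
    samples is drawn without replacement from each other class.
    """
    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    return sorted(keep)
```

For example, with 10 non-fraud and 3 fraud labels, `undersample` returns 6 indices: all 3 fraud ones plus 3 randomly chosen non-fraud ones. One thing to keep in mind is that metrics measured on a balanced set will look different from metrics on the real, heavily imbalanced transaction stream.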
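For reference, the propagation rule of a standard graph convolutional layer (Kipf & Welling), which is what PyTorch Geometric's `GCNConv` computes with its default settings, can be written out in plain Python over the same `edge_index` layout as in the Data object above: add self-loops, transform the features with a weight matrix, then sum neighbour messages scaled by 1/sqrt(deg(source) * deg(target)). This is a dependency-free illustration, not the model from the post; the function name `gcn_propagate` is hypothetical, and it leaves out the bias term and the nonlinearity.

```python
import math


def gcn_propagate(x, edge_index, weight):
    """One GCN layer without bias or activation: D^-1/2 (A + I) D^-1/2 X W.

    x:          list of node feature vectors, shape (num_nodes, in_dim)
    edge_index: [sources, targets], PyG-style list of directed edges
    weight:     matrix of shape (in_dim, out_dim)
    """
    n = len(x)
    # Add a self-loop to every node (A + I).
    src = list(edge_index[0]) + list(range(n))
    dst = list(edge_index[1]) + list(range(n))
    # Degree of each node = number of incoming edges, including its self-loop.
    deg = [0] * n
    for t in dst:
        deg[t] += 1
    # Linear transform first: XW.
    in_dim, out_dim = len(weight), len(weight[0])
    xw = [
        [sum(row[k] * weight[k][j] for k in range(in_dim)) for j in range(out_dim)]
        for row in x
    ]
    # Sum messages from source to target nodes with symmetric normalization.
    out = [[0.0] * out_dim for _ in range(n)]
    for s, t in zip(src, dst):
        norm = 1.0 / math.sqrt(deg[s] * deg[t])
        for j in range(out_dim):
            out[t][j] += norm * xw[s][j]
    return out
```

On a 3-node path graph 0-1-2 (each undirected edge listed in both directions), node 1 has degree 3 with its self-loop and the end nodes have degree 2, so node 0's output is x0/2 + x1/sqrt(6). Stacking two such layers with a ReLU in between and a classifier on top is the usual minimal GCN for node classification.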
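The four reported metrics all come from the confusion matrix on the test nodes, with fraud as the positive class. A minimal sketch (the function name `binary_metrics` is hypothetical; in practice scikit-learn's metric functions do the same job):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

As a sanity check on the results above: 2 × 0.8151 × 0.9909 / (0.8151 + 0.9909) ≈ 0.8944, which matches the reported F1 of 0.8945 up to rounding. The high recall with lower precision means almost all fraud in the test set is caught, but roughly 18% of the transactions flagged as fraud are actually legitimate.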