Posts

Showing posts from July 26, 2024

[Day 207] Finished with neo4j (for now) and thinking about fraud detection models

Hello :) Today is Day 207! A quick summary of today: setting up neo4j and reading some papers on bank telemarketing classification. Firstly, about the Kookmin Bank (KB) AI competition project: we set up the database on my partner's laptop, inserting CreditCard nodes, Merchant nodes, and Transaction edges. We also had a look at the final EDA notebook and the current structure of the repo. As for next steps, we will start developing models and try to get something that identifies fraudulent transactions well. More importantly, we want explainability: which features lead the model to flag a transaction as fraud. So we will start with some basic models like logistic regression, decision trees, and random forest, then add hyperparameter tuning, over/undersampling, etc. I also want to try out a GNN as this project's model. I saw that torch-geometric has a GNNExplainer, so I need to try it in practice and see if the explainability it provides is go…
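Fraud labels are usually heavily imbalanced, which is why over/undersampling is on the plan. Below is a minimal pure-Python sketch of random undersampling: keep every minority-class row and draw an equal-sized random sample of the majority class. The `is_fraud` key and the toy data are hypothetical, and the rows are assumed to be dicts. In practice something like imbalanced-learn's `RandomUnderSampler` would do the same job on a DataFrame.

```python
import random
from typing import Dict, List


def random_undersample(
    rows: List[Dict], label_key: str = "is_fraud", seed: int = 42
) -> List[Dict]:
    """Return all minority-class rows plus an equal-sized random sample of the majority class.

    `label_key` is a hypothetical column name; each row is assumed to carry a boolean label.
    """
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    # Work out which class is the minority (fraud is normally the rare one).
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sampled_majority = rng.sample(majority, k=len(minority))  # sampling without replacement
    balanced = minority + sampled_majority
    rng.shuffle(balanced)  # avoid having all fraud rows at the front
    return balanced


if __name__ == "__main__":
    # Toy data (hypothetical): 5 fraud rows out of 100 transactions.
    data = [{"id": i, "is_fraud": i % 20 == 0} for i in range(100)]
    balanced = random_undersample(data)
    print(len(balanced), sum(r["is_fraud"] for r in balanced))  # 10 5
```

Undersampling throws away most of the legitimate transactions, so it is worth comparing against class weights or oversampling rather than assuming it is the best option.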

[Day 206] Finishing the Stock Market Analysis zoomcamp (for now)

Hello :) Today is Day 206! A quick summary of today: the last homework from the Stock Market Analysis Zoomcamp, and uploading neo4j scripts to the KB project repo. Over the past 4 weeks, my lab mate (Jae-Hyeok, the same guy with whom I am doing the KB project) and I have been taking another course by DataTalksClub, the Stock Market Analysis Zoomcamp. The course introduces basic concepts and strategies for building models that can potentially invest in stocks. Today marked the 4th week since we started, and we did the last homework: working with a financial model workbook to modify and analyse stocks, focusing on Random Forest tuning, reducing feature sets (to see how the results change), predicting strong future growth (investing in cases where the stock's growth is above a fixed threshold), and developing an ideal trading strategy. Above is an example resulting graph from one of the questions. We compare models based on CAGR (compound annual growth rate), and there are various models: rando…
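Two ideas from the homework lend themselves to a small sketch: CAGR, defined as (end value / start value)^(1 / years) − 1, and a "strong growth" target that flags days whose future return exceeds a fixed threshold. This is a minimal illustration, not the course's code. The horizon, threshold, and prices are hypothetical, and the course notebooks may define the target differently (for example, with pandas `shift`).

```python
from typing import List


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def strong_growth_labels(closes: List[float], horizon: int, threshold: float) -> List[int]:
    """Label day t as 1 if the return from t to t + horizon exceeds `threshold`, else 0.

    The last `horizon` days have no future price, so they get no label.
    """
    labels = []
    for t in range(len(closes) - horizon):
        future_return = closes[t + horizon] / closes[t] - 1.0
        labels.append(int(future_return > threshold))
    return labels


if __name__ == "__main__":
    # Growing from 10,000 to 14,641 over 4 years compounds at 10% per year.
    print(round(cagr(10_000, 14_641, 4), 4))  # 0.1
    # Hypothetical closing prices; flag days whose 2-day-ahead return is above 5%.
    prices = [100, 102, 110, 108, 115, 116]
    print(strong_growth_labels(prices, horizon=2, threshold=0.05))  # [1, 1, 0, 1]
```

CAGR is a convenient way to compare strategies because it normalises total growth by the length of the backtest period.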