Hello :) Today is Day 198!

A quick summary of today: data streaming pipeline project [v1 done]

Here is a link to the project's repo.

Well... I did not know I could do it in a day (~14 hours) after yesterday's issues, but here we are. It turns out that to insert the full payload (~70 variables with nested/list structure), I need the proper PySpark schema. Yesterday I did not have that, which is why I was getting NULL in the columns when reading the data from the Kafka producer: my schema was wrong. Today I not only fixed the schema for the 4 variables I had yesterday, but also included *all* ~70 variables that come from the Stripe API (for completeness).

When I run docker-compose, the data streams into the Postgres db (and it is still running). Unfortunately, the free Stripe API for creating realistic transactions has a limit of 25, so every 3 seconds 25 new transactions are sent to the db. It has been running half the day (since I got that set up) and as I am w...
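Since the NULL columns came down to a schema that did not match the nested payload, here is a minimal sketch of one way to list every nested field path and its type from a sample record before writing the PySpark schema by hand. It uses only the Python standard library. The payload, its field names, and the `describe` helper are all hypothetical, not the real Stripe fields or the project's code.

```python
import json

# Hypothetical, heavily trimmed payload. The field names are made up for
# illustration and are NOT the real ~70 Stripe fields.
SAMPLE = json.loads("""
{
  "id": "txn_123",
  "amount": 1999,
  "paid": true,
  "customer": {"email": "a@example.com", "address": {"city": "Sofia"}},
  "items": [{"sku": "A1", "qty": 2}]
}
""")


def describe(value, path=""):
    """Yield (field_path, type_name) pairs for a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from describe(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        # Assumption: lists are homogeneous, so only the first element is inspected.
        if value:
            yield from describe(value[0], f"{path}[]")
        else:
            yield (f"{path}[]", "unknown")
    else:
        yield (path, type(value).__name__)


fields = dict(describe(SAMPLE))
for field_path, type_name in fields.items():
    print(f"{field_path}: {type_name}")
```

A listing like this makes it easier to see which fields are structs and which are arrays, the two places where a hand-written schema most easily drifts from the data.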
Before setting off on the adventure, I wanted to review the basics.

Today I was looking for an alternative to SPSS and came across JASP. It seemed like a useful program, so I tried running a linear regression and some descriptive statistics, which was fun. But before learning more about JASP, I thought, "It would be good to review some basic statistics first." Luckily, Coursera has a free Introduction to Statistics course taught by Professor Guenther Walther of Stanford University.

My weaker area is running the various statistical tests (F test, t-test, chi-square, etc.), so whether I end up using JASP or some other statistics program, I can focus on studying those in more detail. The concepts of homoscedasticity and heteroscedasticity in particular stuck with me.

Homoscedasticity (good): Definition: In a homoscedastic dataset, the variance of the errors (residuals) is constant across all levels of the independent variable(s). In simpler terms, the spread of the residuals is the same throughout the range of predictor values.

Heteroscedasticity (bad): Definition: Heteroscedasticity occurs when the variance of the errors is not constant across all levels of the independent variable(s). In other words, the spread of residuals changes as the values of the independe...
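To make the two definitions concrete, here is a small simulation sketch using only the Python standard library. It draws residuals directly from a normal distribution instead of fitting a regression, and the noise functions, sample size, and the cut at x = 5 are arbitrary choices for illustration. It compares the spread of the residuals in the upper half of the predictor range with the spread in the lower half.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def simulate(noise_sd, n=2000):
    """Return (x, residual) pairs where the residual SD is noise_sd(x)."""
    points = []
    for _ in range(n):
        x = random.uniform(0, 10)
        residual = random.gauss(0, noise_sd(x))
        points.append((x, residual))
    return points


def spread_ratio(points):
    """Residual SD for x >= 5 divided by residual SD for x < 5."""
    low = [r for x, r in points if x < 5]
    high = [r for x, r in points if x >= 5]
    return statistics.stdev(high) / statistics.stdev(low)


# Constant spread (homoscedastic) vs. spread that grows with x (heteroscedastic).
ratio_homo = spread_ratio(simulate(lambda x: 1.0))
ratio_hetero = spread_ratio(simulate(lambda x: 0.2 + 0.3 * x))

print(f"homoscedastic spread ratio:   {ratio_homo:.2f}")
print(f"heteroscedastic spread ratio: {ratio_hetero:.2f}")
```

A ratio close to 1 means the residual spread is about the same in both halves. A ratio well above 1 means the spread widens as the predictor grows, which is the funnel shape usually seen in a residual plot.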
Hello :) Today is Day 61!

A quick summary of today:
- Covered Lecture 7: machine translation, seq2seq, attention from Stanford CS224N
- Tried to make a conditional DCGAN to generate MNIST numbers (colab) (kaggle)

I will first cover the GAN story (and then share my notes from the lecture).

So... while watching and taking notes today, I started thinking: what if I could use my notes as training data for a model, so that later, whenever I want, I can give it raw text and it will output that text in the format of my notes (in my handwriting)? I started looking around, and the first model architecture that came to mind was the GAN (specifically the conditional GAN) - I remembered there was a GAN architecture where, alongside the pictures, we can give it the labels and then generate on demand. In retrospect, there are of course other options, but I decided to go with a GAN.

For maybe 2 hours I racked my brain trying to make a simple model with the EMNIST dataset (English characters), and I ...
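The "give it the labels, then generate on demand" idea can be sketched without any deep learning framework. One common way to condition a generator is to concatenate a one-hot label with the noise vector. The sketch below shows only that input step. The latent size and helper names are hypothetical, and it is not necessarily how the linked notebooks do it.

```python
import random

NUM_CLASSES = 10  # MNIST digits 0-9
NOISE_DIM = 16    # hypothetical latent size, chosen only for illustration


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer label as a one-hot list of floats."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label must be in [0, {num_classes})")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(label, rng):
    """Concatenate a random noise vector with the one-hot label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return noise + one_hot(label)


rng = random.Random(0)
z = generator_input(7, rng)  # ask for a "7"
print(len(z))                # NOISE_DIM + NUM_CLASSES entries
```

Because the label is part of the input, the same trained generator can be asked for a specific digit just by changing the one-hot part while resampling the noise.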