[Day 43] Coding up LeNet, VGG, InceptionNet, UNet from scratch

 Hello :)
Today is Day 43!


Quick summary of today:
  • write LeNet from scratch
  • write VGG from scratch
  • write InceptionNet from scratch
  • write UNet from scratch (again)


Wanting to understand the popular models a bit more, I decided to do the above. 

1) Let's begin with LeNet

A simple architecture developed in the 1990s, but it laid the groundwork for later networks like AlexNet, VGG and ResNet.


It consists of 2 conv layers, each followed by a maxpool, and ends with 2 fully connected (linear) layers.
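For reference, here is a minimal PyTorch sketch of that layout (the channel and kernel sizes follow the classic LeNet-5 numbers; ReLU and maxpool stand in for the original tanh and average pooling, and the exact FC sizes are my assumption):

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# quick shape check: a batch of 4 grayscale 32x32 images
print(LeNet()(torch.randn(4, 1, 32, 32)).shape)  # torch.Size([4, 10])
```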

2) Next is VGG

The paper proposes numerous versions (VGG11, VGG13, VGG16, VGG19), but from a quick Google search VGG16 seems to be the most popular (configuration D in the paper). It is deeper than the earlier LeNet, consisting of multiple conv+maxpool blocks, each increasing the number of filters and decreasing the spatial size of the image.
Instead of hard-coding 1 version, I wrote a general model that can adapt to the desired VGG architecture.

Below is the implementation. I think this is a nice set-up for testing the 4 versions on a dataset of my choosing.
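The idea is roughly this (a sketch of such a config-driven VGG, not my exact code; the config dictionary layout and the classifier sizes are my own choices, following the paper's tables):

```python
import torch
import torch.nn as nn

# Numbers are output channels of a 3x3 conv, 'M' marks a 2x2 maxpool
# (configs A/B/D/E from the paper = VGG11/13/16/19).
VGG_CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M", 512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

class VGG(nn.Module):
    def __init__(self, config="VGG16", in_channels=3, num_classes=1000):
        super().__init__()
        layers, c = [], in_channels
        for v in VGG_CONFIGS[config]:
            if v == "M":
                layers.append(nn.MaxPool2d(2))
            else:
                layers += [nn.Conv2d(c, v, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
                c = v
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(VGG("VGG16")(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```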


3) GoogLeNet / InceptionNet


This is a long one... haha. Funny that a research paper references a meme (and the network got the name Inception from it).
GoogLeNet features inception modules, which consist of parallel conv branches with different receptive field sizes. 1x1 convolutions are used to reduce dimensionality and improve efficiency, and towards the end global average pooling replaces the fully connected layers to produce a fixed-length feature vector.

and the implementation:

individual convolution block
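A minimal sketch of what such a block looks like (the batch norm is an assumption — the original 2014 paper predates it, but most re-implementations add it):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv -> BatchNorm -> ReLU, forwarding any conv kwargs (kernel_size, stride, padding)."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))
```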

individual inception block
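And a sketch of the inception module, reusing the ConvBlock above (the branch structure follows the paper: 1x1, 1x1→3x3, 1x1→5x5 and maxpool→1x1, all concatenated along the channel dimension):

```python
class InceptionBlock(nn.Module):
    """Four parallel branches, concatenated channel-wise."""
    def __init__(self, in_ch, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_pool):
        super().__init__()
        self.branch1 = ConvBlock(in_ch, out_1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            ConvBlock(in_ch, red_3x3, kernel_size=1),
            ConvBlock(red_3x3, out_3x3, kernel_size=3, padding=1),
        )
        self.branch3 = nn.Sequential(
            ConvBlock(in_ch, red_5x5, kernel_size=1),
            ConvBlock(red_5x5, out_5x5, kernel_size=5, padding=2),
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            ConvBlock(in_ch, out_pool, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)], dim=1
        )
```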

And the final GoogLeNet
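A sketch of the full network built from the two blocks above (the per-block channel numbers follow Table 1 of the paper; I left out the two auxiliary classifiers to keep it short):

```python
class GoogLeNet(nn.Module):
    """GoogLeNet sketch without the paper's two auxiliary classifiers."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(
            ConvBlock(3, 64, kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(3, stride=2, padding=1),
            ConvBlock(64, 64, kernel_size=1),
            ConvBlock(64, 192, kernel_size=3, padding=1),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        self.inception = nn.Sequential(
            InceptionBlock(192, 64, 96, 128, 16, 32, 32),      # 3a -> 256
            InceptionBlock(256, 128, 128, 192, 32, 96, 64),    # 3b -> 480
            nn.MaxPool2d(3, stride=2, padding=1),
            InceptionBlock(480, 192, 96, 208, 16, 48, 64),     # 4a -> 512
            InceptionBlock(512, 160, 112, 224, 24, 64, 64),    # 4b -> 512
            InceptionBlock(512, 128, 128, 256, 24, 64, 64),    # 4c -> 512
            InceptionBlock(512, 112, 144, 288, 32, 64, 64),    # 4d -> 528
            InceptionBlock(528, 256, 160, 320, 32, 128, 128),  # 4e -> 832
            nn.MaxPool2d(3, stride=2, padding=1),
            InceptionBlock(832, 256, 160, 320, 32, 128, 128),  # 5a -> 832
            InceptionBlock(832, 384, 192, 384, 48, 128, 128),  # 5b -> 1024
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling instead of big FC layers
            nn.Flatten(),
            nn.Dropout(0.4),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.head(self.inception(self.stem(x)))

print(GoogLeNet()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```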


4) UNet

Yesterday I attempted it and I think I got a kind-of working model, but I was not sure whether it was correct because I translated it from TensorFlow to PyTorch. Today I found another version of UNet online and pasted it into my records (thank you to this youtuber).


Pretty similar to yesterday. Here is the double conv part, and the init of the UNet itself.
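Roughly like this (a sketch, not the exact code from the video; the feature sizes 64→512 and the batch norm are the usual choices):

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convs, each followed by batch norm and ReLU."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1, features=(64, 128, 256, 512)):
        super().__init__()
        self.downs = nn.ModuleList()
        self.ups = nn.ModuleList()
        self.pool = nn.MaxPool2d(2)

        # encoder path: double conv then downsample
        for f in features:
            self.downs.append(DoubleConv(in_channels, f))
            in_channels = f

        # decoder path: transposed conv to upsample, then double conv on the concatenated skip
        for f in reversed(features):
            self.ups.append(nn.ConvTranspose2d(f * 2, f, kernel_size=2, stride=2))
            self.ups.append(DoubleConv(f * 2, f))

        self.bottleneck = DoubleConv(features[-1], features[-1] * 2)
        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)
```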

and then the forward pass:
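(continuing the class from the sketch above — skip connections are saved on the way down and concatenated on the way up; the interpolate call is a small safeguard for input sizes that are not divisible by 16):

```python
    def forward(self, x):
        skip_connections = []
        for down in self.downs:
            x = down(x)
            skip_connections.append(x)
            x = self.pool(x)

        x = self.bottleneck(x)
        skip_connections = skip_connections[::-1]

        for idx in range(0, len(self.ups), 2):
            x = self.ups[idx](x)                       # upsample
            skip = skip_connections[idx // 2]
            if x.shape[2:] != skip.shape[2:]:          # resize if shapes drifted
                x = nn.functional.interpolate(x, size=skip.shape[2:])
            x = torch.cat((skip, x), dim=1)            # concat skip connection
            x = self.ups[idx + 1](x)                   # double conv

        return self.final_conv(x)

# quick shape check
# print(UNet()(torch.randn(1, 3, 160, 160)).shape)  # torch.Size([1, 1, 160, 160])
```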


I actually decided to test this UNet on a human image dataset on Kaggle as well.


After 50 epochs with learning_rate 0.01 and batch_size 16, using DiceLoss and BCEWithLogitsLoss, the best model I got was at epoch 42 with train loss 0.2380 and valid loss 0.3435. I am now running a 100 epoch version with a learning rate scheduler. It will take a while, but the results (hopefully good) can be seen here on Kaggle.
Sample outputs from the 50 epoch model are:
I should try IoU, or maybe another combination of losses too. 
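For reference, a minimal sketch of how a combined BCE + dice loss and an IoU metric could look (my own simple versions for binary masks, not taken from a particular library):

```python
import torch
import torch.nn as nn

def dice_loss(logits, targets, eps=1e-7):
    """Soft dice loss on sigmoid probabilities (binary segmentation)."""
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    return 1 - ((2 * intersection + eps) / (union + eps)).mean()

def combined_loss(logits, targets, bce_weight=0.5):
    """Weighted sum of BCEWithLogits and soft dice."""
    bce = nn.functional.binary_cross_entropy_with_logits(logits, targets)
    return bce_weight * bce + (1 - bce_weight) * dice_loss(logits, targets)

@torch.no_grad()
def iou_score(logits, targets, threshold=0.5, eps=1e-7):
    """Intersection-over-union on thresholded predictions."""
    preds = (torch.sigmoid(logits) > threshold).float()
    intersection = (preds * targets).sum(dim=(1, 2, 3))
    union = (preds + targets).clamp(0, 1).sum(dim=(1, 2, 3))
    return ((intersection + eps) / (union + eps)).mean()
```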


Side note! It turns out I had not properly installed PyTorch on my laptop and was running things on the CPU, which is a disaster, but today I set it up properly (though using Kaggle is still a bit faster haha).
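For anyone else in the same boat, the quick sanity check is:

```python
import torch

print(torch.__version__)
if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available - running on CPU")
```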


That is all for today!

See you tomorrow :) 
