[Day 86] Made a youtube video - Chat with your PDF for free in colab using huggingface, mongodb, llama_index, langchain

 Hello :)
Today is Day 86!


A quick summary of today:
  • Coded, planned, recorded and posted a video tutorial making a chat with your pdf rag system for free


Well, after waking up today, I definitely did not expect to plan, execute and upload an almost 1hr tutorial on youtube. 

I was looking around chat with your PDF videos, to see what I can improve in my pdf_rag_from_scratch but I saw that most of the videos require an OpenAI api key, and I did not like that, given the availability of so many free resources and models. 

And I found this great resource from huggingface - Building A RAG System with Gemma, MongoDB and Open Source Model. Instead of a pdf, they were using some dataframe for films, so I decided to improve upon that, and make the code preprocess a pdf, embed it, upload to mongodb, load gemma, create a prompt and chat with the pdf (kind of a combination of the tutorial + my pdf_rag_from_scratch). 

The code itself is not that complicated, but I wanted to write it once/twice to make sure when I write live in the video recording, I do not have problems. So the whole process from idea to published video maybe took me 8 hours, mind that I had to find an app to edit the video (the editing was not much, but the app's video processing time was long because I wanted it in 1080p). 

Anyway ~ below I will provide an overall summary of the code

1. Download libraries

2. Preprocess PDF

2.1 Load PDF with llama-index

2.2 Chunk PDF text using langchain
2.3 Embed chunks
3. Set up mongodb
An important part is to set up an atlas vector search

3.1 Connect to the db

3.2 Delete existing (if any), and insert data
4. Find relevant texts
4.1 Perform vector search in db + get context

5. Load gemma using huggingface


6. Prompt engineering + talk with your PDF

I used similar base_prompt with the pdf_rag_from_scratch

Query: Do you pay or charge interest?

Answer: Yes, the Core Banking Agreement states that interest is paid and charged on a daily basis, and the interest rate applicable to your account(s) is stated in the Product & Services Terms & Conditions or, if no such terms are provided, on the website.


The results are not perfect, but is a good starting point for fine-tuning. 


That is all for today!

See you tomorrow :)

Popular posts from this blog

[Day 198] Transactions Data Streaming Pipeline Porject [v1 completed]

[미리 공부] 기초 통계 복습 (Day 1는 1월2일)

[Day 61] Stanford CS224N (NLP with DL): Machine translation, seq2seq + a side CDCGAN mini project