[Day 81] RAG from scratch - chunking is very important!
Hello :) Today is Day 81!

A quick summary of today: I continued with my custom RAG from scratch project, extracting knowledge from a bank's terms and conditions ( github repo ).

So, yesterday my main struggle was reading tables. I remembered that Gemini can read pictures, so I gave it the image below and asked it to read the table and give me the text. The output was not very good. Gemini could not read this table very well, and such tables were common in yesterday's PDF.

Yes, this is an image, but since even a powerful LLM like Gemini could not read this table and output it as text, I gave up on this particular PDF, for now at least, and looked for one with more straightforward text and fewer tables. The newly chosen PDF is here: on github

Now... given a supposedly simpler PDF, I used the code as it was to get outputs. But the results were just... really bad. Most of the time, even though the retrieval scores were high and the top-1 chunk included the exact answer, the output was ~'the context does not ...