[Day 190] Learning about evaluating vector search engines for RAG apps

 Hello :)
Today is Day 190!


A quick summary of today:
  • covered Module 3 (Vector Search) of the LLM Zoomcamp


All the code from today is on my repo.

The first part of the module covered semantic search with dense vectors in Elasticsearch (a minimal code sketch follows the steps below):

1. Loaded Q&A documents from a json file

2. Created dense vectors for each document using a pre-trained model

3. Created an index in Elasticsearch

4. Indexed the documents in Elasticsearch

5. Performed a semantic search using the dense vectors

6. Filtered the results by a specific section

(picture is from the course)
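
Roughly, those steps can look like the sketch below. This is only a minimal sketch: the file name (documents.json), index name, field names, query, and encoder model are my assumptions, not necessarily the course's exact choices.

```python
import json

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("multi-qa-distilbert-cos-v1")  # any 768-dim pre-trained encoder

# 1. Load the Q&A documents (hypothetical file name)
with open("documents.json") as f:
    documents = json.load(f)

# 2. Create a dense vector for each document's answer text
for doc in documents:
    doc["text_vector"] = model.encode(doc["text"]).tolist()

# 3. Create an index whose mapping includes a dense_vector field
es.indices.create(
    index="course-questions",
    mappings={
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "keyword"},
            "text_vector": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

# 4. Index the documents
for doc in documents:
    es.index(index="course-questions", document=doc)

# 5./6. kNN search with the encoded query, filtered to one section
results = es.search(
    index="course-questions",
    knn={
        "field": "text_vector",
        "query_vector": model.encode("How do I join the course?").tolist(),
        "k": 5,
        "num_candidates": 100,
        "filter": {"term": {"section": "General course-related questions"}},
    },
)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["text"][:80])
```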


Next, I learned about evaluating the retrieval mechanism:

1. Generate a unique ID for each document, to tell the documents apart

2. Generate 5 sample questions for each document using the GPT API

3. Save the results to a file to use for evaluation

The first 10 rows of the created dataset:

The ID is needed to connect the generated sample questions to the documents they relate to.
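
Here is a sketch of how this can look. The hashing scheme, prompt, model name, and output file are illustrative assumptions rather than the course's exact code:

```python
import hashlib
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_document_id(doc):
    # Hash stable fields so the same document always maps to the same ID
    combined = f"{doc['course']}-{doc['question']}-{doc['text'][:10]}"
    return hashlib.md5(combined.encode()).hexdigest()[:8]

prompt_template = """You emulate a student taking our course.
Based on the FAQ record below, formulate 5 questions the student might ask.
Return only a JSON array of 5 strings.

section: {section}
question: {question}
answer: {text}"""

ground_truth = []
for doc in documents:  # documents loaded as in the earlier sketch
    doc["id"] = generate_document_id(doc)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any chat model works here
        messages=[{"role": "user", "content": prompt_template.format(**doc)}],
    )
    # Assumes the model returns valid JSON; real code should handle failures
    for question in json.loads(response.choices[0].message.content):
        ground_truth.append({"question": question, "document": doc["id"]})

with open("ground_truth.json", "w") as f:
    json.dump(ground_truth, f, indent=2)
```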


Next, I learned about two metrics for evaluating the search mechanism

Recall

  • Measures the proportion of relevant documents retrieved out of all relevant documents available.
  • Formula: Recall = (Number of relevant documents retrieved) / (Total number of relevant documents)

Mean Reciprocal Rank

  • Evaluates the rank position of the first relevant document.
  • Formula: MRR = (1 / |Q|) * Σ (1 / rank_i) for i = 1 to |Q|
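
Both metrics take only a few lines of code. In the sketch below, relevance_total is an assumed list of boolean lists, one per query, marking which of the top-k results matched the ground-truth document ID; with exactly one relevant document per query, this recall is the same as hit rate:

```python
def hit_rate(relevance_total):
    # Fraction of queries whose relevant document appears anywhere in the top-k
    return sum(any(line) for line in relevance_total) / len(relevance_total)

def mrr(relevance_total):
    # Average over queries of 1/rank of the first relevant result (0 if absent)
    total = 0.0
    for line in relevance_total:
        for rank, relevant in enumerate(line, start=1):
            if relevant:
                total += 1 / rank
                break
    return total / len(relevance_total)

# Example: 3 queries with top-3 results each
example = [[True, False, False], [False, True, False], [False, False, False]]
print(hit_rate(example))  # 2/3 ≈ 0.67
print(mrr(example))       # (1/1 + 1/2 + 0) / 3 = 0.5
```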

For the evaluation, a few different search engines were created and compared on recall and MRR (a sketch of the evaluation loop follows the list):

  • baseline Elasticsearch search based on word similarity: recall 0.74, MRR 0.60
  • kNN using the embedded question body: recall 0.77, MRR 0.66
  • kNN using the embedded text body: recall 0.83, MRR 0.71
  • kNN using the embedded question and text body: recall 0.92, MRR 0.82

Each variant ran with a different latency, so choosing between them comes down to what we care about more: speed, or a given percentage improvement in retrieval quality.
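
The comparison itself can be a small loop over the ground-truth questions, reusing hit_rate and mrr from the sketch above; search_function here is a placeholder for whichever engine is being evaluated:

```python
def evaluate(ground_truth, search_function):
    # search_function takes a ground-truth record and returns ranked documents
    relevance_total = []
    for q in ground_truth:
        results = search_function(q)
        relevance_total.append([d["id"] == q["document"] for d in results])
    return {"recall": hit_rate(relevance_total), "mrr": mrr(relevance_total)}
```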

At the end, I completed the homework, which covered questions similar to the content above.


That is all for today!

See you tomorrow :)
