The aim of this notebook is to walk through a comprehensive example of how to fine-tune OpenAI models for Retrieval Augmented Generation (RAG).

We will also be integrating Qdrant and Few-Shot Learning to boost the model's performance and reduce hallucinations. This could serve as a practical guide for ML practitioners, data scientists, and AI Engineers interested in leveraging the power of OpenAI models for specific use-cases. 🤩

Why should you read this blog?

You want to learn how to

To begin, we've selected a dataset where we've a guarantee that the retrieval is perfect. We've selected a subset of the SQuAD dataset, which is a collection of questions and answers about Wikipedia articles. We've also included samples where the answer is not present in the context, to demonstrate how RAG handles this case.

Terms, Definitions, and References

Retrieval Augmented Generation (RAG)? The phrase Retrieval Augmented Generation (RAG) comes from a recent paper by Lewis et al. from Facebook AI. The idea is to use a pre-trained language model (LM) to generate text, but to use a separate retrieval system to find relevant documents to condition the LM on.

What is Qdrant? Qdrant is an open-source vector search engine that allows you to search for similar vectors in a large dataset. It is built in Rust and here we'll use the Python client to interact with it. This is the Retrieval part of RAG.

What is Few-Shot Learning? Few-shot learning is a type of machine learning where the model is "improved" via training or fine-tuning on a small amount of data. In this case, we'll use it to fine-tune the RAG model on a small number of examples from the SQuAD dataset. This is the Augmented part of RAG.

What is Zero-Shot Learning? Zero-shot learning is a type of machine learning where the model is "improved" via training or fine-tuning without any dataset specific information.

What is Fine-Tuning? Fine-tuning is a type of machine learning where the model is "improved" via training or fine-tuning on a small amount of data. In this case, we'll use it to fine-tune the RAG model on a small number of examples from the SQuAD dataset. The LLM is what makes the Generation part of RAG.

1. Setting Up the Environment

Install and Import Dependencies

!pip install pandas openai tqdm tenacity scikit-learn tiktoken python-dotenv seaborn --upgrade --quiet
import json
import os
import time

import pandas as pd
import openai
import tiktoken
import seaborn as sns
from tenacity import retry, wait_exponential
from tqdm import tqdm
from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

import warnings
warnings.filterwarnings('ignore')

tqdm.pandas()