RAG vs Fine-Tuning: Navigating the Path to Enhanced LLMs

Alexandra Quinn and Nick Schenone | July 8, 2024

RAG and fine-tuning are two prominent LLM customization approaches. RAG supplies a trained model with external, dynamic data at query time, while fine-tuning trains the model further on specialized datasets, changing the model itself. Each approach suits different use cases. In this blog post, we explain each approach, compare the two, and recommend when to use each and which pitfalls to avoid.

What is Fine-Tuning?

LLM fine-tuning is a customization process in which a pre-trained model is further trained on a new dataset that is specific to a particular task. Unlike RAG, another form of LLM customization, fine-tuning modifies the model’s weights (its learned parameters) based on the new data. By adapting and “tweaking” the LLM, fine-tuning improves the model’s performance and accuracy on the required task. This makes the model more applicable to specific tasks, domains and use cases, and brings more business value.

For example:

1. If you need a model that excels in legal document analysis, you can fine-tune an LLM pre-trained in English, using a corpus of legal texts. The fine-tuned model will then better understand legal jargon, context and nuances. The result will be a model highly effective for tasks like legal document classification or summarization.

2. A pre-trained image recognition model can be fine-tuned to identify specific objects relevant to a particular industry, such as medical imaging for tumor detection or industrial inspection for defect identification.

3. Models pre-trained on general speech data can be fine-tuned to recognize industry-specific jargon or accents. This can improve accuracy in applications like customer service or transcription services.

How Does Fine-tuning Work?

1. The process starts with a pre-trained model. This model has already been trained on a large, diverse dataset. 

2. A smaller, task-specific dataset is then prepared. This dataset is closely related to the specific problem or domain that the model needs to address.

3. The pre-trained model's weights and parameters are used as a starting point. Transfer learning allows the model to retain general knowledge from the large dataset and apply it to the new task.

4. The model undergoes further training on the task-specific dataset. This process adjusts the model's weights to better suit the new data, improving its performance on the specific task.

5. The fine-tuned model is evaluated against validation data to ensure it performs well. Hyperparameters may be adjusted and additional fine-tuning may be performed to optimize the model's performance.
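
The five steps above can be illustrated with a deliberately tiny stand-in: a two-feature logistic regression instead of an LLM. Everything here is an assumption made for illustration (the "pre-trained" weights, the datasets, the learning rate and the function names); it is a sketch of the idea of continuing training from existing weights, not of how any particular LLM framework does it.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of the positive class for one example."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def log_loss(weights, bias, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def fine_tune(weights, bias, train_data, lr=0.1, epochs=200):
    """Continue gradient descent starting from the given weights (steps 3 and 4)."""
    w, b = list(weights), bias  # copy so the pre-trained weights stay intact
    n = len(train_data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in train_data:
            err = predict(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Step 1: weights assumed to come from earlier, broader training (hypothetical values).
pretrained_w, pretrained_b = [1.0, 0.0], 0.0

# Step 2: a small task-specific dataset where the label follows the second feature.
task_train = [([0.2, 1.0], 1), ([0.9, -1.0], 0), ([-0.5, 0.8], 1), ([0.4, -0.7], 0)]
task_val = [([0.1, 0.9], 1), ([0.6, -0.8], 0)]

# Step 5: compare validation loss before and after fine-tuning.
loss_before = log_loss(pretrained_w, pretrained_b, task_val)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_train)
loss_after = log_loss(tuned_w, tuned_b, task_val)
print(f"validation loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In practice the same loop is run by a training framework over billions of parameters, but the shape is the same: start from existing weights, train on the task data, and check the result on held-out validation data.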

Benefits of Fine-Tuning

  • Higher Business Value - A fine-tuned model is better adapted to tasks, allowing it to provide outputs that generate more value for the business and stakeholders.
  • Specialization - Fine-tuning allows models to be customized for different applications and industries. For example, a language model can be fine-tuned for medical, legal, or technical language, making it more effective in those domains.
  • Resource-efficiency - Using a pre-trained model as a foundation saves computational resources and time. Fine-tuning is more efficient than training a model from scratch, because the pre-trained model already possesses a broad understanding of language or other data types.
  • Lower Data Requirements - Fine-tuning often requires fewer data samples compared to training a model from scratch.
  • Accessibility - Fine-tuning democratizes AI by allowing smaller organizations or researchers to develop high-performance models without needing extensive computational resources or large datasets for initial training. 

Challenges in Fine-Tuning

  • Overfitting - Overfitting to the small task-specific dataset can reduce the model's ability to generalize, and with it the quality of its answers to prompts outside that dataset.
  • Hyperparameter Tuning - Finding the right balance of learning rate and other hyperparameters is a complicated process.
  • Cost and Time - Fine-tuning requires computing power, supporting AI infrastructure and a streamlined process. Without them, it becomes a costly and time-consuming effort.
  • Data Availability - Data for the additional training has to be found, curated, labeled and cleansed. This is not an easy task, especially for more specific use cases.
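
One common guard against overfitting on a small dataset is early stopping: watch the validation loss and stop once it no longer improves. The post does not prescribe this technique, so treat the following as a minimal sketch under assumptions. The two callables, the `patience` parameter and the simulated loss values are all hypothetical.

```python
from typing import Callable, Tuple

def train_with_early_stopping(
    train_one_epoch: Callable[[int], None],
    validation_loss: Callable[[int], float],
    max_epochs: int = 50,
    patience: int = 3,
) -> Tuple[int, float]:
    """Stop once validation loss has not improved for `patience` epochs in a row."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        loss = validation_loss(epoch)
        if loss < best_loss:
            # A real training loop would also save a checkpoint here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss

# Simulated validation losses: improving at first, then rising as the model overfits.
simulated_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66, 0.75]

best_epoch, best_loss = train_with_early_stopping(
    train_one_epoch=lambda epoch: None,  # placeholder for the actual training step
    validation_loss=lambda epoch: simulated_losses[epoch],
    max_epochs=len(simulated_losses),
)
print(best_epoch, best_loss)
```

The same validation loop can be reused when comparing learning rates or other hyperparameters: run it once per candidate setting and keep the setting with the lowest best validation loss.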

What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is an LLM customization process that provides the model with external, real-world and potentially dynamic data sources in order to enhance its accuracy and reliability. These data sources were not part of the model’s initial training.

The retrieval component searches large corpora of documents or databases to find information relevant to a query. The generative component then uses the retrieved information to produce a more accurate and contextually relevant response. RAG grounds outputs in real-world data, improving the quality and reliability of the generated text without changing the model itself.

For example:

1. Providing detailed and accurate responses to customer inquiries by pulling relevant information from a company's knowledge base or documentation.

2. Helping legal professionals find and interpret relevant case law and statutes to support their work.

3. Assisting researchers in finding and summarizing relevant academic papers and studies.

How Does RAG Work?

The RAG approach for LLMs works as follows:

1. Documents are split into chunks and embeddings are created via an embedding model. These document embeddings are then stored in a vector store. This is usually done once and only updated when needed.

2. The retrieval part of the system searches documents or databases to find relevant information or passages related to the input query. It uses sophisticated search algorithms, often leveraging embeddings and similarity metrics to find the most pertinent texts. In some cases, reranking will take place to ensure that the retrieved document chunks are actually relevant to the question.

Advanced techniques like Hypothetical Document Embeddings (HyDE) can also be used. This method enhances retrieval by generating a hypothetical document for an incoming query. The document is then embedded, and that embedding is used to look up real documents that are similar to the hypothetical document. The underlying concept is that the hypothetical document may be closer to the relevant real documents in the embedding space than the query itself is.

3. Once the relevant documents or passages are retrieved, they are added to the prompt and sent to the LLM, which uses them to produce a coherent and contextually appropriate response. The response is based on this real-time, context-specific information along with the model's training data, which leads to more accurate and informative outputs.

The prompt can be formatted with the retrieved document chunks only, or it can use a more advanced technique like Parent Document Retrieval. With this technique, small chunks are used for efficient and accurate search, while the whole parent document is placed in the prompt to give the LLM full context.
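
The indexing, retrieval and prompt-building steps above can be sketched end to end in a few lines. This is a toy under stated assumptions: a bag-of-words counter stands in for a real embedding model, a Python list stands in for a vector store, and the documents, function names and prompt wording are all hypothetical. No LLM is called; the sketch stops at the prompt that would be sent to one.

```python
import math
import re
from collections import Counter

def chunk_text(text: str, max_words: int = 12) -> list:
    """Split text into fixed-size word windows (real systems often add overlap)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Toy embedding: term counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: chunk the documents, embed each chunk, and store it with its parent id.
documents = {
    "returns-policy": "Items can be returned within 30 days of delivery. "
                      "Refunds are issued to the original payment method.",
    "shipping-faq": "Standard shipping takes five business days. "
                    "Express shipping takes two business days.",
}
vector_store = []  # entries are (embedding, chunk_text, parent_document_id)
for doc_id, text in documents.items():
    for chunk in chunk_text(text):
        vector_store.append((embed(chunk), chunk, doc_id))

def retrieve(query: str, top_k: int = 2) -> list:
    """Step 2: rank stored chunks by similarity to the query embedding."""
    query_vec = embed(query)
    ranked = sorted(vector_store,
                    key=lambda entry: cosine_similarity(query_vec, entry[0]),
                    reverse=True)
    return [(chunk, doc_id) for _, chunk, doc_id in ranked[:top_k]]

def build_prompt(query: str, top_k: int = 2, use_parent_documents: bool = False) -> str:
    """Step 3: put the retrieved context into the prompt that goes to the LLM."""
    hits = retrieve(query, top_k)
    if use_parent_documents:
        # Parent Document Retrieval: search on chunks, but pass whole documents.
        parent_ids = list(dict.fromkeys(doc_id for _, doc_id in hits))
        context = [documents[doc_id] for doc_id in parent_ids]
    else:
        context = [chunk for chunk, _ in hits]
    context_block = "\n".join(f"- {c}" for c in context)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context_block}\n\nQuestion: {query}")

question = "Can items be returned after 30 days?"
print(build_prompt(question, top_k=1))
print(build_prompt(question, top_k=1, use_parent_documents=True))
```

Comparing the two printed prompts shows the trade-off: the chunk-only prompt cuts off mid-sentence at the chunk boundary, while the parent-document prompt carries the full policy text at the cost of a longer prompt.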

Benefits of RAG

  • Up-to-date Information - RAG models ground responses in actual data retrieved from a large corpus. There is no need to retrain the model when the underlying knowledge changes, which is a significant advantage because retraining is costly and time-consuming.
  • Reduced Hallucinations - The model has a ground truth on which to base its responses and against which to compare them. A fine-tuned model has internal knowledge, but no "truth" to compare against.
  • Transparency and Trust - You know what the model is basing its response on and can double-check it yourself using links to the original documents.

Challenges of RAG

  • Contextual Integration of Retrieved Information - RAG models must ensure that the retrieved documents or snippets are contextually relevant and integrated coherently into the generated text.
  • Balancing Relevance and Diversity - The system must retrieve documents that are highly relevant to the query while also ensuring a diversity of perspectives to provide comprehensive and well-rounded responses.
  • Mitigating Propagation of Bias from Retrieved Content - Since the model relies on external data sources, it must have mechanisms to identify and mitigate biases to ensure fair and accurate responses.
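
For the relevance-versus-diversity challenge, one widely used option is Maximal Marginal Relevance (MMR). The post does not name this technique, so the sketch below is an assumption about how the balance might be struck. The two-dimensional vectors are hand-made stand-ins for real embeddings, and `lambda_weight` is a hypothetical parameter name.

```python
import math

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_select(query_vec, doc_vecs, k=2, lambda_weight=0.5):
    """Pick k document indices, trading relevance against redundancy.

    lambda_weight=1.0 ranks by relevance only; lower values favor diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest document that was already selected.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_weight * relevance - (1 - lambda_weight) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical embeddings: axes loosely mean (pricing, support).
query = [1.0, 1.0]          # a question touching both topics
docs = [
    [1.0, 0.12],            # 0: pricing page
    [1.0, 0.10],            # 1: near-duplicate of the pricing page
    [0.0, 1.0],             # 2: support page
]

relevance_only = mmr_select(query, docs, k=2, lambda_weight=1.0)
balanced = mmr_select(query, docs, k=2, lambda_weight=0.5)
print(relevance_only, balanced)
```

With relevance alone, both slots go to the two near-identical pricing pages; with the balanced weight, the second slot goes to the support page, which covers the other half of the question.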

RAG vs Fine-Tuning: How to Choose?

Both RAG and fine-tuning are effective methods for LLM customization. Here’s a breakdown to help you decide which approach is best for your needs:

  • What’s the nature of the task?
    • If the task requires integrating and synthesizing information from a large and dynamic dataset - Choose RAG.
    • If the task is highly specialized and requires deep domain knowledge - Choose fine-tuning.
  • What’s your data availability?
    • If you have access to a large and continuously updated knowledge base but limited specialized training data - Choose RAG.
    • If you have a large amount of high-quality, domain-specific training data - Choose fine-tuning to create a specialized model.
  • Do you have resources for LLM maintenance and updates?
    • RAG - Makes it easier to keep the LLM up to date, since only the knowledge base needs updating.
    • Fine-tuning - Requires retraining with new data.
  • Do you have resources for complex implementations?
    • RAG - Systems are more complex to implement and maintain, particularly the retrieval component.
    • Fine-tuning - More straightforward if the infrastructure for training is already in place.
  • Example Use Cases
    • RAG - Customer support systems that require up-to-date information, knowledge management systems, applications requiring responses based on large and dynamic datasets.
    • Fine-tuning - Specialized chatbots for specific industries, sentiment analysis tailored to particular products or brands, custom language models for specific writing styles or content creation.

| Aspect | RAG | Fine-Tuning |
| --- | --- | --- |
| Data Sources | Existing external + internal | Domain-specific internal |
| Changes to the Model? | No | Yes |
| Adaptability/Specialization | Adaptable to new information | Specializing on domain-specific data |
| Resource Optimization | No need for retraining; the RAG mechanism can be resource-intensive | Less data required; training can be resource-intensive |
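
The questions above can be condensed into a rough checklist. The function below is only one possible encoding of that guidance; the parameter names and the "undecided" fallback are assumptions, and a real decision would also weigh the maintenance and implementation resources discussed above.

```python
def recommend_approach(dynamic_knowledge: bool,
                       domain_training_data: bool,
                       specialized_task: bool) -> str:
    """Map three yes/no answers to a starting recommendation."""
    # A large, frequently changing knowledge base points to RAG.
    use_rag = dynamic_knowledge
    # A specialized task plus enough high-quality domain data points to fine-tuning.
    use_fine_tuning = specialized_task and domain_training_data
    if use_rag and use_fine_tuning:
        return "both"
    if use_fine_tuning:
        return "fine-tuning"
    if use_rag:
        return "RAG"
    return "undecided"

# Example: a support assistant over a frequently updated knowledge base.
print(recommend_approach(dynamic_knowledge=True,
                         domain_training_data=False,
                         specialized_task=False))
```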

Both RAG and LLM fine-tuning have their places in NLP tasks. RAG is ideal for scenarios requiring up-to-date, flexible, and broad knowledge integration. Fine-tuning excels in specialized, high-performance tasks where specific, consistent responses are needed. Assessing your task requirements, data availability and maintenance capabilities will guide you in making the right choice for your application.

That being said, the two approaches are not mutually exclusive and can be used together. For example, you can fine-tune the model for tone and vocabulary and use RAG for external knowledge. This allows the model to generate factually correct responses based on external data, but in the style or voice of your brand (e.g. marketing email, chatbot, call center agent, etc.).

Learn more about how a scalable and resilient AI platform can help you streamline both RAG and fine-tuning.