Retrieval-Augmented Generation: Approaches, State of the Art, and Optimization Strategies
⭐ RAG is particularly useful in knowledge-intensive scenarios or domain-specific applications that require knowledge that’s continually updating. RAG has been popularized recently with its application in conversational agents.
📌 Research in reference focusses mainly on current approaches & different components of RAG, State of the Art (SOTA), applications, evaluation for retrieval, generation, augmentation techniques.
With the evolution of RAG systems from Naïve to Advanced to Modular, and each of which is came into picture to address per use case basis enhancements.
✏ Naïve: User input is used for document query, appended/combined with prompt, used for model final response generation. With the Multiturn Dialogue interactions context/conversational history can be added/combined with the prompt. Cons: Low precision / Low recall, redundant, repetitive.
✏Advanced: Improves retrieval quality by optimizing pre-retrieval, retrieval, and post retrieval methods. With pre-retrieval, quality enhanced through enhancing data granularity, index structure improvements, metadata, alignment, mixed retrieval. In retrieval, optimizing the embedding model hence context. With post-retrieval, optimization in context window and noisy/distracting data rejection.
✏Modular: Incorporates a search module for similarity retrieval and fine tuning in retrieval. New Modules being Search, Memory, Fusion, Routing, Prediction, task Adaptor.
🥉 To optimize RAG Pipeline:
📜 Hybrid Search Exploration: Performance optimization balances by intelligently leveraging techniques such as keyword-based search, semantic and vector search.
📜Recursive Retrieval and Query Engine: Might start retrieval with acquiring smaller chunks in the initial phase, subsequently, larger chunks with better and more contextual information to LLM for balance between contextually rich responses and efficiency.
📜StepBack-prompt: This encourages the LLM to move away from specific instances and engage in reasoning around broader concepts and principles(arXiv:2310.13243). A significant performance increase observed, in various challenging, inference-based tasks when backward prompts are used, highlighting their natural adaptability to the RAG process.
📜Sub-Queries: Query strategies depending on the scenario could be applied such as using query engines provided by frameworks like LlamaIndex, leveraging tree queries, utilizing vector queries, or executing simple sequential querying of chunks.
📜Hypothetical Document Embeddings: With the LLM, HyDE responses to the query by creating hypothetical answer, embeds the answer, uses the same to retrieve real documents. Instead of seeking embedding similarity based on the query, this approach focuses on the embedding similarity from one answer to another[arXiv:2212.10496]. Cons: Inconsistent Answers not producing desirable outcomes, Errors for LLM unseen Subject Matter, leading to errors.
Let me cut off here. I’ll come up with a new post in follow-up
Ref: arxiv:2312.10997, RAG Surveys, Huggingfaceblogs