Retrieval-Augmented Generation (RAG) is an approach to building AI systems that combines a language model with external knowledge sources. Simply put, the AI first searches for documents (such as articles or web pages) related to the user's query, and then uses those documents to generate a more accurate answer. This method is credited with helping large language models (LLMs) stay factual and reduce hallucinations by grounding their responses in real data.
Intuitively, you might think the more documents an AI retrieves, the better informed its answer will be. However, recent research suggests a surprising twist: when it comes to supplying information to an AI, sometimes less is more.
Fewer documents, better answers
A new study by researchers at the Hebrew University of Jerusalem examined how the number of documents given to a RAG system affects its performance. Crucially, they kept the total amount of text constant: if fewer documents were provided, those documents were slightly expanded to fill the same length as the larger set. That way, any performance differences could be attributed to the quantity of documents rather than simply to having a shorter input.
The researchers used a multi-hop question-answering dataset (MuSiQue) with trivia-style questions, each originally paired with 20 Wikipedia paragraphs (only a few of which actually contain the answer; the rest are distractors). By trimming the number of documents from 20 down to only the 2-4 truly relevant ones, and padding those with a bit of extra context to maintain a consistent length, they created scenarios where the AI had fewer materials to consider but still roughly the same total words to read.
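The controlled setup can be sketched as follows. This is our own illustration of the idea of holding total length constant while varying document count; the function name, the word-based length measure, and the padding scheme are assumptions, not the authors' actual code:

```python
def equalize_length(kept_docs, padding_texts, target_words):
    """Keep only the relevant documents, then pad each one with extra
    surrounding text so the combined length matches the original
    many-document input (measured here crudely in words).

    kept_docs:     the few truly relevant paragraphs (strings)
    padding_texts: extra text available for padding, one per document
    target_words:  total word count of the original full set
    """
    total = sum(len(d.split()) for d in kept_docs)
    deficit = max(0, target_words - total)
    share = deficit // len(kept_docs)  # extra words added per document
    padded = []
    for doc, extra in zip(kept_docs, padding_texts):
        pad = " ".join(extra.split()[:share])
        padded.append((doc + " " + pad).strip())
    return padded
```

With this manipulation, a 4-document input "reads" as long as the 20-document one, so any accuracy difference comes from how many distinct documents the model must juggle, not from input length.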
The results were striking. In most cases, the AI models answered more accurately when given fewer documents rather than the full set. Performance improved substantially: in some cases, accuracy (F1 score) rose by up to 10% when the system used only the handful of supporting documents instead of the large collection. This counterintuitive boost was observed across several different open-source language models, including variants of Meta's Llama, indicating that the phenomenon is not tied to a single AI model.
One model (Qwen-2) was a notable exception, handling multiple documents without a drop in score, but nearly all the models tested performed better with fewer documents overall. In other words, adding more reference material beyond the key relevant pieces actually hurt performance more often than it helped.
Source: Levy et al.
Why is this surprising? Typically, RAG systems are designed under the assumption that retrieving more information can only help the AI. After all, if the answer is not in the first few documents, it might be in the tenth or twentieth.
This study flips that script, showing that indiscriminately piling on extra documents can backfire. Even with total text length held constant, the mere presence of many different documents (each with its own context and quirks) made the question-answering task harder for the AI. Beyond a certain point, each additional document seems to introduce more noise than signal, confusing the model and undermining its ability to extract the correct answer.
Why less is more in RAG
This "less is more" outcome makes sense when you consider how AI language models process information. When given only the most relevant documents, the context the model sees is focused and free of distractions, like a student handed exactly the right pages to study.
In the study, the models performed significantly better when given only the supporting documents, with the irrelevant material removed. The remaining context was not only shorter but also cleaner: it contained facts that pointed directly at the answer. With fewer documents to juggle, the model could devote its full attention to the pertinent information, making it less likely to get sidetracked or confused.
By contrast, when many documents were retrieved, the AI had to sift through a mix of relevant and irrelevant content. Often those extra documents were "similar but unrelated": they might share a topic or keywords with the query, but they don't actually contain the answer. Such content can mislead the model. The AI may waste effort trying to connect dots between documents that don't lead to the correct answer, or worse, it may merge information from multiple sources incorrectly. This raises the risk of hallucination, where the AI generates an answer that sounds plausible but is not grounded in any single source.
In essence, feeding the model too many documents can dilute the useful information and introduce conflicting details, making it harder for the AI to determine what is true.
Interestingly, the researchers found that models are quite good at ignoring extra documents when they are clearly irrelevant (for example, random, unrelated text). The real trouble comes from distracting data that looks relevant. When all the retrieved texts are on a similar topic, the model assumes it should use all of them, and it may struggle to tell which details actually matter. This aligns with the study's observation that random distractors caused less confusion than realistic distractors in the input. The AI can rule out blatant nonsense, but subtly off-topic information is a slick trap: it sneaks in under the guise of relevance and derails the answer. Reducing the number of documents to only the truly necessary ones avoids setting those traps in the first place.
There are practical advantages too. Retrieving and processing fewer documents lowers the computational overhead of a RAG system. Every document pulled in must be analyzed (embedded, read, and attended to by the model), which consumes time and computing resources. Eliminating superfluous documents makes the whole pipeline more efficient: answers can be found faster and at lower cost. In scenarios where accuracy improved by focusing on fewer sources, you get a win-win: better answers and a leaner, more efficient process.
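As a back-of-envelope illustration of that overhead argument, prompt cost scales roughly linearly with the number of retrieved documents. The function below and its per-token price are illustrative assumptions, not figures from the study:

```python
def retrieval_cost(num_docs, avg_tokens_per_doc, cost_per_1k_tokens=0.01):
    """Rough prompt-cost estimate: every retrieved document ends up in
    the prompt and is billed (and attended to) per token.
    cost_per_1k_tokens is a placeholder price, not a real rate."""
    tokens = num_docs * avg_tokens_per_doc
    return tokens / 1000 * cost_per_1k_tokens

# Pulling 20 documents of ~500 tokens (10,000 prompt tokens)
# vs. only the 4 relevant ones (2,000 prompt tokens):
full = retrieval_cost(20, 500)
lean = retrieval_cost(4, 500)
```

Under these toy numbers, the lean retrieval costs one fifth as much per query, before even counting the accuracy gains the study reports.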
Source: Levy et al.
Rethinking RAG: Future Directions
This new evidence that retrieval quality often beats retrieval quantity has important implications for the future of AI systems that rely on external knowledge. It suggests that designers of RAG systems should prioritize smart filtering and ranking of documents over sheer volume. Instead of fetching 100 possible passages and hoping the answer is buried in there somewhere, it may be wiser to fetch only the few that are most relevant.
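That "fetch only the top few" idea can be sketched minimally. The scored passages, the cap of 4 documents, and the relative score threshold below are all illustrative assumptions, not parameters from the paper:

```python
def select_relevant(scored_passages, max_docs=4, min_ratio=0.8):
    """Keep at most max_docs passages, and drop any whose retrieval
    score falls well below the top hit, instead of passing a long
    tail of weak matches to the model.

    scored_passages: list of (score, text) pairs from a retriever.
    """
    ranked = sorted(scored_passages, key=lambda p: p[0], reverse=True)
    if not ranked:
        return []
    top_score = ranked[0][0]
    strong = [(s, t) for s, t in ranked if s >= min_ratio * top_score]
    return [t for _, t in strong[:max_docs]]
```

The key design choice is a threshold relative to the best hit: when retrieval scores drop off sharply, the long tail of marginal matches (the "similar but unrelated" distractors described above) never reaches the model.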
The study's authors highlight the need for retrieval methods to "balance relevance and diversity" in the information supplied to a model. In other words, we want to provide enough coverage of the topic to answer the question, but not so much that the core facts drown in a sea of unrelated text.
Looking ahead, researchers may explore techniques that help AI models handle multiple documents more gracefully. One approach is to develop better retriever or reranker systems that can identify which documents truly add value and which only introduce conflict. Another angle is to improve the language models themselves: if one model (such as Qwen-2) could handle many documents without losing accuracy, examining how it was trained or structured could offer clues for making other models more robust. Perhaps future large language models will incorporate mechanisms to recognize when two sources are saying the same thing (or contradicting each other) and focus accordingly. The goal is to let models take advantage of a rich variety of sources without falling prey to confusion.
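One classic technique for balancing relevance against redundancy when ranking retrieved documents (a standard method from the information-retrieval literature, not something proposed in this paper) is Maximal Marginal Relevance (MMR). A minimal sketch, assuming similarity scores have already been computed elsewhere:

```python
def mmr_select(query_sim, doc_sims, k=4, lam=0.7):
    """Maximal Marginal Relevance: greedily pick documents that are
    relevant to the query but not redundant with those already picked.

    query_sim: query_sim[i] = similarity(query, doc i)
    doc_sims:  doc_sims[i][j] = similarity(doc i, doc j)
    lam:       trade-off, 1.0 = pure relevance, 0.0 = pure diversity
    Returns the indices of the selected documents, in pick order.
    """
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # penalize similarity to the closest already-selected doc
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The effect matches the study's diagnosis: two near-duplicate "similar but unrelated" passages no longer both make the cut, so a moderately relevant but distinct document can take the second slot instead.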
It is also worth noting that even as AI systems gain larger context windows (the ability to read more text at once), simply dumping more data into the prompt is no silver bullet. A bigger context does not automatically mean better comprehension. This study suggests that even if an AI can technically read 50 pages at once, giving it 50 pages of mixed-quality information may not yield good results. The model still benefits from curated, relevant content rather than an indiscriminate dump. In fact, intelligent retrieval may become even more important in the era of huge context windows, ensuring the extra capacity is filled with valuable knowledge rather than noise.
The findings of "More Documents, Same Length" (an aptly titled paper) encourage a reexamination of our assumptions in AI research. Sometimes, feeding an AI all the data we have is not as effective as we think. By focusing on the most relevant information, we not only improve the accuracy of AI-generated answers but also make systems more efficient and trustworthy. It is a counterintuitive lesson with exciting ramifications: future RAG systems may be smarter and leaner, thriving on fewer, better documents.