RAG Performance Optimization and Advanced Patterns: GraphRAG and Hybrid Search
by Aaron Dsilva, Founding Engineer
Making Your RAG System Actually Fast (And Smart)
The Hybrid Search Approach That Actually Works
Here's what we learned the hard way: you need both semantic search AND keyword search working together.
Think of it like this:
- Vector search is like having a really smart intern who "gets" what you mean
- Keyword search is like having a detail-oriented colleague who never misses the fine print
- Together they're unstoppable
In practice, this means combining:
- Dense vectors for questions like "How do I cancel my subscription?" (semantic understanding)
- BM25 keyword matching for queries like "What's the API rate limit for the /users endpoint?" (exact technical terms). A minimal sketch of both retrievers follows this list.
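Here's a minimal sketch of the two retrievers side by side. It assumes your corpus is a list of strings and that embed is whatever embedding function your stack provides; the BM25 side uses the rank_bm25 package:

from rank_bm25 import BM25Okapi
import numpy as np

def bm25_rank(query, docs, top_k=10):
    # Keyword side: BM25 over whitespace-tokenized documents
    bm25 = BM25Okapi([doc.split() for doc in docs])
    scores = bm25.get_scores(query.split())
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]

def vector_rank(query, docs, embed, top_k=10):
    # Semantic side: cosine similarity between query and document embeddings
    q = np.asarray(embed(query))
    d = np.asarray([embed(doc) for doc in docs])
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q) + 1e-9)
    return sorted(range(len(docs)), key=lambda i: sims[i], reverse=True)[:top_k]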
The Reranking Magic
Once you've got results from both approaches, you need to merge them intelligently. The most robust approach we've found uses Reciprocal Rank Fusion (RRF):
Combined_Score = 1/(rank_vector + k) + 1/(rank_keyword + k)

where k is usually set to 60. Simple, effective, and it doesn't require training additional models.
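In code, RRF is only a few lines. This sketch merges two ranked lists of document IDs, best first; the rrf_merge name is ours:

def rrf_merge(vector_ranking, keyword_ranking, k=60):
    scores = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Documents ranked highly in either list float to the top
    return sorted(scores, key=scores.get, reverse=True)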
Alternative Reranking Approaches:
- Cross-encoder models like bge-reranker (more accurate but slower; sketched after this list)
- Cohere's rerank API (excellent quality, costs money)
- Custom reranking models trained on your specific domain (highest accuracy, most effort)
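If you go the cross-encoder route, sentence-transformers makes it a few lines. A sketch, assuming candidates have already been retrieved (the model name here is one common choice, not the only one):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

def rerank(query, candidates, top_k=5):
    # Score each (query, passage) pair jointly; slower than bi-encoders but more accurate
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]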
The Need for Speed
Performance Optimization Stack
Layer 1: Caching (80% of the wins)
Layer 2: Async processing (15% of the wins)
Layer 3: Hardware optimization (5% of the wins)
Focus on caching first—it's where you'll get the biggest performance improvements for the least effort.
The Smart Caching Strategy:
Query Result Caching:
- Cache identical questions for 24 hours
- Use fuzzy matching for "similar enough" queries
- Store both the retrieved chunks AND the final answer (see the sketch after this list)
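A minimal sketch of that cache. The normalization here is a crude stand-in for real fuzzy matching (embedding similarity over queries is a common upgrade), and get_cached/put_cached are our names:

import hashlib
import time

_CACHE = {}
TTL_SECONDS = 24 * 60 * 60  # cache identical questions for 24 hours

def _key(query):
    # Crude "similar enough" matching: lowercase, drop punctuation, collapse whitespace
    normalized = " ".join(
        "".join(c for c in query.lower() if c.isalnum() or c.isspace()).split()
    )
    return hashlib.sha256(normalized.encode()).hexdigest()

def get_cached(query):
    entry = _CACHE.get(_key(query))
    if entry and time.time() - entry["ts"] < TTL_SECONDS:
        return entry  # carries both the retrieved chunks and the final answer
    return None

def put_cached(query, chunks, answer):
    _CACHE[_key(query)] = {"ts": time.time(), "chunks": chunks, "answer": answer}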
Embedding Caching:
- Never re-embed the same document
- Cache embeddings with document hashes, as sketched below
- Invalidate only when content actually changes
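Hash-keyed caching is a one-function sketch (again, embed stands in for your embedding model):

import hashlib

_embedding_cache = {}

def cached_embed(text, embed):
    # Key by content hash: unchanged text is never re-embedded, and
    # changed text invalidates itself by producing a new hash
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = embed(text)
    return _embedding_cache[key]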
Chunk Preprocessing:
- Cache parsed and chunked documents
- Store metadata with the chunks
- Update incrementally, not from scratch
Advanced Performance Patterns
Streaming Responses: Don't wait for the entire answer to be generated. Stream the response back to users as it's being created. This makes your system feel 2-3x faster even if the actual processing time is the same.
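With the OpenAI Python client, for example, streaming is a one-flag change (the model name is a placeholder; most LLM SDKs offer the same pattern):

from openai import OpenAI

client = OpenAI()

def stream_answer(prompt):
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model your stack runs
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry only role or metadata
            yield delta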
Prefetch Common Queries: Analyze your query logs and pre-compute answers for the most common questions. Store these in a simple key-value cache for instant responses.
Smart Batch Processing: If you're processing multiple questions at once, batch them through your LLM calls. Most APIs support batching, which dramatically reduces overhead.
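One common pattern is firing the calls concurrently with an async client, so total latency is roughly the slowest single call rather than the sum. A sketch (model name again a placeholder; dedicated batch endpoints, where your provider offers them, cut costs further):

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def answer_one(question):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

async def answer_batch(questions):
    # All requests run concurrently; results come back in input order
    return await asyncio.gather(*(answer_one(q) for q in questions))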
Monitoring & Evaluation: Catching Problems Before Users Do
The Metrics That Actually Matter
Response Quality Metrics:
- Answer Accuracy - Are we giving correct information?
- Relevance Score - Are the retrieved documents actually related?
- Completeness - Are we answering the full question?
- Citation Quality - Can users trace answers back to sources?
System Performance Metrics:
- Query Latency (P95) - The latency that 95% of queries stay under; your slowest 5% exceed it
- Retrieval Precision - What percentage of retrieved docs are useful?
- Cache Hit Rate - How often are we avoiding expensive recomputation?
- Error Rate - How often does something break?
User Experience Metrics:
- User Satisfaction - Thumbs up/down feedback
- Query Abandonment - Do users give up waiting?
- Follow-up Questions - Are users getting what they need?
- Session Duration - Are users finding value?
The RAGAS Evaluation Framework
RAGAS (Retrieval-Augmented Generation Assessment) is the closest thing we have to a standard for evaluating RAG systems. Here's what it measures:
- Faithfulness: Is the answer grounded in the retrieved documents, without contradicting them?
- Answer Relevancy: Does the answer actually address the question?
- Context Precision: Are the top-ranked retrieved docs relevant?
- Context Recall: Did we retrieve all the relevant information available?
RAGAS Implementation:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# One row per question: the generated answer, the retrieved
# contexts, and a reference answer for the recall metrics
dataset = Dataset.from_dict({
    'question': [...],
    'answer': [...],
    'contexts': [...],
    'ground_truths': [...],
})

result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
Setting Up Continuous Evaluation:
- Sample 1-5% of production queries for evaluation
- Mix automatic RAGAS scoring with human review
- Set up alerts when scores drop below thresholds
- Track trends to catch gradual degradation (a minimal sampling-and-alerting sketch follows)
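In this sketch, run_ragas and alert are hypothetical wrappers around the evaluate() call above and your alerting system, and the threshold is an assumption to calibrate against your own baseline:

import random

SAMPLE_RATE = 0.02  # evaluate roughly 2% of production traffic
FAITHFULNESS_THRESHOLD = 0.80  # assumed; tune to your baseline

def maybe_evaluate(question, answer, contexts):
    if random.random() > SAMPLE_RATE:
        return
    scores = run_ragas(question, answer, contexts)  # hypothetical RAGAS wrapper
    if scores["faithfulness"] < FAITHFULNESS_THRESHOLD:
        alert(f"RAG faithfulness dropped to {scores['faithfulness']:.2f}")  # hypothetical pager hook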
Production RAG Monitoring Success
Challenge: Continuous monitoring of RAG quality across 10K+ daily queries without manual review overhead.
Solution: RAGAS automated evaluation with human review sampling and trend monitoring.
Results: Reduced manual review effort by 85% while improving response quality through early detection of performance degradation.
GraphRAG: When Traditional RAG Hits a Wall
The Problem Traditional RAG Can't Solve
Traditional RAG works great for direct questions: "What's our refund policy?" or "How do I reset my password?"
But it struggles with questions like:
- "What are the common themes across all customer complaints this quarter?"
- "How do different product features relate to each other?"
- "What patterns emerge from our sales data across regions?"
Why? Because these questions require connecting information across multiple documents and identifying patterns that span your entire corpus.
How GraphRAG Changes the Game
Microsoft's GraphRAG doesn't just retrieve relevant chunks—it builds a knowledge graph from your documents, then uses that graph to answer complex analytical questions.
The GraphRAG Process:
- Entity Extraction: Identify people, places, concepts, and relationships
- Community Detection: Group related entities into logical clusters
- Community Summarization: Create summaries of each cluster
- Query-time Graph Traversal: Use the graph structure to find connected information
Microsoft's Numbers: Research shows GraphRAG has a 70-80% "win rate" vs traditional RAG for complex analytical questions. The trade-off? It's more expensive and complex to implement.
When to Use GraphRAG
Good Candidates:
- Research and analysis use cases
- Legal document review requiring pattern identification
- Business intelligence queries across large document sets
- Investigative journalism connecting disparate sources
- Academic research synthesis and discovery
Skip GraphRAG For:
- Simple Q&A use cases
- Technical documentation lookup
- Customer support with direct answers
- Cost-sensitive implementations
- Small document sets (<1000 documents)
Building Knowledge Connections
Implementation Strategy:
Phase 1: Entity and Relationship Extraction. Use LLMs to identify entities and relationships in your documents (a sketch follows this list):
- People, organizations, products, concepts
- "mentions", "relates to", "part of", "caused by" relationships
- Store in a graph database (Neo4j, Amazon Neptune, or ArangoDB)
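A minimal extraction sketch using an LLM that returns JSON. The prompt wording and model name are ours, and a production version needs few-shot examples and schema validation:

import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Extract entities and relationships from the text below. "
    "Return JSON with keys 'entities' (objects with name, type) and "
    "'relations' (objects with source, relation, target).\n\nText:\n"
)

def extract_graph(text):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT + text}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)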
Phase 2: Community Detection. Group related entities using algorithms like the following (sketched after the list):
- Leiden algorithm (Microsoft's choice) for high-quality communities
- Louvain method for faster processing
- Hierarchical clustering for nested community structures
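For example, networkx ships a Louvain implementation (Leiden needs the separate leidenalg and igraph packages); build_communities is our name:

import networkx as nx

def build_communities(relations):
    # relations: (source, relation, target) triples from the extraction phase
    G = nx.Graph()
    for rel in relations:
        G.add_edge(rel["source"], rel["target"], label=rel["relation"])
    # Returns a list of sets of entity names, one set per community
    return nx.community.louvain_communities(G, seed=42)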
Phase 3: Query Processing. For analytical questions:
- Identify which communities contain relevant information
- Retrieve community summaries instead of individual chunks
- Use summaries to guide deeper retrieval if needed
- Generate answers that synthesize across communities
The Cost Reality: GraphRAG is 2-5x more expensive than traditional RAG due to:
- Entity extraction requiring LLM calls for every document
- Graph storage and maintenance overhead
- More complex query processing
- Community summarization costs
But for analytical use cases, the quality improvement often justifies the cost.
Hybrid Approach: Best of Both Worlds
Smart Implementation Strategy:
- Use traditional RAG for direct, factual questions
- Route analytical queries to GraphRAG automatically
- Implement query classification to choose the right approach
- Cache community summaries to reduce ongoing costs
Query Routing Logic:
def route_query(question):
    # Cheap first pass: keyword heuristics; swap in an LLM classifier as traffic grows
    analytical_keywords = ['trends', 'patterns', 'themes', 'analysis', 'compare', 'relationship']
    if any(keyword in question.lower() for keyword in analytical_keywords):
        return "graphrag"
    else:
        return "traditional_rag"
Progressive Enhancement:
- Start with traditional RAG for 80% of use cases
- Add GraphRAG for specific analytical requirements
- Monitor which approach works better for different query types
- Gradually expand GraphRAG usage based on proven value
The key insight: GraphRAG isn't a replacement for traditional RAG—it's a powerful complement for analytical use cases that require understanding connections and patterns across large document collections.
Ready to put these optimization techniques to work? The performance improvements and advanced patterns covered here will take your RAG system from basic prototype to a production-ready solution that scales with your needs while maintaining quality and user satisfaction.