What is hybrid search in a RAG pipeline?

Hybrid search combines semantic (vector) search with keyword-based (BM25) search. Semantic search finds passages with similar meaning even when the wording differs; BM25 finds exact keyword matches such as product names, error codes, or identifiers. Combining the two usually retrieves better results than either approach alone.
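To make the keyword half concrete, here is a minimal Okapi BM25 scorer in plain Python. It is a sketch under assumptions: the naive whitespace tokenizer and the default `k1=1.5`, `b=0.75` parameters are common choices, not details taken from RAG Factory, whose keyword backend is not described here.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (an assumption; real pipelines often also strip punctuation).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(toks) for toks in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # Smoothed IDF variant (as used by Lucene) that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores
```

For the query `"E1234"` against `["error code E1234 on startup", "the app fails to launch", "startup guide"]`, only the first document scores above zero: an exact-token hit that an embedding model may rank less decisively.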

Does RAG Factory support Pinecone?

Yes. RAG Factory supports both pgvector (default, self-hosted Postgres) and Pinecone (managed, for production scale). Switch between them via configuration.
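The answer above says the backend is chosen through configuration. One way such a switch could look is a small validated settings loader. This is a sketch only: the environment variables `VECTOR_STORE`, `DATABASE_URL`, and `PINECONE_INDEX` are hypothetical names, and RAG Factory's real settings may differ.

```python
from __future__ import annotations

import os
from dataclasses import dataclass

SUPPORTED_VECTOR_STORES = {"pgvector", "pinecone"}


@dataclass(frozen=True)
class VectorStoreSettings:
    backend: str
    database_url: str | None = None    # used by pgvector (hypothetical setting)
    pinecone_index: str | None = None  # used by Pinecone (hypothetical setting)


def load_vector_store_settings(env: dict[str, str] | None = None) -> VectorStoreSettings:
    """Read hypothetical VECTOR_STORE / DATABASE_URL / PINECONE_INDEX variables."""
    env = dict(os.environ) if env is None else env
    backend = env.get("VECTOR_STORE", "pgvector").lower()  # pgvector is the default
    if backend not in SUPPORTED_VECTOR_STORES:
        raise ValueError(f"Unsupported VECTOR_STORE: {backend!r}")
    if backend == "pgvector":
        url = env.get("DATABASE_URL")
        if not url:
            raise ValueError("DATABASE_URL is required for pgvector")
        return VectorStoreSettings(backend, database_url=url)
    index = env.get("PINECONE_INDEX")
    if not index:
        raise ValueError("PINECONE_INDEX is required for Pinecone")
    return VectorStoreSettings(backend, pinecone_index=index)
```

Failing fast at startup when a backend's required setting is missing is usually easier to debug than a connection error on the first query.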

Production-grade · No boilerplate pain

From Document to Answer

Production-ready RAG pipeline with hybrid search, Cohere reranking, hallucination reduction, streaming responses, and configurable chunking. FastAPI + pgvector or Pinecone.


One-time purchase · Full source code · MIT Licensed

PDF / MD / Web → Chunker → Embeddings → pgvector → Rerank → Stream

Architecture

Production-grade from day one

Most RAG tutorials give you semantic search and call it done. This kit implements the full production architecture that actually works at scale.

🔍

Hybrid Search

Combines semantic (vector) search with keyword (BM25) search to get the benefits of both: context-aware and keyword-precise retrieval.
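The page doesn't say how the semantic and BM25 result lists are merged. Reciprocal Rank Fusion (RRF) is one common, score-free way to do it, sketched below as an assumption rather than the kit's confirmed method; `k=60` is the constant typically used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in (rank starts at 1),
    so documents ranked highly by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For a vector ranking `["d1", "d2", "d3"]` and a BM25 ranking `["d3", "d1", "d4"]`, the fused order is d1, d3, d2, d4: documents found by both retrievers outrank those found by only one. Because RRF uses only ranks, it sidesteps the fact that cosine similarities and BM25 scores live on different scales.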

🎯

Cohere Reranking

Uses Cohere Rerank v3 as the second-stage ranker: the top candidates from first-stage retrieval are sent to Rerank, which returns them ordered by relevance to the query. Dramatically improves precision.
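A sketch of the second stage, with the relevance model abstracted as a callable so the logic can run without an API key. `overlap_scorer` is a hypothetical toy stand-in for testing, not a real reranker.

```python
from typing import Callable, Sequence

# A scorer takes (query, documents) and returns one relevance score per document.
Scorer = Callable[[str, Sequence[str]], list[float]]


def rerank(query: str, candidates: Sequence[str], scorer: Scorer, top_n: int = 3) -> list[tuple[int, float]]:
    """Second-stage ranking: rescore first-stage candidates and keep the best top_n.

    Returns (candidate_index, score) pairs, highest score first.
    """
    scores = scorer(query, candidates)
    if len(scores) != len(candidates):
        raise ValueError("scorer must return one score per candidate")
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_n]]


def overlap_scorer(query: str, docs: Sequence[str]) -> list[float]:
    # Toy stand-in for a cross-encoder: fraction of query words present in each document.
    q = set(query.lower().split())
    if not q:
        return [0.0] * len(docs)
    return [len(q & set(d.lower().split())) / len(q) for d in docs]
```

In production the scorer would call Cohere instead. The Cohere Python SDK's `rerank` method takes `model`, `query`, `documents`, and `top_n`, and each returned result carries an `index` and a `relevance_score`; check Cohere's documentation for the exact Rerank v3 model identifier.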

🧩

Parent-Document Retrieval

Searches over small chunks for precise matching, then sends their parent documents to the LLM for fuller context. The precision of small chunks with the context of large ones.
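A minimal sketch of the parent-expansion step, assuming each indexed chunk stores a `parent_id` and parent texts are available in a lookup table. `Chunk`, `parent_id`, and `expand_to_parents` are hypothetical names, not the kit's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    parent_id: str  # ID of the larger document (or section) this chunk was cut from
    text: str


def expand_to_parents(ranked_chunks: list[Chunk], parents: dict[str, str], max_parents: int = 3) -> list[str]:
    """Map ranked small chunks to their parent texts, deduplicated, in rank order."""
    seen: set[str] = set()
    context: list[str] = []
    for chunk in ranked_chunks:
        if chunk.parent_id in seen:
            continue  # several matching chunks from one parent send it only once
        seen.add(chunk.parent_id)
        context.append(parents[chunk.parent_id])
        if len(context) == max_parents:
            break
    return context
```

Deduplicating matters because the best-matching chunks often cluster in the same parent; without it the LLM context would repeat the same document.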

🌊

Streaming Responses

Server-Sent Events streaming via FastAPI. Users see answers token-by-token. Dramatic UX improvement for long responses.
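A sketch of the SSE framing, in plain Python so it runs without a server. In the SSE format each payload line is prefixed with `data: ` and a blank line ends the event; the final `done` event is a hypothetical convention, not something the page specifies.

```python
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each generated token as a Server-Sent Events message."""
    for token in tokens:
        # Every payload line needs its own "data: " prefix; the trailing blank line ends the event.
        lines = token.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # Hypothetical end-of-stream marker so the client knows it can close the connection.
    yield "event: done\ndata: [DONE]\n\n"
```

In FastAPI the generator could be returned as `StreamingResponse(sse_events(token_iter), media_type="text/event-stream")` (from `fastapi.responses`), where `token_iter` stands for whatever iterator yields tokens from the LLM client.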

🛡️

Hallucination Reduction

Prompt layer that grounds the LLM in retrieved context with source citations. Reduces fabricated answers significantly.
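A minimal sketch of what such a prompt layer might contain. The instruction wording, the numbered `[n]` citation format, and both function names are assumptions, not the kit's actual prompt.

```python
import re


def build_grounded_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Build a prompt that restricts the LLM to numbered sources and asks for citations.

    passages: (source_name, text) pairs, already retrieved and reranked.
    """
    sources = "\n\n".join(
        f"[{i}] ({name})\n{text}" for i, (name, text) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "Cite every claim with its source number, e.g. [1]. "
        "If the sources do not contain the answer, reply \"I don't know.\"\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


def invalid_citations(answer: str, n_sources: int) -> list[int]:
    """Return citation numbers in the answer that match no provided source."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(c for c in cited if not 1 <= c <= n_sources)
```

A post-check like `invalid_citations` flags answers that cite sources which were never provided, one cheap signal that the model drifted away from the retrieved context.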

⚙️

Configurable Chunking

Three chunking strategies: fixed, recursive, and semantic. Switch between them per collection based on document type.
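Sketches of the fixed and recursive strategies, assuming character-based sizes (the kit may measure in tokens instead) and a hypothetical separator order. Semantic chunking is left out because it needs an embedding model to decide where topics shift.

```python
def fixed_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not text:
        return []
    step = size - overlap
    # Stop once a window would add no characters beyond the previous one.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def recursive_chunks(text: str, size: int = 500,
                     separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first, recursing into pieces still larger than `size`."""
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        return fixed_chunks(text, size, overlap=0)  # no separator left: hard split
    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= size:
            current = piece
        else:
            # Piece alone is too big: split it with the next, finer separator.
            chunks.extend(recursive_chunks(piece, size, rest))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

Recursive splitting tries paragraph breaks first, then line breaks, then spaces, so chunks tend to end at natural boundaries. Fixed windows are simpler but can cut sentences in half, which the overlap partly compensates for.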

The Stack

Best-in-class at each layer

Embeddings: OpenAI text-embedding-3-large, or nomic-embed-text (local)
Vector DB: pgvector, or Pinecone (scale tier)
Reranking: Cohere Rerank v3 (plug-and-play)
LLM: Claude 3.5 Sonnet, or GPT-4o (configurable)
API Layer: FastAPI with SSE streaming
Ingestion: PDF + Markdown + Web (3 ingestion pipelines)
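Under the Embeddings and Vector DB layers, semantic search reduces to finding the stored vectors closest to the query vector. A brute-force cosine-similarity sketch in plain Python (toy vectors, not real embeddings) shows the operation a vector index speeds up:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours by cosine similarity, highest first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

In pgvector the `<=>` operator returns cosine distance (1 minus cosine similarity), so ordering by `embedding <=> query` ascending with a `LIMIT` gives the same ranking without scanning in Python.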

What's Included

Every module, complete

✓ app/ingest/: PDF, Markdown, and web scraping ingestion pipelines
✓ app/retrieval/: Hybrid search, parent-doc, and reranking modules
✓ app/generation/: Hallucination-reduction prompt layer + streaming
✓ Vector DB config: pgvector schema + Pinecone integration
✓ Chunking strategies: Fixed, recursive, and semantic chunkers
✓ FastAPI endpoints: Query + ingest + health check endpoints
✓ Docker Compose: Local dev with Postgres + pgvector pre-configured
✓ Evaluation scripts: RAGAS-compatible evaluation setup
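The page does not describe how the Markdown pipeline in `app/ingest/` splits documents. A common approach, sketched here as an assumption, is to split at ATX headings and keep each section's heading path as metadata for citations; `markdown_sections` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")
FENCE = "`" * 3  # backtick code-fence marker


def markdown_sections(markdown: str) -> list[dict]:
    """Split a Markdown document at headings, keeping the heading path as metadata."""
    sections: list[dict] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            sections.append({"heading_path": " > ".join(path), "text": text})
        body.clear()

    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a fenced code block
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections
```

Lines inside fenced code blocks are never treated as headings, so a `# comment` in a code sample is not mistaken for a section title.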

Pricing

Skip weeks of architecture work.

$49

One-time purchase · Full source code · MIT License

Complete FastAPI RAG pipeline
Hybrid search (semantic + BM25)
Cohere Rerank v3 integration
PDF + Markdown + web ingestion
Streaming SSE responses
pgvector + Pinecone support
Configurable chunking strategies
Docker Compose + evaluation scripts

Buy on Gumroad →

30-day money-back guarantee
