Production-grade · No boilerplate pain

From Document
to Answer

Production-ready RAG pipeline with hybrid search, Cohere reranking, hallucination reduction, streaming responses, and configurable chunking. FastAPI + pgvector or Pinecone.

One-time purchase · Full source code · MIT Licensed

PDF / MD / Web
Chunker
Embeddings
pgvector
Rerank
Stream →

Production-grade from day one

Most RAG tutorials give you semantic search and call it done. This kit implements the full production architecture that actually works at scale.

🔍
Hybrid Search
Combines semantic (vector) search with keyword (BM25) search. Gets the benefits of both — context-aware AND keyword-precise retrieval.
🎯
Cohere Reranking
Cohere Rerank v3 as the second-stage ranker. Sends top candidates and returns relevance-ranked results. Dramatically improves precision.
🧩
Parent-Document Retrieval
Retrieves small chunks for precision but sends parent documents to the LLM for context. Best of both indexing strategies.
🌊
Streaming Responses
Server-Sent Events streaming via FastAPI. Users see answers token-by-token. Dramatic UX improvement for long responses.
🛡️
Hallucination Reduction
Prompt layer that grounds the LLM in retrieved context with source citations. Reduces fabricated answers significantly.
⚙️
Configurable Chunking
Three chunking strategies — fixed, recursive, and semantic. Switch between them per-collection based on document type.

Best-in-class at each layer

Embeddings
OpenAI text-embedding-3-large
or nomic-embed-text (local)
Vector DB
pgvector
or Pinecone (scale tier)
Reranking
Cohere Rerank v3
plug-and-play
LLM
Claude 3.5 Sonnet
or GPT-4o (configurable)
API Layer
FastAPI
with SSE streaming
Ingestion
PDF + Markdown + Web
3 ingestion pipelines

Every module, complete

app/ingest/PDF, Markdown, and web scraping ingestion pipelines
app/retrieval/Hybrid search, parent-doc, and reranking modules
app/generation/Hallucination-reduction prompt layer + streaming
Vector DB configpgvector schema + Pinecone integration
Chunking strategiesFixed, recursive, and semantic chunkers
FastAPI endpointsQuery + ingest + health check endpoints
Docker ComposeLocal dev with Postgres + pgvector pre-configured
Evaluation scriptsRAGAS-compatible evaluation setup

Skip weeks of architecture work.

$49

One-time purchase · Full source code · MIT License

  • Complete FastAPI RAG pipeline
  • Hybrid search (semantic + BM25)
  • Cohere Rerank v3 integration
  • PDF + Markdown + web ingestion
  • Streaming SSE responses
  • pgvector + Pinecone support
  • Configurable chunking strategies
  • Docker Compose + evaluation scripts
Buy on Gumroad →

30-day money-back guarantee