A retrieval-augmented generation (RAG) pipeline built over 1.2M+ Boston Public Library records, enabling semantic search across 5M+ embedded text chunks with 2-5 second query latency. The system combines PostgreSQL with pgvector for vector similarity search, OpenAI GPT-4o-mini for LLM-based query expansion, and BM25 reranking, improving retrieval accuracy on natural-language queries against historical text where plain keyword search falls short. Chunking and embedding strategies were tuned for archival text, balancing chunk size and overlap to preserve context across paragraph boundaries.
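A minimal sketch of the overlap-chunking idea described above. The specific chunk size and overlap values here are illustrative assumptions, not the project's tuned settings:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with overlap so context
    carries across paragraph boundaries (sizes are placeholders)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than the chunk size so consecutive
        # chunks share an overlapping window of text.
        start += chunk_size - overlap
    return chunks
```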
- Platform: Web App (Streamlit)
- Stack: Python, PostgreSQL, pgvector, OpenAI GPT-4o-mini, Streamlit
- Techniques: Retrieval-Augmented Generation, Vector Embeddings, BM25 Reranking, LLM Query Expansion, Semantic Chunking
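A hedged sketch of the retrieval path: GPT-4o-mini query expansion, pgvector similarity search, then BM25 reranking. The table and column names (`chunks`, `content`, `embedding`), the embedding model, the database name, and the candidate count are assumptions for illustration, not the project's actual configuration:

```python
import numpy as np
import psycopg2
from openai import OpenAI
from pgvector.psycopg2 import register_vector
from rank_bm25 import BM25Okapi

client = OpenAI()

def expand_query(query: str) -> str:
    """Ask GPT-4o-mini to enrich a historical query with related terms."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Expand this archival search query with "
                              f"synonyms and related terms: {query}"}],
    )
    return resp.choices[0].message.content

def retrieve(query: str, conn, k: int = 10) -> list[str]:
    expanded = expand_query(query)
    embedding = client.embeddings.create(
        model="text-embedding-3-small",  # assumed embedding model
        input=expanded,
    ).data[0].embedding

    # Stage 1: pgvector cosine-distance search over the embedded chunks.
    register_vector(conn)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM chunks ORDER BY embedding <=> %s LIMIT 50",
            (np.array(embedding),),
        )
        candidates = [row[0] for row in cur.fetchall()]

    # Stage 2: rerank the vector-search candidates with BM25
    # against the expanded query.
    bm25 = BM25Okapi([doc.lower().split() for doc in candidates])
    scores = bm25.get_scores(expanded.lower().split())
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

if __name__ == "__main__":
    conn = psycopg2.connect(dbname="bpl")  # placeholder connection settings
    print(retrieve("letters about the 1872 Boston fire", conn))
```

The two-stage design mirrors the description above: the vector index narrows millions of chunks to a small candidate set cheaply, and BM25 then rescores that set with exact term matching, which helps on historical proper nouns that embeddings can blur together.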