Semantic Search Architecture
┌───────────────────────────────────────────────────┐
│                   Search Query                    │
│          "find maintenance docs for Q3"           │
└───────────────────────────────────────────────────┘
                         │
                         ▼
┌───────────────────────────────────────────────────┐
│                 Query Processing                  │
│ (Embedding, filter extraction, query expansion)   │
└───────────────────────────────────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        ▼                ▼                ▼
┌───────────────┐ ┌─────────────────┐ ┌─────────────┐
│ Vector Search │ │ Keyword Search  │ │   Filters   │
│  (PGVector)   │ │ (Elasticsearch) │ │ (SQL/NoSQL) │
└───────────────┘ └─────────────────┘ └─────────────┘
        │                │                │
        └────────────────┼────────────────┘
                         │
                         ▼
┌───────────────────────────────────────────────────┐
│                   Result Fusion                   │
│      (RRF, weighted combination, de-dup)          │
└───────────────────────────────────────────────────┘
                         │
                         ▼
┌───────────────────────────────────────────────────┐
│                    Re-ranking                     │
│   (Cross-encoder, business rules, freshness)      │
└───────────────────────────────────────────────────┘
                         │
                         ▼
┌───────────────────────────────────────────────────┐
│                  Search Results                   │
│        (Ranked, highlighted, faceted)             │
└───────────────────────────────────────────────────┘
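The query-processing stage above includes filter extraction. A minimal sketch of pulling a structured filter out of the example query, using the diagram's "Q3" as the target; the regex and the `quarter` field name are illustrative assumptions, not a fixed schema:

```python
import re

def extract_filters(query: str) -> tuple[str, dict]:
    """Pull a simple structured filter (here: a fiscal quarter) out of a
    free-text query, returning the cleaned query plus a filter dict."""
    filters = {}
    match = re.search(r"\bQ([1-4])\b", query, re.IGNORECASE)
    if match:
        filters["quarter"] = int(match.group(1))
        # Drop the matched token (and a dangling "for") from the query text
        query = re.sub(r"\s*\bfor\s+Q[1-4]\b|\s*\bQ[1-4]\b", "", query,
                       flags=re.IGNORECASE).strip()
    return query, filters

cleaned, filters = extract_filters("find maintenance docs for Q3")
# cleaned == "find maintenance docs", filters == {"quarter": 3}
```

In the full pipeline the cleaned text goes to embedding and keyword search, while the extracted dict feeds the SQL/NoSQL filter branch.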
Embedding Pipeline
import numpy as np
from sentence_transformers import SentenceTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Document and Chunk are application-level models (id, title, content / 
# content, embedding, metadata)

class EmbeddingPipeline:
    def __init__(self):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.splitter = RecursiveCharacterTextSplitter(
            chunk_size=500,
            chunk_overlap=50
        )

    def embed_document(self, doc: Document) -> list[Chunk]:
        # Split into overlapping chunks
        chunks = self.splitter.split_text(doc.content)
        # Generate one embedding vector per chunk
        embeddings = self.model.encode(chunks)
        return [
            Chunk(
                content=chunk,
                embedding=embedding,
                metadata={
                    "doc_id": doc.id,
                    "title": doc.title,
                    "chunk_index": i
                }
            )
            for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
        ]

    def embed_query(self, query: str) -> np.ndarray:
        return self.model.encode(query)
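To make the chunking parameters concrete: a deliberately simplified sliding-window chunker (plain character windows, unlike the real RecursiveCharacterTextSplitter, which prefers to split on separators) shows what `chunk_size=500` and `chunk_overlap=50` mean in practice:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Simplified sliding-window chunker: each chunk starts
    (chunk_size - chunk_overlap) characters after the previous one,
    so consecutive chunks share chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text)
# 1200 chars with step 450 -> chunks starting at 0, 450, 900
# (lengths 500, 500, 300), each sharing 50 chars with its neighbour
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from both sides.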
Hybrid Search Implementation
from sqlalchemy import select
from pgvector.sqlalchemy import Vector

class HybridSearch:
    def __init__(self, db, elasticsearch):
        self.db = db
        self.es = elasticsearch
        self.embedder = EmbeddingPipeline()

    async def search(
        self,
        query: str,
        filters: dict | None = None,
        k: int = 10
    ) -> list[SearchResult]:
        # Semantic search: nearest neighbours by cosine distance in PGVector
        query_embedding = self.embedder.embed_query(query)
        semantic_results = await self.db.execute(
            select(Document)
            .order_by(Document.embedding.cosine_distance(query_embedding))
            .limit(50)
        )

        # Keyword search: BM25 via Elasticsearch, with title boosted 2x
        keyword_results = await self.es.search(
            index="documents",
            body={
                "query": {
                    "multi_match": {
                        "query": query,
                        "fields": ["title^2", "content"]
                    }
                },
                "size": 50
            }
        )

        # Reciprocal Rank Fusion of the two ranked lists
        fused = self.reciprocal_rank_fusion(
            semantic_results,
            keyword_results,
            k=60
        )

        # Re-rank the fused candidates with a cross-encoder
        reranked = await self.rerank(query, fused)
        return reranked[:k]

    def reciprocal_rank_fusion(self, *result_lists, k=60):
        # Each document scores 1 / (k + rank) per list it appears in;
        # k=60 is the conventional RRF constant.
        scores = {}
        for results in result_lists:
            for rank, doc in enumerate(results):
                if doc.id not in scores:
                    scores[doc.id] = 0
                scores[doc.id] += 1 / (k + rank + 1)
        return sorted(scores.items(), key=lambda x: -x[1])
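The fusion step is easiest to see in isolation. A self-contained version of reciprocal rank fusion over two toy ranked lists of document IDs (the IDs are illustrative):

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs: each appearance at
    zero-based rank r contributes 1 / (k + r + 1) to that ID's score."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank + 1)
    return sorted(scores.items(), key=lambda item: -item[1])

semantic = ["doc-a", "doc-b", "doc-c"]   # ranked by vector similarity
keyword  = ["doc-c", "doc-a", "doc-b"]   # ranked by BM25
fused = reciprocal_rank_fusion([semantic, keyword])
# doc-a (ranks 0 and 1) wins; doc-c (ranks 2 and 0) edges out doc-b (ranks 1 and 2)
```

Because RRF uses only ranks, it fuses lists whose raw scores live on incomparable scales (cosine distance vs. BM25) without any score normalisation.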
Vector Databases I Use
| Database | Best For | Characteristics |
|---|---|---|
| PGVector | PostgreSQL shops | SQL familiarity, ACID |
| Pinecone | Managed, scale | Serverless, fast |
| Chroma | Prototyping | Simple, embedded |
| Weaviate | Multi-modal | GraphQL, modules |
| Qdrant | Performance | Rust, filtering |
Technologies for Semantic Search
- Embeddings: OpenAI, Sentence Transformers, Cohere
- Vector DBs: PGVector, Pinecone, Chroma, Qdrant
- Keyword: Elasticsearch, OpenSearch
- Re-ranking: Cohere Rerank, cross-encoders
- Frameworks: LangChain, LlamaIndex
- Languages: Python, TypeScript
Frequently Asked Questions
What is semantic search?
Semantic search finds results based on meaning rather than exact keyword matches. It uses embeddings (vector representations) to understand context and intent. Semantic search can find relevant documents even when they don’t contain the exact search terms.
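Under the hood, "meaning" is compared as vector geometry: documents and queries are embedded, then ranked by cosine similarity. A minimal sketch of that computation; the toy 3-d vectors stand in for real model output (actual embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors:
    1.0 means same direction (similar meaning), 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the first two point in similar directions
maintenance = [0.9, 0.1, 0.2]
repair      = [0.8, 0.2, 0.3]
invoice     = [0.1, 0.9, 0.1]

sim_related   = cosine_similarity(maintenance, repair)   # ~0.98
sim_unrelated = cosine_similarity(maintenance, invoice)  # ~0.24
```

This is why "repair" documents surface for a "maintenance" query even though the keyword never appears: the vectors are close, not the strings.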
How much does semantic search implementation cost?
Semantic search development typically costs $110-160 per hour. A basic implementation starts around $15,000-30,000, while enterprise search with hybrid retrieval, faceting, and multi-language support ranges from $50,000-120,000+.
Semantic search vs. keyword search: when should I use each?
Use semantic search for: natural language queries, finding similar content, handling synonyms, or when users don’t know exact terms. Use keyword search for: exact matches, filtering, or structured queries. Best practice: hybrid search combining both.
What embedding models do you use?
I work with: OpenAI embeddings (ada-002, text-embedding-3), Sentence Transformers, Cohere, and custom fine-tuned models. The choice depends on accuracy requirements, language support, and cost. I help benchmark options for your use case.
How do you handle multi-language semantic search?
I implement: multilingual embedding models that work across languages, language detection for query routing, and proper tokenization for non-English text. This enables search that works across language barriers.
Related Technologies: RAG Systems, Vector Databases, Elasticsearch, PostgreSQL