Semantic Embedding Service for MoE Classification System.
Provides vector embeddings computed with sentence-transformers to improve document classification accuracy. Replaces regex-based pattern matching with semantic similarity between a document and per-type exemplars.
Features:
- Pre-computed exemplar embeddings for each document type
- Efficient similarity calculation via cosine similarity
- Caching support for repeated classifications
- Graceful fallback when the model is unavailable
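
The exemplar-plus-cosine-similarity idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the names `cosine_similarity`, `EXEMPLARS`, and `best_match` are hypothetical, and the 3-dimensional vectors are made up (real sentence-transformers embeddings have hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical pre-computed exemplar embeddings, several per document type.
# In the real service these would be produced once by the embedding model.
EXEMPLARS = {
    "invoice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "contract": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}


def best_match(query_embedding, exemplars=EXEMPLARS):
    """Return (doc_type, score) for the type whose closest exemplar scores highest."""
    scored = {
        doc_type: max(cosine_similarity(query_embedding, e) for e in vectors)
        for doc_type, vectors in exemplars.items()
    }
    doc_type = max(scored, key=scored.get)
    return doc_type, scored[doc_type]
```

Scoring each type by its best exemplar is one possible choice; averaging over the exemplars of a type would be an equally plausible alternative.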
File: embeddings.py
Classes
EmbeddingConfig
Configuration for embedding service.
SimilarityResult
Result from semantic similarity analysis.
SemanticEmbeddingService
True semantic embedding service for document classification.
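
For orientation, the two data classes might look something like the sketch below. Every field name and default here is an assumption made for illustration; the summary above does not list the real attributes. The model name `all-MiniLM-L6-v2` is a commonly used sentence-transformers model, chosen only as a plausible default.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddingConfig:
    """Configuration for the embedding service (all fields hypothetical)."""
    model_name: str = "all-MiniLM-L6-v2"  # assumed default, not confirmed by the source
    cache_size: int = 1000                # assumed maximum number of cached entries
    similarity_threshold: float = 0.5     # assumed minimum score to accept a match


@dataclass
class SimilarityResult:
    """Result from semantic similarity analysis (all fields hypothetical)."""
    document_type: str
    similarity: float
    all_scores: dict = field(default_factory=dict)  # per-type scores, if kept
```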
Functions
get_embedding_service(config)
Get or create singleton embedding service.
classify(content)
Classify document by embedding similarity.
get_similar_types(content, top_k)
Get top-k similar document types.
is_available()
Check if embedding service is available.
clear_cache()
Clear embedding cache.
get_stats()
Get service statistics.
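
How the functions listed above could fit together is sketched below with a toy stand-in. This is an illustration under stated assumptions, not the real `SemanticEmbeddingService`: `ToyEmbeddingService`, `_toy_embed`, `VOCAB`, and the statistics keys are hypothetical, the embedding is a keyword count rather than a sentence-transformers vector, and `is_available()` only checks whether the `sentence_transformers` package can be found, which is a proxy for the model actually loading.

```python
import hashlib
import heapq
import importlib.util
import math

VOCAB = ("invoice", "total", "agreement", "party")  # toy feature words


def _toy_embed(text):
    """Placeholder embedding: keyword counts, NOT a sentence-transformers vector."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyEmbeddingService:
    """Illustrative stand-in for SemanticEmbeddingService."""

    def __init__(self):
        # Exemplar embeddings are computed once, up front.
        self._exemplars = {
            "invoice": _toy_embed("invoice total due"),
            "contract": _toy_embed("agreement between party and party"),
        }
        self._cache = {}
        self._hits = 0
        self._misses = 0

    def is_available(self):
        # Proxy check: is the sentence-transformers package installed at all?
        return importlib.util.find_spec("sentence_transformers") is not None

    def _scores(self, content):
        # Cache per-type scores under a hash of the content.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key in self._cache:
            self._hits += 1
            return self._cache[key]
        self._misses += 1
        query = _toy_embed(content)
        scores = {t: _cosine(query, e) for t, e in self._exemplars.items()}
        self._cache[key] = scores
        return scores

    def get_similar_types(self, content, top_k=3):
        scores = self._scores(content)
        return heapq.nlargest(top_k, scores.items(), key=lambda item: item[1])

    def classify(self, content):
        # Classification is the top-1 case of the similarity ranking.
        return self.get_similar_types(content, top_k=1)[0]

    def clear_cache(self):
        self._cache.clear()

    def get_stats(self):
        return {"cache_size": len(self._cache), "hits": self._hits, "misses": self._misses}


_service = None  # module-level singleton slot


def get_embedding_service(config=None):
    """Return the shared instance, creating it on first use (config is ignored here)."""
    global _service
    if _service is None:
        _service = ToyEmbeddingService()
    return _service
```

In this sketch, a second `classify` call on the same text is served from the cache and shows up as a hit in `get_stats()`, and `clear_cache()` empties the cache without resetting the counters.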
Usage
python embeddings.py