
Semantic Embedding Service for MoE Classification System.

Provides dense vector embeddings computed with sentence-transformers to improve document classification accuracy, replacing regex-based pattern matching with semantic similarity.

Features:

  • Pre-computed exemplar embeddings for each document type
  • Efficient similarity calculation via cosine similarity
  • Caching support for repeated classifications
  • Graceful fallback when the model is unavailable
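
The exemplar and cosine-similarity features above can be sketched in plain Python. This assumes one common design: each type's exemplar embeddings are averaged into a single centroid and L2-normalized once up front, so scoring a document reduces to a dot product. The helper names and the toy 3-dimensional vectors are hypothetical and do not come from embeddings.py.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def build_centroids(exemplars):
    """Average each type's exemplar embeddings, then L2-normalize the mean.

    Assumption: exemplars are compared via one centroid per type rather than
    one by one.
    """
    centroids = {}
    for doc_type, vectors in exemplars.items():
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        centroids[doc_type] = normalize(mean)
    return centroids

def cosine(a_unit, b_unit):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a_unit, b_unit))

# Toy 3-d "embeddings" (hypothetical); a real model produces hundreds of dimensions.
exemplars = {
    "invoice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "contract": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = build_centroids(exemplars)          # computed once, reused per query
query = normalize([0.85, 0.15, 0.05])           # embedding of a new document
scores = {t: cosine(query, c) for t, c in centroids.items()}
```

Pre-computing normalized centroids means each classification costs one dot product per document type, independent of how many exemplars were supplied.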

File: embeddings.py

Classes

EmbeddingConfig

Configuration for the embedding service.
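
A minimal sketch of what such a configuration object might hold. Every field name and default here is an assumption for illustration; `all-MiniLM-L6-v2` is simply one publicly available sentence-transformers model, not necessarily the one this service loads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical fields; the real EmbeddingConfig may differ."""
    # Any sentence-transformers model name; this small general-purpose one is an example.
    model_name: str = "all-MiniLM-L6-v2"
    # Minimum cosine similarity required to accept a classification.
    min_similarity: float = 0.5
    # Maximum number of document embeddings kept in the cache.
    cache_size: int = 1024
    # L2-normalize embeddings so cosine similarity reduces to a dot product.
    normalize: bool = True

    def __post_init__(self):
        # Cosine similarity always lies in [-1, 1], so reject impossible thresholds.
        if not -1.0 <= self.min_similarity <= 1.0:
            raise ValueError("min_similarity must lie in [-1, 1]")
        if self.cache_size < 0:
            raise ValueError("cache_size must be non-negative")
```

Freezing the dataclass keeps a shared service from having its settings mutated after construction.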

SimilarityResult

Result from semantic similarity analysis.
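
A hypothetical shape for the result, assuming it carries the best-matching type, its score, the per-type scores, and whether the fallback path produced it. None of these field names come from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SimilarityResult:
    """Hypothetical shape of a similarity result."""
    document_type: Optional[str]   # best-matching type, or None if below threshold
    similarity: float              # cosine similarity of the best match
    scores: Dict[str, float] = field(default_factory=dict)  # per-type similarities
    used_fallback: bool = False    # True when the embedding model was unavailable

    @property
    def is_confident(self) -> bool:
        """A match from the semantic path that cleared the threshold."""
        return self.document_type is not None and not self.used_fallback
```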

SemanticEmbeddingService

True semantic embedding service for document classification.

Functions

get_embedding_service(config)

Get or create the singleton embedding service.
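
One way to implement get-or-create behavior is a module-level instance guarded by a lock. In this sketch `SemanticEmbeddingService` is a stub, and a `config` passed after the first call is ignored; both are assumptions rather than documented behavior of `get_embedding_service`.

```python
import threading
from typing import Optional

class SemanticEmbeddingService:
    """Stub standing in for the real class; it only stores its config."""
    def __init__(self, config=None):
        self.config = config

_service: Optional[SemanticEmbeddingService] = None
_lock = threading.Lock()

def get_embedding_service(config=None) -> SemanticEmbeddingService:
    """Return the process-wide service, creating it on the first call.

    Double-checked locking keeps the common path lock-free once the instance
    exists. In this sketch, a config passed after creation is ignored.
    """
    global _service
    if _service is None:
        with _lock:
            if _service is None:   # re-check: another thread may have won the race
                _service = SemanticEmbeddingService(config)
    return _service
```

A singleton matters here because loading a transformer model is slow and memory-heavy, so every caller should share one loaded instance.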

classify(content)

Classify a document by embedding similarity.

get_similar_types(content, top_k)

Get the top-k most similar document types.
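
The two classification functions can be sketched as a ranking over unit-length type centroids: `get_similar_types` keeps the `top_k` best scores, and `classify` takes the single best one and rejects it below a threshold. This assumes the document has already been embedded (the real functions take raw `content`); the `centroids` argument, the `min_similarity` parameter, and the `None` result are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

def _unit(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def get_similar_types(query: Sequence[float],
                      centroids: Dict[str, Sequence[float]],
                      top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank document types by cosine similarity; centroids must be unit-length."""
    q = _unit(query)
    scores = ((t, sum(a * b for a, b in zip(q, c))) for t, c in centroids.items())
    # nlargest avoids fully sorting every type when only k are needed.
    return heapq.nlargest(top_k, scores, key=lambda pair: pair[1])

def classify(query: Sequence[float],
             centroids: Dict[str, Sequence[float]],
             min_similarity: float = 0.5) -> Optional[str]:
    """Return the top-1 type, or None when even the best score is too low."""
    best = get_similar_types(query, centroids, top_k=1)
    if best and best[0][1] >= min_similarity:
        return best[0][0]
    return None

# Hypothetical, already-normalized centroids for three document types.
centroids = {
    "invoice": _unit([1.0, 0.0, 0.0]),
    "contract": _unit([0.0, 1.0, 0.0]),
    "report": _unit([0.0, 0.0, 1.0]),
}
```

The threshold lets an ambiguous document, one roughly equidistant from every type, come back unclassified instead of being forced into the nearest bucket.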

is_available()

Check whether the embedding service is available.
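
A possible shape for the availability check and the graceful fallback. `importlib.util.find_spec` is standard library and only locates the `sentence_transformers` package without importing it or loading a model. The `classify_with_fallback` helper and the idea of falling back to a keyword classifier are assumptions, not documented behavior of this module.

```python
import importlib.util
from typing import Callable

def is_available() -> bool:
    """True if the sentence-transformers package can be found.

    find_spec does not import the package or load a model, so this stays cheap.
    """
    return importlib.util.find_spec("sentence_transformers") is not None

def classify_with_fallback(content: str,
                           semantic_classify: Callable[[str], str],
                           keyword_classify: Callable[[str], str]) -> str:
    """Hypothetical helper: prefer the semantic path, else a cheaper classifier."""
    if is_available():
        try:
            return semantic_classify(content)
        except Exception:
            pass  # e.g. the model failed to download or load at runtime
    return keyword_classify(content)
```

Checking for the package and also catching runtime errors covers both failure modes: the library is not installed, or it is installed but the model cannot be loaded.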

clear_cache()

Clear the embedding cache.

get_stats()

Get service statistics.
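
Caching and statistics fit together naturally. The sketch below assumes an LRU cache keyed by a SHA-256 digest of the document text, with hit and miss counters exposed through `get_stats`. The `EmbeddingCache` class, its `encode` callable, and the stats keys are hypothetical. In this version `clear_cache` drops cached vectors but keeps the counters.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List

class EmbeddingCache:
    """Bounded LRU cache keyed by a SHA-256 digest of the document text."""

    def __init__(self, encode: Callable[[str], List[float]], max_size: int = 1024):
        self._encode = encode        # hypothetical: any text -> vector function
        self._max_size = max_size
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def embed(self, content: str) -> List[float]:
        # Hashing keeps keys small even for very long documents.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)        # mark as most recently used
            return self._store[key]
        self.misses += 1
        vec = self._encode(content)
        self._store[key] = vec
        if len(self._store) > self._max_size:
            self._store.popitem(last=False)     # evict the least recently used
        return vec

    def clear_cache(self) -> None:
        """Drop cached vectors; hit/miss counters are kept."""
        self._store.clear()

    def get_stats(self) -> dict:
        total = self.hits + self.misses
        return {
            "cached_items": len(self._store),
            "hits": self.hits,
            "misses": self.misses,
            "hit_rate": self.hits / total if total else 0.0,
        }
```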

Usage

python embeddings.py