Meilisearch Index Definition for HIPAA + FINRA

pgvector already handles semantic search. Meilisearch (or a comparable engine) can handle full-text search and the metadata faceting behind the UI search filters.

Index Settings

{
  "uid": "documents",
  "primaryKey": "id",

  "searchableAttributes": [
    "title",
    "summary",
    "body",
    "keywords",
    "tags"
  ],

  "filterableAttributes": [
    "document_type",
    "domain",
    "jurisdiction",
    "regulations",
    "security_class",
    "contains_phi",
    "contains_pii",
    "contains_financial",
    "status",
    "retention_category",
    "business_unit",
    "desk",
    "facility",
    "owner_role",
    "owner_user_id",
    "effective_date",
    "review_due_date",
    "retain_until",
    "created_at",
    "updated_at"
  ],

  "sortableAttributes": [
    "effective_date",
    "review_due_date",
    "retain_until",
    "created_at",
    "updated_at"
  ]
}
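
One way to apply this definition from Python is sketched below. It assumes the official meilisearch Python client. In Meilisearch, uid and primaryKey are supplied when the index is created rather than in the settings payload, so the sketch separates them first. The helper names split_index_definition and apply_index_definition are hypothetical.

```python
def split_index_definition(definition):
    """Separate index-creation fields from the settings payload."""
    creation_keys = ("uid", "primaryKey")
    creation = {k: definition[k] for k in creation_keys if k in definition}
    settings = {k: v for k, v in definition.items() if k not in creation_keys}
    return creation, settings


def apply_index_definition(client, definition):
    """Create the index, then push its settings.

    `client` is assumed to be a meilisearch.Client instance.
    """
    creation, settings = split_index_definition(definition)
    uid = creation["uid"]
    options = {"primaryKey": creation["primaryKey"]} if "primaryKey" in creation else None

    # Both calls enqueue asynchronous tasks, so wait for each to finish.
    task = client.create_index(uid, options)
    client.wait_for_task(task.task_uid)
    task = client.index(uid).update_settings(settings)
    client.wait_for_task(task.task_uid)
    return uid
```

Usage would look like apply_index_definition(meilisearch.Client(url, api_key), definition), where url and api_key come from the deployment's own configuration.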

Document Structure

Each indexed document combines fields from documents and document_metadata:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "HIPAA Privacy Officer Policy",
  "summary": "Defines the role and responsibilities of the Privacy Officer",
  "body": "This policy establishes the requirements for...",
  "keywords": ["hipaa", "privacy", "officer", "compliance"],
  "tags": ["policy", "hipaa", "privacy"],

  "document_type": "policy",
  "domain": "security-privacy",
  "jurisdiction": ["US"],
  "regulations": ["HIPAA-164.316", "HIPAA-164.530"],
  "security_class": "confidential",
  "contains_phi": true,
  "contains_pii": true,
  "contains_financial": false,
  "status": "effective",
  "retention_category": "HIPAA-6Y",

  "business_unit": "Compliance",
  "desk": null,
  "facility": "Hospital-A",
  "owner_role": "Privacy Officer",
  "owner_user_id": "u123",

  "effective_date": "2025-01-01",
  "review_due_date": "2027-01-01",
  "retain_until": "2031-01-01",
  "created_at": "2025-01-01T10:00:00Z",
  "updated_at": "2025-01-10T11:00:00Z"
}

Search Integration with CODITECT API

Hybrid Search Flow

  1. User query → CODITECT /api/v1/search/hybrid
  2. pgvector → k-NN over embeddings → candidate doc IDs
  3. Meilisearch → text search + compliance filters → refined results
  4. RRF (Reciprocal Rank Fusion) → combined scoring
  5. Return → ranked results with snippets
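
The fusion step in the flow above can be sketched as follows. This is a minimal Reciprocal Rank Fusion over two ranked ID lists, not the CODITECT implementation. The constant k=60 is the value commonly used with RRF and is an assumption here, as are the function name rrf_fuse and the sample IDs.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_ids = ["doc-a", "doc-b", "doc-c"]  # hypothetical pgvector k-NN order
text_ids = ["doc-b", "doc-d", "doc-a"]    # hypothetical Meilisearch order
fused = rrf_fuse([vector_ids, text_ids])
print([doc_id for doc_id, _ in fused])
# ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

Documents found by both engines (doc-a, doc-b) outrank those found by only one, which is the behavior the hybrid flow relies on. RRF uses ranks only, so the raw pgvector distances and Meilisearch ranking scores never need to be normalized against each other.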

Filter Syntax Examples

# HIPAA PHI documents in specific facility
contains_phi = true AND facility = "Hospital-A"

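# FINRA documents with a review due before 2025-01-01
# (numeric comparison: assumes review_due_date is indexed as a Unix timestamp in seconds)
regulations = "FINRA-4511" AND review_due_date < 1735689600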

# Confidential documents by domain
security_class = "confidential" AND domain = "finance"

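# Documents on legal hold
# (legal_hold is not in filterableAttributes above; add it there and to the indexed document before using this filter)
legal_hold = true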
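
The review-date filter above compares against a number, so the date fields need to be numeric for that comparison to behave as intended. The sample document stores them as ISO 8601 strings, so one option is to convert dates to Unix timestamps both when indexing and when building filters. A minimal sketch, with hypothetical helper names and the assumption that the regulation value contains no double quotes:

```python
from datetime import date, datetime, timezone


def to_unix_seconds(day):
    """Return midnight UTC of `day` as an integer Unix timestamp."""
    moment = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(moment.timestamp())


def review_due_before(regulation, cutoff):
    """Build a filter for documents under `regulation` due for review before `cutoff`."""
    return (
        f'regulations = "{regulation}" '
        f"AND review_due_date < {to_unix_seconds(cutoff)}"
    )


print(review_due_before("FINRA-4511", date(2025, 1, 1)))
# regulations = "FINRA-4511" AND review_due_date < 1735689600
```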

Integration with Existing Endpoints

Enhanced Search Request

POST /api/v1/search/hybrid

{
  "query": "breach notification procedure",
  "mode": "hybrid",
  "filters": {
    "regulations": ["HIPAA-164.316"],
    "status": "effective",
    "contains_phi": true
  },
  "limit": 20,
  "offset": 0
}
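
The filters object in this request has to be translated into a Meilisearch filter expression before the text search runs. The sketch below shows one possible translation. It assumes that a list value means "match any of these" (an IN clause) and that all fields are combined with AND; the source does not specify either rule, and build_filter is a hypothetical name.

```python
def build_filter(filters):
    """Translate a {field: value} mapping into a Meilisearch filter string.

    Lists become an IN clause, booleans become true/false, and every
    clause is joined with AND.
    """
    def literal(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        # Quote strings, escaping backslashes and double quotes.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{text}"'

    clauses = []
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            items = ", ".join(literal(v) for v in value)
            clauses.append(f"{field} IN [{items}]")
        else:
            clauses.append(f"{field} = {literal(value)}")
    return " AND ".join(clauses)


request_filters = {
    "regulations": ["HIPAA-164.316"],
    "status": "effective",
    "contains_phi": True,
}
print(build_filter(request_filters))
# regulations IN ["HIPAA-164.316"] AND status = "effective" AND contains_phi = true
```

Field names are interpolated as-is, so a real endpoint would want to check them against the index's filterableAttributes before building the expression.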

Response Mapping

{
  "results": [
    {
      "id": "uuid",
      "title": "...",
      "summary": "...",
      "score": 0.95,
      "highlights": {...},
      "compliance": {
        "regulations": ["HIPAA-164.316"],
        "classification": "confidential",
        "retention": "HIPAA-6Y",
        "retain_until": "2031-01-01"
      }
    }
  ],
  "total": 45,
  "processing_time_ms": 23
}
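
A minimal sketch of how one Meilisearch hit might be mapped into the result shape above. The field correspondences (classification from security_class, retention from retention_category, highlights from Meilisearch's _formatted object) are inferred from the sample values and are assumptions, as is the helper name to_search_result.

```python
def to_search_result(hit, score):
    """Map one Meilisearch hit to the API result shape shown above."""
    return {
        "id": hit["id"],
        "title": hit.get("title"),
        "summary": hit.get("summary"),
        "score": score,
        # _formatted is only present when the search requested highlighting.
        "highlights": hit.get("_formatted", {}),
        "compliance": {
            "regulations": hit.get("regulations", []),
            "classification": hit.get("security_class"),
            "retention": hit.get("retention_category"),
            "retain_until": hit.get("retain_until"),
        },
    }


sample_hit = {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "title": "HIPAA Privacy Officer Policy",
    "summary": "Defines the role and responsibilities of the Privacy Officer",
    "regulations": ["HIPAA-164.316", "HIPAA-164.530"],
    "security_class": "confidential",
    "retention_category": "HIPAA-6Y",
    "retain_until": "2031-01-01",
}
result = to_search_result(sample_hit, score=0.95)
print(result["compliance"]["classification"])
# confidential
```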

Sync Strategy

On Document Create/Update

# meilisearch_client (a meilisearch.Client) and get_document_text are
# assumed to be defined elsewhere in the application.

def _iso(value):
    """Return an ISO 8601 string, or None when the date is unset."""
    return value.isoformat() if value else None


def sync_to_meilisearch(document, metadata):
    """Sync document to Meilisearch after DB update."""
    index = meilisearch_client.index('documents')

    doc = {
        'id': str(document.id),
        'title': document.title,
        'summary': document.summary,
        'body': get_document_text(document),
        'keywords': list(document.keywords),
        'tags': list(document.tags),
        'document_type': document.document_type,
        'domain': metadata.domain,
        'jurisdiction': metadata.jurisdiction,
        'regulations': metadata.regulations,
        'security_class': metadata.security_class,
        'contains_phi': metadata.contains_phi,
        'contains_pii': metadata.contains_pii,
        'contains_financial': metadata.contains_financial,
        'status': metadata.status,
        'retention_category': metadata.retention_category,
        'business_unit': metadata.business_unit,
        # desk is part of the index schema; assumed to live on metadata
        # alongside business_unit and facility.
        'desk': metadata.desk,
        'facility': metadata.facility,
        'owner_role': metadata.owner_role,
        'owner_user_id': metadata.owner_user_id,
        # Dates are sent as ISO 8601 strings, matching the sample document.
        'effective_date': _iso(metadata.effective_date),
        'review_due_date': _iso(metadata.review_due_date),
        'retain_until': _iso(metadata.retain_until),
        'created_at': _iso(document.created_at),
        'updated_at': _iso(document.updated_at)
    }

    # add_documents enqueues an asynchronous task and replaces any
    # existing document that has the same id.
    index.add_documents([doc])

References