
ADR-024: FastAPI Web Framework Selection

Status: Accepted
Date: 2025-12-28
Deciders: Hal Casteel
Categories: Architecture, Backend, API


Context

The CODITECT Document Management System requires a robust API layer to expose document management, semantic search, and analytics capabilities. The API must satisfy the requirements and constraints listed below.

Requirements

  1. High-Performance Vector Search: Sub-100ms p95 latency for semantic search queries
  2. Async Database Operations: Non-blocking PostgreSQL (pgvector) and Redis operations
  3. Real-Time Embedding Generation: Concurrent calls to OpenAI/Anthropic embedding APIs
  4. Multi-Tenant Isolation: Secure tenant separation with proper authentication
  5. API Documentation: Auto-generated OpenAPI/Swagger documentation
  6. Type Safety: Strong typing for request/response validation
  7. Production Readiness: Battle-tested for enterprise deployment

Constraints

  • Must integrate with existing Python ML/AI ecosystem (embeddings, vector operations)
  • Must support async patterns for I/O-bound operations
  • Must provide automatic request validation
  • Must generate API documentation without manual maintenance
  • Must have proven production track record at scale

Decision

We will use FastAPI as the primary web framework for the CODITECT Document Management System API because it provides the optimal combination of async performance, type safety, automatic documentation, and Python ecosystem integration required for our semantic search and document management use cases.


Alternatives Considered

1. Django REST Framework (DRF)

Pros:

  • Mature, battle-tested framework
  • Excellent admin interface
  • Strong ORM integration
  • Large community and ecosystem

Cons:

  • Sync-first architecture (async support added later, not native)
  • Heavier footprint for API-only services
  • ORM doesn't natively support pgvector
  • Slower request processing compared to ASGI frameworks

Rejection Reason: Our use case requires async I/O for vector search, embedding APIs, and Redis operations. Django's sync-first architecture would require additional complexity for proper async handling.

2. Flask

Pros:

  • Lightweight and flexible
  • Simple to get started
  • Large extension ecosystem

Cons:

  • Limited async support (async views in Flask 2.0+ need the async extra and still run on WSGI workers)
  • No built-in request validation
  • No automatic API documentation
  • Manual type checking required

Rejection Reason: Lacks native async support and automatic OpenAPI generation, which are critical for our requirements.

3. Starlette (FastAPI's Foundation)

Pros:

  • Maximum control and flexibility
  • Minimal overhead
  • Pure ASGI implementation

Cons:

  • No built-in request/response validation
  • No automatic OpenAPI generation
  • Requires more boilerplate code

Rejection Reason: Would require reimplementing features that FastAPI provides out of the box (Pydantic validation, OpenAPI generation).

4. gRPC (Python)

Pros:

  • Binary protocol (efficient)
  • Strong typing via Protocol Buffers
  • Excellent for microservices communication

Cons:

  • No browser client support without gateway
  • More complex tooling
  • Protocol Buffer management overhead

Rejection Reason: Our API needs to be consumable by web browsers and third-party integrations, which requires REST/HTTP. gRPC may be considered for internal service-to-service communication in future.

5. Go Frameworks (Gin, Echo, Fiber)

Pros:

  • Superior raw performance
  • Lower memory footprint
  • Excellent concurrency model

Cons:

  • Separate language from ML/AI code
  • No native pgvector integration
  • Would require polyglot architecture

Rejection Reason: Our embedding and vector operations are Python-native. Introducing Go would create language boundaries and complicate deployment.


Rationale

Why FastAPI?

1. Native Async Support

FastAPI is built on Starlette (ASGI) with first-class async support:

@app.get("/documents/search")
async def search_documents(query: str, top_k: int = 10):
    # All I/O operations are non-blocking
    embedding = await embedding_service.embed_text(query)
    results = await search_service.vector_search(embedding, top_k)
    return results

This is critical for:

  • pgvector similarity searches
  • Redis cache operations
  • OpenAI/Anthropic embedding API calls
  • Concurrent request handling
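
To illustrate why non-blocking I/O matters for the operations listed above, here is a minimal sketch that overlaps three simulated I/O calls with asyncio.gather. The function names and delays are hypothetical stand-ins chosen for illustration, not part of the CODITECT codebase.

```python
import asyncio
import time


async def fake_io(name: str, delay: float) -> str:
    """Hypothetical stand-in for an awaitable I/O call (database, cache, HTTP)."""
    await asyncio.sleep(delay)  # yields control to the event loop while "waiting"
    return name


async def handle_request() -> tuple[list[str], float]:
    start = time.perf_counter()
    # The three waits overlap, so the total time is close to the slowest call,
    # not the sum of all three.
    results = await asyncio.gather(
        fake_io("pgvector_search", 0.1),
        fake_io("redis_lookup", 0.1),
        fake_io("embedding_api", 0.1),
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(handle_request())
print(results, f"{elapsed:.3f}s")
```

Run sequentially, the same three calls would take roughly 0.3 seconds; run concurrently, they finish in a little over 0.1 seconds. In a real handler only independent calls can be overlapped this way, since a vector search still has to wait for its query embedding.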

2. Automatic Request Validation (Pydantic)

Type-safe request/response handling with automatic validation:

from uuid import UUID

from pydantic import BaseModel, Field

class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=10000)
    top_k: int = Field(default=10, ge=1, le=100)
    tenant_id: UUID
    min_score: float = Field(default=0.0, ge=0.0, le=1.0)

@app.post("/documents/search")
async def search(request: SearchRequest) -> SearchResponse:
    # The body has already been validated against SearchRequest
    return await search_service.search(request)
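
As a rough illustration of what "already validated" means, the sketch below feeds a valid and an invalid payload to the same model outside of any web framework. It assumes Pydantic V2 (for model_validate); the failing_fields helper and the sample payloads are hypothetical.

```python
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=10000)
    top_k: int = Field(default=10, ge=1, le=100)
    tenant_id: UUID
    min_score: float = Field(default=0.0, ge=0.0, le=1.0)


def failing_fields(payload: dict) -> list[str]:
    """Hypothetical helper: names of fields that fail validation (empty if valid)."""
    try:
        SearchRequest.model_validate(payload)
    except ValidationError as exc:
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []


tenant = str(uuid4())
ok = failing_fields({"query": "contract renewal", "tenant_id": tenant})
bad = failing_fields({"query": "", "top_k": 500, "tenant_id": "not-a-uuid"})
print(ok)   # []
print(bad)  # ['query', 'tenant_id', 'top_k']
```

When the model is used as a FastAPI request body, the same checks run before the handler is called, and a failing payload is answered with a 422 response instead of reaching the service layer.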

3. Automatic OpenAPI Documentation

Zero-configuration API documentation:

  • Swagger UI at /docs
  • ReDoc at /redoc
  • OpenAPI JSON at /openapi.json

This eliminates documentation drift and enables:

  • Client SDK generation
  • API testing tools integration
  • Developer self-service

4. Performance Benchmarks

FastAPI consistently ranks among the fastest Python frameworks:

Framework      Requests/sec    Latency (p95)
FastAPI        15,000+         <10ms
Django REST    2,000-5,000     30-50ms
Flask          3,000-6,000     20-40ms

Indicative figures for simple JSON endpoints; absolute numbers vary with hardware, server, and worker configuration. Source: TechEmpower Framework Benchmarks
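
Because both the table above and the sub-100ms search requirement are stated as p95 latency, it may help to be explicit about how that figure is derived from raw measurements. A minimal sketch using the nearest-rank method follows; the sample data is made up, and other percentile definitions (for example interpolating ones) give slightly different values.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


latencies_ms = [float(n) for n in range(1, 101)]  # made-up samples: 1 ms .. 100 ms
p95 = percentile(latencies_ms, 95)
print(p95)  # 95.0
```

Measuring the service's own endpoints this way, on production-like hardware, is likely to be more informative than published framework benchmarks.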

5. Production Track Record

FastAPI is used in production by:

  • Microsoft - ML services
  • Netflix - Dispatch (open-source crisis management orchestration framework)
  • Uber - REST prediction server for Ludwig
  • Stripe - Internal tooling

This validates enterprise-readiness and scalability.

6. Python Ecosystem Integration

Seamless integration with our stack:

  • SQLAlchemy 2.0 - Async ORM support
  • asyncpg - Native async PostgreSQL driver
  • pgvector - Vector similarity operations
  • Redis - Async client (redis-py's redis.asyncio module, which superseded the standalone aioredis package)
  • Pydantic V2 - Data validation and settings
  • OpenAI/Anthropic SDKs - Async embedding generation

Consequences

Positive

  1. Performance: Async I/O enables high concurrency for database and API operations
  2. Developer Experience: Type hints + auto-docs reduce development time
  3. Type Safety: Pydantic validation catches errors at request time
  4. Documentation: API docs always in sync with code
  5. Testing: Easy to test with TestClient and pytest-asyncio
  6. Ecosystem: Strong integration with Python ML/AI libraries

Negative

  1. Learning Curve: Developers unfamiliar with async patterns need training
  2. Debugging Complexity: Async stack traces can be harder to follow
  3. Dependency on Pydantic: Tight coupling to Pydantic for validation

Mitigations

  1. Async Training: Document async patterns in developer guides
  2. Structured Logging: Implement correlation IDs for request tracing
  3. Pydantic Abstraction: Use interface layer to minimize coupling
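
One possible shape for the correlation ID mitigation is sketched below using only the standard library: a context variable carries the ID, and a logging filter stamps it on every record. The logger name, ID values, and handler function are hypothetical; in a FastAPI service the ID would more likely be set in middleware from a request header.

```python
import asyncio
import contextvars
import io
import logging

# Holds the correlation ID for the request currently being handled.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


stream = io.StringIO()  # in-memory sink so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(correlation_id)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger = logging.getLogger("example.requests")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


async def handle(request_id: str) -> None:
    correlation_id.set(request_id)
    logger.info("start")
    await asyncio.sleep(0.01)  # other requests interleave here
    logger.info("done")


async def main() -> None:
    # Each task runs in its own copy of the context, so IDs do not leak between requests.
    await asyncio.gather(handle("req-1"), handle("req-2"))


asyncio.run(main())
lines = stream.getvalue().splitlines()
print(lines)
```

Even though the two handlers interleave, every line carries the ID of the request that produced it, which is what makes async traces easier to follow.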

Implementation Guidelines

Project Structure

src/backend/
├── api/
│   ├── __init__.py
│   ├── main.py              # FastAPI app instance
│   ├── dependencies.py      # Dependency injection
│   ├── middleware.py        # Custom middleware
│   └── routes/
│       ├── documents.py     # Document CRUD
│       ├── search.py        # Semantic search
│       ├── analytics.py     # Metrics/analytics
│       └── auth.py          # Authentication
├── schemas/                 # Pydantic models
│   ├── documents.py
│   ├── search.py
│   └── common.py
├── services/                # Business logic
└── models/                  # SQLAlchemy models

Standard Patterns

Dependency Injection

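from typing import AsyncGenerator

from fastapi import Depends
from sqlalchemy.ext.asyncio import AsyncSession

# async_session, oauth2_scheme, auth_service, document_service, and Tenant
# are defined elsewhere in the project.

async def get_db() -> AsyncGenerator[AsyncSession, None]:
    async with async_session() as session:
        yield session

async def get_current_tenant(
    token: str = Depends(oauth2_scheme),
    db: AsyncSession = Depends(get_db),
) -> Tenant:
    return await auth_service.validate_token(token, db)

@app.get("/documents")
async def list_documents(
    tenant: Tenant = Depends(get_current_tenant),
    # Dependencies are cached per request, so this is the same session used above
    db: AsyncSession = Depends(get_db),
):
    return await document_service.list(tenant.id, db)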
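
The yield-based dependency above relies on setup code running before the handler and cleanup running after it. The sketch below imitates that lifecycle with the standard library only, as a way to reason about it; FakeSession and simulate_request are hypothetical stand-ins, and FastAPI's actual resolution logic is more involved.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncGenerator


class FakeSession:
    """Hypothetical stand-in for sqlalchemy.ext.asyncio.AsyncSession."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


events: list[str] = []


async def get_db() -> AsyncGenerator[FakeSession, None]:
    session = FakeSession()
    events.append("open")
    try:
        yield session  # the handler runs while the generator is paused here
    finally:
        await session.close()  # runs even if the handler raised
        events.append("close")


async def list_documents(db: FakeSession) -> list[str]:
    events.append("handler")
    return ["doc-1", "doc-2"]


async def simulate_request() -> list[str]:
    # Roughly what the framework does: enter the dependency, call the handler, clean up.
    async with asynccontextmanager(get_db)() as db:
        return await list_documents(db)


docs = asyncio.run(simulate_request())
print(docs, events)
```

The recorded order is open, handler, close, which is why a session obtained this way does not need to be closed explicitly inside route handlers.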

Error Handling

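from uuid import UUID

from fastapi import HTTPException, Request
from fastapi.encoders import jsonable_encoder
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse

class DocumentNotFoundError(HTTPException):
    def __init__(self, doc_id: UUID):
        super().__init__(
            status_code=404,
            detail=f"Document {doc_id} not found",
        )

# FastAPI raises RequestValidationError (not pydantic.ValidationError) for invalid request data
@app.exception_handler(RequestValidationError)
async def validation_exception_handler(request: Request, exc: RequestValidationError):
    return JSONResponse(
        status_code=422,
        # jsonable_encoder guards against non-serializable values in the error details
        content=jsonable_encoder({"detail": exc.errors()}),
    )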

Compliance

Security Requirements

  • JWT authentication via python-jose
  • CORS configuration for allowed origins
  • Rate limiting via middleware
  • Request size limits
  • Security headers (HSTS, CSP, etc.)
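
As one possible basis for the rate-limiting requirement, the sketch below implements a token bucket with an injectable clock. It is an in-process illustration under assumed parameters, not the chosen design: with multiple workers the counters would need a shared store such as Redis, and the middleware wiring (for example one bucket per tenant, answering 429 when allow() returns False) is left out.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(
        self,
        capacity: int,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A fake clock keeps the example deterministic.
now = [0.0]
bucket = TokenBucket(capacity=2, rate=1.0, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(3)]  # third call exceeds the burst size
now[0] += 1.0                               # one second later, one token is back
after_wait = bucket.allow()
print(burst, after_wait)  # [True, True, False] True
```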

Performance Requirements

  • Connection pooling for PostgreSQL
  • Redis connection pooling
  • Response compression (gzip)
  • Cache headers for static responses

Monitoring Requirements

  • Prometheus metrics endpoint (/metrics)
  • Health check endpoint (/health)
  • OpenTelemetry tracing integration


Revision History

Version    Date          Author         Changes
1.0.0      2025-12-28    Hal Casteel    Initial ADR creation