ADR-024: FastAPI Web Framework Selection
Status: Accepted
Date: 2025-12-28
Deciders: Hal Casteel
Categories: Architecture, Backend, API
Context
The CODITECT Document Management System requires a robust API layer to expose document management, semantic search, and analytics capabilities. The API must satisfy the requirements and constraints listed below.
Requirements
- High-Performance Vector Search: Sub-100ms p95 latency for semantic search queries
- Async Database Operations: Non-blocking PostgreSQL (pgvector) and Redis operations
- Real-Time Embedding Generation: Concurrent calls to OpenAI/Anthropic embedding APIs
- Multi-Tenant Isolation: Secure tenant separation with proper authentication
- API Documentation: Auto-generated OpenAPI/Swagger documentation
- Type Safety: Strong typing for request/response validation
- Production Readiness: Battle-tested for enterprise deployment
Constraints
- Must integrate with existing Python ML/AI ecosystem (embeddings, vector operations)
- Must support async patterns for I/O-bound operations
- Must provide automatic request validation
- Must generate API documentation without manual maintenance
- Must have proven production track record at scale
Decision
We will use FastAPI as the primary web framework for the CODITECT Document Management System API because it provides the optimal combination of async performance, type safety, automatic documentation, and Python ecosystem integration required for our semantic search and document management use cases.
Alternatives Considered
1. Django REST Framework (DRF)
Pros:
- Mature, battle-tested framework
- Excellent admin interface
- Strong ORM integration
- Large community and ecosystem
Cons:
- Sync-first architecture (async support added later, not native)
- Heavier footprint for API-only services
- ORM doesn't natively support pgvector
- Slower request processing compared to ASGI frameworks
Rejection Reason: Our use case requires async I/O for vector search, embedding APIs, and Redis operations. Django's sync-first architecture would require additional complexity for proper async handling.
2. Flask
Pros:
- Lightweight and flexible
- Simple to get started
- Large extension ecosystem
Cons:
- No native async support (requires additional libraries)
- No built-in request validation
- No automatic API documentation
- Manual type checking required
Rejection Reason: Lacks native async support and automatic OpenAPI generation, which are critical for our requirements.
3. Starlette (FastAPI's Foundation)
Pros:
- Maximum control and flexibility
- Minimal overhead
- Pure ASGI implementation
Cons:
- No built-in request/response validation
- No automatic OpenAPI generation
- Requires more boilerplate code
Rejection Reason: Would require reimplementing features that FastAPI provides out of the box (Pydantic validation, OpenAPI generation).
4. gRPC (Python)
Pros:
- Binary protocol (efficient)
- Strong typing via Protocol Buffers
- Excellent for microservices communication
Cons:
- No browser client support without gateway
- More complex tooling
- Protocol Buffer management overhead
Rejection Reason: Our API needs to be consumable by web browsers and third-party integrations, which requires REST/HTTP. gRPC may be considered for internal service-to-service communication in the future.
5. Go Frameworks (Gin, Echo, Fiber)
Pros:
- Superior raw performance
- Lower memory footprint
- Excellent concurrency model
Cons:
- Separate language from ML/AI code
- No native pgvector integration
- Would require polyglot architecture
Rejection Reason: Our embedding and vector operations are Python-native. Introducing Go would create language boundaries and complicate deployment.
Rationale
Why FastAPI?
1. Native Async Support
FastAPI is built on Starlette (ASGI) with first-class async support:

    @app.get("/documents/search")
    async def search_documents(query: str, top_k: int = 10):
        # All I/O operations are non-blocking
        embedding = await embedding_service.embed_text(query)
        results = await search_service.vector_search(embedding, top_k)
        return results

This is critical for:
- pgvector similarity searches
- Redis cache operations
- OpenAI/Anthropic embedding API calls
- Concurrent request handling
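To illustrate why non-blocking I/O matters for the operations listed above, the following minimal sketch runs three independent simulated I/O calls concurrently with asyncio.gather. The function names and delays are hypothetical stand-ins (asyncio.sleep replaces real network calls) and are not part of the project's service layer.

```python
import asyncio
import time


async def fake_io(name: str, delay: float) -> str:
    """Hypothetical stand-in for a network call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return name


async def handle_request() -> tuple[list[str], float]:
    """Run three independent I/O operations concurrently and time them."""
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_io("cache_lookup", 0.05),
        fake_io("embedding_call", 0.10),
        fake_io("metadata_query", 0.08),
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    names, elapsed = asyncio.run(handle_request())
    # Elapsed time tracks the slowest call (about 0.10 s), not the 0.23 s sum.
    print(names, f"{elapsed:.2f}s")
```

Calls that depend on each other, such as embedding a query before searching with it, still run in sequence; the gain comes from overlapping independent calls and from serving other requests while one request waits.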
2. Automatic Request Validation (Pydantic)
Type-safe request/response handling with automatic validation:

    class SearchRequest(BaseModel):
        query: str = Field(..., min_length=1, max_length=10000)
        top_k: int = Field(default=10, ge=1, le=100)
        tenant_id: UUID
        min_score: float = Field(default=0.0, ge=0.0, le=1.0)

    @app.post("/documents/search")
    async def search(request: SearchRequest) -> SearchResponse:
        # Request is already validated
        return await search_service.search(request)

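To make the validation behavior concrete, the sketch below exercises the SearchRequest model outside of any route and reports which fields are rejected. The model is copied from the example above; the invalid_fields helper is hypothetical and exists only for this illustration. It assumes Pydantic is installed.

```python
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=10000)
    top_k: int = Field(default=10, ge=1, le=100)
    tenant_id: UUID
    min_score: float = Field(default=0.0, ge=0.0, le=1.0)


def invalid_fields(**payload) -> list[str]:
    """Return the names of fields that fail validation (empty if valid)."""
    try:
        SearchRequest(**payload)
    except ValidationError as exc:
        # Each error carries a "loc" tuple whose first item is the field name.
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []


if __name__ == "__main__":
    tenant = uuid4()
    print(invalid_fields(query="vector databases", tenant_id=tenant))
    print(invalid_fields(query="", top_k=500, tenant_id=tenant))
```

Inside a FastAPI route the same checks run before the handler is called, and a failing request receives a 422 response instead of reaching the service layer.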
3. Automatic OpenAPI Documentation
Zero-configuration API documentation:
- Swagger UI at /docs
- ReDoc at /redoc
- OpenAPI JSON at /openapi.json
This eliminates documentation drift and enables:
- Client SDK generation
- API testing tools integration
- Developer self-service
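Because the schema is machine-readable, tooling can consume it directly. As a minimal sketch, the helper below lists the operations declared in an OpenAPI document. The list_operations function and the sample document are hypothetical and hand-written for illustration; they are not taken from this service's actual schema.

```python
# Operation keys permitted on an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


# Hand-written sample shaped like an OpenAPI document (illustrative only).
SAMPLE_SPEC = {
    "openapi": "3.1.0",
    "info": {"title": "Example", "version": "0.0.0"},
    "paths": {
        "/documents": {"get": {}, "post": {}},
        "/documents/search": {"post": {}, "parameters": []},
    },
}

if __name__ == "__main__":
    for method, path in list_operations(SAMPLE_SPEC):
        print(method, path)
```

In practice the input would be the JSON served at /openapi.json, loaded with json.loads.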
4. Performance Benchmarks
FastAPI consistently ranks among the fastest Python frameworks:
| Framework | Requests/sec | Latency (p95) |
|---|---|---|
| FastAPI | 15,000+ | <10ms |
| Django REST | 2,000-5,000 | 30-50ms |
| Flask | 3,000-6,000 | 20-40ms |
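Approximate figures for simple JSON endpoints (source: TechEmpower Framework Benchmarks). Actual throughput and latency vary with hardware, server configuration, and worker count, so treat these as order-of-magnitude comparisons rather than guarantees for this workload.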
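Framework-level benchmark figures do not show whether the sub-100ms p95 search target is met; that has to be measured on the deployed service. As a minimal sketch, the helper below computes p95 from recorded latencies and compares it with the budget. The function names and the sample data are hypothetical and made up for illustration.

```python
import statistics

P95_BUDGET_MS = 100.0  # from the "sub-100ms p95" requirement


def p95(latencies_ms: list[float]) -> float:
    """Return the 95th percentile of the recorded latencies."""
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]


def meets_budget(latencies_ms: list[float], budget_ms: float = P95_BUDGET_MS) -> bool:
    """True when the measured p95 is under the latency budget."""
    return p95(latencies_ms) < budget_ms


if __name__ == "__main__":
    # Synthetic samples (not measurements): mostly fast, with a slow tail.
    samples = [20.0] * 90 + [80.0] * 8 + [400.0] * 2
    print(p95(samples), meets_budget(samples))
```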
5. Production Track Record
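FastAPI is reported to be in production use at:
- Microsoft - Various internal services
- Netflix - Dispatch, its open-source crisis management framework
- Uber - REST prediction server for the Ludwig machine learning toolkit
- Stripe - Internal tooling
These adoptions indicate that the framework is viable for enterprise deployments at scale.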
6. Python Ecosystem Integration
Seamless integration with our stack:
- SQLAlchemy 2.0 - Async ORM support
- asyncpg - Native async PostgreSQL driver
- pgvector - Vector similarity operations
- Redis - Async client (redis-py's redis.asyncio module, which absorbed the former aioredis project)
- Pydantic V2 - Data validation and settings
- OpenAI/Anthropic SDKs - Async embedding generation
Consequences
Positive
- Performance: Async I/O enables high concurrency for database and API operations
- Developer Experience: Type hints + auto-docs reduce development time
- Type Safety: Pydantic validation catches errors at request time
- Documentation: API docs always in sync with code
- Testing: Easy to test with TestClient and pytest-asyncio
- Ecosystem: Strong integration with Python ML/AI libraries
Negative
- Learning Curve: Developers unfamiliar with async patterns need training
- Debugging Complexity: Async stack traces can be harder to follow
- Dependency on Pydantic: Tight coupling to Pydantic for validation
Mitigations
- Async Training: Document async patterns in developer guides
- Structured Logging: Implement correlation IDs for request tracing
- Pydantic Abstraction: Use interface layer to minimize coupling
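The correlation ID mitigation can be sketched with the standard library alone. The example below assumes one asyncio task per request and stores the ID in a context variable so that it survives awaits without leaking between concurrent requests. All names here (correlation_id, CorrelationFilter, build_logger, handle_request) are hypothetical, not existing project code.

```python
import asyncio
import contextvars
import logging
import uuid

# Holds the ID of the request being handled in the current task.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def build_logger(name: str = "dms.example") -> logging.Logger:
    """Create a logger whose output is prefixed with the correlation ID."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(correlation_id)s %(levelname)s %(message)s")
        )
        handler.addFilter(CorrelationFilter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


async def handle_request(logger: logging.Logger) -> tuple[str, str]:
    """Simulate one request: bind an ID, await I/O, then read the ID back."""
    assigned = uuid.uuid4().hex
    correlation_id.set(assigned)  # visible only within this task's context
    await asyncio.sleep(0.01)     # other requests run during this await
    logger.info("request finished")
    return assigned, correlation_id.get()


async def main() -> list[tuple[str, str]]:
    logger = build_logger()
    # gather() wraps each coroutine in its own task with a copied context.
    return await asyncio.gather(*(handle_request(logger) for _ in range(3)))


if __name__ == "__main__":
    for assigned, seen in asyncio.run(main()):
        print(assigned == seen)
```

In the service itself the ID would more likely be set in middleware, for example from an incoming request header, so that every log line emitted while handling that request carries it.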
Implementation Guidelines
Project Structure

    src/backend/
    ├── api/
    │   ├── __init__.py
    │   ├── main.py              # FastAPI app instance
    │   ├── dependencies.py      # Dependency injection
    │   ├── middleware.py        # Custom middleware
    │   └── routes/
    │       ├── documents.py     # Document CRUD
    │       ├── search.py        # Semantic search
    │       ├── analytics.py     # Metrics/analytics
    │       └── auth.py          # Authentication
    ├── schemas/                 # Pydantic models
    │   ├── documents.py
    │   ├── search.py
    │   └── common.py
    ├── services/                # Business logic
    └── models/                  # SQLAlchemy models

Standard Patterns
Dependency Injection
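
    from collections.abc import AsyncGenerator

    from fastapi import Depends
    from sqlalchemy.ext.asyncio import AsyncSession

    # async_session, oauth2_scheme, auth_service, document_service, Tenant and
    # app are project-level objects defined elsewhere.

    async def get_db() -> AsyncGenerator[AsyncSession, None]:
        # Yield one session per request; it is closed when the request ends
        async with async_session() as session:
            yield session

    async def get_current_tenant(
        token: str = Depends(oauth2_scheme),
        db: AsyncSession = Depends(get_db),
    ) -> Tenant:
        return await auth_service.validate_token(token, db)

    @app.get("/documents")
    async def list_documents(
        tenant: Tenant = Depends(get_current_tenant),
        db: AsyncSession = Depends(get_db),
    ):
        return await document_service.list(tenant.id, db)
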
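The yield-based get_db dependency above relies on setup running before the handler and cleanup running after it. The sketch below reproduces that lifecycle with the standard library only, roughly mirroring how a framework can drive such a generator. FakeSession, the events list, and call_with_dependency are hypothetical names used purely for illustration.

```python
import asyncio
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

events: list[str] = []  # records the order in which lifecycle steps happen


class FakeSession:
    """Hypothetical stand-in for AsyncSession; records when it is closed."""

    async def close(self) -> None:
        events.append("closed")


async def get_db() -> AsyncGenerator[FakeSession, None]:
    session = FakeSession()
    events.append("opened")
    try:
        yield session          # the handler runs while the generator is suspended
    finally:
        await session.close()  # runs after the handler returns or raises


async def call_with_dependency() -> list[str]:
    """Drive the generator the way a context manager would."""
    events.clear()
    # Entering runs get_db up to its yield; exiting resumes it for cleanup.
    async with asynccontextmanager(get_db)() as session:
        assert isinstance(session, FakeSession)
        events.append("handler")
    return list(events)


if __name__ == "__main__":
    print(asyncio.run(call_with_dependency()))
```

The try/finally in the generator is what guarantees the session is released even when the handler raises.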
Error Handling
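
    from uuid import UUID

    from fastapi import HTTPException
    from fastapi.encoders import jsonable_encoder
    from fastapi.exceptions import RequestValidationError
    from fastapi.responses import JSONResponse

    class DocumentNotFoundError(HTTPException):
        def __init__(self, doc_id: UUID):
            super().__init__(
                status_code=404,
                detail=f"Document {doc_id} not found",
            )

    # FastAPI raises RequestValidationError (not Pydantic's ValidationError)
    # when an incoming request fails validation.
    @app.exception_handler(RequestValidationError)
    async def validation_exception_handler(request, exc):
        return JSONResponse(
            status_code=422,
            content={"detail": jsonable_encoder(exc.errors())},
        )
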
Compliance
Security Requirements
- JWT authentication via python-jose
- CORS configuration for allowed origins
- Rate limiting via middleware
- Request size limits
- Security headers (HSTS, CSP, etc.)
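As one way to meet the security headers requirement, the sketch below shows a pure ASGI middleware that appends headers to every HTTP response. The class name and the header values are assumptions chosen for illustration; the actual policy (especially the CSP) would need to be set per deployment.

```python
import asyncio

# Example header values (assumptions; tune to the deployment's policy).
SECURITY_HEADERS = [
    (b"strict-transport-security", b"max-age=31536000; includeSubDomains"),
    (b"content-security-policy", b"default-src 'self'"),
    (b"x-content-type-options", b"nosniff"),
]


class SecurityHeadersMiddleware:
    """Pure ASGI middleware that appends security headers to HTTP responses."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass WebSocket and lifespan traffic through untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            # Headers travel in the first response message only.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                message = {**message, "headers": headers + SECURITY_HEADERS}
            await send(message)

        await self.app(scope, receive, send_with_headers)


async def demo() -> list[dict]:
    """Run the middleware around a tiny ASGI app and capture what it sends."""
    sent: list[dict] = []

    async def app(scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await SecurityHeadersMiddleware(app)({"type": "http"}, receive, send)
    return sent


if __name__ == "__main__":
    for name, value in asyncio.run(demo())[0]["headers"]:
        print(name.decode(), "=", value.decode())
```

A class of this shape can be registered on a FastAPI application with app.add_middleware(SecurityHeadersMiddleware). Note that a CSP as strict as the example value would likely block the CDN-hosted assets that the default /docs page loads, so the documentation routes may need a looser policy.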
Performance Requirements
- Connection pooling for PostgreSQL
- Redis connection pooling
- Response compression (gzip)
- Cache headers for static responses
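Connection pooling bounds how many connections are open at once and reuses them across requests. The toy pool below illustrates that queueing behavior with the standard library; Pool and run_queries are hypothetical names, and the connections are plain integers. In practice the pools built into the database and Redis client libraries would be used rather than a hand-rolled one.

```python
import asyncio
from contextlib import asynccontextmanager


class Pool:
    """Toy fixed-size pool; connections are represented by integers."""

    def __init__(self, size: int) -> None:
        self._idle = asyncio.Queue()
        for conn_id in range(size):
            self._idle.put_nowait(conn_id)

    @asynccontextmanager
    async def acquire(self):
        conn = await self._idle.get()    # waits when every connection is busy
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)  # hand the connection back for reuse


async def run_queries(pool_size: int, queries: int) -> int:
    """Run concurrent simulated queries; return the peak connections in use."""
    pool = Pool(pool_size)
    in_use = peak = 0

    async def query() -> None:
        nonlocal in_use, peak
        async with pool.acquire():
            in_use += 1
            peak = max(peak, in_use)
            await asyncio.sleep(0.01)    # simulated query time
            in_use -= 1

    await asyncio.gather(*(query() for _ in range(queries)))
    return peak


if __name__ == "__main__":
    # Six concurrent queries share two connections.
    print(asyncio.run(run_queries(pool_size=2, queries=6)))
```

The pool size is the tuning knob: too small and requests queue behind each other, too large and the database spends its connection limit on idle sessions.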
Monitoring Requirements
- Prometheus metrics endpoint (/metrics)
- Health check endpoint (/health)
- OpenTelemetry tracing integration
Related Documents
- ADR-010: Clean Architecture
- ADR-012: Background Processing
- TDD-001: API Layer Technical Design
- SDD-001: Technology Stack
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-28 | Hal Casteel | Initial ADR creation |