Quick Start Guide: AI-Powered PDF Analysis Platform
Get the complete system running in under 30 minutes (local) or 2 hours (GKE production).
Prerequisites
Required Tools
# Verify installations
node --version # v20+
python --version # 3.11+
docker --version # 24.0+
kubectl version # 1.28+
gcloud version # Latest
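These checks can be scripted. The helper below pulls the leading major version out of each tool's `--version` output (the output formats are what the tools print today and may change; the minimum versions mirror the list above):

```python
import re
import subprocess

MINIMUMS = {"node": 20, "python3": 3, "docker": 24}  # tool -> required major version

def parse_major(version_output: str) -> int:
    """Extract the first integer in a version string, e.g. "v20.11.0" -> 20."""
    match = re.search(r"\d+", version_output)
    return int(match.group()) if match else -1

def check_tools() -> dict:
    """Run each tool's --version and compare against MINIMUMS."""
    results = {}
    for tool, minimum in MINIMUMS.items():
        out = subprocess.run([tool, "--version"], capture_output=True, text=True)
        results[tool] = parse_major(out.stdout or out.stderr) >= minimum
    return results

# print(check_tools())  # run locally once the tools are installed
```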
Required Accounts
- Google Cloud Platform account with billing enabled
- Anthropic API key (create one in the Anthropic Console at console.anthropic.com)
Option A: Local Development (5 minutes)
Perfect for development and testing.
1. Clone Repository
git clone https://github.com/yourorg/pdf-analysis-platform.git
cd pdf-analysis-platform
2. Start Infrastructure Services
# Start Redis
docker run -d -p 6379:6379 --name redis redis:7-alpine
# Start PostgreSQL (optional for local dev)
docker run -d -p 5432:5432 --name postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=pdfanalysis \
postgres:15-alpine
3. Configure Environment
# Backend configuration
cd backend
cp .env.example .env
# Edit .env
cat > .env << EOF
REDIS_URL=redis://localhost:6379
ANTHROPIC_API_KEY=your_api_key_here
GCS_BUCKET=local-dev-bucket
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/pdfanalysis
LOG_LEVEL=debug
EOF
4. Start Backend
# Install dependencies
pip install -r requirements.txt
# Run migrations (if using database)
alembic upgrade head
# Start server
uvicorn main:app --reload --port 8000
Backend running at: http://localhost:8000
API docs: http://localhost:8000/docs
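To verify the server came up without opening a browser, a short poll like this works (it assumes the root path responds; substitute your app's actual health route if it differs):

```python
import time
import urllib.error
import urllib.request

def wait_for_backend(url: str = "http://localhost:8000/", timeout: float = 30.0) -> bool:
    """Poll url until it answers with a non-5xx status or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status < 500
        except urllib.error.HTTPError as err:
            # The server answered, just not with 2xx -- it is up.
            return err.code < 500
        except (urllib.error.URLError, OSError):
            time.sleep(1)  # not listening yet; retry
    return False

# print(wait_for_backend())  # True once uvicorn is serving
```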
5. Start Frontend
# In new terminal
cd frontend
# Install dependencies
npm install
# Configure environment
cat > .env << EOF
VITE_API_URL=http://localhost:8000
VITE_WS_URL=ws://localhost:8000/ws
EOF
# Start dev server
npm run dev
Frontend running at: http://localhost:5173
6. Test the System
# Upload a test PDF
curl -X POST http://localhost:8000/api/v1/documents/upload \
-F "file=@test.pdf" \
-F "user_id=test_user"
# Response:
# {
# "document_id": "uuid",
# "filename": "test.pdf",
# "status": "processing",
# "websocket_channel": "document:uuid"
# }
Open browser: http://localhost:5173 and upload a PDF!
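The same upload can be scripted with only the standard library; the endpoint and field names mirror the curl call above, and `test.pdf` is assumed to exist in the working directory:

```python
import io
import urllib.request
import uuid

def build_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Build a multipart/form-data body by hand (standard library only)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        part = (f'--{boundary}\r\nContent-Disposition: form-data; '
                f'name="{name}"\r\n\r\n{value}\r\n')
        buf.write(part.encode())
    head = (f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{file_field}"; filename="{filename}"\r\n'
            f'Content-Type: application/pdf\r\n\r\n')
    buf.write(head.encode())
    buf.write(data)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

def upload(path: str = "test.pdf",
           url: str = "http://localhost:8000/api/v1/documents/upload") -> str:
    """POST a PDF the same way the curl example does and return the response body."""
    with open(path, "rb") as f:
        body, ctype = build_multipart({"user_id": "test_user"}, "file", path, f.read())
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# print(upload())  # JSON with document_id, status, websocket_channel
```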
Option B: Docker Compose (10 minutes)
Complete local environment with all services.
1. Create docker-compose.yml
version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - DATABASE_URL=postgresql+asyncpg://postgres:postgres@postgres:5432/pdfanalysis
    depends_on:
      - redis
      - postgres
    volumes:
      - ./backend:/app
      - /tmp/uploads:/tmp/uploads
  frontend:
    build: ./frontend
    ports:
      - "80:80"
    # Note: Vite reads VITE_* variables at build time; for a prebuilt image,
    # pass them as build args rather than runtime environment.
    environment:
      - VITE_API_URL=http://localhost:8000
      - VITE_WS_URL=ws://localhost:8000/ws
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=pdfanalysis
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
2. Start All Services
# Set your API key
export ANTHROPIC_API_KEY=your_key_here
# Start everything
docker-compose up -d
# View logs
docker-compose logs -f
# Check status
docker-compose ps
3. Access Application
- Frontend: http://localhost
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
4. Stop Services
docker-compose down
# Remove volumes too
docker-compose down -v
Option C: GKE Production (2 hours)
Full production deployment on Google Kubernetes Engine.
Phase 1: GCP Setup (30 minutes)
1. Enable Required APIs
# Set project
export PROJECT_ID=your-project-id
gcloud config set project $PROJECT_ID
# Enable APIs
gcloud services enable \
container.googleapis.com \
sqladmin.googleapis.com \
redis.googleapis.com \
storage-api.googleapis.com \
secretmanager.googleapis.com
2. Create GKE Cluster
# Create Autopilot cluster (recommended)
gcloud container clusters create-auto pdf-analysis-cluster \
--region=us-central1 \
--project=$PROJECT_ID
# Get credentials
gcloud container clusters get-credentials pdf-analysis-cluster \
--region=us-central1
3. Create Cloud SQL Instance
gcloud sql instances create pdf-analysis-db \
--database-version=POSTGRES_15 \
--tier=db-custom-2-7680 \
--region=us-central1 \
--network=default \
--no-assign-ip
# Note: --no-assign-ip requires private services access to be configured on the VPC first
# Create database
gcloud sql databases create pdfanalysis \
--instance=pdf-analysis-db
# Set password
gcloud sql users set-password postgres \
--instance=pdf-analysis-db \
--password=YOUR_SECURE_PASSWORD
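The backend's DATABASE_URL (see the local .env earlier) must use the `postgresql+asyncpg` scheme and URL-escape the password. A small helper avoids getting this wrong; the host is whatever private IP Cloud SQL assigns:

```python
from urllib.parse import quote

def build_database_url(user: str, password: str, host: str,
                       db: str, port: int = 5432) -> str:
    """Assemble an asyncpg SQLAlchemy URL, escaping special characters in the password."""
    return f"postgresql+asyncpg://{user}:{quote(password, safe='')}@{host}:{port}/{db}"

# build_database_url("postgres", "p@ss/word", "10.20.0.3", "pdfanalysis")
# -> "postgresql+asyncpg://postgres:p%40ss%2Fword@10.20.0.3:5432/pdfanalysis"
```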
4. Create Redis Instance
gcloud redis instances create pdf-analysis-redis \
--size=5 \
--region=us-central1 \
--redis-version=redis_7_0 \
--tier=standard_ha
5. Create Storage Bucket
gsutil mb -l us-central1 gs://${PROJECT_ID}-pdf-storage
gsutil versioning set on gs://${PROJECT_ID}-pdf-storage
Phase 2: Deploy Application (45 minutes)
1. Build and Push Images
# Configure Docker to use GCR
gcloud auth configure-docker
# Build backend
cd backend
docker build -t gcr.io/${PROJECT_ID}/pdf-analysis-backend:v1.0.0 .
docker push gcr.io/${PROJECT_ID}/pdf-analysis-backend:v1.0.0
# Build frontend
cd ../frontend
docker build -t gcr.io/${PROJECT_ID}/pdf-analysis-frontend:v1.0.0 .
docker push gcr.io/${PROJECT_ID}/pdf-analysis-frontend:v1.0.0
2. Create Secrets
# Create namespace
kubectl create namespace pdf-analysis
# Create secret for Anthropic API key
kubectl create secret generic pdf-analysis-secrets \
--from-literal=ANTHROPIC_API_KEY=your_key_here \
--from-literal=DATABASE_URL=postgresql+asyncpg://... \
--namespace=pdf-analysis
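Inside the pods these secrets arrive as environment variables. A minimal settings object (a sketch; the actual configuration class in this repo may differ) fails fast at startup if a required key is missing, which surfaces misconfigured secrets immediately:

```python
import os

class Settings:
    """Read required configuration from the environment at startup."""
    def __init__(self, env=None):
        env = os.environ if env is None else env
        # KeyError here crashes the pod early, which shows up in `kubectl logs`
        self.anthropic_api_key = env["ANTHROPIC_API_KEY"]
        self.database_url = env["DATABASE_URL"]
        # Optional values get explicit defaults
        self.redis_url = env.get("REDIS_URL", "redis://localhost:6379")

# settings = Settings()  # construct once at application startup
```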
3. Update Kubernetes Manifests
# Update image references in k8s/deployment.yaml
sed -i "s/PROJECT_ID/${PROJECT_ID}/g" k8s/deployment.yaml
4. Deploy to Cluster
# Apply all manifests
kubectl apply -f k8s/
# Wait for rollout
kubectl rollout status deployment/backend -n pdf-analysis
kubectl rollout status deployment/frontend -n pdf-analysis
# Check pods
kubectl get pods -n pdf-analysis
5. Configure Ingress
# Get external IP
kubectl get ingress -n pdf-analysis
# Update DNS
# Create A record: pdfanalysis.yourdomain.com -> EXTERNAL_IP
6. Enable SSL (Optional but Recommended)
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Create ClusterIssuer
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: gce
EOF
Phase 3: Monitoring Setup (30 minutes)
1. Install Prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
2. Access Grafana
# Get Grafana password
kubectl get secret -n monitoring prometheus-grafana \
-o jsonpath="{.data.admin-password}" | base64 --decode
# Port forward
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Access: http://localhost:3000
# Username: admin
# Password: <from above>
3. Import Dashboards
Pre-built dashboards available at:
- Backend API: Dashboard ID 14282
- Redis: Dashboard ID 11835
- Nginx: Dashboard ID 9614
Phase 4: Verify Deployment (15 minutes)
1. Health Checks
# Backend health
curl https://pdfanalysis.yourdomain.com/
# API documentation
open https://pdfanalysis.yourdomain.com/docs
2. Upload Test
# Upload via API
curl -X POST https://pdfanalysis.yourdomain.com/api/v1/documents/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@test.pdf"
3. Monitor Logs
# Backend logs
kubectl logs -f -l app=backend -n pdf-analysis
# All containers in the matching pods
kubectl logs -f -l app=backend -n pdf-analysis --all-containers=true
Troubleshooting
Common Issues
Backend won't start
# Check logs
kubectl logs -l app=backend -n pdf-analysis
# Common causes:
# - Missing ANTHROPIC_API_KEY
# - Redis connection failed
# - Database connection failed
# Verify secrets
kubectl get secrets -n pdf-analysis
kubectl describe secret pdf-analysis-secrets -n pdf-analysis
Frontend can't connect to backend
# Check ingress
kubectl describe ingress pdf-analysis-ingress -n pdf-analysis
# Verify backend service
kubectl get svc backend-service -n pdf-analysis
# Test internal connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never \
-- curl http://backend-service.pdf-analysis/
WebSocket connection fails
# Check ingress annotations
kubectl get ingress pdf-analysis-ingress -n pdf-analysis -o yaml
# Ensure these annotations exist:
# nginx.ingress.kubernetes.io/websocket-services: backend-service
# nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
# nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
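If the annotations look right but the UI still stalls, the WebSocket path can be exercised from outside the browser. The subscribe message shape below and the `document:<uuid>` channel format are assumptions based on the upload response shown earlier; adjust to the actual protocol:

```python
import asyncio
import json

def parse_channel(channel: str) -> tuple:
    """Split a channel name like "document:uuid" into (kind, id)."""
    kind, _, doc_id = channel.partition(":")
    return kind, doc_id

async def watch(channel: str, url: str = "ws://localhost:8000/ws"):
    """Subscribe to a document channel and print each progress event."""
    import websockets  # pip install websockets
    async with websockets.connect(url) as ws:
        # Assumed subscribe handshake -- match your server's message schema
        await ws.send(json.dumps({"action": "subscribe", "channel": channel}))
        async for raw in ws:
            print(json.loads(raw))

# To watch an upload: asyncio.run(watch("document:<uuid from the upload response>"))
```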
AI analysis fails
# Check Claude API key
kubectl get secret pdf-analysis-secrets -n pdf-analysis -o jsonpath='{.data.ANTHROPIC_API_KEY}' | base64 -d
# Check API quota
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-sonnet-4-20250514","max_tokens":10,"messages":[{"role":"user","content":"test"}]}'
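The same quota probe can be scripted with the standard library. The payload builder mirrors the curl body exactly, and the request itself is left as an explicit call so nothing fires without a key:

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_probe_payload(model: str = "claude-sonnet-4-20250514") -> dict:
    """Minimal Messages API request used only to confirm the key works."""
    return {
        "model": model,
        "max_tokens": 10,
        "messages": [{"role": "user", "content": "test"}],
    }

def probe_api_key() -> int:
    """POST the probe; returns 200 for a valid key.

    An invalid key raises urllib.error.HTTPError with code 401.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_probe_payload()).encode(),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# With ANTHROPIC_API_KEY set: print(probe_api_key())
```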
High memory usage
# Check resource usage
kubectl top pods -n pdf-analysis
# Scale up if needed
kubectl scale deployment backend --replicas=5 -n pdf-analysis
# Update resource limits in deployment.yaml
Performance Optimization
Backend Optimization
# Add connection pooling
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=10,
    pool_pre_ping=True,
    pool_recycle=3600,
)

# Cache hot metadata. Note: functools.lru_cache does not work on async
# functions -- it caches the coroutine object, not the result. Use an
# async-aware cache (or Redis for a cache shared across replicas).
from async_lru import alru_cache  # pip install async-lru

@alru_cache(maxsize=1000)
async def get_document_metadata(doc_id: str):
    # Result cached in memory, per process
    ...
Frontend Optimization
// Lazy load components
const AnalysisPanel = lazy(() => import('./components/analysis/AnalysisPanel'));

// Memoize expensive computations (copy before sorting -- Array.sort mutates in place)
const sortedDocuments = useMemo(() => {
  return [...documents].sort(
    (a, b) => new Date(b.uploaded_at).getTime() - new Date(a.uploaded_at).getTime()
  );
}, [documents]);

// Virtualize long lists
import { FixedSizeList } from 'react-window';
Cost Management
Monitor Costs
# Link the project to a billing account so costs can be tracked
gcloud billing accounts list
gcloud billing projects link $PROJECT_ID --billing-account=ACCOUNT_ID
# Set budget alerts
gcloud billing budgets create \
--billing-account=ACCOUNT_ID \
--display-name="PDF Analysis Budget" \
--budget-amount=500 \
--threshold-rule=percent=0.5 \
--threshold-rule=percent=0.9
Cost Optimization Tips
- Use Autopilot Mode: Saves 20-30% on GKE costs
- Committed Use Discounts: 57% discount for 3-year commitment
- Preemptible VMs: For non-critical workloads (60-80% savings)
- Aggressive Caching: Reduce Claude API calls by 40%
- Smart Scaling: Scale down during off-peak hours
Next Steps
Development
- Set up CI/CD pipeline (GitHub Actions)
- Add unit tests (pytest, vitest)
- Configure pre-commit hooks
- Set up local debugging
Production
- Configure SSL/TLS certificates
- Set up monitoring alerts
- Configure backup automation
- Implement disaster recovery
- Security hardening audit
- Load testing
Features
- User authentication (OAuth)
- Document sharing
- Export functionality
- Batch processing
- Advanced search
- Analytics dashboard
Support & Resources
Community
- GitHub Issues: Report bugs
- Discussions: Feature requests
- Slack: #pdf-analysis
Questions? Check TROUBLESHOOTING.md or open an issue.
Ready to deploy? Follow the production checklist in deployment.md.