Quick Start Guide: AI-Powered PDF Analysis Platform

Get the complete system running in under 30 minutes (local) or 2 hours (GKE production).


Prerequisites

Required Tools

# Verify installations
node --version # v20+
python --version # 3.11+
docker --version # 24.0+
kubectl version --client # 1.28+
gcloud version # Latest

Required Accounts

  • Google Cloud Platform account with billing enabled
  • Anthropic API key (create one in the Anthropic Console)

Option A: Local Development (5 minutes)

Perfect for development and testing.

1. Clone Repository

git clone https://github.com/yourorg/pdf-analysis-platform.git
cd pdf-analysis-platform

2. Start Infrastructure Services

# Start Redis
docker run -d -p 6379:6379 --name redis redis:7-alpine

# Start PostgreSQL (optional for local dev)
docker run -d -p 5432:5432 --name postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=pdfanalysis \
postgres:15-alpine

3. Configure Environment

# Backend configuration
cd backend
cp .env.example .env

# Or write the values directly (this overwrites the copied file):
cat > .env << EOF
REDIS_URL=redis://localhost:6379
ANTHROPIC_API_KEY=your_api_key_here
GCS_BUCKET=local-dev-bucket
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/pdfanalysis
LOG_LEVEL=debug
EOF
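As a startup sanity check, the backend can validate these variables before serving requests. A minimal sketch using only the standard library; the variable names mirror the `.env` above, but the checks themselves are illustrative, not part of the shipped backend:

```python
REQUIRED_VARS = ("REDIS_URL", "ANTHROPIC_API_KEY", "DATABASE_URL")

def validate_env(env: dict) -> list:
    """Return a list of configuration problems (empty list means OK)."""
    problems = [f"missing {name}" for name in REQUIRED_VARS if not env.get(name)]
    if "REDIS_URL" in env and not env["REDIS_URL"].startswith(("redis://", "rediss://")):
        problems.append("REDIS_URL must start with redis:// or rediss://")
    if env.get("ANTHROPIC_API_KEY") == "your_api_key_here":
        problems.append("ANTHROPIC_API_KEY is still the placeholder value")
    return problems

# The placeholder key from the .env above should be flagged:
print(validate_env({
    "REDIS_URL": "redis://localhost:6379",
    "ANTHROPIC_API_KEY": "your_api_key_here",
    "DATABASE_URL": "postgresql+asyncpg://postgres:postgres@localhost:5432/pdfanalysis",
}))  # ['ANTHROPIC_API_KEY is still the placeholder value']
```

Failing fast on a bad `.env` is much cheaper than debugging a half-started server later.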

4. Start Backend

# Install dependencies
pip install -r requirements.txt

# Run migrations (if using database)
alembic upgrade head

# Start server
uvicorn main:app --reload --port 8000

Backend running at: http://localhost:8000
API docs: http://localhost:8000/docs

5. Start Frontend

# In new terminal
cd frontend

# Install dependencies
npm install

# Configure environment
cat > .env << EOF
VITE_API_URL=http://localhost:8000
VITE_WS_URL=ws://localhost:8000/ws
EOF

# Start dev server
npm run dev

Frontend running at: http://localhost:5173

6. Test the System

# Upload a test PDF
curl -X POST http://localhost:8000/api/v1/documents/upload \
-F "file=@test.pdf" \
-F "user_id=test_user"

# Response:
# {
#   "document_id": "uuid",
#   "filename": "test.pdf",
#   "status": "processing",
#   "websocket_channel": "document:uuid"
# }

Open browser: http://localhost:5173 and upload a PDF!
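The upload response includes a `websocket_channel` for progress updates. A small client-side helper that derives the WebSocket URL from that response; the channel-as-path-segment scheme is an assumption based on the `VITE_WS_URL` configured above, so check your backend's WebSocket routing:

```python
import json

def websocket_url(upload_response: str, ws_base: str = "ws://localhost:8000/ws") -> str:
    """Build the URL for subscribing to progress events for an uploaded document.

    Assumes channels are exposed as a path segment under /ws -- adjust to
    match the backend's actual WebSocket routing.
    """
    doc = json.loads(upload_response)
    return f"{ws_base}/{doc['websocket_channel']}"

# Using the sample response shown above:
resp = '{"document_id": "uuid", "filename": "test.pdf", "status": "processing", "websocket_channel": "document:uuid"}'
print(websocket_url(resp))  # ws://localhost:8000/ws/document:uuid
```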


Option B: Docker Compose (10 minutes)

Complete local environment with all services.

1. Create docker-compose.yml

version: '3.8'

services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - DATABASE_URL=postgresql+asyncpg://postgres:postgres@postgres:5432/pdfanalysis
    depends_on:
      - redis
      - postgres
    volumes:
      - ./backend:/app
      - /tmp/uploads:/tmp/uploads

  frontend:
    build: ./frontend
    ports:
      - "80:80"
    environment:
      - VITE_API_URL=http://localhost:8000
      - VITE_WS_URL=ws://localhost:8000/ws

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=pdfanalysis
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

2. Start All Services

# Set your API key
export ANTHROPIC_API_KEY=your_key_here

# Start everything
docker-compose up -d

# View logs
docker-compose logs -f

# Check status
docker-compose ps

3. Access Application

  • Frontend: http://localhost
  • Backend API: http://localhost:8000
  • API docs: http://localhost:8000/docs

4. Stop Services

docker-compose down

# Remove volumes too
docker-compose down -v

Option C: GKE Production (2 hours)

Full production deployment on Google Kubernetes Engine.

Phase 1: GCP Setup (30 minutes)

1. Enable Required APIs

# Set project
export PROJECT_ID=your-project-id
gcloud config set project $PROJECT_ID

# Enable APIs
gcloud services enable \
container.googleapis.com \
sqladmin.googleapis.com \
redis.googleapis.com \
storage-api.googleapis.com \
secretmanager.googleapis.com

2. Create GKE Cluster

# Create Autopilot cluster (recommended)
gcloud container clusters create-auto pdf-analysis-cluster \
--region=us-central1 \
--project=$PROJECT_ID

# Get credentials
gcloud container clusters get-credentials pdf-analysis-cluster \
--region=us-central1

3. Create Cloud SQL Instance

gcloud sql instances create pdf-analysis-db \
--database-version=POSTGRES_15 \
--tier=db-custom-2-7680 \
--region=us-central1 \
--network=default \
--no-assign-ip

# Create database
gcloud sql databases create pdfanalysis \
--instance=pdf-analysis-db

# Set password
gcloud sql users set-password postgres \
--instance=pdf-analysis-db \
--password=YOUR_SECURE_PASSWORD

4. Create Redis Instance

gcloud redis instances create pdf-analysis-redis \
--size=5 \
--region=us-central1 \
--redis-version=redis_7_0 \
--tier=standard

5. Create Storage Bucket

gsutil mb -l us-central1 gs://${PROJECT_ID}-pdf-storage
gsutil versioning set on gs://${PROJECT_ID}-pdf-storage

Phase 2: Deploy Application (45 minutes)

1. Build and Push Images

# Configure Docker to use GCR
gcloud auth configure-docker

# Build backend
cd backend
docker build -t gcr.io/${PROJECT_ID}/pdf-analysis-backend:v1.0.0 .
docker push gcr.io/${PROJECT_ID}/pdf-analysis-backend:v1.0.0

# Build frontend
cd ../frontend
docker build -t gcr.io/${PROJECT_ID}/pdf-analysis-frontend:v1.0.0 .
docker push gcr.io/${PROJECT_ID}/pdf-analysis-frontend:v1.0.0

2. Create Secrets

# Create namespace
kubectl create namespace pdf-analysis

# Create secret for Anthropic API key
kubectl create secret generic pdf-analysis-secrets \
--from-literal=ANTHROPIC_API_KEY=your_key_here \
--from-literal=DATABASE_URL=postgresql://... \
--namespace=pdf-analysis

3. Update Kubernetes Manifests

# Update image references in k8s/deployment.yaml
sed -i "s/PROJECT_ID/${PROJECT_ID}/g" k8s/deployment.yaml

4. Deploy to Cluster

# Apply all manifests
kubectl apply -f k8s/

# Wait for rollout
kubectl rollout status deployment/backend -n pdf-analysis
kubectl rollout status deployment/frontend -n pdf-analysis

# Check pods
kubectl get pods -n pdf-analysis

5. Configure Ingress

# Get external IP
kubectl get ingress -n pdf-analysis

# Update DNS
# Create A record: pdfanalysis.yourdomain.com -> EXTERNAL_IP
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# Create ClusterIssuer
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: gce
EOF

Phase 3: Monitoring Setup (30 minutes)

1. Install Prometheus

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace

2. Access Grafana

# Get Grafana password
kubectl get secret -n monitoring prometheus-grafana \
-o jsonpath="{.data.admin-password}" | base64 --decode

# Port forward
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

# Access: http://localhost:3000
# Username: admin
# Password: <from above>
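The `base64 --decode` step can also be done in Python, which is handy on machines without GNU coreutils. The encoded example below decodes to `prom-operator`, which has historically been the chart's default admin password, but verify against your own release:

```python
import base64

def decode_k8s_secret(value: str) -> str:
    """Decode one field of a Kubernetes Secret.

    `value` is what `kubectl get secret ... -o jsonpath=...` prints,
    i.e. the base64-encoded secret data.
    """
    return base64.b64decode(value).decode("utf-8")

print(decode_k8s_secret("cHJvbS1vcGVyYXRvcg=="))  # prom-operator
```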

3. Import Dashboards

Pre-built dashboards available at:

  • Backend API: Dashboard ID 14282
  • Redis: Dashboard ID 11835
  • Nginx: Dashboard ID 9614

Phase 4: Verify Deployment (15 minutes)

1. Health Checks

# Backend health
curl https://pdfanalysis.yourdomain.com/

# API documentation
open https://pdfanalysis.yourdomain.com/docs

2. Upload Test

# Upload via API
curl -X POST https://pdfanalysis.yourdomain.com/api/v1/documents/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@test.pdf"

3. Monitor Logs

# Backend logs
kubectl logs -f -l app=backend -n pdf-analysis

# All pods
kubectl logs -f -l app=backend -n pdf-analysis --all-containers=true

Troubleshooting

Common Issues

Backend won't start

# Check logs
kubectl logs -l app=backend -n pdf-analysis

# Common causes:
# - Missing ANTHROPIC_API_KEY
# - Redis connection failed
# - Database connection failed

# Verify secrets
kubectl get secrets -n pdf-analysis
kubectl describe secret pdf-analysis-secrets -n pdf-analysis

Frontend can't connect to backend

# Check ingress
kubectl describe ingress pdf-analysis-ingress -n pdf-analysis

# Verify backend service
kubectl get svc backend-service -n pdf-analysis

# Test internal connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never \
-- curl http://backend-service.pdf-analysis/

WebSocket connection fails

# Check ingress annotations
kubectl get ingress pdf-analysis-ingress -n pdf-analysis -o yaml

# Ensure these annotations exist:
# nginx.ingress.kubernetes.io/websocket-services: backend-service
# nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
# nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"

AI analysis fails

# Check Claude API key
kubectl get secret pdf-analysis-secrets -n pdf-analysis -o jsonpath='{.data.ANTHROPIC_API_KEY}' | base64 -d

# Check API quota
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-sonnet-4-20250514","max_tokens":10,"messages":[{"role":"user","content":"test"}]}'

High memory usage

# Check resource usage
kubectl top pods -n pdf-analysis

# Scale up if needed
kubectl scale deployment backend --replicas=5 -n pdf-analysis

# Update resource limits in deployment.yaml

Performance Optimization

Backend Optimization

# Add connection pooling
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=10,
    pool_pre_ping=True,   # detect stale connections before handing them out
    pool_recycle=3600,    # recycle connections after an hour
)

# Cache hot metadata in Redis. Note: functools.lru_cache does NOT work on
# async functions -- it would cache the coroutine object, not the result --
# so cache the serialized value explicitly. `redis` is an async Redis client
# (redis.asyncio) and `load_metadata_from_db` is your app's loader.
import json

async def get_document_metadata(doc_id: str) -> dict:
    cached = await redis.get(f"docmeta:{doc_id}")
    if cached:
        return json.loads(cached)
    metadata = await load_metadata_from_db(doc_id)
    await redis.set(f"docmeta:{doc_id}", json.dumps(metadata), ex=3600)
    return metadata

Frontend Optimization

// Lazy load components
const AnalysisPanel = lazy(() => import('./components/analysis/AnalysisPanel'));

// Memoize expensive computations (copy before sorting -- Array.sort
// mutates in place, and React state must not be mutated)
const sortedDocuments = useMemo(() => {
  return [...documents].sort(
    (a, b) => new Date(b.uploaded_at).getTime() - new Date(a.uploaded_at).getTime()
  );
}, [documents]);

// Virtualize long lists
import { FixedSizeList } from 'react-window';

Cost Management

Monitor Costs

# View GKE costs
gcloud billing accounts list
gcloud billing projects link $PROJECT_ID --billing-account=ACCOUNT_ID

# Set budget alerts
gcloud billing budgets create \
--billing-account=ACCOUNT_ID \
--display-name="PDF Analysis Budget" \
--budget-amount=500 \
--threshold-rule=percent=50 \
--threshold-rule=percent=90
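The two `--threshold-rule` flags above translate to dollar amounts at which the alerts fire; a one-liner to sanity-check them (pure arithmetic, no gcloud dependency):

```python
def alert_amounts(budget: float, thresholds=(0.50, 0.90)) -> list:
    """Dollar amounts at which the budget alerts above will fire."""
    return [round(budget * t, 2) for t in thresholds]

print(alert_amounts(500))  # [250.0, 450.0]
```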

Cost Optimization Tips

  1. Use Autopilot Mode: Saves 20-30% on GKE costs
  2. Committed Use Discounts: 57% discount for 3-year commitment
  3. Preemptible VMs: For non-critical workloads (60-80% savings)
  4. Aggressive Caching: Reduce Claude API calls by 40%
  5. Smart Scaling: Scale down during off-peak hours

Next Steps

Development

  • Set up CI/CD pipeline (GitHub Actions)
  • Add unit tests (pytest, vitest)
  • Configure pre-commit hooks
  • Set up local debugging

Production

  • Configure SSL/TLS certificates
  • Set up monitoring alerts
  • Configure backup automation
  • Implement disaster recovery
  • Security hardening audit
  • Load testing

Features

  • User authentication (OAuth)
  • Document sharing
  • Export functionality
  • Batch processing
  • Advanced search
  • Analytics dashboard

Support & Resources

Community

  • GitHub Issues: Report bugs
  • Discussions: Feature requests
  • Slack: #pdf-analysis


Questions? Check TROUBLESHOOTING.md or open an issue.

Ready to deploy? Follow the production checklist in deployment.md.