ADR-002-v4: Storage Architecture
Status: Accepted
Date: 2025-08-27
Context: Implementing three-tier storage for optimal cost and performance
Decision
We will implement a three-tier storage system with FoundationDB for metadata and small files (≤10KB), Google Cloud Storage for larger files (>10KB), and Git integration for version control.
Implementation Details
1. Storage Tier Decision Logic
```rust
// src/storage/mod.rs - Core storage routing
use foundationdb::Database;
use google_cloud_storage::{Client as GcsClient, Object};
use sha2::{Digest, Sha256};
use uuid::Uuid;

// CRITICAL: Exact threshold for the storage tier decision
pub const FDB_SIZE_LIMIT: usize = 10_240; // Exactly 10KB (10 * 1024 bytes)

#[derive(Debug)]
pub struct StorageRouter {
    fdb: Database,
    gcs_client: GcsClient,
    bucket_name: String,
}

impl StorageRouter {
    pub async fn store_file(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        metadata: FileMetadata,
    ) -> Result<FileStorageResult, StorageError> {
        // Calculate content hash for deduplication
        let content_hash = calculate_sha256(content);

        // CRITICAL: size determines the storage tier (<= 10KB stays in FDB)
        if content.len() <= FDB_SIZE_LIMIT {
            self.store_in_fdb(workspace_id, path, content, metadata, content_hash).await
        } else {
            self.store_in_gcs(workspace_id, path, content, metadata, content_hash).await
        }
    }

    async fn store_in_fdb(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        metadata: FileMetadata,
        content_hash: String,
    ) -> Result<FileStorageResult, StorageError> {
        let txn = self.fdb.create_trx()?;

        // Key structure for files in FDB
        let file_key = format!("{}/files/{}/content", workspace_id, path);
        let meta_key = format!("{}/files/{}/metadata", workspace_id, path);

        // Store content and metadata atomically (FDB keys and values are byte strings)
        txn.set(file_key.as_bytes(), content);
        txn.set(meta_key.as_bytes(), &serde_json::to_vec(&metadata)?);
        txn.commit().await?;

        Ok(FileStorageResult {
            storage_tier: StorageTier::FoundationDB,
            content_hash,
            size: content.len(),
            path: path.to_string(),
        })
    }

    async fn store_in_gcs(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        mut metadata: FileMetadata,
        content_hash: String,
    ) -> Result<FileStorageResult, StorageError> {
        // GCS object name includes the workspace for isolation
        let object_name = format!("{}/{}/{}", workspace_id, content_hash, path);

        // Upload to GCS; the content hash in the object name enables deduplication
        let upload_result = self.gcs_client
            .object()
            .create(
                &self.bucket_name,
                content.to_vec(),
                &object_name,
                "application/octet-stream",
            )
            .await?;

        // Store metadata in FDB with a reference back to the GCS object
        metadata.gcs_location = Some(GcsLocation {
            bucket: self.bucket_name.clone(),
            object: object_name.clone(),
            generation: upload_result.generation,
        });

        let txn = self.fdb.create_trx()?;
        let meta_key = format!("{}/files/{}/metadata", workspace_id, path);
        txn.set(meta_key.as_bytes(), &serde_json::to_vec(&metadata)?);
        txn.commit().await?;

        Ok(FileStorageResult {
            storage_tier: StorageTier::GoogleCloudStorage,
            content_hash,
            size: content.len(),
            path: path.to_string(),
        })
    }
}

fn calculate_sha256(content: &[u8]) -> String {
    let mut hasher = Sha256::new();
    hasher.update(content);
    format!("{:x}", hasher.finalize())
}
```
2. File Repository Implementation
```rust
// src/db/repositories/file_repository.rs
use std::collections::HashMap;

use chrono::Utc;
use uuid::Uuid;

use super::Repository;
use crate::models::{File, FileMetadata, StorageTier};
use crate::storage::{calculate_sha256, StorageRouter};

pub struct FileRepository {
    db: Database,
    storage: StorageRouter,
}

impl FileRepository {
    pub async fn create_file(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
    ) -> Result<File, RepositoryError> {
        // Validate the path
        if !is_valid_path(path) {
            return Err(RepositoryError::InvalidPath(path.to_string()));
        }

        // Check for path conflicts
        if self.file_exists(workspace_id, path).await? {
            return Err(RepositoryError::FileAlreadyExists(path.to_string()));
        }

        let metadata = FileMetadata {
            created_at: Utc::now(),
            updated_at: Utc::now(),
            created_by: self.get_current_user_id()?,
            size: content.len() as i64,
            mime_type: detect_mime_type(path, content),
            permissions: FilePermissions::default(),
            attributes: HashMap::new(),
            gcs_location: None, // Set by the storage router if needed
        };

        // Store the file (the router decides FDB vs GCS)
        let storage_result = self.storage
            .store_file(workspace_id, path, content, metadata.clone())
            .await?;

        // Create the file record
        let file = File {
            file_id: Uuid::new_v4(),
            workspace_id,
            path: path.to_string(),
            storage_tier: storage_result.storage_tier,
            content_hash: storage_result.content_hash,
            metadata,
        };

        // Store the file record
        let txn = self.db.create_trx()?;
        let file_record_key = format!("{}/files/{}/record", workspace_id, path);
        txn.set(file_record_key.as_bytes(), &serde_json::to_vec(&file)?);
        txn.commit().await?;

        Ok(file)
    }

    pub async fn read_file(
        &self,
        workspace_id: Uuid,
        path: &str,
    ) -> Result<(File, Vec<u8>), RepositoryError> {
        let file = self.get_file_record(workspace_id, path).await?;

        let content = match file.storage_tier {
            StorageTier::FoundationDB => {
                // Read from FDB
                let txn = self.db.create_trx()?;
                let file_key = format!("{}/files/{}/content", workspace_id, path);
                match txn.get(file_key.as_bytes(), false).await? {
                    Some(content) => content.to_vec(),
                    None => return Err(RepositoryError::FileNotFound(path.to_string())),
                }
            }
            StorageTier::GoogleCloudStorage => {
                // Read from GCS (clone the location so `file` can still be returned)
                let gcs_location = file.metadata.gcs_location.clone()
                    .ok_or_else(|| RepositoryError::InvalidStorageState(
                        "GCS location missing".to_string(),
                    ))?;
                self.storage.gcs_client
                    .object()
                    .download(&gcs_location.bucket, &gcs_location.object)
                    .await?
            }
            StorageTier::GitRepository => {
                // Read from Git
                self.read_from_git(workspace_id, path, &file.metadata).await?
            }
        };

        // Verify content integrity
        let actual_hash = calculate_sha256(&content);
        if actual_hash != file.content_hash {
            return Err(RepositoryError::IntegrityError(format!(
                "Content hash mismatch for {}: expected {}, got {}",
                path, file.content_hash, actual_hash
            )));
        }

        Ok((file, content))
    }
}

fn is_valid_path(path: &str) -> bool {
    // Path validation: non-empty, relative, no traversal, ASCII without NUL
    !path.is_empty()
        && !path.contains("..")
        && !path.starts_with('/')
        && path.chars().all(|c| c.is_ascii() && c != '\0')
}

fn detect_mime_type(path: &str, content: &[u8]) -> String {
    // Use tree_magic for content-based detection
    tree_magic::from_u8(content)
}
```
3. Git Integration
```rust
// src/storage/git_integration.rs
use git2::{ObjectType, Oid, Repository as GitRepo, Signature};
use std::path::{Path, PathBuf};
use uuid::Uuid;

pub struct GitIntegration {
    workspace_root: PathBuf,
    author_name: String,
    author_email: String,
}

impl GitIntegration {
    pub fn init_workspace_repo(&self, workspace_id: Uuid) -> Result<GitRepo, git2::Error> {
        let repo_path = self.workspace_root.join(workspace_id.to_string());

        // Initialize a bare repository for storage efficiency
        let repo = GitRepo::init_bare(&repo_path)?;

        // Set up default configuration (config must be mutable to write)
        let mut config = repo.config()?;
        config.set_str("core.compression", "9")?; // Maximum compression
        config.set_str("gc.auto", "256")?; // Auto GC threshold

        Ok(repo)
    }

    pub async fn commit_file(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        message: &str,
    ) -> Result<String, GitError> {
        let repo_path = self.workspace_root.join(workspace_id.to_string());
        let repo = GitRepo::open(&repo_path)?;

        // Create a blob from the content
        let blob_oid = repo.blob(content)?;

        // Build a new tree on top of the current HEAD tree
        let head = repo.head()?;
        let parent_commit = head.peel_to_commit()?;
        let mut tree_builder = repo.treebuilder(Some(&parent_commit.tree()?))?;

        // Add the file to the tree (0o100644 = regular, non-executable file)
        tree_builder.insert(path, blob_oid, 0o100644)?;
        let tree_oid = tree_builder.write()?;
        let tree = repo.find_tree(tree_oid)?;

        // Create the commit
        let signature = Signature::now(&self.author_name, &self.author_email)?;
        let commit_oid = repo.commit(
            Some("HEAD"),
            &signature,
            &signature,
            message,
            &tree,
            &[&parent_commit],
        )?;

        Ok(commit_oid.to_string())
    }

    pub async fn read_file_at_commit(
        &self,
        workspace_id: Uuid,
        path: &str,
        commit_sha: &str,
    ) -> Result<Vec<u8>, GitError> {
        let repo_path = self.workspace_root.join(workspace_id.to_string());
        let repo = GitRepo::open(&repo_path)?;

        // Parse the commit SHA and resolve its tree
        let oid = Oid::from_str(commit_sha)?;
        let commit = repo.find_commit(oid)?;
        let tree = commit.tree()?;

        // Find the file in the tree
        let entry = tree.get_path(Path::new(path))?;
        let object = entry.to_object(&repo)?;

        match object.kind() {
            Some(ObjectType::Blob) => {
                let blob = object.as_blob().ok_or(GitError::InvalidObject)?;
                Ok(blob.content().to_vec())
            }
            _ => Err(GitError::NotAFile(path.to_string())),
        }
    }
}
```
4. GCS Lifecycle Management
```yaml
# deploy/gcs-lifecycle.yaml
lifecycle:
  rule:
    - action:
        type: SetStorageClass
        storageClass: NEARLINE
      condition:
        age: 30 # Move to Nearline after 30 days
    - action:
        type: SetStorageClass
        storageClass: COLDLINE
      condition:
        age: 90 # Move to Coldline after 90 days
    - action:
        type: Delete
      condition:
        age: 365 # Delete after 1 year
        isLive: false # Only applies to noncurrent (archived) object versions

versioning:
  enabled: true

encryption:
  defaultKmsKeyName: projects/PROJECT_ID/locations/global/keyRings/coditect/cryptoKeys/storage

cors:
  - origin: ["https://coditect.io"]
    method: ["GET", "PUT", "POST", "DELETE"]
    responseHeader: ["Content-Type", "Content-Length"]
    maxAgeSeconds: 3600
```
5. Test Suite
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_10kb_threshold_exact() {
        let router = create_test_router().await;

        // Exactly 10KB should go to FDB
        let content_10kb = vec![0u8; 10_240];
        let result = router.store_file(
            Uuid::new_v4(),
            "test.bin",
            &content_10kb,
            FileMetadata::default(),
        ).await.unwrap();
        assert_eq!(result.storage_tier, StorageTier::FoundationDB);
        assert_eq!(result.size, 10_240);

        // 10KB + 1 byte should go to GCS
        let content_10kb_plus = vec![0u8; 10_241];
        let result = router.store_file(
            Uuid::new_v4(),
            "test2.bin",
            &content_10kb_plus,
            FileMetadata::default(),
        ).await.unwrap();
        assert_eq!(result.storage_tier, StorageTier::GoogleCloudStorage);
        assert_eq!(result.size, 10_241);
    }

    #[tokio::test]
    async fn test_content_hash_verification() {
        let repo = create_test_file_repository().await;
        let workspace_id = Uuid::new_v4();
        let content = b"Hello, World!";

        // Create the file
        let file = repo.create_file(workspace_id, "hello.txt", content)
            .await.unwrap();

        // Read back and verify
        let (read_file, read_content) = repo.read_file(workspace_id, "hello.txt")
            .await.unwrap();
        assert_eq!(read_content, content);
        assert_eq!(file.content_hash, read_file.content_hash);

        // Known SHA-256 of "Hello, World!"
        let expected_hash = "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f";
        assert_eq!(file.content_hash, expected_hash);
    }

    #[tokio::test]
    async fn test_git_integration() {
        let git = GitIntegration::new_test();
        let workspace_id = Uuid::new_v4();

        // Initialize the repository
        git.init_workspace_repo(workspace_id).unwrap();

        // Commit a file
        let commit_sha = git.commit_file(
            workspace_id,
            "src/main.rs",
            b"fn main() { println!(\"Hello!\"); }",
            "Initial commit",
        ).await.unwrap();

        // Read the file back at that commit
        let content = git.read_file_at_commit(
            workspace_id,
            "src/main.rs",
            &commit_sha,
        ).await.unwrap();
        assert_eq!(content, b"fn main() { println!(\"Hello!\"); }");
    }
}
```
Rationale
- 10KB Threshold:
  - Follows FoundationDB's value-size recommendation (values should stay small; the hard limit is 100KB)
  - Keeps FDB performant for metadata queries
  - Reduces GCS API calls for small files
  - Clear boundary for predictable behavior
- Three-Tier Approach:
  - FDB: fast access for small files and all metadata
  - GCS: cost-effective for large files, with lifecycle management
  - Git: version history and collaboration features
- Content-Addressed Storage:
  - SHA-256 hashing enables deduplication
  - Integrity verification on every read
  - Immutable content references
Implementation Status
Current: Not implemented
Target: Complete three-tier storage system
Consequences
- Positive:
  - Optimal cost: small files in FDB, large files in GCS
  - Deduplication saves storage space
  - Git integration enables version control
  - Clear 10KB boundary simplifies decisions
- Negative:
  - Complexity of managing three storage tiers
  - Network latency for GCS reads
  - Git storage overhead for binary files
Security Considerations
- Access Control: All file access through workspace permissions
- Encryption: At-rest encryption in both FDB and GCS
- Integrity: SHA-256 verification on all reads
- Isolation: Workspace-based key prefixing
Performance Targets
- FDB reads: <10ms for files ≤10KB
- GCS reads: <100ms for files >10KB
- Git commits: <50ms for text files
- Hash calculation: ~100MB/s