ADR-002-v4: Storage Architecture - Part 2 (Technical)
Table of Contents
- Document Specification Block
- 1. Technical Requirements
- 2. Storage Tier Implementation
- 3. File Repository Pattern
- 4. Git Integration
- 5. Testing Requirements
- 6. Performance Specifications
- 7. Security Controls
- 8. Deployment Configuration
- 9. Monitoring & Observability
- 10. Constraints for AI Implementation
- References
- Approval Signatures
Document Specification Block
Document: ADR-002-v4-storage-architecture-part2-technical
Version: 1.1.0
Purpose: Provide exact technical specifications for three-tier storage implementation
Audience: AI agents, developers implementing the storage system
Date Created: 2025-08-31
Date Modified: 2025-08-31
Status: DRAFT
1. Technical Requirements
1.1 Constraints
- MUST use exactly 10KB (10,240 bytes) as FDB storage threshold
- MUST NOT store files larger than 10KB in FoundationDB
- MUST calculate SHA-256 hash for all stored content
- MUST verify content integrity on every read operation
- MUST support workspace-based isolation
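The threshold rule above can be sketched as a pure routing function. This is an illustrative sketch only; `select_tier` is not part of the specified API, and the real router (section 2) returns a `StorageTier` value rather than a string:

```rust
/// Exactly 10KB (10,240 bytes); files at or under this size go to FoundationDB.
const FDB_SIZE_LIMIT: usize = 10_240;

/// Routing decision for the first two tiers, per constraint 1.1.
fn select_tier(size_bytes: usize) -> &'static str {
    if size_bytes <= FDB_SIZE_LIMIT {
        "fdb" // at or under the threshold: stored inline in FoundationDB
    } else {
        "gcs" // one byte over the threshold already crosses into GCS
    }
}
```

Note the boundary is inclusive: a file of exactly 10,240 bytes stays in FoundationDB, matching the test case in section 5.1.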
1.2 Dependencies
[dependencies]
foundationdb = "0.8"
google-cloud-storage = "0.15"
git2 = "0.18"
sha2 = "0.10"
tree_magic = "0.2"
tokio = { version = "1.35", features = ["full"] }
uuid = { version = "1.6", features = ["v4", "serde"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
chrono = { version = "0.4", features = ["serde"] }
anyhow = "1.0"
thiserror = "1.0"
# Used by the monitoring code in section 9
lazy_static = "1.4"
prometheus = "0.13"
# CODI logging integration
coditect-logging = { path = "../logging" }
2. Storage Tier Implementation
2.1 Core Storage Router
// src/storage/mod.rs
use std::time::Instant;

use anyhow::Result;
use chrono::Utc;
use foundationdb::{Database, Transaction};
use google_cloud_storage::Client as GcsClient;
use serde_json::json;
use sha2::{Digest, Sha256};
use uuid::Uuid;

use crate::logging::{CoditecLogger, LogEntry, LogLevel};
pub const FDB_SIZE_LIMIT: usize = 10_240; // Exactly 10KB
#[derive(Debug, Clone)]
pub struct StorageRouter {
fdb: Database,
gcs_client: GcsClient,
bucket_name: String,
}
#[derive(Debug, Clone, PartialEq)]
pub enum StorageTier {
FoundationDB,
GoogleCloudStorage,
GitRepository,
}

#[derive(Debug, Clone)]
pub struct FileStorageResult {
pub storage_tier: StorageTier,
pub content_hash: String,
pub size: usize,
pub path: String,
}
impl StorageRouter {
pub async fn store_file(
&self,
workspace_id: Uuid,
path: &str,
content: &[u8],
metadata: FileMetadata,
) -> Result<FileStorageResult> {
// CODI logging for operation start
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::INFO,
component: "storage.router".to_string(),
action: "file_store_start".to_string(),
user_id: metadata.created_by.clone(),
tenant_id: Some(workspace_id.to_string()),
request_id: None,
session_id: None,
result: "pending".to_string(),
duration_ms: None,
details: Some(json!({
"path": path,
"size_bytes": content.len(),
"storage_decision": if content.len() <= FDB_SIZE_LIMIT { "fdb" } else { "gcs" }
})),
error: None,
}).await;
let start_time = Instant::now();
let content_hash = calculate_sha256(content);
let result = if content.len() <= FDB_SIZE_LIMIT {
self.store_in_fdb(workspace_id, path, content, metadata.clone(), content_hash).await
} else {
self.store_in_gcs(workspace_id, path, content, metadata.clone(), content_hash).await
};
// CODI logging for operation result
let duration_ms = start_time.elapsed().as_millis() as i64;
match &result {
Ok(res) => {
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::INFO,
component: "storage.router".to_string(),
action: "file_store_complete".to_string(),
user_id: metadata.created_by.clone(),
tenant_id: Some(workspace_id.to_string()),
request_id: None,
session_id: None,
result: "success".to_string(),
duration_ms: Some(duration_ms),
details: Some(json!({
"path": path,
"storage_tier": format!("{:?}", res.storage_tier),
"content_hash": &res.content_hash,
"size_bytes": res.size
})),
error: None,
}).await;
}
Err(e) => {
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::ERROR,
component: "storage.router".to_string(),
action: "file_store_failed".to_string(),
user_id: metadata.created_by.clone(),
tenant_id: Some(workspace_id.to_string()),
request_id: None,
session_id: None,
result: "failure".to_string(),
duration_ms: Some(duration_ms),
details: Some(json!({
"path": path,
"size_bytes": content.len()
})),
error: Some(json!({
"type": "StorageError",
"message": e.to_string()
})),
}).await;
}
}
result
}
async fn store_in_fdb(
&self,
workspace_id: Uuid,
path: &str,
content: &[u8],
metadata: FileMetadata,
content_hash: String,
) -> Result<FileStorageResult> {
let txn = self.fdb.create_transaction()?;
let file_key = format!("{}/files/{}/content", workspace_id, path);
let meta_key = format!("{}/files/{}/metadata", workspace_id, path);
// Log FDB-specific operation
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::DEBUG,
component: "storage.fdb".to_string(),
action: "fdb_transaction_start".to_string(),
user_id: metadata.created_by.clone(),
tenant_id: Some(workspace_id.to_string()),
request_id: None,
session_id: None,
result: "pending".to_string(),
duration_ms: None,
details: Some(json!({
"keys": [&file_key, &meta_key],
"content_size": content.len(),
"content_hash": &content_hash
})),
error: None,
}).await;
txn.set(file_key.as_bytes(), content);
txn.set(meta_key.as_bytes(), &serde_json::to_vec(&metadata)?);
txn.commit().await?;
Ok(FileStorageResult {
storage_tier: StorageTier::FoundationDB,
content_hash,
size: content.len(),
path: path.to_string(),
})
}
}
// Public so the repository layer (section 3) can reuse it for read verification
pub fn calculate_sha256(content: &[u8]) -> String {
let mut hasher = Sha256::new();
hasher.update(content);
format!("{:x}", hasher.finalize())
}
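Every FoundationDB key embeds the workspace ID as its first segment, which is what enforces the workspace-isolation constraint at the key level. The layout used by `store_in_fdb` can be sketched as a pure helper (`fdb_keys` is an illustrative name, not part of the specified API):

```rust
/// Build the content and metadata keys for a file, mirroring the
/// `{workspace}/files/{path}/...` layout used by `store_in_fdb`.
fn fdb_keys(workspace_id: &str, path: &str) -> (String, String) {
    // Content and metadata live under sibling keys with a shared prefix,
    // so a range read over `{workspace}/files/` stays inside one workspace.
    let file_key = format!("{}/files/{}/content", workspace_id, path);
    let meta_key = format!("{}/files/{}/metadata", workspace_id, path);
    (file_key, meta_key)
}
```

Because the workspace ID is the leading segment, no key from one workspace can ever prefix-match another workspace's range.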
3. File Repository Pattern
3.1 Repository Implementation
// src/db/repositories/file_repository.rs
use chrono::Utc;
use foundationdb::Database;
use serde_json::json;
use uuid::Uuid;

use crate::logging::{CoditecLogger, LogEntry, LogLevel};
use crate::models::{File, FileMetadata};
use crate::storage::{calculate_sha256, StorageRouter, StorageTier};
pub struct FileRepository {
db: Database,
storage: StorageRouter,
}
impl FileRepository {
pub async fn create_file(
&self,
workspace_id: Uuid,
path: &str,
content: &[u8],
) -> Result<File> {
// Path validation
if !is_valid_path(path) {
return Err(RepositoryError::InvalidPath(path.to_string()));
}
let metadata = FileMetadata {
created_at: Utc::now(),
updated_at: Utc::now(),
size: content.len() as i64,
mime_type: tree_magic::from_u8(content),
gcs_location: None,
// Logged as the acting user by `store_file`; populated from the caller's auth context
created_by: None,
};
let storage_result = self.storage
.store_file(workspace_id, path, content, metadata.clone())
.await?;
let file = File {
file_id: Uuid::new_v4(),
workspace_id,
path: path.to_string(),
storage_tier: storage_result.storage_tier,
content_hash: storage_result.content_hash,
metadata,
};
// Store file record
let txn = self.db.create_transaction()?;
let record_key = format!("{}/files/{}/record", workspace_id, path);
txn.set(record_key.as_bytes(), &serde_json::to_vec(&file)?);
txn.commit().await?;
Ok(file)
}
pub async fn read_file(
&self,
workspace_id: Uuid,
path: &str,
) -> Result<(File, Vec<u8>)> {
let file = self.get_file_record(workspace_id, path).await?;
let content = match file.storage_tier {
StorageTier::FoundationDB => {
self.read_from_fdb(workspace_id, path).await?
}
StorageTier::GoogleCloudStorage => {
self.read_from_gcs(&file.metadata).await?
}
StorageTier::GitRepository => {
// Git-tier content is read through GitIntegration (section 4), not this repository
return Err(RepositoryError::UnsupportedTier);
}
};
// Verify integrity
let actual_hash = calculate_sha256(&content);
if actual_hash != file.content_hash {
// Log integrity failure with CODI
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::ERROR,
component: "storage.integrity".to_string(),
action: "integrity_check_failed".to_string(),
user_id: None,
tenant_id: Some(workspace_id.to_string()),
request_id: None,
session_id: None,
result: "failure".to_string(),
duration_ms: None,
details: Some(json!({
"path": path,
"expected_hash": &file.content_hash,
"actual_hash": &actual_hash,
"storage_tier": format!("{:?}", file.storage_tier)
})),
error: Some(json!({
"type": "IntegrityError",
"message": "Content hash mismatch detected"
})),
}).await;
return Err(RepositoryError::IntegrityError);
}
Ok((file, content))
}
}
fn is_valid_path(path: &str) -> bool {
!path.is_empty()
&& !path.contains("..")
&& !path.starts_with('/')
&& path.chars().all(|c| c.is_ascii() && c != '\0')
}
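The validator above is what enforces the "MUST NOT allow path traversal attacks" constraint from section 10.2. Reproduced here with each rejection case spelled out:

```rust
/// Same checks as the repository's `is_valid_path`: non-empty, relative,
/// no `..` traversal segments, ASCII only, and no NUL bytes.
fn is_valid_path(path: &str) -> bool {
    !path.is_empty()                 // empty paths are meaningless keys
        && !path.contains("..")      // blocks directory traversal
        && !path.starts_with('/')    // workspace paths are always relative
        && path.chars().all(|c| c.is_ascii() && c != '\0')
}
```

Note that rejecting any `..` substring is deliberately conservative: it also refuses legal names like `a..b`, trading a little expressiveness for a simpler traversal guarantee.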
4. Git Integration
4.1 Git Storage Implementation
// src/storage/git_integration.rs
use anyhow::Result;
use git2::{Repository as GitRepo, Signature};
use std::path::PathBuf;
use uuid::Uuid;
pub struct GitIntegration {
workspace_root: PathBuf,
author_name: String,
author_email: String,
}
impl GitIntegration {
pub fn init_workspace_repo(&self, workspace_id: Uuid) -> Result<GitRepo> {
let repo_path = self.workspace_root.join(workspace_id.to_string());
let repo = GitRepo::init_bare(&repo_path)?;
let mut config = repo.config()?;
config.set_str("core.compression", "9")?;
config.set_str("gc.auto", "256")?;
Ok(repo)
}
pub async fn commit_file(
&self,
workspace_id: Uuid,
path: &str,
content: &[u8],
message: &str,
) -> Result<String> {
let repo_path = self.workspace_root.join(workspace_id.to_string());
let repo = GitRepo::open(&repo_path)?;
let blob_oid = repo.blob(content)?;
// Assumes the repository already has an initial commit; `head()` errors on an empty repo
let head = repo.head()?;
let parent_commit = head.peel_to_commit()?;
let mut tree_builder = repo.treebuilder(Some(&parent_commit.tree()?))?;
tree_builder.insert(path, blob_oid, 0o100644)?;
let tree_oid = tree_builder.write()?;
let tree = repo.find_tree(tree_oid)?;
let signature = Signature::now(&self.author_name, &self.author_email)?;
let commit_oid = repo.commit(
Some("HEAD"),
&signature,
&signature,
message,
&tree,
&[&parent_commit],
)?;
Ok(commit_oid.to_string())
}
}
5. Testing Requirements
5.1 Critical Test Cases
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn test_10kb_threshold_exact() {
let router = create_test_router().await;
// Exactly 10KB -> FDB
let content_10kb = vec![0u8; 10_240];
let result = router.store_file(
Uuid::new_v4(),
"test.bin",
&content_10kb,
FileMetadata::default(),
).await.unwrap();
assert_eq!(result.storage_tier, StorageTier::FoundationDB);
// 10KB + 1 byte -> GCS
let content_10kb_plus = vec![0u8; 10_241];
let result = router.store_file(
Uuid::new_v4(),
"test2.bin",
&content_10kb_plus,
FileMetadata::default(),
).await.unwrap();
assert_eq!(result.storage_tier, StorageTier::GoogleCloudStorage);
}
#[tokio::test]
async fn test_content_integrity() {
let repo = create_test_file_repository().await;
let content = b"Hello, World!";
let file = repo.create_file(
Uuid::new_v4(),
"hello.txt",
content
).await.unwrap();
let (_, read_content) = repo.read_file(
file.workspace_id,
"hello.txt"
).await.unwrap();
assert_eq!(read_content, content);
// SHA-256 of "Hello, World!"
assert_eq!(
file.content_hash,
"dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"
);
}
}
6. Performance Specifications
6.1 Latency Requirements
storage_performance:
fdb_operations:
read_p99: 10ms
write_p99: 15ms
max_value_size: 10240 # 10KB exactly
gcs_operations:
read_p99: 100ms
write_p99: 200ms
multipart_threshold: 5MB
git_operations:
commit_p99: 50ms
checkout_p99: 100ms
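The pXX targets above are validated against observed latency samples. One conventional way to read a p99 off a sample set is the nearest-rank method; this helper is illustrative, not part of the specified API (in production the percentiles come from the Prometheus histograms in section 9):

```rust
/// Nearest-rank p99: the value at or below which 99% of samples fall.
/// Returns None for an empty sample set.
fn p99_ms(samples: &[u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: index = ceil(0.99 * n) - 1 into the sorted samples
    let rank = ((sorted.len() as f64) * 0.99).ceil() as usize;
    Some(sorted[rank - 1])
}
```

A run of FDB reads passes the `read_p99: 10ms` target when `p99_ms` over its samples is at most 10.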
6.2 Throughput Targets
- FDB: 100,000 ops/sec per workspace
- GCS: 1,000 ops/sec per workspace
- Git: 100 commits/sec per workspace
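Per-workspace throughput caps like these are typically enforced with a token bucket. A minimal deterministic sketch (elapsed time is passed in explicitly so the refill arithmetic is visible; the type and method names are illustrative, not part of the specified API):

```rust
/// Minimal token bucket: `rate` tokens per second, refilled up to `capacity`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate: f64,
}

impl TokenBucket {
    /// One second of burst at the configured rate.
    fn new(rate: f64) -> Self {
        TokenBucket { capacity: rate, tokens: rate, rate }
    }

    /// Refill for `elapsed_secs` since the last call, then try to take one token.
    fn try_acquire(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + self.rate * elapsed_secs).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A bucket per workspace (e.g., rate 100.0 for Git commits) rejects bursts beyond the target while letting sustained traffic through at the configured rate.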
7. Security Controls
7.1 Access Control
impl StorageRouter {
pub async fn validate_access(
&self,
workspace_id: Uuid,
user_id: Uuid,
operation: FileOperation,
) -> Result<()> {
// All file access must be authorized
let has_access = self.check_workspace_membership(
workspace_id,
user_id
).await?;
if !has_access {
return Err(StorageError::Unauthorized);
}
Ok(())
}
}
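`check_workspace_membership` is referenced above but not shown. Its deny-by-default shape can be sketched with an in-memory stand-in (the `MembershipTable` type is hypothetical; the real check presumably queries workspace records in FoundationDB):

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for `check_workspace_membership`.
struct MembershipTable {
    /// workspace_id -> set of member user_ids
    members: HashMap<String, HashSet<String>>,
}

impl MembershipTable {
    fn new() -> Self {
        MembershipTable { members: HashMap::new() }
    }

    fn add_member(&mut self, workspace_id: &str, user_id: &str) {
        self.members
            .entry(workspace_id.to_string())
            .or_insert_with(HashSet::new)
            .insert(user_id.to_string());
    }

    /// Deny by default: unknown workspaces and unknown users both fail.
    fn has_access(&self, workspace_id: &str, user_id: &str) -> bool {
        self.members
            .get(workspace_id)
            .map_or(false, |m| m.contains(user_id))
    }
}
```

The important property is that absence of a record means denial; there is no code path that grants access without an explicit membership entry.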
7.2 Encryption
- FDB: Encryption at rest enabled
- GCS: Customer-managed encryption keys (CMEK)
- Git: Repository contents encrypted with GPG (e.g., via git-crypt)
8. Deployment Configuration
8.1 GCS Lifecycle Policy
# deploy/gcs-lifecycle.yaml
lifecycle:
rule:
- action:
type: SetStorageClass
storageClass: NEARLINE
condition:
age: 30
- action:
type: SetStorageClass
storageClass: COLDLINE
condition:
age: 90
- action:
type: Delete
condition:
age: 365
isLive: false
versioning:
enabled: true
encryption:
defaultKmsKeyName: projects/${PROJECT_ID}/locations/global/keyRings/coditect/cryptoKeys/storage
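The age ladder in this policy maps object age to storage class. A small Rust sketch of the transitions, for illustration only (GCS evaluates the policy server-side; these helpers are not part of the implementation):

```rust
/// Storage class for a live object of the given age, per the lifecycle rules above.
fn storage_class(age_days: u32) -> &'static str {
    if age_days < 30 {
        "STANDARD" // default class before the first transition
    } else if age_days < 90 {
        "NEARLINE" // SetStorageClass rule fires at 30 days
    } else {
        "COLDLINE" // SetStorageClass rule fires at 90 days
    }
}

/// The Delete rule only targets noncurrent (non-live) versions at 365 days,
/// so live objects are never deleted by lifecycle alone.
fn should_delete(age_days: u32, is_live: bool) -> bool {
    !is_live && age_days >= 365
}
```

Because `isLive: false` gates the Delete action, versioning plus this policy yields a one-year retention window for overwritten or deleted file versions.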
9. Monitoring & Observability
9.1 Metrics with CODI Integration
use lazy_static::lazy_static;
use prometheus::{Counter, Histogram, HistogramOpts};

use crate::logging::{CoditecLogger, LogEntry, LogLevel};

lazy_static! {
static ref STORAGE_OPS: Counter = Counter::new(
"storage_operations_total",
"Total storage operations"
).unwrap();
static ref STORAGE_LATENCY: Histogram = Histogram::with_opts(HistogramOpts::new(
"storage_operation_duration_seconds",
"Storage operation latency"
)).unwrap();
}
// Defined inside `impl StorageRouter`
pub async fn store_file_with_monitoring(
&self,
workspace_id: Uuid,
path: &str,
content: &[u8],
user_id: Option<Uuid>,
request_id: Option<String>,
) -> Result<FileStorageResult> {
STORAGE_OPS.inc();
let timer = STORAGE_LATENCY.start_timer();
let start_time = Instant::now();
// CODI logging for monitoring
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::INFO,
component: "storage.monitoring".to_string(),
action: "storage_operation_start".to_string(),
user_id: user_id.map(|id| id.to_string()),
tenant_id: Some(workspace_id.to_string()),
request_id: request_id.clone(),
session_id: None,
result: "pending".to_string(),
duration_ms: None,
details: Some(json!({
"operation": "store_file",
"path": path,
"content_size": content.len(),
"metrics": {
"ops_total": STORAGE_OPS.get(),
// Histogram sum, not a true p99; percentiles are computed server-side in Prometheus
"latency_sum_ms": STORAGE_LATENCY.get_sample_sum() * 1000.0
}
})),
error: None,
}).await;
// `store_file` also requires metadata; a default is used here for brevity
let result = self.store_file(workspace_id, path, content, FileMetadata::default()).await;
timer.observe_duration();
let duration_ms = start_time.elapsed().as_millis() as i64;
// Log performance metrics to CODI
if duration_ms > 100 {
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::WARN,
component: "storage.performance".to_string(),
action: "slow_storage_operation".to_string(),
user_id: user_id.map(|id| id.to_string()),
tenant_id: Some(workspace_id.to_string()),
request_id,
session_id: None,
result: (if result.is_ok() { "success" } else { "failure" }).to_string(),
duration_ms: Some(duration_ms),
details: Some(json!({
"path": path,
"threshold_ms": 100,
"actual_ms": duration_ms
})),
error: None,
}).await;
}
result
}
9.2 CODI Logging Patterns
// Standard CODI logging for storage operations
pub struct StorageLogger;
impl StorageLogger {
pub async fn log_operation(
component: &str,
action: &str,
workspace_id: Uuid,
details: serde_json::Value,
) {
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::INFO,
component: format!("storage.{}", component),
action: action.to_string(),
user_id: None,
tenant_id: Some(workspace_id.to_string()),
request_id: None,
session_id: None,
result: "success".to_string(),
duration_ms: None,
details: Some(details),
error: None,
}).await;
}
pub async fn log_error(
component: &str,
action: &str,
workspace_id: Uuid,
error: &anyhow::Error,
) {
CoditecLogger::log(LogEntry {
timestamp: Utc::now(),
level: LogLevel::ERROR,
component: format!("storage.{}", component),
action: action.to_string(),
user_id: None,
tenant_id: Some(workspace_id.to_string()),
request_id: None,
session_id: None,
result: "failure".to_string(),
duration_ms: None,
details: None,
error: Some(json!({
"type": "StorageError",
"message": error.to_string(),
"chain": error.chain().map(|e| e.to_string()).collect::<Vec<_>>()
})),
}).await;
}
}
10. Constraints for AI Implementation
10.1 MUST Requirements
- MUST use exactly 10,240 bytes as the FDB threshold
- MUST calculate SHA-256 for every file
- MUST verify hash on every read
- MUST use workspace ID in all storage keys
- MUST handle all error cases gracefully
10.2 MUST NOT Requirements
- MUST NOT store >10KB files in FoundationDB
- MUST NOT skip integrity verification
- MUST NOT allow path traversal attacks
- MUST NOT expose internal storage paths
- MUST NOT mix workspace data
10.3 Test Coverage Requirements
- Unit tests for threshold behavior
- Integration tests for all three tiers
- Performance tests for latency targets
- Security tests for access control
- Failure tests for each storage tier
References
Approval Signatures
Technical Approval
| Role | Name | Signature | Date |
|---|---|---|---|
| Author | AI System (Claude) | _________________ | 2025-08-31 |
| Tech Lead | _________________ | _________________ | __________ |
| Storage Engineer | _________________ | _________________ | __________ |
| Security Engineer | _________________ | _________________ | __________ |
| SRE Lead | _________________ | _________________ | __________ |
Implementation Sign-off
| Component | Owner | Test Coverage | Sign-off Date |
|---|---|---|---|
| Storage Router | _________________ | ____% | __________ |
| File Repository | _________________ | ____% | __________ |
| Git Integration | _________________ | ____% | __________ |
| GCS Integration | _________________ | ____% | __________ |
| Security Controls | _________________ | ____% | __________ |
Review History
| Version | Date | Changes | Reviewer |
|---|---|---|---|
| 1.0.0 | 2025-08-31 | Initial conversion from single-file ADR | AI System |
| 1.1.0 | 2025-08-31 | Added CODI logging integration throughout | AI System |
This technical implementation blueprint provides exact specifications for CODITECT's three-tier storage architecture. All code must be implemented as specified with complete test coverage.