ADR-002-v4: Storage Architecture - Part 2 (Technical)


Document Specification Block​

Document: ADR-002-v4-storage-architecture-part2-technical
Version: 1.1.0
Purpose: Provide exact technical specifications for three-tier storage implementation
Audience: AI agents, developers implementing the storage system
Date Created: 2025-08-31
Date Modified: 2025-08-31
Status: DRAFT

1. Technical Requirements​

1.1 Constraints​

  • MUST use exactly 10KB (10,240 bytes) as FDB storage threshold
  • MUST NOT store files larger than 10KB in FoundationDB
  • MUST calculate SHA-256 hash for all stored content
  • MUST verify content integrity on every read operation
  • MUST support workspace-based isolation
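
The 10KB boundary is inclusive on the FoundationDB side: a file of exactly 10,240 bytes stays in FDB, and 10,241 bytes routes to GCS. A minimal sketch of the routing rule (the standalone `select_tier` function and `Tier` enum are illustrative; the real router performs this check inline, as shown in Section 2):

```rust
/// Illustrative routing rule for the 10KB threshold. Mirrors the
/// `content.len() <= FDB_SIZE_LIMIT` check used by the storage router.
const FDB_SIZE_LIMIT: usize = 10_240; // exactly 10KB

#[derive(Debug, PartialEq)]
enum Tier {
    FoundationDb,
    GoogleCloudStorage,
}

fn select_tier(size_bytes: usize) -> Tier {
    if size_bytes <= FDB_SIZE_LIMIT {
        Tier::FoundationDb
    } else {
        Tier::GoogleCloudStorage
    }
}

fn main() {
    // The boundary is inclusive: exactly 10,240 bytes stays in FDB.
    assert_eq!(select_tier(10_240), Tier::FoundationDb);
    assert_eq!(select_tier(10_241), Tier::GoogleCloudStorage);
    assert_eq!(select_tier(0), Tier::FoundationDb);
    println!("threshold ok");
}
```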

1.2 Dependencies​

```toml
[dependencies]
foundationdb = "0.8"
google-cloud-storage = "0.15"
git2 = "0.18"
sha2 = "0.10"
tree_magic = "0.2"
tokio = { version = "1.35", features = ["full"] }
uuid = { version = "1.6", features = ["v4", "serde"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
chrono = { version = "0.4", features = ["serde"] }
anyhow = "1.0"
thiserror = "1.0"

# CODI logging integration
coditect-logging = { path = "../logging" }
```


2. Storage Tier Implementation​

2.1 Core Storage Router​

```rust
// src/storage/mod.rs
use std::time::Instant;

use anyhow::Result;
use chrono::Utc;
use foundationdb::Database;
use google_cloud_storage::client::Client as GcsClient;
use serde_json::json;
use sha2::{Digest, Sha256};
use uuid::Uuid;

use crate::logging::{CoditecLogger, LogEntry, LogLevel};
use crate::models::{FileMetadata, FileStorageResult};

pub const FDB_SIZE_LIMIT: usize = 10_240; // Exactly 10KB (10,240 bytes)

#[derive(Debug, Clone)]
pub struct StorageRouter {
    fdb: Database,
    gcs_client: GcsClient,
    bucket_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum StorageTier {
    FoundationDB,
    GoogleCloudStorage,
    GitRepository,
}

impl StorageRouter {
    pub async fn store_file(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        metadata: FileMetadata,
    ) -> Result<FileStorageResult> {
        // Clone the attribution field up front: `metadata` is moved into the
        // tier-specific store call below, but logging still needs it afterwards.
        let created_by = metadata.created_by.clone();

        // CODI logging for operation start
        CoditecLogger::log(LogEntry {
            timestamp: Utc::now(),
            level: LogLevel::INFO,
            component: "storage.router".to_string(),
            action: "file_store_start".to_string(),
            user_id: created_by.clone(),
            tenant_id: Some(workspace_id.to_string()),
            request_id: None,
            session_id: None,
            result: "pending".to_string(),
            duration_ms: None,
            details: Some(json!({
                "path": path,
                "size_bytes": content.len(),
                "storage_decision": if content.len() <= FDB_SIZE_LIMIT { "fdb" } else { "gcs" }
            })),
            error: None,
        }).await;

        let start_time = Instant::now();
        let content_hash = calculate_sha256(content);

        let result = if content.len() <= FDB_SIZE_LIMIT {
            self.store_in_fdb(workspace_id, path, content, metadata, content_hash).await
        } else {
            self.store_in_gcs(workspace_id, path, content, metadata, content_hash).await
        };

        // CODI logging for operation result
        let duration_ms = start_time.elapsed().as_millis() as i64;
        match &result {
            Ok(res) => {
                CoditecLogger::log(LogEntry {
                    timestamp: Utc::now(),
                    level: LogLevel::INFO,
                    component: "storage.router".to_string(),
                    action: "file_store_complete".to_string(),
                    user_id: created_by.clone(),
                    tenant_id: Some(workspace_id.to_string()),
                    request_id: None,
                    session_id: None,
                    result: "success".to_string(),
                    duration_ms: Some(duration_ms),
                    details: Some(json!({
                        "path": path,
                        "storage_tier": format!("{:?}", res.storage_tier),
                        "content_hash": &res.content_hash,
                        "size_bytes": res.size
                    })),
                    error: None,
                }).await;
            }
            Err(e) => {
                CoditecLogger::log(LogEntry {
                    timestamp: Utc::now(),
                    level: LogLevel::ERROR,
                    component: "storage.router".to_string(),
                    action: "file_store_failed".to_string(),
                    user_id: created_by,
                    tenant_id: Some(workspace_id.to_string()),
                    request_id: None,
                    session_id: None,
                    result: "failure".to_string(),
                    duration_ms: Some(duration_ms),
                    details: Some(json!({
                        "path": path,
                        "size_bytes": content.len()
                    })),
                    error: Some(json!({
                        "type": "StorageError",
                        "message": e.to_string()
                    })),
                }).await;
            }
        }

        result
    }

    async fn store_in_fdb(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        metadata: FileMetadata,
        content_hash: String,
    ) -> Result<FileStorageResult> {
        let txn = self.fdb.create_trx()?;

        let file_key = format!("{}/files/{}/content", workspace_id, path);
        let meta_key = format!("{}/files/{}/metadata", workspace_id, path);

        // Log FDB-specific operation
        CoditecLogger::log(LogEntry {
            timestamp: Utc::now(),
            level: LogLevel::DEBUG,
            component: "storage.fdb".to_string(),
            action: "fdb_transaction_start".to_string(),
            user_id: metadata.created_by.clone(),
            tenant_id: Some(workspace_id.to_string()),
            request_id: None,
            session_id: None,
            result: "pending".to_string(),
            duration_ms: None,
            details: Some(json!({
                "keys": [&file_key, &meta_key],
                "content_size": content.len(),
                "content_hash": &content_hash
            })),
            error: None,
        }).await;

        txn.set(file_key.as_bytes(), content);
        txn.set(meta_key.as_bytes(), &serde_json::to_vec(&metadata)?);

        txn.commit().await?;

        Ok(FileStorageResult {
            storage_tier: StorageTier::FoundationDB,
            content_hash,
            size: content.len(),
            path: path.to_string(),
        })
    }
}

pub fn calculate_sha256(content: &[u8]) -> String {
    let mut hasher = Sha256::new();
    hasher.update(content);
    format!("{:x}", hasher.finalize())
}
```
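
The router keys every record under the workspace ID (`{workspace}/files/{path}/content` and `.../metadata`), which is what provides the workspace-based isolation required in Section 1.1: distinct workspaces occupy disjoint key ranges. A small sketch of the key scheme (the `file_keys` helper is illustrative, not part of the spec):

```rust
use std::fmt::Display;

/// Build the FDB keys used for one file. Prefixing every key with the
/// workspace ID keeps tenants in disjoint key ranges.
fn file_keys(workspace_id: impl Display, path: &str) -> (String, String) {
    let content_key = format!("{}/files/{}/content", workspace_id, path);
    let meta_key = format!("{}/files/{}/metadata", workspace_id, path);
    (content_key, meta_key)
}

fn main() {
    let (content, meta) = file_keys("ws-1234", "src/main.rs");
    assert_eq!(content, "ws-1234/files/src/main.rs/content");
    assert_eq!(meta, "ws-1234/files/src/main.rs/metadata");

    // Different workspaces can never collide: their keys share no prefix.
    let (other, _) = file_keys("ws-9999", "src/main.rs");
    assert_ne!(content, other);
    println!("key layout ok");
}
```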


3. File Repository Pattern​

3.1 Repository Implementation​

```rust
// src/db/repositories/file_repository.rs
use anyhow::Result;
use chrono::Utc;
use foundationdb::Database;
use serde_json::json;
use uuid::Uuid;

use crate::logging::{CoditecLogger, LogEntry, LogLevel};
use crate::models::{File, FileMetadata};
use crate::storage::{calculate_sha256, StorageRouter, StorageTier};

pub struct FileRepository {
    db: Database,
    storage: StorageRouter,
}

impl FileRepository {
    pub async fn create_file(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
    ) -> Result<File> {
        // Path validation
        if !is_valid_path(path) {
            return Err(RepositoryError::InvalidPath(path.to_string()).into());
        }

        let metadata = FileMetadata {
            created_at: Utc::now(),
            updated_at: Utc::now(),
            size: content.len() as i64,
            mime_type: tree_magic::from_u8(content),
            gcs_location: None,
            created_by: None, // attribution is supplied by the calling service layer
        };

        let storage_result = self.storage
            .store_file(workspace_id, path, content, metadata.clone())
            .await?;

        let file = File {
            file_id: Uuid::new_v4(),
            workspace_id,
            path: path.to_string(),
            storage_tier: storage_result.storage_tier,
            content_hash: storage_result.content_hash,
            metadata,
        };

        // Store file record
        let txn = self.db.create_trx()?;
        let record_key = format!("{}/files/{}/record", workspace_id, path);
        txn.set(record_key.as_bytes(), &serde_json::to_vec(&file)?);
        txn.commit().await?;

        Ok(file)
    }

    pub async fn read_file(
        &self,
        workspace_id: Uuid,
        path: &str,
    ) -> Result<(File, Vec<u8>)> {
        let file = self.get_file_record(workspace_id, path).await?;

        let content = match file.storage_tier {
            StorageTier::FoundationDB => {
                self.read_from_fdb(workspace_id, path).await?
            }
            StorageTier::GoogleCloudStorage => {
                self.read_from_gcs(&file.metadata).await?
            }
            // Git-tier content is materialized through GitIntegration
            // (Section 4), never read via the file repository.
            StorageTier::GitRepository => unreachable!(),
        };

        // Verify integrity on every read (Section 1.1)
        let actual_hash = calculate_sha256(&content);
        if actual_hash != file.content_hash {
            // Log integrity failure with CODI
            CoditecLogger::log(LogEntry {
                timestamp: Utc::now(),
                level: LogLevel::ERROR,
                component: "storage.integrity".to_string(),
                action: "integrity_check_failed".to_string(),
                user_id: None,
                tenant_id: Some(workspace_id.to_string()),
                request_id: None,
                session_id: None,
                result: "failure".to_string(),
                duration_ms: None,
                details: Some(json!({
                    "path": path,
                    "expected_hash": &file.content_hash,
                    "actual_hash": &actual_hash,
                    "storage_tier": format!("{:?}", file.storage_tier)
                })),
                error: Some(json!({
                    "type": "IntegrityError",
                    "message": "Content hash mismatch detected"
                })),
            }).await;

            return Err(RepositoryError::IntegrityError.into());
        }

        Ok((file, content))
    }
}

fn is_valid_path(path: &str) -> bool {
    !path.is_empty()
        && !path.contains("..")
        && !path.starts_with('/')
        && path.chars().all(|c| c.is_ascii() && c != '\0')
}
```
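
The validator rejects the path-traversal patterns called out in Section 10.2, and it is deliberately conservative: any occurrence of `..` is refused, even in otherwise harmless names. A few boundary cases (the function is reproduced verbatim so the examples are self-contained):

```rust
fn is_valid_path(path: &str) -> bool {
    !path.is_empty()
        && !path.contains("..")
        && !path.starts_with('/')
        && path.chars().all(|c| c.is_ascii() && c != '\0')
}

fn main() {
    assert!(is_valid_path("src/main.rs"));      // normal relative path
    assert!(!is_valid_path(""));                // empty
    assert!(!is_valid_path("../etc/passwd"));   // traversal
    assert!(!is_valid_path("/etc/passwd"));     // absolute
    assert!(!is_valid_path("src/\u{0}x"));      // NUL byte
    assert!(!is_valid_path("docs/naïve.txt"));  // non-ASCII is rejected
    println!("path validation ok");
}
```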


4. Git Integration​

4.1 Git Storage Implementation​

```rust
// src/storage/git_integration.rs
use std::path::PathBuf;

use anyhow::Result;
use git2::{Commit, Repository as GitRepo, Signature};
use uuid::Uuid;

pub struct GitIntegration {
    workspace_root: PathBuf,
    author_name: String,
    author_email: String,
}

impl GitIntegration {
    pub fn init_workspace_repo(&self, workspace_id: Uuid) -> Result<GitRepo> {
        let repo_path = self.workspace_root.join(workspace_id.to_string());
        let repo = GitRepo::init_bare(&repo_path)?;

        // `config()` must be bound mutably to set values.
        let mut config = repo.config()?;
        config.set_str("core.compression", "9")?;
        config.set_str("gc.auto", "256")?;

        Ok(repo)
    }

    pub async fn commit_file(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        message: &str,
    ) -> Result<String> {
        let repo_path = self.workspace_root.join(workspace_id.to_string());
        let repo = GitRepo::open(&repo_path)?;

        let blob_oid = repo.blob(content)?;

        // A freshly initialized bare repository has no HEAD commit yet, so
        // the first commit has no parent and starts from an empty tree.
        let parent_commit: Option<Commit> = match repo.head() {
            Ok(head) => Some(head.peel_to_commit()?),
            Err(_) => None,
        };
        let base_tree = match &parent_commit {
            Some(commit) => Some(commit.tree()?),
            None => None,
        };
        let mut tree_builder = repo.treebuilder(base_tree.as_ref())?;

        tree_builder.insert(path, blob_oid, 0o100644)?;

        let tree_oid = tree_builder.write()?;
        let tree = repo.find_tree(tree_oid)?;

        let signature = Signature::now(&self.author_name, &self.author_email)?;
        let parents: Vec<&Commit> = parent_commit.iter().collect();
        let commit_oid = repo.commit(
            Some("HEAD"),
            &signature,
            &signature,
            message,
            &tree,
            &parents,
        )?;

        Ok(commit_oid.to_string())
    }
}
```


5. Testing Requirements​

5.1 Critical Test Cases​

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_10kb_threshold_exact() {
        let router = create_test_router().await;

        // Exactly 10KB -> FDB
        let content_10kb = vec![0u8; 10_240];
        let result = router.store_file(
            Uuid::new_v4(),
            "test.bin",
            &content_10kb,
            FileMetadata::default(),
        ).await.unwrap();

        assert_eq!(result.storage_tier, StorageTier::FoundationDB);

        // 10KB + 1 byte -> GCS
        let content_10kb_plus = vec![0u8; 10_241];
        let result = router.store_file(
            Uuid::new_v4(),
            "test2.bin",
            &content_10kb_plus,
            FileMetadata::default(),
        ).await.unwrap();

        assert_eq!(result.storage_tier, StorageTier::GoogleCloudStorage);
    }

    #[tokio::test]
    async fn test_content_integrity() {
        let repo = create_test_file_repository().await;
        let content = b"Hello, World!";

        let file = repo.create_file(
            Uuid::new_v4(),
            "hello.txt",
            content,
        ).await.unwrap();

        let (_, read_content) = repo.read_file(
            file.workspace_id,
            "hello.txt",
        ).await.unwrap();

        assert_eq!(read_content, content);
        // SHA-256 of "Hello, World!"
        assert_eq!(
            file.content_hash,
            "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"
        );
    }
}
```


6. Performance Specifications​

6.1 Latency Requirements​

```yaml
storage_performance:
  fdb_operations:
    read_p99: 10ms
    write_p99: 15ms
    max_value_size: 10240  # 10KB exactly

  gcs_operations:
    read_p99: 100ms
    write_p99: 200ms
    multipart_threshold: 5MB

  git_operations:
    commit_p99: 50ms
    checkout_p99: 100ms
```
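
The p99 figures above are tail-latency bounds, not averages: 99% of operations must complete within the stated time. A sketch of how such a target could be checked against collected samples, using the nearest-rank percentile convention (one common choice; the `percentile` helper is illustrative):

```rust
/// Nearest-rank percentile: the smallest sample such that at least
/// p percent of all samples are <= it.
fn percentile(samples: &mut [u64], p: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1)]
}

fn main() {
    // 100 hypothetical FDB read latencies of 1..=100 ms:
    // the p99 is the 99th value in sorted order.
    let mut latencies: Vec<u64> = (1..=100).collect();
    let p99 = percentile(&mut latencies, 99.0);
    assert_eq!(p99, 99);

    // These samples would violate the 10ms fdb read_p99 target.
    assert!(p99 > 10);
    println!("p99 = {}ms", p99);
}
```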

6.2 Throughput Targets​

  • FDB: 100,000 ops/sec per workspace
  • GCS: 1,000 ops/sec per workspace
  • Git: 100 commits/sec per workspace


7. Security Controls​

7.1 Access Control​

```rust
impl StorageRouter {
    pub async fn validate_access(
        &self,
        workspace_id: Uuid,
        user_id: Uuid,
        operation: FileOperation,
    ) -> Result<()> {
        // All file access must be authorized
        let has_access = self.check_workspace_membership(
            workspace_id,
            user_id,
        ).await?;

        if !has_access {
            return Err(StorageError::Unauthorized);
        }

        Ok(())
    }
}
```
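
`validate_access` is deny-by-default: unless the membership lookup positively confirms the (workspace, user) pair, the operation fails. The shape of that check, sketched against an in-memory membership set (the `HashSet` stands in for the real FDB-backed lookup; names here are illustrative):

```rust
use std::collections::HashSet;

/// Stand-in for `check_workspace_membership`: access is granted only on a
/// positive (workspace, user) match; every other combination is rejected.
fn validate_access(
    members: &HashSet<(&'static str, &'static str)>,
    workspace_id: &'static str,
    user_id: &'static str,
) -> Result<(), &'static str> {
    if members.contains(&(workspace_id, user_id)) {
        Ok(())
    } else {
        Err("unauthorized")
    }
}

fn main() {
    let mut members = HashSet::new();
    members.insert(("ws-1", "alice"));

    assert!(validate_access(&members, "ws-1", "alice").is_ok());
    // A wrong user, or the right user in the wrong workspace, is denied.
    assert!(validate_access(&members, "ws-1", "bob").is_err());
    assert!(validate_access(&members, "ws-2", "alice").is_err());
    println!("access control ok");
}
```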

7.2 Encryption​

  • FDB: Encryption at rest enabled
  • GCS: Customer-managed encryption keys (CMEK)
  • Git: Repository encryption with GPG


8. Deployment Configuration​

8.1 GCS Lifecycle Policy​

```yaml
# deploy/gcs-lifecycle.yaml
lifecycle:
  rule:
    - action:
        type: SetStorageClass
        storageClass: NEARLINE
      condition:
        age: 30

    - action:
        type: SetStorageClass
        storageClass: COLDLINE
      condition:
        age: 90

    - action:
        type: Delete
      condition:
        age: 365
        isLive: false

versioning:
  enabled: true

encryption:
  defaultKmsKeyName: projects/${PROJECT_ID}/locations/global/keyRings/coditect/cryptoKeys/storage
```
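
GCS itself accepts lifecycle configuration as JSON, so the YAML above has to be translated before it can be applied with `gsutil lifecycle set`. A hedged JSON equivalent of the three rules (the bucket name is a placeholder; versioning and CMEK are configured separately):

```json
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}
    },
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365, "isLive": false}
    }
  ]
}
```

Applied with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`; object versioning is enabled with `gsutil versioning set on gs://BUCKET_NAME`.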


9. Monitoring & Observability​

9.1 Metrics with CODI Integration​

```rust
use std::time::Instant;

use chrono::Utc;
use lazy_static::lazy_static;
use prometheus::{Counter, Histogram, HistogramOpts};
use serde_json::json;
use uuid::Uuid;

use crate::logging::{CoditecLogger, LogEntry, LogLevel};

lazy_static! {
    static ref STORAGE_OPS: Counter = Counter::new(
        "storage_operations_total",
        "Total storage operations"
    ).unwrap();

    static ref STORAGE_LATENCY: Histogram = Histogram::with_opts(
        HistogramOpts::new(
            "storage_operation_duration_seconds",
            "Storage operation latency"
        )
    ).unwrap();
}

impl StorageRouter {
    pub async fn store_file_with_monitoring(
        &self,
        workspace_id: Uuid,
        path: &str,
        content: &[u8],
        metadata: FileMetadata, // store_file requires metadata, so it is threaded through
        user_id: Option<Uuid>,
        request_id: Option<String>,
    ) -> Result<FileStorageResult> {
        STORAGE_OPS.inc();
        let timer = STORAGE_LATENCY.start_timer();
        let start_time = Instant::now();

        // CODI logging for monitoring
        CoditecLogger::log(LogEntry {
            timestamp: Utc::now(),
            level: LogLevel::INFO,
            component: "storage.monitoring".to_string(),
            action: "storage_operation_start".to_string(),
            user_id: user_id.map(|id| id.to_string()),
            tenant_id: Some(workspace_id.to_string()),
            request_id: request_id.clone(), // reused in the slow-operation log below
            session_id: None,
            result: "pending".to_string(),
            duration_ms: None,
            details: Some(json!({
                "operation": "store_file",
                "path": path,
                "content_size": content.len(),
                "metrics": {
                    "ops_total": STORAGE_OPS.get(),
                    // The histogram exposes sum and count; percentiles such as
                    // p99 are derived downstream, e.g. by Prometheus queries.
                    "latency_sum_ms": STORAGE_LATENCY.get_sample_sum() * 1000.0
                }
            })),
            error: None,
        }).await;

        let result = self.store_file(workspace_id, path, content, metadata).await;

        timer.observe_duration();
        let duration_ms = start_time.elapsed().as_millis() as i64;

        // Log performance metrics to CODI
        if duration_ms > 100 {
            CoditecLogger::log(LogEntry {
                timestamp: Utc::now(),
                level: LogLevel::WARN,
                component: "storage.performance".to_string(),
                action: "slow_storage_operation".to_string(),
                user_id: user_id.map(|id| id.to_string()),
                tenant_id: Some(workspace_id.to_string()),
                request_id,
                session_id: None,
                result: if result.is_ok() { "success" } else { "failure" }.to_string(),
                duration_ms: Some(duration_ms),
                details: Some(json!({
                    "path": path,
                    "threshold_ms": 100,
                    "actual_ms": duration_ms
                })),
                error: None,
            }).await;
        }

        result
    }
}
```

9.2 CODI Logging Patterns​

```rust
// Standard CODI logging for storage operations
use chrono::Utc;
use serde_json::json;
use uuid::Uuid;

use crate::logging::{CoditecLogger, LogEntry, LogLevel};

pub struct StorageLogger;

impl StorageLogger {
    pub async fn log_operation(
        component: &str,
        action: &str,
        workspace_id: Uuid,
        details: serde_json::Value,
    ) {
        CoditecLogger::log(LogEntry {
            timestamp: Utc::now(),
            level: LogLevel::INFO,
            component: format!("storage.{}", component),
            action: action.to_string(),
            user_id: None,
            tenant_id: Some(workspace_id.to_string()),
            request_id: None,
            session_id: None,
            result: "success".to_string(),
            duration_ms: None,
            details: Some(details),
            error: None,
        }).await;
    }

    pub async fn log_error(
        component: &str,
        action: &str,
        workspace_id: Uuid,
        // anyhow::Error carries a source chain; a bare &dyn std::error::Error
        // has no `chain()` method.
        error: &anyhow::Error,
    ) {
        CoditecLogger::log(LogEntry {
            timestamp: Utc::now(),
            level: LogLevel::ERROR,
            component: format!("storage.{}", component),
            action: action.to_string(),
            user_id: None,
            tenant_id: Some(workspace_id.to_string()),
            request_id: None,
            session_id: None,
            result: "failure".to_string(),
            duration_ms: None,
            details: None,
            error: Some(json!({
                "type": "StorageError",
                "message": error.to_string(),
                "chain": error.chain().map(|e| e.to_string()).collect::<Vec<_>>()
            })),
        }).await;
    }
}
```


10. Constraints for AI Implementation​

10.1 MUST Requirements​

  1. MUST use exactly 10,240 bytes as the FDB threshold
  2. MUST calculate SHA-256 for every file
  3. MUST verify hash on every read
  4. MUST use workspace ID in all storage keys
  5. MUST handle all error cases gracefully

10.2 MUST NOT Requirements​

  1. MUST NOT store >10KB files in FoundationDB
  2. MUST NOT skip integrity verification
  3. MUST NOT allow path traversal attacks
  4. MUST NOT expose internal storage paths
  5. MUST NOT mix workspace data

10.3 Test Coverage Requirements​

  • Unit tests for threshold behavior
  • Integration tests for all three tiers
  • Performance tests for latency targets
  • Security tests for access control
  • Failure tests for each storage tier



Approval Signatures​

Technical Approval​

| Role | Name | Signature | Date |
|------|------|-----------|------|
| Author | AI System (Claude) | _______________ | 2025-08-31 |
| Tech Lead | _______________ | _______________ | __________ |
| Storage Engineer | _______________ | _______________ | __________ |
| Security Engineer | _______________ | _______________ | __________ |
| SRE Lead | _______________ | _______________ | __________ |

Implementation Sign-off​

| Component | Owner | Test Coverage | Sign-off Date |
|-----------|-------|---------------|---------------|
| Storage Router | _______________ | ____% | __________ |
| File Repository | _______________ | ____% | __________ |
| Git Integration | _______________ | ____% | __________ |
| GCS Integration | _______________ | ____% | __________ |
| Security Controls | _______________ | ____% | __________ |

Review History​

| Version | Date | Changes | Reviewer |
|---------|------|---------|----------|
| 1.0.0 | 2025-08-31 | Initial conversion from single-file ADR | AI System |
| 1.1.0 | 2025-08-31 | Added CODI logging integration throughout | AI System |

This technical implementation blueprint provides exact specifications for CODITECT's three-tier storage architecture. All code must be implemented as specified with complete test coverage.