CODITECT Standard: Factual Grounding Requirements
Standard-ID: STD-FACTUAL-001
Version: 1.0.0
Status: APPROVED
Effective-Date: 2025-12-19
Enforcement: MANDATORY
Scope: All CODITECT factual claims and evidence-based outputs
Owner: AZ1.AI INC
Review-Cycle: Quarterly
Parent-Standard: CODITECT-STANDARD-TRUST-AND-TRANSPARENCY.md
Related-Standards:
- CODITECT-STANDARD-AMBIGUITY-HANDLING.md
- CODITECT-STANDARD-LOGICAL-INFERENCE.md
Related-ADRs:
- ADR-011-UNCERTAINTY-QUANTIFICATION-FRAMEWORK
- ADR-012-MOE-ANALYSIS-FRAMEWORK
Research-Foundation:
- docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md
Governing Principle
Every factual claim must be traceable to a source. No exceptions.
Ungrounded assertions erode trust. When a claim cannot be verified, it must be explicitly marked as inferred and the reasoning chain documented. Processing time is never an excuse for unverified claims.
1. Purpose and Scope
1.1 Purpose
This standard defines the requirements for grounding all factual claims in verifiable evidence, ensuring that:
- Every factual assertion has a source or is explicitly marked as inferred
- Sources are classified by reliability using a tiered system
- Citation formats are consistent and machine-parseable
- Claim verification is auditable through evidence trails
- Ungrounded claims are prohibited in production outputs
1.2 Scope
This standard applies to:
- All factual claims in CODITECT-generated content
- Technical recommendations and best practices
- Statistical assertions and benchmarks
- Historical facts and attributions
- Code examples claimed to follow standards or patterns
- Tool and library version information
1.3 Out of Scope
- Logical inferences (covered by CODITECT-STANDARD-LOGICAL-INFERENCE.md)
- Subjective opinions clearly marked as such
- Hypothetical scenarios explicitly identified
- User-provided information (responsibility of user)
2. Source Classification System
2.1 Reliability Tiers
All sources must be classified into one of three reliability tiers, or marked as Inferred when no direct evidence exists:
| Tier | Reliability | Description | Required Evidence |
|---|---|---|---|
| Tier 1 | 95-100% | Authoritative primary sources | Full citation required |
| Tier 2 | 85-94% | Reputable secondary sources | Citation + validation note |
| Tier 3 | 70-84% | Industry/community sources | Citation + reliability caveat |
| Inferred | <70% | No direct evidence | Full reasoning chain required |
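The tier boundaries above can be applied mechanically. The sketch below is illustrative only; the function name and labels are not mandated by this standard:

```python
def classify_tier(reliability: float) -> str:
    """Map a reliability score (0-100) to a tier per the table above."""
    if reliability >= 95:
        return "Tier 1"
    if reliability >= 85:
        return "Tier 2"
    if reliability >= 70:
        return "Tier 3"
    return "Inferred"
```

Note that Inferred is not a score below 70% on a source; it is the absence of a direct source, which is why the Inferred row carries a full reasoning-chain requirement rather than a citation.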
2.2 Tier 1: Authoritative Primary Sources (95-100%)
Characteristics:
- Peer-reviewed academic publications
- Official documentation from technology creators
- Government/regulatory publications
- Standards body specifications (ISO, IEEE, W3C, etc.)
Examples:
- ACL, EMNLP, NeurIPS, ICLR conference papers
- Official React documentation (react.dev)
- OWASP Security Guidelines
- RFC specifications (IETF)
- ISO 27001 standards
Citation Format:
**Source:** [Title](URL)
**Venue:** [Journal/Conference/Organization]
**Date:** [Publication Date]
**Reliability:** Tier 1 (95%)
**Verification:** [How this was verified]
2.3 Tier 2: Reputable Secondary Sources (85-94%)
Characteristics:
- Industry leader publications (Google, Microsoft, AWS blogs)
- Well-established tech media (InfoQ, ThoughtWorks Radar)
- Recognized expert blogs with track record
- Corporate research publications
Examples:
- Google AI Blog
- AWS Architecture Blog
- ThoughtWorks Technology Radar
- Martin Fowler's blog
- Meta Engineering Blog
Citation Format:
**Source:** [Title](URL)
**Author/Org:** [Author or Organization]
**Date:** [Publication Date]
**Reliability:** Tier 2 (88%)
**Validation Note:** [Why this source is trusted]
2.4 Tier 3: Industry/Community Sources (70-84%)
Characteristics:
- Stack Overflow accepted answers with high votes
- Popular Medium/Dev.to articles
- GitHub README documentation
- Community tutorials and guides
Examples:
- Stack Overflow answers (>50 votes)
- Dev.to tutorials
- GitHub project READMEs
- Hashnode articles
Citation Format:
**Source:** [Title](URL)
**Author:** [Author]
**Date:** [Publication Date]
**Reliability:** Tier 3 (75%)
**Caveat:** [Limitations of this source]
**Cross-Reference:** [Additional source if available]
2.5 Inferred: No Direct Evidence (<70%)
When Used:
- No credible source found
- Source is too dated to be reliable
- Claim is derived through logical inference
- Domain heuristics applied
Requirements:
- Must use CODITECT-STANDARD-LOGICAL-INFERENCE.md format
- Must explicitly state "INFERRED" in certainty marker
- Must provide falsification criteria
- Must document assumptions
Citation Format:
**Certainty:** [X%] (INFERRED)
**Basis:** Logical inference from available evidence
**Reasoning Chain:** [See Section X]
**Assumptions:** [List of assumptions]
**Falsification:** [What would disprove this]
3. Citation Requirements
3.1 Required Fields by Tier
| Field | Tier 1 | Tier 2 | Tier 3 | Inferred |
|---|---|---|---|---|
| URL | Required | Required | Required | N/A |
| Title | Required | Required | Required | N/A |
| Author/Org | Required | Required | Optional | N/A |
| Date | Required | Required | Required | N/A |
| Venue | Required | Optional | Optional | N/A |
| Reliability % | Required | Required | Required | Required |
| Verification | Required | Optional | Optional | N/A |
| Caveat | Optional | Optional | Required | N/A |
| Reasoning Chain | N/A | N/A | N/A | Required |
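The Required cells of the field matrix can be enforced as a completeness check. The dictionary keys below are illustrative names for the matrix fields, not a mandated schema:

```python
# Required citation fields per tier, encoding the Required cells above.
REQUIRED_FIELDS = {
    1: {"url", "title", "author_org", "date", "venue", "reliability", "verification"},
    2: {"url", "title", "author_org", "date", "reliability"},
    3: {"url", "title", "date", "reliability", "caveat"},
    "inferred": {"reliability", "reasoning_chain"},
}

def missing_fields(tier, citation: dict) -> set:
    """Return the required fields absent from a citation dict."""
    return REQUIRED_FIELDS[tier] - citation.keys()
```

A citation with any missing required field fails validation for its tier before content is even examined.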
3.2 Date Freshness Requirements
| Domain | Maximum Age | Action if Exceeded |
|---|---|---|
| AI/ML Research | 18 months | Flag as potentially outdated |
| Framework Versions | 12 months | Verify current version |
| Security Vulnerabilities | 6 months | Require current status check |
| API Documentation | 6 months | Verify against live docs |
| Industry Best Practices | 24 months | Note evolving landscape |
| Fundamental CS Concepts | No limit | N/A |
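The freshness limits above reduce to a month-count comparison. A minimal sketch, with domain keys chosen here for illustration:

```python
from datetime import date

# Maximum source age in months per the freshness table (None = no limit).
MAX_AGE_MONTHS = {
    "ai_ml_research": 18,
    "framework_versions": 12,
    "security_vulnerabilities": 6,
    "api_documentation": 6,
    "industry_best_practices": 24,
    "fundamental_cs": None,
}

def is_fresh(domain: str, published: date, today: date) -> bool:
    """True if the source is within the domain's maximum age."""
    limit = MAX_AGE_MONTHS[domain]
    if limit is None:
        return True
    age = (today.year - published.year) * 12 + (today.month - published.month)
    return age <= limit
```

A stale result does not disqualify the source outright; it triggers the Action column (flag, re-verify, or caveat).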
3.3 Inline Citation Format
For concise inline citations within text:
According to Kuhn et al. (ICLR 2023, Tier 1), semantic entropy provides
a reliable uncertainty measure for free-form text generation.
3.4 Full Citation Block
For detailed evidence blocks:
## Evidence: [Claim Statement]
**Certainty:** 92% (HIGH)
**Source Type:** Tier 1 - Peer-reviewed research
### Source Details
- **Title:** Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in NLG
- **URL:** https://arxiv.org/abs/2302.09664
- **Authors:** Kuhn, L., Gal, Y., Farquhar, S.
- **Venue:** ICLR 2023
- **Date:** 2023-06-15
### Verification
- Paper accessed and reviewed: 2025-12-19
- Claims verified against Section 4.2
- Methodology reproducible per published code
### Relevance
Directly supports the claim that semantic clustering improves
uncertainty estimation over token-level entropy.
4. Verification Protocols
4.1 URL Verification
Before citing any URL:
- Confirm URL is accessible (HTTP 200)
- Verify content matches cited claim
- Check for paywall/authentication requirements
- Note if archived version used
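The checklist above can be partially automated. The sketch below issues a HEAD request with the standard library and maps the status code to the checklist's action; the action strings and user-agent value are illustrative assumptions:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def interpret_status(code: int) -> str:
    """Map an HTTP status code to the verification action implied above."""
    if code == 200:
        return "accessible"
    if code in (301, 302, 307, 308):
        return "redirect: verify the final target"
    if code in (401, 403):
        return "paywall/authentication: note in citation"
    if code == 404:
        return "broken: substitute an archived version"
    return f"check manually (HTTP {code})"

def check_url(url: str, timeout: float = 10.0) -> str:
    """Issue a HEAD request and interpret the result (network sketch)."""
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "citation-check"})
        with urlopen(req, timeout=timeout) as resp:
            return interpret_status(resp.status)
    except HTTPError as exc:
        return interpret_status(exc.code)
    except URLError as exc:
        return f"unreachable: {exc.reason}"
```

Automated checks confirm accessibility only; confirming that the content supports the cited claim remains a manual step.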
For Inaccessible URLs:
**Source Status:** Archived
**Archive URL:** [Wayback Machine link]
**Original URL:** [Original link]
**Archive Date:** [When archived]
4.2 Version Verification
For framework/library claims:
**Library:** React
**Claimed Version:** 18.2.0
**Verification Date:** 2025-12-19
**Current Latest:** [Check npm registry]
**Version Status:** Current | Outdated | Deprecated
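The Version Status determination can be sketched as a semantic-version comparison; this assumes plain `MAJOR.MINOR.PATCH` strings and an externally supplied deprecation list:

```python
def parse_semver(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def version_status(claimed: str, latest: str, deprecated=()) -> str:
    """Classify a claimed version against the registry's latest release."""
    if claimed in deprecated:
        return "Deprecated"
    return "Current" if parse_semver(claimed) >= parse_semver(latest) else "Outdated"
```

The `latest` value must come from a live registry check on the verification date, not from memory.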
4.3 Cross-Verification
For Tier 2-3 claims, cross-verify when possible:
**Primary Source:** [Source A]
**Cross-Reference:** [Source B confirms same claim]
**Agreement Level:** Full | Partial | Contradictory
5. Prohibited Patterns
5.1 Never Do This
| Anti-Pattern | Example | Why Wrong |
|---|---|---|
| Unattributed claims | "Studies show that..." | No specific study cited |
| Vague authority | "Experts recommend..." | No specific expert identified |
| Self-referential | "It is well known that..." | No evidence provided |
| Assumed consensus | "Everyone agrees that..." | No verification of consensus |
| Outdated references | "According to 2018 best practices..." | May be obsolete |
| Broken links | URL returns 404 | Unverifiable |
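The textual anti-patterns above lend themselves to a first-pass lint. The regular expressions below cover only the table's examples and will miss paraphrases; treat this as a screening sketch, not an enforcement mechanism:

```python
import re

# Phrases from the anti-pattern table; extend as new patterns are catalogued.
UNGROUNDED_PATTERNS = {
    "unattributed claim": r"\bstudies (show|suggest)\b",
    "vague authority": r"\bexperts (recommend|agree)\b",
    "self-referential": r"\bit is well[- ]known\b",
    "assumed consensus": r"\beveryone (agrees|knows)\b",
}

def flag_ungrounded(text: str) -> list:
    """Return the names of anti-patterns detected in a draft."""
    return [name for name, pattern in UNGROUNDED_PATTERNS.items()
            if re.search(pattern, text, re.IGNORECASE)]
```

A clean lint result does not imply compliance; it only means none of the catalogued phrasings appeared.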
5.2 Examples of Violations
WRONG:
React is the best framework for building user interfaces because
it's the most popular and has the best performance.
Violations:
- "best" - subjective without criteria
- "most popular" - no citation
- "best performance" - no benchmark cited
CORRECT:
React is widely adopted for building user interfaces. According to
the 2024 State of JS Survey (Tier 2, 87%), React was used by 82.1%
of respondents. Performance benchmarks from Krausest (2024, Tier 2)
show React 18 with concurrent features achieving render times of
X ms in the JS Framework Benchmark.
**Certainty:** 88% (HIGH)
**Limitations:** Surveys represent self-selected respondents;
benchmarks measure synthetic workloads.
6. Evidence Schemas
6.1 JSON Schema for Evidence
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["claim", "certainty", "evidence"],
"properties": {
"claim": {
"type": "string",
"description": "The factual claim being made"
},
"certainty": {
"type": "object",
"required": ["score", "level", "basis"],
"properties": {
"score": {"type": "number", "minimum": 0, "maximum": 100},
"level": {"enum": ["HIGH", "MEDIUM", "LOW", "INFERRED"]},
"basis": {"enum": ["evidence_backed", "cross_verified", "inferred"]}
}
},
"evidence": {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"required": ["url", "title", "tier", "reliability"],
"properties": {
"url": {"type": "string", "format": "uri"},
"title": {"type": "string"},
"tier": {"enum": [1, 2, 3]},
"reliability": {"type": "number", "minimum": 70, "maximum": 100},
"venue": {"type": "string"},
"date": {"type": "string", "format": "date"},
"verification_date": {"type": "string", "format": "date"},
"summary": {"type": "string"}
}
}
},
"missing_information": {
"type": "array",
"items": {"type": "string"},
"description": "Information that would increase certainty"
}
}
}
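A full Draft-07 validator (for example, the Python `jsonschema` package) can enforce the schema directly. The dependency-free sketch below checks only the constraints most often violated in practice: required keys, the tier enum, and the reliability range:

```python
def validate_evidence_record(record: dict) -> list:
    """Return human-readable schema violations (subset of the full schema)."""
    errors = []
    for key in ("claim", "certainty", "evidence"):
        if key not in record:
            errors.append(f"missing required field: {key}")
    if "evidence" in record and not record["evidence"]:
        errors.append("evidence: minItems is 1")
    for i, item in enumerate(record.get("evidence", [])):
        if item.get("tier") not in (1, 2, 3):
            errors.append(f"evidence[{i}]: tier must be 1, 2, or 3")
        reliability = item.get("reliability")
        if not isinstance(reliability, (int, float)) or not 70 <= reliability <= 100:
            errors.append(f"evidence[{i}]: reliability must be 70-100")
    return errors
```

An empty list means the record passed this subset check, not that the cited sources are sound.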
6.2 Markdown Template
## Claim: [Statement]
**Certainty:** [X%] ([LEVEL])
**Basis:** Evidence-backed | Cross-verified | Inferred
### Evidence
| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [Title](URL) | [1-3] | [X%] | [How it supports claim] |
| 2 | [Title](URL) | [1-3] | [X%] | [How it supports claim] |
### Gaps
- [Information that would increase certainty]
- [Additional verification needed]
### Verification Status
- [ ] URLs verified accessible
- [ ] Content confirms claim
- [ ] Date freshness checked
- [ ] Cross-references found (if Tier 2-3)
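Claim blocks stay consistent when generated from structured data rather than written by hand. A minimal renderer for the Evidence table, with illustrative dict keys:

```python
def render_evidence_table(sources: list) -> str:
    """Render the Evidence table rows of the template from source dicts."""
    lines = [
        "| # | Source | Tier | Reliability | Summary |",
        "|---|--------|------|-------------|---------|",
    ]
    for i, s in enumerate(sources, start=1):
        lines.append(
            f"| {i} | [{s['title']}]({s['url']}) | {s['tier']} "
            f"| {s['reliability']}% | {s['summary']} |"
        )
    return "\n".join(lines)
```

Generating the table from the same records used for schema validation keeps the rendered block and the machine-checked evidence in sync.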
7. Compliance Scoring
7.1 Grading Criteria
| Grade | Score | Criteria |
|---|---|---|
| A | 95-100% | All claims cited, all Tier 1-2, verification complete |
| B | 85-94% | All claims cited, mostly Tier 1-2, minor gaps |
| C | 70-84% | Most claims cited, some Tier 3, some gaps |
| D | 60-69% | Many claims uncited, overreliance on Tier 3 |
| F | <60% | Multiple ungrounded claims |
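The grade boundaries above map directly to score thresholds (illustrative sketch):

```python
def compliance_grade(score: float) -> str:
    """Map a compliance score (0-100) to a letter grade per the table above."""
    if score >= 95:
        return "A"
    if score >= 85:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```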
7.2 Minimum Requirements
Production Outputs:
- Grade B (85%) minimum
- No more than 20% Tier 3 sources
- All inferred claims have reasoning chains
- All URLs verified within 30 days
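The four production minimums above combine into a single gate. The source-record keys (`tier`, `verified_on`, `reasoning_chain`, `url`) are illustrative assumptions about how evidence records are stored:

```python
from datetime import date

def meets_production_bar(score: float, sources: list, today: date) -> list:
    """Check the production minimums; return a list of failures (empty = pass)."""
    failures = []
    if score < 85:
        failures.append("score below Grade B minimum (85)")
    tier3 = sum(1 for s in sources if s.get("tier") == 3)
    if sources and tier3 / len(sources) > 0.20:
        failures.append("more than 20% Tier 3 sources")
    for s in sources:
        if s.get("tier") == "inferred" and not s.get("reasoning_chain"):
            failures.append("inferred claim lacks reasoning chain")
        verified = s.get("verified_on")
        if verified and (today - verified).days > 30:
            failures.append(f"URL verification older than 30 days: {s.get('url')}")
    return failures
```

Reporting every failure, rather than stopping at the first, lets an author remediate a whole output in one pass.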
8. Verification Checklist
8.1 Before Claiming
- Identified the specific claim being made
- Located credible source(s)
- Classified source by reliability tier
- Verified URL accessibility
- Confirmed content supports claim
- Checked date freshness
8.2 During Documentation
- Used correct citation format
- Included all required fields
- Added caveats for Tier 3
- Documented any gaps
- Calculated certainty score
8.3 Before Delivery
- All claims have sources or marked INFERRED
- No prohibited patterns present
- Cross-verification where applicable
- Grade meets minimum (B or higher)
- Evidence schema validates
9. Examples
9.1 Grade A Citation (Exemplary)
## Claim: Semantic entropy outperforms verbalized confidence for hallucination detection
**Certainty:** 95% (HIGH)
**Basis:** Evidence-backed with cross-verification
### Evidence
| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [Semantic Uncertainty](https://arxiv.org/abs/2302.09664) | 1 | 95% | Reports 0.74 AUROC vs 0.67 for verbalized confidence |
| 2 | [Detecting Hallucinations](https://arxiv.org/abs/2310.03368) | 1 | 93% | Confirms semantic clustering superiority in multi-domain evaluation |
### Verification
- URLs verified: 2025-12-19
- Papers reviewed in full
- Methodology confirmed reproducible
- No contradicting studies found in literature search
### Limitations
- Results specific to English language models
- Tested on GPT-3.5/4 and Llama 2 only
- May not generalize to domain-specific models
9.2 Grade C Citation (Needs Improvement)
## Claim: Most developers prefer React
**Certainty:** 72% (MEDIUM)
**Basis:** Evidence-backed (Tier 3)
### Evidence
| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [State of JS 2024](https://stateofjs.com) | 3 | 75% | Self-reported usage: 82% |
### Issues Identified
- Single source only
- Self-selected survey respondents
- "Prefer" not directly measured (usage ≠ preference)
- No cross-verification
### Improvement Actions
- Find additional survey sources
- Clarify "preference" vs "usage"
- Add caveat about survey methodology
10. Related Standards
| Standard | Relationship |
|---|---|
| CODITECT-STANDARD-TRUST-AND-TRANSPARENCY.md | Parent principle this standard implements |
| CODITECT-STANDARD-AMBIGUITY-HANDLING.md | When sources are ambiguous |
| CODITECT-STANDARD-LOGICAL-INFERENCE.md | When inference replaces direct evidence |
11. Research Foundation
This standard is grounded in peer-reviewed research:
| Research | Venue | Contribution |
|---|---|---|
| Chain-of-Verification | ACL 2024 | Multi-step evidence validation |
| FactScore | ACL 2023 | Fine-grained fact verification |
| HHEM | Vectara 2024 | Hallucination detection metrics |
| MiniCheck | ACL 2024 | Efficient claim verification |
Full citations: See docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md
Document Version: 1.0.0
Last Updated: 2025-12-19
Author: CODITECT Standards Team
Enforcement: MANDATORY for all factual claims
Review Date: 2026-03-19