CODITECT Standard: Factual Grounding Requirements

Standard-ID: STD-FACTUAL-001
Version: 1.0.0
Status: APPROVED
Effective-Date: 2025-12-19
Enforcement: MANDATORY
Scope: All CODITECT factual claims and evidence-based outputs
Owner: AZ1.AI INC
Review-Cycle: Quarterly
Parent-Standard: CODITECT-STANDARD-TRUST-AND-TRANSPARENCY.md
Related-Standards:
- CODITECT-STANDARD-AMBIGUITY-HANDLING.md
- CODITECT-STANDARD-LOGICAL-INFERENCE.md
Related-ADRs:
- ADR-011-UNCERTAINTY-QUANTIFICATION-FRAMEWORK
- ADR-012-MOE-ANALYSIS-FRAMEWORK
Research-Foundation:
- docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md

Governing Principle

Every factual claim must be traceable to a source. No exceptions.

Ungrounded assertions erode trust. When a claim cannot be verified, it must be explicitly marked as inferred and the reasoning chain documented. Processing time is never an excuse for unverified claims.


1. Purpose and Scope

1.1 Purpose

This standard defines the requirements for grounding all factual claims in verifiable evidence, ensuring that:

  1. Every factual assertion has a source or is explicitly marked as inferred
  2. Sources are classified by reliability using a tiered system
  3. Citation formats are consistent and machine-parseable
  4. Claim verification is auditable through evidence trails
  5. Ungrounded claims are prohibited in production outputs

1.2 Scope

This standard applies to:

  • All factual claims in CODITECT-generated content
  • Technical recommendations and best practices
  • Statistical assertions and benchmarks
  • Historical facts and attributions
  • Code examples claimed to follow standards or patterns
  • Tool and library version information

1.3 Out of Scope

  • Logical inferences (covered by CODITECT-STANDARD-LOGICAL-INFERENCE.md)
  • Subjective opinions clearly marked as such
  • Hypothetical scenarios explicitly identified
  • User-provided information (responsibility of user)

2. Source Classification System

2.1 Reliability Tiers

All sources must be classified into one of four reliability tiers:

| Tier | Reliability | Description | Required Evidence |
|------|-------------|-------------|-------------------|
| Tier 1 | 95-100% | Authoritative primary sources | Full citation required |
| Tier 2 | 85-94% | Reputable secondary sources | Citation + validation note |
| Tier 3 | 70-84% | Industry/community sources | Citation + reliability caveat |
| Inferred | <70% | No direct evidence | Full reasoning chain required |
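The tier boundaries above can be applied mechanically. A minimal sketch (thresholds are taken from the table; the function name is illustrative, not part of this standard):

```python
def classify_tier(reliability: float) -> str:
    """Map a reliability percentage to the tier label defined above."""
    if reliability >= 95:
        return "Tier 1"
    if reliability >= 85:
        return "Tier 2"
    if reliability >= 70:
        return "Tier 3"
    return "Inferred"
```

A source scoring below 70% is never cited as direct evidence; it drops to the Inferred path in Section 2.5.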

2.2 Tier 1: Authoritative Primary Sources (95-100%)

Characteristics:

  • Peer-reviewed academic publications
  • Official documentation from technology creators
  • Government/regulatory publications
  • Standards body specifications (ISO, IEEE, W3C, etc.)

Examples:

  • ACL, EMNLP, NeurIPS, ICLR conference papers
  • Official React documentation (react.dev)
  • OWASP Security Guidelines
  • RFC specifications (IETF)
  • ISO 27001 standards

Citation Format:

**Source:** [Title](URL)
**Venue:** [Journal/Conference/Organization]
**Date:** [Publication Date]
**Reliability:** Tier 1 (95%)
**Verification:** [How this was verified]

2.3 Tier 2: Reputable Secondary Sources (85-94%)

Characteristics:

  • Industry leader publications (Google, Microsoft, AWS blogs)
  • Well-established tech media (InfoQ, ThoughtWorks Radar)
  • Recognized expert blogs with track record
  • Corporate research publications

Examples:

  • Google AI Blog
  • AWS Architecture Blog
  • ThoughtWorks Technology Radar
  • Martin Fowler's blog
  • Meta Engineering Blog

Citation Format:

**Source:** [Title](URL)
**Author/Org:** [Author or Organization]
**Date:** [Publication Date]
**Reliability:** Tier 2 (88%)
**Validation Note:** [Why this source is trusted]

2.4 Tier 3: Industry/Community Sources (70-84%)

Characteristics:

  • Stack Overflow accepted answers with high votes
  • Popular Medium/Dev.to articles
  • GitHub README documentation
  • Community tutorials and guides

Examples:

  • Stack Overflow answers (>50 votes)
  • Dev.to tutorials
  • GitHub project READMEs
  • Hashnode articles

Citation Format:

**Source:** [Title](URL)
**Author:** [Author]
**Date:** [Publication Date]
**Reliability:** Tier 3 (75%)
**Caveat:** [Limitations of this source]
**Cross-Reference:** [Additional source if available]

2.5 Inferred: No Direct Evidence (<70%)

When Used:

  • No credible source found
  • Source is too dated to be reliable
  • Claim is derived through logical inference
  • Domain heuristics applied

Requirements:

  • Must use CODITECT-STANDARD-LOGICAL-INFERENCE.md format
  • Must explicitly state "INFERRED" in certainty marker
  • Must provide falsification criteria
  • Must document assumptions

Citation Format:

**Certainty:** [X%] (INFERRED)
**Basis:** Logical inference from available evidence
**Reasoning Chain:** [See Section X]
**Assumptions:** [List of assumptions]
**Falsification:** [What would disprove this]

3. Citation Requirements

3.1 Required Fields by Tier

| Field | Tier 1 | Tier 2 | Tier 3 | Inferred |
|-------|--------|--------|--------|----------|
| URL | Required | Required | Required | N/A |
| Title | Required | Required | Required | N/A |
| Author/Org | Required | Required | Optional | N/A |
| Date | Required | Required | Required | N/A |
| Venue | Required | Optional | Optional | N/A |
| Reliability % | Required | Required | Required | Required |
| Verification | Required | Optional | Optional | N/A |
| Caveat | Optional | Optional | Required | N/A |
| Reasoning Chain | N/A | N/A | N/A | Required |

3.2 Date Freshness Requirements

| Domain | Maximum Age | Action if Exceeded |
|--------|-------------|--------------------|
| AI/ML Research | 18 months | Flag as potentially outdated |
| Framework Versions | 12 months | Verify current version |
| Security Vulnerabilities | 6 months | Require current status check |
| API Documentation | 6 months | Verify against live docs |
| Industry Best Practices | 24 months | Note evolving landscape |
| Fundamental CS Concepts | No limit | N/A |
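The age limits above lend themselves to an automated freshness check. A minimal sketch, assuming a whole-month age calculation (the domain keys and function name are illustrative):

```python
from datetime import date

# Maximum source age per domain, in months, per the table above (None = no limit)
MAX_AGE_MONTHS = {
    "ai_ml_research": 18,
    "framework_versions": 12,
    "security_vulnerabilities": 6,
    "api_documentation": 6,
    "industry_best_practices": 24,
    "fundamental_cs": None,
}

def is_fresh(domain: str, published: date, today: date) -> bool:
    """Return True if the source is within the domain's maximum age."""
    limit = MAX_AGE_MONTHS[domain]
    if limit is None:
        return True
    age_months = (today.year - published.year) * 12 + (today.month - published.month)
    return age_months <= limit
```

A stale result does not forbid citation; it triggers the "Action if Exceeded" column (flag, re-verify, or caveat).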

3.3 Inline Citation Format

For concise inline citations within text:

According to Kuhn et al. (NeurIPS 2023, Tier 1), semantic entropy provides
a reliable uncertainty measure for free-form text generation.

3.4 Full Citation Block

For detailed evidence blocks:

## Evidence: [Claim Statement]

**Certainty:** 92% (HIGH)
**Source Type:** Tier 1 - Peer-reviewed research

### Source Details
- **Title:** Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in NLG
- **URL:** https://arxiv.org/abs/2302.09664
- **Authors:** Kuhn, L., Gal, Y., Farquhar, S.
- **Venue:** NeurIPS 2023
- **Date:** 2023-06-15

### Verification
- Paper accessed and reviewed: 2025-12-19
- Claims verified against Section 4.2
- Methodology reproducible per published code

### Relevance
Directly supports the claim that semantic clustering improves
uncertainty estimation over token-level entropy.

4. Verification Protocols

4.1 URL Verification

Before citing any URL:

  1. Confirm URL is accessible (HTTP 200)
  2. Verify content matches cited claim
  3. Check for paywall/authentication requirements
  4. Note if archived version used

For Inaccessible URLs:

**Source Status:** Archived
**Archive URL:** [Wayback Machine link]
**Original URL:** [Original link]
**Archive Date:** [When archived]
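Steps 1 and 3 above can be scripted with the standard library. A minimal sketch, assuming a HEAD request suffices and treating 401/402/403 as paywall/authentication responses (the status labels and function names are illustrative; step 2, confirming the content matches the claim, remains a manual review):

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def classify_status(status: int) -> str:
    """Map an HTTP status code to a citation-check result."""
    if status == 200:
        return "OK"
    if status in (401, 402, 403):
        return "PAYWALLED"  # needs auth/payment; record an archive link instead
    return "BROKEN"

def check_url(url: str, timeout: float = 10.0) -> str:
    """Perform step 1 (accessibility) and step 3 (paywall) for a cited URL."""
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "cite-check"})
        status = urlopen(req, timeout=timeout).status
    except HTTPError as e:
        status = e.code
    except URLError:
        return "BROKEN"
    return classify_status(status)
```

A `BROKEN` result routes the citation to the archived-source format shown above.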

4.2 Version Verification

For framework/library claims:

**Library:** React
**Claimed Version:** 18.2.0
**Verification Date:** 2025-12-19
**Current Latest:** [Check npm registry]
**Version Status:** Current | Outdated | Deprecated
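The "Current | Outdated" determination can be derived once the latest version is known (for npm packages, the registry's `https://registry.npmjs.org/<package>/latest` endpoint reports it). A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` versions with no pre-release suffixes (function names are illustrative):

```python
def parse_semver(version: str) -> tuple:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def version_status(claimed: str, latest: str) -> str:
    """Fill the Version Status field by comparing claimed vs. latest."""
    if parse_semver(claimed) == parse_semver(latest):
        return "Current"
    return "Outdated"
```

Deprecation is not derivable from version numbers alone; that status still requires checking the package's own documentation.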

4.3 Cross-Verification

For Tier 2-3 claims, cross-verify when possible:

**Primary Source:** [Source A]
**Cross-Reference:** [Source B confirms same claim]
**Agreement Level:** Full | Partial | Contradictory

5. Prohibited Patterns

5.1 Never Do This

| Anti-Pattern | Example | Why Wrong |
|--------------|---------|-----------|
| Unattributed claims | "Studies show that..." | No specific study cited |
| Vague authority | "Experts recommend..." | No specific expert identified |
| Self-referential | "It is well known that..." | No evidence provided |
| Assumed consensus | "Everyone agrees that..." | No verification of consensus |
| Outdated references | "According to 2018 best practices..." | May be obsolete |
| Broken links | URL returns 404 | Unverifiable |
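The phrase-based anti-patterns above can be caught with a simple lint pass. A minimal sketch (the phrase list is a hypothetical starter set drawn from the examples in the table, not an exhaustive catalogue):

```python
import re

# Starter patterns for the vague-authority anti-patterns listed above
PROHIBITED = [
    r"\bstudies show\b",
    r"\bexperts (recommend|agree)\b",
    r"\bit is well known\b",
    r"\beveryone agrees\b",
]

def find_violations(text: str) -> list:
    """Return the prohibited patterns matched in a draft (case-insensitive)."""
    return [p for p in PROHIBITED if re.search(p, text, re.IGNORECASE)]
```

A clean scan does not prove compliance; it only removes the most obvious ungrounded phrasings before manual review.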

5.2 Examples of Violations

WRONG:

React is the best framework for building user interfaces because
it's the most popular and has the best performance.

Violations:

  • "best" - subjective without criteria
  • "most popular" - no citation
  • "best performance" - no benchmark cited

CORRECT:

React is widely adopted for building user interfaces. According to
the 2024 State of JS Survey (Tier 2, 87%), React was used by 82.1%
of respondents. Performance benchmarks from Krausest (2024, Tier 2)
show React 18 with concurrent features achieving render times of
X ms in the JS Framework Benchmark.

**Certainty:** 88% (HIGH)
**Limitations:** Surveys represent self-selected respondents;
benchmarks measure synthetic workloads.

6. Evidence Schemas

6.1 JSON Schema for Evidence

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["claim", "certainty", "evidence"],
  "properties": {
    "claim": {
      "type": "string",
      "description": "The factual claim being made"
    },
    "certainty": {
      "type": "object",
      "required": ["score", "level", "basis"],
      "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 100},
        "level": {"enum": ["HIGH", "MEDIUM", "LOW", "INFERRED"]},
        "basis": {"enum": ["evidence_backed", "cross_verified", "inferred"]}
      }
    },
    "evidence": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["url", "title", "tier", "reliability"],
        "properties": {
          "url": {"type": "string", "format": "uri"},
          "title": {"type": "string"},
          "tier": {"enum": [1, 2, 3]},
          "reliability": {"type": "number", "minimum": 70, "maximum": 100},
          "venue": {"type": "string"},
          "date": {"type": "string", "format": "date"},
          "verification_date": {"type": "string", "format": "date"},
          "summary": {"type": "string"}
        }
      }
    },
    "missing_information": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Information that would increase certainty"
    }
  }
}
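In practice a full Draft-07 validator (such as the third-party `jsonschema` package) would enforce this schema. As a dependency-free sketch, a minimal structural check covering the required fields and enums (function name is illustrative; this is not a complete validator):

```python
def check_evidence(doc: dict) -> list:
    """Lightweight structural check against the evidence schema above."""
    errors = []
    for key in ("claim", "certainty", "evidence"):
        if key not in doc:
            errors.append(f"missing required field: {key}")
    cert = doc.get("certainty", {})
    for key in ("score", "level", "basis"):
        if key not in cert:
            errors.append(f"certainty missing: {key}")
    if not 0 <= cert.get("score", -1) <= 100:
        errors.append("certainty.score out of range 0-100")
    if cert.get("level") not in ("HIGH", "MEDIUM", "LOW", "INFERRED"):
        errors.append("certainty.level not in enum")
    evidence = doc.get("evidence", [])
    if len(evidence) < 1:
        errors.append("evidence requires at least one item")
    for i, item in enumerate(evidence):
        for key in ("url", "title", "tier", "reliability"):
            if key not in item:
                errors.append(f"evidence[{i}] missing: {key}")
    return errors
```

An empty error list satisfies the "Evidence schema validates" item in the Section 8.3 checklist.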

6.2 Markdown Template

## Claim: [Statement]

**Certainty:** [X%] ([LEVEL])
**Basis:** Evidence-backed | Cross-verified | Inferred

### Evidence

| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [Title](URL) | [1-3] | [X%] | [How it supports claim] |
| 2 | [Title](URL) | [1-3] | [X%] | [How it supports claim] |

### Gaps

- [Information that would increase certainty]
- [Additional verification needed]

### Verification Status

- [ ] URLs verified accessible
- [ ] Content confirms claim
- [ ] Date freshness checked
- [ ] Cross-references found (if Tier 2-3)

7. Compliance Scoring

7.1 Grading Criteria

| Grade | Score | Criteria |
|-------|-------|----------|
| A | 95-100% | All claims cited, all Tier 1-2, verification complete |
| B | 85-94% | All claims cited, mostly Tier 1-2, minor gaps |
| C | 70-84% | Most claims cited, some Tier 3, some gaps |
| D | 60-69% | Many claims uncited, overreliance on Tier 3 |
| F | <60% | Multiple ungrounded claims |
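The score-to-grade mapping follows directly from the table. A minimal sketch (the function name is illustrative; how the underlying score is computed is out of scope here):

```python
def grade(score: float) -> str:
    """Map a compliance score (0-100) to the letter grade in the table above."""
    if score >= 95:
        return "A"
    if score >= 85:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```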

7.2 Minimum Requirements

Production Outputs:

  • Grade B (85%) minimum
  • No more than 20% Tier 3 sources
  • All inferred claims have reasoning chains
  • All URLs verified within 30 days

8. Verification Checklist

8.1 Before Claiming

  • Identified the specific claim being made
  • Located credible source(s)
  • Classified source by reliability tier
  • Verified URL accessibility
  • Confirmed content supports claim
  • Checked date freshness

8.2 During Documentation

  • Used correct citation format
  • Included all required fields
  • Added caveats for Tier 3
  • Documented any gaps
  • Calculated certainty score

8.3 Before Delivery

  • All claims have sources or marked INFERRED
  • No prohibited patterns present
  • Cross-verification where applicable
  • Grade meets minimum (B or higher)
  • Evidence schema validates

9. Examples

9.1 Grade A Citation (Exemplary)

## Claim: Semantic entropy outperforms verbalized confidence for hallucination detection

**Certainty:** 95% (HIGH)
**Basis:** Evidence-backed with cross-verification

### Evidence

| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [Semantic Uncertainty](https://arxiv.org/abs/2302.09664) | 1 | 95% | Reports 0.74 AUROC vs 0.67 for verbalized confidence |
| 2 | [Detecting Hallucinations](https://arxiv.org/abs/2310.03368) | 1 | 93% | Confirms semantic clustering superiority in multi-domain evaluation |

### Verification

- URLs verified: 2025-12-19
- Papers reviewed in full
- Methodology confirmed reproducible
- No contradicting studies found in literature search

### Limitations

- Results specific to English language models
- Tested on GPT-3.5/4 and Llama 2 only
- May not generalize to domain-specific models

9.2 Grade C Citation (Needs Improvement)

## Claim: Most developers prefer React

**Certainty:** 72% (MEDIUM)
**Basis:** Evidence-backed (Tier 3)

### Evidence

| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [State of JS 2024](https://stateofjs.com) | 3 | 75% | Self-reported usage: 82% |

### Issues Identified

- Single source only
- Self-selected survey respondents
- "Prefer" not directly measured (usage ≠ preference)
- No cross-verification

### Improvement Actions

- Find additional survey sources
- Clarify "preference" vs "usage"
- Add caveat about survey methodology

10. Related Standards

| Standard | Relationship |
|----------|--------------|
| CODITECT-STANDARD-TRUST-AND-TRANSPARENCY.md | Parent principle this standard implements |
| CODITECT-STANDARD-AMBIGUITY-HANDLING.md | When sources are ambiguous |
| CODITECT-STANDARD-LOGICAL-INFERENCE.md | When inference replaces direct evidence |

11. Research Foundation

This standard is grounded in peer-reviewed research:

| Research | Venue | Contribution |
|----------|-------|--------------|
| Chain-of-Verification | ACL 2024 | Multi-step evidence validation |
| FactScore | ACL 2023 | Fine-grained fact verification |
| HHEM | Vectara 2024 | Hallucination detection metrics |
| MiniCheck | ACL 2024 | Efficient claim verification |

Full citations: See docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md


Document Version: 1.0.0
Last Updated: 2025-12-19
Author: CODITECT Standards Team
Enforcement: MANDATORY for all factual claims
Review Date: 2026-03-19