CODITECT Standard: Factual Grounding Requirements

Standard-ID: STD-FACTUAL-001
Version: 1.0.0
Status: APPROVED
Effective-Date: 2025-12-19
Enforcement: MANDATORY
Scope: All CODITECT factual claims and evidence-based outputs
Owner: AZ1.AI INC
Review-Cycle: Quarterly
Parent-Standard: CODITECT-STANDARD-TRUST-AND-TRANSPARENCY.md
Related-Standards:
- CODITECT-STANDARD-AMBIGUITY-HANDLING.md
- CODITECT-STANDARD-LOGICAL-INFERENCE.md
Related-ADRs:
- ADR-011-UNCERTAINTY-QUANTIFICATION-FRAMEWORK
- ADR-012-MOE-ANALYSIS-FRAMEWORK
Research-Foundation:
- docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md

Governing Principle

Every factual claim must be traceable to a source. No exceptions.

Ungrounded assertions erode trust. When a claim cannot be verified, it must be explicitly marked as inferred and the reasoning chain documented. Processing time is never an excuse for unverified claims.


1. Purpose and Scope

1.1 Purpose

This standard defines the requirements for grounding all factual claims in verifiable evidence, ensuring that:

  1. Every factual assertion has a source or is explicitly marked as inferred
  2. Sources are classified by reliability using a tiered system
  3. Citation formats are consistent and machine-parseable
  4. Claim verification is auditable through evidence trails
  5. Ungrounded claims are prohibited in production outputs

1.2 Scope

This standard applies to:

  • All factual claims in CODITECT-generated content
  • Technical recommendations and best practices
  • Statistical assertions and benchmarks
  • Historical facts and attributions
  • Code examples claimed to follow standards or patterns
  • Tool and library version information

1.3 Out of Scope

  • Logical inferences (covered by CODITECT-STANDARD-LOGICAL-INFERENCE.md)
  • Subjective opinions clearly marked as such
  • Hypothetical scenarios explicitly identified
  • User-provided information (responsibility of user)

2. Source Classification System

2.1 Reliability Tiers

All sources must be classified into one of four reliability tiers:

| Tier | Reliability | Description | Required Evidence |
|------|-------------|-------------|-------------------|
| Tier 1 | 95-100% | Authoritative primary sources | Full citation required |
| Tier 2 | 85-94% | Reputable secondary sources | Citation + validation note |
| Tier 3 | 70-84% | Industry/community sources | Citation + reliability caveat |
| Inferred | <70% | No direct evidence | Full reasoning chain required |
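The tier boundaries above can be applied mechanically. A minimal sketch (thresholds are taken from the table; the function name is illustrative, not part of this standard):

```python
def classify_tier(reliability: float) -> str:
    """Map a reliability percentage to the tier label defined above."""
    if reliability >= 95:
        return "Tier 1"
    if reliability >= 85:
        return "Tier 2"
    if reliability >= 70:
        return "Tier 3"
    return "Inferred"
```

A source scoring below 70% is never cited as direct evidence; it drops to the Inferred path in Section 2.5.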

2.2 Tier 1: Authoritative Primary Sources (95-100%)

Characteristics:

  • Peer-reviewed academic publications
  • Official documentation from technology creators
  • Government/regulatory publications
  • Standards body specifications (ISO, IEEE, W3C, etc.)

Examples:

  • ACL, EMNLP, NeurIPS, ICLR conference papers
  • Official React documentation (react.dev)
  • OWASP Security Guidelines
  • RFC specifications (IETF)
  • ISO 27001 standards

Citation Format:

**Source:** [Title](URL)
**Venue:** [Journal/Conference/Organization]
**Date:** [Publication Date]
**Reliability:** Tier 1 (95%)
**Verification:** [How this was verified]

2.3 Tier 2: Reputable Secondary Sources (85-94%)

Characteristics:

  • Industry leader publications (Google, Microsoft, AWS blogs)
  • Well-established tech media (InfoQ, ThoughtWorks Radar)
  • Recognized expert blogs with track record
  • Corporate research publications

Examples:

  • Google AI Blog
  • AWS Architecture Blog
  • ThoughtWorks Technology Radar
  • Martin Fowler's blog
  • Meta Engineering Blog

Citation Format:

**Source:** [Title](URL)
**Author/Org:** [Author or Organization]
**Date:** [Publication Date]
**Reliability:** Tier 2 (88%)
**Validation Note:** [Why this source is trusted]

2.4 Tier 3: Industry/Community Sources (70-84%)

Characteristics:

  • Stack Overflow accepted answers with high votes
  • Popular Medium/Dev.to articles
  • GitHub README documentation
  • Community tutorials and guides

Examples:

  • Stack Overflow answers (>50 votes)
  • Dev.to tutorials
  • GitHub project READMEs
  • Hashnode articles

Citation Format:

**Source:** [Title](URL)
**Author:** [Author]
**Date:** [Publication Date]
**Reliability:** Tier 3 (75%)
**Caveat:** [Limitations of this source]
**Cross-Reference:** [Additional source if available]

2.5 Inferred: No Direct Evidence (<70%)

When Used:

  • No credible source found
  • Source is too dated to be reliable
  • Claim is derived through logical inference
  • Domain heuristics applied

Requirements:

  • Must use CODITECT-STANDARD-LOGICAL-INFERENCE.md format
  • Must explicitly state "INFERRED" in certainty marker
  • Must provide falsification criteria
  • Must document assumptions

Citation Format:

**Certainty:** [X%] (INFERRED)
**Basis:** Logical inference from available evidence
**Reasoning Chain:** [See Section X]
**Assumptions:** [List of assumptions]
**Falsification:** [What would disprove this]

3. Citation Requirements

3.1 Required Fields by Tier

| Field | Tier 1 | Tier 2 | Tier 3 | Inferred |
|-------|--------|--------|--------|----------|
| URL | Required | Required | Required | N/A |
| Title | Required | Required | Required | N/A |
| Author/Org | Required | Required | Optional | N/A |
| Date | Required | Required | Required | N/A |
| Venue | Required | Optional | Optional | N/A |
| Reliability % | Required | Required | Required | Required |
| Verification | Required | Optional | Optional | N/A |
| Caveat | Optional | Optional | Required | N/A |
| Reasoning Chain | N/A | N/A | N/A | Required |

3.2 Date Freshness Requirements

| Domain | Maximum Age | Action if Exceeded |
|--------|-------------|--------------------|
| AI/ML Research | 18 months | Flag as potentially outdated |
| Framework Versions | 12 months | Verify current version |
| Security Vulnerabilities | 6 months | Require current status check |
| API Documentation | 6 months | Verify against live docs |
| Industry Best Practices | 24 months | Note evolving landscape |
| Fundamental CS Concepts | No limit | N/A |
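The age limits above lend themselves to an automated freshness check. A minimal sketch, assuming a whole-month age calculation (the domain keys and function name are illustrative):

```python
from datetime import date

# Maximum source age per domain, in months, per the table above (None = no limit)
MAX_AGE_MONTHS = {
    "ai_ml_research": 18,
    "framework_versions": 12,
    "security_vulnerabilities": 6,
    "api_documentation": 6,
    "industry_best_practices": 24,
    "fundamental_cs": None,
}

def is_fresh(domain: str, published: date, today: date) -> bool:
    """Return True if the source is within the domain's maximum age."""
    limit = MAX_AGE_MONTHS[domain]
    if limit is None:
        return True
    age_months = (today.year - published.year) * 12 + (today.month - published.month)
    return age_months <= limit
```

A stale result does not forbid citation; it triggers the "Action if Exceeded" column (flag, re-verify, or caveat).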

3.3 Inline Citation Format

For concise inline citations within text:

According to Kuhn et al. (NeurIPS 2023, Tier 1), semantic entropy provides
a reliable uncertainty measure for free-form text generation.

3.4 Full Citation Block

For detailed evidence blocks:

## Evidence: [Claim Statement]

**Certainty:** 92% (HIGH)
**Source Type:** Tier 1 - Peer-reviewed research

### Source Details
- **Title:** Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in NLG
- **URL:** https://arxiv.org/abs/2302.09664
- **Authors:** Kuhn, L., Gal, Y., Farquhar, S.
- **Venue:** NeurIPS 2023
- **Date:** 2023-06-15

### Verification
- Paper accessed and reviewed: 2025-12-19
- Claims verified against Section 4.2
- Methodology reproducible per published code

### Relevance
Directly supports the claim that semantic clustering improves
uncertainty estimation over token-level entropy.

4. Verification Protocols

4.1 URL Verification

Before citing any URL:

  1. Confirm URL is accessible (HTTP 200)
  2. Verify content matches cited claim
  3. Check for paywall/authentication requirements
  4. Note if archived version used

For Inaccessible URLs:

**Source Status:** Archived
**Archive URL:** [Wayback Machine link]
**Original URL:** [Original link]
**Archive Date:** [When archived]
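Steps 1 and 3 above can be scripted with the standard library. A minimal sketch, assuming a HEAD request suffices and treating 401/402/403 as paywall/authentication responses (the status labels and function names are illustrative; step 2, confirming the content matches the claim, remains a manual review):

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def classify_status(status: int) -> str:
    """Map an HTTP status code to a citation-check result."""
    if status == 200:
        return "OK"
    if status in (401, 402, 403):
        return "PAYWALLED"  # needs auth/payment; record an archive link instead
    return "BROKEN"

def check_url(url: str, timeout: float = 10.0) -> str:
    """Perform step 1 (accessibility) and step 3 (paywall) for a cited URL."""
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "cite-check"})
        status = urlopen(req, timeout=timeout).status
    except HTTPError as e:
        status = e.code
    except URLError:
        return "BROKEN"
    return classify_status(status)
```

A `BROKEN` result routes the citation to the archived-source format shown above.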

4.2 Version Verification

For framework/library claims:

**Library:** React
**Claimed Version:** 18.2.0
**Verification Date:** 2025-12-19
**Current Latest:** [Check npm registry]
**Version Status:** Current | Outdated | Deprecated
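The "Current | Outdated" determination can be derived once the latest version is known (for npm packages, the registry's `https://registry.npmjs.org/<package>/latest` endpoint reports it). A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` versions with no pre-release suffixes (function names are illustrative):

```python
def parse_semver(version: str) -> tuple:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def version_status(claimed: str, latest: str) -> str:
    """Fill the Version Status field by comparing claimed vs. latest."""
    if parse_semver(claimed) == parse_semver(latest):
        return "Current"
    return "Outdated"
```

Deprecation is not derivable from version numbers alone; that status still requires checking the package's own documentation.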

4.3 Cross-Verification

For Tier 2-3 claims, cross-verify when possible:

**Primary Source:** [Source A]
**Cross-Reference:** [Source B confirms same claim]
**Agreement Level:** Full | Partial | Contradictory

5. Prohibited Patterns

5.1 Never Do This

| Anti-Pattern | Example | Why Wrong |
|--------------|---------|-----------|
| Unattributed claims | "Studies show that..." | No specific study cited |
| Vague authority | "Experts recommend..." | No specific expert identified |
| Self-referential | "It is well known that..." | No evidence provided |
| Assumed consensus | "Everyone agrees that..." | No verification of consensus |
| Outdated references | "According to 2018 best practices..." | May be obsolete |
| Broken links | URL returns 404 | Unverifiable |
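The phrase-based anti-patterns above can be caught with a simple lint pass. A minimal sketch (the phrase list is a hypothetical starter set drawn from the examples in the table, not an exhaustive catalogue):

```python
import re

# Starter patterns for the vague-authority anti-patterns listed above
PROHIBITED = [
    r"\bstudies show\b",
    r"\bexperts (recommend|agree)\b",
    r"\bit is well known\b",
    r"\beveryone agrees\b",
]

def find_violations(text: str) -> list:
    """Return the prohibited patterns matched in a draft (case-insensitive)."""
    return [p for p in PROHIBITED if re.search(p, text, re.IGNORECASE)]
```

A clean scan does not prove compliance; it only removes the most obvious ungrounded phrasings before manual review.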

5.2 Examples of Violations

WRONG:

React is the best framework for building user interfaces because
it's the most popular and has the best performance.

Violations:

  • "best" - subjective without criteria
  • "most popular" - no citation
  • "best performance" - no benchmark cited

CORRECT:

React is widely adopted for building user interfaces. According to
the 2024 State of JS Survey (Tier 2, 87%), React was used by 82.1%
of respondents. Performance benchmarks from Krausest (2024, Tier 2)
show React 18 with concurrent features achieving render times of
X ms in the JS Framework Benchmark.

**Certainty:** 88% (HIGH)
**Limitations:** Surveys represent self-selected respondents;
benchmarks measure synthetic workloads.

6. Evidence Schemas

6.1 JSON Schema for Evidence

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["claim", "certainty", "evidence"],
  "properties": {
    "claim": {
      "type": "string",
      "description": "The factual claim being made"
    },
    "certainty": {
      "type": "object",
      "required": ["score", "level", "basis"],
      "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 100},
        "level": {"enum": ["HIGH", "MEDIUM", "LOW", "INFERRED"]},
        "basis": {"enum": ["evidence_backed", "cross_verified", "inferred"]}
      }
    },
    "evidence": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["url", "title", "tier", "reliability"],
        "properties": {
          "url": {"type": "string", "format": "uri"},
          "title": {"type": "string"},
          "tier": {"enum": [1, 2, 3]},
          "reliability": {"type": "number", "minimum": 70, "maximum": 100},
          "venue": {"type": "string"},
          "date": {"type": "string", "format": "date"},
          "verification_date": {"type": "string", "format": "date"},
          "summary": {"type": "string"}
        }
      }
    },
    "missing_information": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Information that would increase certainty"
    }
  }
}
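In practice a full Draft-07 validator (such as the third-party `jsonschema` package) would enforce this schema. As a dependency-free sketch, a minimal structural check covering the required fields and enums (function name is illustrative; this is not a complete validator):

```python
def check_evidence(doc: dict) -> list:
    """Lightweight structural check against the evidence schema above."""
    errors = []
    for key in ("claim", "certainty", "evidence"):
        if key not in doc:
            errors.append(f"missing required field: {key}")
    cert = doc.get("certainty", {})
    for key in ("score", "level", "basis"):
        if key not in cert:
            errors.append(f"certainty missing: {key}")
    if not 0 <= cert.get("score", -1) <= 100:
        errors.append("certainty.score out of range 0-100")
    if cert.get("level") not in ("HIGH", "MEDIUM", "LOW", "INFERRED"):
        errors.append("certainty.level not in enum")
    evidence = doc.get("evidence", [])
    if len(evidence) < 1:
        errors.append("evidence requires at least one item")
    for i, item in enumerate(evidence):
        for key in ("url", "title", "tier", "reliability"):
            if key not in item:
                errors.append(f"evidence[{i}] missing: {key}")
    return errors
```

An empty error list satisfies the "Evidence schema validates" item in the Section 8.3 checklist.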

6.2 Markdown Template

## Claim: [Statement]

**Certainty:** [X%] ([LEVEL])
**Basis:** Evidence-backed | Cross-verified | Inferred

### Evidence

| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [Title](URL) | [1-3] | [X%] | [How it supports claim] |
| 2 | [Title](URL) | [1-3] | [X%] | [How it supports claim] |

### Gaps

- [Information that would increase certainty]
- [Additional verification needed]

### Verification Status

- [ ] URLs verified accessible
- [ ] Content confirms claim
- [ ] Date freshness checked
- [ ] Cross-references found (if Tier 2-3)

7. Compliance Scoring

7.1 Grading Criteria

| Grade | Score | Criteria |
|-------|-------|----------|
| A | 95-100% | All claims cited, all Tier 1-2, verification complete |
| B | 85-94% | All claims cited, mostly Tier 1-2, minor gaps |
| C | 70-84% | Most claims cited, some Tier 3, some gaps |
| D | 60-69% | Many claims uncited, overreliance on Tier 3 |
| F | <60% | Multiple ungrounded claims |
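The score-to-grade mapping follows directly from the table. A minimal sketch (the function name is illustrative; how the underlying score is computed is out of scope here):

```python
def grade(score: float) -> str:
    """Map a compliance score (0-100) to the letter grade in the table above."""
    if score >= 95:
        return "A"
    if score >= 85:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```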

7.2 Minimum Requirements

Production Outputs:

  • Grade B (85%) minimum
  • No more than 20% Tier 3 sources
  • All inferred claims have reasoning chains
  • All URLs verified within 30 days

8. Verification Checklist

8.1 Before Claiming

  • Identified the specific claim being made
  • Located credible source(s)
  • Classified source by reliability tier
  • Verified URL accessibility
  • Confirmed content supports claim
  • Checked date freshness

8.2 During Documentation

  • Used correct citation format
  • Included all required fields
  • Added caveats for Tier 3
  • Documented any gaps
  • Calculated certainty score

8.3 Before Delivery

  • All claims have sources or marked INFERRED
  • No prohibited patterns present
  • Cross-verification where applicable
  • Grade meets minimum (B or higher)
  • Evidence schema validates

9. Examples

9.1 Grade A Citation (Exemplary)

## Claim: Semantic entropy outperforms verbalized confidence for hallucination detection

**Certainty:** 95% (HIGH)
**Basis:** Evidence-backed with cross-verification

### Evidence

| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [Semantic Uncertainty](https://arxiv.org/abs/2302.09664) | 1 | 95% | Reports 0.74 AUROC vs 0.67 for verbalized confidence |
| 2 | [Detecting Hallucinations](https://arxiv.org/abs/2310.03368) | 1 | 93% | Confirms semantic clustering superiority in multi-domain evaluation |

### Verification

- URLs verified: 2025-12-19
- Papers reviewed in full
- Methodology confirmed reproducible
- No contradicting studies found in literature search

### Limitations

- Results specific to English language models
- Tested on GPT-3.5/4 and Llama 2 only
- May not generalize to domain-specific models

9.2 Grade C Citation (Needs Improvement)

## Claim: Most developers prefer React

**Certainty:** 72% (MEDIUM)
**Basis:** Evidence-backed (Tier 3)

### Evidence

| # | Source | Tier | Reliability | Summary |
|---|--------|------|-------------|---------|
| 1 | [State of JS 2024](https://stateofjs.com) | 3 | 75% | Self-reported usage: 82% |

### Issues Identified

- Single source only
- Self-selected survey respondents
- "Prefer" not directly measured (usage ≠ preference)
- No cross-verification

### Improvement Actions

- Find additional survey sources
- Clarify "preference" vs "usage"
- Add caveat about survey methodology

10. Related Standards

| Standard | Relationship |
|----------|--------------|
| CODITECT-STANDARD-TRUST-AND-TRANSPARENCY.md | Parent principle this standard implements |
| CODITECT-STANDARD-AMBIGUITY-HANDLING.md | When sources are ambiguous |
| CODITECT-STANDARD-LOGICAL-INFERENCE.md | When inference replaces direct evidence |

11. Research Foundation

This standard is grounded in peer-reviewed research:

| Research | Venue | Contribution |
|----------|-------|--------------|
| Chain-of-Verification | ACL 2024 | Multi-step evidence validation |
| FactScore | ACL 2023 | Fine-grained fact verification |
| HHEM | Vectara 2024 | Hallucination detection metrics |
| MiniCheck | ACL 2024 | Efficient claim verification |

Full citations: See docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md


Document Version: 1.0.0
Last Updated: 2025-12-19
Author: CODITECT Standards Team
Enforcement: MANDATORY for all factual claims
Review Date: 2026-03-19