
Annotated Bibliography: Consequence-Aware Autonomous Execution

A New Paradigm: From Planning to Impact-Informed Continuous Adaptation

Compiled for Coditect Architecture Extension Research


Executive Summary

This bibliography catalogs more than 60 academic and industry research sources addressing the theoretical foundations and practical implementations of consequence-aware autonomous execution systems. The research spans several disciplines, including artificial intelligence, software engineering, control theory, operations research, and cognitive science.

The sources are organized into eight thematic categories that map directly to the Consequence-Aware Continuous Adaptation (CACA) architectural framework:

  1. Agentic AI & Autonomous Systems: Foundational research on LLM-based agents and autonomous decision-making
  2. Feedback Loops & Adaptive Learning: Continuous learning systems and real-time adaptation mechanisms
  3. OODA Loop & Decision Cycles: Cybernetics, command-and-control, and rapid decision frameworks
  4. Causal Inference & Root Cause Analysis: Attribution algorithms and causation tracking
  5. Technical Debt & Impact Prediction: Long-term consequence modeling in software systems
  6. Multi-Agent Coordination: Distributed systems, multi-agent reinforcement learning (MARL), and agent orchestration
  7. Self-Healing Systems & Automated Recovery: Autonomous fault detection and remediation
  8. LLM-Based Software Engineering Agents: Surveys and implementations of LLM-driven agents for code generation, repair, and testing

Category 1: Agentic AI & Autonomous Systems

Foundational Surveys

Wang, Lei, Chen Ma, Xueyang Feng, et al. "A Survey on Large Language Model Based Autonomous Agents." Frontiers of Computer Science 18, no. 6 (2024): 1–26. https://arxiv.org/abs/2308.11432

Comprehensive survey presenting a unified framework for LLM-based autonomous agent construction. Covers profiling, memory, planning, and action modules. Particularly relevant for understanding agent architecture patterns applicable to CACA's autonomous execution layer.


Shirazi, Muhammad, and Mohamed Ali Saip. "Agentic AI: The Age of Reasoning—A Review." ScienceDirect (August 2025). https://www.sciencedirect.com/science/article/pii/S2949855425000516

Traces the evolution of agentic AI through five phases, culminating in multi-modal collaborative agents. Identifies five key design patterns: tool use, reflection, ReAct, planning, and multi-agent collaboration (MAC). Provides a useful framework for understanding how autonomous agents interact with their environments.
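To make the ReAct pattern concrete, here is a minimal sketch of a loop that alternates a thought, a tool call, and an observation until the policy decides to finish. It assumes a scripted policy as a stand-in for an LLM; the `Step`, `ReActAgent`, `multiply`, and `scripted_policy` names are hypothetical and not taken from the review.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

Tool = Callable[[str], str]


@dataclass
class Step:
    thought: str
    action: str
    argument: str
    observation: str = ""


# A policy maps (task, trace so far) to (thought, action, argument); an LLM would fill this role.
Policy = Callable[[str, list[Step]], tuple[str, str, str]]


@dataclass
class ReActAgent:
    policy: Policy
    tools: dict[str, Tool]
    max_steps: int = 5
    trace: list[Step] = field(default_factory=list)

    def run(self, task: str) -> Optional[str]:
        """Alternate thought -> action -> observation until the policy calls `finish`."""
        for _ in range(self.max_steps):
            thought, action, argument = self.policy(task, self.trace)
            step = Step(thought, action, argument)
            self.trace.append(step)
            if action == "finish":
                return argument
            tool = self.tools.get(action)
            step.observation = tool(argument) if tool else f"unknown tool: {action}"
        return None  # step budget exhausted without a final answer


def multiply(argument: str) -> str:
    """Hypothetical tool: multiply two integers written as 'a * b'."""
    left, right = argument.split("*")
    return str(int(left) * int(right))


def scripted_policy(task: str, trace: list[Step]) -> tuple[str, str, str]:
    """Hypothetical stand-in for an LLM: act once, then answer from the observation."""
    if not trace:
        expression = task.removeprefix("What is ").rstrip("?")
        return ("I need to compute the product.", "multiply", expression)
    return ("The tool returned the result.", "finish", trace[-1].observation)


agent = ReActAgent(policy=scripted_policy, tools={"multiply": multiply})
answer = agent.run("What is 17 * 3?")
```

The `max_steps` cap is a deliberate design choice: an agent that never calls `finish` returns `None` instead of looping forever.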


Abudalfa, Shadi, et al. "The Rise of Agentic AI: A Review of Definitions, Frameworks, Architectures, Applications, Evaluation Metrics, and Challenges." Future Internet 17, no. 9 (September 2025): 404. https://www.mdpi.com/1999-5903/17/9/404

Reviews 143 primary studies on LLM-based and non-LLM-driven agentic systems, classifying their architectural models, input-output mechanisms, and applications. Groups evaluation metrics into qualitative and quantitative measures, which apply directly to CACA performance benchmarking.


Sapkota, Ranjan, Konstantinos I. Roumeliotis, and Manoj Karkee. "AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges." arXiv (May 2025). https://arxiv.org/html/2505.10468v1

Distinguishes between single AI agents and multi-agent agentic systems. Examines challenges including hallucination, brittleness, emergent behavior, and coordination failure. Proposes solutions including ReAct loops, RAG, and causal modeling—key components of consequence-aware systems.


Zhang, Jiaming, et al. "Distinguishing Autonomous AI Agents from Collaborative Agentic Systems: A Comprehensive Framework for Understanding Modern Intelligent Architectures." arXiv (June 2025). https://arxiv.org/html/2506.01438v1

Presents detailed architectural comparisons of planning mechanisms, memory systems, coordination protocols, and decision-making processes. The framework provides a foundational vocabulary for CACA component design.


Safety & Alignment

Li, Mingjie, et al. "A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents." arXiv (December 2025). https://arxiv.org/html/2512.20798v1

Critical research on agentic misalignment. Demonstrates that models possess a "theoretical" understanding of ethics that fails to carry over into "active" agentic reasoning, a primary risk factor for high-agency autonomous systems. Essential reading for CACA checkpoint design.


Yang, Jiaxin, et al. "Toward Safe and Responsible AI Agents." arXiv (January 2026). https://arxiv.org/html/2601.06223v1

Explores Human-in-the-Loop (HITL) paradigm extensions for governing and aligning AI agent behavior. Introduces Safe AI Agent Consortium guidelines. Directly informs CACA's checkpoint framework and human escalation protocols.
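A human checkpoint of the kind this entry motivates can be sketched as a gate that lets reversible, low-risk actions proceed autonomously and escalates everything else to a person. This is an illustrative assumption, not the Consortium's guidelines: the risk scores, the 0.5 threshold, and the `reviewer` callback are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    AUTO_APPROVED = "auto_approved"
    HUMAN_APPROVED = "human_approved"
    HUMAN_REJECTED = "human_rejected"


@dataclass(frozen=True)
class ProposedAction:
    name: str
    risk: float       # hypothetical risk estimate in [0, 1]
    reversible: bool


def checkpoint(action: ProposedAction,
               ask_human: Callable[[ProposedAction], bool],
               risk_threshold: float = 0.5) -> Decision:
    """Let reversible, low-risk actions run; escalate everything else to a human."""
    if action.reversible and action.risk < risk_threshold:
        return Decision.AUTO_APPROVED
    return Decision.HUMAN_APPROVED if ask_human(action) else Decision.HUMAN_REJECTED


escalated: list[str] = []


def reviewer(action: ProposedAction) -> bool:
    """Hypothetical human reviewer: logs each escalation and rejects destructive actions."""
    escalated.append(action.name)
    return not action.name.startswith("drop_")


proposals = [
    ProposedAction("format_code", risk=0.1, reversible=True),
    ProposedAction("migrate_schema", risk=0.7, reversible=True),
    ProposedAction("drop_table", risk=0.2, reversible=False),
]
results = {a.name: checkpoint(a, reviewer) for a in proposals}
```

Irreversibility alone triggers escalation here, so `drop_table` reaches the reviewer despite its low risk score.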


Osogami, Takayuki. "AI Agents Should be Regulated Based on the Extent of Their Autonomous Operations." arXiv (May 2025). https://arxiv.org/html/2503.04750v2

Argues that regulation should be based on the length of the action sequences an agent executes autonomously rather than on computational scale. Proposes empirically verifying the "strong acceptability" of action sequences, a methodology applicable to CACA's stopping conditions.
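One way to turn action-sequence length into a stopping condition is to cap the number of consecutive actions an agent may take before a human accepts the sequence so far. This minimal sketch assumes a hypothetical `AutonomyBudget` class and cap value; the paper proposes the regulatory principle, not this code.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent tries to act past its autonomous action budget."""


@dataclass
class AutonomyBudget:
    max_unreviewed: int  # hypothetical cap on consecutive unreviewed actions
    unreviewed: int = 0
    total: int = 0

    def record_action(self) -> None:
        if self.unreviewed >= self.max_unreviewed:
            raise BudgetExceeded(
                f"{self.unreviewed} actions since last review; human acceptance required")
        self.unreviewed += 1
        self.total += 1

    def record_review(self) -> None:
        # A human accepted the sequence so far, so the counter restarts.
        self.unreviewed = 0


budget = AutonomyBudget(max_unreviewed=3)
for _ in range(3):
    budget.record_action()
try:
    budget.record_action()  # fourth unreviewed action: must stop
    stopped = False
except BudgetExceeded:
    stopped = True
budget.record_review()      # a human accepts the sequence so far
budget.record_action()
```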


Caspi, Yonatan, et al. "Measuring AI Agent Autonomy: Towards a Scalable Approach with Code Inspection." arXiv (February 2025). https://arxiv.org/html/2502.15212v1

Develops a taxonomy for scoring agent autonomy, including orchestration, observability, and impact dimensions. The framework could be used to measure CACA system autonomy levels.


AI-First Systems

Shemtov, Noam, et al. "Reversing the Paradigm: Building AI-First Systems with Human Guidance." arXiv (June 2025). https://arxiv.org/html/2506.12245v1

Advocates for AI-first architectures with human-in-the-loop oversight. Discusses deployment of real-time monitoring tools, feedback loops for continuous learning, and adaptive interfaces—core CACA requirements.


Bertsimas, Dimitris, and Bartolomeo Stellato. "Assured Autonomy: The Contribution of Operations Research to Safe AI." arXiv (December 2025). https://arxiv.org/html/2512.23978

Develops a conceptual framework for assured autonomy grounded in operations research. Addresses the autonomy paradox: as AI gains autonomy, it requires more formal structure and constraint enforcement. Proposes stress testing under high-consequence scenarios.


Category 2: Feedback Loops & Adaptive Learning

Continuous Learning Systems

Amplework. "Agentic AI Loops: How Perception, Reasoning, Action & Feedback Drive Self-Learning AI." (August 2025). https://www.amplework.com/blog/agentic-ai-loops-perception-reasoning-action-feedback/

Details the perception-reasoning-action-feedback cycle that enables continuous learning and adaptation in dynamic environments. The framework maps directly to CACA's continuous observation architecture.


Xoriant. "Agentic AI & Continuous Learning: Creating Ever-Evolving Systems." (February 2025). https://www.xoriant.com/thought-leadership/article/agentic-ai-and-continuous-learning-creating-ever-evolving-systems

Examines the role of feedback loops in the continuous improvement of agentic AI systems. Covers federated learning models that preserve data privacy while enabling shared learning, an approach applicable to Coditect's multi-agent architecture.


Translucent Computing. "How Agentic AI Learns: Key Strategies for Workflow Automation." (April 2025). https://translucentcomputing.com/blog/how-agentic-ai-learns-key-strategies-for-workflow-automation/

Presents a four-tier framework for agent learning: foundational techniques, iterative adaptation, multi-agent reinforcement, and model fine-tuning. Provides an implementation blueprint for CACA's learning mechanisms.


Amplework. "Build Feedback Loops in Agentic AI for Digital Growth." (July 2025). https://www.amplework.com/blog/build-feedback-loops-agentic-ai-continuous-transformation/

Practical guide to implementing feedback loops in AI engines. Covers platforms that integrate feedback mechanisms, including IBM Watson, Palantir, and Google Cloud AI, which serve as reference implementations for CACA.


Human-AI Collaboration

Johnson, N., et al. "Creating Feedback Loops Between Human Experts and AI Systems." ResearchGate (May 2025). https://www.researchgate.net/publication/391398367_CREATING_FEEDBACK_LOOPS_BETWEEN_HUMAN_EXPERTS_AND_AI_SYSTEMS

Proposes structured framework for integrating feedback across AI lifecycle stages: data labeling, model tuning, decision support, and post-deployment monitoring. Addresses cognitive overload, trust calibration, and feedback latency—critical CACA checkpoint considerations.


Tredence. "How Adaptive AI is Transforming Business Intelligence." (July 2025). https://www.tredence.com/blog/adaptive-ai

Defines the core of adaptive AI as the capacity to incorporate feedback loops directly into the learning process, so the model improves with every execution rather than following static logic. Covers context-aware responses and real-time decision-making.


Dynamic Strategy Adaptation

Kim, Sunghoon, et al. "Dynamic Strategy Adaptation in Multi-Agent Environments with Large Language Models." arXiv (July 2025). https://arxiv.org/html/2507.02002v1

Embeds structured symbolic evaluation into the reinforcement learning loop: an LLM produces a binary cooperative judgment that is mapped to a scalar bonus and integrated into PPO training in real time. This enables dynamic behavioral adjustment during execution, a key CACA mechanism.
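The reward-shaping step described here can be sketched in isolation: a judge labels each action cooperative or not, the binary label becomes a scalar bonus, and the shaped rewards feed the return computation a PPO learner would use. The keyword judge stands in for the LLM, the bonus of 0.1 is an assumption, and the PPO update itself is omitted.

```python
from typing import Callable, Sequence

Judge = Callable[[str], bool]  # hypothetical stand-in for the LLM's binary cooperative judgment


def shaped_rewards(env_rewards: Sequence[float], actions: Sequence[str],
                   judge: Judge, bonus: float = 0.1) -> list[float]:
    """Add a fixed bonus to each step the judge labels cooperative (1 -> +bonus, 0 -> 0)."""
    if len(env_rewards) != len(actions):
        raise ValueError("one action per reward is required")
    return [r + (bonus if judge(a) else 0.0) for r, a in zip(env_rewards, actions)]


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> list[float]:
    """G_t = r_t + gamma * G_{t+1}; one common value target in a PPO update."""
    returns: list[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def keyword_judge(action: str) -> bool:
    """Hypothetical judge: treats sharing and helping as cooperative."""
    return action in {"share", "help"}


rewards = shaped_rewards([1.0, 0.0, 0.5], ["share", "defect", "help"], keyword_judge)
targets = discounted_returns(rewards, gamma=0.5)
```

Because the bonus enters the reward before returns are computed, a cooperative action early in an episode also raises the value targets of the steps that precede it.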


Glean. "Overcoming Challenges in AI Feedback Loop Integration." https://www.glean.com/perspectives/overcoming-challenges-in-ai-feedback-loop-integration

Documents how well-implemented feedback loops let AI systems adapt to new patterns, correct mistakes, and refine their understanding based on actual usage. Reports 70–90% containment rates and an 87.6% satisfaction rate for bot-only interactions.


Category 3: OODA Loop & Decision Cycles

Foundational Theory

Boyd, John R. "The Essence of Winning and Losing." Unpublished briefing, January 1996. https://www.coljohnboyd.com/static/documents/1995-06-28__Boyd_John_R__The_Essence_of_Winning_and_Losing__PPT-PDF.pdf (Also available in: A Discourse on Winning and Losing, edited by Grant T. Hammond. Maxwell Air Force Base, AL: Air University Press, 2017. https://www.airuniversity.af.edu/Portals/10/AUPress/Books/B_0151_Boyd_Discourse_Winning_Losing.PDF)

Presents the original OODA (Observe-Orient-Decide-Act) loop framework. The "real" OODA loop integrates cybernetics, systems theory, chaos and complexity theory, and cognitive science, and is far more sophisticated than the simplified circular versions commonly cited.


Brehmer, Berndt. "The Dynamic OODA Loop: Amalgamating Boyd's OODA Loop and the Cybernetic Approach to Command and Control." Proceedings of the 10th International Command and Control Research Technology Symposium (2005): 365-368. https://www.semanticscholar.org/paper/The-Dynamic-OODA-Loop-:-Amalgamating-Boyd-%E2%80%99-s-OODA-Brehmer/7e9d23a6911d636666338358505613bb5eba43b8

Formulates the DOODA (Dynamic OODA) loop in terms of the functions that must be accomplished for effective command and control (C2). Preserves the prescriptive richness of the cybernetic approach by representing all sources of delay, and so escapes the original loop's narrow focus on decision-making speed.
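Brehmer's emphasis on every source of delay, not just decision speed, can be illustrated by timing each function of one cycle, including the wait before an action's effects become observable. The stage names and the fake timestamps below are illustrative assumptions loosely following the DOODA functions; the point is that the bottleneck need not be the decision step.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from typing import Callable, Iterator


class CycleTimer:
    """Accumulate the delay contributed by each function of one decision cycle."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.delays: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self.clock()
        try:
            yield
        finally:
            self.delays[name] = self.delays.get(name, 0.0) + (self.clock() - start)

    def total(self) -> float:
        return sum(self.delays.values())

    def bottleneck(self) -> tuple[str, float]:
        """Return the slowest stage and its share of the total cycle time."""
        name = max(self.delays, key=self.delays.__getitem__)
        return name, self.delays[name] / self.total()


# Deterministic fake clock so the example is reproducible (timestamps are illustrative).
ticks = iter([0.0, 1.0, 1.0, 3.0, 3.0, 3.5, 3.5, 9.5])
timer = CycleTimer(clock=lambda: next(ticks))
for stage in ("data_collection", "sensemaking", "planning", "effect_delay"):
    with timer.stage(stage):
        pass  # real work, or waiting for effects to become observable, would happen here
slowest, share = timer.bottleneck()
```

In this example planning takes 0.5 of 9.5 time units, so making the decision step faster would barely shorten the cycle.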


AI Integration

Johnson, James. "Automating the OODA Loop in the Age of Intelligent Machines: Reaffirming the Role of Humans in Command-and-Control Decision-Making in the Digital Age." Defence Studies (2022). https://www.tandfonline.com/doi/full/10.1080/14702436.2022.2102486

Offers an epistemological critique of AI-enabled capabilities for augmenting command-and-control decision-making. Argues that AI cannot effectively replace humans in understanding the strategic environment, which supports CACA's human checkpoint architecture.


RTI. "JADC2: Accelerating the OODA Loop With AI and Autonomy." https://www.rti.com/blog/jadc2-the-ooda-loop

Describes how AI and ML can accelerate OODA decision-making through federated data fabrics built on open standards. Maps the Sense, Make Sense, Act process to Boyd's original thinking.


RTI. "OODA Loop: A Blueprint for the Evolution of Military Decisions." https://www.rti.com/blog/ooda-loop-a-blueprint-for-the-evolution-of-military-decisions

Distinguishes Boyd's sophisticated original diagram from simplified "OODA-for-dummies" versions. Notes decisions are not straight lines or circles—they are complex and irregular. Critical insight for CACA iteration design.


Morales Aguilera, Frank. "AI Agent and Claude 3: Implementing the OODA Loop for Decision-Making." The Deep Hub, Medium (February 2025). https://medium.com/thedeephub/ai-agent-and-claude-3-implementing-the-ooda-loop-for-decision-making-43a58f489ac4

Practical implementation of the OODA loop in an AI agent using Claude 3. Demonstrates improved situational awareness, enhanced adaptability, faster decision-making, and actions optimized through iterative learning.


Category 4: Causal Inference & Root Cause Analysis

Microservice Systems

Li, Mingjie, et al. "Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition." KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022). https://dl.acm.org/doi/10.1145/3534678.3539041

Formulates root cause analysis as a causal inference task called intervention recognition. The CIRCA method constructs a Causal Bayesian Network from system architecture knowledge. Its core algorithm is applicable to CACA's causation tracking.
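CIRCA's central idea, scoring a metric by how anomalous it is given its parents in the Causal Bayesian Network, can be sketched with regression-based hypothesis testing. This minimal version assumes a known graph in which each metric has at most one parent and uses ordinary least squares; it omits regression on multiple parents and CIRCA's descendant adjustment step, and the metric names and data are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pstdev
from typing import Optional


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return my - b * mx, b


def anomaly_scores(parents: dict[str, Optional[str]],
                   normal: dict[str, list[float]],
                   fault: dict[str, float]) -> dict[str, float]:
    """Score each metric by how far its fault-time value departs from what its parent predicts."""
    scores: dict[str, float] = {}
    for metric, parent in parents.items():
        y = normal[metric]
        if parent is None:
            # Root node: compare against the metric's own normal-period mean.
            a, b, x, x_now = mean(y), 0.0, [0.0] * len(y), 0.0
        else:
            x = normal[parent]
            a, b = fit_line(x, y)
            x_now = fault[parent]
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        sigma = pstdev(residuals) or 1e-9  # guard against a perfectly fitted metric
        scores[metric] = abs(fault[metric] - (a + b * x_now)) / sigma
    return scores


# Hypothetical call graph: db_latency -> api_latency; cpu is an unrelated root metric.
parents = {"db_latency": None, "api_latency": "db_latency", "cpu": None}
normal = {
    "db_latency": [10, 11, 9, 10, 12, 8, 10, 10],
    "api_latency": [25.5, 26.5, 23.3, 24.7, 29.2, 20.8, 25.4, 24.6],
    "cpu": [50, 52, 48, 50, 51, 49, 50, 50],
}
fault = {"db_latency": 30, "api_latency": 65.0, "cpu": 51}
scores = anomaly_scores(parents, normal, fault)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Although `api_latency` shows the largest absolute jump, its parent explains it, so it scores near zero while `db_latency` ranks first.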


Xin, Ruyue, Peng Chen, and Zhiming Zhao. "CausalRCA: Causal Inference Based Precise Fine-Grained Root Cause Localization for Microservice Applications." Journal of Systems and Software 203 (May 2023): 111724. https://www.sciencedirect.com/science/article/pii/S016412122300119X

Implements fine-grained, automated, real-time root cause localization and addresses the limitations of methods that assume linear causal relations. Provides a framework for identifying both faulty services and faulty metrics.


Ikram, Azam, et al. "Root Cause Analysis of Failures in Microservices through Causal Discovery." Advances in Neural Information Processing Systems 35 (NeurIPS 2022). https://openreview.net/pdf?id=weoLjoYFvXY

The algorithm sidesteps learning the full causal graph and focuses only on root causes, yielding significant savings in runtime and in the number of conditional independence tests. Applicable to CACA's real-time attribution.


Pham, Luan, Huong Ha, and Hongyu Zhang. "Root Cause Analysis for Microservices based on Causal Inference: How Far Are We?" ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (2024). https://dl.acm.org/doi/10.1145/3691620.3695065 (arXiv preprint, August 2024: https://arxiv.org/html/2408.13729v1)

Comprehensive evaluation of 9 causal discovery methods and 21 root cause analysis methods. Finds that no single method stands out in all situations and that large-scale microservice systems remain challenging, while longer input data significantly improves causal discovery performance. Informs CACA's multi-method approach.
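Because no single RCA method stands out in every situation, a multi-method design needs some way to combine their outputs. A minimal sketch, assuming each method returns a ranked list of suspect services, fuses the lists with reciprocal rank fusion, a rank-aggregation technique from information retrieval (k = 60 is its customary constant). The technique is swapped in for illustration and is not proposed by the evaluated study; the method and service names are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked candidate lists: score(c) = sum over methods of 1 / (k + rank of c)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings.values():
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of three RCA methods for the same incident.
rankings = {
    "method_a": ["payment", "db", "cart"],
    "method_b": ["db", "payment", "auth"],
    "method_c": ["db", "cart", "payment"],
}
fused = reciprocal_rank_fusion(rankings)
```

RRF uses only rank positions, so methods whose raw scores live on incomparable scales can be combined without calibration.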


Soldani, Jacopo, and Antonio Brogi. "Anomaly Detection and Failure Root Cause Analysis in (Micro) Service-Based Cloud Applications: A Survey." ACM Computing Surveys 55, no. 3 (2022). https://dl.acm.org/doi/10.1145/3501297

Provides a structured overview of techniques for anomaly detection and root cause analysis. Discusses open challenges and research directions in multi-service applications.


Causal Testing

Johnson, Brittany, Yuriy Brun, and Alexandra Meliou. "Causal Testing: Understanding Defects' Root Causes." ICSE '20: 42nd International Conference on Software Engineering (May 2020). https://arxiv.org/pdf/1809.06991

Introduces Causal Testing, implemented in the Holmes Eclipse plugin, which helps developers debug by identifying the key differences between passing and failing inputs. Causal Testing provides useful information that traditional tools such as JUnit do not.
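The core move of Causal Testing, finding passing inputs that differ minimally from a failing one, can be sketched with single-character deletions as the perturbation. Holmes's actual input transformations are not reproduced here; `parse_version` is a hypothetical buggy function and the deletion operator is an assumption.

```python
from typing import Callable


def passes(fn: Callable[[str], object], arg: str) -> bool:
    """Test oracle: the input passes if `fn` returns without raising."""
    try:
        fn(arg)
    except Exception:
        return False
    return True


def nearest_passing(fn: Callable[[str], object], failing: str) -> list[tuple[str, int, str]]:
    """Return passing inputs one character deletion away from a failing input.

    Each result is (passing_input, deleted_position, deleted_character); the deleted
    characters are the minimal differences a developer would inspect first.
    """
    if passes(fn, failing):
        raise ValueError("expected a failing input")
    found = []
    for i, ch in enumerate(failing):
        candidate = failing[:i] + failing[i + 1:]
        if passes(fn, candidate):
            found.append((candidate, i, ch))
    return found


def parse_version(text: str) -> tuple[int, int]:
    """Hypothetical buggy function under test: rejects versions with a patch component."""
    major, minor = text.split(".")
    return int(major), int(minor)


neighbors = nearest_passing(parse_version, "1.2.3")
suspects = {ch for _, _, ch in neighbors}
```

Both passing neighbors differ from the failing input only by a removed `.`, which points at the extra separator as the cause.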


Tools & Platforms

AWS Open Source Blog. "Root Cause Analysis with DoWhy, an Open Source Python Library for Causal Machine Learning." (January 2023). https://aws.amazon.com/blogs/opensource/root-cause-analysis-with-dowhy-an-open-source-python-library-for-causal-machine-learning/

Describes the AWS and Microsoft collaboration within the PyWhy organization. DoWhy features include arrow strengths, intrinsic causal influences, anomaly attribution, and distribution change attribution, all applicable to CACA implementation.


Markakis, Markos, et al. "From Logs to Causal Inference: Diagnosing Large Systems." PVLDB 18 (2024): 158. https://www.vldb.org/pvldb/vol18/p158-markakis.pdf

Presented as the first work to leverage Pearl-style causality and textual log data simultaneously. Its human-in-the-loop framework combines data-driven causal discovery with expert judgment. Directly applicable to CACA's causation tracking architecture.


Category 5: Technical Debt & Impact Prediction

Prediction & Estimation

Belle, Alvine Boaye. "Estimation and Prediction of Technical Debt: A Proposal." arXiv (April 2019). https://arxiv.org/pdf/1904.01001

Addresses shortcomings in technical debt estimation techniques, which mostly focus on requirements, code, and tests while disregarding architecture and technologies. Proposes an automated analysis that can serve as a basis for information systems analysis and evolution.


Tan, Derek, et al. "Identifying Technical Debt and Its Types Across Diverse Software Projects Issues." arXiv (August 2024). https://arxiv.org/html/2408.09128

Proposes an ensemble learning approach to technical debt (TD) detection built from binary classifiers and assesses its generalization on out-of-distribution and industrial datasets. Opens possibilities for tracking TD evolution over time.


Abdelkader, Abdelkader, et al. "Predicting Software Developer Sentiment on Self-Admitted Technical Debt." PeerJ Computer Science (October 2025). https://peerj.com/articles/cs-3227/

Predicts developer sentiment on self-admitted technical debt (SATD) by fine-tuning GPT-3.5-turbo, improving precision, recall, and F1-score by 14.2%, 11.5%, and 17.3%, respectively, over traditional methods. Applicable to CACA's projected consequence modeling.


AI for Technical Debt Management

Kumar, Vinay, et al. "Artificial Intelligence for Technical Debt Management in Software Development." arXiv (June 2023). https://arxiv.org/pdf/2306.10194

Comprehensive literature review of AI-powered techniques: code analysis, automated testing, code refactoring, predictive maintenance, code generation, and documentation. Provides an overview of the toolkit available for CACA's technical debt projection.


Apostolopoulos, Ioannis D., et al. "A Scoping Review and Assessment Framework for Technical Debt in the Development and Operation of AI/ML Competition Platforms." Applied Sciences 15, no. 13 (June 2025): 7165. https://www.mdpi.com/2076-3417/15/13/7165

Identifies 19 technical debt types with severity scores. Includes ethics debt addressing responsible AI practices. Framework applicable to CACA's compliance-aware consequence modeling.


Quantification

Tsoukalas, Dionysios. "The Technical Debt in Cloud Software Engineering: A Prediction-Based and Quantification Approach." ResearchGate (March 2015). https://www.researchgate.net/publication/281244687_The_Technical_Debt_in_Cloud_Software_Engineering_A_Prediction-Based_and_Quantification_Approach

Presents a quantitative model that takes a linear and symmetric approach to technical debt prediction. Includes the probability of service overutilization and a cost-benefit analysis, applicable to CACA's long-term projection.
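A cost-benefit comparison of the kind this entry describes can be sketched as expected deferral cost versus one-off refactoring cost. This is not the paper's model: the linear accrual, the per-period overutilization probability, and all field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    name: str
    refactor_cost: float         # one-off cost of paying the debt down now
    p_overutilization: float     # hypothetical per-period probability of service overutilization
    overutilization_cost: float  # hypothetical cost incurred each time it happens
    periods: int                 # planning horizon, in periods

    def expected_deferral_cost(self) -> float:
        """Linear model: the same expected cost accrues in every period of the horizon."""
        return self.periods * self.p_overutilization * self.overutilization_cost

    def net_benefit_of_refactoring(self) -> float:
        """Positive means paying the debt down now is expected to be cheaper than deferring."""
        return self.expected_deferral_cost() - self.refactor_cost


backlog = [
    DebtItem("shared_cache", refactor_cost=40.0, p_overutilization=0.25,
             overutilization_cost=20.0, periods=12),
    DebtItem("legacy_logger", refactor_cost=50.0, p_overutilization=0.25,
             overutilization_cost=8.0, periods=12),
]
plan = {item.name: ("refactor" if item.net_benefit_of_refactoring() > 0 else "defer")
        for item in backlog}
```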


Category 6: Multi-Agent Coordination

MARL Foundations

Zhang, Youzhi, et al. "Multi-agent Reinforcement Learning: A Comprehensive Survey." arXiv (July 2024). https://arxiv.org/html/2312.10256v2

Investigates four central MARL challenges: computational complexity, non-stationarity, coordination, and performance evaluation. Discusses learning pathologies including stochasticity, deception, the moving-target problem, and miscoordination.


Yadav, Pamul, Ashutosh Mishra, and Shiho Kim. "A Comprehensive Survey on Multi-Agent Reinforcement Learning for Connected and Automated Vehicles." Sensors (May 2023). https://pmc.ncbi.nlm.nih.gov/articles/PMC10221654/

Reviews MARL algorithms for connected and automated vehicles (CAVs), covering scalability, coordination, communication, and safety challenges. Discusses sim-to-real transfer, which is relevant to CACA's production deployment.


Zhang, Wei, et al. "Multi-agent Reinforcement Learning for Resources Allocation Optimization: A Survey." Artificial Intelligence Review (August 2025). https://link.springer.com/article/10.1007/s10462-025-11340-5

Reviews MARL for resource allocation including decentralization, partial observability, and scalability. Covers hierarchical MARL models for layered decision-making.


Wang, Xiaoyuan, et al. "A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives." arXiv (March 2025). https://arxiv.org/html/2503.13415v1

Reviews the Centralized Training with Decentralized Execution (CTDE) paradigm, including adaptations of MADDPG. Discusses LLM-based multi-agent systems for research automation and complex decision support.


Specialized Applications

Liu, Jincheng, et al. "Multi-agent Reinforcement Learning for Flexible Shop Scheduling Problem: A Survey." Frontiers in Industrial Engineering (July 2025). https://www.frontiersin.org/journals/industrial-engineering/articles/10.3389/fieng.2025.1611512/full

Reviews MARL for dynamic scheduling problems. Covers attention mechanisms, action abstraction, and coordination control units—applicable to CACA's orchestrator-workers pattern.


Li, Yujie, et al. "Recent Advances in Multi-Agent Reinforcement Learning for Intelligent Automation and Control of Water Environment Systems." Machines 13, no. 6 (June 2025): 503. https://www.mdpi.com/2075-1702/13/6/503

Examines modeling mechanisms and policy coordination strategies in MARL. Analyzes challenges under limited resources, system heterogeneity, and unstable communication.


Zhang, Kaiqing, et al. "Multi-Agent Reinforcement Learning in Games: Research and Applications." PMC (2025). https://pmc.ncbi.nlm.nih.gov/articles/PMC12190516/

Integrates MARL with game theory for modeling strategic interactions through equilibrium analysis. Covers self-play mechanisms for progressive training curricula.


LLM Integration

Yu, Lantao, et al. "MARL-Papers: Paper List of Multi-Agent Reinforcement Learning." GitHub Repository. https://github.com/LantaoYu/MARL-Papers

Curated list including "Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning" (2024) and "Theory of Mind for Multi-Agent Collaboration via Large Language Models" (2023).


CoCoMARL 2024 Workshop. "Cooperation and Coordination in Multi-Agent Reinforcement Learning." https://sites.google.com/view/cocomarl-2024/home

Workshop focusing on MARL problems requiring cooperation/coordination. Topics include inter-agent communication, LLMs in MARL, safety in MARL, and game theory applications.


Category 7: Self-Healing Systems & Automated Recovery

Foundational Surveys

Ghosh, Debanjan, et al. "Self-Healing Systems — Survey and Synthesis." Decision Support Systems 42, no. 4 (August 2006): 2164-2185. https://www.sciencedirect.com/science/article/abs/pii/S0167923606000807

Foundational survey on self-healing systems. Covers maintenance of system health, discovery of non-self, and system recovery process. Discusses autonomic computing properties: self-configuration, self-optimization, self-protection, and self-healing.


Monperrus, Martin. "Automatic Software Repair: A Bibliography." ACM Computing Surveys 51, no. 1 (2018): 1-36. https://dl.acm.org/doi/10.1145/3105906 (arXiv preprint: https://arxiv.org/abs/1807.00515)

Comprehensive survey covering behavioral repair (test suites, contracts, models) and state repair (checkpoint/restart, reconfiguration, invariant restoration). Essential background for CACA's automated recovery mechanisms.
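State repair via checkpoint/restart, one of the survey's categories, can be sketched as a wrapper that snapshots state before each operation and restores it when the operation raises or violates an invariant. The invariant, the account example, and the `CheckpointedState` name are hypothetical.

```python
import copy
from typing import Callable, Generic, TypeVar

S = TypeVar("S")


class CheckpointedState(Generic[S]):
    """State repair by checkpoint/restart: roll back any change that breaks an invariant."""

    def __init__(self, state: S, invariant: Callable[[S], bool]) -> None:
        if not invariant(state):
            raise ValueError("initial state violates the invariant")
        self.state = state
        self.invariant = invariant
        self.rollbacks = 0

    def apply(self, operation: Callable[[S], None]) -> bool:
        """Run `operation` in place; restore the checkpoint if it raises or breaks the invariant."""
        checkpoint = copy.deepcopy(self.state)
        try:
            operation(self.state)
            ok = self.invariant(self.state)
        except Exception:
            ok = False
        if not ok:
            self.state = checkpoint  # restart from the last known-good state
            self.rollbacks += 1
        return ok


def conserved_and_non_negative(accounts: dict[str, int]) -> bool:
    """Hypothetical invariant: total balance is 150 and no account is overdrawn."""
    return sum(accounts.values()) == 150 and all(v >= 0 for v in accounts.values())


def transfer(src: str, dst: str, amount: int) -> Callable[[dict[str, int]], None]:
    def operation(accounts: dict[str, int]) -> None:
        accounts[src] -= amount
        accounts[dst] += amount
    return operation


def leaky_update(accounts: dict[str, int]) -> None:
    accounts["a"] -= 10  # bug: money disappears


store = CheckpointedState({"a": 100, "b": 50}, conserved_and_non_negative)
results = [
    store.apply(transfer("a", "b", 30)),   # valid
    store.apply(transfer("b", "a", 200)),  # would overdraw b
    store.apply(leaky_update),             # breaks conservation
]
```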


AI-Powered Self-Healing

Khan, Mahmood Ali, et al. "Self-Healing Software Systems: Lessons from Nature, Powered by AI." arXiv (April 2025). https://arxiv.org/abs/2504.20093

Proposes a framework that mimics biological healing: observability tools serve as sensory inputs, AI models as the cognitive core, and healing agents apply targeted modifications. Combines log analysis, static code inspection, and AI-driven patch generation.


Verma, Prashant, et al. "Developing a Self-Healing Software Architecture using AI for Fault Detection and Recovery." International Journal of Engineering Research & Technology (November 2025). https://www.ijert.org/developing-a-self-healing-software-architecture-using-ai-for-fault-detection-and-recovery

Examines neuromorphic root cause analysis (RCA) integrated with AIOps and network telemetry, and discusses adaptive software immunity as a step toward fully autonomous self-healing systems.


Shah, Harshal. "Self-Healing AI: Leveraging Cloud Computing for Autonomous Software Recovery." International Journal of Intelligent Systems and Applications in Engineering 10, no. 3s (2022): 341. https://www.ijisae.org/index.php/IJISAE/article/view/7502

Presents a framework that uses machine learning to predict potential failures from historical performance data and real-time metrics, combining adaptive learning with cloud scalability.
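As a deliberately simple stand-in for the paper's machine learning models, the sketch below extrapolates a least-squares trend over recent metric samples and triggers proactive remediation if the projected threshold crossing falls within a horizon. The threshold, horizon, sampling interval, and disk-usage data are hypothetical.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional, Sequence


def time_to_threshold(samples: Sequence[float], threshold: float,
                      interval: float = 1.0) -> Optional[float]:
    """Extrapolate a least-squares trend to estimate time until `threshold` is crossed.

    Returns 0.0 if the latest sample already meets the threshold, None if the trend
    is flat or falling, otherwise the estimated time after the latest sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if samples[-1] >= threshold:
        return 0.0
    xs = [i * interval for i in range(n)]
    mx, my = mean(xs), mean(samples)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, samples))
             / sum((x - mx) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = my - slope * mx
    return max(0.0, (threshold - intercept) / slope - xs[-1])


def should_remediate(samples: Sequence[float], threshold: float,
                     horizon: float, interval: float = 1.0) -> bool:
    """Trigger remediation if the projected crossing falls within `horizon`."""
    eta = time_to_threshold(samples, threshold, interval)
    return eta is not None and eta <= horizon


disk = [70.0, 72.0, 74.0, 76.0, 78.0]  # hypothetical disk usage (%) sampled every 10 minutes
eta = time_to_threshold(disk, threshold=90.0, interval=10.0)
```

Usage rises 2 points per 10 minutes, so the 90% threshold is projected 60 minutes after the last sample; a 120-minute horizon would trigger remediation, a 30-minute one would not.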


Hussain, Syed Muzammil. "Self-Healing Systems: AI for Autonomous IT Operations and Reliability." ResearchGate (October 2023). https://www.researchgate.net/publication/388632146_Self-Healing_Systems_AI_for_Autonomous_IT_Operations_and_Reliability_HUSSAIN

Reports that current-generation self-healing systems resolved 71.3% of infrastructure-related incidents without human intervention and that 68.7% of enterprises are experiencing substantial shifts in their operational models.


Autonomic Computing

Saha, Goutam S., et al. "Software-Implemented Self-Healing System." ResearchGate (December 2007). https://www.researchgate.net/publication/220243383_Software_-_Implemented_Self-healing_System

Literature review that proposes a decision model for self-healing software based on a multi-agent concept. Covers on-the-fly error detection for web application repair.


Khare, Ruchi, et al. "Self-Repairing AI: Independent Software Restoration." Academia (March 2025). https://www.academia.edu/128173565/Self_Repairing_AI_Independent_Software_Restoration

Meta-analysis of AI applications for autonomous self-healing in distributed systems. Proposes architectural reference model incorporating reinforcement learning for recovery orchestration.


Category 8: LLM-Based Software Engineering Agents

Comprehensive Surveys

Dong, Yihong, et al. "A Survey on Code Generation with LLM-based Agents." arXiv (July 2025). https://arxiv.org/abs/2508.00083

Identifies three core features that distinguish code generation agents: autonomy, expanded task scope, and enhanced engineering practicality. Covers automation across the full software development lifecycle.


Jin, Haolin, et al. "From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future." arXiv (April 2025). https://arxiv.org/abs/2408.02479

Summarizes six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance.


Wang, Lei, et al. "LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision, and the Road Ahead." ACM Transactions on Software Engineering and Methodology (2025). https://dl.acm.org/doi/10.1145/3712003

Systematic review of LLM-based multi-agent (LMA) system applications across the software development lifecycle. Discusses autonomous problem-solving, robustness through cross-examination, and trustworthiness via debate and validation.


Jiang, Juyong, et al. "A Survey on Large Language Models for Code Generation." ACM Transactions on Software Engineering and Methodology (2024). https://dl.acm.org/doi/10.1145/3747588

Comprehensive survey on Code LLMs for code synthesis, program repair, and test generation. Covers evaluation benchmarks and practical deployment considerations.


Agent Implementations

Xia, Chunqiu Steven, et al. "Agentless: Demystifying LLM-based Software Engineering Agents." arXiv (October 2024). https://arxiv.org/abs/2407.01489

Demonstrates that a simple three-phase approach (localization, repair, patch validation) achieves the highest performance among open-source approaches on SWE-bench Lite (32.00%) at low cost ($0.70). Challenges the assumption that complex autonomous agents are necessary.
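The three-phase shape that Agentless relies on can be sketched as plain functions. This is a toy under stated assumptions: localization here is word overlap rather than an LLM, the candidate edits are supplied by hand, and validation simply returns the first candidate that passes; Agentless itself samples, filters, and ranks several LLM-generated patches. All names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional


def localize(issue: str, files: dict[str, str], top_k: int = 1) -> list[str]:
    """Phase 1 stand-in: rank files by word overlap with the issue text."""
    issue_words = set(issue.lower().split())

    def overlap(name: str) -> int:
        return len(issue_words & set(files[name].lower().split()))

    return sorted(files, key=overlap, reverse=True)[:top_k]


def repair(source: str, candidate_edits: list[tuple[str, str]]) -> list[str]:
    """Phase 2 stand-in: apply each (old, new) search-and-replace edit to get a candidate patch."""
    return [source.replace(old, new, 1) for old, new in candidate_edits if old in source]


def validate(candidates: list[str], tests: Callable[[dict], bool]) -> Optional[str]:
    """Phase 3: return the first candidate that runs and passes the tests."""
    for code in candidates:
        namespace: dict = {}
        try:
            exec(code, namespace)  # toy only; see the note below
            if tests(namespace):
                return code
        except Exception:
            continue
    return None


files = {
    "math_utils.py": "def absolute(x):\n    return x if x > 0 else x\n",
    "strings.py": "def shout(s):\n    return s.upper()\n",
}
issue = "absolute returns negative x for negative input"
target = localize(issue, files)[0]
candidates = repair(files[target], [("else x", "else x + 1"), ("else x", "else -x")])


def issue_tests(ns: dict) -> bool:
    return ns["absolute"](-3) == 3 and ns["absolute"](4) == 4 and ns["absolute"](0) == 0


fix = validate(candidates, issue_tests)
```

`exec` is used only to run the toy candidates; a real pipeline would run the repository's tests in a sandbox.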


FudanSELab. "Agent4SE-Paper-List: Large Language Model-Based Agents for Software Engineering." GitHub Repository. https://github.com/FudanSELab/Agent4SE-Paper-List

Curated repository including RepairAgent, AGENTFL, RCAgent, OpenHands, and other autonomous software agents. Covers fault localization, program repair, and testing.


iSEngLab. "AwesomeLLM4SE: A Survey on Large Language Models for Software Engineering." GitHub Repository. https://github.com/iSEngLab/AwesomeLLM4SE

Comprehensive paper list including CodeTree, Codepori, DSLXpert and other LLM-driven code generation systems.


Appendix A: Cross-Reference Matrix

CACA Component               Primary Research Categories
Consequence Mesh             Feedback Loops, Causal Inference
Instrumented Execution       OODA Loop, Self-Healing Systems
Multi-Temporal Assessment    Technical Debt Prediction, Adaptive Learning
Causation Tracker            Causal Inference, Root Cause Analysis
Adaptation Engine            MARL Coordination, Dynamic Strategy
Human Checkpoints            Safety & Alignment, Human-AI Collaboration
Agent Architecture           Agentic AI, LLM-Based SE Agents

Appendix B: Key Research Gaps Identified

  1. Real-Time Causal Attribution: Most causal inference research focuses on post-hoc analysis rather than continuous attribution during execution.

  2. Multi-Temporal Consequence Modeling: Limited research on simultaneously modeling immediate, short-term, and long-term consequences.

  3. Action-Impact Correlation Learning: Few systems learn from prediction errors to improve future consequence forecasting.

  4. Compliance-Aware Consequence Assessment: Regulatory implications rarely integrated into automated impact evaluation.

  5. Token-Efficient Consequence Observation: Overhead of continuous observation in LLM-based systems not well addressed.


Appendix C: Recommended Reading

For Theoretical Foundation

  1. Wang et al., "Survey on Large Language Model Based Autonomous Agents"
  2. Brehmer, "The Dynamic OODA Loop"
  3. Ghosh et al., "Self-Healing Systems — Survey and Synthesis"
  4. Zhang et al., "Multi-agent Reinforcement Learning: A Comprehensive Survey"

For Implementation Guidance

  1. Li et al., "Causal Inference-Based Root Cause Analysis" (CIRCA)
  2. Dong et al., "Survey on Code Generation with LLM-based Agents"
  3. AWS, "Root Cause Analysis with DoWhy"
  4. Khan et al., "Self-Healing Software Systems"

For Coditect Integration

  1. Xia et al., "Agentless: Demystifying LLM-based Software Engineering Agents"
  2. Wang et al., "LLM-Based Multi-Agent Systems for Software Engineering"
  3. Belle, "Estimation and Prediction of Technical Debt"
  4. Yang et al., "Toward Safe and Responsible AI Agents"

Bibliography compiled: February 2026
Total sources cataloged: 60+
Primary databases: arXiv, ACM Digital Library, IEEE Xplore, ScienceDirect, SpringerLink, ResearchGate, PMC