Team Management

Development team management involves a combination of technical leadership, project management, and the ability to grow and nurture a team. These skills have never been more important, especially with the rise of remote work both across industries and around the world. The ability to delegate decision-making is key to team engagement. Review our inventory of tutorials, interviews, and first-hand accounts of improving the team dynamic.

Latest Premium Content

Trend Report: Developer Experience
Refcard #216: Java Caching Essentials
Refcard #394: AI Automation Essentials
DZone's Featured Team Management Resources

Building a Video Evidence Layer: Moment Indexing With Timecoded Retrieval

By Punitha Ponnuraj
Video has become a default knowledge source in many organizations. Whether it is training sessions, internal demos, walkthroughs, webinars, or support screen recordings, video is often the only place where a procedure was ever explained end to end. That is fine until we need one step from the video again: not the whole video, just one step. In that moment, the requirement isn't a summary of the video; it is: "Tell me what to do, and show me exactly where it happens."

Most systems still treat video as a linear timeline, and timelines are fundamentally difficult to query. Even when you find the right section, it is hard to verify and share. Text search solved this for documents by making retrieval direct and citeable. Video is harder. Chapters and transcripts help with navigation, but they do not reliably answer the core question: given a query, locate the exact segment that supports the answer and cite it. This article describes a practical pattern for doing that: build a Video Evidence Layer that indexes a video as small, retrievable moments and returns answers with timecoded evidence.

The Problem: The Transcript Gap

Most video RAG implementations treat recordings as long-form transcripts. That baseline fails for two reasons: transcripts don't eliminate timeline scrubbing, and they miss visual-only knowledge (UI paths, error codes, configuration values). The bigger issue is grounding. Without an evidence layer, LLMs will sometimes invent timestamps, which breaks the verification loop.

What Good Looks Like

A useful system moves from conversational summaries to actionable evidence. When a user asks, "Where do they fix the missing Advanced Mode option?", the response should be granular:

"Enable Advanced Mode in Settings → Developer Options. Evidence: 07:18–07:26. If the option is missing, update firmware first. Evidence: 12:04–12:22."

Every claim should point to a segment the user can open immediately.
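The timecoded evidence format above falls out naturally once moments carry explicit time anchors. Here is a minimal sketch of rendering a citation from a time-anchored record (the field names are illustrative assumptions, not a schema from the article):

```python
from dataclasses import dataclass


@dataclass
class Moment:
    """One retrievable unit of video knowledge (illustrative fields)."""
    video_id: str
    t_start: float  # seconds; the non-negotiable time anchors
    t_end: float
    transcript: str

    def citation(self) -> str:
        """Render the evidence anchor, e.g. '07:18-07:26'."""
        def mmss(s: float) -> str:
            return f"{int(s) // 60:02d}:{int(s) % 60:02d}"
        return f"{mmss(self.t_start)}-{mmss(self.t_end)}"


m = Moment("vid-001", 438.0, 446.0, "Enable Advanced Mode in Settings")
print(m.citation())  # 07:18-07:26
```

Because the anchor is computed from retrieved fields rather than generated text, the model can only format a citation, never invent one.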
The Solution: The Moment Indexing Pattern

To achieve this, we move from a linear file to a "tiled" vector index. We define a Moment as a discrete, retrievable unit of knowledge, typically 20–90 seconds long: short enough to cite, long enough to carry context. Moments become the atomic unit for retrieval, citation, and verification.

The Moment Schema

A moment record is the control surface the system uses to cite evidence. It should contain:

  • Time anchors: t_start and t_end (non-negotiable)
  • Textual layer: aligned transcript slice plus OCR text from frames
  • Visual layer: factual frame captions and/or visual embeddings
  • Metadata: short summary, video ID, and ACL/provenance tags

This schema treats each Moment as a multimodal unit, not a transcript fragment. By combining aligned audio text with OCR and lightweight visual descriptors, retrieval can operate on what is shown as well as what is said, which is where transcript-only indexing typically fails. A moment record can be stored as JSON (time anchors + transcript + OCR + visual cues + ACL), but the exact fields are less important than enforcing time-anchored evidence.

Two Rules for Reliability

Rule 1: Timecodes are retrieved, not generated. The model may format citations, but time ranges must come from retrieved moment records.

Rule 2: No claim without a cited moment. If retrieval does not return supporting evidence, the system must abstain ("Evidence not found") rather than infer.

Implementation Architecture

At a high level, the pipeline looks like this:

  • Extract signals (ASR + frames/OCR)
  • Build and enrich moments (overlap + embeddings)
  • Store (vector + metadata) and answer (retrieve + fuse + evidence-lock)

Common Failure Modes and Fixes

Two common issues occur, even in a small pilot.

Boundary Cuts

Steps often span moment boundaries, so fixed, non-overlapping cuts can return partial evidence. Use a sliding window with overlap (e.g., a 60-second window with 20 seconds of overlap).
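The overlapping cut can be sketched as a simple window generator (a sketch using the 60s/20s parameters mentioned above; the function name is an assumption):

```python
def cut_moments(duration_s: float, window_s: float = 60.0, overlap_s: float = 20.0):
    """Yield (t_start, t_end) windows that overlap, so a step that
    straddles one cut still appears whole in the neighboring moment."""
    step = window_s - overlap_s
    t = 0.0
    while True:
        end = min(t + window_s, duration_s)
        yield (t, end)
        if end >= duration_s:
            break
        t += step


print(list(cut_moments(150.0)))
# -> [(0.0, 60.0), (40.0, 100.0), (80.0, 140.0), (120.0, 150.0)]
```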
At query time, fuse adjacent high-scoring moments into a single cited span (or cite both contiguous ranges).

UI-Heavy / Visual Steps

Transcript retrieval underperforms when the key information is on screen. Moments need visual signals:

  • OCR for on-screen text (menu labels, error codes, values)
  • Short factual frame captions for UI state
  • Visual embeddings when audio is sparse or vague

This allows retrieval to work on what is shown, not only what is spoken.

Extension: Video Library Retrieval

Users search across a library and expect the system to identify the right videos before locating the right moments. Handle this as a two-stage retrieval flow: pick candidate videos (metadata filters and/or aggregated video embeddings), then retrieve moments within them. Apply ACL filters before the model sees results.

Production Realities: Cost and Quality

Two things decide whether this works in production: cost containment and evidence quality.

  • Cost: Tier your enrichment. Run OCR for all content; reserve expensive visual captioning and vision embeddings for high-value, UI-heavy libraries.
  • Quality: Noisy audio and overlapping speakers degrade ASR alignment, which lowers recall even when the right moment exists.

Non-negotiables: ACL enforced at retrieval time; evidence-locked citations (cite only retrieved time ranges).

Conclusion

Chapters and transcripts are useful when a user already has a direction. A Video Evidence Layer supports the opposite case: when a user has a question and needs the segment that supports the answer. By shifting from linear timelines to indexed moments, we transform a library of "black box" recordings into a granular, evidence-backed knowledge base. Technical video content is no longer just something to watch; it is something to query, verify, and share with precision.
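The query-time fusion of adjacent high-scoring moments described above can be sketched as a merge over sorted time ranges (the hit-tuple shape and gap threshold are illustrative assumptions):

```python
def fuse_adjacent(hits, max_gap_s: float = 1.0):
    """Merge retrieved (t_start, t_end, score) hits whose ranges touch or
    overlap, so the answer cites one contiguous span, not two fragments."""
    spans = []
    for t0, t1, score in sorted(hits):
        if spans and t0 <= spans[-1][1] + max_gap_s:
            s0, s1, sc = spans[-1]
            spans[-1] = (s0, max(s1, t1), max(sc, score))
        else:
            spans.append((t0, t1, score))
    return spans


print(fuse_adjacent([(438, 446, 0.81), (444, 462, 0.77), (724, 742, 0.64)]))
# -> [(438, 462, 0.81), (724, 742, 0.64)]
```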
Beyond the Black Box: Implementing “Human-in-the-Loop” (HITL) Agentic Workflows for Regulated Industries

By Rahul Kumar Thatikonda
The Technical Hook

Autonomous agents exhibit failure patterns analogous to those in distributed systems: not isolated catastrophic errors, but cascades of locally justifiable actions that collectively produce globally unsafe states. Prompt injection parallels a forged remote procedure call (RPC): syntactically valid input that traverses multiple processing layers before inducing an unauthorized state transition. As illustrated in Figure 1, this architectural risk is mitigated by the "commit boundary," which prevents adversarial inputs from reaching sensitive executors by validating every intent against a deterministic schema. When extended with capabilities such as tool invocation and long-term planning, agents manifest failure modes like confused-deputy scenarios and privilege escalation, which the layered enforcement framework in the diagram neutralizes.

Figure 1: Neutralizing agentic attack vectors

The Architecture Pattern

In the initial phases of agent development, teams typically prioritize "tool correctness" (ensuring the agent invokes the correct API) and "model correctness" (verifying the accuracy of generated text). In regulated domains, this prioritization is misaligned. The primary architectural question must be: where is the commit boundary established, and what deterministic controls govern state transitions across it?

A proven pattern for high-compliance sectors such as financial services, the Defense Industrial Base (DIB), and industrial control systems follows this sequence:

Agent → Policy Gate → Human Review → Executor

The architecture depicted in Figure 2 operationalizes this commit-boundary design pattern.
This pattern establishes a structural separation between probabilistic agent decision-making and deterministic system operations through a layered policy-enforcement framework. By integrating human-in-the-loop (HITL) oversight as a validation gate before the final executor, the system ensures that every state transition in a regulated environment is bounded, traceable, and attributable.

Figure 2: The commit boundary architecture pattern

The "Commit Boundary" Design Pattern

The commit boundary demarcates the transition from advisory output to executable action. Within agentic workflows, the agent must be prohibited from directly modifying production state. Instead, it generates a structured action request subject to the following deterministic stages:

  1. Typed and validated against a fixed schema to ensure syntactic and semantic integrity
  2. Scored and classified into predefined risk tiers using rule-based or statistical models
  3. Submitted for human evaluation when risk exceeds defined thresholds
  4. Processed exclusively by an execution service operating under least-privilege principles and supporting idempotent operations
  5. Persisted in an immutable log to maintain a verifiable audit trail resilient to model iteration or retraining

This approach does not hinder AI deployment; rather, it applies established engineering rigor, of the kind already enforced for database schema changes, payment processing, and privileged system modifications, to agentic systems. It mandates formalization of intent, systematic evaluation, conditional approval, and controlled state mutation, ensuring compliance, traceability, and operational safety.

Implementation Step 1: Typed Action Schemas as Governance Gates

Without a deterministic specification of an agent's intended state transition, auditability is unattainable. Unstructured natural language does not constitute a verifiable audit record; it introduces ambiguity and risk.
The foundational improvement lies in routing all state-modifying operations through rigorously defined, typed schemas, establishing the schema as the boundary layer between probabilistic decision-making and deterministic system enforcement.

Presented below is a streamlined Pydantic model for a TypedActionRequest. The design is intentionally prescriptive: it decouples agent intent from system execution, ensuring that state transitions, particularly those involving Controlled Unclassified Information (CUI) or sensitive financial data, proceed only after passing validation checks. By embedding policy logic directly into the schema, we give auditors and incident responders a verifiable record of causal provenance: exactly what was triggered, the justification provided, and the evidence used to authorize the action.

Python

    from pydantic import BaseModel, Field, HttpUrl, model_validator
    from enum import Enum
    from typing import Any, Dict, List


    class DataSensitivity(str, Enum):
        PUBLIC = "public"
        INTERNAL = "internal"
        CUI = "cui"  # Controlled Unclassified Information (NIST 800-171)
        PII = "pii"


    class ActionType(str, Enum):
        WRITE = "write"
        READ = "read"
        DELETE = "delete"
        ACCESS_GRANT = "access_grant"
        TRANSFER = "transfer"
        CONFIG_CHANGE = "config_change"


    class TypedActionRequest(BaseModel):
        """
        The formal 'intent' packet that crosses the commit boundary.
        Separates probabilistic agent 'thought' from deterministic execution.
        """
        actor: str = Field(..., description="Authenticated principal for the agent session")
        action: ActionType
        target_system: str = Field(..., description="System-of-record (e.g., Jira, GitHub, SAP)")
        sensitivity: DataSensitivity
        justification: str = Field(..., min_length=20, description="Auditable reasoning")
        evidence_urls: List[HttpUrl] = Field(default_factory=list, description="Links to tickets/logs")
        # Structured arguments and safe-mode flag, used by the risk-tiering logic in Step 2
        parameters: Dict[str, Any] = Field(default_factory=dict)
        dry_run: bool = False
        # Critical for distributed safety: prevents the agent from re-running an action
        idempotency_key: str = Field(..., min_length=16)

        @model_validator(mode="after")
        def enforce_governance(self) -> "TypedActionRequest":
            """
            Enforces policy as code, ensuring no state transition occurs
            without a verifiable audit trail.
            """
            # Rule: State-changing actions MUST have associated evidence (e.g., a ticket URL)
            if self.action != ActionType.READ and not self.evidence_urls:
                raise ValueError("State-changing actions must include evidence_urls for audit integrity")
            # Rule: Restrict sensitive data (CUI/PII) to specific hardened targets
            if self.sensitivity in {DataSensitivity.CUI, DataSensitivity.PII}:
                if "public" in self.target_system.lower():
                    raise ValueError(f"High-sensitivity {self.sensitivity} cannot target public systems")
            return self

The Necessity of Schemas as the Sole Auditable Conduit to Probabilistic Systems

In regulated contexts, compliance does not derive from inspecting model weights or outputs. It arises from examining system behavior: the identity of the requester, the data accessed, the control policies applied, the approval lineage, and the precise operation executed.
Typed schemas enable:

  • Deterministic interpretation (eliminating ambiguity in action semantics)
  • Reproducible change evaluation (structured diffs support accurate review)
  • Uniform logging (consistent field presence across events)
  • Enforceable policy integration (attribute-based routing and controls via explicit fields such as action type and data sensitivity)
  • Long-term stability (while models evolve frequently, the schema remains a constant reference)

This approach directly supports compliance with security frameworks emphasizing access governance, audit integrity, configuration control, and sensitive data handling, such as NIST SP 800-171, which mandates protection of Controlled Unclassified Information (CUI) in nonfederal information systems (NIST Computer Security Resource Center).

Implementation Step 2: Tiered Risk Routing — Managing Reviewer Fatigue Through Deterministic Logic

Deploying human-in-the-loop review as a universal mandate, requiring approval for every agent action, proves ineffective in operational environments. It leads to reviewer overload, processing delays, habitual approvals without scrutiny, and ultimately a de facto reversion to full automation.

A more sustainable solution is tiered risk routing: a deterministic mechanism that evaluates action requests using a quantifiable risk score, automatically executing low-risk actions while escalating medium- and high-risk actions to designated human review levels. Crucially, risk assessment must be derived from explicit, traceable data attributes rather than subjective judgment.

The following example outlines a concrete risk-tiering function. It classifies actions into one of four pathways (AUTO, PEER_REVIEW, SECURITY_REVIEW, or LEGAL_COMPLIANCE), based primarily on data sensitivity and action category, with incremental risk adjustments for privileged access modifications, financial transactions, and irreversible operations.
Python

    from enum import Enum
    from typing import Tuple


    class ReviewTier(str, Enum):
        AUTO = "auto"
        PEER_REVIEW = "peer_review"
        SECURITY_REVIEW = "security_review"
        LEGAL_COMPLIANCE = "legal_compliance"


    def calculate_risk_tier(req: TypedActionRequest) -> Tuple[int, ReviewTier, str]:
        """
        Scores a proposed action and routes it to the appropriate governance tier.
        Returns: (score 0-100, tier, rationale)
        """
        score = 0
        rationale = []

        # 1. Sensitivity bias: CUI/PII/PCI are first-class routing signals
        sensitivity_map = {
            "internal": 10,
            "cui": 35,  # Controlled Unclassified Information (NIST 800-171)
            "pii": 45,
            "pci": 60,  # shown for illustration; not in the DataSensitivity enum above
        }
        score += sensitivity_map.get(req.sensitivity.value, 0)
        rationale.append(f"Sensitivity: {req.sensitivity}")

        # 2. Action impact: destructiveness and privilege changes increase risk
        action_risk_weights = {
            "read": 0,
            "write": 15,
            "config_change": 35,
            "access_grant": 50,  # High-risk: alters the security posture
        }
        score += action_risk_weights.get(req.action.value, 0)
        rationale.append(f"Action: {req.action}")

        # 3. Contextual overrides: large transactions or admin requests
        if req.parameters.get("amount_usd", 0) >= 10000:
            score += 20
            rationale.append("Large financial transfer")
        if req.parameters.get("grants_admin", False):
            score += 25
            rationale.append("Admin privilege escalation")
        if req.dry_run:
            score -= 15  # Mitigating factor: non-mutating validation
            rationale.append("Safe-mode (Dry Run)")

        score = max(0, min(100, score))

        # 4. Deterministic routing logic
        # The full scoring rationale is joined into the returned string so it
        # survives into the review package and audit log.
        # Hard escalation: any state change to CUI data triggers a security review
        if req.sensitivity == DataSensitivity.CUI and req.action != ActionType.READ:
            rationale.append("Sensitive state-change mandate")
            return score, ReviewTier.SECURITY_REVIEW, "; ".join(rationale)
        if score < 20:
            rationale.append("Low-impact automated execution")
            return score, ReviewTier.AUTO, "; ".join(rationale)
        elif score < 50:
            rationale.append("Standard peer oversight")
            return score, ReviewTier.PEER_REVIEW, "; ".join(rationale)
        elif score < 75:
            rationale.append("High-risk security audit required")
            return score, ReviewTier.SECURITY_REVIEW, "; ".join(rationale)
        rationale.append("Critical compliance review required")
        return score, ReviewTier.LEGAL_COMPLIANCE, "; ".join(rationale)

Figure 3: Deterministic risk-tiering logic

As demonstrated by the decision logic in Figure 3, these tiered outcomes provide the structural foundation for governed automation. To keep the system robust in production, address two implementation considerations:

  • Preserve the rationale string: The rationale generated during scoring must be preserved in the review package and audit log. This enables clear answers to questions such as "Why was a security review triggered?" with an objective, reproducible justification.
  • Strict parameter schemas: Unstructured key-value inputs introduce hidden risk through unvalidated fields. Model parameters as a controlled API: define schemas, maintain versioning, document allowable fields, and reject unrecognized keys, particularly in high-risk systems.

Mapping System Architecture to Regulatory Requirements

This phase operationalizes policy by transforming external regulatory mandates into enforceable system-level constraints, ensuring architectural alignment with compliance objectives.

NIST SP 800-171 Rev. 3 as a Constraint Model at the Commit Boundary

NIST SP 800-171 Rev. 3 establishes security requirements for safeguarding Controlled Unclassified Information (CUI) in nonfederal information systems (NIST Computer Security Resource Center).
Compliance is not achieved through documentation alone but through architectural enforcement of authentication, authorization, auditing, and data-handling controls. Architectural mechanisms enabling compliance:

  • Access control and least privilege: The executor operates as a distinct service identity with minimal, role-specific permissions. The agent does not possess production credentials, limiting the potential impact of compromise or misuse.
  • Audit and accountability: A verifiable audit trail is generated from structured data elements: the TypedActionRequest, risk score, human approval decision, and execution outcome. This transforms ambiguous autonomous actions into auditable, deterministic state transitions.
  • Configuration management and change control: Configuration modifications are formalized as CONFIG_CHANGE actions that must pass through the commit boundary, converting untracked configuration updates into governed, inspectable change operations.
  • Controlled CUI handling: Data sensitivity labels (e.g., PUBLIC, INTERNAL, CUI) function as multi-purpose controls, influencing routing decisions, determining log-redaction policies, and restricting permissible execution endpoints by classification.

NIST IR 8596 as Constraints on AI Cybersecurity Outcomes

NIST IR 8596 functions as a cybersecurity framework profile tailored to artificial intelligence, aligning AI-specific risk factors with measurable cybersecurity objectives (NIST Publications). Its primary operational implication underscores a frequently overlooked engineering principle: AI deployments are not just statistical models but complex, interconnected systems requiring comprehensive, end-to-end security measures.
Architectural implications include:

  • Deployment of a policy gate as a dedicated enforcement point for mitigating AI-specific threats, including prompt injection, unauthorized tool invocation, data leakage via agent actions, and uncontrolled autonomous execution paths.
  • Implementation of the human review queue not as a default fallback reliant on human judgment, but as a programmatically triggered control governed by deterministic decision rules.
  • Elevation of the audit trail to a core security component, enabling pattern detection, forensic analysis, and iterative control refinement through reliable, structured records, eliminating reliance on heuristic interpretation of agent behavior.

Colorado SB24-205 as Constraints on High-Risk Decision Systems

Colorado’s SB24-205 (Consumer Protections for Artificial Intelligence) establishes legal obligations for systems classified as “high-risk,” mandating reasonable safeguards against known or foreseeable risks of algorithmic discrimination, with enforceability beginning on the date specified in the legislation (Colorado General Assembly). From an engineering perspective, compliance translates into:

  • Mandatory traceability and governance mechanisms for agentic workflows in high-stakes domains such as credit, employment, insurance, or housing, effective across model iterations and system updates.
  • Reliance on architecture-defined intent typing, rule-based routing logic, and immutable audit logging to generate auditable evidence of input factors, proposed actions, approval authorities, and executed outcomes, forming the foundational data layer required to support fairness assessments and regulatory inquiries following adverse events (Colorado General Assembly).
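The immutable audit trail these frameworks assume can be approximated with an append-only, hash-chained log, where each entry commits to its predecessor so later tampering is detectable. A simplified sketch (record fields are illustrative, not a production design):

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only, hash-chained audit trail (illustrative sketch).
    Each entry embeds the hash of the previous one, so any edit to a
    past record breaks the chain and is detectable on verification."""

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"record": record, "prev_hash": prev_hash, "ts": time.time()}
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            expected = {k: v for k, v in e.items() if k != "hash"}
            h = hashlib.sha256(
                json.dumps(expected, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev or e["hash"] != h:
                return False
            prev = e["hash"]
        return True


log = AuditLog()
log.append({"actor": "agent-7f3a", "action": "config_change", "tier": "security_review"})
log.append({"actor": "reviewer-42", "decision": "approved"})
print(log.verify())  # True
log.entries[0]["record"]["action"] = "read"  # tampering breaks the chain
print(log.verify())  # False
```

In production this role is usually played by WORM storage or a ledger database; the chaining shown here is just the minimal property such systems preserve.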
Open Problems and Challenges

Reviewer Fatigue Stems From Systemic Design Limitations, Not Individual Capacity Constraints

When every operational decision triggers a manual review ticket, cognitive load accumulates, leading to reviewer exhaustion and approvals driven by habit rather than scrutiny. Tiered routing mitigates volume, but it must be complemented by:

  • Action aggregation and differential summaries: Present reviewers with compact, change-focused diffs instead of full execution logs.
  • Idempotent retry mechanisms: Ensure repeat executions caused by transient infrastructure failures do not trigger new review cycles.
  • Statistical sampling with retrospective audits: For low-risk automated operations, sample and review after the fact to validate risk models and detect policy drift without introducing latency.

Evaluating Long-Horizon Agent Behavior Lacks Deterministic Tractability

Even with well-defined commit boundaries, agents operating over extended timeframes introduce evaluation challenges. The primary risk is not isolated faulty actions but sequences of individually valid steps that collectively violate safety invariants, mirroring emergent failure modes in distributed systems. Effective countermeasures include:

  • Resource and action budgets: Impose limits on state-modifying operations per session, resource, or time interval.
  • Policy-enforced invariants: Embed logical constraints in approval gates (e.g., prohibiting simultaneous privilege escalation and audit disablement).
  • Causal trace propagation: Attach persistent trace identifiers to all actions, enabling accurate reconstruction of execution lineage independent of model-generated memory.

Architect’s Note

In a corporate setting, achieving audit-ready AI requires designing agent systems as untrusted components operating within trusted, governed workflows.
Persuasion does not stem from claims of model sophistication; it derives from demonstrable system properties: that all state transitions are bounded, traceable, and attributable, and that operational integrity is preserved even under model failure, adversarial input, or stochastic error.

Speed need not be sacrificed at the point of execution, provided human oversight is strategically allocated. Focus human judgment where its marginal impact is greatest: handling sensitive data, executing privileged actions, transferring value, and performing irreversible operations. Engineer everything else for safe automation via deterministic validation gates, rigidly defined schemas, and idempotent execution. If human review queues accumulate indeterminate cases, the root cause typically lies in underspecified routing logic, overly permissive action definitions, or systemic gaps being offset by manual intervention.

Furthermore, the audit trail must be treated as a first-class deliverable: structured for efficient querying, cryptographically immutable, and aligned with business-level identifiers rather than low-level technical logs. When accountability is demanded ("Why did this occur?"), the response must not rely on reconstructing intent from chat histories or inferring significance from raw tool invocations. Instead, a unified, time-ordered record must link typed user intent, policy-based decision logic, required human approvals, execution outcomes, and resulting state changes, all anchored by identity, idempotency keys, and temporal sequence. This is the foundation for deploying agentic systems in regulated environments without exposing the organization to opaque, unverifiable processes.
References and Further Reading

  • arXiv 2601.17548: Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems (Maloyan & Namiot, Jan 2026). A systematization of 42 attack techniques and the relative failure of current defenses against adaptive injection.
  • NIST SP 800-171 Rev. 3: Protecting Controlled Unclassified Information in Nonfederal Systems and Organizations (National Institute of Standards and Technology, May 2024). The federal security requirement for safeguarding CUI in non-federal systems, serving as the primary constraint model for industrial AI.
  • NIST IR 8596 (IPD): Cybersecurity Framework Profile for Artificial Intelligence (Cyber AI Profile) (National Institute of Standards and Technology, December 2025). Guidelines for managing AI-specific risk factors using the NIST Cybersecurity Framework 2.0.
  • OWASP Top 10 for Large Language Model Applications (OWASP Foundation, 2025). The industry classification for AI vulnerabilities, specifically addressing "Excessive Agency" (LLM08) and "Prompt Injection" (LLM01).
  • Colorado SB24-205: Concerning Consumer Protections in Interactions With Artificial Intelligence Systems (Colorado General Assembly, May 2024). The first comprehensive U.S. state law mandating "reasonable care" and impact assessments for high-risk AI decision systems.
  • FINRA 2026 Annual Regulatory Oversight Report: Dedicated Generative AI Section (FINRA, Dec 2025). Clarifies that firms must maintain recordkeeping and supervision even for "agentic" automated support workflows.
AI in Enterprise Content Workflows: What You Need to Know
By Jake Miller
Building a Unified API Documentation Portal with React, Redoc, and Automatic RAML-to-OpenAPI Conversion
By Sreedhar Pamidiparthi
Shifting Bottleneck: How AI Is Reshaping the Software Development Lifecycle
By Ralf Huuck
Mastering GitHub Copilot in VS Code: Ask, Edit, Agent and the Build–Refine–Verify Workflow

Most developers meet GitHub Copilot as a “smart autocomplete” that occasionally guesses the next line of code. Used that way, it’s nice, but you’re leaving a lot of value on the table. Inside VS Code, Copilot offers multiple modes of interaction designed for different stages of development:

Chat Panel:
  • Ask – questions and explanations
  • Edit – deliberate code changes
  • Agent – autonomy and multi-step work

In-Editor Support:
  • Ghost Text (Tab completions) – fast, inline suggestions
  • Inline Chat – targeted, context-rich refactoring

If you understand when to use each, you can build a practical workflow: Build, Refine, Verify. This article walks through these modes, how they differ, and how to combine them into a repeatable development pattern you can trust.

The Three Chat Panel Modes: Ask, Edit, Agent

The Chat Panel is your main hub for high-level conversations with Copilot. It has three distinct modes that serve different purposes.

1. Ask Mode: Questions and Explanations

Use Ask when you’re thinking, not editing. Ask mode is for understanding, exploring, and clarifying. It’s a safe space: Copilot won’t touch your files; it only answers in text and code snippets.

Typical prompts:
  • “How does this function work?”
  • “What is the syntax for a flexbox?”
  • “Explain this TypeScript error.”
  • “What’s a good way to structure feature flags in React?”

Result: You get answers, explanations, and code blocks you can copy manually. This is ideal when:
  • You’re learning an unfamiliar API or library.
  • You want a quick conceptual explanation (e.g., async/await, RxJS observables).
  • You’re exploring options before committing to any code changes.

Think of Ask mode as your embedded Stack Overflow plus tutor. No risk, no edits, just information.

2. Edit Mode: Deliberate Code Changes

Use Edit when you know what to change and want Copilot to implement it.
In Edit mode, you’re giving Copilot a specific instruction about your codebase, and it will propose concrete file edits, still under your control.

Example prompts:
- “Rename this variable across these two files.”
- “Refactor this class into smaller functions.”
- “Convert this callback-based API to async/await.”
- “Add null checks for user input in this file.”

Result: Copilot updates your code in place, but the intent is surgical: you already understand the change; you just want help executing it consistently and quickly.

Use Edit mode when:
- You have a clear, well-defined change.
- You need to apply that change across multiple files.
- You’re doing repetitive or mechanical refactors (renames, pattern changes, adding logs, etc.).

It’s the “do the thing I already decided on” mode.

3. Agent Mode: Autonomy and Multi-Step Tasks

Use Agent when you want Copilot to figure out the how and where. Agent mode is where Copilot becomes more autonomous. You describe an outcome, and Copilot breaks it into steps: editing files, creating new ones, and even running terminal commands (when allowed).

Example prompts:
- “Create a task manager app.”
- “Add a user registration flow with email verification.”
- “Set up a basic Express server with JWT-based authentication.”
- “Generate a CI pipeline for this project using GitHub Actions.”

Result: The Agent:
- Proposes a plan: “I will create A, modify B, run C…”
- Suggests file edits and new files.
- Can run commands in the terminal (e.g., install dependencies, run tests) if you confirm.

Use Agent mode for:
- Greenfield scaffolding (new apps, services, components).
- Large, multi-step features.
- Initial project setup and boilerplate-heavy tasks.

You’re still the tech lead: you approve steps and review diffs, but the Agent does the heavy lifting. Huge, all-in-one prompts perform worse than small, focused tasks. A far better approach is to talk to the Agent as you would to a junior developer.
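To make the “convert this callback-based API to async/await” prompt concrete, here is the kind of before-and-after transformation such a request typically produces. This is a minimal Python sketch; the function and data are hypothetical stand-ins, not code from any real project:

```python
import asyncio

# Before: the callback-based shape you might select in the editor
def fetch_user(user_id, callback):
    user = {"id": user_id, "name": "Ada"}  # stand-in for real I/O
    callback(user)

# After: the async/await shape an Edit-mode request typically proposes
async def fetch_user_async(user_id):
    await asyncio.sleep(0)  # stand-in for real awaitable I/O
    return {"id": user_id, "name": "Ada"}

user = asyncio.run(fetch_user_async(42))
print(user["name"])
```

The point of using Edit mode here is consistency: once you approve the pattern, Copilot can apply the same callback-to-coroutine conversion across every call site in the file.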
In-Editor Interactions: Speed and Context

Once you leave the Chat Panel and are deep in your code, the interaction style changes. Now it’s about momentum and precision inside your files.

Ghost Text (Tab Completions): Momentum While Typing

Ghost Text is the gray, inline suggestion that appears as you type. This is Copilot in its original, “autocomplete on steroids” form.

Use it for:
- Boilerplate (loop structures, handlers, simple CRUD endpoints).
- Repetitive patterns (similar test cases, validation rules).
- Documentation and comments (docstrings, JSDoc, README snippets).

If completions don’t seem to appear, ensure they’re enabled:
1. Press Cmd + Shift + P (macOS) or Ctrl + Shift + P (Windows/Linux).
2. Type: GitHub Copilot: Toggle Completions.
3. Make sure completions are enabled.

Inline Chat (Cmd+I / Ctrl+I): Targeted Refactoring

Inline Chat brings Copilot right to your cursor with local context.

How it works:
1. Highlight the code you want to work on.
2. Press Cmd+I (macOS) or Ctrl+I (Windows/Linux).
3. Describe your intent:
   - “Add priority levels to this list.”
   - “Optimize this loop for large input sizes.”
   - “Convert this to use a switch statement.”
   - “Add better error handling here.”

Inline Chat is ideal for:
- Local logic improvements.
- Iterating on algorithms.
- Enhancing error handling or logging.
- Adding small features in a specific function or block.

Compared with Edit mode, Inline Chat feels more “in the flow”: you’re looking at the exact code, selecting it, and asking Copilot to transform it.

The Build, Refine, Verify Workflow

To get the most out of all these modes, tie them together into a simple three-step workflow: Build, Refine, Verify.

1. Build: Start Broad With Agent

Begin with Agent mode when you’re facing a blank screen or a large new feature.
- “Create a task manager app.”
- “Add a ‘Projects’ feature to this dashboard, with CRUD endpoints and a basic UI.”
- “Set up database migrations for this service.”

Let the Agent:
- Scaffold the project or feature.
- Create new directories, initial models, basic routes, or components.
- Wire up minimal working paths (e.g., one end-to-end flow).

The goal is to defeat the blank-page problem and get a working baseline quickly.

2. Refine: Get Specific With Inline Chat and Edit

Once the structure exists, it’s time to refine and improve.

Use Inline Chat for local improvements:
- “Add filtering by status and due date to this query.”
- “Add priority levels (low, medium, high) to this list and sort accordingly.”
- “Improve the error messages returned by this API.”

Use Edit mode for broader, planned changes:
- “Rename TaskItem to TodoItem across the project.”
- “Extract this monolithic function into smaller utilities in a utils folder.”
- “Switch this module from CommonJS to ES modules.”

In this stage, you’re iterating on correctness, readability, performance, and maintainability.

3. Speed Up: Use Ghost Text to Fill the Gaps

While refining, lean on Ghost Text to:
- Fill in obvious code patterns (e.g., additional test cases once it sees the first one).
- Write simple handlers, DTOs, or interfaces.
- Generate comments or docstrings from function names and parameters.

This keeps you in flow. You decide the structure; Copilot fast-follows your intent.

4. Always Verify: Diff View as Your Safety Net

Regardless of mode, there’s a non-negotiable final step: Verify. Before accepting changes, especially from Agent or Inline Chat, inspect the Diff view:
- Red = lines removed.
- Green = lines added.

Check for:
- Unintended logic changes.
- Hidden side effects (e.g., changed function signatures, altered validations).
- Security or performance pitfalls (e.g., missing input validation, inefficient loops).
Treat Diff view as your review gate:
- If it’s not clear within a few seconds what changed and why, step back.
- Ask Copilot (in Ask mode) to explain the diff: “Explain this change in plain English.” “Does this modification affect existing consumers of this function?”

Copilot accelerates coding, but you remain the responsible engineer. Verification is where your judgement comes in.

Putting It All Together

Here’s how a realistic Copilot-powered session can look:

1. Ask: “What’s a simple architecture for a task manager app with Node.js and React?”
2. Agent: “Create a basic task manager app with backend in Express and frontend in React, including CRUD operations.”
3. Refine with Inline Chat/Edit:
   - Inline: “Add priority levels and due dates to tasks in this component.”
   - Edit: “Rename Task to Todo across backend and frontend.”
4. Speed with Ghost Text: Let Copilot autocomplete repetitive tests and API wrappers.
5. Verify with Diff view: Review every proposed change, then run tests (manually or via Agent) and confirm behavior.

Used this way, Copilot doesn’t replace your skills; it amplifies them.
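As a small illustration of the “repetitive tests” pattern Ghost Text accelerates: once you write the first assertion by hand, Copilot usually offers the variants. A minimal Python sketch (the slugify function is a hypothetical example, not from the article’s task manager app):

```python
def slugify(title: str) -> str:
    """Turn a post title into a URL slug."""
    return title.strip().lower().replace(" ", "-")

# The first test case you write by hand...
assert slugify("Hello World") == "hello-world"

# ...and the variants Ghost Text typically completes once it sees the pattern
assert slugify("  Hello World  ") == "hello-world"
assert slugify("Already Lower") == "already-lower"
print("all slugify tests pass")
```

You still review each suggested assertion before pressing Tab: accepting a wrong expected value is exactly the kind of silent error the Verify step exists to catch.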

By Hanna Labushkina
From Command Lines to Intent Interfaces: Reframing Git Workflows Using Model Context Protocol

My recent journey into agentic developer systems has been driven by a desire to understand how AI moves from passive assistance to active participation in software workflows. In an earlier article, AI Co-creation in Developer Debugging Workflows, I explored how developers and AI systems collaboratively reason about code. As I went deeper into this space, I came across the Model Context Protocol (MCP) and became keen to understand what this component is and why it is important. I noticed that MCP was frequently referenced in discussions about agentic systems, yet rarely explained in a concrete, developer-centric way. This article is a direct outcome of that learning process, using a practical Git workflow example to clarify the role and value of MCP in intent-driven developer tooling.

What Is an MCP Server?

At a conceptual level, an MCP server acts as a control plane between an AI assistant and external systems. Rather than allowing an LLM to issue arbitrary API calls, the MCP server implements the Model Context Protocol and exposes a constrained, well-defined set of capabilities that the model can invoke.

As illustrated in the diagram, the AI assistant functions as an MCP client, issuing structured MCP requests that represent user intent. The MCP server receives these requests, validates them against exposed capabilities and permissions, and translates them into concrete API calls or queries against external systems such as databases, version control platforms, or document stores. The results are then returned to the model as structured context, enabling subsequent reasoning or follow-up actions.

This intermediary role is critical. The MCP server is not merely a proxy; it enforces permission boundaries, operation granularity, and deterministic execution. By separating intent expression from execution logic, MCP reduces the risk of unsafe or unintended actions while enabling AI systems to operate on real developer tools in a controlled manner.
In effect, the MCP server bridges conversational AI and operational systems, making intent-driven workflows both practical and governable.

Case Study: Intent-Driven Git Workflows Using GitHub MCP in VS Code

To ground the discussion, this section presents a concrete case study using the open-source github-mcp-server, integrated into Visual Studio Code via GitHub Copilot Chat. The goal of this case study is not to demonstrate feature completeness, but to illustrate how MCP enables intent-first interaction for common GitHub workflows.

MCP Server Registration in VS Code

MCP servers are configured at the workspace or user level using a dedicated configuration file. In this setup, the GitHub MCP server is registered by adding an MCP configuration file under the VS Code workspace:

.vscode/mcp.json

```json
{
  "servers": {
    "github": {
      "url": "https://api.githubcopilot.com/mcp/"
    }
  }
}
```

This configuration declares GitHub as an MCP server and points the IDE’s MCP client to a remote endpoint. Once registered, the IDE can discover the capabilities exposed by the GitHub MCP server and make them available to the chat interface as structured tools.

Authentication via OAuth Approval

When the MCP server is first invoked, VS Code initiates an OAuth flow with GitHub. In this case, authentication was completed by approving access through a browser-based login using GitHub credentials (username and password, followed by any configured multi-factor authentication).

This OAuth-based flow has several important properties:
- Credentials are not stored directly in the MCP configuration.
- Permissions are scoped to the approved application.
- Token issuance and rotation are handled by the GitHub authorization system.

Once authorization is complete, the MCP server can securely execute GitHub operations on behalf of the user, subject to the granted scopes (these are listed as tools when configuring the MCP server).
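Under the hood, MCP messages are JSON-RPC 2.0. As a rough illustration of the “structured MCP requests” described above, a tool invocation has the following general shape. The tool name and arguments below are illustrative (loosely modeled on the pull request example later in this article), not an exact transcript of what VS Code sends:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "create_pull_request",
    "arguments": {
      "owner": "mvmaishwarya",
      "repo": "react-storybook-starter",
      "title": "Add a dummy commit",
      "base": "master"
    }
  }
}
```

The important property is that the model can only name a tool the server has advertised via tools/list, with arguments validated against that tool’s schema, which is exactly the constraint that distinguishes MCP from free-form API calling.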
Alternative Authentication: Personal Access Tokens

In addition to browser-based OAuth authorization, the GitHub MCP server can also be configured using a GitHub Personal Access Token (PAT). This approach is useful when explicit credential control is required or when OAuth approval is not feasible in a given environment. In this setup, the MCP configuration declares an Authorization header and prompts the user to supply the token securely at runtime, rather than hardcoding it in the file.

.vscode/mcp.json (PAT-based authentication)

```json
{
  "servers": {
    "github": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": {
        "Authorization": "Bearer ${input:github_mcp_pat}"
      }
    }
  },
  "inputs": [
    {
      "type": "promptString",
      "id": "github_mcp_pat",
      "description": "GitHub Personal Access Token",
      "password": true
    }
  ]
}
```

This configuration has two practical advantages. First, the token is not committed to source control because it is collected via an interactive prompt. Second, it makes the authentication mechanism explicit and portable across environments while keeping the MCP server endpoint unchanged. After the token is provided, the IDE can invoke GitHub MCP capabilities through the same intent-driven prompts used in the OAuth-based setup.

Verifying MCP Server Initialization in VS Code

After adding the MCP configuration, it is important to verify that the GitHub MCP server is correctly initialized and running. Visual Studio Code exposes MCP server lifecycle events directly in the Output panel, which serves both as a validation mechanism and a primary debugging surface.

Once the .vscode/mcp.json file is detected, VS Code attempts to start the configured MCP server automatically. In the Output tab, selecting the “MCP: github” channel shows detailed startup logs, including server initialization, connection state, authentication discovery, and tool registration.
The logs confirm several important stages:
- The GitHub MCP server transitions from Starting to Running.
- OAuth-protected resource metadata is discovered.
- The GitHub authorization server endpoint is identified.
- The server responds successfully to the initialization handshake.
- A total of 40 tools are discovered and registered.

These log entries provide concrete evidence that the MCP server is active and that its capabilities are available to the IDE. They also offer visibility into the OAuth flow, making it clear when authentication is required and when it has been successfully completed.

From a practical standpoint, the Output panel becomes essential when troubleshooting MCP integrations. Configuration errors, authentication failures, or capability discovery issues surface immediately in these logs, allowing developers to debug MCP setup issues without leaving the IDE or guessing at silent failures.

Executing GitHub Operations Through Intent

Once the GitHub MCP server is configured and running, GitHub operations become available inside the IDE as structured capabilities. Using Visual Studio Code with GitHub Copilot Chat, prompts expressed in natural language are translated into constrained GitHub operations via the github-mcp-server.

Repository Discovery

Prompt: “List all repos in my GitHub account.”

The assistant invokes the repository-listing capability and returns the results directly in the IDE, validating authentication and MCP capability discovery.

Pull Request Creation

Prompt: “Create a PR.”

Because the request is underspecified, the assistant asks for required parameters, including repository, change source, title, description, and base branch. After responding with:

“react-storybook-starter, staged changes, PR title – Add a dummy commit, PR description none, merge to master”

the assistant creates a branch, commits the staged changes, and opens a pull request. The PR is confirmed with its repository identifier.
Repository Creation

Prompt: “Create a new repo in mvmaishwarya. Repo name: problems-and-prep. Repo is public.”

The MCP server executes the repository creation operation and returns confirmation that the public repository has been successfully provisioned.

Observations from Intent-Driven Execution

Across these examples, several consistent behaviors emerge. First, the assistant requests clarification only when required by the operation’s schema, avoiding unnecessary dialogue. Second, all actions are executed through explicitly exposed MCP capabilities rather than inferred or free-form API calls. Finally, the IDE remains the primary workspace, reducing context switching between terminals, browsers, and documentation.

Together, these interactions demonstrate how MCP enables GitHub workflows to shift from command-driven procedures to intent-driven execution while maintaining safety, transparency, and developer control.

By Aishwarya Murali
Top 5 Payment Gateway APIs for Indian SaaS: A Developer’s Analysis

As Indian SaaS companies, e-commerce platforms, and service providers increasingly target global markets, the need for robust international payment integration has become paramount. While numerous payment gateways offer cross-border capabilities, the developer experience and the specific API features required to handle these transactions efficiently, especially given India’s unique compliance landscape, vary significantly.

Simply processing a charge isn’t enough. Developers need APIs that elegantly handle multiple currencies, diverse global payment methods, stringent security protocols such as 3D Secure 2.0, and, crucially, provide programmatic access to the data required for Indian regulatory needs like the Foreign Inward Remittance Certificate (FIRC). Manual processes for compliance or reconciliation simply don’t scale.

This article provides a technical deep dive into the APIs of five major payment gateways active in India, evaluating their suitability for developers building applications that require international payment acceptance. We focus on API design, core international payment features, developer experience (DX), and the critical aspect of handling compliance programmatically.

The API Litmus Test: Key Criteria for Evaluation

When assessing an international payment gateway API from an Indian developer’s perspective, the following factors are critical.

API Design and Developer Experience (DX)
- Architecture: Is the API truly RESTful, with predictable, resource-oriented URLs and standard HTTP methods?
- Documentation: Is the API reference comprehensive, accurate, and easy to navigate? Are there clear code examples, tutorials, and quickstart guides relevant to international payments?
- SDKs: Are well-maintained SDKs available for major backend languages (Node.js, Python, Java, PHP, Ruby)? Do they provide convenient abstractions over raw API calls?
- Sandbox environment: How closely does the sandbox mimic the production environment, especially for testing international card flows, 3DS challenges, and currency conversions? Is it reliable and easy to provision test credentials?
- Developer support: How responsive and technically adept is the support team when developers face integration issues?

Multi-Currency and FX Handling via API
- Currency support: Does the API allow creating charges directly in major international currencies (USD, EUR, GBP, etc.)?
- FX rate transparency: Can applicable foreign exchange rates be fetched or previewed via the API?
- Settlement data: How clearly does the API, or related webhooks, expose the final settlement amount in INR, including any applied FX rates or fees?

Payment Method Integration (API Level)
- International cards: How straightforward is the API flow for accepting major international card networks (Visa, Mastercard, Amex)?
- Other global methods: Does the API support integrating other relevant methods, such as PayPal, easily if required?

Security and 3DS2 Integration APIs
- PCI compliance: Does the provider offer solutions (such as hosted fields or dedicated SDKs) that minimize the developer’s PCI compliance burden?
- 3D Secure 2.0: How does the API manage mandatory 3DS2 flows for relevant international transactions? Does it provide clear status updates via webhooks or callbacks for authentication success, failure, or challenge flows?
- Fraud prevention APIs: Are there endpoints for retrieving fraud risk scores, passing custom transaction metadata for risk analysis, or configuring fraud rules programmatically?

Compliance and Settlement Data via API (Critical for India)
- FIRC data retrieval: Can the essential data points required for FIRC generation, such as UTR number, purpose code, transaction ID, settlement amount, and FX rate, be accessed programmatically via API endpoints or reliably delivered through webhooks? Or does this require manual report downloads?
- Reconciliation: Do the settlement APIs or reports provide sufficient detail (for example, linking settlements back to original transaction IDs) to enable automated reconciliation of international payments credited to an Indian bank account?

The API Deep Dive: Comparing Five International Payment Gateways

Let’s examine how five popular gateways stack up based on these API-centric criteria.

1. Razorpay International Payments

Positioning: Optimized for Indian businesses (SaaS, e-commerce, and services) going global.

API analysis: Razorpay offers a largely RESTful API. Creating international charges involves specifying the currency parameter, with support for 130+ currencies. The documentation is generally clear, with dedicated sections for international payments and code examples in multiple languages. SDKs are available for major platforms.

Strengths (API focus):
- Compliance automation: Razorpay’s key differentiator. While direct API endpoints for all FIRC data points are still evolving, the platform provides crucial identifiers, such as razorpay_payment_id and settlement details (settlement_id, utr), via webhooks and dedicated Settlement APIs. This facilitates programmatic reconciliation and compliance data collection. Features like the MoneySaver Export Account aim to improve FX transparency, often reflected in settlement details accessible via API. Additionally, the international payment gateway handles international card payments reliably, with minimal downtime.
- Unified domestic/international payments: Indian payment methods (UPI, Netbanking) and international cards are handled through a relatively consistent API structure, reducing integration complexity.

Potential weaknesses (API focus): The sandbox environment, while functional, may not always replicate all edge cases for international 3DS flows across card issuers. Advanced FX rate querying may not be fully exposed via API.
Verdict: A strong choice for Indian developers prioritizing integrated compliance and a unified API for domestic and international payments. The programmatic access to settlement data is a significant advantage, and the MoneySaver Export Account is a cost-effective alternative to traditional bank transfers.

2. Stripe (Global)

Positioning: The feature-rich global standard.

API analysis: Stripe’s API, especially PaymentIntents, is widely regarded as a gold standard for design, consistency, and documentation. It is highly flexible, supporting complex international scenarios, multiple currencies, and a broad range of global payment methods. SDKs and developer tooling are excellent.

Strengths (API focus):
- Flexibility and power: Granular control over the payment lifecycle, including 3DS handling, and support for many international payment methods beyond cards.
- Developer experience: Best-in-class documentation, client libraries, CLI tooling, and sandbox environment. Extensive webhook support enables real-time updates.

Potential weaknesses (API focus):
- Indian compliance via API: Programmatically extracting FIRC-related data, such as the exact UTR number from Indian settlement batches, can be challenging. It often requires parsing settlement reports obtained manually or via indirect APIs (for example, the Reporting API), adding complexity compared to India-focused providers. Purpose code management might also be less integrated at the API level.

Verdict: An excellent API for complex global payment flows and experienced teams. However, developers must plan for additional work to automate India-specific compliance requirements.

3. PayPal

Positioning: Widely trusted globally, with varying API depth.

API analysis: PayPal provides modern REST APIs for checkouts and card processing (where available). Integration typically involves redirects or JavaScript SDKs. Multi-currency handling is a core capability.
Strengths (API focus):
- Global recognition: Integrating the PayPal wallet via API or SDK is straightforward and benefits from strong global user trust.
- Broad currency support: Native multi-currency support across APIs.

Potential weaknesses (API focus):
- API complexity: Direct international card processing (beyond PayPal wallet payments) can be more complex or have limited availability compared to Stripe or Razorpay.
- Indian compliance via API: Similar to Stripe, retrieving FIRC-related settlement data (like UTR) programmatically often requires specific reporting endpoints or manual report downloads. Auto-withdrawal can further complicate reconciliation.

Verdict: Essential if PayPal wallet support is a priority. For direct card processing, carefully evaluate API capabilities and the feasibility of automating Indian compliance workflows.

4. 2Checkout (Verifone)

Positioning: Focused on global e-commerce and digital goods.

API analysis: 2Checkout provides APIs for global e-commerce use cases, supporting multiple currencies and international payment methods. Documentation covers order creation, payments, and subscriptions.

Strengths (API focus):
- Global payment methods: Strong support for region-specific international payment methods.
- E-commerce features: APIs often include features relevant to e-commerce, such as tax handling and localized checkout.

Potential weaknesses (API focus):
- DX and modernity: API design and developer experience may feel less modern or intuitive compared to Stripe or Razorpay.
- Indian compliance via API: Accessing Indian settlement details (such as UTRs for FIRC) programmatically may be less straightforward and insufficiently documented for Indian compliance needs.

Verdict: A viable option for global e-commerce businesses, but requires careful evaluation of API endpoints and processes for automating Indian compliance and reconciliation.

5. CCAvenue

Positioning: Established Indian player with international capabilities.
API analysis: CCAvenue supports international payments and multi-currency processing. Historically, integrations relied on form posts or proprietary protocols, though newer APIs may be available.

Strengths (API focus):
- Local market expertise: Deep understanding of the Indian banking ecosystem.
- Multi-currency processing: Supports international currencies with INR settlement.

Potential weaknesses (API focus):
- API design and DX: Older integrations may feel less developer-friendly. Documentation can be less comprehensive or harder to navigate.
- Compliance data via API: Programmatic access to granular settlement data (such as UTRs for FIRC) may be limited or require manual report handling.

Verdict: Reliable, especially for businesses already using CCAvenue domestically, but developers should carefully assess the latest APIs with a focus on DX and automated access to compliance data.

API Feature Matrix: Quick Comparison for Developers

| Gateway | API Design | Multi-Currency API Ease | FIRC Data via API? | SDK Quality | Docs Clarity | Sandbox Quality |
|---|---|---|---|---|---|---|
| Razorpay Int'l | Mostly RESTful | Excellent | Yes (partial, via Settlement APIs/webhooks) | Excellent | Excellent | Good |
| Stripe (Global) | Excellent (REST) | Good | Indirect (via Reporting API/manual) | Excellent | Excellent | Excellent |
| PayPal REST | Good (REST) | Good | Indirect (via reporting/manual) | Good | Good | Good |
| 2Checkout (Verifone) | Fair-Good | Good | Likely indirect | Fair | Fair | Fair-Good |
| CCAvenue | Varies (legacy/new) | Fair | Likely indirect/manual | Fair | Fair | Fair |

Note: “FIRC Data via API?” refers to the ease of programmatically obtaining identifiers such as UTRs for automated compliance, not merely the existence of the data in reports.

Conclusion: Selecting the Best API for Your International Stack

Choosing an international payment gateway API requires balancing global feature richness with local operational realities.

- Global powerhouses (Stripe, PayPal): Offer flexible, feature-rich APIs ideal for complex international scenarios. However, automating India-specific compliance, especially FIRC data retrieval, often requires additional engineering effort.
- India-optimized solutions (Razorpay): Aim to bridge this gap by combining international payment capabilities with built-in or well-exposed compliance pathways via APIs and webhooks, reducing development and operational overhead.
- Specialized players (2Checkout, CCAvenue): Provide essential functionality but may lag in API modernity, DX, or programmatic access to India-specific compliance data.

Ultimately, the best API depends on your team’s expertise, payment flow complexity, and how critical automated compliance is to your operations. Before committing, thoroughly test sandbox environments, focusing on international card flows with 3DS2, currency handling, and, most importantly, your ability to programmatically retrieve transaction and settlement data required for FIRC and reconciliation. The API that makes this lifecycle easiest to manage in code is likely your best long-term choice.
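To make the “creating charges directly in international currencies” criterion tangible, here is a minimal, hedged Python sketch of the parameters a multi-currency Stripe PaymentIntent takes. The helper function and order_id scheme are illustrative assumptions; a real call additionally needs a Stripe account, a test-mode API key, and the official stripe package:

```python
def build_payment_intent_params(amount_minor, currency, order_id):
    """Build the parameter dict for a multi-currency card charge.

    Amounts are in the currency's minor unit (e.g., cents for USD),
    and order_id is your own key for later settlement reconciliation.
    """
    return {
        "amount": amount_minor,
        "currency": currency.lower(),        # e.g. "usd", "eur", "gbp"
        "payment_method_types": ["card"],
        "metadata": {"order_id": order_id},  # link back during reconciliation
    }

params = build_payment_intent_params(4999, "USD", "order_1042")

# With the official SDK installed and a test key configured, the call would be:
# import stripe
# stripe.api_key = "sk_test_..."  # placeholder for your test-mode secret key
# intent = stripe.PaymentIntent.create(**params)
```

Attaching your own identifier in metadata is what later lets you join the gateway’s settlement records back to your transactions, which is exactly the automated-reconciliation capability this article evaluates.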

By Sarang S Babu
UX Research in Agile Product Development: Making AI Workflows Work for People

During my eight years working in agile product development, I have watched sprints move quickly while real understanding of user problems lagged. Backlogs fill with paraphrased feedback. Interview notes sit in shared folders collecting dust. Teams make decisions based on partial memories of what users actually said. Even when the code is clean, those habits slow delivery and make it harder to build software that genuinely helps people.

AI is becoming part of the everyday toolkit for developers and UX researchers alike. As stated in an analysis by McKinsey, UX research with AI can improve both speed (by 57%) and quality (by 79%) when teams redesign their product development lifecycles around it, unlocking more user value. In this article, I describe how you can turn user studies into clearer user stories, better agile AI product development cycles, and more trustworthy agentic AI workflows.

Why UX Research Matters for AI Products and Experiences

For AI products, especially LLM-powered agents, a single-sentence user story is rarely enough. Software developers and product managers need insight into intent, context, edge cases, and what "good" looks like in real conversations. When UX research is integrated into agile rhythms rather than treated as a separate track, it gives engineering teams richer input without freezing the sprint.

In most projects, I find three useful touchpoints:
- Discovery is where I observe how people work today.
- Translation is where those observations become scenario-based stories with clear acceptance criteria.
- Refinement is where telemetry from live agents flows back into research and shapes the next set of experiments.

A Practical UX Research Framework for Agile AI Teams

To keep this integration lightweight, I rely on a framework that fits within normal sprint cadences. I begin by framing one concrete workflow rather than a broad feature; for example, "appointment reminder calls nurses make at the start of each shift."
I then run focused research that can be completed in one or two sprints, combining contextual interviews, sample call listening, and a review of existing scripts. The goal is to understand decisions, pain points, and workarounds.

Next, I synthesize findings into design constraints that developers can implement directly. Examples include "Never leave sensitive information in voicemail" or "Escalate to a human when callers sound confused." Working with software developers, product managers, and UX designers, I map each constraint to tests and telemetry so the team can see when the AI agent behaves as intended and when it drifts.

[Figure: UX Research Framework for Agile AI Product Development]

Technical Implementation: From Research to Rapid Prototyping

One advantage of modern AI development is how quickly engineering can move from research findings to working prototypes. The gap between understanding the problem and having something testable has shrunk dramatically. Gartner projects that by 2028, 33% of enterprise software will embed agentic AI capabilities, driving automation and greater productivity.

When building AI agents, I have worked with teams using LLMs or LLM SDKs to stand up functional prototypes within a single sprint. The pattern typically looks like this: UX research identifies a workflow and its constraints, then developers configure the agent using the SDK's conversation flow tools, prompt templates, and webhook integrations. Within days, I have a working prototype that real users can evaluate.

This is where UX research adds the most value to rapid prototyping. SDKs handle the technical heavy lifting, such as speech recognition, text-to-speech, and turn-taking logic. But without solid research, developers and PMs end up guessing business rules and conversation flows.
When I bring real user language, observed pain points, and documented edge cases into sprint planning, the engineering team can focus on what matters: building an agent that fits how people work. The same holds true for text-based agents. LLM SDKs let developers wire up conversational agents quickly, but prompt engineering goes faster when you have actual user phrases to work from. Guardrails become obvious when you have already seen where conversations go sideways.

How UX Research Changes Agile AI Development

Incorporating UX research into agile AI work changes how teams plan and ship software. Deloitte's 2025 State of Generative AI in the Enterprise series notes that organizations moving from proofs of concept into integrated agentic systems are already seeing promising ROI. In my experience, the shift happens in two key areas. The first change is in how I discuss the backlog with engineering and product teams. Instead of starting from a list of features, I start from observed workflows and pain points. Software developers and PMs begin to ask better questions: How often does this workflow occur? What happens when it fails? Where would automation genuinely help rather than just look impressive in a demo? The second change is in how I judge success. Rather than looking only at LLM performance metrics or deployment counts, I pay attention to human-centric signals. Did the AI agent reduce manual calls for nurses that week? Did fewer financial operations staff report errors in their end-of-day checks? Those questions anchor agile AI decisions in users' lived experience.

Use Case: Voice AI Agent for Routine Calls

I built a voice AI agent to support routine inbound and outbound calls in healthcare and financial services. In my user research, I found that clinical staff and operations analysts spent large parts of their shifts making scripted reminder and confirmation calls.
Staff jumped between systems, copied standard phrases, and often skipped documentation when queues spiked. I ran contextual interviews with nurses and operations staff over two sprints. I sat with them during actual call sessions, noted where they hesitated, and asked why certain calls took longer than others. One nurse told me she dreaded callbacks for no-shows because patients often got defensive. That single comment shaped how we designed the escalation logic. Based on these observations, I scoped an AI agent with clear boundaries. It would dial numbers, read approved scripts, capture simple responses like "confirm" or "reschedule," log outcomes in the primary system, and escalate to a human when callers sounded confused or emotional. Each constraint came directly from something I observed or heard in research. The "escalate when confused" rule, for example, came from watching a staff member spend four minutes trying to calm a patient who misunderstood an automated message. We treated the research findings as acceptance criteria in the backlog. Developers could point to a specific user quote or observed behavior behind every rule. When questions came up during sprint reviews, I could pull up the interview notes rather than guess. The AI agent cut manual call time, reduced documentation errors by more than 50%, and made collaboration between teams and end users more consistent. Because I started from real workflow observations and built in human escalation paths, adoption was smoother than previous automation attempts and increased by 35% in one quarter.

[Figure: Voice AI Agent Case Study]

Why This Approach Works

UX research gives agile AI development a focused user perspective that directly supports developer cycles. When teams work from real workflows and constraints, they write less speculative code, reduce rework, and catch potential failures earlier.
McKinsey's work on AI-enabled product development points out that teams redesigning their Agile AI product development and with UX research expertise tend to see more user-centric decision-making leading to better product experiences. Knowing this, and in my opinion, you do not have to trade one for the other. Agile AI teams that work this way stay closer to their users without slowing down. Key Takeaways If you are beginning to build or refine LLM-powered agents, here is a realistic next step. Pick one narrow workflow. Study how work happens today. Run a small research-driven experiment. Use telemetry and follow-up conversations to refine each iteration. AI delivers lasting value only when it is integrated thoughtfully into how people and teams already operate. By treating UX research as a first-class part of agile AI development, you bring the user's perspective into every sprint and make your development lifecycle more responsive to real needs. UX research helps agile AI teams start from real workflows instead of abstract features, leading to more focused and effective agentic workflowsIntegrating Research into each agile AI product development sprint gives teams clearer constraints, reduces rework, and supports higher quality releasesModern LLMs accelerate prototyping, but the quality of your agentic AI workflows depends on how well you understand the AI workflows before you define requirements and write code

By Priyanka Kuvalekar
The Coming Shift From Bigger AI Models to Smaller, Faster Ones

Bigger isn’t always better, especially when it comes to AI models. The industry has been reaching for ever larger, more capable, and more resource-intensive models to deliver enhanced reasoning, summarization, and even code generation capabilities. But the size and scalability of gen AI models have their limits. Larger models are designed to work best with open-ended problems, which are, by nature, often encountered in chats. However, when an AI-powered product, such as a CRM system, uses AI models, the problem the product is solving is actually fixed and highly structured. It has deviated substantially from the original chat format, which required AI models to define the problem and come up with the steps to a solution themselves. As we look forward to 2026, we can expect to see more nimble system designs. AI is transitioning from research to production, particularly in enterprise ecosystems, and the limitations of LLMs are beginning to show. Latency, cost, and lack of control are making it more difficult to harness LLMs for fixed business workflows. Using LLMs to address routine business issues is like using a sledgehammer to crack a nut: you don’t need that much AI processing power.

When Smaller Is Better

Let's take AI-powered customer support for e-commerce, for example, which is one of the most popular business use cases of GenAI. When implementing an AI customer support agent, the first instinct would be to deploy a large thinking model like GPT-5 Thinking or Sonnet 4.5 to handle the full customer inquiry, since these thinking models are supposedly powerful enough to do everything, including understanding customer tone, interpreting requests, generating empathetic responses, checking inventory, processing returns, and escalating complex issues. However, when this is actually implemented, there are some key issues:

- The response is slow. Larger thinking models are often slower than smaller models. This may be a minor problem for email support, but it is a serious issue for chat support.
- It's expensive. Larger models may cost 10 times as much as smaller models to process the exact same input.
- It's inconsistent. Larger models may correctly answer customer inquiries 90% of the time, but it's very difficult to improve on the last 10% since we have so little control over "how" the model thinks.

The next wave of AI systems will prioritize architecture over scale. It’s time to adopt smaller, faster, more specialized AI models engineered to work together as modular components to address specific business problems.

The Bigger Brain Fallacy

For the past five years, developers have been focused on optimizing “thinking” AI models that can handle open-ended reasoning using conversational language. LLMs that support such thinking models are great for free-form tasks, such as ideation, creative writing, and complex logic. They are less well-suited for structured, rules-based applications, such as CRM, ERP, and e-commerce, yet organizations are adapting LLMs for rules-based workflows. The problem space for many business issues is well-defined within a specific workflow. LLMs are ideal for freeform reasoning, but the task at hand is usually clearly defined; there is not much free reasoning needed to create a path to the solution. The job is to execute that path efficiently and predictably, with consideration for constraints like cost and latency. For interactive systems dealing with routine customer issues, businesses need predictability and consistency, not opaque AI geniuses.

Modular Means More Efficiency

Rather than adopting behemoth AI models, it makes better sense to break the problem into a sequence of narrower AI tasks, each handled by a specific, lightweight AI model. Each of these smaller models performs a discrete, well-defined function. Together, they can be assembled into a composable workflow that outperforms a single LLM for well-defined functions.
Assembling a swarm of task-specific models optimizes speed, cost, and reliability. For example, we already have a clear set of rules on how customer inquiries should be processed. Let's do a high-level overview of how we can use small models to divide and conquer:

- Intent classification – Use an intent classifier at the beginning with a tiny model. Its only job is to read the customer message and identify what the customer wants, whether it is a refund, order tracking, product info, etc.
- Policy enforcement – Depending on the intent classifier's output, run a predefined SOP for that category. Let's say the customer is asking for a refund; a small model can first check store return policies. It can accept or reject the request, ask for more information, or escalate and route to human support.
- Data interaction – If the refund is accepted, run a model to generate an action that checks and updates customer order data in the database.
- Response generation – Based on the result of the updated order, the AI drafts a response using a small model, or even sends a simple templated reply to the customer without using AI at all.

While there are multiple model calls, each one is smaller, faster, and cheaper than using a single LLM. This approach could reduce processing time by 70% and cut costs by over 50%. The simpler the query, the shorter the time and the lower the cost. It’s also easier to debug. Since each function has a specific responsibility, developers can observe and test outcomes. Each component can be individually benchmarked to identify the weak points. The accuracy of this swarm-of-smaller-models approach is much better than the single larger thinking model approach in most cases, because each smaller model is asked to do one much simpler, specific job and has a much smaller chance of hallucinating. It also has far fewer output degrees of freedom and a clearer success criterion, which reduces the number of ways that things can go wrong.
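The four stages above can be sketched as a runnable pipeline. In this minimal Python sketch each "model" is stubbed with a plain rule so the flow is self-contained; every function name, policy (such as the 30-day window), and value is a hypothetical placeholder, not a reference to any real system.

```python
# Illustrative divide-and-conquer pipeline for a refund request.
# Each stage stands in for a small, task-specific model.

def classify_intent(message: str) -> str:
    """Stage 1: intent classification — a tiny model's only job."""
    return "refund" if "refund" in message.lower() else "other"

def check_refund_policy(days_since_purchase: int) -> str:
    """Stage 2: policy enforcement — run the predefined SOP."""
    return "accept" if days_since_purchase <= 30 else "escalate_to_human"

def record_refund(order_id: str) -> dict:
    """Stage 3: data interaction — a deterministic database update."""
    return {"order_id": order_id, "status": "refunded"}

def draft_response(record: dict) -> str:
    """Stage 4: response generation — a template, no model required."""
    return f"Order {record['order_id']} is now {record['status']}."

def handle_refund(message: str, order_id: str, days_since_purchase: int) -> str:
    if classify_intent(message) != "refund":
        return "routed to another workflow"
    if check_refund_policy(days_since_purchase) != "accept":
        return "handed off to a human agent"
    return draft_response(record_refund(order_id))

print(handle_refund("I want a refund for this blender", "A-1001", 12))
# -> Order A-1001 is now refunded.
```

In a real system, each stub would wrap a call to a small, task-specific model, and each stage could be benchmarked, monitored, and swapped independently.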
A Return to Classic Software Principles

Using a modular approach may seem familiar. Rather than treating AI systems as black boxes, this marks a return to classic software engineering, where developers can create transparent and measurable elements. In this architecture, each model behaves like a microservice. Observable metrics such as latency, cost per token, and accuracy are tracked at every stage. Classifiers or text generators can be swapped out without having to retrain the entire system. Workflows can be reconfigured based on user context or business logic. This modular approach aligns AI with modern DevOps practices. Deployment pipelines can be extended to include model components. Monitoring tools can log model-level performance, error rates, and drift. The result is AI development as an iterative engineering discipline rather than the building of a black box. The resulting systems are not only faster and more predictable but also easier to maintain at scale. The use cases for the largest AI adopters are mostly well-suited to this swarm-of-smaller-models approach. The top 30 OpenAI customers have already used more than 1 trillion AI tokens. For most of these companies, AI usage is well-defined, so they would likely benefit from using a swarm of small models. Duolingo is one of the companies in the top 30 list. The company is utilizing AI for language learning, which doesn’t require much critical thinking. What it does need is consistent ways to generate responses in multiple languages. A swarm of models handling structured, repeatable tasks is all that’s needed. Generative AI was designed to address the bigger challenge of natural language processing (NLP). Most AI applications are taking advantage of that capability, but in 2026, we can expect to see a shift in focus from AI model size to system design. The most advanced products will be defined by their architecture rather than the number of parameters.
The key to success is intelligently and efficiently orchestrating specialized models to address specific business outcomes. AI is entering the DevOps era. The future won’t be built using a single giant brain, but a network of distributed micro-intelligences working together at machine speed.

By Chia-fu Kuo
When Leadership Blocks Your Pre-Mortem

TL;DR: Leadership resistance to your pre-mortem reveals whether your organization’s operating model prioritizes comfortable narratives over preventing failure. This article shows you how to diagnose cultural dysfunction and decide which battles to fight.

The Magic Of Risk Mitigation Without Passing Blame

There’s a risk technique that takes 60 minutes, costs nothing, and surfaces problems other planning methods miss. It’s been field-tested for nearly two decades. Teams that use it catch catastrophic issues while there’s still time to act. Most organizations never run one. When you try to introduce it, the people who complain loudest about projects failing are likely the same ones who will kill it in the meeting. The technique is the pre-mortem, and the resistance you hit tells you more about your organization than any risk register.

The Basics (In Case You Haven’t Run a Pre-Mortem)

Traditional risk planning asks “what might go wrong?” A pre-mortem flips it: Assume your initiative already failed. It’s six months from now, the project is a smoking crater, and you’re gathering the team to explain what happened. That shift from “might fail” to “did fail” breaks something open. People stop hedging. The risks they’ve been too politically careful to mention in a typical planning session suddenly make it into the room: the technical debt everyone knows about but nobody wants to raise, the stakeholder who will torpedo this in month four, the assumption the whole plan depends on that nobody has actually validated. The pre-mortem technique is simple: In a 60-minute session, everyone first writes down their own reasons for failure. You cluster them, vote on the critical ones, then dig in:

- What does that failure actually look like?
- What early warnings would we see?
- What can we do this week to prevent it?
- What’s the backup plan?

You walk out with a shared understanding of what could kill this initiative and concrete actions you can take immediately.
Not a document to file. Actual insight. My tip: Liberating Structures work very well in this context; think of TRIZ, for example.

Objections from the Leadership Level Against Pre-Mortems

Interestingly, the pre-mortem technique is not as popular as we might think. On the contrary, any facilitator who suggests a pre-mortem may face serious opposition from leadership. The top three objections are:

1. “We Don’t Have Time for Another Workshop”

When you hear this, you’re not hearing a scheduling problem. You’re hearing a confession. What they’re saying: Calendars are packed, we’re under pressure to deliver, and an hour spent imagining failure is an hour not spent building. What they’re confessing: Planning in this organization is theater. We can’t tell the difference between looking busy and being effective. We have time for roadmap sessions and strategy off-sites that produce nothing but slide decks, but not for 60 minutes that might actually prevent failure. Ask yourself: if you don’t have 60 minutes to pressure-test a significant initiative before you commit resources to it, what are you doing in all those other meetings? If you can’t spare an hour for thinking, you’re not planning; you’re performing planning for an audience. You always have time for what you actually value. This objection shows that the organization values the appearance of progress over its substance.

2. “This Is Too Negative, It Will Demotivate People”

This one is my favorite because it’s pure magical thinking dressed up as leadership wisdom. What they are saying: We need to project confidence. Dwelling on failure becomes self-fulfilling. Teams need positive energy. What they are actually revealing: We have confused optimism with competence. We believe reality is negotiable, that if we just maintain the right attitude, the laws of physics, market dynamics, and technical constraints will join our efforts and make us successful. The problem, of course, is that reality doesn’t care about your team’s morale.
Your competitors aren’t checking your confidence level before they move. Technical debt doesn’t vanish because you chose not to discuss it. I have watched this play out repeatedly. Teams that can only stay motivated by avoiding hard truths aren’t resilient; they’re brittle. The first time they hit a problem they didn’t prepare for, the whole structure collapses. Motivation built on denial shatters the moment you encounter reality. The most motivated teams I have seen are those that know precisely what they are up against and have a plan to deal with it. And if that is not working, they can pivot rapidly to another plan. Confidence that survives contact with reality requires facing reality first.

3. “We Already Manage Risk”

This objection is the most revealing because it exposes a category error in the organization’s thinking. What they are saying: The PMO maintains risk registers. We have governance processes. Project reviews happen. Therefore, a pre-mortem looks like duplication. What they are missing: They have mistaken the artifact for the activity. Having a risk register is not the same as having risk awareness. It is the difference between owning a fire extinguisher and understanding how fires start. Look at the risk registers in your organization. You will often see the same five entries on every project: “scope creep,” “resource constraints,” “stakeholder alignment,” “technical complexity,” and “timeline pressure.” Not wrong. Just useless. Too generic to act on, too obvious to provide insight, too abstract to prevent anything. A pre-mortem asks different questions. It focuses on what will kill this particular initiative in this context. It uses collective intelligence from everyone who knows something critical about what could go wrong, not one person filling out a template alone, thereby creating alignment and a shared understanding of the risk situation. You are not duplicating risk management. You’re doing it for the first time.
Conclusion: What You Learn from a Pre-Mortem’s Rejection

When leadership blocks a pre-mortem with one of these objections, pay attention. You are learning more about the system you are operating in than about the technique. The pattern is consistent: The organization prefers comfortable narratives to uncomfortable truths. It would rather maintain the fiction of control than develop the capability to handle what is coming. No facilitation method fixes that. If leadership can’t spare 60 minutes for critical thinking, or believes acknowledging problems creates them, or thinks documentation equals understanding, you face a cultural dysfunction that runs deeper than your initiative’s risk profile. You can still use that information. You can make better decisions about where to invest your energy, which battles are worth fighting, and whether this organization is serious about the outcomes it claims to want. Sometimes, the most valuable thing a pre-mortem shows you is that nobody in charge actually wants to know why a project might fail.

By Stefan Wolpers
Beyond the Vibe: Why AI Coding Workflows Need a Framework

For decades, software development has been a story of evolving methodologies. We moved from the rigid assembly line of Waterfall to the collaborative, iterative cycles of Agile and Scrum. Each shift was driven by a need to better manage complexity. Today, we stand at a similar inflection point. A new, powerful collaborator has joined the team: artificial intelligence. The initial rush to use AI has led to a chaotic, improvisational style of work many call “vibe coding.” It’s fast, it’s exciting, but as many teams are discovering, it’s not sustainable. Just as Agile brought structure to team collaboration, a new generation of AI-native frameworks is emerging to bring structure, predictability, and professionalism to human-AI collaboration.

Hidden Costs of Unstructured AI Use

The hype around AI productivity is real. Studies show developers can code up to 55% faster with AI assistants. But these headline numbers mask a darker, more expensive reality for teams that lack a formal process.

- The 70% rejection rate: Industry data shows that while AI tools suggest code constantly, developers reject or discard approximately 70% of these suggestions (Source: GitClear, Netcorp 2025). Every rejected suggestion represents wasted compute cycles, direct token costs, and a developer’s time spent sifting through noise instead of building.
- The quality nosedive: A 2024 analysis found that unstructured AI-assisted coding was linked to a four-fold increase in code duplication and a rise in “code churn”: brittle and non-reusable code that inflates technical debt and creates future maintenance nightmares (Source: GitClear).

Without a guiding framework, the developer’s mental load doesn’t disappear. It shifts from writing code to constantly vetting, debugging, and refactoring a stream of unpredictable AI output.

The Rise of AI-Native Frameworks

To counter this chaos, a new category of tools and methodologies is taking shape.
These AI-native frameworks provide the guardrails and structured workflows needed to turn a powerful but erratic AI tool into a reliable engineering partner. The core idea is to move from a conversational, “vibe-driven” approach to an intent-driven one, where your plan becomes a version-controlled artifact that guides the AI. We are seeing this trend emerge in various forms:

- Spec-Driven Workflows like GitHub’s Spec-kit.
- Agile-Inspired Methodologies like the BMad Method.
- Test-Driven Development (TDD) Partners like Aider.
- Autonomous Agentic Systems like MetaGPT and SWE-agent.

While all these frameworks share common goals, their approaches can be quite different. To illustrate this, let’s zoom in on two prominent examples, Spec-kit and the BMad Method, whose philosophies are distinct: the first is tactical and developer-centric, whereas the other is strategic and team-oriented.

A Tale of Two Philosophies

- Spec-kit focuses on feature-level “spec-to-code” generation, whereas the BMad Method focuses on full project lifecycle management from idea to QA.
- Spec-kit is primarily for individual developers, whereas the BMad Method is preferable for the entire agile team (PMs, architects, devs).
- Spec-kit excels at rapidly and reliably scaffolding code from a clear, version-controlled specification, whereas the BMad Method is great at integrating AI agents into existing Agile/Scrum processes at a strategic, cross-functional level.

This comparison shows there isn’t a single “best” framework, only the one that best fits the task at hand. You wouldn’t use a full project plan to fix a typo, nor would you build a new microservice based on a one-line prompt.

Adopting a Framework-Based Approach

Before picking a specific tool, the first step is to adopt the mindset. Before your team starts its next AI-assisted project, ask these questions:

- How do we define our intent? Is there a formal process for creating a specification or plan before we prompt the AI to write code?
- What is the human’s role? Is the developer positioned as a clear-eyed reviewer and approver at critical checkpoints?
- Is the process repeatable? Are our prompts and plans version-controlled?
- How do we enforce quality? Do we have a mechanism to ensure the AI adheres to our architectural patterns and coding standards?

Best-of-Both-Worlds Solution

The choice isn’t always ‘either/or.’ The real power of these structured approaches lies in their modularity, allowing teams to combine them to create a workflow that fits their unique needs. A hybrid approach can leverage BMad’s strategic planning with Spec-kit’s tactical execution prowess. Here’s how it could work:

Phase 1: Strategic and sprint planning (BMad)

- Use the BMad Business Analyst and Architect agents to define the project’s vision, create a detailed PRD, and establish the high-level system design.
- The BMad Scrum Master then breaks down the PRD into user stories for the upcoming sprint.

Phase 2: Feature implementation (Spec-kit)

- A developer picks up a user story from the sprint backlog.
- They use this user story as the initial prompt for Spec-kit’s /specify command to create a detailed, executable specification.
- They then run through the /plan, /tasks, and /implement phases to generate high-quality, compliant code that matches the spec.

Phase 3: Quality assurance and integration (BMad)

- The code generated by Spec-kit is submitted for review.
- The BMad QA Agent is then invoked to perform an initial review, checking the implementation against the original user story and acceptance criteria, completing the loop.

This hybrid model creates a seamless workflow where high-level project management flows directly into low-level, spec-driven code generation, giving you end-to-end control, consistency, and quality. Moving from vibe coding to a structured framework is the next logical step in the evolution of our industry.
It’s how we transform AI from a clever shortcut into a strategic asset that delivers predictable, high-quality, and cost-effective results. It’s how we build the future, responsibly.

By Akash Lomas
How Does a Scrum Master Improve the Productivity of the Development Team?

The role of a Scrum Master is to establish Scrum, and the Scrum Master is accountable for the Scrum Team’s effectiveness. Thus, it is quite tempting to ask how a Scrum Master can help improve the productivity of the development team. But in a complex working environment like software development, productivity is often not the right measure to capture all the complexities of software developers’ knowledge work. In simple working environments, productivity means a ratio of output to input. The traditional idea is to know how much is achieved (output) with a given amount of resources (input), largely in numbers, and the focus is on maximizing the output. That’s why, in traditional project management of software development projects, stakeholders evaluate the development team’s productivity based on lines of code. Even today, in Agile project management, stakeholders with a traditional mindset ask for the number of story points per iteration, known as sprint velocity. But productivity in a complex working environment like Agile software development is not linear. Factors like customer satisfaction, business value, and project success matter more than working at the highest efficiency. If the software cannot deliver the intended business value or solve the customer’s actual problems, there is no point in building it fast, in the least possible time. It would be a waste of time and money. Having said that, it does not mean there are no opportunities to improve productivity. There are operational inefficiencies that can hinder productivity, and it is the responsibility of a Scrum Master to address them, because the actions of a Scrum Master have a direct impact on the team’s efficient functioning. In this post, we will look at the four primary ways a Scrum Master can help improve the productivity of the Scrum Team.
I would rather call these ways to ‘improve effectiveness,’ because we also have to ensure the development team delivers software of the highest business value and customer satisfaction as effectively as possible.

Four Ways a Scrum Master Improves Development Team Productivity

Here are four ways a Scrum Master can contribute:

1. Facilitating Scrum

Each Scrum event (the Sprint, Sprint Planning, Daily Scrum, Sprint Review, and Sprint Retrospective) has a purpose. The official Scrum Guide says, “Each event in Scrum is a formal opportunity to inspect and adapt. Events are used in Scrum to minimize the need for meetings not defined in Scrum.” And it is true. Modern-day complexities in software development, such as customer-centric product development, changing market trends, and competitors' moves, require continuous collaboration among developers, stakeholders, and product owners to inspect and adapt. Too many meetings can hinder the productivity of the developers. By facilitating each Scrum event at the right time and in the right order, the Scrum Master eliminates unnecessary meetings, ensuring the team communicates, inspects, and adapts at the right time to produce the most valuable work. These Scrum events also provide an opportunity to address a team’s operational inefficiencies, resulting in improved productivity. Let’s understand this with an example. The purpose of Sprint Planning is to bring clarity and consensus on what needs to be done for the development team. It must happen at the beginning of the sprint to ensure everyone has a mutually agreed and shared understanding of the Definition of Done (DoD), the Product Goal, the Sprint Goal, the increments to be delivered, and external dependencies. The Scrum Master ensures that all key participants (Product Owner, Developers, and Scrum Master) are present at the Sprint Planning meeting and that their concerns are addressed.
Similarly, for each Scrum event — the Daily Scrum, Sprint Review, and Sprint Retrospective — the Scrum Master ensures it serves its intended purpose. By facilitating these events, the Scrum Master ensures the team works most effectively, resulting in improved productivity while delivering the most valuable work.

2. Removing Impediments

Scrum focuses on getting feedback early and often from the customers. This is the reason why sprints are of short duration. If there are any blockers, obstacles, or other impediments to obtaining early and frequent customer feedback, it is the responsibility of the Scrum Master to remove those impediments. As an example of an impediment, consider that the deployment of the increment is delayed due to external dependencies, such as a bureaucratic deployment process or complex dependency chains with other teams. This delays the customer feedback that could drive improvements in the next sprint. It is the responsibility of the Scrum Master to streamline deployment processes, remove blockers, and gather feedback from customers early. This is one example of a hindrance. Hindrances could be anything from an unclear Definition of Done to poor story point estimates, a lack of required technological resources, or context switching.

3. Empowering the Team to Be Self-Organizing

A Scrum Team is a self-organizing team. It means the developers are the ones who decide:

- What work to do?
- When to do the work?
- How to do the work?
- How do engineers, designers, and testing experts work together?
- Who does the work?
- What technologies to use?
- What architecture and UX to use?

Even the Scrum Master does not dictate the way development teams organize, plan, and manage the work.
The 11th principle of the Agile Manifesto says, “The best architectures, requirements, and designs emerge from self-organizing teams.” However, it is the responsibility of the Scrum Master to coach the development team in self-organization and cross-functionality. The Scrum Master has to ensure the team collaborates effectively and stays accountable. To achieve this, the Scrum Master can create an environment that fosters open collaboration, where the Scrum Team solves problems independently and feels psychologically safe and encouraged to contribute. This autonomy and accountability remove operational inefficiencies and promote faster decision-making. If the team needs resources, guidance, or other support, the Scrum Master is there as a facilitator and a servant leader to provide what the team needs to function optimally. The Scrum Master’s level of involvement varies with the Scrum Team’s experience; the best Scrum Teams are capable of organizing, planning, and resolving their own impediments. It is this fine balance of authority and autonomy that a Scrum Master needs to master.

4. Removing Barriers Between Stakeholders and the Scrum Team

Software development does not go as smoothly as it appears on paper. It is challenging to bring all the stakeholders onto the same page. That’s exactly why the Scrum Team has a Scrum Master: they bridge the gap between the Scrum Team, the Product Owner, and the organization. The Scrum Master facilitates collaboration among stakeholders as requested or needed and helps them understand the complexities of each other’s work. This improves the flow of work by addressing complex issues, securing necessary resources, and bringing clarity to priorities, needs, and expectations.

Conclusion

Productivity is not the goal of the Scrum Team; effectiveness is.
Ultimately, nothing is more wasteful than building software that no one wants. And undoubtedly, the actions of a Scrum Master have a direct impact on the team’s productivity, efficiency, and effectiveness. By leading the team in Scrum, addressing operational inefficiencies, and facilitating collaboration among stakeholders, a Scrum Master can help improve the productivity of the development team.

By Sandeep Kashyap
AI Code Generation: The Productivity Paradox in Software Development

Measuring and improving developer productivity has long been a complex and contentious topic in software engineering. With the rapid rise of AI across nearly every domain, it's only natural that the impact of AI tooling on developer productivity has become a focal point of renewed debate. A widely held belief suggests that AI could either render developers obsolete or dramatically boost their productivity — depending on whom you ask. Numerous claims from organizations linking layoffs directly to AI adoption have further intensified this perception, casting AI as both a disruptor and a catalyst. In this article, we'll examine the current landscape and delve into recent studies and surveys that investigate how AI is truly influencing developer productivity.

Studies

Let's explore the findings from the studies below, which assess the impact of AI tooling on developer productivity.

Study #1: Experienced Open-Source Developer Productivity

To evaluate the impact of AI coding assistants on the productivity of experienced open-source developers, a randomized controlled trial (RCT) was conducted from February to June 2025. A total of 16 developers with an average of 5 years of experience were chosen to complete 246 tasks in mature projects. These tasks were randomly assigned among the developers, with AI tools either allowed or disallowed. Before starting their tasks, developers forecast that AI would reduce task completion time by 24%; after completing them, they estimated that AI had reduced completion time by 20%. On the contrary, the study found that allowing AI actually increased task completion time by 19%. Moreover, these results stand in stark contradiction to experts' predictions of a completion-time reduction of up to ~39%. Below is a summary of the mismatch between predictions and findings: Experts and study participants misjudged the speedup of AI tooling.
Image courtesy of respective research. Although the study concludes that AI tooling slowed developers down, this could be due to a variety of factors; five key factors for the observed slowdown are listed below:

- Over-optimism about AI usefulness (direct productivity loss). Developers are free to use AI tools as they see fit, but their belief that AI boosts productivity is often overly optimistic. They estimate a 20–24% time reduction from AI, even when the actual impact may be neutral or negative, potentially leading to overuse.
- High developer familiarity with repositories (raises developer performance). AI assistance tends to be less helpful, and may even slow developers down, on tasks where they have high prior experience and need fewer external resources. Developers report AI as more beneficial for unfamiliar tasks, suggesting its value lies in bridging knowledge gaps rather than enhancing expert workflows.
- Large and complex repositories (limits AI performance). Developers report that LLM tools struggle in complex environments, often introducing errors during large-scale edits. This aligns with findings that AI performs worse in mature, large codebases compared to simpler, greenfield projects.
- Low AI reliability (limits AI performance). Developers accept less than ~44% of AI-generated code, often spending significant time reviewing, editing, or discarding it. Even accepted outputs require cleanup, with ~75% of developers reading every line and ~56% making major changes, leading to notable productivity loss.
- Implicit repository context (limits AI performance, raises developer performance). AI tools often struggle to assist effectively in mature codebases because they lack the developers' tacit, undocumented knowledge. This gap leads to less relevant suggestions, especially in nuanced cases like backward compatibility or context-specific edits.
Due to these factors, the gains of auto-code generation are considerably offset, exposing the significant contrast between perceived/forecasted and actual developer productivity. With AI tooling, the developer also needs to spend additional time on prompting, reviewing AI-generated suggestions, and integrating code outputs into complex codebases, adding to overall completion time. See below for average time spent per activity, with and without AI tooling. Average time spent per activity. Image courtesy of respective research.

Takeaway: The study reveals a perception gap where AI usage subtly hampers productivity, despite users believing otherwise. While the findings show a slowdown in large, complex codebases, the researchers caution against broad conclusions and emphasize the need for rigorous evaluation as AI tools and techniques continue to evolve. Thus, the study should be considered a data point in evaluation, not a verdict.

Study #2: GitClear

The GitClear study analyzed ~211 million structured code changes from 2020 to 2024 to assess how AI-assisted coding impacts developer productivity. It categorized changes — like added, moved, copied/pasted, and churned lines — using GitClear's Diff Delta model to track short-term velocity versus long-term maintainability. Duplicate block detection was introduced to measure how often AI-generated code repeats existing logic. The methodology links rising output metrics to declining code reuse, revealing hidden costs in perceived productivity gains. Below is the trend of code operations and code churn by year, as cited in the report. GitClear AI Code Quality Research — code operations and code churn by year. Image courtesy of respective research. The following points can be inferred from the study:

- Increased code output: AI-assisted development led to a significant rise in the number of lines added, up 9.2% YoY in 2024.
This could be perceived as an increase in developer productivity due to faster code generation and higher task (ticket) completion throughput. However, the key question remains: were the added lines of code required in the first place?

- Decline in refactoring (“moved” code): “Moved” lines — an indicator of refactoring — dropped nearly 40% YoY in 2024, falling below 10% for the first time. This can be attributed to developers accepting AI-generated code as-is and skipping refactoring to save time. Moreover, AI tools rarely suggest refactoring due to limited context windows, which fuels the overall drop.
- Surge in copy-pasted and duplicated code: Copy/pasted lines exceeded moved lines in 2024, with a 17.1% YoY increase. Commits with duplicated blocks (≥5 lines) rose 8x in 2024 compared to 2022; 6.66% of commits now contain such blocks. This, too, can be attributed to developers accepting AI-generated code as-is without much effort to keep the code DRY.
- Increased churn in newly added code: Churn — code revised within 2–4 weeks — increased 20–25% in 2024, i.e., developers are revisiting new code more frequently. Although code output surged with AI tooling, low quality means code is being revised sooner than it used to be (when no or limited AI tooling was utilized).

Takeaway: The rise in AI-generated code has led to a parallel increase in copy-pasted fragments, duplication, and churn, while refactoring efforts have notably declined. This trend signals a deterioration in overall code quality. Many organizations still gauge developer productivity by metrics like lines of code added or tasks completed. However, these indicators can be easily inflated by AI, often at the expense of long-term maintainability. The result is bloated codebases with higher duplication, reduced clarity, and an expanded surface area for bugs.
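The duplicate-block metric mentioned above (commits containing repeated blocks of at least five lines) can be approximated with a sliding-window comparison. This is a simplifying sketch, not GitClear's actual Diff Delta algorithm:

```python
def duplicated_blocks(existing: str, added: str, min_lines: int = 5) -> set:
    """Return blocks of >= min_lines consecutive non-blank lines in `added`
    that already appear verbatim in `existing` (a rough duplication signal)."""
    def windows(text: str) -> set:
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        # Every contiguous window of min_lines lines, as a hashable tuple.
        return {tuple(lines[i:i + min_lines])
                for i in range(len(lines) - min_lines + 1)}
    return windows(existing) & windows(added)
```

Real tools also normalize whitespace, identifiers, and token streams before comparing; this verbatim version only illustrates the idea of flagging AI output that repeats logic the codebase already contains.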
While AI may boost short-term development velocity, the trade-off is accumulating technical debt and diminished code quality — costs that will surface over time in the form of increased maintenance burden and reduced agility.

Surveys

While studies often rely on data-driven methodologies, these approaches can sometimes be questioned for their assumptions or limitations. Surveys, on the other hand, offer direct insight into developer sentiment and can help bridge gaps that traditional studies might overlook. In the sections below, we explore findings from independent surveys that assess the impact of AI tools on developer productivity.

Survey #1: Stack Overflow

In its 2025 annual developer survey, Stack Overflow received over 49k responses covering various topics, including AI tooling and its impact. (Do note that I, too, was one of the respondents.) Among respondents, overall AI tool usage surged to ~84% from ~76% the previous year. Positive sentiment toward AI tools, however, dropped by ~10 percentage points, signaling a trust deficit among developers — more on this later. AI tools usage and sentiment. Image courtesy of respective survey results. Among the respondents, ~46% actively distrust the accuracy of AI tools. Moreover, ~66% cited that AI tool solutions are not up to the mark, and ~45% cited that these solutions require additional debugging time. This means developers need additional effort to understand, debug, and potentially refine AI-generated code, effectively increasing overall task completion time. Although trust in AI tools' ability to handle complex tasks rose by ~6 percentage points, this could reflect genuine tool enhancements, or it could reflect that developers who distrust AI accuracy avoid it entirely for complex tasks, given the quality and other risks involved.
Given the significant trust deficit in the accuracy of AI tools, the decline in positive sentiment seen in the previous section may well be related. Trust in AI tools' accuracy and ability to handle complex tasks. Image courtesy of respective survey results. Frustrations with AI tools, and humans as the ultimate arbiters of quality and correctness. Image courtesy of respective survey results. Even though AI agent adoption isn't yet mainstream, more than half of the respondents (~52%) cited productivity gains. AI agents could be a space worth watching: they are relatively new, so substantial enhancements could follow in the coming years. Moreover, given the contextual information they utilize to generate code, they look promising compared to simpler AI tools. AI agents and impact on work productivity. Image courtesy of respective survey results.

Takeaway: The survey revealed a sharp rise in AI tool adoption accompanied by a notable drop in positive sentiment, highlighting a growing trust deficit. A majority of respondents expressed active distrust in AI tool accuracy due to subpar solutions, suggesting that AI-generated code often demands extra effort to refine and validate. This offsets the productivity gain from faster code generation. Interestingly, trust in AI tools' ability to handle complex tasks rose, reflecting cautious optimism rather than full confidence. Developers still see themselves as the ultimate judges of code quality, reinforcing the need for human oversight. Meanwhile, AI agents — though not yet widely adopted — show early promise. Their use of contextual information positions them as a potentially more reliable and efficient evolution of current AI tooling.

Survey #2: Harness

Harness surveyed 500 engineering leaders and practitioners to assess various parameters, including the impact of AI on developer productivity.
Although the surveyed participants showed overall positive sentiment toward AI tooling and its adoption, 92% also highlighted the associated risks. An independent, related observation corroborates these risks. AI missteps and impact radius. Image courtesy: https://martinfowler.com/articles/exploring-gen-ai/13-role-of-developer-skills.html Almost two-thirds of respondents mentioned that they spend more time debugging AI-generated code and/or resolving security vulnerabilities. AI tooling may also generate code that includes outdated dependencies or insecure coding patterns, requiring developers to spend time updating and patching these vulnerabilities. This significantly increases developer overhead and potentially offsets a considerable part of the productivity gains from AI tooling. Two-thirds of respondents require more time debugging AI-generated code and/or resolving security vulnerabilities. Image courtesy of respective survey results. About 59% of developers experience deployment problems when AI tooling is involved, and for nearly half, the rework and additional effort offset the gains. 59% of developers experience deployment problems with AI tooling involved. Image courtesy of respective survey results. Since 60% of the respondents don't evaluate the effectiveness of the tools, it's quite challenging to relate them to developer productivity at all. 60% of respondents don't evaluate the effectiveness of AI tooling. Image courtesy of respective survey results.

Takeaway: The survey reveals a nuanced picture of AI's impact on developer productivity. While most respondents expressed optimism about AI tooling, they also flagged significant risks. Notably, the majority reported spending more time debugging AI-generated code and addressing security vulnerabilities — contradicting the assumption that AI always boosts efficiency. Deployment issues further compound the overhead, with many encountering frequent rework.
The lack of tool-effectiveness evaluation by many respondents underscores the challenge of accurately measuring productivity gains. Overall, the findings highlight that AI adoption demands careful oversight to avoid offsetting its intended benefits.

Conclusion

The studies and surveys analyzed paint a complex picture of AI's role in software development, revealing that perceived productivity gains often mask deeper issues. While AI tools may accelerate coding tasks, they also introduce duplication, churn, and technical debt — especially in large codebases — undermining long-term maintainability. Trust in AI-generated code remains fragile, with developers frequently needing to debug and refine outputs. This erodes efficiency, offsets gains from faster code generation, and highlights the importance of human oversight. Crucially, coding represents only a fraction of the overall software delivery cycle: improvements in cycle time don't necessarily translate to gains in lead time. Sustainable productivity demands more than speed — it requires thoughtful architecture, strategic reuse, and vigilant monitoring of maintainability metrics. In essence, AI can be a powerful accelerator, but without deliberate human intervention, its benefits risk being short-lived.

References and Further Reads

- Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
- GitClear Code Quality Study — 2024 | 2025
- Harness — State of Software Delivery
- SO Developer Survey 2025
- Role of Developer Skills in Agentic Coding

By Ammar Husain, DZone Core
AIOps to Agentic AIOps: Building Trustworthy Symbiotic Workflows With Human-in-the-Loop LLMs

Editor’s Note: The following is an article written for and published in DZone’s 2025 Trend Report, Intelligent Observability: Building a Foundation for Reliability at Scale.

Imagine a world where the 3:00 AM PagerDuty alert doesn’t lead to a frantic scramble, but rather to a concise summary of the problem, a vetted solution, and a one-click button to approve the fix. This transformative capability represents the next frontier of AIOps (artificial intelligence for IT operations), powered by agentic AI systems that are designed to perceive, reason, act, and learn. This shift promises a significant reduction in mean time to resolution (MTTR) but critically relies on human-in-the-loop (HITL) safeguards to ensure accountability and prevent issues like AI hallucinations. This tutorial is a practical, step-by-step guide for engineers, tech ops teams, and leaders. We will sketch out how to construct a scalable, secure, resilient workflow using large language model (LLM) agents for smart alert triage, context summarization, and, most importantly, gated runbook execution. Such an agentic mechanism can serve as a pioneering framework for the self-healing systems to come.

Prerequisites of Agentic AIOps

Before writing the first line of code, we need to define the theoretical base, factors, and conditions under which agentic AIOps is an option.

Defining Agentic AIOps

Agentic AIOps fundamentally overhauls the relationship between AI and digital operations. It goes beyond models that merely classify or predict.
The cornerstone of this evolution is a software entity that possesses four key attributes, allowing it to move beyond passive observation:

- Perception – the ability to ingest and understand data from the environment (e.g., observability data)
- Reasoning – the use of an LLM and structured data (tools) to formulate a goal and plan
- Action – the capacity to execute the plan through external tools (e.g., calling an API, running a script)
- Learning – the ability to refine its performance based on feedback from its actions and human input

In this model, rather than simply reacting to an alert, the agent takes proactive ownership of a defined task (e.g., diagnosing a microservice failure and suggesting a well-tested, workable fix).

Understanding the Human-in-the-Loop Approach

HITL is the critical safety lock on AI autonomy. For AIOps, it creates a clear division of duties:

- Agents handle routine – High-volume, low-risk, repetitive tasks — like classifying alerts, fetching diagnostic context, and correlating related events — are fully automated.
- Humans authorize risk – Actions that alter the operating environment of a production system — like creating new configurations, restarting workloads, rolling back a deployment, or changing a configuration — must pass through a gate that is entirely human controlled.

This architecture assures responsibility and allows the agent to correct or pause its action if its reasoning is wrong (a hallucination) or if the proposed action violates a critical business constraint. It transforms the SRE from “firefighter” to “trusted approver” and “AI model trainer.”

The Observability Trifecta

LLM agents are only as good as the context they are given. To engage in sophisticated reasoning, the system should draw on an observability trifecta: logs, traces, and metrics.
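As a rough illustration, assembling that trifecta into a single LLM-ready context might look like the sketch below. The record shapes and field names here are assumptions for illustration, not a real collector schema:

```python
from datetime import datetime, timedelta, timezone

def build_incident_context(service, logs, traces, metrics, window_minutes=15):
    """Gather recent logs, traces, and metrics for one service into a
    single text block an LLM agent can reason over (illustrative only)."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    # Keep only records for this service inside the time window.
    recent = lambda items: [i for i in items
                            if i["service"] == service and i["ts"] >= cutoff]
    sections = {
        "LOGS": [l["message"] for l in recent(logs)],
        "TRACES": [f'{t["span"]} took {t["duration_ms"]}ms' for t in recent(traces)],
        "METRICS": [f'{m["name"]}={m["value"]}' for m in recent(metrics)],
    }
    return "\n".join(f"## {name}\n" + "\n".join(rows)
                     for name, rows in sections.items())
```

In a real pipeline the three signal streams would come from an observability backend rather than in-memory lists, but the shape of the synthesized context is the same.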
This suite of data makes up the context trifecta, which, when synthesized by an LLM, can mirror a full incident bridge meeting — in brief paragraphs — for a human team or an advanced agent.

Technical Stack Overview

Implementing agentic AIOps involves integrating multiple components that must work together. A general high-level stack contains the components found in Table 1:

| Component Category | Example Technologies/Concepts | Purpose in Agentic AIOps |
|---|---|---|
| Foundation LLM | Enterprise-governed models (e.g., any cloud hosted) | The “brain” for reasoning, summarization, and action planning |
| Agent framework | LangGraph, LangChain, etc. | Provides the state machine and abstraction layer for defining agent personas and orchestrating their collaboration |
| Observability/data | OpenTelemetry, Prometheus, vector database | Ingestion and storage of the logs, traces, and metrics needed for LLM context retrieval |
| Security and gating | Role-based access control (RBAC), Open Policy Agent (OPA) | Enforcing security policies, defining human approval rights, and implementing Policy as Code for automated action checks |
| Execution/automation | Ansible, Terraform, Kubernetes APIs, CI/CD tools | The interface for agents to execute approved, low-risk, or gated actions |
| HITL interface | Slack/Teams APIs, PagerDuty/Jira integration | The human-facing communication and approval channel for all high-risk actions |

Table 1. Agentic AIOps technical stack

System Architecture and Innovations

The architecture of an agentic AIOps system must be engineered for both speed and safety. It’s a structured pipeline designed to manage the flow of data, intelligence, and execution, with multiple checkpoints for human and policy oversight.

High-Level Design

Figure 1 describes this apparatus, consisting of five key modules: the ingestion layer, agentic core, HITL gatekeeper, execution layer, and feedback loop. Figure 1.
Agentic AIOps system architecture

Ingestion Layer

This is the front door for all operational data. To support this layer, we stream real-time data from existing observability tools, whether they conform to the OpenTelemetry specification or use proprietary agents. The most important innovation here is preparing the data for the LLM: logs, traces, and metadata are indexed and translated into embeddings, which are stored in a vector database (the operational “knowledge base”) for rapid semantic retrieval.

Agentic Core

This is the primary intelligence engine: a multi-agent unit run by a Supervisor Agent.

- The Triage Agent is the first responder. It classifies the incoming alert (e.g., severity, service owner) and correlates it with recent deployment events or related past incidents.
- The Summarizer Agent uses retrieval-augmented generation (RAG) to query the vector database, pull relevant logs, traces, and metrics, and synthesize a coherent, plain-language incident summary.
- The Runbook Proposer Agent either maps the context summary to a set of preapproved, executable runbooks or generates a new runbook or action script by drawing on previous ones (e.g., based on historical resolution scenarios).

HITL Gatekeeper

This is the critical safety barrier. The proposed action (e.g., “Restart the Payment Service”) is translated into a crisp approval card and sent through a standalone Slack/Teams bot to the on-call SRE. Additionally, the system should interface with an escalation system like PagerDuty so that the correct person is notified within the SLO. This gate strictly limits the agent’s execution privileges.

Execution Layer

The action is run only after human approval.
This layer is built for defensive deployment techniques:

- Gated executor – the component that runs the actual action (e.g., calling the Kubernetes API).
- Canary/rollback – for essential changes, wraps the action in defensive mechanisms (e.g., Istio traffic splitting or another rolling rollout strategy) to test the change on a small slice of users, the canary. It is preconfigured to roll back in real time when health checks fail.

Feedback Loop

The system is intended to learn from every interaction. Human approvals or rejections, the success or failure of execution, and the final MTTR are all fed back into a training module. This employs a version of Reinforcement Learning from Human Feedback (RLHF) to periodically refine the agents’ reasoning and runbook proposal strategies.

Multi-Agent Collaboration

The architecture uses a collaborative “team” rather than a single monolithic agent. While the Supervisor Agent is responsible for the flow, the specialized agents are designed to “debate” or cross-validate their conclusions. For example, the Triage Agent classifies the alert; the Summarizer Agent might challenge that classification if the underlying logs suggest another primary service is failing. This internal, automatic peer review greatly increases the accuracy of recommendations and eliminates any single point of failure in the reasoning chain.

Hybrid RAG and GraphRAG

Conventional RAG only fetches chunks of text based on semantic similarity, which is inadequate for AIOps: a service failure is about logs and dependencies. GraphRAG addresses this by layering a knowledge graph over the data. The underlying data in GraphRAG covers all service dependencies — for example, “the Checkout Service relies on the Inventory Service and Payment Gateway.” During an alert, the agent first queries the graph to understand the impact and upstream/downstream dependencies.
Then, the RAG query looks only for the logs and traces of the affected services identified by the graph. Since the agent reasons over structure and text in parallel, this combination results in far faster and more accurate root cause analysis.

Zero-Trust Gating

A zero-trust principle needs to guide the execution layer: no action, even if it is authorized by a human, is inherently safe unless its intent and context are validated against policy. This is made possible via Policy as Code, using tools such as Open Policy Agent (OPA). The proposed action payload is dynamically checked:

- Scope check – Is the behavior confined to the service specified in the alert? (e.g., is the agent attempting to restart the entire cluster for one pod failure?)
- RBAC check – Does the approving human actually have the appropriate security role, via the HITL Gatekeeper, to authorize this?
- Context check – Does the current time window (e.g., peak sales hour) prohibit this high-risk action?

The execution environment is not released until these automated, dynamic policy checks pass.

Step-by-Step Implementation Guide

This section provides a conceptual walk-through of building the agentic workflow using common open-source principles and tools.

Step 1: Set Up the Dev Environment

Begin by establishing the foundation:

- Dependencies – Install the necessary LLM libraries, the agent framework (e.g., LangGraph), and client libraries for observability data access.
- Observability config – Configure a basic microservice application to emit logs, traces, and metrics via OpenTelemetry. This is the standard abstraction layer that future-proofs the system against vendor lock-in.
- Sample alert generation – Create a simple trigger mechanism (e.g., a script that injects an alert into a Kafka queue or an alert manager) to simulate an incident flow.
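A minimal sketch of that trigger mechanism might look like the following. An in-memory queue stands in for Kafka or an alert manager, and all names and fields are illustrative assumptions:

```python
import json
import queue
import random
import time

# In-memory bus standing in for Kafka / an alert manager; a real pipeline
# would publish to a broker topic instead.
alert_bus = queue.Queue()

def inject_sample_alert(service: str, symptom: str) -> dict:
    """Simulate an incident by publishing a synthetic alert event."""
    alert = {
        "id": f"alrt-{random.randrange(10**6):06d}",
        "service": service,
        "symptom": symptom,
        "ts": time.time(),
        "source": "synthetic",  # marks this as a test alert, not production
    }
    alert_bus.put(json.dumps(alert))
    return alert
```

Tagging the event as synthetic lets the downstream Triage Agent be exercised end to end without risking confusion with real incidents.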
Step 2: Build the Data Ingestion Pipeline

The goal is to prepare the operational data for semantic search:

- Stream data – Use an OpenTelemetry Collector or similar tool to funnel the raw logs, traces, and metrics.
- Vectorization – For unstructured data (logs), use an embedding model to convert the text into high-dimensional vectors.
- Storage – Store these vectors in a vector database. This allows the Summarizer Agent to pose sophisticated, natural-language questions, such as “Show me all logs related to user authentication errors in the last 15 minutes that correlate with an increase in P99 latency.”

Step 3: Implement the Triage Agent

This agent must be fast and accurate, as it sets the stage for the entire response:

- Define persona – The agent’s LLM prompt should define a clear persona, such as “Alert Triage Specialist,” instructing it to be concise, factual, and strictly adherent to the company’s severity classification policy.
- Classification and correlation – The agent’s first action is a classification API call (using the LLM) to assign severity and service ownership. Its second action is to query the vector database to identify related events (e.g., a recent deployment, a network change, another concurrent alert).
- Output – A structured JSON object containing the service_name, severity_level, and a list of correlated_event_IDs.

Step 4: Develop the Context Summarizer

The Context Summarizer Agent turns raw data into actionable intelligence. It implements the RAG pipeline:

- Retrieval – Using the Triage Agent’s output (service name, correlated events), the Summarizer Agent executes a vector search and a graph search (Hybrid RAG) to pull all relevant logs, traces, and metrics.
- Generation – The LLM is prompted with the retrieved data and instructed: “Based on the following raw operational data, generate a single, non-speculative, three-paragraph summary covering: 1) What is failing? 2) When did it start?
3) What is the likely root cause (with evidence)?”
- Output – The final, concise incident summary, ready for human consumption.

Step 5: Create the Runbook Proposer

- Runbook mapping – The agent is given access to a runbook repository (preferably a structured JSON list or a private GitHub repository) where each entry is tagged with its service, failure_type, and execution_payload.
- Reasoning – The agent uses the Summarizer’s context to select the most appropriate runbook. For an unprecedented event, it might generate a proposed new runbook, clearly labeled “Experimental.”
- Output – A structured proposal containing the runbook_ID, a justification for its selection, and the execution_payload (e.g., a shell script or Kubernetes YAML patch).

Step 6: Integrate HITL Gates

This is the most critical step for building trust:

- Approval workflow – The proposed solution is sent to a Slack/Teams bot, which acts as a secure intermediary. It shows a triage summary, context summary, and runbook proposal with action buttons: “Approve & Execute” and “Reject & Escalate” — or whatever suits your team’s way of working.
- Policy checks – Before the “Approve” button triggers any action, the zero-trust gatekeeper (powered by OPA) checks the action’s details against current policies. This engine verifies the human’s role and the context of the action.
- Defensive execution – Once both policy and human approval are given, the execution layer automatically adds predefined safeguards to the runbook’s execution:
  - Canary deployment – If the action involves a deployment, it’s initially rolled out to only 5% of traffic.
  - Auto-rollback – Preconfigured health checks start immediately. If any fail, an automatic rollback is triggered, stopping the execution and alerting the human.
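The policy-check portion of such a gate can be sketched in plain Python as a stand-in for real OPA/Rego policies. The role names, freeze-window tags, and proposal fields are assumptions for illustration:

```python
ROLE_CAN_APPROVE = {"sre-oncall", "sre-lead"}  # assumed RBAC roles
FREEZE_WINDOWS = {"peak-sales"}                # assumed change-freeze tags

def gate_action(proposal: dict, approver_role: str, current_window: str):
    """Zero-trust gate: even a human-approved action must pass scope,
    RBAC, and context checks before the executor is released."""
    # Scope check: the action may only touch the service named in the alert.
    if proposal["target_service"] != proposal["alert_service"]:
        return (False, "scope: action targets a service outside the alert")
    # RBAC check: the approving human must hold an authorized role.
    if approver_role not in ROLE_CAN_APPROVE:
        return (False, "rbac: approver lacks the required role")
    # Context check: block high-risk actions during freeze windows.
    if proposal.get("high_risk") and current_window in FREEZE_WINDOWS:
        return (False, "context: high-risk action blocked during freeze window")
    return (True, "approved for execution")
```

In a production setup these rules would live as versioned Policy-as-Code evaluated by an external engine, so that neither the agent nor the approval bot can bypass them.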
Step 7: Put the Whole Workflow Together

The Supervisor Agent connects everything using a state machine (easily managed by a framework like LangGraph):

- Flow – The supervisor defines how things move from one stage to the next, as shown in Figure 2.
- End-to-end testing – Use the simulated alerts from Step 1 to test the entire process. The goal is to ensure that a simple alert leads to a complete context summary and a proposed action that requires human approval before it is carried out. This confirms both the speed of the intelligence and the reliability of the safety features.

Figure 2. Agentic AIOps full workflow

Conclusion: The Future Is Agentic Observability

This step-by-step guide illustrates that the transition to agentic AIOps is more than just an upgrade; it fundamentally redefines digital engineering and operations, and it represents a paradigm shift in how we gather and act on intelligence within our digital operations.

The future of IT operations hinges on fostering a collaborative partnership between human experts and intelligent AI agents. By embedding HITL principles, we create systems that are not only powerful and fast but, more importantly, safe, trustworthy, and accountable. The agent handles the cognitive load of sifting through petabytes of data, and the human retains ultimate authority over the live environment.

The journey toward this self-healing, agentic environment doesn't require a "big bang" overhaul. Start small: identify one persistent friction point in your incident response lifecycle — perhaps the tedious, repetitive task of context summarization — apply the agentic principles, and build a secure, gated workflow from there. This model is the foundation for proactive resilience, an evolution that elevates digital engineering from reactive fixes to intelligent self-management. The age of the autonomous yet overseen operations agent has arrived.
This is an excerpt from DZone's 2025 Trend Report, Intelligent Observability: Building a Foundation for Reliability at Scale.

By Pratik Prakash
