8+ years building production systems at Fortune 100 scale
Former SDE at Amazon Web Services • Currently at Southwest Airlines
Deep expertise in ML systems, distributed architectures, and full-stack engineering
Now: shipped the @mukundakatta/agent* reliability stack (fit → guard → snap → vet → cast), 6 matching MCP servers in the official MCP Registry, 3 new GitHub Actions on the Marketplace, and doubled the PyPI footprint to 52 packages (full Python ports of the npm catalog). Plus 40+ open PRs across MCP SDKs, FastMCP, claude-code-action, and Anthropic's agent SDK.
PUBLIC REPOS 572 · ORIGINALS 160 · ACTIVE PROJECTS 122 · FORKS 412 · ARCHIVED 324
Every repo is indexed in claude-workspace — wired for Multica, Claude Code, Codex, OpenClaw, and Cursor to reason across the portfolio.
🌐 Live at mukundakatta.github.io/agent-stack — single landing page for the whole 117-package ecosystem (npm + PyPI + MCP Registry + GitHub Marketplace).
🤗 Try it live on the HuggingFace Space · jailbreak fixtures on the HF Dataset.
Five small, focused npm packages that fix the boring problems every long-running agent eventually hits. Pure ESM JavaScript, zero runtime deps, TypeScript types in the box. Designed to compose into a pipeline:
fit → guard → snap → vet → cast.
- **Fit it.** Token-aware message truncation with three strategies (drop-oldest, drop-middle, priority). Pluggable tokenizers. Per-model estimators.
- **Sandbox it.** Network-egress firewall: a declarative allowlist of domains agent tools can fetch. Throws on violation, with a clear error.
- **Test it.** Snapshot tests for tool-call traces. Catch silent regressions in LLM tool use the way you catch UI regressions today.
- **Vet it.** Validate tool args before execution. Wrap any tool function; on bad args, throw a typed error with an LLM-friendly retry hint.
- **Validate it.** Structured-output enforcer. Validate the model's response, retry with the validation error as feedback, return typed data or throw after N attempts. BYO LLM and validator.
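The validate-retry-feedback loop that "Validate it." describes is a general pattern. Here is a minimal sketch, with hypothetical `callModel` and `validate` callbacks standing in for the BYO pieces; this is an illustration of the pattern, not agentcast's actual API:

```javascript
// Sketch of the validate-retry-feedback pattern: call the model,
// validate the response, and on failure retry with the validation
// error appended as feedback. `callModel` and `validate` are
// placeholders you supply (BYO LLM / validator).
async function enforce(callModel, validate, prompt, maxAttempts = 3) {
  let feedback = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const fullPrompt = feedback
      ? `${prompt}\n\nPrevious attempt failed validation: ${feedback}`
      : prompt;
    const raw = await callModel(fullPrompt);
    // validate returns { ok: true, data } or { ok: false, error }
    const result = validate(raw);
    if (result.ok) return result.data;
    feedback = result.error;
  }
  throw new Error(`Validation failed after ${maxAttempts} attempts: ${feedback}`);
}
```

The key design point is that the validation error itself becomes model feedback on the next attempt, so the retry is informed rather than blind.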
npm i @mukundakatta/agentfit @mukundakatta/agentguard @mukundakatta/agentsnap @mukundakatta/agentvet @mukundakatta/agentcast

Each one also ships as an MCP server so Claude Desktop, Cursor, Cline, Windsurf, and Zed can call them directly mid-conversation:
npx -y @mukundakatta/agentfit-mcp # fit a chat history into a budget
npx -y @mukundakatta/agentguard-mcp # check URLs against an egress policy
npx -y @mukundakatta/agentsnap-mcp # diff tool-call traces
npx -y @mukundakatta/agentvet-mcp # validate tool args + generate retry hints
npx -y @mukundakatta/agentcast-mcp   # extract / validate JSON from LLM text

Sibling libraries share the same design philosophy: small, focused, zero-dep, BYO-LLM. Each one solves a single concrete reliability problem, so you can pick the ones you need without dragging in a framework. The previous drop, streamparse (streaming JSON parser, npm + Homebrew + MCP Registry), is still in active use.
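For reference, wiring one of these servers into a client uses the standard `mcpServers` config shape that Claude Desktop and Cursor read (shown here for agentfit; the `"agentfit"` key is an arbitrary label you choose):

```json
{
  "mcpServers": {
    "agentfit": {
      "command": "npx",
      "args": ["-y", "@mukundakatta/agentfit-mcp"]
    }
  }
}
```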
I contribute practical fixes to AI SDKs, MCP tooling, eval frameworks, agent infrastructure, structured outputs, and developer experience.
My lane is finding the sharp edges that slow builders down: unclear contracts, brittle tool calls, docs that almost answer the question, eval gaps where regressions hide, and AI tooling that needs better failure signals. I like small, reviewable patches with clear intent, and compact packages that turn repeated manual checks into reusable workflows.
Recent contribution areas (merged upstream):
- Microsoft — security and architecture docs for internal AI-engineering toolchains (`hve-core`, `physical-ai-toolchain`)
- Pydantic — `pydantic-ai` integration with the Vercel AI SDK
- Hugging Face ecosystem — `safetensors` Python bindings, `sentence-transformers` trainer migration docs
- Meilisearch — `heed` multi-target docs.rs infrastructure
- Vercel — `next.js` documentation
- Apache Software Foundation — doc / comment fixes across `iceberg`, `pulsar`, `skywalking`, `ozone`, `iotdb`
I keep a public log of selected OSS work in oss-contributions.
Distribution pattern. Each flagship ships as a complete unit, not a single npm package:
library (npm) → Python port (PyPI) → CLI binary → GitHub Action (Marketplace) → Homebrew formula (brew tap) → MCP server (npm)
So the same problem (mcpcheck, skillint, streamparse) is solvable from any environment a developer or AI assistant happens to be in: a TypeScript app, a Python script, a CI workflow, a terminal, or directly inside Claude / Cursor / Cline / Windsurf / Zed.
- openai/tiktoken #535 — use a static artifact name for the sdist build job in CI
- modelcontextprotocol/typescript-sdk #1966 — return tool input validation failures as Tool Execution Errors (SEP-1303)
- modelcontextprotocol/registry #1209 — add Maven Central package source for JVM MCP servers
- modelcontextprotocol/python-sdk #2515 — document FastMCP server instructions
- openai/openai-node #1831 — improved fallback handling for non-standard JSON error bodies
- stanford-crfm/helm #4210 — fixed later-page deep links for run instances
Last refreshed 2026-04-27 from npm, PyPI, and the GitHub API.
Latest releases
- 2026-04-27 · PyPI footprint doubled to 52 packages. Added 26 more Python ports today (mk-agentkit meta + 5 agent infra: agent-loop-breaker-py, agent-regression-lens-py, agent-trajectory-replay-py, tool-call-contracts-py, tool-permission-gate-py · 5 evals/cost/routing: eval-dataset-smith-py, llm-trace-sampler-py, model-fallback-planner-py, model-router-policy-py, ai-supply-chain-manifest-py · 3 tools/safety: tool-result-taint-py, jailbreak-corpus-mini-py, consent-redaction-log-py · 3 RAG: rag-staleness-auditor-py, retrieval-acl-filter-py, context-drift-detector-py · 5 context/prompt: context-forge-py, context-window-packer-py, prompt-token-trim-py, prompt-version-diff-py, llm-response-schema-lite-py · 4 niche: kavach-py, mcpcheck-py, skillint-py, designlint-py)
- 2026-04-27 · 18 new Python ports on PyPI: partial-json-stream, agentfit-py, agentguard-firewall, agentsnap-py, agentvet-py, agentcast-py, pii-sentry-py, prompt-injection-shield-py, llm-output-sanitizer-py, rag-quality-kit, vector-poison-score, embedding-dedupe, llm-cost-guard-py, semantic-cache-key, eval-flake-detector, citation-integrity-check, hallucination-risk-meter, system-prompt-leak-scan
- 2026-04-27 · @mukundakatta/agentkit v0.1.0 · npm · meta-package re-exporting all 5 agent-stack libraries
- 2026-04-27 · All 5 agent-stack libraries bumped to v0.1.1 with new npx-runnable CLI binaries
- 2026-04-27 · 3 new GitHub Marketplace Actions: agentvet-action, agentsnap-action, mcp-stack-validate-action
- 2026-04-27 · 5 new entries in the official MCP Registry: io.github.MukundaKatta/{agentfit, agentguard, agentsnap, agentvet, agentcast}
- 2026-04-26 · @mukundakatta/agentfit-mcp v0.1.0 · npm · MCP server for agentfit
- 2026-04-26 · @mukundakatta/agentguard-mcp v0.1.0 · npm · MCP server for agentguard
- 2026-04-26 · @mukundakatta/agentsnap-mcp v0.1.0 · npm · MCP server for agentsnap
- 2026-04-26 · @mukundakatta/agentvet-mcp v0.1.0 · npm · MCP server for agentvet
- 2026-04-26 · @mukundakatta/agentcast-mcp v0.1.0 · npm · MCP server for agentcast
- 2026-04-26 · @mukundakatta/agentcast v0.1.0 · npm · structured-output enforcer for any LLM
- 2026-04-26 · @mukundakatta/agentfit v0.1.0 · npm · token-aware message truncation
- 2026-04-26 · @mukundakatta/agentvet v0.1.0 · npm · tool-arg validator with retry hints
- 2026-04-25 · @mukundakatta/agentguard v0.1.0 · npm · network-egress firewall for agent tools
- 2026-04-25 · @mukundakatta/agentsnap v0.1.0 · npm · snapshot tests for tool-call traces
- 2026-04-25 · @mukundakatta/streamparse v1.0.1 · npm · streaming JSON parser with CLI + Homebrew formula
- 2026-04-25 · @mukundakatta/streamparse-mcp v1.0.1 · npm + MCP Registry (io.github.MukundaKatta/streamparse)
Recently merged PRs
- 2026-04-24 · langgenius/dify #35547 — docs: fix Kubernetes deployment wording
- 2026-04-24 · infiniflow/ragflow #14352 — docs: fix API key guide typo
- 2026-04-23 · ntop/ntopng #10297 — fix(locales/en): correct display string 'Enstablished' -> 'Established'
- 2026-04-22 · pydantic/pydantic-ai #5156 — fix(vercel-ai): allow regenerate requests without `messageId`
- 2026-04-22 · safetensors/safetensors #753 — fix(python): make SafetensorError picklable
Open PRs (recent batch) — substantive fixes shipped 2026-04-26 across MCP, Anthropic, FastMCP, Apache, Google Cloud, HuggingFace, OpenTelemetry:
- modelcontextprotocol/typescript-sdk #1961 — fix SSE reader-lock leak in `StreamableHTTPClientTransport`
- modelcontextprotocol/typescript-sdk #1965 — feat(client): honor `Retry-After` on HTTP 429 responses
- modelcontextprotocol/typescript-sdk #1964 — feat(deps): make HTTP/SSE transport deps optional for stdio-only consumers
- modelcontextprotocol/csharp-sdk #1530 — fix(client): preserve underlying status code in AutoDetect probe
- modelcontextprotocol/inspector #1231 — feat(auth): support OAuth 2.0 client_credentials grant type
- modelcontextprotocol/registry #1209 — feat(sources): add Maven Central package source for JVM MCP servers
- anthropics/claude-code-action #1261 — fix(mcp): spawn bundled MCP servers on `pull_request` events
- anthropics/claude-agent-sdk-python #879 — fix(session): generate AI title for SDK-created sessions
- PrefectHQ/fastmcp #4071 — feat(openapi): per-call HTTP headers for multi-tenant auth
- pydantic/pydantic #13120 — docs(validators): document `model_validator` execution order with inheritance
- huggingface/lerobot #3464 — fix(policy): resolve state-dict naming clash from tied-weight storage views
- open-telemetry/opentelemetry-python #5149 — fix(ci): stabilize tracecontext job
- apache/skywalking #13845 — docs: BanyanDB 0.10.0 upgrade notes
npm (scope @mukundakatta):
Flagship packages:
| Package | Why it matters | Install |
|---|---|---|
| @mukundakatta/streamparse — partial JSON for LLM streams | Streaming JSON parser that yields partial valid trees as tokens arrive. Render LLM tool calls mid-stream, recover dropped responses, parse messy ` ```json ` blocks. Zero deps, 64 tests. Also published as an MCP server in the official MCP Registry. | npm i @mukundakatta/streamparse |
| @mukundakatta/streamparse-mcp — MCP: parse partial JSON | MCP server that lets Claude / Cursor / Cline / Windsurf / Zed parse partial, truncated, or messy JSON on demand. Three tools: parse_partial_json, extract_json_from_text, validate_json. | npx -y @mukundakatta/streamparse-mcp |
| @mukundakatta/mcpcheck — MCP config quality gate | Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, and Zed. CLI, GitHub Action, and SARIF for code scanning. | npm i -g @mukundakatta/mcpcheck |
| @mukundakatta/designlint — frontend quality checks | HTML/CSS accessibility and design linter for contrast, touch targets, headings, form labels, and leaked secrets. | npm i -g @mukundakatta/designlint |
| @mukundakatta/skillint — AI skill validation | Lint Claude Code SKILL.md files for frontmatter, required fields, descriptions, and hardcoded secrets. | npm i -g @mukundakatta/skillint |
| @mukundakatta/ai-eval-forge — eval harness | Zero-dependency eval harness for comparing model, prompt, and agent behavior. CLI plus programmatic API; also on PyPI. | npm i @mukundakatta/ai-eval-forge |
| @mukundakatta/codex-skill-kit — Codex skill tooling | Scaffold and validate Codex skills from the command line. Published for npm and PyPI workflows. | npm i -g @mukundakatta/codex-skill-kit |
| @mukundakatta/kavach — AI-app threat signals | Small, inspectable threat-scoring library for AI-app security monitoring: signals to weighted score to tier and playbook. | npm i @mukundakatta/kavach |
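To illustrate the idea behind streamparse, here is a toy repair pass over a truncated JSON prefix: track open strings and brackets, then append the closers needed to make the prefix parseable. This is a simplified sketch of the concept, not the package's parser:

```javascript
// Toy partial-JSON repair: scan a truncated JSON prefix, track open
// strings and brackets, then close them so JSON.parse succeeds.
// Illustrative only -- real streaming parsers handle far more cases
// (incomplete literals, numbers cut mid-token, etc.).
function repairPartialJson(prefix) {
  const stack = [];
  let inString = false, escaped = false;
  for (const ch of prefix) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  let repaired = prefix;
  if (inString) repaired += '"';        // close a dangling string
  repaired = repaired.replace(/,\s*$/, ""); // drop a trailing comma
  while (stack.length) repaired += stack.pop(); // close open containers
  return JSON.parse(repaired);
}
```

For example, the prefix `{"user": {"name": "Al` repairs to `{"user": {"name": "Al"}}`, which is exactly the "partial valid tree" a streaming renderer can show mid-stream.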
More npm packages (43) — grouped by area
MCP servers (6) — callable directly from Claude Desktop, Cursor, Cline, Windsurf, Zed via stdio:
| Package | What it does |
|---|---|
| @mukundakatta/streamparse-mcp | Parse partial / truncated / messy JSON for LLM tool calls. Listed in the official MCP Registry. |
| @mukundakatta/agentfit-mcp | Token-aware message truncation: count tokens, fit a chat history into a budget. |
| @mukundakatta/agentguard-mcp | Check URLs against a network-egress allowlist before any tool fetch. |
| @mukundakatta/agentsnap-mcp | Diff and validate tool-call trace snapshots. |
| @mukundakatta/agentvet-mcp | Validate tool-call args against a shape spec; produce LLM-friendly retry hints. |
| @mukundakatta/agentcast-mcp | Extract JSON from messy LLM text and validate it against a shape. |
Structured outputs & parsing (1)
| Package | What it does |
|---|---|
| @mukundakatta/streamparse | Streaming JSON parser that yields partial valid trees as tokens arrive. |
Agent infrastructure (11)
| Package | What it does |
|---|---|
| @mukundakatta/agentfit | Token-aware message truncation; fit chat history into a context budget. |
| @mukundakatta/agentguard | Network-egress firewall for agent tools: declarative domain allowlist. |
| @mukundakatta/agentsnap | Snapshot tests for tool-call traces, like Jest snapshots for LLM tool use. |
| @mukundakatta/agentvet | Validate tool args before execution, with LLM-friendly retry hints. |
| @mukundakatta/agentcast | Structured-output enforcer: validate, retry with feedback, BYO-LLM/validator. |
| @mukundakatta/agent-loop-breaker | Detect repeated agent steps and stop runaway loops. |
| @mukundakatta/agent-regression-lens | Detect regressions between baseline and current AI agent runs. |
| @mukundakatta/agent-trajectory-replay | Replay and diff AI agent event trajectories for debugging regressions. |
| @mukundakatta/tool-call-contracts | Validate LLM tool-call payloads with small JSON-like contracts. |
| @mukundakatta/tool-permission-gate | Policy-check agent tool calls before execution. |
| @mukundakatta/tool-result-taint | Track untrusted tool output before it enters prompts or actions. |
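As a sketch of the drop-oldest strategy agentfit's row mentions: walk the history newest-first and keep messages until the budget is spent. The chars/4 `estimateTokens` heuristic here is an assumption for illustration, not the library's per-model estimator:

```javascript
// Rough token estimate: ~4 characters per token. A stand-in heuristic,
// not agentfit's pluggable tokenizer / per-model estimator.
const estimateTokens = (msg) => Math.ceil(msg.content.length / 4);

// Drop-oldest truncation: keep the newest messages that fit the budget.
function fitDropOldest(messages, budget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > budget) break; // oldest messages fall off first
    kept.unshift(messages[i]);       // preserve chronological order
    used += cost;
  }
  return kept;
}
```

The drop-middle and priority strategies differ only in which messages are sacrificed, not in the budget accounting.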
RAG & retrieval (6)
| Package | What it does |
|---|---|
| @mukundakatta/rag-quality-kit | Heuristic quality metrics for RAG retrieval and grounded answers. |
| @mukundakatta/rag-staleness-auditor | Find stale RAG chunks by age, version, and freshness requirements. |
| @mukundakatta/retrieval-acl-filter | Enforce document ACLs after retrieval and before prompting. |
| @mukundakatta/vector-poison-score | Score retrieved documents for vector/RAG poisoning signals. |
| @mukundakatta/embedding-dedupe | Deduplicate near-identical embedding records by cosine similarity. |
| @mukundakatta/context-drift-detector | Detect topic drift between user intent, retrieved context, and AI answers. |
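The cosine-similarity primitive behind embedding-dedupe can be sketched in a few lines. This is an illustration of the technique, not the package's API; the 0.95 threshold is an arbitrary example value:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy dedupe: keep a record only if it is not too similar to any
// already-kept record. O(n^2); real systems use ANN indexes at scale.
function dedupe(records, threshold = 0.95) {
  const kept = [];
  for (const rec of records) {
    if (!kept.some((k) => cosine(k.embedding, rec.embedding) >= threshold)) {
      kept.push(rec);
    }
  }
  return kept;
}
```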
Prompt & output safety (5)
| Package | What it does |
|---|---|
| @mukundakatta/pii-sentry | Detect and redact PII and secret-like values before AI processing. |
| @mukundakatta/prompt-injection-shield | Prompt-injection risk scanner for untrusted AI context. |
| @mukundakatta/llm-output-sanitizer | Sanitize LLM outputs before rendering, SQL, shell, or markdown sinks. |
| @mukundakatta/system-prompt-leak-scan | Detect system prompt leakage in model outputs. |
| @mukundakatta/jailbreak-corpus-mini | Small local jailbreak + prompt-injection fixture set for tests. |
Context & prompt engineering (4)
| Package | What it does |
|---|---|
| @mukundakatta/context-forge | Context engineering toolkit for ranking, packing, and risk-scanning RAG context. |
| @mukundakatta/context-window-packer | Pack context chunks into a budget by relevance and priority. |
| @mukundakatta/prompt-token-trim | Trim prompt messages to fit a token budget while preserving priority. |
| @mukundakatta/prompt-version-diff | Diff prompt templates and flag risky instruction changes. |
Evals & tracing (3)
| Package | What it does |
|---|---|
| @mukundakatta/eval-dataset-smith | Generate balanced eval cases from bugs, docs, examples, and policies. |
| @mukundakatta/eval-flake-detector | Detect flaky LLM eval cases across repeated runs. |
| @mukundakatta/llm-trace-sampler | Sample LLM traces by risk, errors, latency, and deterministic ids. |
Cost, routing & caching (4)
| Package | What it does |
|---|---|
| @mukundakatta/llm-cost-guard | Estimate AI request cost and enforce per-request or session budgets. |
| @mukundakatta/model-fallback-planner | Plan model fallback chains from capability, cost, and health data. |
| @mukundakatta/model-router-policy | Policy-based model routing by capability, cost, latency, and privacy. |
| @mukundakatta/semantic-cache-key | Stable semantic cache keys for AI prompts, tools, models, and retrieval context. |
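The stable-key idea behind semantic-cache-key can be illustrated with key-order-independent canonicalization: serialize the request with sorted object keys so that semantically identical requests produce the same key. This is a sketch; the package's actual algorithm may differ, and a production version would hash the canonical string (e.g. SHA-256) rather than use it directly:

```javascript
// Canonical serialization: objects are emitted with sorted keys, so
// { a, b } and { b, a } produce identical strings.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value && typeof value === "object") {
    const parts = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

// Cache key for an LLM request. In practice you would hash this.
function cacheKey(request) {
  return canonical(request);
}
```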
Supply chain, citations, consent (5)
| Package | What it does |
|---|---|
| @mukundakatta/ai-supply-chain-manifest | Build and validate lightweight AI model / data / tool manifests. |
| @mukundakatta/citation-integrity-check | Verify answer citations refer to supplied source ids. |
| @mukundakatta/consent-redaction-log | Record consent-aware redactions for privacy review trails. |
| @mukundakatta/hallucination-risk-meter | Estimate hallucination risk from answer, context, citations, and uncertainty language. |
| @mukundakatta/llm-response-schema-lite | Tiny schema validator for structured LLM responses. |
Install any of them with npm i @mukundakatta/<package>.
PyPI:
| Package | Purpose | Install |
|---|---|---|
| claude-skill-check | Lint Claude Code SKILL.md files for YAML frontmatter, required fields, description quality, and secret patterns. | pip install claude-skill-check |
| mcp-config-check | Validate MCP configs across Claude Desktop, Cursor, Cline, Windsurf, and Zed; catches auth, transport, duplicate, and placeholder issues. | pip install mcp-config-check |
| claude-hooks-check | Audit Claude Code hooks for malformed matchers, dangerous commands, invalid events, and hardcoded secrets. | pip install claude-hooks-check |
| claude-commands-check | Validate Claude Code slash-command files for naming, frontmatter, model values, allowed-tools shape, and secret leakage. | pip install claude-commands-check |
| llm-usage-report | Parse raw LLM API response logs and generate token and cost reports by provider, model, day, project, or user. | pip install llm-usage-report |
| codex-skill-kit | Scaffold and validate Codex skills from Python environments; mirrors the npm CLI workflow. | pip install codex-skill-kit |
| ai-eval-forge | Zero-dependency LLM and agent eval harness with exact, regex, token-F1, JSON, and citation-coverage checks. | pip install ai-eval-forge |
| agent-run-diff | Compare baseline and current agent runs across success, errors, tools, output drift, steps, latency, and cost. | pip install agent-run-diff |
More PyPI packages (44) — Python ports of the @mukundakatta JS libraries
Streaming + agent reliability stack (6)
| Package | What it does |
|---|---|
| partial-json-stream | Streaming JSON parser that yields partial valid trees as tokens arrive. |
| agentfit-py | Token-aware message truncation; fit a chat history into a context budget. |
| agentguard-firewall | Network-egress firewall for agent tools. |
| agentsnap-py | Snapshot tests for tool-call traces. |
| agentvet-py | Validate tool args before execution; LLM-friendly retry hints. |
| agentcast-py | Structured-output enforcer; validate, retry with feedback. |
Prompt + output safety (3)
| Package | What it does |
|---|---|
| pii-sentry-py | Detect and redact PII and secret-like values before AI processing. |
| prompt-injection-shield-py | Prompt-injection risk scanner for untrusted AI context. |
| llm-output-sanitizer-py | Sanitize LLM outputs before HTML / SQL / shell / markdown sinks. |
RAG + retrieval (3)
| Package | What it does |
|---|---|
| rag-quality-kit | Heuristic quality metrics for RAG retrieval and grounded answers. |
| vector-poison-score | Score retrieved documents for vector / RAG poisoning signals. |
| embedding-dedupe | Deduplicate near-identical embedding records by cosine similarity. |
Cost, caching, evals (3)
| Package | What it does |
|---|---|
| llm-cost-guard-py | Estimate AI request cost and enforce per-request or session budgets. |
| semantic-cache-key | Stable semantic cache keys for AI prompts, tools, models, retrieval. |
| eval-flake-detector | Detect flaky LLM eval cases across repeated runs. |
Verification + grounding (3)
| Package | What it does |
|---|---|
| citation-integrity-check | Verify answer citations refer to supplied source ids. |
| hallucination-risk-meter | Estimate hallucination risk from answer + context + citations. |
| system-prompt-leak-scan | Detect system-prompt leakage in model outputs. |
Agent infrastructure + meta (6)
| Package | What it does |
|---|---|
| mk-agentkit | Meta-package re-exporting all 5 agent-stack ports under one import. |
| agent-loop-breaker-py | Detect repeated agent steps and stop runaway loops. |
| agent-regression-lens-py | Detect regressions between baseline and current agent runs. |
| agent-trajectory-replay-py | Replay and diff agent event trajectories. |
| tool-call-contracts-py | Validate LLM tool-call payloads with small JSON-like contracts. |
| tool-permission-gate-py | Policy-check agent tool calls before execution. |
Tools / safety / privacy (4)
| Package | What it does |
|---|---|
| tool-result-taint-py | Track untrusted tool output before it enters prompts. |
| jailbreak-corpus-mini-py | Local jailbreak + prompt-injection fixture set for tests. |
| consent-redaction-log-py | Record consent-aware redactions for privacy review trails. |
| kavach-py | Threat-scoring library for AI-app security monitoring. |
RAG (3)
| Package | What it does |
|---|---|
| rag-staleness-auditor-py | Find stale RAG chunks by age, version, and freshness requirements. |
| retrieval-acl-filter-py | Enforce document ACLs after retrieval and before prompting. |
| context-drift-detector-py | Detect topic drift between intent, context, and answer. |
Context engineering (5)
| Package | What it does |
|---|---|
| context-forge-py | Context engineering toolkit: ranking, packing, risk-scanning. |
| context-window-packer-py | Pack context chunks into a budget by relevance and priority. |
| prompt-token-trim-py | Trim prompt messages to fit a token budget while preserving priority. |
| prompt-version-diff-py | Diff prompt templates and flag risky instruction changes. |
| llm-response-schema-lite-py | Tiny schema validator for structured LLM responses. |
Evals + cost + routing (5)
| Package | What it does |
|---|---|
| eval-dataset-smith-py | Generate balanced eval cases from bugs, docs, examples, policies. |
| llm-trace-sampler-py | Sample LLM traces by risk, errors, latency, and deterministic ids. |
| llm-cost-guard-py | Estimate AI request cost and enforce per-request or session budgets. |
| model-fallback-planner-py | Plan model fallback chains from capability, cost, and health data. |
| model-router-policy-py | Policy-based model routing by capability, cost, latency, privacy. |
Niche linters (4)
| Package | What it does |
|---|---|
| mcpcheck-py | Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, Zed. |
| skillint-py | Lint Claude Code SKILL.md files. |
| designlint-py | HTML/CSS accessibility and design linter. |
| ai-supply-chain-manifest-py | Build and validate lightweight AI model / data / tool manifests. |
GitHub Marketplace (7 Actions) — composite GitHub Actions, discoverable on the Marketplace:
Linters:
Agent-stack CI gates:
- agentvet-action — fail PRs on bad LLM tool definitions
- agentsnap-action — fail PRs on tool-call trace drift
- mcp-stack-validate-action — one CI gate that runs all 5 agent-stack tools
Homebrew tap — mukundakatta/tools:
brew tap mukundakatta/tools
brew install claude-skill-check mcp-config-check claude-hooks-check claude-commands-check

Each ships a CLI, a programmatic API, and (for the linters) a composite GitHub Action you can drop into any workflow in 3 lines.
🤗 HuggingFace — mukunda1729 — 14 Spaces · 13 Datasets:
🚀 Live Gradio playgrounds (6):
| Space | What you can try |
|---|---|
| agent-stack-demo | All 5 libs (fit, guard, snap, vet, cast) in one app. |
| token-counter | Count tokens for any text across Claude / GPT / Llama tokenizers. |
| json-extractor | Pull clean JSON out of messy LLM output (fenced, inline, unfenced). |
| pii-redactor | Find emails, phones, secrets, and IDs — mask, hash, or highlight. |
| prompt-injection-detector | Heuristic scanner for the most common injection families. |
| mcp-config-validator | Sanity-check Claude Desktop / Cursor / Cline / Windsurf / Zed configs. |
📖 Static reference & explainer pages (8):
| Space | What it covers |
|---|---|
| agent-stack-tour | Guided tour of all 5 libraries with install commands and live links. |
| why-this-stack | The thinking behind the stack — what's broken, why these 5 libs. |
| install-cheatsheet | All install commands across pip, npm, and MCP. |
| mcp-quickstart | Add the 5 MCP servers to Claude Desktop / Cursor / Cline / Windsurf / Zed. |
| fit-strategies-explained | Visual explainer: drop-oldest vs drop-middle vs priority. |
| trace-format-reference | Field-by-field reference for the agentsnap trace JSON schema. |
| prompt-injection-taxonomy | 10-category taxonomy with examples + the cheap defense for each. |
| dataset-cards-index | One-page index of all 13 datasets below. |
📊 Datasets (13) — all MIT, all datasets.load_dataset("mukunda1729/<name>") ready:
| Dataset | Rows | Purpose |
|---|---|---|
| jailbreak-corpus-mini | 15 | Curated jailbreak fixtures across 8 categories. |
| prompt-injection-patterns-extended | 30 | Prompt-injection patterns across 10 categories. |
| pii-detection-fixtures | 25 | PII / secret strings labeled with span offsets. |
| tool-arg-validation-cases | 20 | (Tool, schema, args) tuples — valid + invalid. |
| mcp-tool-test-fixtures | 22 | MCP tool-call args across 8 categories. |
| llm-output-extraction-cases | 20 | Messy LLM outputs with expected JSON. |
| hallucination-risk-cases | 20 | Prompt → response pairs rated for hallucination risk. |
| rag-quality-benchmarks-mini | 15 | RAG eval queries with ground-truth answers. |
| agent-trace-samples | 10 | agentsnap-format tool-call traces (good + regressed pairs). |
| agent-budget-violations | 15 | Agent runs with budget caps + actual usage + root cause. |
| token-counting-edge-cases | 20 | Strings with token counts across 3 tokenizer families. |
| model-pricing-table | 20 | LLM pricing — input/output cost per 1k tokens, context window. |
| mcp-config-examples | 15 | MCP client configs across Claude Desktop, Cursor, Cline, Windsurf, Zed. |
**Karna — AI Agent Platform.** Self-hosted AI assistant with 7 messaging channels (Telegram, Slack, Discord, WhatsApp, SMS, iMessage, Web), extensible plugin SDK, semantic memory, and voice. TypeScript monorepo with Next.js dashboard and React Native mobile app. Stack · TypeScript • Node.js • Next.js • Supabase • WebSocket • pgvector

**Chetana — AI Consciousness Research Platform.** Research-driven platform exploring machine consciousness through 14 indicators grounded in 6 scientific theories. Built to turn abstract AI-consciousness questions into structured experiments, scoring, and analysis. Stack · AI Research • Evaluation • Experimentation • Python

**AgentRAG — Modular RAG Pipeline.** Provider-agnostic RAG framework with pluggable vector stores, chunking strategies, and retrieval methods. Designed for agentic workflows with clean API boundaries. Stack · RAG • Vector Search • Embeddings • TypeScript

**Astra Agent — AI Agent Runtime.** Standalone AI agent runtime with tool execution, context management, and multi-model routing. Foundation for building autonomous AI assistants with structured tool use. Stack · TypeScript • LLM Orchestration • Tool Use • Agents
More Projects
| Project | Description |
|---|---|
| Sadhak | AI-powered job search command center — automated evaluation, resume tailoring, application tracking |
| Chetana | AI consciousness research platform — 14 indicators from 6 scientific theories |
| Prithvi | Container security scanner — vulnerability detection, compliance checks, Docker audits |
| Amogha Cafe | Full-stack Firebase restaurant platform — real-time ordering, QR dine-in. Live |
| RNHT | Temple community platform — events, donations, priest scheduling |
| Patchly | AI code review bot — flags bugs, suggests fixes, explains why, like a senior engineer |
| Evalharness | Prompt, agent, and RAG test harness — red teaming, regression testing, CI/CD for AI |
| AgentMem | Pluggable memory management for AI agents |
| LLM Bench CLI | CLI for benchmarking local LLMs — speed, throughput, quality |
| TokenWise | Token usage optimization across providers |
Production AI / ML Impact
- COST EFFICIENCY: 78% infrastructure cost reduction (SageMaker → Bedrock migration)
- LATENCY: 600x retrieval latency improvement (ML prediction system)
- RAG SCALE: 30K+ knowledge base entries (9-stage agentic RAG pipeline)
- QUALITY: 370+ unit tests & evaluations (production ML systems)
Open Source Footprint
- UPSTREAM: 97 merged PRs in external public repos
- PACKAGES: 144 total: 52 npm (incl. 6 MCP servers, agentkit) + 52 PyPI + 6 in the official MCP Registry + 7 GitHub Marketplace Actions + 14 HF Spaces + 13 HF Datasets
- ORIGINAL WORK: 160 original public repos maintained on GitHub
- ECOSYSTEMS: 6+ major org ecosystems (OpenAI, Anthropic, Google, Microsoft, Stanford, Princeton)
ML Systems Fault prediction, embedding pipelines, model evaluation, cost-optimized inference
Agentic AI RAG pipelines, LangGraph workflows, query routing, hallucination detection
Cloud Infrastructure AWS (Bedrock, SageMaker, ECS, OpenSearch), GCP, Azure, Kubernetes, Terraform
Full-Stack React/TypeScript + Java/Python backend APIs, CI/CD, zero-downtime deployments
| Role | Company | Era | Primary arena |
|---|---|---|---|
| AI/ML Engineer | Southwest Airlines | Aug 2025 — Present | production ML, agentic RAG, Bedrock migration |
| AI/ML Engineer | GPS IT Solutions | Jun 2024 — Aug 2025 | RAG platforms, model-risk governance, vector search |
| Software Development Engineer | Amazon Web Services | Aug 2022 — May 2024 | enterprise cloud systems, React/Java/Python, CI/CD |
| Data Engineer | GPS IT Solutions | Jan 2022 — Aug 2022 | data pipelines, AWS Glue, PySpark, analytics workflows |
| Software Engineer | American Express | Feb 2017 — Dec 2020 | Python backend services, REST APIs, enterprise platforms |
Highlights
Southwest Airlines — AI/ML Engineer
- Architected ML fault prediction system for aircraft maintenance — 5 prediction types, 10K+ records, sub-second retrieval
- Led SageMaker → Bedrock migration: 78% cost reduction ($1,740→$371/mo), 600x latency improvement
- Designed 9-stage agentic RAG pipeline (LangGraph, Bedrock Nova Pro/Micro, FAISS + BM25) over 30K+ KB entries
GPS IT Solutions — AI/ML Engineer
- Built GPT-4 + RAG content generation platform with compliance validation, reducing production time by 40%
- Designed AI model risk governance framework with 23 automated evaluation tests achieving regulatory compliance
- Architected FastAPI microservices with FAISS/Pinecone vector search on Kubernetes
Amazon Web Services (AWS) — Software Development Engineer
- Built and shipped features for AWS Application Manager (Systems Manager) serving enterprise customers globally
- Owned full-stack delivery: React/TypeScript frontend + Java/Python backend APIs with operational excellence
- Designed CI/CD and IaC patterns enabling zero-downtime deployments at enterprise scale
GPS IT Solutions — Data Engineer
- Led end-to-end migration of data pipelines from on-prem to AWS (Glue, PySpark)
American Express — Software Engineer
- Developed Python backend services and RESTful APIs for enterprise platforms handling high-volume transactions at scale
If you follow my work here, you’ll mostly see:
- open-source contributions to AI SDKs and agent tooling
- MCP, eval, and developer-experience improvements
- practical full-stack and infrastructure-heavy AI projects
- systems thinking around memory, retrieval, orchestration, and production reliability
University of Central Missouri — M.S. in Big Data Analytics and Information Technology (Jan 2021 — May 2022)
SRM University — B.Tech in Mechanical Engineering (2012 — 2016)
Open to opportunities — Senior AI/ML Engineer • GenAI Platform Engineer • Software Engineer
mukunda-ai.vercel.app • Las Vegas, NV




