8+ years building production systems at Fortune 100 scale
Former SDE at Amazon Web Services • Currently at Southwest Airlines
Deep expertise in ML systems, distributed architectures, and full-stack engineering
Now: shipped the @mukundakatta/agent* reliability stack (fit → guard → snap → vet → cast), 6 matching MCP servers in the official MCP Registry, 3 new GitHub Actions on the Marketplace, and doubled the PyPI footprint to 52 packages (full Python ports of the npm catalog). Plus 40+ open PRs across MCP SDKs, FastMCP, claude-code-action, and Anthropic's agent SDK.
PUBLIC REPOS 572 · ORIGINALS 160 · ACTIVE PROJECTS 122 · FORKS 412 · ARCHIVED 324
Every repo is indexed in claude-workspace — wired for Multica, Claude Code, Codex, OpenClaw, and Cursor to reason across the portfolio.
🌐 Live at mukundakatta.github.io/agent-stack — single landing page for the whole 117-package ecosystem (npm + PyPI + MCP Registry + GitHub Marketplace).
🤗 Try it live on the HuggingFace Space · jailbreak fixtures on the HF Dataset.
Five small, focused npm packages that fix the boring problems every long-running agent eventually hits. Pure ESM JavaScript, zero runtime deps, TypeScript types in the box. Designed to compose into a pipeline:
fit → guard → snap → vet → cast.
- **Fit it.** Token-aware message truncation with three strategies (drop-oldest, drop-middle, priority). Pluggable tokenizers. Per-model estimators.
- **Sandbox it.** Network-egress firewall: a declarative allowlist of domains agent tools can fetch. Throws on violation, with a clear error.
- **Test it.** Snapshot tests for tool-call traces. Catch silent regressions in LLM tool use the way you catch UI regressions today.
- **Vet it.** Validate tool args before execution. Wrap any tool function; on bad args, throw a typed error with an LLM-friendly retry hint.
- **Validate it.** Structured-output enforcer. Validate the model's response, retry with the validation error as feedback, return typed data or throw after N attempts. BYO LLM and validator.
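The validate-retry-feedback loop that "Validate it." describes is a general pattern. Here is a minimal sketch, with hypothetical `callModel` and `validate` callbacks standing in for the BYO pieces; this is an illustration of the pattern, not agentcast's actual API:

```javascript
// Sketch of the validate-retry-feedback pattern: call the model,
// validate the response, and on failure retry with the validation
// error appended as feedback. `callModel` and `validate` are
// placeholders you supply (BYO LLM / validator).
async function enforce(callModel, validate, prompt, maxAttempts = 3) {
  let feedback = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const fullPrompt = feedback
      ? `${prompt}\n\nPrevious attempt failed validation: ${feedback}`
      : prompt;
    const raw = await callModel(fullPrompt);
    // validate returns { ok: true, data } or { ok: false, error }
    const result = validate(raw);
    if (result.ok) return result.data;
    feedback = result.error;
  }
  throw new Error(`Validation failed after ${maxAttempts} attempts: ${feedback}`);
}
```

The key design point is that the validation error itself becomes model feedback on the next attempt, so the retry is informed rather than blind.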
npm i @mukundakatta/agentfit @mukundakatta/agentguard @mukundakatta/agentsnap @mukundakatta/agentvet @mukundakatta/agentcast

Each one also ships as an MCP server so Claude Desktop, Cursor, Cline, Windsurf, and Zed can call them directly mid-conversation:
npx -y @mukundakatta/agentfit-mcp # fit a chat history into a budget
npx -y @mukundakatta/agentguard-mcp # check URLs against an egress policy
npx -y @mukundakatta/agentsnap-mcp # diff tool-call traces
npx -y @mukundakatta/agentvet-mcp # validate tool args + generate retry hints
npx -y @mukundakatta/agentcast-mcp   # extract / validate JSON from LLM text

Sibling libraries share the same design philosophy: small, focused, zero-dep, BYO-LLM. Each one solves a single concrete reliability problem, so you can pick the ones you need without dragging in a framework. The previous drop, streamparse (streaming JSON parser, npm + Homebrew + MCP Registry), is still in active use.
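For reference, wiring one of these servers into a client uses the standard `mcpServers` config shape that Claude Desktop and Cursor read (shown here for agentfit; the `"agentfit"` key is an arbitrary label you choose):

```json
{
  "mcpServers": {
    "agentfit": {
      "command": "npx",
      "args": ["-y", "@mukundakatta/agentfit-mcp"]
    }
  }
}
```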
I contribute practical fixes to AI SDKs, MCP tooling, eval frameworks, agent infrastructure, structured outputs, and developer experience.
My lane is finding the sharp edges that slow builders down: unclear contracts, brittle tool calls, docs that almost answer the question, eval gaps where regressions hide, and AI tooling that needs better failure signals. I like small, reviewable patches with clear intent, and compact packages that turn repeated manual checks into reusable workflows.
Recent contribution areas (merged upstream):
- Microsoft — security and architecture docs for internal AI-engineering toolchains (`hve-core`, `physical-ai-toolchain`)
- Pydantic — `pydantic-ai` integration with the Vercel AI SDK
- Hugging Face ecosystem — `safetensors` Python bindings, `sentence-transformers` trainer migration docs
- Meilisearch — `heed` multi-target docs.rs infrastructure
- Vercel — `next.js` documentation
- Apache Software Foundation — doc / comment fixes across `iceberg`, `pulsar`, `skywalking`, `ozone`, `iotdb`
I keep a public log of selected OSS work in oss-contributions.
Distribution pattern. Each flagship ships as a complete unit, not a single npm package:
library (npm) → Python port (PyPI) → CLI binary → GitHub Action (Marketplace) → Homebrew formula (brew tap) → MCP server (npm)
So the same problem (mcpcheck, skillint, streamparse) is solvable from any environment a developer or AI assistant happens to be in: a TypeScript app, a Python script, a CI workflow, a terminal, or directly inside Claude / Cursor / Cline / Windsurf / Zed.
- openai/tiktoken #535 — use a static artifact name for the sdist build job in CI
- modelcontextprotocol/typescript-sdk #1966 — return tool input validation failures as Tool Execution Errors (SEP-1303)
- modelcontextprotocol/registry #1209 — add Maven Central package source for JVM MCP servers
- modelcontextprotocol/python-sdk #2515 — document FastMCP server instructions
- openai/openai-node #1831 — improved fallback handling for non-standard JSON error bodies
- stanford-crfm/helm #4210 — fixed later-page deep links for run instances
Last refreshed 2026-04-27 from npm, PyPI, and the GitHub API.
Latest releases
- 2026-04-27 · PyPI footprint doubled to 52 packages. Added 26 more Python ports today (mk-agentkit meta + 5 agent infra: agent-loop-breaker-py, agent-regression-lens-py, agent-trajectory-replay-py, tool-call-contracts-py, tool-permission-gate-py · 5 evals/cost/routing: eval-dataset-smith-py, llm-trace-sampler-py, model-fallback-planner-py, model-router-policy-py, ai-supply-chain-manifest-py · 3 tools/safety: tool-result-taint-py, jailbreak-corpus-mini-py, consent-redaction-log-py · 3 RAG: rag-staleness-auditor-py, retrieval-acl-filter-py, context-drift-detector-py · 5 context/prompt: context-forge-py, context-window-packer-py, prompt-token-trim-py, prompt-version-diff-py, llm-response-schema-lite-py · 4 niche: kavach-py, mcpcheck-py, skillint-py, designlint-py)
- 2026-04-27 · 18 new Python ports on PyPI: partial-json-stream, agentfit-py, agentguard-firewall, agentsnap-py, agentvet-py, agentcast-py, pii-sentry-py, prompt-injection-shield-py, llm-output-sanitizer-py, rag-quality-kit, vector-poison-score, embedding-dedupe, llm-cost-guard-py, semantic-cache-key, eval-flake-detector, citation-integrity-check, hallucination-risk-meter, system-prompt-leak-scan
- 2026-04-27 · @mukundakatta/agentkit v0.1.0 · npm · meta-package re-exporting all 5 agent-stack libraries
- 2026-04-27 · All 5 agent-stack libraries bumped to v0.1.1 with new npx-runnable CLI binaries
- 2026-04-27 · 3 new GitHub Marketplace Actions: agentvet-action, agentsnap-action, mcp-stack-validate-action
- 2026-04-27 · 5 new entries in the official MCP Registry: io.github.MukundaKatta/{agentfit, agentguard, agentsnap, agentvet, agentcast}
- 2026-04-26 · @mukundakatta/agentfit-mcp v0.1.0 · npm · MCP server for agentfit
- 2026-04-26 · @mukundakatta/agentguard-mcp v0.1.0 · npm · MCP server for agentguard
- 2026-04-26 · @mukundakatta/agentsnap-mcp v0.1.0 · npm · MCP server for agentsnap
- 2026-04-26 · @mukundakatta/agentvet-mcp v0.1.0 · npm · MCP server for agentvet
- 2026-04-26 · @mukundakatta/agentcast-mcp v0.1.0 · npm · MCP server for agentcast
- 2026-04-26 · @mukundakatta/agentcast v0.1.0 · npm · structured-output enforcer for any LLM
- 2026-04-26 · @mukundakatta/agentfit v0.1.0 · npm · token-aware message truncation
- 2026-04-26 · @mukundakatta/agentvet v0.1.0 · npm · tool-arg validator with retry hints
- 2026-04-25 · @mukundakatta/agentguard v0.1.0 · npm · network-egress firewall for agent tools
- 2026-04-25 · @mukundakatta/agentsnap v0.1.0 · npm · snapshot tests for tool-call traces
- 2026-04-25 · @mukundakatta/streamparse v1.0.1 · npm · streaming JSON parser with CLI + Homebrew formula
- 2026-04-25 · @mukundakatta/streamparse-mcp v1.0.1 · npm + MCP Registry (io.github.MukundaKatta/streamparse)
Recently merged PRs
- 2026-04-24 · langgenius/dify #35547 — docs: fix Kubernetes deployment wording
- 2026-04-24 · infiniflow/ragflow #14352 — docs: fix API key guide typo
- 2026-04-23 · ntop/ntopng #10297 — fix(locales/en): correct display string 'Enstablished' -> 'Established'
- 2026-04-22 · pydantic/pydantic-ai #5156 — fix(vercel-ai): allow regenerate requests without `messageId`
- 2026-04-22 · safetensors/safetensors #753 — fix(python): make SafetensorError picklable
Open PRs (recent batch) — substantive fixes shipped 2026-04-26 across MCP, Anthropic, FastMCP, Apache, Google Cloud, HuggingFace, OpenTelemetry:
- modelcontextprotocol/typescript-sdk #1961 — fix SSE reader-lock leak in `StreamableHTTPClientTransport`
- modelcontextprotocol/typescript-sdk #1965 — feat(client): honor `Retry-After` on HTTP 429 responses
- modelcontextprotocol/typescript-sdk #1964 — feat(deps): make HTTP/SSE transport deps optional for stdio-only consumers
- modelcontextprotocol/csharp-sdk #1530 — fix(client): preserve underlying status code in AutoDetect probe
- modelcontextprotocol/inspector #1231 — feat(auth): support OAuth 2.0 client_credentials grant type
- modelcontextprotocol/registry #1209 — feat(sources): add Maven Central package source for JVM MCP servers
- anthropics/claude-code-action #1261 — fix(mcp): spawn bundled MCP servers on `pull_request` events
- anthropics/claude-agent-sdk-python #879 — fix(session): generate AI title for SDK-created sessions
- PrefectHQ/fastmcp #4071 — feat(openapi): per-call HTTP headers for multi-tenant auth
- pydantic/pydantic #13120 — docs(validators): document `model_validator` execution order with inheritance
- huggingface/lerobot #3464 — fix(policy): resolve state-dict naming clash from tied-weight storage views
- open-telemetry/opentelemetry-python #5149 — fix(ci): stabilize tracecontext job
- apache/skywalking #13845 — docs: BanyanDB 0.10.0 upgrade notes
npm (scope @mukundakatta):
Flagship packages:
| Package | Why it matters | Install |
|---|---|---|
| @mukundakatta/streamparse — partial JSON for LLM streams | Streaming JSON parser that yields partial valid trees as tokens arrive. Render LLM tool calls mid-stream, recover dropped responses, parse messy ` ```json ` blocks. Zero deps, 64 tests. Also published as an MCP server in the official MCP Registry. | npm i @mukundakatta/streamparse |
| @mukundakatta/streamparse-mcp — MCP: parse partial JSON | MCP server that lets Claude / Cursor / Cline / Windsurf / Zed parse partial, truncated, or messy JSON on demand. Three tools: parse_partial_json, extract_json_from_text, validate_json. | npx -y @mukundakatta/streamparse-mcp |
| @mukundakatta/mcpcheck — MCP config quality gate | Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, and Zed. CLI, GitHub Action, and SARIF for code scanning. | npm i -g @mukundakatta/mcpcheck |
| @mukundakatta/designlint — frontend quality checks | HTML/CSS accessibility and design linter for contrast, touch targets, headings, form labels, and leaked secrets. | npm i -g @mukundakatta/designlint |
| @mukundakatta/skillint — AI skill validation | Lint Claude Code SKILL.md files for frontmatter, required fields, descriptions, and hardcoded secrets. | npm i -g @mukundakatta/skillint |
| @mukundakatta/ai-eval-forge — eval harness | Zero-dependency eval harness for comparing model, prompt, and agent behavior. CLI plus programmatic API; also on PyPI. | npm i @mukundakatta/ai-eval-forge |
| @mukundakatta/codex-skill-kit — Codex skill tooling | Scaffold and validate Codex skills from the command line. Published for npm and PyPI workflows. | npm i -g @mukundakatta/codex-skill-kit |
| @mukundakatta/kavach — AI-app threat signals | Small, inspectable threat-scoring library for AI-app security monitoring: signals to weighted score to tier and playbook. | npm i @mukundakatta/kavach |
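To illustrate the idea behind streamparse, here is a toy repair pass over a truncated JSON prefix: track open strings and brackets, then append the closers needed to make the prefix parseable. This is a simplified sketch of the concept, not the package's parser:

```javascript
// Toy partial-JSON repair: scan a truncated JSON prefix, track open
// strings and brackets, then close them so JSON.parse succeeds.
// Illustrative only -- real streaming parsers handle far more cases
// (incomplete literals, numbers cut mid-token, etc.).
function repairPartialJson(prefix) {
  const stack = [];
  let inString = false, escaped = false;
  for (const ch of prefix) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  let repaired = prefix;
  if (inString) repaired += '"';        // close a dangling string
  repaired = repaired.replace(/,\s*$/, ""); // drop a trailing comma
  while (stack.length) repaired += stack.pop(); // close open containers
  return JSON.parse(repaired);
}
```

For example, the prefix `{"user": {"name": "Al` repairs to `{"user": {"name": "Al"}}`, which is exactly the "partial valid tree" a streaming renderer can show mid-stream.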
More npm packages (43) — grouped by area
MCP servers (6) — callable directly from Claude Desktop, Cursor, Cline, Windsurf, Zed via stdio:
| Package | What it does |
|---|---|
| @mukundakatta/streamparse-mcp | Parse partial / truncated / messy JSON for LLM tool calls. Listed in the official MCP Registry. |
| @mukundakatta/agentfit-mcp | Token-aware message truncation: count tokens, fit a chat history into a budget. |
| @mukundakatta/agentguard-mcp | Check URLs against a network-egress allowlist before any tool fetch. |
| @mukundakatta/agentsnap-mcp | Diff and validate tool-call trace snapshots. |
| @mukundakatta/agentvet-mcp | Validate tool-call args against a shape spec; produce LLM-friendly retry hints. |
| @mukundakatta/agentcast-mcp | Extract JSON from messy LLM text and validate it against a shape. |
Structured outputs & parsing (1)
| Package | What it does |
|---|---|
| @mukundakatta/streamparse | Streaming JSON parser that yields partial valid trees as tokens arrive. |
Agent infrastructure (11)
| Package | What it does |
|---|---|
| @mukundakatta/agentfit | Token-aware message truncation; fit chat history into a context budget. |
| @mukundakatta/agentguard | Network-egress firewall for agent tools: declarative domain allowlist. |
| @mukundakatta/agentsnap | Snapshot tests for tool-call traces, like Jest snapshots for LLM tool use. |
| @mukundakatta/agentvet | Validate tool args before execution, with LLM-friendly retry hints. |
| @mukundakatta/agentcast | Structured-output enforcer: validate, retry with feedback, BYO-LLM/validator. |
| @mukundakatta/agent-loop-breaker | Detect repeated agent steps and stop runaway loops. |
| @mukundakatta/agent-regression-lens | Detect regressions between baseline and current AI agent runs. |
| @mukundakatta/agent-trajectory-replay | Replay and diff AI agent event trajectories for debugging regressions. |
| @mukundakatta/tool-call-contracts | Validate LLM tool-call payloads with small JSON-like contracts. |
| @mukundakatta/tool-permission-gate | Policy-check agent tool calls before execution. |
| @mukundakatta/tool-result-taint | Track untrusted tool output before it enters prompts or actions. |
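As a sketch of the drop-oldest strategy agentfit's row mentions: walk the history newest-first and keep messages until the budget is spent. The chars/4 `estimateTokens` heuristic here is an assumption for illustration, not the library's per-model estimator:

```javascript
// Rough token estimate: ~4 characters per token. A stand-in heuristic,
// not agentfit's pluggable tokenizer / per-model estimator.
const estimateTokens = (msg) => Math.ceil(msg.content.length / 4);

// Drop-oldest truncation: keep the newest messages that fit the budget.
function fitDropOldest(messages, budget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > budget) break; // oldest messages fall off first
    kept.unshift(messages[i]);       // preserve chronological order
    used += cost;
  }
  return kept;
}
```

The drop-middle and priority strategies differ only in which messages are sacrificed, not in the budget accounting.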
RAG & retrieval (6)
| Package | What it does |
|---|---|
| @mukundakatta/rag-quality-kit | Heuristic quality metrics for RAG retrieval and grounded answers. |
| @mukundakatta/rag-staleness-auditor | Find stale RAG chunks by age, version, and freshness requirements. |
| @mukundakatta/retrieval-acl-filter | Enforce document ACLs after retrieval and before prompting. |
| @mukundakatta/vector-poison-score | Score retrieved documents for vector/RAG poisoning signals. |
| @mukundakatta/embedding-dedupe | Deduplicate near-identical embedding records by cosine similarity. |
| @mukundakatta/context-drift-detector | Detect topic drift between user intent, retrieved context, and AI answers. |
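The cosine-similarity primitive behind embedding-dedupe can be sketched in a few lines. This is an illustration of the technique, not the package's API; the 0.95 threshold is an arbitrary example value:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy dedupe: keep a record only if it is not too similar to any
// already-kept record. O(n^2); real systems use ANN indexes at scale.
function dedupe(records, threshold = 0.95) {
  const kept = [];
  for (const rec of records) {
    if (!kept.some((k) => cosine(k.embedding, rec.embedding) >= threshold)) {
      kept.push(rec);
    }
  }
  return kept;
}
```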
Prompt & output safety (5)
| Package | What it does |
|---|---|
| @mukundakatta/pii-sentry | Detect and redact PII and secret-like values before AI processing. |
| @mukundakatta/prompt-injection-shield | Prompt-injection risk scanner for untrusted AI context. |
| @mukundakatta/llm-output-sanitizer | Sanitize LLM outputs before rendering, SQL, shell, or markdown sinks. |
| @mukundakatta/system-prompt-leak-scan | Detect system prompt leakage in model outputs. |
| @mukundakatta/jailbreak-corpus-mini | Small local jailbreak + prompt-injection fixture set for tests. |
Context & prompt engineering (4)
| Package | What it does |
|---|---|
| @mukundakatta/context-forge | Context engineering toolkit for ranking, packing, and risk-scanning RAG context. |
| @mukundakatta/context-window-packer | Pack context chunks into a budget by relevance and priority. |
| @mukundakatta/prompt-token-trim | Trim prompt messages to fit a token budget while preserving priority. |
| @mukundakatta/prompt-version-diff | Diff prompt templates and flag risky instruction changes. |
Evals & tracing (3)
| Package | What it does |
|---|---|
| @mukundakatta/eval-dataset-smith | Generate balanced eval cases from bugs, docs, examples, and policies. |
| @mukundakatta/eval-flake-detector | Detect flaky LLM eval cases across repeated runs. |
| @mukundakatta/llm-trace-sampler | Sample LLM traces by risk, errors, latency, and deterministic ids. |
Cost, routing & caching (4)
| Package | What it does |
|---|---|
| @mukundakatta/llm-cost-guard | Estimate AI request cost and enforce per-request or session budgets. |
| @mukundakatta/model-fallback-planner | Plan model fallback chains from capability, cost, and health data. |
| @mukundakatta/model-router-policy | Policy-based model routing by capability, cost, latency, and privacy. |
| @mukundakatta/semantic-cache-key | Stable semantic cache keys for AI prompts, tools, models, and retrieval context. |
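The stable-key idea behind semantic-cache-key can be illustrated with key-order-independent canonicalization: serialize the request with sorted object keys so that semantically identical requests produce the same key. This is a sketch; the package's actual algorithm may differ, and a production version would hash the canonical string (e.g. SHA-256) rather than use it directly:

```javascript
// Canonical serialization: objects are emitted with sorted keys, so
// { a, b } and { b, a } produce identical strings.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value && typeof value === "object") {
    const parts = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

// Cache key for an LLM request. In practice you would hash this.
function cacheKey(request) {
  return canonical(request);
}
```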
Supply chain, citations, consent (5)
| Package | What it does |
|---|---|
| @mukundakatta/ai-supply-chain-manifest | Build and validate lightweight AI model / data / tool manifests. |
| @mukundakatta/citation-integrity-check | Verify answer citations refer to supplied source ids. |
| @mukundakatta/consent-redaction-log | Record consent-aware redactions for privacy review trails. |
| @mukundakatta/hallucination-risk-meter | Estimate hallucination risk from answer, context, citations, and uncertainty language. |
| @mukundakatta/llm-response-schema-lite | Tiny schema validator for structured LLM responses. |
Install any of them with npm i @mukundakatta/<package>.
PyPI:
| Package | Purpose | Install |
|---|---|---|
| claude-skill-check | Lint Claude Code SKILL.md files for YAML frontmatter, required fields, description quality, and secret patterns. | pip install claude-skill-check |
| mcp-config-check | Validate MCP configs across Claude Desktop, Cursor, Cline, Windsurf, and Zed; catches auth, transport, duplicate, and placeholder issues. | pip install mcp-config-check |
| claude-hooks-check | Audit Claude Code hooks for malformed matchers, dangerous commands, invalid events, and hardcoded secrets. | pip install claude-hooks-check |
| claude-commands-check | Validate Claude Code slash-command files for naming, frontmatter, model values, allowed-tools shape, and secret leakage. | pip install claude-commands-check |
| llm-usage-report | Parse raw LLM API response logs and generate token and cost reports by provider, model, day, project, or user. | pip install llm-usage-report |
| codex-skill-kit | Scaffold and validate Codex skills from Python environments; mirrors the npm CLI workflow. | pip install codex-skill-kit |
| ai-eval-forge | Zero-dependency LLM and agent eval harness with exact, regex, token-F1, JSON, and citation-coverage checks. | pip install ai-eval-forge |
| agent-run-diff | Compare baseline and current agent runs across success, errors, tools, output drift, steps, latency, and cost. | pip install agent-run-diff |
More PyPI packages (44) — Python ports of the @mukundakatta JS libraries
Streaming + agent reliability stack (6)
| Package | What it does |
|---|---|
| partial-json-stream | Streaming JSON parser that yields partial valid trees as tokens arrive. |
| agentfit-py | Token-aware message truncation; fit a chat history into a context budget. |
| agentguard-firewall | Network-egress firewall for agent tools. |
| agentsnap-py | Snapshot tests for tool-call traces. |
| agentvet-py | Validate tool args before execution; LLM-friendly retry hints. |
| agentcast-py | Structured-output enforcer; validate, retry with feedback. |
Prompt + output safety (3)
| Package | What it does |
|---|---|
| pii-sentry-py | Detect and redact PII and secret-like values before AI processing. |
| prompt-injection-shield-py | Prompt-injection risk scanner for untrusted AI context. |
| llm-output-sanitizer-py | Sanitize LLM outputs before HTML / SQL / shell / markdown sinks. |
RAG + retrieval (3)
| Package | What it does |
|---|---|
| rag-quality-kit | Heuristic quality metrics for RAG retrieval and grounded answers. |
| vector-poison-score | Score retrieved documents for vector / RAG poisoning signals. |
| embedding-dedupe | Deduplicate near-identical embedding records by cosine similarity. |
Cost, caching, evals (3)
| Package | What it does |
|---|---|
| llm-cost-guard-py | Estimate AI request cost and enforce per-request or session budgets. |
| semantic-cache-key | Stable semantic cache keys for AI prompts, tools, models, retrieval. |
| eval-flake-detector | Detect flaky LLM eval cases across repeated runs. |
Verification + grounding (3)
| Package | What it does |
|---|---|
| citation-integrity-check | Verify answer citations refer to supplied source ids. |
| hallucination-risk-meter | Estimate hallucination risk from answer + context + citations. |
| system-prompt-leak-scan | Detect system-prompt leakage in model outputs. |
Agent infrastructure + meta (6)
| Package | What it does |
|---|---|
| mk-agentkit | Meta-package re-exporting all 5 agent-stack ports under one import. |
| agent-loop-breaker-py | Detect repeated agent steps and stop runaway loops. |
| agent-regression-lens-py | Detect regressions between baseline and current agent runs. |
| agent-trajectory-replay-py | Replay and diff agent event trajectories. |
| tool-call-contracts-py | Validate LLM tool-call payloads with small JSON-like contracts. |
| tool-permission-gate-py | Policy-check agent tool calls before execution. |
Tools / safety / privacy (4)
| Package | What it does |
|---|---|
| tool-result-taint-py | Track untrusted tool output before it enters prompts. |
| jailbreak-corpus-mini-py | Local jailbreak + prompt-injection fixture set for tests. |
| consent-redaction-log-py | Record consent-aware redactions for privacy review trails. |
| kavach-py | Threat-scoring library for AI-app security monitoring. |
RAG (3)
| Package | What it does |
|---|---|
| rag-staleness-auditor-py | Find stale RAG chunks by age, version, and freshness requirements. |
| retrieval-acl-filter-py | Enforce document ACLs after retrieval and before prompting. |
| context-drift-detector-py | Detect topic drift between intent, context, and answer. |
Context engineering (5)
| Package | What it does |
|---|---|
| context-forge-py | Context engineering toolkit: ranking, packing, risk-scanning. |
| context-window-packer-py | Pack context chunks into a budget by relevance and priority. |
| prompt-token-trim-py | Trim prompt messages to fit a token budget while preserving priority. |
| prompt-version-diff-py | Diff prompt templates and flag risky instruction changes. |
| llm-response-schema-lite-py | Tiny schema validator for structured LLM responses. |
Evals + cost + routing (5)
| Package | What it does |
|---|---|
| eval-dataset-smith-py | Generate balanced eval cases from bugs, docs, examples, policies. |
| llm-trace-sampler-py | Sample LLM traces by risk, errors, latency, and deterministic ids. |
| llm-cost-guard-py | Estimate AI request cost and enforce per-request or session budgets. |
| model-fallback-planner-py | Plan model fallback chains from capability, cost, and health data. |
| model-router-policy-py | Policy-based model routing by capability, cost, latency, privacy. |
Niche linters (4)
| Package | What it does |
|---|---|
| mcpcheck-py | Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, Zed. |
| skillint-py | Lint Claude Code SKILL.md files. |
| designlint-py | HTML/CSS accessibility and design linter. |
| ai-supply-chain-manifest-py | Build and validate lightweight AI model / data / tool manifests. |
GitHub Marketplace (7 Actions) — composite GitHub Actions, discoverable on the Marketplace:
Linters:
Agent-stack CI gates:
- agentvet-action — fail PRs on bad LLM tool definitions
- agentsnap-action — fail PRs on tool-call trace drift
- mcp-stack-validate-action — one CI gate that runs all 5 agent-stack tools
Homebrew tap — mukundakatta/tools:
brew tap mukundakatta/tools
brew install claude-skill-check mcp-config-check claude-hooks-check claude-commands-check

Each ships a CLI, a programmatic API, and (for the linters) a composite GitHub Action you can drop into any workflow in 3 lines.
🤗 HuggingFace — mukunda1729 — 14 Spaces · 13 Datasets:
🚀 Live Gradio playgrounds (6):
| Space | What you can try |
|---|---|
| agent-stack-demo | All 5 libs (fit, guard, snap, vet, cast) in one app. |
| token-counter | Count tokens for any text across Claude / GPT / Llama tokenizers. |
| json-extractor | Pull clean JSON out of messy LLM output (fenced, inline, unfenced). |
| pii-redactor | Find emails, phones, secrets, and IDs — mask, hash, or highlight. |
| prompt-injection-detector | Heuristic scanner for the most common injection families. |
| mcp-config-validator | Sanity-check Claude Desktop / Cursor / Cline / Windsurf / Zed configs. |
📖 Static reference & explainer pages (8):
| Space | What it covers |
|---|---|
| agent-stack-tour | Guided tour of all 5 libraries with install commands and live links. |
| why-this-stack | The thinking behind the stack — what's broken, why these 5 libs. |
| install-cheatsheet | All install commands across pip, npm, and MCP. |
| mcp-quickstart | Add the 5 MCP servers to Claude Desktop / Cursor / Cline / Windsurf / Zed. |
| fit-strategies-explained | Visual explainer: drop-oldest vs drop-middle vs priority. |
| trace-format-reference | Field-by-field reference for the agentsnap trace JSON schema. |
| prompt-injection-taxonomy | 10-category taxonomy with examples + the cheap defense for each. |
| dataset-cards-index | One-page index of all 13 datasets below. |
📊 Datasets (13) — all MIT, all datasets.load_dataset("mukunda1729/<name>") ready:
| Dataset | Rows | Purpose |
|---|---|---|
| jailbreak-corpus-mini | 15 | Curated jailbreak fixtures across 8 categories. |
| prompt-injection-patterns-extended | 30 | Prompt-injection patterns across 10 categories. |
| pii-detection-fixtures | 25 | PII / secret strings labeled with span offsets. |
| tool-arg-validation-cases | 20 | (Tool, schema, args) tuples — valid + invalid. |
| mcp-tool-test-fixtures | 22 | MCP tool-call args across 8 categories. |
| llm-output-extraction-cases | 20 | Messy LLM outputs with expected JSON. |
| hallucination-risk-cases | 20 | Prompt → response pairs rated for hallucination risk. |
| rag-quality-benchmarks-mini | 15 | RAG eval queries with ground-truth answers. |
| agent-trace-samples | 10 | agentsnap-format tool-call traces (good + regressed pairs). |
| agent-budget-violations | 15 | Agent runs with budget caps + actual usage + root cause. |
| token-counting-edge-cases | 20 | Strings with token counts across 3 tokenizer families. |
| model-pricing-table | 20 | LLM pricing — input/output cost per 1k tokens, context window. |
| mcp-config-examples | 15 | MCP client configs across Claude Desktop, Cursor, Cline, Windsurf, Zed. |
**Karna — AI Agent Platform.** Self-hosted AI assistant with 7 messaging channels (Telegram, Slack, Discord, WhatsApp, SMS, iMessage, Web), extensible plugin SDK, semantic memory, and voice. TypeScript monorepo with Next.js dashboard and React Native mobile app. Stack · TypeScript • Node.js • Next.js • Supabase • WebSocket • pgvector

**Chetana — AI Consciousness Research Platform.** Research-driven platform exploring machine consciousness through 14 indicators grounded in 6 scientific theories. Built to turn abstract AI-consciousness questions into structured experiments, scoring, and analysis. Stack · AI Research • Evaluation • Experimentation • Python

**AgentRAG — Modular RAG Pipeline.** Provider-agnostic RAG framework with pluggable vector stores, chunking strategies, and retrieval methods. Designed for agentic workflows with clean API boundaries. Stack · RAG • Vector Search • Embeddings • TypeScript

**Astra Agent — AI Agent Runtime.** Standalone AI agent runtime with tool execution, context management, and multi-model routing. Foundation for building autonomous AI assistants with structured tool use. Stack · TypeScript • LLM Orchestration • Tool Use • Agents
More Projects
| Project | Description |
|---|---|
| Sadhak | AI-powered job search command center — automated evaluation, resume tailoring, application tracking |
| Chetana | AI consciousness research platform — 14 indicators from 6 scientific theories |
| Prithvi | Container security scanner — vulnerability detection, compliance checks, Docker audits |
| Amogha Cafe | Full-stack Firebase restaurant platform — real-time ordering, QR dine-in. Live |
| RNHT | Temple community platform — events, donations, priest scheduling |
| Patchly | AI code review bot — flags bugs, suggests fixes, explains why, like a senior engineer |
| Evalharness | Prompt, agent, and RAG test harness — red teaming, regression testing, CI/CD for AI |
| AgentMem | Pluggable memory management for AI agents |
| LLM Bench CLI | CLI for benchmarking local LLMs — speed, throughput, quality |
| TokenWise | Token usage optimization across providers |
Production AI / ML Impact
- COST EFFICIENCY: 78% infrastructure cost reduction (SageMaker → Bedrock migration)
- LATENCY: 600x retrieval latency improvement (ML prediction system)
- RAG SCALE: 30K+ knowledge base entries (9-stage agentic RAG pipeline)
- QUALITY: 370+ unit tests & evaluations (production ML systems)
Open Source Footprint
- UPSTREAM: 97 merged PRs in external public repos
- PACKAGES: 144 total: 52 npm (incl. 6 MCP servers, agentkit) + 52 PyPI + 6 in the official MCP Registry + 7 GitHub Marketplace Actions + 14 HF Spaces + 13 HF Datasets
- ORIGINAL WORK: 160 original public repos maintained on GitHub
- ECOSYSTEMS: 6+ major org ecosystems (OpenAI, Anthropic, Google, Microsoft, Stanford, Princeton)
ML Systems Fault prediction, embedding pipelines, model evaluation, cost-optimized inference
Agentic AI RAG pipelines, LangGraph workflows, query routing, hallucination detection
Cloud Infrastructure AWS (Bedrock, SageMaker, ECS, OpenSearch), GCP, Azure, Kubernetes, Terraform
Full-Stack React/TypeScript + Java/Python backend APIs, CI/CD, zero-downtime deployments
| Role | Company | Era | Primary arena |
|---|---|---|---|
| AI/ML Engineer | Southwest Airlines | Aug 2025 — Present | production ML, agentic RAG, Bedrock migration |
| AI/ML Engineer | GPS IT Solutions | Jun 2024 — Aug 2025 | RAG platforms, model-risk governance, vector search |
| Software Development Engineer | Amazon Web Services | Aug 2022 — May 2024 | enterprise cloud systems, React/Java/Python, CI/CD |
| Data Engineer | GPS IT Solutions | Jan 2022 — Aug 2022 | data pipelines, AWS Glue, PySpark, analytics workflows |
| Software Engineer | American Express | Feb 2017 — Dec 2020 | Python backend services, REST APIs, enterprise platforms |
Highlights
Southwest Airlines — AI/ML Engineer
- Architected ML fault prediction system for aircraft maintenance — 5 prediction types, 10K+ records, sub-second retrieval
- Led SageMaker → Bedrock migration: 78% cost reduction ($1,740→$371/mo), 600x latency improvement
- Designed 9-stage agentic RAG pipeline (LangGraph, Bedrock Nova Pro/Micro, FAISS + BM25) over 30K+ KB entries
GPS IT Solutions — AI/ML Engineer
- Built GPT-4 + RAG content generation platform with compliance validation, reducing production time by 40%
- Designed AI model risk governance framework with 23 automated evaluation tests achieving regulatory compliance
- Architected FastAPI microservices with FAISS/Pinecone vector search on Kubernetes
Amazon Web Services (AWS) — Software Development Engineer
- Built and shipped features for AWS Application Manager (Systems Manager) serving enterprise customers globally
- Owned full-stack delivery: React/TypeScript frontend + Java/Python backend APIs with operational excellence
- Designed CI/CD and IaC patterns enabling zero-downtime deployments at enterprise scale
GPS IT Solutions — Data Engineer
- Led end-to-end migration of data pipelines from on-prem to AWS (Glue, PySpark)
American Express — Software Engineer
- Developed Python backend services and RESTful APIs for enterprise platforms handling high-volume transactions at scale
If you follow my work here, you’ll mostly see:
- open-source contributions to AI SDKs and agent tooling
- MCP, eval, and developer-experience improvements
- practical full-stack and infrastructure-heavy AI projects
- systems thinking around memory, retrieval, orchestration, and production reliability
University of Central Missouri — M.S. in Big Data Analytics and Information Technology (Jan 2021 — May 2022)
SRM University — B.Tech in Mechanical Engineering (2012 — 2016)
Open to opportunities — Senior AI/ML Engineer • GenAI Platform Engineer • Software Engineer
mukunda-ai.vercel.app • Las Vegas, NV




