DEV Community

Samson Tanimawo

Building the first Agentic SRE Platform. 100 AI agents that detect, investigate, and resolve incidents autonomously.

Location: Houston
Personal website: https://novaaiops.com
Pronouns: He/Him/His

The Incident Commander Role: Running Incidents Without Chaos
2 min read
Terraform at Scale: Lessons from Managing 500+ Resources
2 min read
Why Your Microservices Need Circuit Breakers (And How to Add Them)
2 min read
The On-Call Handoff That Prevents Dropped Incidents
2 min read
SLOs That Product Managers Actually Understand
2 min read
MTTR Optimization: The 7 Levers That Actually Move the Needle
3 min read
Service Maps: The Architectural Clarity Your Team Is Missing
2 min read
AI in Incident Response: Hype vs. Reality in 2024
3 min read
Monitoring Costs Are Out of Control — Here's How to Fix It

2 min read
Hiring SREs: What I Look For After Interviewing 100+ Candidates
3 min read
Log Management at Scale: How We Cut Costs 70% Without Losing Signal
2 min read
Canary Deployments: The Pattern That Cut Our Rollback Rate by 80%
1 comment
2 min read
Platform Engineering: Building an Internal Developer Platform That Teams Actually Use
2 min read
Chaos Engineering for Teams That Aren't Netflix
3 min read
Distributed Tracing: The Missing Piece of Your Observability Stack
3 min read
The Golden Signals: A Practical Implementation Guide
2 min read
Kubernetes Observability: What to Monitor and Why
2 min read
On-Call Wellness: Protecting Your Engineers from Burnout
2 min read
Post-Mortem Best Practices That Actually Drive Change
2 min read
Runbook Automation: From 45-Minute Fixes to 90-Second Recoveries
2 min read
Error Budgets in Practice: A No-BS Guide
2 min read
3am Incident Response: What I Learned from 200+ Pages
2 min read
The SRE's Guide to Surviving Tool Sprawl
2 min read
I Reduced Our Alert Volume by 90%. Here's the Playbook
2 min read
Introducing Nova AI Ops: The AI-Native Operating System for SRE Teams
3 min read