SRE Intelligence Platform

Know Before You Deploy.
Recover Before It Hurts.

Strake tells you whether it's safe to push right now — based on system health, error budget burn, open incidents, and deploy velocity. When things break anyway, your runbooks are connected to alerts, your team knows what to do, and the knowledge doesn't live only in whoever has been here longest.

Teams are running more infrastructure with fewer SREs than at any point in the last decade. The tools haven't kept up.

Get Started Free · See How It Works

67→23 min MTTR with structured runbooks

43% of incidents preceded by a recent deploy

person who knows at 3am

strake / deploy-gate

Current conditions:

feat/checkout-v2 → main

payment-service · 47 files changed · prod-us-east-1

Strake Recommendation

Hold Deploy

Error budget at 18% remaining with SLO window closing in 72 hours. 2 active incidents on service dependencies. 3 deploys in the last 2 hours. Risk of cascading failure is elevated.

Error Budget: 18%

System Health: Degraded

Open Incidents: 2 active

Deploy Velocity: 3 / 2hr

Active Runbooks: 2 triggered

RB-14 · postgres-high-connection-count · Live

RB-07 · payment-service-5xx-spike · Monitoring

Updated just now

The Problem

Your team shouldn't need a dedicated SRE
to run production reliably.

But without the right infrastructure, every incident becomes a fire drill.

// 01

Only one person knows what to do at 3am

When the engineer who "just knows" is on vacation, incidents that should take 20 minutes take 3 hours. Knowledge concentrated in one person isn't expertise — it's a single point of failure.

// 02

Runbooks that nobody uses

Your runbooks are in Confluence. Your engineers are in the terminal. Nobody opens Confluence at 3am. The best runbook is the one that shows up automatically when an incident opens — not the one you have to search for.

// 03

43% of incidents start with a deploy

Your team spends the first 30 minutes of every incident figuring out if a recent deploy caused it. That context should be automatic. It isn't — unless you build the connection between your deploy history and your incident workflow.

Deploy Gate

What Strake reads
before you push.

Every deploy decision is based on five live signals pulled from your existing stack. No new agents. No new dashboards. Strake reads what's already there and tells you what it means right now.

Signal 01 — SLO Budget

How much error budget is left in the current window, and how fast it's burning. Strake flags when a deploy risks exhausting the rest of it before the window closes.

Signal 02 — Active Incidents

Open incidents across the service and its direct dependencies. Deploying into an active incident almost always makes things worse and makes the root cause harder to find.

Signal 03 — Change Velocity

How many deploys have gone out in the last few hours. High velocity makes root cause isolation nearly impossible when something breaks.

Signal 04 — System Health

Current health of the target service and its dependency graph — latency, error rates, resource saturation. The baseline you're deploying into matters.

Signal 05 — Dependency Changes

Diffs lockfiles between builds. Flags new packages, major version bumps, and suspicious publish timing.
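To make the five signals concrete, here is a minimal sketch of how they could roll up into one verdict. It is an illustration under stated assumptions, not Strake's actual logic: the `Signals` fields, the `evaluate_gate` function, the burn rate, and every threshold are hypothetical, chosen so that the payment-service example above comes out as a hold.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical snapshot of the five deploy-gate signals for one service."""
    budget_remaining: float      # fraction of error budget left, 0.0 to 1.0
    burn_per_hour: float         # fraction of total budget burned per hour
    hours_left_in_window: float  # time until the SLO window closes
    open_incidents: int          # on the service and its direct dependencies
    deploys_last_2h: int         # change velocity
    health: str                  # "nominal", "elevated", "degraded", "critical"
    critical_dep_changes: int    # lockfile changes flagged critical


def budget_exhausts_before_window_closes(s: Signals) -> bool:
    """Project the current burn rate forward to the end of the window."""
    projected_burn = s.burn_per_hour * s.hours_left_in_window
    return projected_burn >= s.budget_remaining


def evaluate_gate(s: Signals) -> tuple[str, list[str]]:
    """Return ("HOLD" or "CLEAR", reasons). All thresholds are assumed."""
    reasons = []
    if budget_exhausts_before_window_closes(s):
        reasons.append("error budget projected to run out before window closes")
    if s.open_incidents > 0:
        reasons.append(f"{s.open_incidents} open incident(s)")
    if s.deploys_last_2h >= 3:
        reasons.append(f"{s.deploys_last_2h} deploys in the last 2 hours")
    if s.health in ("degraded", "critical"):
        reasons.append(f"system health is {s.health}")
    if s.critical_dep_changes > 0:
        reasons.append(f"{s.critical_dep_changes} critical dependency change(s)")
    return ("HOLD" if reasons else "CLEAR", reasons)


# Values mirror the payment-service panel; burn_per_hour is an assumed figure.
payment_service = Signals(
    budget_remaining=0.18, burn_per_hour=0.004, hours_left_in_window=72,
    open_incidents=2, deploys_last_2h=3, health="degraded",
    critical_dep_changes=1,
)
verdict, reasons = evaluate_gate(payment_service)
print(verdict)
for reason in reasons:
    print(" -", reason)
```

Returning the reasons alongside the verdict matters more than the verdict itself: a hold that cannot explain itself gets overridden.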

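strake / deploy-gate · all services · updated 12s ago

Service · Budget · Incidents · Velocity · Health · Deps · Gate

payment-service · Budget 18% · Velocity 3/2hr · Health Degraded · Deps 1 ⚠ · Gate HOLD
api-gateway · Budget 84% · Velocity 1/2hr · Health Nominal · Deps —
checkout-svc · Budget 41% · Velocity 2/2hr · Health Elevated · Deps 2 ⚠ · Gate HOLD
user-service · Budget 92% · Velocity 0/2hr · Health Nominal · Deps —
notification-svc · Velocity 5/2hr · Health Critical · Deps 3 ⚠ · Gate HOLD
search-indexer · Budget 77% · Velocity 1/2hr · Health Nominal · Deps —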

Services monitored

Currently blocked

Clear to deploy

Supply Chain Defense

Your CI pulls
what it's told.
Strake asks
what changed.

Supply chain attacks exploit the gap between a malicious publish and your next CI build. That window is measured in minutes — not days. CVE databases lag behind by design: a vulnerability has to be discovered, reported, catalogued, and published before your scanner knows it exists.

Strake diffs your lockfiles between the last known-good deploy and the current one. Every new package, every version bump, every maintainer change — flagged before it reaches production. No CVE required.
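The lockfile diff described above can be sketched in a few lines. This is a simplified illustration, not Strake's implementation: it assumes npm's `package-lock.json` in the v2/v3 layout (a top-level `packages` map keyed by `node_modules/...` paths), and the function names and the `example-new-pkg` entry are hypothetical. Maintainer changes are left out because they are not recorded in a lockfile and would need registry metadata.

```python
import json


def load_versions(lockfile_text: str) -> dict[str, str]:
    """Map package name -> version from an npm lockfile (v2/v3 layout).

    Simplification: if a package is installed at several nested paths,
    only one of its versions is kept.
    """
    packages = json.loads(lockfile_text).get("packages", {})
    versions = {}
    for path, meta in packages.items():
        if not path:  # the "" key is the root project, not a dependency
            continue
        name = path.split("node_modules/")[-1]
        versions[name] = meta.get("version", "")
    return versions


def diff_lockfiles(known_good: str, candidate: str) -> list[dict]:
    """List packages that are new or changed version since the known-good lockfile."""
    before = load_versions(known_good)
    after = load_versions(candidate)
    changes = []
    for name in sorted(after):
        old, new = before.get(name), after[name]
        if old is None:
            kind = "added"
        elif old == new:
            continue
        elif old.split(".")[0] != new.split(".")[0]:
            kind = "major"
        else:
            kind = "changed"
        changes.append({"package": name, "kind": kind, "from": old, "to": new})
    return changes


# Versions mirror the dependency panel on this page; example-new-pkg is made up.
known_good = json.dumps({"packages": {
    "": {"name": "payment-service"},
    "node_modules/axios": {"version": "2.1.0"},
    "node_modules/lodash": {"version": "4.17.21"},
}})
candidate = json.dumps({"packages": {
    "": {"name": "payment-service"},
    "node_modules/axios": {"version": "2.1.1"},
    "node_modules/lodash": {"version": "5.0.0"},
    "node_modules/example-new-pkg": {"version": "1.0.0"},
}})
changes = diff_lockfiles(known_good, candidate)
for change in changes:
    print(change)
```

Nothing here consults a vulnerability database. The only input is what changed between two lockfiles, which is why this kind of check can run before a CVE exists.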

// The attack window scanners miss

The axios npm compromise was live for 3 hours before it was pulled. The LiteLLM attack affected 500,000 machines in 40 minutes. Traditional scanners check CVE databases — Strake checks what actually changed.

DEPENDENCY ANALYSIS · live

package-lock.json · 3a7f2c1 → 8b4e9d2

3 changes detected

CRITICAL

axios 2.1.0 → 2.1.1

Published 47 minutes ago

Maintainer changed since last release

WARNING

lodash 4.17.21 → 5.0.0

Major version bump

No code changes in this PR reference this package

react-query 5.62.0 → 5.62.1

Patch release · well-established package

Gate Verdict

HOLD · 1 critical · 1 warning · 1 clean

Resolve critical before deploy

// 5.1

Lockfile Drift

Detects changes in package-lock.json, yarn.lock, go.sum, and requirements.txt between your last known-good deploy and this one. Every change is accounted for.

// 5.2

Phase 2

Suspicious Timing

Flags dependencies published within hours of your build. The attack window most scanners miss entirely — before a CVE exists, before anyone has noticed.

// 5.3

Zero CVE Required

Doesn't wait for a vulnerability to be reported. Catches supply chain compromises at deploy time — not days later when the CVE database catches up.
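The suspicious-timing check in 5.2 comes down to comparing two timestamps. The sketch below assumes the publish time is already available from your package registry's metadata; the function name, the 24-hour window, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_suspiciously_fresh(published_at: datetime,
                          build_time: datetime,
                          window: timedelta = timedelta(hours=24)) -> bool:
    """True when a dependency version was published shortly before the build.

    The 24-hour default is an assumed threshold, not a documented one.
    """
    age = build_time - published_at
    return timedelta(0) <= age <= window


# Hypothetical build time; 47 minutes matches the axios entry in the panel above.
build_time = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
fresh_publish = build_time - timedelta(minutes=47)
old_publish = build_time - timedelta(days=30)

print(is_suspiciously_fresh(fresh_publish, build_time))
print(is_suspiciously_fresh(old_publish, build_time))
```

A fresh publish is not proof of compromise. It is a reason to look before the build ships, which is the point of flagging it at deploy time.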

RB-14 · postgres-high-connection-count

Triggered 18 min ago · LIVE

Alert: postgres.conn > 90%

Service: postgres-primary

Connections: 100 / 100

On-call: @rnewton

Steps: 4 / 6 complete

Verify alert is not spurious

✓ Confirmed — connections at 100/100, p99 latency 1840ms

02:14:18

Check for long-running queries

✓ Found 14 queries > 30s — all from payment-service v2.1.7

02:14:31

Verify pgBouncer pool configuration

✓ pool_size: 100 · max_client_conn: 100 · pool_mode: transaction

02:14:47

Correlate with recent deploys

✓ payment-service v2.1.7 deployed 02:13:44 — N+1 query pattern introduced

02:15:02

Resize pool or initiate rollback

Decision required — pool resize (faster) vs rollback v2.1.7 (safer). See notes for tradeoffs.

In progress

Verify recovery and close incident

Monitor error rate for 10 min · update incident record · add postmortem note

—

This runbook has run 7 times · last updated 2 days ago by @rnewton

View incident history →

Runbook Engine

The Notion page
from 2021 is not
a runbook.

Strake connects runbooks directly to the alerts that trigger them. When PagerDuty fires, the right runbook opens — not a search bar, not a Notion space, not a Slack message asking who knows what to do.

Steps are tracked. What the engineer found, what they decided. Every time the runbook runs, the record gets richer. The next engineer who gets paged starts from that, not from zero.
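As a rough sketch of the idea, alert-to-runbook routing plus a per-run step record could look like the following. The routing table, class names, and `open_runbook_for` function are hypothetical and stand in for whatever your alerting integration actually delivers. The alert name and findings are taken from the RB-14 example on this page.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical routing table: alert name -> runbook ID.
ALERT_TO_RUNBOOK = {
    "postgres.conn > 90%": "RB-14",
}


@dataclass
class StepRecord:
    """What one engineer found at one step of one run."""
    step: str
    finding: str
    timestamp: str


@dataclass
class RunbookRun:
    """A single execution of a runbook, with its accumulated findings."""
    runbook_id: str
    steps: list[StepRecord] = field(default_factory=list)

    def record(self, step: str, finding: str, timestamp: str) -> None:
        self.steps.append(StepRecord(step, finding, timestamp))


def open_runbook_for(alert_name: str) -> Optional[RunbookRun]:
    """Start a run of the runbook mapped to this alert, if one exists."""
    runbook_id = ALERT_TO_RUNBOOK.get(alert_name)
    return RunbookRun(runbook_id) if runbook_id else None


run = open_runbook_for("postgres.conn > 90%")
run.record("Verify alert is not spurious",
           "connections at 100/100, p99 latency 1840ms", "02:14:18")
run.record("Check for long-running queries",
           "14 queries > 30s, all from payment-service v2.1.7", "02:14:31")
print(run.runbook_id, len(run.steps))
```

Storing findings per run, rather than editing the runbook text, is what lets the next engineer see how earlier incidents were actually resolved.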

RB-14 · Incident History

Date     Resolution                   Time
Jan 15   Pool resize · resolved       22min
Dec 28   Rollback v2.0.4 · resolved   41min
Dec 09   Query kill + pool flush      18min
Nov 22   Escalated · manual DBA       94min

// What this means

The Nov 22 incident took 94 minutes. The last three averaged 27 minutes. That's the runbook getting smarter — and it's the clearest signal of what Strake actually does.
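The arithmetic behind that comparison is easy to reproduce from the incident history table. A minimal check, using only the four resolution times shown there (the variable names are illustrative):

```python
from statistics import mean

# Resolution times in minutes from the RB-14 incident history, oldest first:
# Nov 22, Dec 09, Dec 28, Jan 15.
resolution_minutes = [94, 18, 41, 22]

first_run = resolution_minutes[0]
last_three_avg = mean(resolution_minutes[-3:])
reduction = (first_run - last_three_avg) / first_run

print(f"first run: {first_run} min")
print(f"last three average: {last_three_avg} min")
print(f"reduction: {reduction:.0%}")
```

Four data points are a small sample, so treat the trend as an indication rather than a benchmark.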

Built for the stack you already run

Datadog · PagerDuty · GitHub Actions · Prometheus · Grafana · Kubernetes · OpsGenie · Slack

+ CloudWatch · Terraform · Loki · Confluence · Notion · GCP Cloud Run · AWS ECS · and more

The runbook for our database failover lived in a Confluence page that hadn't been opened in 14 months. Found that out at 3am on a Sunday.

Senior SRE · Series B fintech

We had a deploy gate. It was a Slack message: "anyone know if it's okay to push right now?" Someone always said yes. Then we'd find out.

Staff Engineer · Infrastructure team

The tribal knowledge problem is real. Three incidents in the last year that came down to one person not knowing what another person knew.

VP Engineering · 80-person startup

Free during private beta

Get early access.

Strake is in private beta. We're working directly with early teams to shape the product. No credit card. No commitment. Just a conversation about deploy safety.

Get Started Free · Book a Call

Built for the engineer
who gets the page.

Strake is in private beta with a small cohort of senior engineers and SRE leads. If you're on-call and your MTTR isn't where it needs to be, we want to talk.

Free to Use

No credit card required

Get Started Free · Book a Demo

No sales script. No demo theater. A real conversation with the team.
