# reliable agents as code
Agents, datasets, and evals are code — branch a change, review it in a PR, gate every deploy in CI. The reliability loop your code already has, now for your agents.
Git-native · Open core · OpenTelemetry · MCP
## the platform
Your source of truth is git — agents, datasets, and evals as files. The platform adds what files alone can't: hosted tracing, experiments, prompt management, alerts, annotation queues, and real-time collaboration.

## the problem
Most teams find out about agent failures from users, not dashboards. Here's what that looks like.
Silent regression
−23%
response quality
Prompt shipped Monday. Response quality dropped 23%. No alert fired. A user complained on Friday.
Caught 4 days later
Runaway loop
$47,000
total damage
Two agents got stuck coordinating. Week 1 cost $127. Week 4 cost $18,400. The team mistook the spike for user growth.
Caught on the invoice
Fluent failure
200 OK
HTTP status
Agent called a tool with a wrong parameter. Database returned zero rows. The agent told the user: "I couldn't find any data." Every dashboard stayed green.
Never caught
The fix is the same one that works for code: evals, observability, alerts, and version control.
That's AgentMark.
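A fluent failure like the one above never raises an error, so catching it means inspecting what the tool actually returned. Here is a minimal sketch of that check in Python. The span fields (`kind`, `status`, `row_count`) and the function name are hypothetical, chosen for illustration; they are not AgentMark's trace schema.

```python
def suspicious_tool_calls(spans):
    """Return tool-call spans that reported success but came back empty.

    Each span is a plain dict. The keys used here are assumptions made
    for this sketch, not a real trace schema.
    """
    flagged = []
    for span in spans:
        is_tool_call = span.get("kind") == "tool"
        reported_ok = span.get("status") == "OK"
        returned_nothing = span.get("row_count") == 0
        if is_tool_call and reported_ok and returned_nothing:
            flagged.append(span)
    return flagged


# Illustrative data only: one healthy query and one empty result.
example_spans = [
    {"name": "lookup_orders", "kind": "tool", "status": "OK", "row_count": 12},
    {"name": "lookup_orders", "kind": "tool", "status": "OK", "row_count": 0},
    {"name": "generate_reply", "kind": "llm", "status": "OK"},
]
flagged_names = [s["name"] for s in suspicious_tool_calls(example_spans)]
```

An empty result is not always a bug, so a check like this works better as a signal to review than as a hard failure.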
## the solution
Commit a change, gate it with evals in CI, ship it, then trace, alert, and fix in production. One workflow, and every step lives in your repo.
Catch regressions before deploy with evals in CI, then catch what slips through with traces and alerts. It's one workflow — versioned in git, not scattered across a dashboard.
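To make "gate it with evals in CI" concrete, here is a minimal sketch of a regression gate in Python. It assumes you already have per-example eval scores for the current baseline and for the candidate change. The function names and the 5% default threshold are hypothetical; this is not AgentMark's CLI or API.

```python
from statistics import mean


def relative_drop(baseline_scores, candidate_scores):
    """Fraction by which the candidate's mean score fell below the baseline's."""
    baseline = mean(baseline_scores)
    if baseline <= 0:
        raise ValueError("baseline mean must be positive")
    return (baseline - mean(candidate_scores)) / baseline


def gate_exit_code(baseline_scores, candidate_scores, max_drop=0.05):
    """Return 0 (pass) or 1 (fail) so a CI step can exit with the result."""
    drop = relative_drop(baseline_scores, candidate_scores)
    return 1 if drop > max_drop else 0
```

A CI job could run the evals and then call `sys.exit(gate_exit_code(baseline, candidate))`. With the default threshold, a 23% drop in response quality would fail the build instead of reaching users.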
## in production
Metrics, traces, experiments, and alerts — each tied back to the exact agent, dataset, or eval in git that produced it. One workflow, from commit to production.
Know your cost, latency, and error rate before a user complains — not after.
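As an illustration of a cost alert, the sketch below flags any week whose spend grows faster than a chosen multiple of the previous week's. The function name, the 2x threshold, and the sample figures are all hypothetical; a real alert would read these values from your trace data.

```python
def cost_spike_alerts(weekly_costs, max_growth=2.0):
    """Return (week_number, previous_cost, current_cost) for each spike.

    A week counts as a spike when its cost is more than max_growth times
    the previous week's cost. Week numbers start at 1.
    """
    alerts = []
    for i in range(1, len(weekly_costs)):
        previous, current = weekly_costs[i - 1], weekly_costs[i]
        if previous > 0 and current > previous * max_growth:
            alerts.append((i + 1, previous, current))
    return alerts


# Illustrative figures only, in dollars per week.
sample_costs = [100.0, 110.0, 400.0, 420.0]
sample_alerts = cost_spike_alerts(sample_costs)
```

Comparing week over week rather than against a fixed budget means a runaway loop is flagged in the first week it accelerates, well before the invoice arrives.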
“AgentMark is, by far, the best agent representation layer of this new stack. You're the only people I've seen that take actual developer needs seriously in this regard.”

Dominic Vinyard
Founding AI Designer
San Francisco, CA
## integrations
No proprietary SDKs. Standard OpenTelemetry for traces, git for version control, and direct support for every major model and framework.
## editor-native
Agents, datasets, and evals are just files — your AI assistant reads, writes, and refactors them over MCP. Ask it what failed in prod and it pulls the trace, names the root cause, and points at the line to fix.
Connect AgentMark via MCP and ask Claude Code exactly what went wrong. It pulls the spans, identifies the root cause, and tells you precisely where to fix it.
// any MCP-capable editor
## ownership
Closed platforms own your data and lock you into their SDKs. AgentMark doesn't.
## faq
The questions engineers ask before they commit.
Agents, datasets, and evals are stored in your own git repository (GitHub or GitLab), either in a folder alongside your application code or in a separate repository. Spans are stored in a ClickHouse database in GCP us-central1, encrypted at rest and in transit. For enterprise customers with data residency requirements, we offer a self-hosted data plane that keeps all trace data within your own infrastructure.
Yes — securely. Provider keys you configure for your deployed handler are stored as environment variables in an encrypted vault, scoped to your app, excluded from logs, and only decrypted at build time or when an authorized user explicitly reveals a value in the dashboard.
Yes. Prompts are MDX files and datasets are JSONL files in your repo — edit them in your IDE like any other code file. Run the local dev server to test against real model calls before pushing to production. No cloud connection required for local development.
AgentMark integrates with popular AI SDKs like Vercel AI SDK, Mastra, and Pydantic AI. Our SDKs provide adapters that let you use AgentMark prompts with your existing code. Just install the package and point it at your prompts folder.
Minimal. Prompts are fetched from a globally distributed edge network, and tracing is asynchronous and non-blocking. We're built for production workloads where every millisecond counts.
AgentMark is open core. The core SDK and prompt format are open source (AGPL-3.0) at github.com/agentmark-ai/agentmark. Multi-user access, managed storage, alerting, and hosted observability are available through our cloud platform.
We support TypeScript/JavaScript and Python. Because prompts are stored as MDX files, they're language-agnostic and can be parsed by any runtime.
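Because datasets are plain JSONL files in your repo, any runtime can read them with standard tooling. Here is a minimal Python sketch that uses only the standard library. The field names in the sample (`input`, `expected_output`) are hypothetical and are not a documented AgentMark dataset schema.

```python
import json


def load_jsonl(text):
    """Parse JSONL text into a list of objects, skipping blank lines."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report the offending line so a bad row is easy to find in review.
            raise ValueError(f"line {line_no}: {exc.msg}") from exc
    return rows


# Illustrative dataset contents; in practice you would read a file from the repo.
sample = (
    '{"input": "What is my order status?", "expected_output": "shipped"}\n'
    "\n"
    '{"input": "Cancel my order", "expected_output": "cancelled"}\n'
)
rows = load_jsonl(sample)
```

Since each row is one line, a dataset change shows up as a readable line-level diff in a pull request.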
## get started
Tell us what you're building — we'll get you set up.