The 6 Ways Monitoring Tools Bill You (One Is a Trap)

I have spent more of my life than I’d like comparing monitoring tools on the wrong axis.

You do it too. You open six tabs, you build a little mental spreadsheet — does it do traces, does it do SNMP, does it have an app, is the UI any good — and you pick the one that wins on features. Then six months later you get an invoice that makes you physically recoil, and you realize the feature list was never the thing that mattered. The pricing model was the thing that mattered. You compared the paint jobs and ignored which cars run on jet fuel.

Here’s the uncomfortable truth I’ve backed into after enough renewal conversations: the pricing model decides your monitoring bill far more than the tool does. Two products with near-identical feature sets can differ 30x in what they cost you, purely because of how they count. And almost nobody comparison-shops the counting.

So let’s fix that. There are basically six ways a monitoring tool charges you. Five of them are fine until they aren’t. One of them is a trap.

1. Per host

The classic. You pay per server (or per “host,” which increasingly means per container, which is where the fun starts). Datadog is the poster child, but most of the old-guard APM world works this way.

The pitch is intuitive: more machines, more money, fair enough. The trap is that “host” stopped meaning “server” around 2018. Spin up a Kubernetes cluster that autoscales pods, and your host count breathes in and out all day. Many of these tools bill on the high-water mark: the highest hourly host count they recorded during the billing period, not the typical one. Your Black Friday traffic spike isn’t just a traffic spike anymore; it’s a line item.

The per-host high-water-mark surprise:
─────────────────────────────────────
  Normal day:           40 hosts × $23/host/mo = $920/mo
  Autoscale event:     120 hosts (for ~2 hours)
  What you're billed:  120 hosts × $23/host/mo = $2,760/mo
                       └─ because they keep the peak, not the average
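To see why keeping the peak hurts, here is a minimal sketch that bills the same month of hourly host counts three ways: the plain maximum, the average, and a trimmed peak. Some vendors soften the peak by discarding a top slice of hourly samples before taking the maximum; `trim_fraction` models that idea, and its 1% value is an assumption, not any vendor's quoted policy. The $23 rate comes from the example above; the 720-hour month and the function name `billable_hosts` are hypothetical.

```python
# Hypothetical illustration: three ways to turn hourly host counts into a
# monthly per-host bill. None of these is a specific vendor's exact formula.
import math

RATE_PER_HOST = 23.0  # $/host/month, from the example above

def billable_hosts(hourly_counts, method, trim_fraction=0.01):
    """Return the host count that would be billed under `method`."""
    if method == "max":          # pure high-water mark: keep the single peak hour
        return max(hourly_counts)
    if method == "average":      # what you actually ran, on average
        return sum(hourly_counts) / len(hourly_counts)
    if method == "trimmed_max":  # discard the top trim_fraction of hours, then take the peak
        ordered = sorted(hourly_counts)
        drop = math.floor(len(ordered) * trim_fraction)
        return ordered[len(ordered) - 1 - drop]
    raise ValueError(f"unknown method: {method}")

# A 30-day month (720 hours) at 40 hosts, with one 2-hour burst to 120.
hours = [40] * 718 + [120] * 2

for method in ("max", "average", "trimmed_max"):
    hosts = billable_hosts(hours, method)
    print(f"{method:12s} {hosts:7.2f} hosts -> ${hosts * RATE_PER_HOST:,.2f}/mo")
```

With these inputs the plain maximum bills 120 hosts ($2,760) while the average is about 40.2. Trimming 1% of 720 hours forgives up to seven peak hours, so the 2-hour burst disappears, but a burst lasting eight hours or more is billed at 120 exactly as if nothing had been trimmed.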

And that’s before custom metrics, which on some platforms bill per unique tag combination — the single most effective way I’ve ever seen to turn a well-meaning engineer adding a customer_id label into a four-figure monthly surprise.
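The reason a single label does so much damage is multiplication: if every combination of tag values can show up, the worst-case number of distinct series for one metric is the product of each tag's distinct values. A minimal sketch, with invented tag names and counts; real series counts are usually lower because not every combination occurs, and how each vendor prices a series varies.

```python
# Hypothetical example: worst-case distinct series for one metric when every
# combination of tag values can occur. Tag names and counts are invented.
from math import prod

def worst_case_series(tag_cardinalities):
    """Upper bound on unique tag combinations for a single metric."""
    return prod(tag_cardinalities.values())

before = {"endpoint": 20, "status_code": 5, "region": 3}
after = dict(before, customer_id=2_000)  # one well-meaning new label

print(worst_case_series(before))  # 300
print(worst_case_series(after))   # 600000
```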

2. Per GB ingested

The logs world’s favorite, and Splunk’s original sin. You pay for every gigabyte you send in. Sounds reasonable — you use more, you pay more.

The trap is that you don’t control your own log volume. A developer ships a debug statement in a hot loop. A dependency starts logging stack traces. A retry storm kicks off at 2am. None of these are your decision, and all of them land on your bill at full freight. I have watched a single misconfigured DEBUG flag turn a predictable logging budget into a budget meeting.
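Here is the arithmetic behind that budget meeting, as a rough sketch. Every constant (the per-GB price, the line size, the fleet-wide line rate) is an assumption picked for illustration, not a vendor quote; swap in your own numbers.

```python
# Hypothetical arithmetic: one DEBUG line in a hot loop, priced per GB ingested.
# Every constant is an assumption for illustration; real prices and volumes vary.
PRICE_PER_GB = 0.50       # assumed $/GB ingested
BYTES_PER_LINE = 200      # assumed average log line size
LINES_PER_SECOND = 5_000  # assumed fleet-wide rate of the new debug line
SECONDS_PER_DAY = 86_400

def daily_gb(lines_per_second, bytes_per_line):
    """Decimal gigabytes per day produced by a steady log rate."""
    return lines_per_second * bytes_per_line * SECONDS_PER_DAY / 1e9

gb = daily_gb(LINES_PER_SECOND, BYTES_PER_LINE)
print(f"{gb:.1f} GB/day -> ${gb * PRICE_PER_GB:,.2f}/day, "
      f"${gb * PRICE_PER_GB * 30:,.2f} per 30 days")
# 86.4 GB/day -> $43.20/day, $1,296.00 per 30 days
```

A steady 1 MB/s of noise is easy to miss on a dashboard and hard to miss on an invoice.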

The vendors know this, which is why half of them now sell you a second product — a pipeline tool — whose entire job is to throw away the data you’re paying the first product to accept. You are, in effect, paying twice: once to generate the data, once to not store it.
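Stripped of the product around it, that pipeline is a filter sitting in front of the ingest endpoint. A minimal sketch of the idea, not any vendor's pipeline: drop DEBUG outright, always keep warnings and errors, and ship a random fraction of INFO. The record shape (`{"level": ..., "msg": ...}`), the level names, and the 10% rate are assumptions.

```python
# Hypothetical pre-ingest filter in the spirit of log pipeline tools.
# The record shape and the sample rate are assumptions, not a real product's API.
import random

KEEP_ALWAYS = {"WARN", "WARNING", "ERROR", "FATAL", "CRITICAL"}
INFO_SAMPLE_RATE = 0.10  # ship roughly 1 in 10 INFO lines

def should_ship(record, rng=random.random):
    """Return True if this log record should be sent to the per-GB backend."""
    level = str(record.get("level", "INFO")).upper()
    if level == "DEBUG":
        return False                 # never pay to ingest debug noise
    if level in KEEP_ALWAYS:
        return True                  # never sample away the lines you alert on
    return rng() < INFO_SAMPLE_RATE  # INFO and unknown levels are sampled

# A debug flood, normal traffic, and one real error.
records = [{"level": "DEBUG", "msg": f"loop {i}"} for i in range(10_000)]
records += [{"level": "INFO", "msg": f"request {i}"} for i in range(1_000)]
records += [{"level": "ERROR", "msg": "db timeout"}]

rng = random.Random(42).random  # seeded only so the run is repeatable
shipped = [r for r in records if should_ship(r, rng)]
print(f"{len(records)} records emitted, {len(shipped)} shipped")
```

Dropping by level is the cheap, blunt version; the trade-off is that whatever you filter out is gone when you need it during an incident.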

3. Per event / per span

The newer, cleverer-sounding version: pay per log event, per trace span, per “thing that happened.” Sentry bills spans this way; Last9 bills events. It feels more granular and fair than per-GB.

It is also, structurally, the same trap with better branding. You still don’t control how many events a chatty service emits. A noisy microservice having a bad day is an event spike, and an event spike is a cost spike. The unit got smaller; the lack of control over the unit did not.
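The lever you do control is how much of that output you forward. A minimal sketch of deterministic head sampling keyed on trace ID, similar in spirit to OpenTelemetry's `TraceIdRatioBased` sampler but not its exact algorithm: hashing the trace ID means every span of a kept trace is kept together. The span tuples, the 5% ratio, and the function names are hypothetical.

```python
# Hypothetical sketch: ratio-based head sampling keyed on trace ID, so all
# spans of a kept trace are forwarded and all spans of a dropped trace are not.
import hashlib

def keep_trace(trace_id: str, ratio: float) -> bool:
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # uniform in [0, 1)
    return bucket < ratio

def billable_spans(spans, ratio):
    """spans: list of (trace_id, span_name) tuples; returns the ones forwarded."""
    return [s for s in spans if keep_trace(s[0], ratio)]

# A noisy service emitting 10,000 traces of 20 spans each.
spans = [(f"trace-{t}", f"span-{s}") for t in range(10_000) for s in range(20)]
kept = billable_spans(spans, ratio=0.05)
print(len(spans), "spans emitted,", len(kept), "spans billed")
```

Sampling caps the bill proportionally, not absolutely: 5% of a tenfold event spike is still ten times your usual 5%.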

4. Per node

The cost-control crowd’s answer: groundcover, CubeAPM, and a wave of eBPF-based newcomers. You pay per node (a real machine), and you can shove as much telemetry through it as you want without the meter spinning on data volume.

This one I actually like, with one asterisk. Several of these run bring-your-own-cloud: the data stays in your VPC, on your storage and compute. The headline “no data-volume bill” is real — but some of that cost didn’t vanish, it moved into your AWS account where it’s harder to see. It’s still usually cheaper and far more predictable. Just know that “we don’t charge for data” and “data is free” are not the same sentence.
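A rough sketch of where the money goes under bring-your-own-cloud: the vendor invoice covers nodes, while storage and compute for the telemetry land on your own cloud bill. The ~$30/node figure echoes the comparison table; the storage price, data volume, and compute spend are assumptions, and `byoc_monthly_cost` is a hypothetical helper.

```python
# Hypothetical BYOC cost split: the vendor invoice shrinks, your cloud bill does not vanish.
# Prices and volumes are illustrative assumptions, not quotes.
def byoc_monthly_cost(nodes, vendor_per_node, gb_stored, storage_per_gb, compute_monthly):
    """Return (vendor invoice, your own cloud bill) for one month."""
    vendor_invoice = nodes * vendor_per_node
    own_cloud_bill = gb_stored * storage_per_gb + compute_monthly  # billed to YOUR account
    return vendor_invoice, own_cloud_bill

vendor, cloud = byoc_monthly_cost(
    nodes=40,
    vendor_per_node=30.0,   # roughly the per-node figure in the comparison table
    gb_stored=5_000,
    storage_per_gb=0.023,   # assumed object-storage price per GB-month
    compute_monthly=400.0,  # assumed backend compute for the telemetry store
)
print(f"vendor ${vendor:,.2f} + your cloud ${cloud:,.2f} = ${vendor + cloud:,.2f}/mo")
```

The total can still beat a per-GB bill; the point is to add both lines before comparing.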

5. Flat / unlimited

The honest outlier. Gravwell prices per indexer and lets you ingest unlimited data through each one. A few others do flat tiers. The pitch is “we will not punish you for monitoring more,” and it’s refreshing precisely because it’s so rare.

The asterisk here is smaller: flat-rate vendors tend to be smaller, more niche, with thinner ecosystems, and the “unlimited” usually has a throughput ceiling somewhere in the fine print. But as a philosophy, “predictable” beats “fair-sounding-but-variable” every single time you have to defend a budget.

6. Free (you pay in time)

Prometheus, Zabbix, the whole open-source wing. The license costs nothing. This is the model everyone reaches for after their first bill-shock, and it is genuinely the right answer for a lot of people.

But “free” relocates the cost; it doesn’t delete it. Prometheus isn’t a product, it’s a starter kit — by the time you’ve added Alertmanager, Grafana, and Thanos or Mimir for the long-term storage it doesn’t ship with, you’re operating a small distributed system, and the person operating it is not free. The trade is real and often worth it. Just don’t pretend the column reads $0; it reads “one of your engineers, partially.”
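To put a number in that column, here is a minimal sketch of self-hosting cost as infrastructure plus a loaded slice of an engineer. The infra spend, salary, fraction of an engineer, and the 1.3 overhead multiplier are all assumptions; plug in your own.

```python
# Hypothetical self-hosting cost: the license line is $0, the person line is not.
# Every number here is an assumption; plug in your own.
def self_hosted_monthly_cost(infra_monthly, salary_yearly, engineer_fraction,
                             overhead=1.3):
    """Infra spend plus the loaded cost of the share of an engineer who runs it.

    overhead is an assumed multiplier for benefits, equipment, and the like.
    """
    people_monthly = salary_yearly * overhead * engineer_fraction / 12
    return infra_monthly + people_monthly

# Prometheus + Alertmanager + Grafana + Mimir on ~$600/mo of cloud,
# run by a quarter of a $150k/yr engineer.
cost = self_hosted_monthly_cost(600.0, 150_000, 0.25)
print(f"${cost:,.2f}/mo")  # $4,662.50/mo
```

In this made-up case the person is about 87% of the cost, which is exactly the line that never shows up on an invoice.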

So which one quietly bankrupts you?

Look back at the six and a pattern jumps out. Three of them — per host (high-water), per GB, per event — share a single fatal property: the unit you’re billed on is something you don’t fully control. Your load varies. Your log volume varies. Your span count varies. A variable-cost model wired to a variable you can’t govern is a bill that does whatever it wants, usually at the worst possible time.

The other three — per node, flat, free — are predictable. The bill is tied to something you actually decide: how many machines to run, what tier to buy, how much engineer time to spend. You can defend those numbers in a budget meeting because you chose them.

That’s the trap, and it’s not really a specific tool. It’s any usage-based model meeting real-world, spiky, out-of-your-hands load. The tool can be excellent. The model is what bites.
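One way to see the split is to run the same year of load through all six meters and compare the worst month to a normal one. This is a sketch with invented volumes and rates; only the peak-to-typical ratio matters. It treats node count, tier, and the engineer slice as fixed, which is the point of those models but flatters self-hosting a little.

```python
# Hypothetical comparison of six billing meters over one year of spiky load.
# All volumes and rates are invented; only the peak-to-typical ratio matters.
from statistics import median

LOAD = [1.0] * 10 + [3.0, 1.0]  # relative monthly load; November triples (assumed)

def monthly_bills(load):
    """Bill per month under each model for the same relative load curve."""
    return {
        "per host (peak)": [40 * l * 23.0 for l in load],     # host count tracks load
        "per GB":          [3_000 * l * 0.50 for l in load],  # GB ingested tracks load
        "per event":       [800 * l * 1.50 for l in load],    # millions of events x $/million
        "per node":        [40 * 30.0 for _ in load],         # node count you chose
        "flat tier":       [1_500.0 for _ in load],           # tier you bought
        "self-hosted":     [4_500.0 for _ in load],           # infra plus a slice of an engineer
    }

def peak_to_typical(bills):
    """Worst month divided by the median month: 1.0 means fully predictable."""
    return max(bills) / median(bills)

ratios = {model: peak_to_typical(b) for model, b in monthly_bills(LOAD).items()}
for model, ratio in ratios.items():
    print(f"{model:16s} worst month = {ratio:.1f}x a normal month")
```

The three usage meters each come out at 3.0x, matching the load spike; the other three stay at 1.0x regardless of what traffic does.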

What I’d actually do

Match the model to your reality, not the feature checklist:

Spiky, autoscaling, or you can’t predict load? Run screaming from high-water-mark and per-event pricing, or wrap it in a hard spend cap on day one. The cap is not optional.
Cost-sensitive and growing? Per-node or flat. Predictable beats cheap-on-paper.
Have the engineering time and want zero license cost? Open source — but staff it honestly, and budget the person, not just the server.
Whatever you pick: turn on billing alerts before you turn on the tool. The surprise invoice is always avoidable in hindsight and never avoided in advance.
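A billing alert can be as simple as projecting month-end spend from month-to-date spend and flagging thresholds the projection crosses. This is a minimal sketch; where the spend figure comes from (a vendor usage page or your cloud billing export) is left out, and the thresholds and `crossed_thresholds` helper are hypothetical. A linear projection lags on a spike late in the month, which is why it pairs with a hard cap rather than replacing one.

```python
# Hypothetical budget guard: project month-end spend from month-to-date spend
# and report which alert thresholds (fractions of budget) the projection crosses.
import calendar
from datetime import date

def projected_month_end(spend_to_date, today):
    """Naive linear projection: average daily spend so far x days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month

def crossed_thresholds(spend_to_date, today, budget, thresholds=(0.5, 0.8, 1.0)):
    projection = projected_month_end(spend_to_date, today)
    return [t for t in thresholds if projection >= t * budget]

# Ten days into June, $600 spent against a $1,500 monthly budget.
print(crossed_thresholds(600.0, date(2024, 6, 10), budget=1_500.0))  # [0.5, 0.8, 1.0]
```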

None of this is in the marketing. The marketing is all features, because features are where the products differentiate and pricing is where they’d rather you not look too closely. Look closely.

FAQ

Why is per-host pricing risky for Kubernetes?

Because a “host” in Kubernetes is often an ephemeral, autoscaling pod, and many tools bill on the highest host count they observed in a window (the high-water mark) rather than the average. Your bill tracks your worst spike, not your normal state — so the more elastic your infrastructure, the less predictable your invoice.

Is open-source monitoring actually free?

The license is. The operation isn’t. Tools like Prometheus or Zabbix cost zero dollars to download and real money in engineering time to run, scale, and keep alive — long-term storage, high availability, and upgrades are all on you. It’s frequently the right call; just budget the human, not just the hardware.

What’s the difference between per-GB and per-event pricing?

Mostly the size of the unit. Per-GB bills on the volume of data ingested; per-event (or per-span) bills on the count of discrete things. Both share the same weakness: you don’t control how much your applications emit, so a noisy service or a debug-log flood drives the bill regardless of your intentions.

How do I stop a surprise monitoring bill?

Set a hard spend cap and budget alerts at the provider before you send a single byte, prefer predictable models (per-node, flat, or self-hosted) if you can’t forecast load, and put a pipeline or sampling step in front of high-volume sources so you’re not paying to ingest noise. The fix is always boring and always works.

Which pricing model is cheapest?

Wrong question. The cheapest-looking model on a calm day can be the most expensive on a bad one. The right question is which model is most predictable for my workload — because a slightly higher bill you can forecast beats a lower one that occasionally triples without warning.

So here’s the thing I wish someone had handed me years ago: one tool from each model, lined up on the axis that actually matters. It’s pulled straight from the database, so when the prices drift — and they will — they get fixed in one place.

| | Datadog Network Monitoring (Datadog) | Splunk (Splunk / Cisco) | Last9 | groundcover | Gravwell (Gravwell, Inc.) | Prometheus (open source / community, CNCF) |
|---|---|---|---|---|---|---|
| Category | Network / NMS | Logs | Observability / APM | Observability / APM | Logs | Infra & metrics |
| License | Proprietary | Proprietary | Proprietary | Proprietary | Proprietary | Open source |
| Deployment | SaaS | SaaS or self-hosted | SaaS or self-hosted | SaaS or self-hosted | SaaS or self-hosted | Self-hosted |
| Monitors | Network, servers, metrics, logs, traces, synthetics, cloud, K8s | Logs, metrics, traces, servers, security, cloud, K8s | Metrics, logs, traces, K8s | Servers, metrics, logs, traces, K8s, cloud | Logs, security, network | Metrics, servers, K8s, cloud, network |
| Pricing model | Per host, per device, usage credits | Per GB ingested, usage credits, quote-only | Per event | Per node | Flat tier | Free / OSS |
| Free tier | Yes | Yes | No | Yes | Yes | Yes |
| Cost | High: NPM (Network Performance Monitoring) from ~$5/host/mo; platform $15-23+/host/mo; high-water-mark billing | Enterprise: ingest (per GB/day), workload, or entity pricing; ~$1,800-$18,000/yr for 1-10 GB/day | Medium: per event ingested (no per-host or per-user charges); Pro from $1,150/mo (1B events/mo) | Medium: per node/host (~$30/host/mo Pro), not per GB; bring-your-own-cloud keeps data in your VPC | Medium: priced per indexer, each with unlimited ingest; paid tiers are quote-only | Free: free software; the cost is engineering time plus the surrounding stack |
| Self-host effort | - | Heavy | - | Moderate | Moderate | Heavy |
| Maturity | Incumbent | Incumbent | Rising | Rising | Established | Incumbent |
| Protocols | SNMP, NetFlow / sFlow / IPFIX, OTLP | Syslog | - | - | - | Prometheus, SNMP |
| The catch | Notorious bill surprises: high-water-mark and multi-SKU billing balloon unpredictably, and as a pure NMS it's weaker on topology and config than a dedicated tool. | Famous for cost blowups: ingest-based pricing means a noisy app or debug-log flood can blow the annual budget, and you index everything you ingest whether you query it or not. | Event-based pricing is its own forecasting puzzle (a noisy service means an event spike, which means a cost spike), and it's a smaller vendor: strong on cardinality and triage, less of a turnkey full suite. | Per-node plus BYOC means YOU run and pay for the storage and compute in your account, so the "no data-volume bill" savings partly shift into your own infra and ops; eBPF also gives less code-level depth than SDK-based APM. | The flat "unlimited ingest" pricing is genuinely differentiated, but it's a smaller, niche vendor with a steeper learning curve (its own query pipeline language) and a thinner ecosystem. | Single-node by design, with no native HA or long-term storage, so any serious deployment becomes a 4-5 component stack (Alertmanager, Grafana, Thanos/Mimir, exporters) you assemble and operate yourself. |

Built from the monitoring tool database — these figures live there, not here, so they only have to be right once.