Paste your service traffic rows as service, volume, error rate percent, p95 latency in ms, and an optional criticality, then select Build sampling policy. A sample service set is loaded so you can try it right away.
About the OpenTelemetry sampling policy card
The OpenTelemetry sampling policy card turns a list of services and their traffic into a starting sampling policy, so you can act before observability costs spike. Paste one row per service with its request volume, error rate, p95 latency, and an optional criticality label, and optionally adjust the sampler config. The tool then scores each service and recommends what to keep: always sample error spans, tail-sample slow traces at or above a latency threshold, and head-sample lower-criticality traffic at a reduced rate. It is built for the moment when trace volume is growing faster than the value you get from it and you need a defensible policy to take into a collector config or a cost review.
A sample service set is loaded so you can see a policy right away: a critical checkout path that is both slow and error-prone, a high-volume edge service that can be head-sampled hard, and a service flagged critical by hand. Everything runs in your browser. The service rows and config you paste are never uploaded, logged, or stored. When the policy looks right, download the policy CSV for a review or copy the policy markdown, which includes a vendor-neutral collector starter you can adapt.
How to use
- List your services with their request or span volume, error rate as a percentage, and p95 latency in milliseconds.
- Paste one service per line as service, volume, error rate, p95 latency, and an optional criticality, separated by a pipe, tab, or comma.
- Optionally set a sampler config: default head rate, slow latency threshold, error keep rate, and the error percent that marks a service as elevated risk.
- Select Build sampling policy to score each service and get a per-service head and tail recommendation.
- Download the policy CSV for a review, or copy the policy markdown with its collector starter.
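The input format above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own parser: the `ServiceRow` and `parse_rows` names are hypothetical, and the handling of non-numeric values and unknown criticality labels (both reported as warnings) is an assumption beyond what this page describes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

LABELS = {"critical", "high", "standard", "low"}


@dataclass
class ServiceRow:
    service: str
    volume: float
    error_pct: float      # percent, e.g. 2.4 means 2.4 percent
    p95_ms: float
    criticality: Optional[str] = None


def parse_rows(text: str) -> Tuple[List[ServiceRow], List[str]]:
    """Parse pasted rows separated by a pipe, tab, or comma."""
    rows: List[ServiceRow] = []
    warnings: List[str] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        cells = [c.strip() for c in re.split(r"[|\t,]", raw)]
        if cells[0].lower() == "service":
            continue  # header row: skipped, as described above
        if not cells[0]:
            warnings.append(f"line {line_no}: missing service name")
            continue
        try:
            volume, error_pct, p95_ms = (float(c) for c in cells[1:4])
        except ValueError:
            # Assumption: short or non-numeric rows become warnings too.
            warnings.append(f"line {line_no}: expected three numeric columns")
            continue
        label = cells[4].lower() if len(cells) > 4 and cells[4] else None
        if label is not None and label not in LABELS:
            warnings.append(f"line {line_no}: unknown criticality '{label}'")
            label = None
        rows.append(ServiceRow(cells[0], volume, error_pct, p95_ms, label))
    return rows, warnings
```

Returning warnings alongside the parsed rows, rather than raising on the first bad line, mirrors the page's promise that unusable lines are reported instead of silently dropped.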
Worked examples
A slow, error-prone service is marked critical
The sample checkout service has a high error rate and p95 latency above the slow threshold, so it is scored critical: keep a high head rate, tail-sample slow traces, and always keep errors.
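The scoring rule behind this example can be sketched as follows. The function name and the default thresholds (2.0 percent, 800 ms) are illustrative assumptions, not the tool's actual values. The page does not say when metrics alone produce a "standard" score, so this sketch never assigns one.

```python
from typing import Optional

LABELS = ("critical", "high", "standard", "low")


def score_criticality(
    error_pct: float,
    p95_ms: float,
    label: Optional[str] = None,
    high_error_pct: float = 2.0,   # assumed threshold, not the tool's default
    slow_ms: float = 800.0,        # assumed threshold, not the tool's default
) -> str:
    """Return a criticality band for one service."""
    # An explicit, recognised label always wins over the metrics.
    if label is not None and label.strip().lower() in LABELS:
        return label.strip().lower()
    risky = error_pct > high_error_pct
    slow = p95_ms > slow_ms
    if risky and slow:
        return "critical"   # above both thresholds
    if risky or slow:
        return "high"       # above exactly one threshold
    return "low"            # quiet, fast, low-error


# Hypothetical checkout-like service: error-prone and slow.
print(score_criticality(error_pct=2.4, p95_ms=950))
```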
A high-volume edge service is head-sampled hard
The sample edge service has huge volume but a tiny error rate and fast responses, so its baseline head rate is trimmed well below the default to cut cost while errors are still kept.
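One way to picture the trimming is a small rate function that starts from the default, raises the rate for critical and high services, then trims it for low-criticality and very high-volume ones. Every number here (the 0.10 default, the 0.50 and 0.25 floors, the halving factors, the one-million volume cutoff) is a made-up assumption for illustration; the tool's real values are not stated on this page.

```python
def head_rate(
    criticality: str,
    volume: float,
    default_rate: float = 0.10,       # assumed default head rate
    high_volume: float = 1_000_000,   # assumed "very high volume" cutoff
) -> float:
    """Return a head sampling rate between 0 and 1 for one service."""
    rate = default_rate
    # Raise the rate so important services keep observable baseline traffic.
    if criticality == "critical":
        rate = max(rate, 0.50)
    elif criticality == "high":
        rate = max(rate, 0.25)
    elif criticality == "low":
        rate *= 0.5  # trim low-criticality traffic
    # Trim again for the biggest cost drivers.
    if volume >= high_volume:
        rate *= 0.5
    return round(min(max(rate, 0.0), 1.0), 4)


# Hypothetical edge-like service: huge volume, scored low.
print(head_rate("low", 5_000_000))
```

Errors are kept by a separate policy, so lowering the head rate here does not reduce error coverage.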
An explicit criticality label is respected
A service tagged critical in the criticality column keeps a high head rate even if its metrics alone would not score it that high.
Frequently asked questions
- What can I paste in?
- One service per line, with columns in this order: service name, volume, error rate as a percentage (for example 2.4 for 2.4 percent), p95 latency in milliseconds, and an optional criticality label (critical, high, standard, or low). Columns can be separated by a pipe, a tab, or a comma. A header row with service as the first column is detected and skipped, and lines that have no service name are reported as parse warnings instead of being silently dropped.
- How does it decide criticality?
- If you provide a criticality column, that value is used. Otherwise the tool scores each service from its error rate and p95 latency against the thresholds in the sampler config: a service that is both above the high error percent and above the slow latency threshold scores critical, one that is above either scores high, and quiet, fast, low-error services score low. Criticality drives how much baseline traffic the policy keeps.
- How are the head and tail rates chosen?
- Errors are always kept, and any service whose p95 latency is at or above the slow threshold is tail-sampled for slow traces. The head sampling rate starts from the configured default and is raised for critical and high services so baseline traces stay observable, then trimmed for low-criticality and very high-volume services, which are the biggest cost drivers. The result is a per-service head rate plus an error and latency policy.
- Is my pasted data sent anywhere?
- No. The tool parses and scores everything in your browser. The service rows and config you paste are never uploaded, logged, or stored, and the CSV and markdown you export are generated locally. The only analytics recorded are coarse bands, such as a bucketed service count and a criticality band, never service names, endpoints, or raw metrics.
- Can I apply this policy directly to my collector?
- Not as-is. The copied markdown includes a vendor-neutral OpenTelemetry Collector tail_sampling starter with the error, latency, and head-baseline policies, but it is a starting point, not an authoritative or vendor-specific config. Review it against your own traffic and collector version before you roll anything out, and keep an always-on error and slow-trace path.
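For orientation, a starter of the kind described above might have roughly the following shape. This sketch is not the tool's actual output. The policy types (`status_code`, `latency`, `probabilistic`) and their fields follow the tail sampling processor in the OpenTelemetry Collector contrib distribution as commonly documented, while the function name, policy names, and default values are assumptions. Check every field against your collector version.

```python
import json


def build_collector_starter(slow_ms: int = 800, baseline_pct: float = 10.0) -> dict:
    """Build a tail_sampling processor fragment as a plain dict."""
    return {
        "processors": {
            "tail_sampling": {
                # How long to buffer spans before deciding (assumed value).
                "decision_wait": "10s",
                "policies": [
                    {   # always keep traces that contain an error status
                        "name": "keep-errors",
                        "type": "status_code",
                        "status_code": {"status_codes": ["ERROR"]},
                    },
                    {   # keep traces slower than the latency threshold
                        "name": "keep-slow-traces",
                        "type": "latency",
                        "latency": {"threshold_ms": slow_ms},
                    },
                    {   # keep a percentage of everything else as a baseline
                        "name": "baseline",
                        "type": "probabilistic",
                        "probabilistic": {"sampling_percentage": baseline_pct},
                    },
                ],
            }
        }
    }


if __name__ == "__main__":
    # JSON output is accepted by most YAML parsers, so it can be adapted
    # into a collector config file after review.
    print(json.dumps(build_collector_starter(), indent=2))
```

A trace is kept when any one policy matches, so the error and latency policies act as the always-on path and the probabilistic policy supplies the baseline.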