Streaming aggregation

vmagent and single-node VictoriaMetrics can aggregate incoming samples in streaming mode by time and by labels before data is written to remote storage (or local storage for single-node VictoriaMetrics).

```mermaid
flowchart LR
    A["requests_total{instance=foo}"] --> V[vmagent]
    B["requests_total{instance=bar}"] --> V
    C["requests_total{instance=baz}"] --> V
    V --> D[requests_total:rate5m]
```

Features

Stream aggregation has the following features:

It can calculate aggregates on ingested samples before they’re sent to the remote destination;
It is applied to all the metric samples received via any supported data ingestion protocol and/or scraped from Prometheus-compatible targets;
It can filter out raw samples matched by aggregation rules, so raw data never reaches the remote destination. See -streamAggr.keepInput and -streamAggr.dropInput in the aggregation config docs;
It allows building flexible processing pipelines;
It is horizontally scalable.

Limitations

By default, stream aggregation ignores timestamps of the input samples and processes samples based on their ingestion time. See how to ignore old samples.
Aggregation state is held in the process memory and will be lost on process restart.

Use cases

Stream aggregation can be used in the cases described in the following sections.

See skills/stream-aggregation-helper for agent-assisted configuration.

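Statsd alternative

Stream aggregation can be used as a statsd alternative in the following cases: counting input samples, summing input metrics, quantiles over input metrics, histograms over input metrics and aggregating histograms.

Currently, streaming aggregation is available only for the supported data ingestion protocols and is not available for the Statsd metrics format.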

Recording rules alternative

Sometimes alerting and recording rules may require non-trivial amounts of CPU, RAM, disk IO and network bandwidth for processing on the metrics storage side.

For example, if the http_request_duration_seconds histogram is generated by thousands of application instances, then the alerting query histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) without (instance)) > 0.5 can become slow, since it needs to scan a large number of unique time series.

This alerting query can be accelerated by pre-calculating sum(rate(http_request_duration_seconds_bucket[5m])) without (instance) via a recording rule. But this only shifts the slowness from the alerting rule to the recording rule, since the calculation still has to happen somewhere. It is better to substitute the slow recording rule with the following stream aggregation config:

```yaml
- match: 'http_request_duration_seconds_bucket'
  interval: 1m
  without: [instance]
  outputs: [rate_sum]
```

It is recommended to set the interval field to a value at least 2 times the matched metrics collection interval.
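To make the rate_sum output more concrete, the following minimal Python sketch approximates what it computes for one interval. It assumes rate_sum is the sum, across input series that share the same output labels, of each series' average per-second increase within the interval. Counter resets, state carried between intervals, and vmagent's real implementation are not modeled, and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample shape: (labels, timestamp_seconds, counter_value).
Sample = tuple[dict[str, str], float, float]


def rate_sum(samples: list[Sample], without: list[str]) -> dict[tuple, float]:
    """Simplified rate_sum for one aggregation interval (sketch).

    For every input series, compute its average per-second increase between
    the first and the last sample seen in the interval, then sum these rates
    across series sharing the same labels once `without` labels are removed.
    Counter resets are not handled here.
    """
    first: dict[tuple, tuple[float, float]] = {}
    last: dict[tuple, tuple[float, float]] = {}
    for labels, ts, value in samples:
        series = tuple(sorted(labels.items()))
        if series not in first or ts < first[series][0]:
            first[series] = (ts, value)
        if series not in last or ts > last[series][0]:
            last[series] = (ts, value)

    sums: dict[tuple, float] = defaultdict(float)
    for series, (t0, v0) in first.items():
        t1, v1 = last[series]
        if t1 == t0:
            continue  # a single sample carries no rate information
        out_labels = tuple((k, v) for k, v in series if k not in without)
        sums[out_labels] += (v1 - v0) / (t1 - t0)
    return dict(sums)
```

In this sketch, two instances whose le="0.1" bucket counters grow at 1/s and 0.5/s collapse into a single le="0.1" output with the value 1.5. This mirrors how the instance label disappears from the output series.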

This stream aggregation generates http_request_duration_seconds_bucket:1m_without_instance_rate_sum output series according to output metric naming. Then these series can be used in alerting rules:

```metricsql
histogram_quantile(0.99, avg_over_time(http_request_duration_seconds_bucket:1m_without_instance_rate_sum[5m])) > 0.5
```

This query executes much faster than the original one because it needs to scan fewer time series.

The avg_over_time(<aggregate:1m>[5m]) expression is similar to a recording rule that calculates rate over a 5m sliding window with a 1m evaluation interval. If the sliding window isn’t important, then simply omit avg_over_time in the expression.

See the list of aggregate outputs, which can be specified in the outputs field. See also aggregating by labels and aggregating histograms.

Reducing the number of stored samples

If per-series samples are ingested at high frequency, then this may result in high disk space usage, since too much data must be stored to disk. This also may result in slow queries, since too much data must be processed during queries.

This can be fixed with the stream aggregation by increasing the interval between per-series samples stored in the database.

For example, the following stream aggregation config reduces the frequency of input samples to one sample per 5 minutes per each input time series (this operation is also known as downsampling):

```yaml
# Aggregate metrics ending with _total with `total` output.
# See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs
- match: '{__name__=~".+_total"}'
  interval: 5m
  outputs: [total]

# Downsample other metrics with `count_samples`, `sum_samples`, `min`, and `max` outputs
# See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs
- match: '{__name__!~".+_total"}'
  interval: 5m
  outputs: [count_samples, sum_samples, min, max]
```

The aggregated output metrics have the following names according to output metric naming:

```text
# For input metrics ending with _total
some_metric_total:5m_total

# For input metrics not ending with _total
some_metric:5m_count_samples
some_metric:5m_sum_samples
some_metric:5m_min
some_metric:5m_max
```

See the list of aggregate outputs, which can be specified in the outputs field. See also aggregating histograms and aggregating by labels.
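The four outputs used for non-counter metrics in the config above summarize each series over the 5m interval. The following minimal Python sketch shows what they represent for a single series. It assumes the input is the list of sample values received during one interval, and the helper name is hypothetical.

```python
def downsample(values: list[float]) -> dict[str, float]:
    """Summarize one series' samples from a single 5m interval (sketch)."""
    if not values:
        return {}  # nothing was received, so nothing is emitted
    return {
        "count_samples": len(values),
        "sum_samples": sum(values),
        "min": min(values),
        "max": max(values),
    }
```

Keeping both count_samples and sum_samples lets you recover the per-interval average at query time as some_metric:5m_sum_samples / some_metric:5m_count_samples.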

Reducing the number of stored series

Sometimes applications may generate too many time series. For example, the http_requests_total metric may have a path or user label with too many unique values. In this case, the following stream aggregation can be used for reducing the number of time series stored in VictoriaMetrics:

```yaml
- match: 'http_requests_total'
  interval: 30s
  without: [path, user]
  outputs: [total]
```

The without list specifies the labels that must be removed from the aggregate output. See aggregating by labels for more details.

The aggregated output metric has the following name according to output metric naming:

```text
http_requests_total:30s_without_path_user_total
```

See the list of aggregate outputs, which can be specified in the outputs field. See also aggregating histograms.

Counting input samples

If the monitored application generates event-based metrics, then it may be useful to count the number of such samples at the stream aggregation level.

For example, if an advertising server generates hits{some="labels"} 1 and clicks{some="labels"} 1 metrics for each incoming hit and click, then the following stream aggregation config can be used for counting these metrics over every 30-second interval:

```yaml
- match: '{__name__=~"hits|clicks"}'
  interval: 30s
  outputs: [count_samples]
```

This config generates the following output metrics for the hits and clicks input metrics according to output metric naming:

```text
hits:30s_count_samples count1
clicks:30s_count_samples count2
```

See the list of aggregate outputs, which can be specified in the outputs field. See also aggregating by labels.

Summing input metrics

If the monitored application counts some events and then sends the calculated number of events to VictoriaMetrics at irregular intervals or at too high frequency, then stream aggregation can be used for summing such events and writing the aggregate sums to the storage at regular intervals.

For example, if an advertising server generates hits{some="labels"} N and clicks{some="labels"} M metrics at irregular intervals, then the following stream aggregation config can be used for summing these metrics every minute:

```yaml
- match: '{__name__=~"hits|clicks"}'
  interval: 1m
  outputs: [sum_samples]
```

This config generates the following output metrics according to output metric naming:

```text
hits:1m_sum_samples sum1
clicks:1m_sum_samples sum2
```

See the list of aggregate outputs, which can be specified in the outputs field. See also aggregating by labels.
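The summing behavior described in this section can be sketched in a few lines of Python. The sketch assumes each value is assigned to an interval by its arrival time, which is the default behavior noted under Limitations. It also assumes interval boundaries are aligned to multiples of the interval, as described in flush time alignment. The function name is hypothetical.

```python
from collections import defaultdict


def sum_per_interval(events, interval=60):
    """Sum values per metric into aligned intervals (sketch).

    events: iterable of (arrival_ts_seconds, metric_name, value).
    Returns {(interval_end, metric_name): sum_samples}.
    """
    sums = defaultdict(float)
    for ts, name, value in events:
        end = (int(ts) // interval + 1) * interval  # aligned interval end
        sums[(end, name)] += value
    return dict(sums)
```

Irregular events at seconds 5 and 50 land in the interval ending at 60, while an event at second 61 starts the next one. Each (interval_end, name) pair corresponds to one hits:1m_sum_samples or clicks:1m_sum_samples sample.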

Quantiles over input metrics

If the monitored application generates measurement metrics for each request, then it may be useful to calculate a pre-defined set of percentiles over these measurements.

For example, if the monitored application generates request_duration_seconds N and response_size_bytes M metrics for each incoming request, then the following stream aggregation config can be used for calculating 50th and 99th percentiles for these metrics every 30 seconds:

```yaml
- match:
  - request_duration_seconds
  - response_size_bytes
  interval: 30s
  outputs: ["quantiles(0.50, 0.99)"]
```

This config generates the following output metrics according to output metric naming:

```text
request_duration_seconds:30s_quantiles{quantile="0.50"} value1
request_duration_seconds:30s_quantiles{quantile="0.99"} value2

response_size_bytes:30s_quantiles{quantile="0.50"} value1
response_size_bytes:30s_quantiles{quantile="0.99"} value2
```

See the list of aggregate outputs, which can be specified in the outputs field. See also histograms over input metrics and aggregating by labels.
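The quantiles output reports the requested phi-quantiles over the samples received during each interval. The following Python sketch computes exact, linearly interpolated quantiles. The actual implementation may use an approximate algorithm, so treat this only as an illustration of what quantiles(0.50, 0.99) reports. The function name is hypothetical.

```python
import math


def quantiles(values: list[float], phis: list[float]) -> dict[float, float]:
    """Exact phi-quantiles of one interval's samples (sketch).

    Uses linear interpolation between the closest ranks; the real output
    may be computed with an approximate estimator instead.
    """
    if not values:
        return {}
    xs = sorted(values)
    result = {}
    for phi in phis:
        pos = phi * (len(xs) - 1)  # fractional rank in the sorted samples
        lo, hi = math.floor(pos), math.ceil(pos)
        result[phi] = xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    return result
```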

Histograms over input metrics

If the monitored application generates measurement metrics for each request, then it may be useful to calculate a histogram over these metrics.

For example, if the monitored application generates request_duration_seconds N and response_size_bytes M metrics for each incoming request, then the following stream aggregation config can be used for calculating VictoriaMetrics histogram buckets for these metrics every 60 seconds:

```yaml
- match:
  - request_duration_seconds
  - response_size_bytes
  interval: 60s
  outputs: [histogram_bucket]
```

This config generates the following output metrics according to output metric naming:

```text
request_duration_seconds:60s_histogram_bucket{vmrange="start1...end1"} count1
request_duration_seconds:60s_histogram_bucket{vmrange="start2...end2"} count2
...
request_duration_seconds:60s_histogram_bucket{vmrange="startN...endN"} countN

response_size_bytes:60s_histogram_bucket{vmrange="start1...end1"} count1
response_size_bytes:60s_histogram_bucket{vmrange="start2...end2"} count2
...
response_size_bytes:60s_histogram_bucket{vmrange="startN...endN"} countN
```

The resulting histogram buckets can be queried with MetricsQL in the following ways:

An estimated 50th and 99th percentiles of the request duration over the last hour:

```metricsql
histogram_quantiles("quantile", 0.50, 0.99, sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```

This query uses the histogram_quantiles function.

An estimated standard deviation of the request duration over the last hour:

```metricsql
histogram_stddev(sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```

This query uses the histogram_stddev function.

An estimated share of requests with the duration smaller than 0.5s over the last hour:
```metricsql
histogram_share(0.5, sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```

This query uses the histogram_share function.

See the list of aggregate outputs, which can be specified in the outputs field. See also quantiles over input metrics and aggregating by labels.

Aggregating histograms

A histogram is a set of counter metrics with different vmrange or le labels. Histograms are typically used for calculating quantiles over the change of the bucket counters via the histogram_quantile function. For that reason, the appropriate aggregation output for them is rate_sum:

```yaml
- match: 'http_request_duration_seconds_bucket'
  interval: 5m
  without: [instance]
  enable_windows: true
  outputs: [rate_sum]
```

This config generates the following output metrics according to output metric naming:

```text
http_request_duration_seconds_bucket:5m_without_instance_rate_sum{le="0.1"}  value1
http_request_duration_seconds_bucket:5m_without_instance_rate_sum{le="0.2"}  value2
http_request_duration_seconds_bucket:5m_without_instance_rate_sum{le="0.4"}  value3
http_request_duration_seconds_bucket:5m_without_instance_rate_sum{le="1"}    value4
http_request_duration_seconds_bucket:5m_without_instance_rate_sum{le="3"}    value5
http_request_duration_seconds_bucket:5m_without_instance_rate_sum{le="+Inf"} value6
```

The resulting metrics can be passed to the histogram_quantile function:

```metricsql
histogram_quantile(0.9, sum(http_request_duration_seconds_bucket:5m_without_instance_rate_sum) by (le))
```
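The rate_sum values stay cumulative per le bucket, so histogram_quantile can interpolate inside the bucket that contains the requested rank. Below is a minimal Python sketch of the Prometheus-style linear interpolation. It assumes the buckets have already been summed by le and include +Inf. MetricsQL's histogram_quantile may treat edge cases differently, and the function is only illustrative.

```python
import math


def histogram_quantile(q: float, buckets: dict[float, float]) -> float:
    """Prometheus-style quantile estimate from cumulative `le` buckets (sketch).

    `buckets` maps each upper bound (math.inf for le="+Inf") to the
    cumulative value summed by le, e.g. a rate_sum output.
    """
    if math.inf not in buckets:
        raise ValueError('a le="+Inf" bucket is required')
    total = buckets[math.inf]
    if total <= 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound in sorted(buckets):
        count = buckets[bound]
        if count >= rank:
            if bound == math.inf:
                return prev_bound  # rank falls into +Inf: use the highest finite bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

Bucket labels such as le="+Inf" can be converted with float("+Inf"), which yields math.inf.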

Please note that le-based histograms can be aggregated only if their le labels are configured identically. VictoriaMetrics histogram buckets have no such requirement.

Stream aggregation of histogram buckets is very sensitive to sample delays. A histogram is a logical group of independent time series (buckets) that are supposed to be updated uniformly. Aggregation cannot guarantee that all the samples belonging to a single histogram are received within the same aggregation interval. This can cause accuracy issues. The recommended ways for improving accuracy are:

enable aggregation windows;
increase interval;
ensure that vmagent has no resource shortage;
ensure that the samples delivery pipeline has no resource shortage or queue accumulation, so it can deliver samples fast.

See the list of aggregate outputs, which can be specified in the outputs field. See also histograms over input metrics and quantiles over input metrics.

Routing

Single-node VictoriaMetrics supports relabeling, deduplication and stream aggregation for all the received data, scraped or pushed. The processed data is then stored in local storage and can’t be forwarded further.

vmagent supports relabeling, deduplication, and stream aggregation for all the received data, scraped or pushed. See the processing order for vmagent.

Typical scenarios for data routing with vmagent:

Aggregate incoming data and replicate to N destinations. Specify -streamAggr.config command-line flag to aggregate the incoming data before replicating it to all the configured -remoteWrite.url destinations.
Individually aggregate incoming data for each destination. Specify -remoteWrite.streamAggr.config command-line flag for each -remoteWrite.url destination. Relabeling via -remoteWrite.urlRelabelConfig can be used for routing only the selected metrics to each -remoteWrite.url destination.

Deduplication

If -streamAggr.dedupInterval is enabled, out-of-order samples (older than the already received ones) within the configured interval are treated as duplicates and ignored. See the deduplication docs.

vmagent supports deduplication of samples before sending them to the configured -remoteWrite.url. The deduplication can be enabled via the following options:

By specifying the desired deduplication interval via the -streamAggr.dedupInterval command-line flag for all received data, or via the -remoteWrite.streamAggr.dedupInterval command-line flag for the particular -remoteWrite.url destination. For example, ./vmagent -remoteWrite.url=http://remote-storage/api/v1/write -remoteWrite.streamAggr.dedupInterval=30s instructs vmagent to leave only the last sample for each seen time series every 30 seconds. The deduplication is performed after applying relabeling and before performing the aggregation.
By specifying the dedup_interval option individually per each stream aggregation config in -remoteWrite.streamAggr.config or -streamAggr.config configs.

Single-node VictoriaMetrics supports two types of deduplication:

After storing the duplicate samples in local storage. See the -dedup.minScrapeInterval command-line option.
Before storing the duplicate samples in local storage. This type of deduplication can be enabled via the following options:
- By specifying the desired deduplication interval via the -streamAggr.dedupInterval command-line flag. For example, ./victoria-metrics -streamAggr.dedupInterval=30s instructs VictoriaMetrics to leave only the last sample for each seen time series every 30 seconds. The deduplication is performed after applying -relabelConfig relabeling.
- By specifying the dedup_interval option individually per each stream aggregation config in -streamAggr.config.

Labels can be dropped before deduplication is applied. See dropping unneeded labels.

Stream aggregation deduplication is applied before aggregation rules, so duplicate samples are dropped before aggregation. The dropped old samples can be tracked with the vm_streamaggr_dedup_dropped_samples_total metric.
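The following minimal Python sketch shows interval-based deduplication as described above. For each series and each discrete dedup interval, only the sample with the latest timestamp survives, and samples older than an already received one are dropped. Ties are resolved in favor of the later-received sample, which is an assumption of this sketch. The function name is hypothetical.

```python
def dedup(samples, interval):
    """Keep the latest sample per (series, dedup interval) (sketch).

    samples: iterable of (series_key, timestamp_seconds, value).
    Returns {(series_key, interval_start): value}.
    """
    latest = {}
    for key, ts, value in samples:
        slot = (key, int(ts // interval) * interval)
        prev = latest.get(slot)
        # Samples older than an already received one in the slot are dropped;
        # on equal timestamps the later-received sample wins (an assumption).
        if prev is None or ts >= prev[0]:
            latest[slot] = (ts, value)
    return {slot: value for slot, (_, value) in latest.items()}
```

With interval=30, an out-of-order sample at second 15 that arrives after the sample at second 20 is discarded. The output of this step is what the aggregation rules then see.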

Relabeling

It is possible to apply arbitrary relabeling to input and output metrics during stream aggregation via the input_relabel_configs and output_relabel_configs options in the stream aggregation config.

Relabeling rules inside input_relabel_configs are applied to samples matching the match filters before optional deduplication. Relabeling rules in output_relabel_configs are applied to aggregated samples before they are sent to the remote storage.

For example, the following config removes the :1m_sum_samples suffix added to the output metric name:

```yaml
- interval: 1m
  outputs: [sum_samples]
  output_relabel_configs:
  - source_labels: [__name__]
    target_label: __name__
    regex: "(.+):.+"
```
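Relabeling regexes are fully anchored and the default replacement is $1, so the rule above keeps everything before the last colon in the metric name. The following minimal Python sketch applies that single rule. It is a hypothetical helper, not the relabeling engine.

```python
import re


def relabel_metric_name(name: str, regex: str = r"(.+):.+", replacement: str = r"\1") -> str:
    """Apply one replace-style relabeling rule to __name__ (sketch).

    Relabeling regexes are fully anchored, hence re.fullmatch; the group
    reference in `replacement` plays the role of the default $1.
    """
    match = re.fullmatch(regex, name)
    if match is None:
        return name  # non-matching names are left unchanged
    return match.expand(replacement)
```

Because (.+) is greedy, an input name that already contains colons, such as a:b:1m_sum_samples, is reduced to a:b rather than a.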
    

Another option to remove the suffix, which is added by stream aggregation, is to add keep_metric_names: true to the config:

```yaml
- interval: 1m
  outputs: [sum_samples]
  keep_metric_names: true
```

Advanced usage

Ignoring old samples

By default, all the input samples are taken into account during stream aggregation. If samples with old timestamps outside the current aggregation interval must be ignored, then the following options can be used:

To pass the -streamAggr.ignoreOldSamples command-line flag to single-node VictoriaMetrics or to vmagent. At vmagent, the -remoteWrite.streamAggr.ignoreOldSamples flag can be specified individually per each -remoteWrite.url. This enables ignoring old samples for all the aggregation configs.
To set the ignore_old_samples: true option in the particular aggregation config. This enables ignoring old samples for that particular aggregation config.
To enable aggregation windows.
To enable deduplication.

The dropped old samples can be tracked with the vm_streamaggr_ignored_samples_total{reason="too_old"} and vm_streamaggr_dedup_dropped_samples_total metrics.

Ignore aggregation intervals on start

Streaming aggregation results may be incorrect for some time after a restart of vmagent or single-node VictoriaMetrics. This lasts until all the buffered samples are sent from remote sources to vmagent or single-node VictoriaMetrics via supported data ingestion protocols. In this case it may be a good idea to drop the aggregated data during the first N aggregation intervals just after the restart. This can be done via the following options:

The -streamAggr.ignoreFirstIntervals=N command-line flag at vmagent and single-node VictoriaMetrics. This flag instructs skipping the first N aggregation intervals just after the restart across all the configured stream aggregation configs.
The -remoteWrite.streamAggr.ignoreFirstIntervals command-line flag, which can be specified individually per each -remoteWrite.url at vmagent.
The ignore_first_intervals: N option in the particular aggregation config.

Flush time alignment

By default, the time for aggregated data flush is aligned by the interval option specified in the aggregate config.

For example:

if interval: 1m is set, then the aggregated data is flushed to the storage at the end of every minute
if interval: 1h is set, then the aggregated data is flushed to the storage at the end of every hour

If you do not need such an alignment, then set the no_align_flush_to_interval: true option in the aggregate config. In this case, aggregated data flushes are aligned to the vmagent start time or to the config reload time.

The aggregated data for the first and the last intervals is dropped during vmagent start, restart, or config reload, since these intervals are incomplete and usually contain confusing data. If you need to preserve the aggregated data for these intervals, then set the flush_on_shutdown: true option in the aggregate config.
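The following minimal Python sketch shows the two flush schedules. It assumes Unix timestamps in seconds, and the helper name is hypothetical. In the default mode flushes fall on multiples of interval; with no_align_flush_to_interval: true they are counted from the start or reload time.

```python
from typing import Optional


def next_flush(now: int, interval: int, start: Optional[int] = None) -> int:
    """Next flush time in Unix seconds (sketch).

    start=None models the default alignment to multiples of `interval`;
    passing the start/reload time models no_align_flush_to_interval: true.
    """
    base = 0 if start is None else start
    return base + ((now - base) // interval + 1) * interval
```

For a process started at second 125 with interval: 1m, the sketch gives a first flush at second 180 in the default mode. With alignment disabled, the first flush comes at second 185. In the default mode, that shorter first interval is one reason the first interval's data is treated as incomplete.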

Output metric names

Output metric names for stream aggregation are constructed according to the following pattern:

```text
<metric_name>:<interval>[_by_<by_labels>][_without_<without_labels>]_<output>
```

<metric_name> is the original metric name.
<interval> is the interval specified in the stream aggregation config.
<by_labels> is an optional _-delimited sorted list of by labels specified in the stream aggregation config. If the by list is missing in the config, then the _by_<by_labels> part isn’t included in the output metric name.
<without_labels> is an optional _-delimited sorted list of without labels specified in the stream aggregation config. If the without list is missing in the config, then the _without_<without_labels> part isn’t included in the output metric name.
<output> is the aggregate used for constructing the output metric. The aggregate name is taken from the outputs list in the corresponding stream aggregation config.

Both input and output metric names can be modified if needed via relabeling, as described in the relabeling section.

It is possible to keep the original metric name after the aggregation by specifying the keep_metric_names: true option in the stream aggregation config. The keep_metric_names option can be used only if a single output is set in the outputs list.
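The naming pattern can be expressed as a small Python function. This is a sketch of the documented pattern only. The function name is hypothetical, and keep_metric_names, which bypasses the pattern, is not modeled.

```python
def output_metric_name(metric, interval, output, by=None, without=None):
    """Build <metric_name>:<interval>[_by_<by_labels>][_without_<without_labels>]_<output> (sketch)."""
    name = f"{metric}:{interval}"
    if by:
        name += "_by_" + "_".join(sorted(by))  # label names are sorted
    if without:
        name += "_without_" + "_".join(sorted(without))
    return f"{name}_{output}"
```

For example, without: [user, path] with the total output yields http_requests_total:30s_without_path_user_total, because the label names are sorted before joining.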

Aggregating by labels

By default, all labels from the input metrics are preserved in the output metrics. For example, the input metric foo{app="bar",instance="host1"} results in the output metric foo:1m_sum_samples{app="bar",instance="host1"} when the following stream aggregation config is used:

```yaml
- interval: 1m
  outputs: [sum_samples]
```

The input labels can be removed via a without list specified in the config. For example, the following config removes the instance label from output metrics by summing input samples across all the instances:

```yaml
- interval: 1m
  without: [instance]
  outputs: [sum_samples]
```

In this case the foo{app="bar",instance="..."} input metrics are transformed into the foo:1m_without_instance_sum_samples{app="bar"} output metric according to output metric naming.

It is possible to specify the exact list of labels in the output metrics via the by list. For example, the following config sums input samples by the app label:

```yaml
- interval: 1m
  by: [app]
  outputs: [sum_samples]
```

In this case the foo{app="bar",instance="..."} input metrics are transformed into the foo:1m_by_app_sum_samples{app="bar"} output metric according to output metric naming.

The labels used in by and without lists can be modified via the input_relabel_configs section. See relabeling.
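The following minimal Python sketch shows how by and without decide which labels survive, together with a sum_samples aggregation over the resulting groups. Label sets here exclude the metric name. The sketch also assumes at most one of the two lists is set, and the function names are hypothetical.

```python
from collections import defaultdict


def output_labels(labels, by=None, without=None):
    """Labels kept on the output series; the metric name is handled separately."""
    if by:
        return {k: v for k, v in labels.items() if k in by}
    if without:
        return {k: v for k, v in labels.items() if k not in without}
    return dict(labels)  # default: all input labels are preserved


def sum_samples(samples, by=None, without=None):
    """sum_samples over (labels, value) pairs grouped by output labels (sketch)."""
    sums = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted(output_labels(labels, by, without).items()))
        sums[key] += value
    return dict(sums)
```

For the foo{app="bar",instance="..."} inputs above, by: [app] and without: [instance] produce the same single app="bar" group. With neither list set, each instance keeps its own output series.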

Dropping unneeded labels

To optimize performance and reduce the churn rate, it’s important to drop unnecessary labels from incoming samples. There are various strategies for label dropping, which can be implemented individually or combined.

Global label dropping is configured using the -streamAggr.dropInputLabels flag. It works in conjunction with the -streamAggr.config flag and applies to all matching sections within it. The labels are dropped before input relabeling, deduplication, and stream aggregation are applied. This flag can be used with vmagent, vminsert, and vmsingle.

The following example demonstrates how to drop the replica and az labels for both foo and bar remote write targets:

```sh
/path/to/vmagent \
  -remoteWrite.url="http://foo/api/v1/write" \
  -remoteWrite.url="http://bar/api/v1/write" \
  -streamAggr.config="aggr.yaml" \
  -streamAggr.dropInputLabels="replica,az"
```

Per-remote-write label dropping is configured using the -remoteWrite.streamAggr.dropInputLabels flag. It should be specified as many times as there are -remoteWrite.url flags. To drop multiple labels for a single remote write destination, separate them with ^^. The labels are dropped before input relabeling, deduplication, and stream aggregation are applied. This flag is available for vmagent only.

In the example below, replica and az are dropped for the foo target, while instance is dropped for the bar target:

```sh
/path/to/vmagent \
  -remoteWrite.url="http://foo/api/v1/write" \
  -remoteWrite.url="http://bar/api/v1/write" \
  -remoteWrite.streamAggr.config="aggr.yaml" \
  -remoteWrite.streamAggr.dropInputLabels="replica^^az" \
  -remoteWrite.streamAggr.dropInputLabels="instance"
```
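The following minimal Python sketch shows how the per-destination values above could be read. It assumes one flag value per -remoteWrite.url in the same order as the URLs, with ^^ separating label names inside a value. This is an illustration, not vmagent's flag parser, and the function names are hypothetical.

```python
def parse_drop_labels(flag_values: list[str]) -> list[list[str]]:
    """One flag value per -remoteWrite.url; label names inside a value are ^^-separated."""
    return [[name for name in value.split("^^") if name] for value in flag_values]


def drop_labels(labels: dict[str, str], names: list[str]) -> dict[str, str]:
    """Return a copy of labels without the given label names."""
    return {k: v for k, v in labels.items() if k not in names}


per_url = parse_drop_labels(["replica^^az", "instance"])
sample = {"job": "api", "replica": "1", "az": "eu-1", "instance": "host1"}
print(drop_labels(sample, per_url[0]))  # input labels seen by the foo target's aggregation
print(drop_labels(sample, per_url[1]))  # input labels seen by the bar target's aggregation
```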
    

Config-based label dropping can be defined within the stream aggregation config using the drop_input_labels key. This method applies to configurations provided via either the -streamAggr.config or -remoteWrite.streamAggr.config flag. When specified, drop_input_labels takes precedence over any label drop definitions set via flags.

Below is an example of an aggr.yaml configuration that drops the replica and az labels from process_resident_memory_bytes metrics:

```yaml
- match: 'process_resident_memory_bytes'
  interval: '1m'
  drop_input_labels: ['replica', 'az']
  outputs: ['avg']
  keep_metric_names: true
```

Scaling aggregation horizontally

Aggregation output is only correct when all contributing samples are processed by the same aggregator instance.

To scale the aggregation horizontally, always shard the input samples in a deterministic way. This can be achieved by building a two-layer topology of vmagents, where the first layer is responsible for sharding and the second layer is responsible for aggregating:

```mermaid
flowchart LR
    V1[vmagent-shard-1] -- requests_total{env=test, pod=foo} --> SV1[vmagent-aggr-1]
    V1[vmagent-shard-1] -- requests_total{env=prod, pod=bar} --> SV2[vmagent-aggr-2]
    V2[vmagent-shard-2] -- requests_total{env=prod, pod=baz} --> SV2[vmagent-aggr-2]
    SV1 -- requests_total:5m_without_pod_total{env=test} --> x(( ))
    SV2 -- requests_total:5m_without_pod_total{env=prod} --> y(( ))
    style x fill:none,stroke:none
    style y fill:none,stroke:none
```

The sharding layer of vmagents can be configured via the -remoteWrite.shardByURL.labels or -remoteWrite.shardByURL.ignoreLabels command line flags. See how to shard data across remote write destinations for more details.

The following requirements must be met for sharded aggregation to work correctly:

All sharding vmagents should have the same deterministic sharding configuration.
The sharding configuration must align with the by and without lists:
- Labels configured in -remoteWrite.shardByURL.labels must be a subset of the labels listed in by. For example, if the aggregation config specifies by: [env, job], then -remoteWrite.shardByURL.labels may include env, job, or both. This ensures that all samples contributing to the same aggregation result are routed to the same aggregator instance and aggregated together to produce a complete output.
- Labels configured in -remoteWrite.shardByURL.ignoreLabels must be a superset of the labels listed in without. For example, if the aggregation config specifies without: [env, pod], then -remoteWrite.shardByURL.ignoreLabels must include at least env and pod. This ensures that labels removed during aggregation are not used for shard routing.
Aggregating vmagents should not produce collisions: the aggregation output must be unique across all the sharded agents. For example, requests_total:5m_without_env_pod_total produced by both vmagent-aggr-1 and vmagent-aggr-2 will collide unless the series have labels uniquely identifying them. Such labels should either be preserved by the sharding and aggregation configs, or be added to the output via -remoteWrite.label. See cluster mode for more details.
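The following minimal Python sketch illustrates why the shard labels must be a subset of by, or must exclude the without labels. It picks a shard from the values of the configured shard labels only. SHA-256 is an arbitrary choice for illustration, vmagent uses its own hashing, and the function name is hypothetical.

```python
import hashlib


def shard_for(labels: dict[str, str], shard_labels: list[str], num_shards: int) -> int:
    """Deterministically pick an aggregator index from shard label values only (sketch)."""
    key = "\x00".join(f"{name}={labels.get(name, '')}" for name in sorted(shard_labels))
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With env as the only shard label, requests_total{env=prod, pod=bar} and requests_total{env=prod, pod=baz} always map to the same aggregator. An aggregation with without: [pod] therefore sees every contributing series. Adding pod to the shard labels could split them across aggregators and produce partial results.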

Never shard histograms by le (or vmrange in case of VM histograms) label. A histogram is a logical group of series differing only in the bucket label. All of those buckets must land on the same aggregator at the same time so it can produce a coherent bucket set. See more about aggregating histograms .

Troubleshooting

Aggregation windows

By default, stream aggregation and deduplication store a single state for each aggregation output result. The data for each aggregator is flushed independently once per aggregation interval. But there’s no guarantee that incoming samples with timestamps close to the end of the aggregation interval will get into it. For example, when aggregating with interval: 1m, a data sample with timestamp 1739473078 (18:57:58 UTC) can fall into the aggregation round ending at 18:58:00 or the one ending at 18:59:00. This depends on network lag, load, clock synchronization, etc. In most scenarios, this doesn’t impact aggregation or deduplication results, which stay consistent within the margin of error. But for metrics represented as a collection of series, like histograms, such inaccuracy leads to invalid aggregation results.

For this case, stream aggregation and deduplication support a mode with aggregation windows for the current and previous states. In this mode, the flush doesn’t happen immediately but is delayed by a calculated sample lag. This should significantly improve the accuracy of calculations when samples arrive with a delay. Available starting from v1.112.0.

This mode doubles the memory used for the aggregation state, since two states (for the current and previous intervals) are stored per series simultaneously. Aggregation windows can be enabled via the following settings:

-streamAggr.enableWindows at single-node VictoriaMetrics and vmagent. At vmagent, the -remoteWrite.streamAggr.enableWindows flag can be specified individually for each -remoteWrite.url. If one of these flags is set, all aggregators use fixed windows. In conjunction with -remoteWrite.streamAggr.dedupInterval or -streamAggr.dedupInterval, fixed aggregation windows are enabled on the deduplicator as well.
The enable_windows option in the aggregation config. It allows enabling aggregation windows for a specific aggregator.
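The boundary effect that aggregation windows address can be reproduced with a few lines of Python. The sketch assumes Unix timestamps in seconds and intervals aligned to multiples of interval, and the helper names are hypothetical. Sample 1739473078 belongs, by its own timestamp, to the round ending at 18:58:00 UTC. If it is assigned by a processing time only two seconds later, it moves to the round ending at 18:59:00 UTC.

```python
from datetime import datetime, timezone


def interval_end(ts: int, interval: int) -> int:
    """End of the aligned aggregation interval that contains ts."""
    return (ts // interval + 1) * interval


def utc_clock(ts: int) -> str:
    """Format a Unix timestamp as HH:MM:SS in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")


sample_ts = 1739473078        # sample timestamp from the example above
processed_at = sample_ts + 2  # the same sample handled two seconds later

by_timestamp = interval_end(sample_ts, 60)
by_processing = interval_end(processed_at, 60)
print(utc_clock(by_timestamp), utc_clock(by_processing))  # prints: 18:58:00 18:59:00
```

For a single series this shift is usually within the margin of error. For histogram buckets, however, some buckets may land in one round and the rest in the next.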

Counter resets

If counter-specific outputs, such as total*, rate*, and increase*, produce values that are significantly higher than anticipated, then check the vm_streamaggr_counter_resets_total metric. This metric is incremented every time a counter reset is detected. Unexpected resets can be caused by duplication or collision of raw samples. If you observe duplication or collision, try solving this problem by either fixing the source of these metrics or by deduplicating these samples before aggregation.
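To see why collisions show up as counter resets, here is a minimal Python sketch of a total-like increase calculation that treats every decrease as a reset from 0. It is a simplification rather than vmagent's exact logic, and the function name is hypothetical. Interleaving two colliding series that each grow by only 2 produces three spurious resets and a hugely inflated result.

```python
def total_increase(values: list[float]) -> tuple[float, int]:
    """Sum of counter increases plus the number of detected resets (sketch).

    Any decrease is treated as a counter reset, i.e. the counter is assumed
    to have restarted from 0, so the new value is added as-is.
    """
    total, resets, prev = 0.0, 0, None
    for v in values:
        if prev is not None:
            if v >= prev:
                total += v - prev
            else:
                resets += 1
                total += v
        prev = v
    return total, resets


print(total_increase([100, 101, 102]))         # one clean series
print(total_increase([100, 5, 101, 6, 102, 7]))  # two series colliding into one
```

The clean series reports an increase of 2 with no resets. The colliding pair reports 210 with 3 resets, even though the real combined increase is 4.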

Data delay and staleness

Stream aggregation processes input samples in a streaming manner and flushes results once per specified interval. Because of this, aggregation results can be heavily affected by data delays (see vm_streamaggr_samples_lag_seconds_bucket metric).

In particular:

Stream aggregation won’t produce results if input samples are delayed for multiple aggregation intervals, causing gaps in the output.
Delayed and out-of-order samples can inflate or skew aggregation results.

Dropping delayed samples can result in missed observations in the results, while keeping delayed samples may inflate the results. It is up to the user to decide what they prefer in the produced results:

If you prefer consistency in aggregation results and do not want delayed data to affect the next aggregation window, drop all potentially delayed samples via ignore_old_samples.
If you prefer to have the accumulated changes from delayed data reflected in aggregation windows after the delay, increase staleness_interval in the stream aggregation config. This is especially important for outputs that track the last seen per-series values in order to properly calculate output values, such as total, total_prometheus, increase and increase_prometheus.

For these outputs, the last seen per-series value is dropped if no new samples are received for the given time series during consecutive aggregation intervals specified in the stream aggregation config via interval option. If a new sample for the existing time series is received after that, then it is treated as the first sample for a new time series. This may lead to the following issues when data is delayed:

total and increase may produce unexpected spikes, since they assume that a new time series starts from 0.
total_prometheus and increase_prometheus may produce lower than expected results, if you expect to see the accumulated changes reflected after the delay, since they ignore the first sample in a new time series.

These issues can be mitigated in the following ways:

By increasing the interval option in the stream aggregation config, so it covers the expected delays in the data ingestion pipeline. It is recommended to set interval to at least 2× the scrape or push interval of the input, and to a higher value if the input pipeline is prone to large delays.
By increasing the staleness_interval option in the stream aggregation config, so it covers the expected delays in the data ingestion pipeline. By default, staleness_interval is equal to interval.
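The two trade-offs described above can be sketched as two alternative aggregation rules. This is an illustration under assumptions: http_requests_total, the 30s scrape interval and the chosen durations are hypothetical. Only one of the rules should be active, since identical match and outputs settings in both would produce the same output series twice, so the second one is commented out.

```yaml
# Alternative 1: prefer consistency. Samples with timestamps outside the
# current aggregation interval are ignored instead of skewing the next window.
- match: http_requests_total
  interval: 1m              # at least 2x the assumed 30s scrape interval
  ignore_old_samples: true
  outputs: [total]

# Alternative 2: prefer completeness. Keep the last seen per-series value
# for up to 5m without new samples, so a short delay doesn't restart the series.
# - match: http_requests_total
#   interval: 1m
#   staleness_interval: 5m
#   outputs: [total]
```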

High resource usage #

The following measures can help reduce memory and CPU usage during stream aggregation:

Use more specific match filters in the stream aggregation config, so only the raw samples you actually need are aggregated.
Increase the aggregation interval by specifying a bigger duration for the interval option in the stream aggregation config.
Produce fewer output time series by listing fewer labels in the by list or more labels in the without list.
Drop unneeded labels from input samples via input_relabel_configs or dropInputLabels.
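A sketch of a rule that applies these measures together. The metric, job and label names are hypothetical, and drop_input_labels is used here as the per-rule counterpart of dropping input labels; check that the labels you drop don't distinguish otherwise identical series.

```yaml
# Hypothetical rule tuned for lower resource usage.
- match: 'process_resident_memory_bytes{job="api"}'  # narrow filter instead of a broad one
  interval: 5m                                       # longer interval, fewer flushes
  drop_input_labels: [version]                       # label not needed downstream
  without: [instance]                                # more labels in without, fewer output series
  outputs: [avg]
```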

Cluster mode #

If you use vmagent in cluster mode for stream aggregation, be careful when using the by or without options or when modifying sample labels via relabeling, since incorrect usage may result in duplicate or colliding output series.

For example, if more than one vmagent instance calculates increase for the http_requests_total metric with the by: [path] option, then every vmagent instance produces output series with the same names and label sets (one series per path value), so the outputs of different instances collide in the remote storage. The fix is to add a unique label to all the output samples produced by each vmagent, so they are aggregated into distinct sets of time series. These time series can then be aggregated further at query time as needed.

If vmagent instances run in Docker or Kubernetes, you can use the POD_NAME or HOSTNAME environment variable as a unique per-vmagent label value via the -remoteWrite.label=vmagent=%{HOSTNAME} command-line flag. See the docs on how to refer to environment variables in VictoriaMetrics components.
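For illustration, a hypothetical Kubernetes container spec fragment passing this flag. The container name, image, remote write URL and config path are placeholders, and the stream aggregation flags themselves are omitted. Kubernetes sets HOSTNAME to the pod name by default, so each replica gets a distinct label value.

```yaml
# Hypothetical fragment of a vmagent pod spec.
containers:
  - name: vmagent
    image: victoriametrics/vmagent
    args:
      - -remoteWrite.url=http://victoriametrics:8428/api/v1/write
      # Each pod adds its own hostname as the "vmagent" label, so output
      # series from different vmagent instances don't collide.
      - -remoteWrite.label=vmagent=%{HOSTNAME}
      # Stream aggregation flags omitted.
```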

Common mistakes #

Put aggregator behind load balancer #

When configuring an aggregation rule, ensure that vmagent receives all the data required to satisfy the match rule. If traffic reaches vmagent through a load balancer, each vmagent may receive only a fraction of the data and produce incomplete aggregates.

If you need to split the load across multiple vmagents, shard the traffic among them consistently by metric names or labels, so all the samples contributing to a given output series reach the same vmagent. For example, see how vmagent can consistently shard data across remote write destinations via the -remoteWrite.shardByURL.labels or -remoteWrite.shardByURL.ignoreLabels command-line flags.
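A sketch of such a setup, assuming a first-level vmagent that receives all the traffic and shards it between two aggregating vmagents. The hostnames vmagent-aggr-0 and vmagent-aggr-1 are hypothetical, and sharding by instance only keeps results complete for rules that aggregate with by: [instance].

```yaml
# Hypothetical args for the first-level (sharding) vmagent.
args:
  - -remoteWrite.url=http://vmagent-aggr-0:8429/api/v1/write
  - -remoteWrite.url=http://vmagent-aggr-1:8429/api/v1/write
  # Spread samples across the URLs above instead of replicating them to each URL.
  - -remoteWrite.shardByURL
  # Series with the same instance label always go to the same aggregating vmagent.
  - -remoteWrite.shardByURL.labels=instance
```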

Create aggregator per each recording rule #

Stream aggregation can be used as an alternative to recording rules. However, creating an aggregation rule for each recording rule can lead to elevated resource usage on vmagent, because every ingested sample must be matched against every configured aggregation rule.

To optimize this, we recommend merging aggregations that differ only in their match expressions. For example, consider the following recording rules:

    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[3m])) BY (instance)
      record: instance:node_cpu:rate:sum
    - expr: sum(rate(node_network_receive_bytes_total[3m])) BY (instance)
      record: instance:node_network_receive_bytes:rate:sum
    - expr: sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)
      record: instance:node_network_transmit_bytes:rate:sum

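These rules can be converted into a single aggregation rule. Its output_relabel_configs section renames outputs such as node_network_receive_bytes_total:3m_by_instance_rate_sum into recording-rule-style names. Note that the resulting names keep the full input metric name, for example instance:node_network_receive_bytes_total:rate:sum rather than instance:node_network_receive_bytes:rate:sum: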

    - match:
        - node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}
        - node_network_receive_bytes_total
        - node_network_transmit_bytes_total
      interval: 3m
      outputs: [rate_sum]
      by:
        - instance
      output_relabel_configs:
        - source_labels: [__name__]
          target_label: __name__
          regex: "(.+):.+"
          replacement: "instance:$1:rate:sum"

Note: a separate aggregator for a particular match expression is justified only when a single aggregator cannot process all the data pushed to it within one aggregation interval.

Use identical -remoteWrite.streamAggr.config for all remote writes #

Each specified -remoteWrite.streamAggr.config aggregation config is processed independently on a separate copy of the data stream. If you want to aggregate incoming data and replicate it across multiple destinations, it is more efficient to use a single global -streamAggr.config instead. Then vmagent performs the aggregation only once and replicates the results to all the -remoteWrite.url destinations.
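A sketch of vmagent arguments for this setup. The storage URLs and the config file path are placeholders.

```yaml
# Hypothetical vmagent args: aggregate once, then replicate to two destinations.
args:
  - -streamAggr.config=/etc/vmagent/stream-aggr.yaml
  - -remoteWrite.url=http://storage-a:8428/api/v1/write
  - -remoteWrite.url=http://storage-b:8428/api/v1/write
  # Less efficient equivalent: the same file passed once per -remoteWrite.url,
  # which aggregates each copy of the stream separately.
  # - -remoteWrite.streamAggr.config=/etc/vmagent/stream-aggr.yaml
  # - -remoteWrite.streamAggr.config=/etc/vmagent/stream-aggr.yaml
```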

Use aggregated metrics like original ones #

Stream aggregation allows keeping original metric names after aggregation by using the keep_metric_names setting. But the “meaning” of aggregated metrics is usually different from that of the original metrics. Make sure that you update queries in your alerting rules and dashboards accordingly if you used the keep_metric_names setting.
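A hypothetical rule showing why existing queries may break. The metric and interval are assumptions for illustration; as far as we know, keep_metric_names is accepted only when the rule has a single item in outputs.

```yaml
# Hypothetical rule: the output keeps the name node_network_receive_bytes_total,
# but each value is now a per-instance rate over 1m, not a raw counter.
# A query such as rate(node_network_receive_bytes_total[5m]) would then apply
# rate() to values that are already rates.
- match: node_network_receive_bytes_total
  interval: 1m
  by: [instance]
  outputs: [rate_sum]
  keep_metric_names: true
```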

Use different deduplication intervals on storage and vmagent #

If the storage uses -dedup.minScrapeInterval but vmagent has no deduplication configured, aggregation results may not match queries on the storage. For example, the result of the sum(rate(foo[1m])) by (instance) query can differ from the rate_sum aggregation result foo:1m_by_instance_rate_sum. This happens because vmagent aggregates all samples, while queries on the storage use deduplicated samples. To avoid this, set -streamAggr.dedupInterval or -remoteWrite.streamAggr.dedupInterval on vmagent to match the storage interval.
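A sketch of matching deduplication settings, written as a hypothetical Docker Compose file. The 30s value, service names and config path are assumptions; use your storage's actual -dedup.minScrapeInterval value, and note that the volume mount for the aggregation config is omitted.

```yaml
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics
    command:
      - -dedup.minScrapeInterval=30s      # storage-side deduplication
  vmagent:
    image: victoriametrics/vmagent
    command:
      - -remoteWrite.url=http://victoriametrics:8428/api/v1/write
      - -streamAggr.config=/etc/vmagent/stream-aggr.yaml
      - -streamAggr.dedupInterval=30s     # same interval as the storage
```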
