A Helm chart for Running VMCluster on Multiple Availability Zones
Prerequisites #
Before installing this chart, ensure your environment meets the following requirements:
- Kubernetes cluster - A running Kubernetes cluster with sufficient resources
- Helm - Helm package manager installed and configured
Additional requirements depend on your configuration:
- Persistent storage - Required if you enable persistent volumes for data retention (enabled by default)
- kubectl - Needed for cluster management and troubleshooting
For installation instructions, refer to the official documentation:
Multiple availability zones.
Chart Details #
This chart sets up multiple VictoriaMetrics instances (cluster or single-node version, cluster by default) across multiple availability zones, and provides both global write and read entrypoints.
The default setup is as shown below:

For write:
- vmagent-extra (optional): scrapes external targets and all the components installed by this chart, and sends data to the global write entrypoint.
- vmauth-global-write: global write entrypoint, proxies requests to one of the per-zone vmagents with the `least_loaded` policy.
- vmagent (per-zone): remote-writes data to the availability zones that have `.Values.availabilityZones[*].write.allow` enabled, and buffers data on disk when a zone is unavailable for ingestion.
- vmauth-write-balancer (per-zone): proxies requests to the vminsert instances inside its zone with the `least_loaded` policy.
- vmcluster (per-zone): processes write requests and stores data.
For read:
- vmcluster (per-zone): processes query requests and returns results.
- vmauth-read-balancer (per-zone): proxies requests to the vmselect instances inside its zone with the `least_loaded` policy.
- vmauth-read-proxy (per-zone): uses all the vmauth-read-balancer instances as servers if the zone has `.Values.availabilityZones[*].read.allow` enabled, and always prefers the "local" vmauth-read-balancer to reduce cross-zone traffic, using the `first_available` policy.
- vmauth-global-read: global query entrypoint, proxies requests to one of the per-zone vmauth-read-proxy instances with the `first_available` policy.
- grafana (optional): uses vmauth-global-read as its default datasource.
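In values terms, the per-zone write and read participation described above maps to the `allow` flags. A minimal sketch of the relevant part of values.yaml (zone names are illustrative, other fields omitted):

```yaml
# Sketch: per-zone participation in the global write and read paths.
availabilityZones:
  - name: zone-eu-1
    write:
      allow: true    # receives replicated writes from every zone's vmagent
    read:
      allow: true    # serves queries via its vmauth-read-proxy
  - name: zone-us-1
    write:
      allow: true
    read:
      allow: true
```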
Note: as shown in the topology above, this chart doesn't include components like vmalert or alertmanager by default. You can install them using the victoria-metrics-k8s-stack dependency or as a separate release.
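If you go the dependency route, a sketch of the corresponding Chart.yaml entry (the version constraint is illustrative; pin a concrete version for production):

```yaml
# Sketch: pulling in victoria-metrics-k8s-stack as a chart dependency
# to add components such as vmalert and alertmanager.
dependencies:
  - name: victoria-metrics-k8s-stack
    repository: https://victoriametrics.github.io/helm-charts/
    version: "*"   # illustrative; pin a concrete version
```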
Why use victoria-metrics-distributed chart? #
One of the best practices for running a production Kubernetes cluster is spreading it across multiple availability zones. And apart from the Kubernetes control plane components, we also want to spread our application pods across multiple zones, so they keep serving even if a zone outage happens.
VictoriaMetrics supports data replication natively, which guarantees data availability when some of the vmstorage instances fail. But it doesn't work well if the vmstorage instances are spread across multiple availability zones, since all replicas of a piece of data could end up in a single availability zone and be lost when that zone goes down. To avoid this, vmcluster must be installed in multiple availability zones, each containing a 100% copy of the data. As long as one zone is available, both the global write and read entrypoints keep working without interruption.
How to write data? #
The chart provides vmauth-global-write as the global write entrypoint. It supports the same push-based data ingestion protocols as VictoriaMetrics.
Optionally, you can push data to any of the per-zone vmagents, and they will replicate the received data across zones.
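For example, a Prometheus `remote_write` section targeting the global write entrypoint could look like the sketch below. The service name follows the `vmauth-vmauth-global-write-$ReleaseName-vm-distributed` pattern used elsewhere in this document; substitute your actual release name:

```yaml
# Sketch: Prometheus remote_write pointing at the global write entrypoint.
# Replace RELEASENAME with your Helm release name.
remote_write:
  - url: http://vmauth-vmauth-global-write-RELEASENAME-vm-distributed:8427/api/v1/write
```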
How to query data? #
The chart provides vmauth-global-read as the global read entrypoint. It picks the first available zone (see the
first_available
policy) as its preferred datasource and automatically switches to the next zone if the first one is unavailable; check
vmauth first_available
for more details.
If you have services like
vmalert
or Grafana deployed in each zone, configure them to use the local vmauth-read-proxy. The per-zone vmauth-read-proxy always prefers the "local" vmcluster for queries, which reduces cross-zone traffic.
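As an illustration, a standard Grafana datasource provisioning file pointing at the zone-local proxy might look like this. The service name is hypothetical and depends on your release and zone names:

```yaml
# Sketch: Grafana provisioned datasource using the zone-local read proxy.
# The url below is a hypothetical service name; adjust to your release.
apiVersion: 1
datasources:
  - name: VictoriaMetrics (local zone)
    type: prometheus
    access: proxy
    isDefault: true
    url: http://vmauth-vmauth-read-proxy-zone-eu-1:8427
```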
You can also use other proxies, such as a Kubernetes Service that supports Topology Aware Routing, as the global read entrypoint.
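A sketch of such a Service, assuming Kubernetes 1.27+ (which supports the `service.kubernetes.io/topology-mode` annotation); the name and selector labels are hypothetical:

```yaml
# Sketch: a Service with Topology Aware Routing as a global read entrypoint.
# Name and selector are hypothetical; match them to your read backends.
apiVersion: v1
kind: Service
metadata:
  name: global-read
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: vmauth-read-proxy
  ports:
    - port: 8427
      targetPort: 8427
```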
What happens if a zone outage happens? #
If availability zone zone-eu-1 experiences an outage, vmauth-global-write and vmauth-global-read keep working without interruption:
- vmauth-global-write automatically stops proxying write requests to zone-eu-1;
- vmauth-global-read and vmauth-read-proxy automatically stop proxying read requests to zone-eu-1;
- vmagent on zone-us-1 fails to send data to the zone-eu-1 vmauth-write-balancer and starts to buffer data on disk (unless `-remoteWrite.disableOnDiskQueue` is specified, which is not recommended for this topology).

To keep data complete for all the availability zones, make sure vmagent has enough disk space for the buffer; see this doc for size recommendations.
And to avoid getting incomplete responses from zone-eu-1 after it recovers from the outage, check whether the persistent queue of the vmagent on zone-us-1 has been drained. If not, remove zone-eu-1 from serving queries by setting `.Values.availabilityZones.{zone-eu-1}.read.allow=false`, and change it back after confirming all data has been restored.
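The temporary exclusion can be sketched in values.yaml as follows (only the zone-eu-1 entry is shown; other fields omitted):

```yaml
# Sketch: exclude a recovering zone from queries until the
# vmagent persistent queue targeting it has been drained.
availabilityZones:
  - name: zone-eu-1
    read:
      allow: false   # flip back to true once the queue is drained
```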
How to use multitenancy? #
By default with
multitenancy
, all the data written to vmauth-global-write belongs to tenant 0. To write data to different tenants, set `.Values.enableMultitenancy=true` and create new tenant users for vmauth-global-write.
For example, to write data to tenant 1088, follow these steps:
- Create a tenant VMUser for vmauth vmauth-global-write to use:
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMUser
metadata:
  name: tenant-1088-rw
  labels:
    tenant-test: "true"
spec:
  targetRefs:
  - static:
      ## list all the zone vmagent here
      urls:
      - "http://vmagent-vmagent-zone-eu-1:8429"
      - "http://vmagent-vmagent-zone-us-1:8429"
    paths:
    - "/api/v1/write"
    - "/prometheus/api/v1/write"
    - "/write"
    - "/api/v1/import"
    - "/api/v1/import/.+"
    target_path_suffix: /insert/1088/
  username: tenant-1088
  password: secret
```
- Add an extra VMUser selector in vmauth vmauth-global-write:

```yaml
spec:
  userSelector:
    matchLabels:
      tenant-test: "true"
```
- Send data to vmauth-global-write using the credentials above. Example command using vmagent:

```console
/path/to/vmagent -remoteWrite.url=http://vmauth-vmauth-global-write-$ReleaseName-vm-distributed:8427/prometheus/api/v1/write -remoteWrite.basicAuth.username=tenant-1088 -remoteWrite.basicAuth.password=secret
```
How to install #
Access a Kubernetes cluster.
Setup chart repository (can be omitted for OCI repositories) #
Add the chart helm repository with the following commands:

```console
helm repo add vm https://victoriametrics.github.io/helm-charts/

helm repo update
```
List the versions of the vm/victoria-metrics-distributed chart available for installation:

```console
helm search repo vm/victoria-metrics-distributed -l
```
Install victoria-metrics-distributed chart #
Export the default values of the victoria-metrics-distributed chart to the file values.yaml:

For HTTPS repository:

```console
helm show values vm/victoria-metrics-distributed > values.yaml
```

For OCI repository:

```console
helm show values oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-distributed > values.yaml
```
Change the values in the values.yaml file according to the needs of your environment.
Consider setting `.Values.nameOverride` to a short value like `vmd` to avoid hitting the 63-character limit on resource names.
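For example, in values.yaml:

```yaml
# Keeps generated resource names well under the 63-character limit.
nameOverride: vmd
```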
Test the installation with the command:

For HTTPS repository:

```console
helm install vmd vm/victoria-metrics-distributed -f values.yaml -n NAMESPACE --debug
```

For OCI repository:

```console
helm install vmd oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-distributed -f values.yaml -n NAMESPACE --debug
```
Install the chart with the command:

For HTTPS repository:

```console
helm install vmd vm/victoria-metrics-distributed -f values.yaml -n NAMESPACE
```

For OCI repository:

```console
helm install vmd oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-distributed -f values.yaml -n NAMESPACE
```
Get the list of pods by running this command:

```console
kubectl get pods -A | grep 'vmd'
```
Get the application by running this command:

```console
helm list -f vmd -n NAMESPACE
```
See the version history of the vmd application with the command:

```console
helm history vmd -n NAMESPACE
```
How to upgrade #
In order to keep serving queries and ingestion while upgrading component versions or changing configuration, it's recommended to perform maintenance on one availability zone at a time.
First, perform the update on availability zone zone-eu-1:

1. Remove zone-eu-1 from serving queries by setting `.Values.availabilityZones.{zone-eu-1}.read.allow=false`;
2. Run `helm upgrade vm-dis -n NAMESPACE` with the updated configuration for zone-eu-1 in values.yaml;
3. Wait for all the components in zone zone-eu-1 to be running;
4. Wait until the zone-us-1 vmagent persistent queue for zone-eu-1 has been drained, then add zone-eu-1 back to serving queries by setting `.Values.availabilityZones.{zone-eu-1}.read.allow=true`.

Then, perform the update on availability zone zone-us-1 with the same steps 1~4.
Upgrade to 0.13.0 #
The introduction of VMCluster's
requestsLoadBalancer
allowed simplifying the distributed chart setup by removing the VMAuth CRs for read and write load balancing. Some parameters are no longer needed:
- removed `availabilityZones[*].write.vmauth`
- removed `availabilityZones[*].read.perZone.vmauth`
- removed `zoneTpl.write.vmauth`
- removed `zoneTpl.read.perZone.vmauth`
- moved `zoneTpl.read.crossZone.vmauth` to `zoneTpl.read.vmauth`
- moved `availabilityZones[*].read.crossZone.vmauth` to `availabilityZones[*].read.vmauth`
Upgrade to 0.5.0 #
This release was refactored; names of the parameters were changed:
- `vmauthIngestGlobal` was changed to `write.global.vmauth`
- `vmauthQueryGlobal` was changed to `read.global.vmauth`
- `availabilityZones[*].allowIngest` was changed to `availabilityZones[*].write.allow`
- `availabilityZones[*].allowRead` was changed to `availabilityZones[*].read.allow`
- `availabilityZones[*].nodeSelector` was moved to `availabilityZones[*].common.spec.nodeSelector`
- `availabilityZones[*].extraAffinity` was moved to `availabilityZones[*].common.spec.affinity`
- `availabilityZones[*].topologySpreadConstraints` was moved to `availabilityZones[*].common.spec.topologySpreadConstraints`
- `availabilityZones[*].vmauthIngest` was moved to `availabilityZones[*].write.vmauth`
- `availabilityZones[*].vmauthQueryPerZone` was moved to `availabilityZones[*].read.perZone.vmauth`
- `availabilityZones[*].vmauthCrossAZQuery` was moved to `availabilityZones[*].read.crossZone.vmauth`
Example:
If before the upgrade you had the configuration below:

```yaml
vmauthIngestGlobal:
  spec:
    extraArgs:
      discoverBackendIPs: "true"
vmauthQueryGlobal:
  spec:
    extraArgs:
      discoverBackendIPs: "true"
availabilityZones:
  - name: zone-eu-1
    vmauthIngest:
      spec:
        extraArgs:
          discoverBackendIPs: "true"
    vmcluster:
      spec:
        retentionPeriod: "14"
```
after the upgrade it will look like this:

```yaml
write:
  global:
    vmauth:
      spec:
        extraArgs:
          discoverBackendIPs: "true"
read:
  global:
    vmauth:
      spec:
        extraArgs:
          discoverBackendIPs: "true"
availabilityZones:
  - name: zone-eu-1
    write:
      vmauth:
        spec:
          extraArgs:
            discoverBackendIPs: "true"
    vmcluster:
      spec:
        retentionPeriod: "14"
```
How to uninstall #
Remove the application with the command:

```console
helm uninstall vmd -n NAMESPACE
```
Documentation of Helm Chart #
Install helm-docs following the instructions in this
tutorial
.
Generate docs with the helm-docs command:

```console
cd charts/victoria-metrics-distributed

helm-docs
```
The markdown generation is entirely go template driven. The tool parses metadata from charts and generates a number of sub-templates that can be referenced in a template file (by default README.md.gotmpl). If no template file is provided, the tool has a default internal template that will generate a reasonably formatted README.
Parameters #
The following table lists the configurable parameters of the chart and their default values.
Change the values in the victoria-metrics-distributed/values.yaml file according to the needs of your environment.