High Availability

High availability matters for more than customer-facing software: if the monitoring infrastructure is not highly available, operations staff may not be notified of alerts. High availability must therefore be thought through as carefully for the monitoring stack as for anything else.

Components

VictoriaMetrics operator supports high availability for each component of the monitoring stack.

You can find more details in the High Availability section for resources.

Operator

VictoriaMetrics operator can be safely scaled horizontally, but only one replica processes reconciliation at a time: a leader election mechanism ensures that only one replica is active.

If an operator replica fails, another replica is elected leader and continues the work; operator replication affects how quickly this failover happens.
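
The failover behaviour described above can be illustrated with a small lease-based simulation. This is a toy model written for this page, not the operator's actual implementation: the `Lease` class, the replica names `operator-0` and `operator-1`, the 15-second lease duration and the 5-second retry interval are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lease:
    """Shared lock record (hypothetical stand-in for a coordination lease)."""
    holder: Optional[str] = None
    renewed_at: float = 0.0
    duration: float = 15.0  # assumed: seconds a lease stays valid without renewal


def try_acquire_or_renew(lease: Lease, replica: str, now: float) -> bool:
    """Return True if `replica` holds the lease after this attempt."""
    expired = lease.holder is None or now - lease.renewed_at > lease.duration
    if lease.holder == replica or expired:
        lease.holder = replica
        lease.renewed_at = now
        return True
    return False


def simulate() -> List[Tuple[int, Optional[str]]]:
    """Run two replicas against one lease and crash the leader at t=20."""
    lease = Lease()
    alive = {"operator-0": True, "operator-1": True}
    history = []
    for now in range(0, 60, 5):  # assumed: each replica retries every 5 seconds
        if now == 20:
            alive["operator-0"] = False  # the current leader crashes
        for name, up in alive.items():
            if up:
                try_acquire_or_renew(lease, name, float(now))
        history.append((now, lease.holder))
    return history


for tick, holder in simulate():
    print(f"t={tick:>2}s lease holder: {holder}")
```

In this model the standby replica takes over only once the stale lease has expired, so reconciliation pauses briefly after a crash. With a single replica there is no standby to take the lease, and recovery additionally depends on the pod being rescheduled and started.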

The CRD validation workload is fully distributed among the available operator replicas.

In addition, you can safely use node assignment and distribution features for the operator, such as node selectors, affinity and anti-affinity, topology spread constraints, and taints and tolerations.
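
As a rough sketch of how these scheduling features fit together, the script below builds a fragment of a Deployment spec and prints it as JSON. The field names are standard Kubernetes pod scheduling fields, but the pod label, the `dedicated=monitoring` taint and the replica count are hypothetical, and how such a fragment is applied depends on how the operator was installed.

```python
import json

# Hypothetical label; use whatever labels your operator pods actually carry.
OPERATOR_LABELS = {"app.kubernetes.io/name": "victoria-metrics-operator"}


def operator_scheduling(replicas: int = 2) -> dict:
    """Build a Deployment spec fragment that spreads operator replicas."""
    selector = {"matchLabels": OPERATOR_LABELS}
    return {
        "replicas": replicas,
        "template": {
            "spec": {
                # Node selector: restrict pods to Linux nodes.
                "nodeSelector": {"kubernetes.io/os": "linux"},
                # Anti-affinity: never place two replicas on the same node.
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": selector,
                                "topologyKey": "kubernetes.io/hostname",
                            }
                        ]
                    }
                },
                # Topology spread: prefer an even split across zones.
                "topologySpreadConstraints": [
                    {
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "ScheduleAnyway",
                        "labelSelector": selector,
                    }
                ],
                # Toleration for an assumed taint on dedicated monitoring nodes.
                "tolerations": [
                    {
                        "key": "dedicated",
                        "operator": "Equal",
                        "value": "monitoring",
                        "effect": "NoSchedule",
                    }
                ],
            }
        },
    }


print(json.dumps({"spec": operator_scheduling()}, indent=2))
```

The hard anti-affinity rule keeps a single node failure from taking down every replica, while the softer `ScheduleAnyway` zone constraint avoids leaving pods unschedulable in small clusters.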

Finally, don’t forget to monitor the operator itself.