
Victoria Metrics Operator

Prerequisites #

  • Install the following packages: git, kubectl, helm, helm-docs. See this tutorial.
  • PersistentVolume (PV) support on the underlying infrastructure.

ArgoCD issues #

When the operator is deployed with ArgoCD and without cert-manager (.Values.admissionWebhooks.certManager.enabled: false), the webhook certificates are re-rendered on every sync because ArgoCD does not respect the Helm lookup function. To prevent this, update spec.syncPolicy and spec.ignoreDifferences of your operator Application as follows:

apiVersion: argoproj.io/v1alpha1
kind: Application
...
spec:
  ...
  destination:
    ...
    namespace: <operator-namespace>
  ...
  syncPolicy:
    syncOptions:
    # https://argo-cd.readthedocs.io/en/stable/user-guide/sync-options/#respect-ignore-difference-configs
    # ArgoCD must also ignore the difference during the apply stage;
    # otherwise it'll silently override changes and cause a problem
    - RespectIgnoreDifferences=true
  ignoreDifferences:
    - group: ""
      kind: Secret
      name: <fullname>-validation
      namespace: <operator-namespace>
      jsonPointers:
        - /data
    - group: admissionregistration.k8s.io
      kind: ValidatingWebhookConfiguration
      name: <fullname>-admission
      jqPathExpressions:
      - '.webhooks[]?.clientConfig.caBundle'

where <fullname> is the output of {{ include "vm-operator.fullname" }} for your setup.
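To confirm that the names used in ignoreDifferences match what is actually deployed, both resources can be looked up directly. The sketch below only prints the kubectl commands for review. check_webhook_names is a hypothetical helper, not part of the chart, and the fullname and namespace arguments are whatever your release uses.

```shell
# check_webhook_names is a hypothetical helper; it prints commands, it does not run them.
# $1: the <fullname> of your release, $2: the operator namespace
check_webhook_names() {
  fullname=$1
  namespace=$2
  # Secret that holds the webhook certificate
  printf 'kubectl get secret %s-validation -n %s\n' "$fullname" "$namespace"
  # Cluster-scoped webhook configuration that carries the caBundle
  printf 'kubectl get validatingwebhookconfiguration %s-admission\n' "$fullname"
}

# Example with placeholder values: check_webhook_names my-fullname my-namespace | sh
```

If either lookup reports that the resource is not found, the fullname passed in is probably not the one rendered for your release.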

Upgrade guide #

An issue with Helm CRD handling was discovered during a release. When upgrading from a version lower than 0.1.3, you have two options:

  1. Use Helm management for CRDs, enabled by default.
  2. Use your own management system; in that case add the flag --set createCRD=false.

If you choose Helm management, complete the following steps before the upgrade:

  1. Define the namespace and Helm release name variables:

export NAMESPACE=default
export RELEASE_NAME=operator

  2. Execute the kubectl commands:

kubectl get crd | grep victoriametrics.com | awk '{print $1}' | xargs -I {} kubectl label crd {} app.kubernetes.io/managed-by=Helm --overwrite
kubectl get crd | grep victoriametrics.com | awk '{print $1}' | xargs -I {} kubectl annotate crd {} meta.helm.sh/release-namespace="$NAMESPACE" meta.helm.sh/release-name="$RELEASE_NAME" --overwrite

  3. Run the helm upgrade command.
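Because the label and annotate commands above modify cluster-wide CRDs, it can help to preview them first. The following is a minimal sketch of that idea: it reads CRD names on stdin and prints the commands instead of executing them. adopt_crd_commands is a hypothetical helper name, not something shipped with the chart.

```shell
# adopt_crd_commands is a hypothetical helper; it prints commands, it does not run them.
# stdin: CRD names, one per line. $1: release namespace, $2: release name
adopt_crd_commands() {
  namespace=$1
  release=$2
  # Keep only CRDs from the victoriametrics.com API groups.
  grep 'victoriametrics\.com$' | while read -r crd; do
    printf 'kubectl label crd %s app.kubernetes.io/managed-by=Helm --overwrite\n' "$crd"
    printf 'kubectl annotate crd %s meta.helm.sh/release-namespace=%s meta.helm.sh/release-name=%s --overwrite\n' \
      "$crd" "$namespace" "$release"
  done
}

# Preview:
#   kubectl get crd --no-headers | awk '{print $1}' | adopt_crd_commands "$NAMESPACE" "$RELEASE_NAME"
# Apply after review by appending: | sh
```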

Chart Details #

This chart will do the following:

  • Roll out the Victoria Metrics operator

How to install #

Access a Kubernetes cluster.

Setup chart repository (can be omitted for OCI repositories) #

Add the chart Helm repository with the following commands:

helm repo add vm https://victoriametrics.github.io/helm-charts/

helm repo update

List the versions of the vm/victoria-metrics-operator chart available for installation:

helm search repo vm/victoria-metrics-operator -l

Install victoria-metrics-operator chart #

Export the default values of the victoria-metrics-operator chart to the file values.yaml:

  • For HTTPS repository

    helm show values vm/victoria-metrics-operator > values.yaml
    
  • For OCI repository

    helm show values oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-operator > values.yaml
    

Adjust the values in the values.yaml file to the needs of your environment.

Test the installation with the command:

  • For HTTPS repository

    helm install vmo vm/victoria-metrics-operator -f values.yaml -n NAMESPACE --debug --dry-run
    
  • For OCI repository

    helm install vmo oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-operator -f values.yaml -n NAMESPACE --debug --dry-run
    

Install the chart with the command:

  • For HTTPS repository

    helm install vmo vm/victoria-metrics-operator -f values.yaml -n NAMESPACE
    
  • For OCI repository

    helm install vmo oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-operator -f values.yaml -n NAMESPACE
    

List the pods by running this command:

kubectl get pods -A | grep 'vmo'

Get the application by running this command:

helm list -f vmo -n NAMESPACE

View the release history of the vmo application with the command:

helm history vmo -n NAMESPACE
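The HTTPS and OCI variants above differ only in the chart reference, so the sequence can be parameterized. The sketch below prints the dry-run, install, and history commands for one repository type. chart_ref and install_commands are hypothetical helper names, and values.yaml is assumed to be in the current directory.

```shell
# chart_ref and install_commands are hypothetical helpers, not part of the chart.
# chart_ref: map a repository type (https or oci) to the chart reference used above.
chart_ref() {
  case $1 in
    https) printf '%s\n' 'vm/victoria-metrics-operator' ;;
    oci)   printf '%s\n' 'oci://ghcr.io/victoriametrics/helm-charts/victoria-metrics-operator' ;;
    *)     printf 'unknown repository type: %s\n' "$1" >&2; return 1 ;;
  esac
}

# install_commands: print the install sequence. $1: https or oci, $2: namespace
install_commands() {
  ref=$(chart_ref "$1") || return 1
  namespace=$2
  printf 'helm install vmo %s -f values.yaml -n %s --debug --dry-run\n' "$ref" "$namespace"
  printf 'helm install vmo %s -f values.yaml -n %s\n' "$ref" "$namespace"
  printf 'helm history vmo -n %s\n' "$namespace"
}

# Example with a placeholder namespace: install_commands oci my-namespace
```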

Validation webhook #

The operator can validate created resources. For now, cert-manager (https://cert-manager.io/docs/) is required for easy certificate management.

admissionWebhooks:
  enabled: true
  # what to do when the operator is not available to validate a request
  policy: Fail
  certManager:
    # enables cert creation and injection by cert-manager
    enabled: true
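The same settings can be passed on the command line. As a rough sketch, the helper below prints a check for the cert-manager Certificate CRD followed by a helm command that sets both values. webhook_commands is a hypothetical name, and the release name vmo and the values.yaml file follow the install section of this document.

```shell
# webhook_commands is a hypothetical helper; it prints commands, it does not run them.
# $1: chart reference (for example vm/victoria-metrics-operator), $2: namespace
webhook_commands() {
  chart=$1
  namespace=$2
  # cert-manager installs this CRD; the lookup fails if cert-manager is absent.
  printf 'kubectl get crd certificates.cert-manager.io\n'
  printf 'helm upgrade --install vmo %s -n %s -f values.yaml --set admissionWebhooks.enabled=true --set admissionWebhooks.certManager.enabled=true\n' \
    "$chart" "$namespace"
}

# Example with a placeholder namespace: webhook_commands vm/victoria-metrics-operator my-namespace
```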

How to uninstall #

Remove the application with the command:

helm uninstall vmo -n NAMESPACE

Documentation of Helm Chart #

Install helm-docs following the instructions in this tutorial.

Generate docs with helm-docs command.

cd charts/victoria-metrics-operator

helm-docs

The markdown generation is entirely go template driven. The tool parses metadata from charts and generates a number of sub-templates that can be referenced in a template file (by default README.md.gotmpl). If no template file is provided, the tool has a default internal template that will generate a reasonably formatted README.

Disabling automatic ServiceAccount token mount #

In some cases the automatic ServiceAccount token mount must be disabled for hardening reasons. To disable it, set the following values:

serviceAccount:
  automountServiceAccountToken: false

extraVolumes:
  - name: operator
    projected:
      sources:
        - downwardAPI:
            items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
                path: namespace
        - configMap:
            name: kube-root-ca.crt
        - serviceAccountToken:
            expirationSeconds: 7200
            path: token

extraVolumeMounts:
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: operator

This configuration disables the automatic ServiceAccount token mount and mounts the token explicitly.

Enable hostNetwork on operator #

When running managed Kubernetes such as EKS with a custom CNI solution like Cilium or Calico, the EKS control plane cannot communicate with the CNI’s pod CIDR. In that scenario, the webhook service (that is, the operator) needs to run with hostNetwork so that it shares the node’s network namespace.

hostNetwork: true
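As a sketch of applying and then checking this setting, the helper below prints a helm upgrade command that sets hostNetwork and a kubectl command that shows the hostNetwork field of the pods in the namespace. host_network_commands is a hypothetical name, and the release name vmo and values.yaml follow the install section.

```shell
# host_network_commands is a hypothetical helper; it prints commands, it does not run them.
# $1: chart reference, $2: namespace
host_network_commands() {
  chart=$1
  namespace=$2
  printf 'helm upgrade vmo %s -n %s -f values.yaml --set hostNetwork=true\n' "$chart" "$namespace"
  # Lists each pod with the value of spec.hostNetwork.
  printf 'kubectl get pods -n %s -o custom-columns=NAME:.metadata.name,HOST_NETWORK:.spec.hostNetwork\n' "$namespace"
}

# Example with a placeholder namespace: host_network_commands vm/victoria-metrics-operator my-namespace
```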

Parameters #

The following table lists the configurable parameters of the chart and their default values.

Adjust the values in the victoria-metrics-operator/values.yaml file to the needs of your environment.

Each entry lists the key with its type, the default value, and a description.

admissionWebhooks (object)
certManager:
    ca:
        commonName: ca.validation.victoriametrics
        duration: 63800h0m0s
    cert:
        duration: 45800h0m0s
    enabled: false
    issuer: {}
enabled: true
enabledCRDValidation:
    vlogs: true
    vmagent: true
    vmalert: true
    vmalertmanager: true
    vmalertmanagerconfig: true
    vmauth: true
    vmcluster: true
    vmrule: true
    vmsingle: true
    vmuser: true
keepTLSSecret: true
policy: Fail
tls:
    caCert: null
    cert: null
    key: null

Configures resource validation

admissionWebhooks.certManager (object)
ca:
    commonName: ca.validation.victoriametrics
    duration: 63800h0m0s
cert:
    duration: 45800h0m0s
enabled: false
issuer: {}

Enables a custom CA bundle if you are not using cert-manager. With a custom CA, you have to create the secret {chart-name}-validation with the keys tls.key, tls.crt, ca.crt

admissionWebhooks.certManager.ca (object)
commonName: ca.validation.victoriametrics
duration: 63800h0m0s

Certificate Authority parameters

admissionWebhooks.certManager.cert (object)
duration: 45800h0m0s

Certificate parameters

admissionWebhooks.certManager.enabled (bool)
false

Enables cert creation and injection by cert-manager.

admissionWebhooks.certManager.issuer (object)
{}

If needed, provide your own issuer. The operator creates a self-signed one if empty.

admissionWebhooks.enabled (bool)
true

Enables validation webhook.

admissionWebhooks.policy (string)
Fail

What to do when the operator is not available to validate a request.

affinity (object)
{}

Pod affinity

annotations (object)
{}

Annotations to be added to all resources

crds.cleanup.enabled (bool)
false

Tells helm to clean up all the vm resources under this release’s namespace when uninstalling

crds.cleanup.image (object)
pullPolicy: IfNotPresent
repository: bitnami/kubectl
tag: ""

Image configuration for CRD cleanup Job

crds.cleanup.resources (object)
limits:
    cpu: 500m
    memory: 256Mi
requests:
    cpu: 100m
    memory: 56Mi

Cleanup hook resources

crds.enabled (bool)
true

Manages CRD creation. Disables CRD creation only in combination with crds.plain: false due to a limitation of Helm dependency conditions

crds.plain (bool)
false

Controls whether plain or templated CRDs are created. With this option set to false, all CRDs are rendered from templates. With this option set to true, all CRDs are immutable and require a manual upgrade.

env (list)
[]

Extra settings for the operator deployment. Full list here

envFrom (list)
[]

Specify alternative source for env variables

extraArgs (object)
{}

Operator container additional commandline arguments

extraContainers (list)
[]

Extra containers to run in a pod with operator

extraHostPathMounts (list)
[]

Additional hostPath mounts

extraLabels (object)
{}

Labels to be added to all resources

extraObjects (list)
[]

Add extra specs dynamically to this chart

extraVolumeMounts (list)
[]

Extra Volume Mounts for the container

extraVolumes (list)
[]

Extra Volumes for the pod

fullnameOverride (string)
""

Overrides the full name of server component resources

global.cluster.dnsDomain (string)
cluster.local.

K8s cluster domain suffix, used for building storage pods’ FQDN. Details are here

global.compatibility (object)
openshift:
    adaptSecurityContext: auto

Openshift security context compatibility configuration

global.image.registry (string)
""

Image registry, that can be shared across multiple helm charts

global.imagePullSecrets (list)
[]

Image pull secrets, that can be shared across multiple helm charts

hostNetwork (bool)
false

Enable hostNetwork on operator deployment

image (object)
pullPolicy: IfNotPresent
registry: ""
repository: victoriametrics/operator
tag: ""
variant: ""

Operator image configuration

image.pullPolicy (string)
IfNotPresent

Image pull policy

image.registry (string)
""

Image registry

image.repository (string)
victoriametrics/operator

Image repository

image.tag (string)
""

Image tag; overrides Chart.AppVersion

imagePullSecrets (list)
[]

Secret to pull images

lifecycle (object)
{}

Operator lifecycle. See this article for details.

logLevel (string)
info

VM operator log level. Possible values: info and error.

nameOverride (string)
""

Override chart name

nodeSelector (object)
{}

Pod’s node selector. Details are here

operator.disable_prometheus_converter (bool)
false

By default, operator converts prometheus-operator objects.

operator.enable_converter_ownership (bool)
false

Enables owner references for converted prometheus-operator objects; the corresponding victoria-metrics objects are removed when the Prometheus one is deleted.

operator.prometheus_converter_add_argocd_ignore_annotations (bool)
false

Adds compare-options and sync-options annotations to Prometheus objects converted by the operator so that they work properly with ArgoCD

operator.useCustomConfigReloader (bool)
false

Enables custom config-reloader, bundled with operator. It should reduce vmagent and vmauth config sync-time and make it predictable.

podDisruptionBudget (object)
enabled: false
labels: {}

See kubectl explain poddisruptionbudget.spec for more or check these docs

podLabels (object)
{}

Extra labels for Pods only

podSecurityContext (object)
enabled: true

Pod’s security context. Details are here

priorityClassName (string)
""

Name of Priority Class

probe.liveness (object)
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 15
tcpSocket:
    port: probe
timeoutSeconds: 5

Liveness probe

probe.readiness (object)
failureThreshold: 3
httpGet:
    port: probe
initialDelaySeconds: 5
periodSeconds: 15
timeoutSeconds: 5

Readiness probe

probe.startup (object)
{}

Startup probe

rbac.aggregatedClusterRoles (object)
enabled: true
labels:
    admin:
        rbac.authorization.k8s.io/aggregate-to-admin: "true"
    view:
        rbac.authorization.k8s.io/aggregate-to-view: "true"

Create aggregated clusterRoles for CRD readonly and admin permissions

rbac.aggregatedClusterRoles.labels (object)
admin:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
view:
    rbac.authorization.k8s.io/aggregate-to-view: "true"

Labels attached to the corresponding clusterRole

rbac.create (bool)
true

Specifies whether the RBAC resources should be created

replicaCount (int)
1

Number of operator replicas

resources (object)
{}

Resource object

securityContext (object)
enabled: true

Security context to be added to server pods

service.annotations (object)
{}

Service annotations

service.clusterIP (string)
""

Service ClusterIP

service.externalIPs (string)
""

Service external IPs. Check here for details

service.externalTrafficPolicy (string)
""

Service external traffic policy. Check here for details

service.healthCheckNodePort (string)
""

Health check node port for a service. Check here for details

service.ipFamilies (list)
[]

List of service IP families. Check here for details.

service.ipFamilyPolicy (string)
""

Service IP family policy. Check here for details.

service.labels (object)
{}

Service labels

service.loadBalancerIP (string)
""

Service load balancer IP

service.loadBalancerSourceRanges (list)
[]

Load balancer source range

service.servicePort (int)
8080

Service port

service.type (string)
ClusterIP

Service type

service.webhookPort (int)
9443

Service webhook port

serviceAccount.automountServiceAccountToken (bool)
true

Whether to automount the service account token. Note that token needs to be mounted manually if this is disabled.

serviceAccount.create (bool)
true

Specifies whether a service account should be created

serviceAccount.name (string)
""

The name of the service account to use. If not set and create is true, a name is generated using the fullname template

serviceMonitor (object)
annotations: {}
basicAuth: {}
enabled: false
extraLabels: {}
interval: ""
relabelings: []
scheme: ""
scrapeTimeout: ""
tlsConfig: {}

Configures monitoring with serviceScrape. VMServiceScrape must be pre-installed

terminationGracePeriodSeconds (int)
30

Graceful pod termination timeout. See this article for details.

tolerations (list)
[]

Array of toleration objects. Spec is here

topologySpreadConstraints (list)
[]

Pod Topology Spread Constraints. Spec is here

watchNamespaces (list)
[]

By default, the operator watches all namespaces. To override this behavior, specify the namespaces to watch. The operator supports watching multiple namespaces.
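List values such as watchNamespaces can be set on the command line with Helm's {a,b} list syntax. The sketch below builds that flag from a list of namespace names. watch_namespaces_flag is a hypothetical helper, and the namespace names in the example are placeholders.

```shell
# watch_namespaces_flag is a hypothetical helper, not part of the chart.
# Arguments: one or more namespace names. Prints a --set flag for helm.
watch_namespaces_flag() {
  if [ "$#" -eq 0 ]; then
    printf '%s\n' 'at least one namespace is required' >&2
    return 1
  fi
  list=$1
  shift
  # Join the remaining names with commas.
  for ns in "$@"; do
    list="$list,$ns"
  done
  printf '%s\n' "--set watchNamespaces={$list}"
}

# Example with placeholder namespaces:
#   helm upgrade vmo vm/victoria-metrics-operator -n NAMESPACE -f values.yaml $(watch_namespaces_flag ns-a ns-b)
```

When typing the flag by hand in bash, quote it (for example --set "watchNamespaces={ns-a,ns-b}"), because an unquoted brace list is subject to the shell's brace expansion before helm sees it.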