---
sort: 7
---

# High Availability

High availability matters not only for customer-facing software: if the monitoring infrastructure is not highly available, operations staff risk not being notified of alerts. High availability must therefore be thought through just as carefully for the monitoring stack as for anything else.

## VMAgent

To run VMAgent in a highly available manner, first configure deduplication in Victoria Metrics ([doc](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/Single-server-VictoriaMetrics.md#deduplication)), then increase the number of replicas for VMAgent.

Create a `VMSingle` with the dedup flag:

```yaml
cat <
...
8480/TCP 69s
vmselect-example-vmcluster-persistent    ClusterIP   None   <none>   8481/TCP                     79s
vmstorage-example-vmcluster-persistent   ClusterIP   None   <none>   8482/TCP,8400/TCP,8401/TCP   85s
```

Now you can connect vmagent to vminsert and vmalert to vmselect.

> NOTE: do not forget to create RBAC for vmagent.

```yaml
cat << EOF | kubectl apply -f -
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: example-vmagent
spec:
  serviceScrapeNamespaceSelector: {}
  serviceScrapeSelector: {}
  podScrapeNamespaceSelector: {}
  podScrapeSelector: {}
  # Add fields here
  replicaCount: 1
  remoteWrite:
    - url: "http://vminsert-example-vmcluster-persistent.default.svc.cluster.local:8480/insert/0/prometheus/api/v1/write"
EOF
```

Config for vmalert:

```yaml
cat << EOF | kubectl apply -f -
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: example-vmalert
spec:
  # Add fields here
  replicaCount: 1
  datasource:
    url: "http://vmselect-example-vmcluster-persistent.default.svc.cluster.local:8481/select/0/prometheus"
  notifier:
    url: "http://alertmanager-operated.default.svc:9093"
  evaluationInterval: "10s"
  ruleSelector: {}
EOF
```

## Alertmanager

The final step of the high availability scheme is Alertmanager. When an alert triggers, it is actually fired against *all* instances of an Alertmanager cluster.
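
Sending alerts to every instance can be sketched with the `notifiers` list of `VMAlert`, using one entry per Alertmanager pod. This is a minimal sketch, not a definitive configuration: it assumes a two-replica Alertmanager in the `default` namespace, and the pod and service names in the URLs (`vmalertmanager-example-alertmanager-*`) are hypothetical and must be replaced with the names that actually exist in your cluster.

```yaml
cat << EOF | kubectl apply -f -
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: example-vmalert
spec:
  datasource:
    url: "http://vmselect-example-vmcluster-persistent.default.svc.cluster.local:8481/select/0/prometheus"
  # One entry per Alertmanager instance, so each alert reaches all of them.
  # Pod and service names below are assumptions for illustration only.
  notifiers:
    - url: "http://vmalertmanager-example-alertmanager-0.vmalertmanager-example-alertmanager.default.svc:9093"
    - url: "http://vmalertmanager-example-alertmanager-1.vmalertmanager-example-alertmanager.default.svc:9093"
  evaluationInterval: "10s"
  ruleSelector: {}
EOF
```

Addressing each pod individually, rather than a single load-balanced service URL, is what lets every Alertmanager instance receive the alert.
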
The Alertmanager, starting with the `v0.5.0` release, ships with a high availability mode. It implements a gossip protocol to synchronize instances of an Alertmanager cluster regarding notifications that have been sent out, to prevent duplicate notifications. It is an AP (available and partition tolerant) system. Being an AP system means that notifications are guaranteed to be sent at least once. The Victoria Metrics Operator ensures that Alertmanager clusters are properly configured to run highly available on Kubernetes.
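
As an illustration, a highly available Alertmanager could be requested from the operator by raising the replica count on a `VMAlertmanager` resource. This is a minimal sketch under stated assumptions: the resource name `example-alertmanager` and the secret name `alertmanager-config` are hypothetical, and the secret is assumed to already exist and hold a valid Alertmanager configuration.

```yaml
cat << EOF | kubectl apply -f -
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanager
metadata:
  name: example-alertmanager
spec:
  # Several replicas form one Alertmanager cluster that synchronizes
  # notification state through the gossip protocol described above.
  replicaCount: 3
  # Hypothetical secret name; it must contain the Alertmanager configuration.
  configSecret: alertmanager-config
EOF
```

Three replicas is only an example value; any count above one gives the cluster more than one instance that can receive and send notifications.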