Manually deploy Prometheus and an Alertmanager cluster as StatefulSets in Kubernetes, using a StorageClass to persist data.
This post uses a StorageClass to persist data while building a federated Prometheus cluster out of StatefulSets. For long-term storage there are many options, such as Thanos, M3DB, InfluxDB, and VictoriaMetrics; choose according to your own needs. The specifics of data persistence will be covered in detail later.
To deploy a Prometheus that is reachable from outside the cluster: first create the Namespace Prometheus lives in, then the RBAC rules it uses, then a ConfigMap holding its configuration file. After that, create a Service bound to a stable cluster address, create the StatefulSet that runs the stateful Prometheus pods, and finally create an Ingress so Prometheus can be reached via an external domain.
If your Kubernetes version is fairly old, consider upgrading to make testing easier; the sealos installer can bring up a highly available cluster with a single command. Whether to also deploy kuboard depends on your own needs.
Environment
My local environment was set up with the sealos one-command installer, mainly for ease of testing.
| OS | Kubernetes | HostName | IP | Service |
| --- | --- | --- | --- | --- |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m1 | 192.168.1.151 | node-exporter prometheus-federate-0 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m2 | 192.168.1.152 | node-exporter grafana alertmanager-0 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-m3 | 192.168.1.150 | node-exporter alertmanager-1 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node1 | 192.168.1.153 | node-exporter prometheus-0 kube-state-metrics |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node2 | 192.168.1.154 | node-exporter prometheus-1 |
| Ubuntu 18.04 | 1.17.7 | sealos-k8s-node3 | 192.168.1.155 | node-exporter prometheus-2 |
```
# Label the masters and nodes
# prometheus
kubectl label node sealos-k8s-node1 k8s-app=prometheus
kubectl label node sealos-k8s-node2 k8s-app=prometheus
kubectl label node sealos-k8s-node3 k8s-app=prometheus
# federate
kubectl label node sealos-k8s-m1 k8s-app=prometheus-federate
# alertmanager
kubectl label node sealos-k8s-m2 k8s-app=alertmanager
kubectl label node sealos-k8s-m3 k8s-app=alertmanager
# Create the deployment directories
mkdir /data/manual-deploy/ && cd /data/manual-deploy/
mkdir alertmanager grafana ingress-nginx kube-state-metrics node-exporter prometheus
```
Deploy Prometheus
Create the Prometheus StorageClass manifest
```
cat prometheus-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
Create the PV manifests for that StorageClass; nodeAffinity pins each volume to a specific node.
```
# On each node that will run a Prometheus pod, create the data
# directory and hand it to uid 65534 (nobody), which Prometheus runs as
mkdir /data/prometheus
chown -R 65534:65534 /data/prometheus
cat prometheus-data-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node2
---          
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-2
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node3
```
Create the Prometheus RBAC manifest.
```
cat prometheus-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1 # API version
kind: ClusterRole # resource type
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: # resources it may read
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus # choose your own name
  namespace: kube-system # namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef: # the ClusterRole to bind
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects: # the subjects bound to it
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
```
Create the Prometheus ConfigMap manifest.
```
cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
      external_labels:
        cluster: "01"
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
      - action: replace
        source_labels: [id]
        regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
        target_label: rkt_container_name
        replacement: '${2}-${1}'
      - action: replace
        source_labels: [id]
        regex: '^/system\.slice/(.+)\.service$'
        target_label: systemd_service_name
        replacement: '${1}'
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__address__]
        action: replace
        target_label: instance
        regex: (.+):(.+)
        replacement: $1
```
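The `__address__` rewrite shared by the `kubernetes-pods` and `kubernetes-service-endpoints` jobs is easy to sanity-check outside Prometheus. A minimal Python sketch of the same regex (the sample addresses are invented for illustration):

```python
import re

# Prometheus joins source_labels with ";" before matching.
# Regex from the config above: ([^:]+)(?::\d+)?;(\d+)
ADDR_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def rewrite_address(address: str, annotated_port: str) -> str:
    """Mimic the relabel rule: replace whatever port __address__
    carries with the port from the prometheus.io/port annotation."""
    joined = f"{address};{annotated_port}"
    m = ADDR_RE.match(joined)
    return f"{m.group(1)}:{m.group(2)}" if m else address

print(rewrite_address("10.244.1.7:8080", "9100"))  # 10.244.1.7:9100
print(rewrite_address("10.244.1.7", "9100"))       # 10.244.1.7:9100
```

The optional `(?::\d+)?` group is what lets the rule work whether or not the discovered endpoint already had a port.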
Create the Prometheus StatefulSet manifest.
```
cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
spec:
  serviceName: "prometheus"
  podManagementPolicy: "Parallel"
  replicas: 3
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - prometheus
            topologyKey: "kubernetes.io/hostname"
      priorityClassName: system-cluster-critical
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: prometheus-server-configmap-reload
        image: "jimmidyson/configmap-reload:v0.4.0"
        imagePullPolicy: "IfNotPresent"
        args:
          - --volume-dir=/etc/config
          - --webhook-url=http://localhost:9090/-/reload
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
            readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - image: prom/prometheus:v2.20.0
        imagePullPolicy: IfNotPresent
        name: prometheus
        command:
          - "/bin/prometheus"
        args:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
          - "--storage.tsdb.retention=24h"
          - "--web.console.libraries=/etc/prometheus/console_libraries"
          - "--web.console.templates=/etc/prometheus/consoles"
          - "--web.enable-lifecycle"
        ports:
          - containerPort: 9090
            protocol: TCP
        volumeMounts:
          - mountPath: "/prometheus"
            name: prometheus-data
          - mountPath: "/etc/prometheus"
            name: config-volume
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 1000m
            memory: 2500Mi
        securityContext:
          runAsUser: 65534
          privileged: true
      serviceAccountName: prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "prometheus-lpv"
        resources:
          requests:
            storage: 5Gi
```
Create the Prometheus Service manifest
```
cat prometheus-service-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-system
spec:
  ports:
    - name: prometheus
      port: 9090
      targetPort: 9090
  selector:
    k8s-app: prometheus
  clusterIP: None
```
Deploy the Prometheus resource files created above

```
cd /data/manual-deploy/prometheus
ls
prometheus-configmap.yaml # ConfigMap
prometheus-data-pv.yaml # PV
prometheus-data-storageclass.yaml # StorageClass
prometheus-rbac.yaml # RBAC
prometheus-service-statefulset.yaml # Service
prometheus-statefulset.yaml # StatefulSet
# Apply everything
kubectl apply -f .
```
Verify the PV/PVC bindings and deployment status of the Prometheus we just deployed
```
kubectl get pv
NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS     REASON   AGE
prometheus-lpv-0   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-1   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-2   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
kubectl -n kube-system get pvc 
NAME                           STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS     AGE
prometheus-data-prometheus-0   Bound    prometheus-lpv-0   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-1   Bound    prometheus-lpv-2   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-2   Bound    prometheus-lpv-1   10Gi       RWO            prometheus-lpv   2m16s  
kubectl -n kube-system get pod prometheus-{0..2}
NAME           READY   STATUS    RESTARTS   AGE
prometheus-0   2/2     Running   0          3m16s
prometheus-1   2/2     Running   0          3m16s
prometheus-2   2/2     Running   0          3m16s  
```
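Notice that the pairing above is not ordinal (prometheus-1 bound to prometheus-lpv-2): a claim binds to any Available PV of the matching StorageClass with sufficient capacity, further constrained by node affinity once the pod is scheduled. A much-simplified first-fit sketch in Python (the data is invented for illustration; the real matcher also weighs node affinity and best-fit capacity):

```python
def bind(pvc_request_gi, pvs):
    """First-fit over Available PVs with capacity >= the claim's
    request -- a simplification of the Kubernetes volume matcher."""
    for pv in pvs:
        if pv["status"] == "Available" and pv["capacity"] >= pvc_request_gi:
            pv["status"] = "Bound"
            return pv["name"]
    return None  # claim stays Pending until a suitable PV appears

pvs = [
    {"name": "prometheus-lpv-0", "capacity": 10, "status": "Available"},
    {"name": "prometheus-lpv-1", "capacity": 10, "status": "Available"},
    {"name": "prometheus-lpv-2", "capacity": 10, "status": "Available"},
]
# Each claim from the volumeClaimTemplates asks for 5Gi.
print([bind(5, pvs) for _ in range(3)])
```

A 5Gi claim happily binds a 10Gi volume; the excess capacity is simply unused.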
Deploy Node Exporter
Create the node-exporter DaemonSet manifest
```
cd /data/manual-deploy/node-exporter/
cat node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
        k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      containers:
      - image: quay.io/prometheus/node-exporter:v1.0.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    protocol: TCP
  selector:
    k8s-app: node-exporter
```
Deploy

```
cd /data/manual-deploy/node-exporter/
kubectl apply -f node-exporter.yaml
```
Verify the status

```
kubectl -n kube-system get pod | grep node-exporter
node-exporter-45s2q                    1/1     Running   0          6h43m
node-exporter-f4rrw                    1/1     Running   0          6h43m
node-exporter-hvtzj                    1/1     Running   0          6h43m
node-exporter-nlvfq                    1/1     Running   0          6h43m
node-exporter-qbd2q                    1/1     Running   0          6h43m
node-exporter-zjrh4                    1/1     Running   0          6h43m
```
Deploy kube-state-metrics
The kubelet's embedded cAdvisor collects system-level metrics such as per-container CPU, memory, network, and disk usage, but it cannot report on Kubernetes resource objects themselves, e.g. how many Pods exist and what state they are in. That is the job of kube-state-metrics.
kube-state-metrics polls the Kubernetes API and exposes metrics about resource objects: CronJob, DaemonSet, Deployment, Job, LimitRange, Node, PersistentVolume, PersistentVolumeClaim, Pod, PodDisruptionBudget, ReplicaSet, ReplicationController, ResourceQuota, Service, StatefulSet, Namespace, HorizontalPodAutoscaler, Endpoints, Secret, ConfigMap, Ingress, and CertificateSigningRequest.
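kube-state-metrics serves these numbers in the Prometheus text exposition format. As an illustration, a toy Python parser that handles just enough of the format (the sample lines are modeled on real `kube_pod_status_phase` output, but the full spec has more cases, e.g. escaped quotes):

```python
import re

# metric_name{label="value",...} value
LINE_RE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+(\S+)$')

def parse_metrics(text):
    """Parse simple Prometheus exposition lines into
    (name, labels, value) tuples, skipping comments."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
        out.append((name, labels, float(value)))
    return out

sample = """\
# HELP kube_pod_status_phase The pods current phase.
kube_pod_status_phase{namespace="kube-system",pod="prometheus-0",phase="Running"} 1
kube_pod_status_phase{namespace="kube-system",pod="prometheus-0",phase="Pending"} 0
"""
for name, labels, value in parse_metrics(sample):
    print(name, labels.get("phase"), value)
```

In production you would of course let Prometheus scrape the endpoint rather than parse it yourself; this only shows what the scraped payload looks like.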
```
cd /data/manual-deploy/kube-state-metrics/
cat kube-state-metrics-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources:
  - certificatesigningrequests
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
```
Create the kube-state-metrics Deployment manifest

```
cat kube-state-metrics-deloyment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.6.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 150m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - /pod_nanny
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
```
Deploy

```
kubectl apply -f kube-state-metrics-rbac.yaml
kubectl apply -f kube-state-metrics-deloyment.yaml
```
Verify

```
kubectl -n kube-system get pod | grep kube-state-metrics
kube-state-metrics-657d8d6669-bqbs8        2/2     Running   0          4h
```
Because the kube-state-metrics Service carries the annotation `prometheus.io/scrape: "true"`, the `kubernetes-service-endpoints` job discovers it automatically.
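The effect of that `keep` rule can be shown in a few lines of Python (the service list below is invented for illustration):

```python
def keep_scrape_targets(services):
    """Mimic the 'keep' relabel action of the kubernetes-service-endpoints
    job: only services annotated prometheus.io/scrape: "true" survive."""
    return [
        s["name"] for s in services
        if s.get("annotations", {}).get("prometheus.io/scrape") == "true"
    ]

services = [
    {"name": "kube-state-metrics", "annotations": {"prometheus.io/scrape": "true"}},
    {"name": "node-exporter", "annotations": {"prometheus.io/scrape": "true"}},
    {"name": "kube-dns", "annotations": {}},
]
print(keep_scrape_targets(services))  # ['kube-state-metrics', 'node-exporter']
```

Any Service without the annotation is simply dropped from the target list, so opting a new exporter in is just a matter of annotating its Service.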
Deploy the Alertmanager Cluster
Create the data directories and set their ownership

```
# on sealos-k8s-m2
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
# on sealos-k8s-m3
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
```
```
cd /data/manual-deploy/alertmanager/
cat alertmanager-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alertmanager-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
Create the Alertmanager PV manifests

```
cat alertmanager-data-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m3
```
Create the Alertmanager ConfigMap manifest

```
cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 'yo@qq.com'
      smtp_auth_username: '3452@qq.com'
      smtp_auth_password: 'bhgb'
      smtp_hello: '警报邮件'
      smtp_require_tls: false
    route:
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 30s
      repeat_interval: 12h
      receiver: default  
      routes:
      - receiver: email
        group_wait: 10s
        match:
          team: ops
    receivers:
    - name: 'default'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true
    - name: 'email'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true
```
Create the Alertmanager StatefulSet manifest. This deploys cluster mode; if you want a standalone instance instead, set `replicas` to 1 and drop the `--cluster.*` flags.
```
cat alertmanager-statefulset-cluster.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.21.0
spec:
  serviceName: "alertmanager-operated"
  replicas: 2
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.21.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.21.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - alertmanager
            topologyKey: "kubernetes.io/hostname"
      containers:
        - name: prometheus-alertmanager
          image: "prom/alertmanager:v0.21.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - "--config.file=/etc/config/alertmanager.yml"
            - "--storage.path=/data"
            - "--cluster.listen-address=${POD_IP}:9094"
            - "--web.listen-address=:9093"
            - "--cluster.peer=alertmanager-0.alertmanager-operated:9094"
            - "--cluster.peer=alertmanager-1.alertmanager-operated:9094"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - containerPort: 9093
              name: web
              protocol: TCP
            - containerPort: 9094
              name: mesh-tcp
              protocol: TCP
            - containerPort: 9094
              name: mesh-udp
              protocol: UDP
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 60
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: "/data"
              subPath: ""
          resources:
            limits:
              cpu: 1000m
              memory: 500Mi
            requests:
              cpu: 10m
              memory: 50Mi
        - name: prometheus-alertmanager-configmap-reload
          image: "jimmidyson/configmap-reload:v0.4.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9093/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi
          securityContext:
            runAsUser: 0
            privileged: true
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
  volumeClaimTemplates:
    - metadata:
        name: storage-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "alertmanager-lpv"
        resources:
          requests:
            storage: 5Gi
```
Create the Alertmanager headless (operated) Service manifest
```
cat alertmanager-operated-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
  namespace: kube-system
  labels:
    app.kubernetes.io/name: alertmanager-operated
    app.kubernetes.io/component: alertmanager
spec:
  type: ClusterIP
  clusterIP: None
  sessionAffinity: None
  selector:
    k8s-app: alertmanager
  ports:
    - name: web
      port: 9093
      protocol: TCP
      targetPort: web
    - name: tcp-mesh
      port: 9094
      protocol: TCP
      targetPort: mesh-tcp
    - name: udp-mesh
      port: 9094
      protocol: UDP
      targetPort: mesh-udp
```
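The `--cluster.peer` flags in the StatefulSet rely on the stable per-pod DNS names this headless Service provides, which follow the `<statefulset>-<ordinal>.<serviceName>` pattern. A small Python sketch that derives them, handy as a check if you change the replica count:

```python
def cluster_peer_args(name, service, replicas, port=9094):
    """Build the --cluster.peer flags for an Alertmanager StatefulSet,
    one per replica ordinal (0..replicas-1)."""
    return [
        f"--cluster.peer={name}-{i}.{service}:{port}"
        for i in range(replicas)
    ]

for arg in cluster_peer_args("alertmanager", "alertmanager-operated", 2):
    print(arg)
```

With `replicas: 2` this reproduces exactly the two peer flags in the manifest above; each pod lists every peer, including itself, which Alertmanager tolerates.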
Deploy

```
cd /data/manual-deploy/alertmanager/
ls
alertmanager-configmap.yaml
alertmanager-data-pv.yaml
alertmanager-data-storageclass.yaml
alertmanager-operated-service.yaml
alertmanager-service-statefulset.yaml
alertmanager-statefulset-cluster.yaml
kubectl apply -f .
```
OK, at this point we have manually deployed Prometheus and Alertmanager as StatefulSets in the kube-system namespace. The next post covers deploying grafana and ingress-nginx.

This article was originally shared on the WeChat official account "Kubernetes技术栈" (k8stech).