I am trying to set up the Kubernetes master by issuing:
kubeadm init --pod-network-cidr=192.168.0.0/16
Problem: the coredns pods are stuck in CrashLoopBackOff or Error state:
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-lflwx 2/2 Running 0 2d
coredns-576cbf47c7-nm7gc 0/1 CrashLoopBackOff 69 2d
coredns-576cbf47c7-nwcnx 0/1 CrashLoopBackOff 69 2d
etcd-suey.nknwn.local 1/1 Running 0 2d
kube-apiserver-suey.nknwn.local 1/1 Running 0 2d
kube-controller-manager-suey.nknwn.local 1/1 Running 0 2d
kube-proxy-xkgdr 1/1 Running 0 2d
kube-scheduler-suey.nknwn.local 1/1 Running 0 2d
#
I have worked through Troubleshooting kubeadm - Kubernetes, but my node is not running SELinux and my Docker is up to date:
# docker --version
Docker version 18.06.1-ce, build e68fc7a
#
kubectl's describe:
# kubectl -n kube-system describe pod coredns-576cbf47c7-nwcnx
Name: coredns-576cbf47c7-nwcnx
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: suey.nknwn.local/192.168.86.81
Start Time: Sun, 28 Oct 2018 22:39:46 -0400
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Annotations: cni.projectcalico.org/podIP: 192.168.0.30/32
Status: Running
IP: 192.168.0.30
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://ec65b8f40c38987961e9ed099dfa2e8bb35699a7f370a2cda0e0d522a0b05e79
Image: k8s.gcr.io/coredns:1.2.2
Image ID: docker-pullable://k8s.gcr.io/[email protected]:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 31 Oct 2018 23:28:58 -0400
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 31 Oct 2018 23:21:35 -0400
Finished: Wed, 31 Oct 2018 23:23:54 -0400
Ready: True
Restart Count: 103
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-xvq8b:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-xvq8b
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 54m (x10 over 4h19m) kubelet, suey.nknwn.local Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 9m56s (x92 over 4h20m) kubelet, suey.nknwn.local Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 5m4s (x173 over 4h10m) kubelet, suey.nknwn.local Back-off restarting failed container
# kubectl -n kube-system describe pod coredns-576cbf47c7-nm7gc
Name: coredns-576cbf47c7-nm7gc
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: suey.nknwn.local/192.168.86.81
Start Time: Sun, 28 Oct 2018 22:39:46 -0400
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Annotations: cni.projectcalico.org/podIP: 192.168.0.31/32
Status: Running
IP: 192.168.0.31
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://0f2db8d89a4c439763e7293698d6a027a109bf556b806d232093300952a84359
Image: k8s.gcr.io/coredns:1.2.2
Image ID: docker-pullable://k8s.gcr.io/[email protected]:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 31 Oct 2018 23:29:11 -0400
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 31 Oct 2018 23:21:58 -0400
Finished: Wed, 31 Oct 2018 23:24:08 -0400
Ready: True
Restart Count: 102
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-xvq8b:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-xvq8b
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 44m (x12 over 4h18m) kubelet, suey.nknwn.local Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
Warning BackOff 4m58s (x170 over 4h9m) kubelet, suey.nknwn.local Back-off restarting failed container
Warning Unhealthy 8s (x102 over 4h19m) kubelet, suey.nknwn.local Liveness probe failed: HTTP probe failed with statuscode: 503
#
kubectl's logs:
# kubectl -n kube-system logs -f coredns-576cbf47c7-nm7gc
E1101 03:31:58.974836 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974836 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974857 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.975493 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.976732 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.977788 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.976164 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.977415 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.978332 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/01 03:33:08 [INFO] SIGTERM: Shutting down servers then terminating
E1101 03:33:31.976864 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.978080 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.979156 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
#
# kubectl -n kube-system log -f coredns-576cbf47c7-gqdgd
.:53
2018/11/05 04:04:13 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:13 [INFO] linux/AMD64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/AMD64, go1.11, eb51e8b
2018/11/05 04:04:13 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:19 [FATAL] plugin/loop: Seen "HINFO IN 3597544515206064936.6415437575707023337." more than twice, loop detected
# kubectl -n kube-system log -f coredns-576cbf47c7-hhmws
.:53
2018/11/05 04:04:18 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:18 [INFO] linux/AMD64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/AMD64, go1.11, eb51e8b
2018/11/05 04:04:18 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:24 [FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
#
describe (apiserver):
# kubectl -n kube-system describe pod kube-apiserver-suey.nknwn.local
Name: kube-apiserver-suey.nknwn.local
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: suey.nknwn.local/192.168.87.20
Start Time: Fri, 02 Nov 2018 00:28:44 -0400
Labels: component=kube-apiserver
tier=control-plane
Annotations: kubernetes.io/config.hash: 2433a531afe72165364aace3b746ea4c
kubernetes.io/config.mirror: 2433a531afe72165364aace3b746ea4c
kubernetes.io/config.seen: 2018-11-02T00:28:43.795663261-04:00
kubernetes.io/config.source: file
scheduler.alpha.kubernetes.io/critical-pod:
Status: Running
IP: 192.168.87.20
Containers:
kube-apiserver:
Container ID: docker://659456385a1a859f078d36f4d1b91db9143d228b3bc5b3947a09460a39ce41fc
Image: k8s.gcr.io/kube-apiserver:v1.12.2
Image ID: docker-pullable://k8s.gcr.io/[email protected]:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.87.20
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Running
Started: Sun, 04 Nov 2018 22:57:27 -0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 04 Nov 2018 20:12:06 -0500
Finished: Sun, 04 Nov 2018 22:55:24 -0500
Ready: True
Restart Count: 2
Requests:
cpu: 250m
Liveness: http-get https://192.168.87.20:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/ca-certificates from etc-ca-certificates (ro)
/etc/kubernetes/pki from k8s-certs (ro)
/etc/ssl/certs from ca-certs (ro)
/usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
/usr/share/ca-certificates from usr-share-ca-certificates (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
etc-ca-certificates:
Type: HostPath (bare Host directory volume)
Path: /etc/ca-certificates
HostPathType: DirectoryOrCreate
k8s-certs:
Type: HostPath (bare Host directory volume)
Path: /etc/kubernetes/pki
HostPathType: DirectoryOrCreate
ca-certs:
Type: HostPath (bare Host directory volume)
Path: /etc/ssl/certs
HostPathType: DirectoryOrCreate
usr-share-ca-certificates:
Type: HostPath (bare Host directory volume)
Path: /usr/share/ca-certificates
HostPathType: DirectoryOrCreate
usr-local-share-ca-certificates:
Type: HostPath (bare Host directory volume)
Path: /usr/local/share/ca-certificates
HostPathType: DirectoryOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute
Events: <none>
#
syslog (host):
Nov  4 22:59:36 suey kubelet[1234]: E1104 22:59:36.139538    1234 pod_workers.go:186] Error syncing pod d8146b7e-de57-11e8-a1e2-ec8eb57434c8 ("coredns-576cbf47c7-hhmws_kube-system(d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container"
Please advise.
This error
[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
is raised when CoreDNS detects a loop in the resolve configuration, and it is the intended behavior. You are hitting this issue:
https://github.com/kubernetes/kubeadm/issues/1162
https://github.com/coredns/coredns/issues/2087
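For context, the Corefile in the kubeadm-generated coredns ConfigMap (CoreDNS 1.2.x) looks roughly like the sketch below; the exact plugin list may differ on your cluster. The proxy . /etc/resolv.conf line forwards external queries to the host's resolv.conf, so if that file points back at a local resolver (e.g. 127.0.0.1) the query returns to CoreDNS and the loop plugin terminates the pod:
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf   # upstream resolver taken from the host -> source of the loop
    cache 30
    loop                       # loop detection plugin that emits the FATAL error above
    reload
    loadbalance
}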
Hacky solution: disable the CoreDNS loop detection.
Edit the CoreDNS configmap:
kubectl -n kube-system edit configmap coredns
Remove the line with loop, or comment it out, then save and exit.
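After that edit, the tail of the Corefile would look roughly like this (sketch; only the loop line changes):
    proxy . /etc/resolv.conf
    cache 30
    # loop        <- disabled: CoreDNS no longer exits when it detects a forwarding loop
    reload
    loadbalance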
Then delete the CoreDNS pods so that new ones are created with the new configuration:
kubectl -n kube-system delete pod -l k8s-app=kube-dns
Everything should be fine after that.
Preferred solution: remove the loop in the DNS configuration.
First, check whether you are using systemd-resolved. If you are running Ubuntu 18.04, you most likely are.
systemctl list-unit-files | grep enabled | grep systemd-resolved
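If it is in use, the command prints something along these lines (the exact columns can vary by distribution):
systemd-resolved.service                  enabled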
If so, check which resolv.conf file your cluster is using as a reference:
ps auxww | grep kubelet
You might see a line like:
/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf
The important part is --resolv-conf; it tells us whether systemd's resolv.conf is being used or not.
If it is systemd's resolv.conf, proceed as follows:
Check the content of /run/systemd/resolve/resolv.conf to see whether there is a record like:
nameserver 127.0.0.1
If 127.0.0.1 is there, it is the one causing the loop.
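For illustration, a problematic file could look like this (hypothetical content; your entries will differ):
# /run/systemd/resolve/resolv.conf, generated by systemd-resolved
nameserver 127.0.0.1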
To get rid of it, do not edit that file directly; instead, check the other places from which it is generated to make sure it is produced correctly.
Check all files under /etc/systemd/network, and if you find a record like
DNS=127.0.0.1
delete it. Also check /etc/systemd/resolved.conf and do the same if necessary. Make sure that at least one or two DNS servers are configured, for example
DNS=1.1.1.1 1.0.0.1
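Put together, a cleaned-up /etc/systemd/resolved.conf could look like this (a sketch; any reachable upstream resolvers work, 1.1.1.1 and 1.0.0.1 are simply Cloudflare's public ones):
[Resolve]
# upstream resolvers handed to systemd-resolved -- no 127.0.0.1 entries here
DNS=1.1.1.1 1.0.0.1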
Then restart the systemd services so that the changes take effect:
systemctl restart systemd-networkd systemd-resolved
After that, verify that DNS=127.0.0.1 is no longer present in the resolv.conf file:
cat /run/systemd/resolve/resolv.conf
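If everything went well, the file should now list only real upstream resolvers, for example (the actual servers depend on your configuration):
nameserver 1.1.1.1
nameserver 1.0.0.1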
Finally, re-create the DNS pods:
kubectl -n kube-system delete pod -l k8s-app=kube-dns
Summary: the fix is to remove the DNS lookup loop from the host's DNS configuration. The exact steps vary between the different resolv.conf managers/implementations.
Here are a few shell hacks that automate Utku's answer:
# remove loop from DNS config files
sudo find /etc/systemd/network /etc/systemd/resolved.conf -type f \
    -exec sed -i '/^DNS=127.0.0.1/d' {} +
# if necessary, configure some DNS servers (use Cloudflare public ones)
if ! grep -q '^DNS=' /etc/systemd/resolved.conf; then
    sudo sed -i '$aDNS=1.1.1.1 1.0.0.1' /etc/systemd/resolved.conf
fi
# restart systemd services
sudo systemctl restart systemd-networkd systemd-resolved
# force (re-)creation of the dns pods
kubectl -n kube-system delete pod -l k8s-app=kube-dns
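Once the script has run, you can watch the replacement pods come up and confirm they stay Running (assuming kubectl is pointed at the affected cluster):
kubectl -n kube-system get pods -l k8s-app=kube-dns -w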