Suddenly I can no longer deploy some images that I could deploy before. I get the following pod status:
[root@webdev2 origin]# oc get pods
NAME READY STATUS RESTARTS AGE
arix-3-yjq9w 0/1 ImagePullBackOff 0 10m
docker-registry-2-vqstm 1/1 Running 0 2d
router-1-kvjxq 1/1 Running 0 2d
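The application simply won't start. The pod is not even trying to run the container. On the Events page I see Back-off pulling image "172.30.84.25:5000/default/arix@sha256:d326. I have verified that I can pull the image by its tag with docker pull.
I have also checked the log of the last container. It was closed for some reason. I think the pod should at least try to restart it.
I have run out of ideas for debugging this. What else can I check?
You can use the 'describe pod' syntax.
For OpenShift use: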
oc describe pod <pod-id>
For vanilla Kubernetes:
kubectl describe pod <pod-id>
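Examine the events in the output.
In my case it shows Back-off pulling image unreachableserver/nginx:1.14.22222
In this case the image unreachableserver/nginx:1.14.22222 cannot be pulled from the Internet because there is no Docker registry unreachableserver and the image nginx:1.14.22222 does not exist.
Note: if you do not see any events of interest and the pod has been in the 'ImagePullBackOff' status for a while (it seems to be more than 60 minutes), you need to delete the pod and look at the events from the new pod.
For OpenShift use: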
oc delete pod <pod-id>
oc get pods
oc describe pod <new-pod-id>
For vanilla Kubernetes:
kubectl delete pod <pod-id>
kubectl get pods
kubectl describe pod <new-pod-id>
Sample output:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32s default-scheduler Successfully assigned rk/nginx-deployment-6c879b5f64-2xrmt to aks-agentpool-x
Normal Pulling 17s (x2 over 30s) kubelet Pulling image "unreachableserver/nginx:1.14.22222"
Warning Failed 16s (x2 over 29s) kubelet Failed to pull image "unreachableserver/nginx:1.14.22222": rpc error: code = Unknown desc = Error response from daemon: pull access denied for unreachableserver/nginx, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Warning Failed 16s (x2 over 29s) kubelet Error: ErrImagePull
Normal BackOff 5s (x2 over 28s) kubelet Back-off pulling image "unreachableserver/nginx:1.14.22222"
Warning Failed 5s (x2 over 28s) kubelet Error: ImagePullBackOff
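Only the Warning rows in a table like this usually matter. As a minimal sketch, assuming the events table is piped in on stdin, a small filter keeps just those rows. The function name warning_events is made up for this illustration and is not part of kubectl or oc.

```shell
#!/usr/bin/env bash
# warning_events: hypothetical helper, not a kubectl/oc feature.
# Reads an events table on stdin and prints only the rows whose
# first column (Type) is "Warning". Leading indentation is ignored.
warning_events() {
  awk '$1 == "Warning"'
}

# Usage (needs cluster access; <pod-id> is a placeholder):
#   kubectl describe pod <pod-id> | warning_events
#   oc describe pod <pod-id> | warning_events
```

Matching on the first column rather than grepping for the word avoids catching a Normal row whose message happens to contain "Warning".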
Additional debugging steps
Try to pull the docker image and tag manually on your computer.
Identify the node by running 'kubectl get pods -o wide' (or 'oc get pods -o wide').
SSH into the node (if you can) that cannot pull the docker image.
Check that the node can resolve the DNS name of the docker registry, for example by pinging it.
Try to pull the docker image manually on the node.
If you are using a private registry, check that your secret exists and is correct. The secret must also be in the same namespace as the pod. Thanks swenzel
Some registries have firewalls that restrict access by IP address. The firewall may block the pull.
Some CIs create deployments with temporary docker secrets, so the secret expires after a few days (you are asking for production failures...).
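For the DNS and firewall checks above, it helps to know which registry host an image reference actually points at. The sketch below assumes Docker's naming convention, where the first path component counts as a registry host only if it contains a dot or a colon, or is exactly "localhost"; anything else is treated as a Docker Hub repository. Other container runtimes may resolve short names differently. The function name registry_host is made up for this illustration.

```shell
#!/usr/bin/env bash
# registry_host: hypothetical helper that prints the registry host
# (with port, if any) of an image reference, assuming Docker's
# reference convention. Falls back to "docker.io" for short names.
registry_host() {
  local ref="$1" first
  case "$ref" in
    */*) first="${ref%%/*}" ;;          # text before the first slash
    *)   echo "docker.io"; return ;;    # e.g. "nginx:1.14"
  esac
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;; # looks like a host name
    *)                 echo "docker.io" ;;
  esac
}

# Usage on the node (getent is assumed to be available there):
#   host="$(registry_host "gcr.io/project/imagename")"
#   getent hosts "${host%%:*}"   # strip the port before the lookup
```

Under this convention, unreachableserver/nginx:1.14.22222 is looked up on Docker Hub as repository unreachableserver/nginx, which fits the "pull access denied ... repository does not exist" message in the sample output.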
On GKE, if the pod is dead, it is best to check the events.
They show in more detail what the error is about.
In my case, I had:
Failed to pull image "gcr.io/project/imagename@sha256:c8e91af54fc17faa1c49e2a05def5cbabf8f0a67fc558eb6cbca138061a8400a":
rpc error: code = Unknown desc = error pulling image configuration: unknown blob
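It turned out the image had somehow been damaged. After re-pushing it and deploying with the new hash, it worked again.
In retrospect, I think the image was damaged because the bucket in GCP that hosts the images had a cleanup policy set, which basically removed the image data. As a result, the message above shows up in the events.
Other common problems are a wrong name (gcr.io vs eu.gcr.io), or the registry not being reachable for some reason. Again, the hints are in the events, and the message there should tell you enough.
More general information can be found here (for example on authentication):
Pushing and pulling images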
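Reading the hint out of an event message can be sketched as a simple substring match. The first two patterns come from the messages quoted in this thread. "manifest unknown" and "no such host" are added as assumptions about other commonly seen pull errors, and exact wording varies by runtime and registry. The function name and the category labels are made up for this illustration.

```shell
#!/usr/bin/env bash
# classify_pull_error: hypothetical helper that maps a "Failed to pull
# image" message to a rough likely cause. Illustrative, not exhaustive.
classify_pull_error() {
  case "$1" in
    *"pull access denied"*|*"unauthorized"*)
      echo "auth-or-missing-repository" ;;      # check name and pull secret
    *"unknown blob"*)
      echo "corrupted-or-deleted-image-data" ;; # re-push the image
    *"manifest unknown"*)
      echo "tag-or-digest-not-found" ;;         # assumption, not from thread
    *"no such host"*)
      echo "registry-dns-failure" ;;            # assumption, not from thread
    *)
      echo "unclassified" ;;
  esac
}

# Usage:
#   classify_pull_error "rpc error: code = Unknown desc = error pulling image configuration: unknown blob"
```

Anything that falls through to "unclassified" still needs the full event text read by hand.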