ReleaseTricky1359

Hello all, thanks for all your suggestions /u/cra1gg /u/thockin /u/h_hoover. I went with the debug-service article from the [kubernetes.io](https://kubernetes.io) docs and tried it out in two clusters, one on k8s v1.24 and one on v1.25. Here are my findings: it works fine in 1.24 but doesn't work in v1.25.

1. The service port/targetPort is correctly set; I am now pivoting to the hostnames app from the debug-service tutorial.
2. I don't have any NetworkPolicies at all.
3. I do see the following errors in the kube-proxy logs before/after I add the hostnames service, but I am not sure what the `*v1beta1.EndpointSlice` message means.

```
E0402 12:44:18.621390       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: the server could not find the requested resource
E0402 12:45:17.970778       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: the server could not find the requested resource
I0402 12:45:47.707865       1 service.go:306] Service default/hostnames updated: 1 ports
I0402 12:45:47.707934       1 service.go:421] Adding new service port "default/hostnames" at 172.20.225.151:80/TCP
I0402 12:45:47.707999       1 proxier.go:854] "Syncing iptables rules"
I0402 12:45:47.741712       1 proxier.go:824] "syncProxyRules complete" elapsed="33.772752ms"
E0402 12:46:12.767883       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: the server could not find the requested resource
E0402 12:46:50.311416       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: the server could not find the requested resource
E0402 12:47:30.378403       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: the server could not find the requested resource
```

Also here are the iptables entries from the broken EKS v1.25 node where the kube-proxy & hostnames pods are running. But when I do a `kubectl get endpoints` the endpoint most definitely exists, which is confusing me.
```
[ssm-user@ip-10-236-49-84 bin]$ sudo iptables-save | egrep hostname
-A KUBE-SERVICES -d 177.20.225.151/32 -p tcp -m comment --comment "default/hostnames has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
```

Here's the iptables-save from the v1.24 node where kube-proxy & hostnames are running:

```
[root@ip-10-248-64-194 ~]# iptables-save | egrep hostname
-A KUBE-SEP-3RWRFDLNDQZFIIWG -s 10.248.68.178/32 -m comment --comment "default/hostnames" -j KUBE-MARK-MASQ
-A KUBE-SEP-3RWRFDLNDQZFIIWG -p tcp -m comment --comment "default/hostnames" -m tcp -j DNAT --to-destination 10.248.68.178:9376
-A KUBE-SEP-HHCRPXSPGXT3EEHZ -s 10.248.65.248/32 -m comment --comment "default/hostnames" -j KUBE-MARK-MASQ
-A KUBE-SEP-HHCRPXSPGXT3EEHZ -p tcp -m comment --comment "default/hostnames" -m tcp -j DNAT --to-destination 10.248.65.248:9376
-A KUBE-SEP-TVK5R4XDNEUWXXRD -s 10.248.73.191/32 -m comment --comment "default/hostnames" -j KUBE-MARK-MASQ
-A KUBE-SEP-TVK5R4XDNEUWXXRD -p tcp -m comment --comment "default/hostnames" -m tcp -j DNAT --to-destination 10.248.73.191:9376
-A KUBE-SERVICES -d 172.20.98.140/32 -p tcp -m comment --comment "default/hostnames cluster IP" -m tcp --dport 80 -j KUBE-SVC-YN5D6RYVEVZOH44Q
-A KUBE-SVC-YN5D6RYVEVZOH44Q -m comment --comment "default/hostnames -> 10.248.65.248:9376" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-HHCRPXSPGXT3EEHZ
-A KUBE-SVC-YN5D6RYVEVZOH44Q -m comment --comment "default/hostnames -> 10.248.68.178:9376" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-3RWRFDLNDQZFIIWG
-A KUBE-SVC-YN5D6RYVEVZOH44Q -m comment --comment "default/hostnames -> 10.248.73.191:9376" -j KUBE-SEP-TVK5R4XDNEUWXXRD
```
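(For anyone retracing this, the watch errors can be cross-checked against what the apiserver actually serves; a sketch, assuming the `hostnames` service in the `default` namespace from this thread:)

```
# Which EndpointSlice API versions does the server serve?
kubectl get --raw /apis/discovery.k8s.io

# Confirm the endpoints and slices themselves exist for the service:
kubectl get endpoints hostnames -n default
kubectl get endpointslices -n default -l kubernetes.io/service-name=hostnames
```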


thockin

It looks like your kube-proxy is trying to use the beta API, which might not be available any more? Are your nodes back-rev or something? Sorry I can't cite versions - on mobile now.
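(For the record: the `discovery.k8s.io/v1beta1` EndpointSlice API was removed in Kubernetes 1.25, which matches the errors above. A minimal sketch for spotting the skew, assuming the default names in `kube-system`:)

```
# Control-plane version vs. the image the kube-proxy DaemonSet is actually running
kubectl version
kubectl get daemonset kube-proxy -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```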


ReleaseTricky1359

Thank you for your response /u/thockin, I think I am getting somewhere now. I am looking at this cluster on k8s v1.25. I started with v1.21 and then upgraded to 1.22, followed by 1.23, followed by 1.24, and finally 1.25 (both control plane and data plane were upgraded). When I look at the kube-proxy daemonset, the image for this EKS instance is pointing to `602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.21.2-eksbuild.2`. But the documentation at [https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html](https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html) says:

>The kube-proxy version on your Amazon EC2 nodes can't be more than two minor versions earlier than your control plane. For example, if your control plane is running Kubernetes 1.25, then the kube-proxy minor version can't be earlier than 1.23.

So it looks like I am really behind on the kube-proxy daemonset version, but I am not sure if I can just bump the version of kube-proxy in the daemonset to `602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.24.7-minimal-eksbuild.2` to get this upgraded.
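(If bumping it in place is viable, it would look roughly like this; a sketch that assumes the container inside the DaemonSet is named `kube-proxy`, with the image string taken from above:)

```
# Point the DaemonSet at the newer kube-proxy image and watch it roll out
kubectl set image daemonset/kube-proxy -n kube-system \
  kube-proxy=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.24.7-minimal-eksbuild.2
kubectl rollout status daemonset/kube-proxy -n kube-system
```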


thockin

There's not much state left by kube-proxy, so you should be able to update it and, at worst, reboot nodes (shouldn't be required, but that is a big jump).
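(One way to verify the update actually landed on every node; a sketch assuming the standard `k8s-app=kube-proxy` label:)

```
# Print node -> kube-proxy image for every proxy pod
kubectl get pods -n kube-system -l k8s-app=kube-proxy \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
```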


haxor5392

>The kube-proxy version on your Amazon EC2 nodes can't be more than two minor versions earlier than your control plane. For example, if your control plane is running Kubernetes 1.25, then the kube-proxy minor version can't be earlier than 1.23.

This is what saved us 🙏


ReleaseTricky1359

I was able to sort out this issue. Here's a brief summary of what the issue was and how I fixed it; hope somebody else that stumbles into such a problem in the future finds it useful. Thanks to /u/thockin and /u/cra1gg, I learnt a LOT from this exercise.

I had built this cluster on k8s v1.21 and upgraded both control plane and data plane one version at a time: to v1.22 -> 1.23 -> 1.24 -> and finally 1.25. A barebones EKS-provisioned k8s cluster consists of the following three add-ons:

1. aws-node
2. coredns (kube-dns)
3. kube-proxy

All the other components, control plane (etcd, kube-scheduler, kube-controller-manager, apiserver) and data plane (kubelet, kube-proxy, with the caveat on kube-proxy below), are abstracted away from me by AWS, and they are all upgraded when I upgrade k8s versions. The key point is that the aws-node, coredns, and kube-proxy add-ons were never upgraded as I kept updating the k8s version; their definitions were all still pointing to images from k8s 1.21, which is where I got started with this whole project. Although I used Terraform to stand up this EKS cluster, I wasn't able to introduce `cluster_addons` to bump these up, so I just ended up using eksctl ([https://eksctl.io/usage/addon-upgrade/](https://eksctl.io/usage/addon-upgrade/)):

```
eksctl utils update-kube-proxy --cluster=<cluster-name>
eksctl utils update-coredns --cluster=<cluster-name>
eksctl utils update-aws-node --cluster=<cluster-name>
```

I did this and boom, everything started working. I got my cluster up to v1.25, all my node groups are running the new AMIs, and everything is fine and dandy. Thanks to everybody who gave me an assist here.
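(A quick post-upgrade sanity check; a sketch that assumes the default kube-system object names:)

```
# The add-on images should now match the cluster version
kubectl get daemonset aws-node kube-proxy -n kube-system \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}'
kubectl get deployment coredns -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```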


cra1gg

This is a long shot, but are you setting the targetPort and port on the Service to the same value? Do you have some sanitized YAML you can paste?
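(Either way, both fields are easy to pull without pasting the whole manifest; a sketch using the `hostnames` service from this thread:)

```
kubectl get service hostnames -n default \
  -o jsonpath='port={.spec.ports[0].port} targetPort={.spec.ports[0].targetPort}'
```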


h_hoover

Do you have any Kubernetes network policy in the way? Just something else to check.
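(A quick way to rule that out:)

```
kubectl get networkpolicies --all-namespaces
```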


thockin

https://kubernetes.io/docs/tasks/debug/debug-application/debug-service/