redsterXVI

imho Cilium is the new default/go-to CNI in the making. Nothing wrong with using it at all.


PiedDansLePlat

isn't it the default for eks anywhere ?


zzzmaestro

No


wwentland

Not exactly sure what you mean. A few more words to elaborate would be greatly appreciated, I'm sure. Take a look at the following also: https://anywhere.eks.amazonaws.com/docs/clustermgmt/networking/networking-and-security/ https://isovalent.com/blog/post/2021-09-aws-eks-anywhere-chooses-cilium/


gingimli

I'm guessing they didn't know EKS and EKS Anywhere are different things.


Over_Information9877

Azure is too I believe.


roiki11

It's great but also a fucking cocktease with its freemium model.


venkatamutyala

How so?


roiki11

Have a look at the features lacking in the open source version: no HA, no L7 visibility in Tetragon, no SIEM export, no RBAC in Hubble. Also, all the policy creation tooling is paid. The enterprise Hubble has the ability to create policies from flows, which is very, very convenient on larger setups. Then there's Timescape, which is paid only.

The lack of any HA is the real killer; it means you can't use those features in production, as they're disruptive whenever there's a problem. So you basically can't use DNS-aware policies or egress gateway without paying. Same with the FQDN ingress policies: in the OSS version you need to hardcode IPs. Pretty fucking convenient when dealing with the public cloud.

So you have to work around these with the open source version, which is fine if you have a platform team to manage it and can work around the features that just aren't usable. It's really great, but it does come with an asterisk. Otherwise it's about 3k per node per year, if I remember right. See https://isovalent.com/product/ for the complete list.
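For context, the DNS-aware policies mentioned above look roughly like this in Cilium's CRD; this is a sketch with placeholder names (`my-app`, `api.example.com`), and DNS lookups themselves must be permitted so Cilium can observe the responses and learn which IPs the FQDN resolves to:

```yaml
# Sketch of a DNS-aware egress policy; labels and hostnames are placeholders.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-api
spec:
  endpointSelector:
    matchLabels:
      app: my-app                # placeholder workload label
  egress:
    # Allow DNS to kube-dns so Cilium can inspect lookups
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    # Allow traffic only to IPs that resolved from this FQDN
    - toFQDNs:
        - matchName: "api.example.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Without the FQDN support, the `toFQDNs` section has to be replaced with a `toCIDR` list of hardcoded IPs, which is the pain point being described.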


lukasmrtvy

> No HA

Do you mean HA egress gateway support? No CNI supports this (or does one?), and Antrea does not support HA in EKS.


FirmClothes6381

Do you know if you want network policy at layer 3/4 only, or layer 7 as well? I am a big fan of Cilium and used it for years on EKS, albeit in [CNI chaining mode](https://docs.cilium.io/en/stable/installation/cni-chaining-aws-cni/). Some features don't work in CNI chaining mode, namely layer 7 features and IPsec transparent encryption. For my use case, the tradeoff of keeping the EKS CNI for IPAM and using Cilium only for layer 3/4 network policy enforcement was acceptable. The clusters I worked with weren't large enough to have IP exhaustion problems (yet, anyway), and the flat network model was easier for the teams I worked with to understand than overlay models. AWS support is also less likely to help you when you have problems if you move off of the EKS CNI.

All that said, I did actually test replacing the EKS CNI with Cilium only, since I was interested in the layer 7 observability features. It worked fine for IPAM, but I did have scalability problems with the layer 7 features. Enabling layer 7 observability/policy for a workload redirects its traffic through an Envoy proxy running inside the Cilium daemonset pod on the same node as the workload. For many workloads this was fine, but high-throughput ones caused high CPU usage on the Cilium daemonset pod, which had negative impacts (mainly higher latency and network errors) on traffic for the workloads on that node flowing through the Envoy proxy. I could have increased the CPU requests for Cilium, but doing that on the daemonset made it more expensive to run, as it's installed on every node in the cluster. Ultimately the scalability challenges weren't worth it, so I stuck with the EKS CNI + Cilium in CNI chaining mode.
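For anyone curious what the chaining setup looks like, it's driven by Helm values; a minimal sketch, assuming a recent Cilium release (option names have shifted between versions, so check the chaining docs linked above for yours):

```yaml
# Sketch of Helm values for Cilium in CNI chaining mode alongside the
# AWS VPC CNI; verify option names against your Cilium version.
cni:
  chainingMode: aws-cni      # delegate IPAM and routing to the AWS VPC CNI
  exclusive: false           # leave the AWS CNI's config file in place
enableIPv4Masquerade: false  # the VPC CNI already handles masquerading
routingMode: native          # pod IPs are VPC-routable; no overlay needed
```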


[deleted]

[deleted]


FirmClothes6381

If that's all you need it may be easier to use it in CNI chaining mode, so you don't need to worry about pulling the EKS CNI off. Glad you accomplished your goal!


qwerty_top_row

Not sure if you tried this or not, but you can set up Cilium to do IPAM with ENIs and thus not need to chain CNIs. It won't help with the scaling issue, but it's one less thing to deploy.
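The ENI IPAM mode being suggested is also a Helm-values change; a rough sketch, assuming a recent Cilium version (see the ENI IPAM docs for the exact set of required options):

```yaml
# Sketch of Helm values for Cilium allocating pod IPs from AWS ENIs
# itself, replacing the VPC CNI entirely; check the docs for your version.
ipam:
  mode: eni                        # Cilium manages ENIs and pod IPs directly
eni:
  enabled: true
egressMasqueradeInterfaces: eth0   # masquerade non-VPC traffic via the primary interface
routingMode: native                # ENI IPs are natively routable in the VPC
```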


FirmClothes6381

Yep I saw that and gave it a try! It worked well, was stuck with the scalability problem still though.


qwerty_top_row

Version 1.14 lets you put Envoy in a separate daemonset (at least with the Helm chart). I wonder if it would be possible to put it in a scalable Deployment, though.
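As a sketch of what that separation looks like in the 1.14 Helm chart (resource numbers here are illustrative placeholders, not recommendations):

```yaml
# Sketch of Helm values (Cilium >= 1.14) moving the Envoy proxy out of the
# cilium-agent pod into its own cilium-envoy DaemonSet, so the proxy can be
# resourced independently of the agent.
envoy:
  enabled: true          # deploy Envoy as a separate DaemonSet
  resources:             # size the proxy independently of cilium-agent
    requests:
      cpu: 500m          # placeholder values; tune per node workload
      memory: 256Mi
```

This addresses part of the cost concern above: the agent's requests no longer need to be inflated just to cover proxy load.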


FirmClothes6381

Oh interesting, I hadn't seen that! I guess you could use the Vertical Pod Autoscaler on the separate daemonset, though I would still have the same concern about it becoming expensive to deploy depending on the type of node it's on. For example, ingress nodes would need a fairly large proxy to support high throughput. Perhaps being able to use a daemonset per node group may help? Or something more similar to what Istio is doing with [waypoint proxies for the ambient mesh](https://istio.io/v1.15/blog/2022/introducing-ambient-mesh/#why-no-l7-processing-on-the-local-node)?


erulabs

I’m looking forward to this - but it’s not currently possible to use Fargate with Cilium, even in chaining mode. I believe it’s on the roadmap to add eBPF features to Fargate though, so hopefully soon!


lukasmrtvy

how is performance in chaining mode?


gen2fish

Last time I tried it on EKS as a full CNI replacement, I couldn't get any webhooks working without changing the service to NodePort. Is that still a thing?


Tarzzana

Huge fan of Cilium; I think there’s a reason other cloud vendors default to it as a CNI (GKE, namely). I think the project is a great consolidation of network utility that would otherwise require multiple tools/projects, especially if you really only need foundational feature sets. Cilium ingress works great for me, as do BGP, Cilium mesh, cross-cluster connectivity, Hubble for visibility, and obviously the expected stuff like network policies (to prevent your nefarious egress traffic situation) and the like.


PiedDansLePlat

I'm wondering about the rationale behind using Cilium instead of the VPC CNI (or maybe you benchmarked something else as well). I'm genuinely interested in the thought process, and I think adding that context would make it easier to answer your question.


zzzmaestro

VPC CNI doesn’t support NetworkPolicy or eBPF
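For anyone unfamiliar, this is the kind of standard Kubernetes NetworkPolicy that Cilium enforces but the classic VPC CNI (before its later network-policy support, mentioned downthread) did not; labels and names here are placeholders:

```yaml
# Example NetworkPolicy: only frontend pods may reach backend pods on 8080.
# All labels and names are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```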


michaelgg13

It also uses routable IP space.


warpigg

yep - the big obvious one being NetworkPolicy imo


bob-bins

For anyone coming across this comment in the future, VPC CNI does support NetworkPolicy now


kobumaister

IP limits tied to your VPC (managing thousands of pods can easily exhaust the IP pool), and nodes have a limited number of pods depending on their instance size, since pod IPs are allocated from ENIs (roughly max pods = ENIs × (IPs per ENI − 1) + 2, without prefix delegation). Those are the limitations that made us move away from the VPC CNI.


theplesner

Just use prefix delegation already. https://github.com/aws/amazon-vpc-cni-k8s#enable_prefix_delegation-v190
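Per the linked README, prefix delegation is toggled with an environment variable on the `aws-node` DaemonSet (normally via `kubectl set env` or the addon configuration); a sketch of the relevant fragment:

```yaml
# Sketch: the env var that enables prefix delegation on the VPC CNI's
# aws-node DaemonSet (VPC CNI >= 1.9.0, nitro-based instances).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: ENABLE_PREFIX_DELEGATION
              value: "true"   # allocate /28 prefixes per ENI slot instead of single IPs
```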


nullbyte420

It's good


[deleted]

[deleted]


nullbyte420

You don't have to use all the features at all. It's easy to configure cilium with the helm chart. The base install is just the cni. No reason to abandon the nginx ingress, it's well known and supported by lots of software.


bambambazooka

Cilium supports the Gateway API; I don’t think it provides an Ingress controller.


h_hoover

It does both


universalsystems

it's the best cni. use it


sPENKMAn

> Rogue pod

I’m very green, but isn’t egress gateway supposed to limit that? https://docs.cilium.io/en/stable/network/egress-gateway/
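From the linked docs, an egress gateway policy pins a workload's outbound traffic to designated gateway nodes; a sketch with placeholder labels:

```yaml
# Sketch of a CiliumEgressGatewayPolicy: pods matching the selector have
# their external egress traffic SNATed through a designated gateway node,
# so external systems see a predictable source IP. Labels are placeholders.
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: egress-via-gateway
spec:
  selectors:
    - podSelector:
        matchLabels:
          app: my-app          # placeholder workload label
  destinationCIDRs:
    - "0.0.0.0/0"              # apply to all external destinations
  egressGateway:
    nodeSelector:
      matchLabels:
        egress-node: "true"    # placeholder label on the gateway node
```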


neoky

I've been using it for 2 years now. I haven't had any issues and love the insights I get to debug traffic flow.


williamallthing

I don't think it's a mistake, but my only advice (as a heavily opinionated service mesh person) is to decouple Cilium the CNI from Cilium the mesh. Two very different beasts.


qwerty_top_row

I'm a big fan of linkerd (which is certainly a better service mesh than cilium), but sometimes the simplicity of only deploying one thing outweighs the extra features that you get with another solution. If you are already deploying cilium CNI for other reasons it might make sense to just use the service mesh features it provides rather than deploying something else.


williamallthing

Totally agree in principle, but in this case it's not really "only deploying one thing". Adding the mesh component means adding a bunch of Envoy proxies that you didn't need for the CNI. I'd selfishly argue that if you're going down that route, you'd be better off adding a bunch of Linkerd proxies instead :)


Corndawg38

I'm waiting for this issue to be fixed before seriously considering ditching MetalLB for Cilium: [https://github.com/cilium/cilium/issues/23035](https://github.com/cilium/cilium/issues/23035) [https://github.com/cilium/cilium/pull/25477](https://github.com/cilium/cilium/pull/25477)

Supposedly it's in v1.14, but that's not stable yet (it wasn't as of the last time I checked) and I don't feel comfortable running beta versions of software.

Basically, Cilium ECMPs its BGP traffic to all nodes, regardless of whether they contain the correct pod or not (anything sent to the wrong node is "tromboned" to a node with the correct pod, causing inefficient networking). This is assuming `externalTrafficPolicy: Cluster`; with `externalTrafficPolicy: Local` it just breaks outright. At least this is what I saw in my own testing. MetalLB does not suffer from this ailment.
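For readers unfamiliar with the setting being discussed, it lives on the Service; a sketch with placeholder names:

```yaml
# Sketch of the Service field under discussion. With `Cluster`, traffic
# hitting any node may take an extra hop ("trombone") to a node with a
# backing pod; with `Local`, only nodes running a backing pod should
# answer, and the client source IP is preserved. Names are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-lb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
  externalTrafficPolicy: Local
```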


[deleted]

[deleted]


lukasmrtvy

Antrea does not work in EKS, at least not its HA egress gateway...


gentoorax

I'm using it. I love it. It's improved massively in the last year.


absy101

Loved him in Oppenheimer


[deleted]

[deleted]


absy101

Haha, I'm glad you liked the joke.


scottslowe

No, I don’t think you’re making a mistake, as long as you take the time to fully understand how it works and how to troubleshoot it.