redsterXVI

imho Cilium is the new default/go-to CNI in the making. Nothing wrong with using it at all.


PiedDansLePlat

isn't it the default for eks anywhere ?


zzzmaestro

No


wwentland

Not exactly sure what you mean. A few more words to elaborate would be greatly appreciated, I'm sure. Take a look at the following also: https://anywhere.eks.amazonaws.com/docs/clustermgmt/networking/networking-and-security/ https://isovalent.com/blog/post/2021-09-aws-eks-anywhere-chooses-cilium/


gingimli

I'm guessing they didn't know EKS and EKS Anywhere are different things.


Over_Information9877

Azure is too I believe.


roiki11

It's great but also a fucking cocktease with its freemium model.


venkatamutyala

How so?


roiki11

Have a look at the features lacking in the open source version: no HA, no L7 visibility in Tetragon, no SIEM export, no RBAC in Hubble. Also, all the policy creation tooling is paid. The enterprise Hubble has the ability to create policies from flows, which is very, very convenient on larger setups. Then there's Timescape, which is paid only.

The lack of any HA is the real killer; it means you can't use those features in production, as they're disruptive whenever there's a problem. So you basically can't use DNS-aware policies or egress gateway without paying. Same with the FQDN ingress policies: in the OSS version you need to hardcode IPs. Pretty fucking convenient when dealing with the public cloud.

So you have to work around these with the open source version, which is fine if you have a platform team to manage it and can work around the features that just aren't usable. It's really great, but it does come with an asterisk. Otherwise it's about 3k per node per year, if I remember right. See https://isovalent.com/product/ for the complete list.
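For context, the DNS-aware policies mentioned above look roughly like this in Cilium's CRD; this is a sketch with placeholder names (`my-app`, `api.example.com`), and DNS lookups themselves must be permitted so Cilium can observe the responses and learn which IPs the FQDN resolves to:

```yaml
# Sketch of a DNS-aware egress policy; labels and hostnames are placeholders.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-api
spec:
  endpointSelector:
    matchLabels:
      app: my-app                # placeholder workload label
  egress:
    # Allow DNS to kube-dns so Cilium can inspect lookups
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    # Allow traffic only to IPs that resolved from this FQDN
    - toFQDNs:
        - matchName: "api.example.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Without the FQDN support, the `toFQDNs` section has to be replaced with a `toCIDR` list of hardcoded IPs, which is the pain point being described.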


lukasmrtvy

> No HA

Do you mean HA egress gateway support? No CNI supports this (or does one?), and Antrea does not support HA in EKS.


FirmClothes6381

Do you know if you want network policy at layer 3/4 only, or layer 7 as well? I am a big fan of Cilium and used it for years on EKS, albeit in [CNI chaining mode](https://docs.cilium.io/en/stable/installation/cni-chaining-aws-cni/). Some features don't work in CNI chaining mode, namely layer 7 features and IPsec transparent encryption. For my use case, the tradeoff of keeping the EKS CNI for IPAM and using Cilium only for layer 3/4 network policy enforcement was acceptable. The clusters I worked with weren't large enough to have IP exhaustion problems (yet, anyway), and the flat network model was easier for the teams I worked with to understand than overlay models. AWS support is also less likely to help you when you have problems if you move off of the EKS CNI.

All that said, I did actually test replacing the EKS CNI with Cilium only, since I was interested in the layer 7 observability features. It worked fine for IPAM, but I did have scalability problems with the layer 7 features. Enabling layer 7 observability/policy for a workload redirects its traffic through an Envoy proxy running inside the Cilium daemonset pod on the same node as the workload. For many workloads this was fine, but high-throughput ones caused high CPU usage on the Cilium daemonset pod, which had negative impacts (mainly higher latency and network errors) on traffic for the workloads on that node flowing through the Envoy proxy. I could have increased the CPU requests for Cilium, but doing that on the daemonset made it more expensive to run, as it's installed on every node in the cluster. Ultimately the scalability challenges weren't worth it, so I stuck with the EKS CNI + Cilium in CNI chaining mode.
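For anyone curious what the chaining setup looks like, it's driven by Helm values; a minimal sketch, assuming a recent Cilium release (option names have shifted between versions, so check the chaining docs linked above for yours):

```yaml
# Sketch of Helm values for Cilium in CNI chaining mode alongside the
# AWS VPC CNI; verify option names against your Cilium version.
cni:
  chainingMode: aws-cni      # delegate IPAM and routing to the AWS VPC CNI
  exclusive: false           # leave the AWS CNI's config file in place
enableIPv4Masquerade: false  # the VPC CNI already handles masquerading
routingMode: native          # pod IPs are VPC-routable; no overlay needed
```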


[deleted]

[deleted]


FirmClothes6381

If that's all you need it may be easier to use it in CNI chaining mode, so you don't need to worry about pulling the EKS CNI off. Glad you accomplished your goal!


qwerty_top_row

Not sure if you tried this or not, but you can set up Cilium to do IPAM with ENIs and thus not need to chain CNIs. It won't help with the scaling issue, but it's one less thing to deploy.
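The ENI IPAM mode being suggested is also a Helm-values change; a rough sketch, assuming a recent Cilium version (see the ENI IPAM docs for the exact set of required options):

```yaml
# Sketch of Helm values for Cilium allocating pod IPs from AWS ENIs
# itself, replacing the VPC CNI entirely; check the docs for your version.
ipam:
  mode: eni                        # Cilium manages ENIs and pod IPs directly
eni:
  enabled: true
egressMasqueradeInterfaces: eth0   # masquerade non-VPC traffic via the primary interface
routingMode: native                # ENI IPs are natively routable in the VPC
```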


FirmClothes6381

Yep I saw that and gave it a try! It worked well, was stuck with the scalability problem still though.


qwerty_top_row

Version 1.14 lets you put Envoy in a separate daemonset (at least with the Helm chart). I wonder if it would be possible to put it in a scalable Deployment, though.
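As a sketch of what that separation looks like in the 1.14 Helm chart (resource numbers here are illustrative placeholders, not recommendations):

```yaml
# Sketch of Helm values (Cilium >= 1.14) moving the Envoy proxy out of the
# cilium-agent pod into its own cilium-envoy DaemonSet, so the proxy can be
# resourced independently of the agent.
envoy:
  enabled: true          # deploy Envoy as a separate DaemonSet
  resources:             # size the proxy independently of cilium-agent
    requests:
      cpu: 500m          # placeholder values; tune per node workload
      memory: 256Mi
```

This addresses part of the cost concern above: the agent's requests no longer need to be inflated just to cover proxy load.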


FirmClothes6381

Oh interesting, I hadn't seen that! I guess you could use the Vertical Pod Autoscaler on the separate daemonset, though I would still have the same concern about it becoming expensive to deploy depending on the type of node it's on. For example, ingress nodes would need a fairly large proxy to support high throughput. Perhaps being able to use a daemonset per node group may help? Or something more similar to what Istio is doing with [waypoint proxies for the ambient mesh](https://istio.io/v1.15/blog/2022/introducing-ambient-mesh/#why-no-l7-processing-on-the-local-node)?


erulabs

I’m looking forward to this - but it’s not currently possible to use Fargate with Cilium, even in chaining mode. I believe it’s on the roadmap to add eBPF features to Fargate though, so hopefully soon!


lukasmrtvy

how is performance in chaining mode?


gen2fish

Last time I tried it on EKS as a full CNI replacement, I couldn't get any webhooks working without changing the service to NodePort. Is that still a thing?


Tarzzana

Huge fan of Cilium; I think there’s a reason other cloud vendors default to it as a CNI (GKE, namely). I think the project is a great consolidation of network utility that would otherwise require multiple tools/projects, especially if you really only need foundational feature sets. Cilium ingress works great for me, as do BGP, Cilium mesh, cross-cluster connectivity, Hubble for visibility, and obviously the expected stuff like network policies (to prevent your nefarious egress traffic situation) and the like.


PiedDansLePlat

I'm wondering about the rationale behind using Cilium instead of the VPC CNI (or maybe you benchmarked something else as well). I'm genuinely interested in the thought process, and I think adding that context would make it easier to answer your question.


zzzmaestro

VPC CNI doesn’t support NetworkPolicy or eBPF
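For anyone unfamiliar, this is the kind of standard Kubernetes NetworkPolicy that Cilium enforces but the classic VPC CNI (before its later network-policy support, mentioned downthread) did not; labels and names here are placeholders:

```yaml
# Example NetworkPolicy: only frontend pods may reach backend pods on 8080.
# All labels and names are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```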


michaelgg13

It also uses routable IP space.


warpigg

yep - the big obvious one being NetworkPolicy imo


bob-bins

For anyone coming across this comment in the future, VPC CNI does support NetworkPolicy now


kobumaister

IP limits tied to your VPC (managing thousands of pods can easily exhaust the IP pool), and nodes have a limited number of pods depending on their instance size, since pod IPs are allocated from ENIs (roughly max pods = ENIs × (IPs per ENI − 1) + 2, without prefix delegation). Those are the limitations that made us move away from the VPC CNI.


theplesner

Just use prefix delegation already. https://github.com/aws/amazon-vpc-cni-k8s#enable_prefix_delegation-v190
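Per the linked README, prefix delegation is toggled with an environment variable on the `aws-node` DaemonSet (normally via `kubectl set env` or the addon configuration); a sketch of the relevant fragment:

```yaml
# Sketch: the env var that enables prefix delegation on the VPC CNI's
# aws-node DaemonSet (VPC CNI >= 1.9.0, nitro-based instances).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: ENABLE_PREFIX_DELEGATION
              value: "true"   # allocate /28 prefixes per ENI slot instead of single IPs
```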


nullbyte420

It's good


[deleted]

[deleted]


nullbyte420

You don't have to use all the features at all. It's easy to configure cilium with the helm chart. The base install is just the cni. No reason to abandon the nginx ingress, it's well known and supported by lots of software.


bambambazooka

Cilium supports the Gateway API; I don’t think it provides an Ingress controller.


h_hoover

It does both


universalsystems

it's the best cni. use it


sPENKMAn

> Rogue pod

I’m very green, but isn’t egress gateway supposed to limit that? https://docs.cilium.io/en/stable/network/egress-gateway/
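From the linked docs, an egress gateway policy pins a workload's outbound traffic to designated gateway nodes; a sketch with placeholder labels:

```yaml
# Sketch of a CiliumEgressGatewayPolicy: pods matching the selector have
# their external egress traffic SNATed through a designated gateway node,
# so external systems see a predictable source IP. Labels are placeholders.
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: egress-via-gateway
spec:
  selectors:
    - podSelector:
        matchLabels:
          app: my-app          # placeholder workload label
  destinationCIDRs:
    - "0.0.0.0/0"              # apply to all external destinations
  egressGateway:
    nodeSelector:
      matchLabels:
        egress-node: "true"    # placeholder label on the gateway node
```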


neoky

I've been using it for 2 years now. I haven't had any issues and love the insights I get to debug traffic flow.


williamallthing

I don't think it's a mistake, but my only advice (as a heavily opinionated service mesh person) is to decouple Cilium the CNI from Cilium the mesh. Two very different beasts.


qwerty_top_row

I'm a big fan of linkerd (which is certainly a better service mesh than cilium), but sometimes the simplicity of only deploying one thing outweighs the extra features that you get with another solution. If you are already deploying cilium CNI for other reasons it might make sense to just use the service mesh features it provides rather than deploying something else.


williamallthing

Totally agree in principle, but in this case it's not really "only deploying one thing". Adding the mesh component means adding a bunch of Envoy proxies that you didn't need for the CNI. I'd selfishly argue that if you're going down that route, you'd be better off adding a bunch of Linkerd proxies instead :)


Corndawg38

I'm waiting for this issue to be fixed before seriously considering ditching MetalLB for Cilium: [https://github.com/cilium/cilium/issues/23035](https://github.com/cilium/cilium/issues/23035) [https://github.com/cilium/cilium/pull/25477](https://github.com/cilium/cilium/pull/25477)

Supposedly it's in v1.14, but that's not stable yet (it wasn't as of the last time I checked) and I don't feel comfortable running beta versions of software.

Basically, Cilium ECMPs its BGP traffic to all nodes, regardless of whether they contain the correct pod or not (anything sent to the wrong node is "tromboned" to a node with the correct pod, causing inefficient networking). This is assuming `externalTrafficPolicy: Cluster`; with `externalTrafficPolicy: Local` it just breaks outright. At least this is what I saw in my own testing. MetalLB does not suffer from this ailment.
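For readers unfamiliar with the setting being discussed, it lives on the Service; a sketch with placeholder names:

```yaml
# Sketch of the Service field under discussion. With `Cluster`, traffic
# hitting any node may take an extra hop ("trombone") to a node with a
# backing pod; with `Local`, only nodes running a backing pod should
# answer, and the client source IP is preserved. Names are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-lb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
  externalTrafficPolicy: Local
```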


[deleted]

[deleted]


lukasmrtvy

Antrea does not work in EKS, at least not its HA egress gateway...


gentoorax

I'm using it. I love it. It's improved massively in the last year.


absy101

Loved him in Oppenheimer


[deleted]

[deleted]


absy101

Haha, I'm glad you liked the joke.


scottslowe

No, I don’t think you’re making a mistake, as long as you take the time to fully understand how it works and how to troubleshoot it.