---
title: "Don't use ArgoCD for your infrastructure"
date: 2023-02-09T12:47:32+01:00
draft: false
ShowToc: true
cover:
  image: "/posts/dont-use-argocd-for-infrastructure/cover.png"
  caption: "ArgoCD"
  relative: false
  responsiveImages: false
---
> Of course, it's just a clickbait title. Use whatever works for you. I'll just describe why I wouldn't use `ArgoCD` for the infrastructure.
## Prelude
`ArgoCD` is an incredibly popular tool and I see that many DevOps guys *(I know that it's not a job definition, but I feel like it's the best description that everybody can understand)* want to use it everywhere. I wasn't an exception, but I've just changed my mind. I still think that `ArgoCD` is cool and that you should use it, but not for the infrastructure.
## But why?
### One more prelude
Let's assume you are a team that provides something as a service to other teams. Even if you're the only member, it doesn't matter. And let's assume you're working with `Kubernetes`, or plan to, otherwise I'm not sure why you would even read this post.
> It's very common to use separate clusters for different teams, customers, applications, etc. Let's say you have 3 clusters:
![3 clusters and you](/posts/dont-use-argocd-for-infrastructure/3-clusters.png)
Setups may differ: you can use different clusters for different products, environments, or teams, or you can have your own opinion on how to split workloads between clusters. But these (in our case) 3 clusters are used directly by other teams. You may also want a cluster for providing services. Let's assume your company decided to use [Gitea](https://gitea.io/en-us/) as a `git` provider, and you deployed it to Kubernetes. *It may be a very controversial example, but I'm not talking about what should run in K8s and what shouldn't, so feel free to substitute any other thing that is supposed to be used across the whole company (GitLab Runners, Bitwarden, ElasticSearch, etc.)*. So it's already 4 clusters. Let's call the fourth cluster a `DevOps Cluster`.
![3 Clusters and gitea](/posts/dont-use-argocd-for-infrastructure/3-clusters-and-gitea.png)
I assume you need some common stuff deployed to each cluster, let's say Prometheus, Grafana, and Loki.
And now you need to decide how to deploy it. You may already know about `ArgoCD`, or you decided to look for **Best Practices** and found a lot about it. And it sounds perfect: everybody seems to use it, you can find information everywhere, people are helpful, and the GitHub repo is well-maintained.
>Why Argo CD?
>
>Application definitions, configurations, and environments should be declarative and version controlled. Application deployment and lifecycle management should be automated, auditable, and easy to understand.
And now you first need to deliver `ArgoCD` itself, and only then start delivering everything else with it.
Let's first talk about how to deliver Argo. There are different options. For example, you can have one main installation in the `DevOps Cluster` and use it to manage other clusters. That sounded good to me when I first heard about it. But I wanted to have all configuration as code, and to add other clusters to the main `Argo` you need to use the `argocd` CLI, so it's either an additional step in the CI/CD or manual work. I didn't like either option, because I wanted to avoid adding scripts to pipelines, and manual work just wasn't an option. Also, it's not very transparent anymore where all the applications in target clusters are coming from (or maybe I just couldn't find it; I'd rather think that I was dumb). One more thing: you obviously can't have several `K8s` resources with the same name in one namespace, so every `Application` must have a different name. I don't like long names, so it looks ugly to me, especially when your clusters have long names like "the-first-product-production" and your application ends up being called "the-first-product-production-grafana". You don't have to use the cluster name in the application name, for sure, but you'd want to have some logic there, and this logic must be as obvious as possible. Anyway, these are the three main issues that I've faced and that I can't live with, so here comes the second way to deliver `Argo`: install it to each cluster.
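For reference, what `argocd cluster add` does is roughly equivalent to creating a cluster `Secret` in the Argo namespace (see the ArgoCD declarative setup docs); a minimal sketch, with placeholder server address and credentials, which also shows why "as code" is awkward here: the token would have to live in git.

```YAML
apiVersion: v1
kind: Secret
metadata:
  name: the-first-product-production
  namespace: argocd
  labels:
    # This label tells ArgoCD to treat the Secret as a cluster definition
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: the-first-product-production
  server: https://1.2.3.4:6443 # placeholder API server address
  config: |
    {
      "bearerToken": "<service-account token>",
      "tlsClientConfig": {
        "caData": "<base64-encoded CA certificate>"
      }
    }
```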
So I would go with 4 `ArgoCD` installations. The first step is to install it, which is not a problem at all; there are many ways to do it. After it's installed, we need to start delivering other applications. I'm aware of 3 ways of doing that:
1. Use `Application` manifests for applications
2. Use `Application` manifests to manage `Application` manifests from repo (the App of Apps pattern, or something like that)
3. Use `ApplicationSet` manifests to make `ArgoCD` render `Application` manifests and apply them
### Application
The first option is really straightforward, isn't it? All we need to do is create manifests. `ArgoCD` devs have just published version 2.6 with `multi-source` application support. *But currently I can't say it's usable. The main issue for me is that the `argocd` CLI doesn't work with them, which makes the whole thing pointless to me. Without the CLI I can't implement CD, so I see no reason to use them at all. I could use the `AutoSync` option, but I won't, and I'll come back to this point and describe why, maybe in the next post, or later*. So I can't use multi-source applications right now. Let's look at the list of applications that I need to install one more time:
To all clusters:
- Prometheus
- Grafana
- Loki
To the DevOps cluster only:
- Gitea
There are many ways to install applications to `K8s`. But actually, I think there is only one real way: [helm](https://helm.sh/). Why? Because each of those applications is a huge pile of manifests that you need to combine, install, and maintain. You probably won't write those manifests yourself. There are other options to install apps, but all of them seem super complicated, and I doubt you want to spend 8 hours per day editing `yaml` files. At least I don't, so I'm choosing helm.
>I need to say that I'm not 100% happy with helm. There are some issues with it that seem very important to me, but it's good enough to use. Maybe we can talk about them later.
Let's try the first approach (`Application` for an application)
First, the `Application` manifest for the chart:
```YAML
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  destination:
    namespace: monitoring # Let's not touch namespace management this time. Let's assume we already solved this issue
    server: https://kubernetes.default.svc
  project: monitoring
  source:
    chart: kube-prometheus-stack
    helm:
      valueFiles:
        - values.yaml
    path: .
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: 45.0.0
```
But what about values? A single-source application will not be able to find values files in your repo if you use a remote chart, so you have two options (that I'm aware of):
1. Add values directly to your source like this:
```YAML
spec.source.helm.values: |
  your-values: here
```
2. Create a CMP for handling helm packages and values
The second way is good but complicated, because it's a self-written tool that you have to implement, that should work with Argo, that you have to maintain, and without any guarantee that it will keep working after `ArgoCD` is updated. I was using Argo with a custom CMP; it's no fun.
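For context, a CMP in recent `ArgoCD` versions is configured with a `plugin.yaml` mounted into a sidecar container; a minimal sketch (the plugin name and the templating command are hypothetical, check the ArgoCD docs for your version):

```YAML
apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: helm-with-external-values # hypothetical plugin name
spec:
  version: v1.0
  generate:
    # Render the chart with a values file taken from the application's repo
    command: [sh, -c]
    args: ["helm template $ARGOCD_APP_NAME . -f values.yaml"]
```

And that is exactly the kind of glue you then have to keep working across Argo upgrades.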
But anyway, the `Application` way is not scalable, because you have to create a manifest for each cluster, and it's not secure, because you can't easily encrypt the data. Also, if you've seen the values for `kube-prometheus-stack`, you know they are huge. So now you have 4 huge manifests with unencrypted secrets. And that's for one app only, so the repo probably looks like this:
```
manifests/
  cluster1/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster2/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster3/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster-devops/
    prometheus.yaml
    loki.yaml
    grafana.yaml
    gitea.yaml
```
In my experience, each `Application` like this with a proper configuration will contain about 150-200 lines of code, so that's 13 manifests and roughly 1950-2600 lines of code just to install these 4 applications across the clusters. One of them is really special, and the others will most probably differ only in a few lines, e.g. for ingress and passwords.
I don't think it's the way to go. To solve this problem, many guys vendor the charts into the same git repo where they store values, using helm-freeze for example. So it looks like this:
```
helm-freeze.yaml
vendored_charts/
  prometheus/...
  grafana/...
  loki/...
  gitea/...
manifests/
  cluster1/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster2/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster3/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster-devops/
    prometheus.yaml
    loki.yaml
    grafana.yaml
    gitea.yaml
values/
  prometheus/...
  grafana/...
  loki/...
  gitea/...
```
Yes, now you can use values from files, you can encrypt secrets, and your `Applications` are not that huge anymore. But I'm strongly against vendoring external charts. Why? First, it's my ideology; briefly, if you don't trust packagers, you don't use their packages. Second, vendoring charts into a git repo means you need an additional step to download them. With helm-freeze, for example, you need to execute `helm-freeze sync`. It's either a pre-commit hook, a manual execution, or a step in CI/CD. I don't like any of these options, for different reasons, but if I stop on every little point, this article will never be finished. So if it's interesting, feel free to ask.
> I would give up already here. I don't understand why you need to suffer that much just to use such a great tool
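To make the `helm-freeze sync` step concrete, here is a minimal `helm-freeze.yaml` sketch for the charts above (field names taken from the helm-freeze README at the time of writing; double-check against the version you use):

```YAML
# Charts to vendor into the repo
charts:
  - name: kube-prometheus-stack
    version: 45.0.0
    repo_name: prometheus-community
# Where the downloaded charts are placed
destinations:
  - name: default
    path: ./vendored_charts
# Helm repositories the charts come from
repos:
  - name: prometheus-community
    url: https://prometheus-community.github.io/helm-charts
```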
### App of Apps
It's actually pretty much the same thing. But instead of applying `Application` manifests one by one, you create an additional `Application` manifest that `ArgoCD` will use to generate the others.
```YAML
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: applications
  namespace: argocd
spec:
  destination:
    namespace: argo
    server: https://kubernetes.default.svc
  project: system
  source:
    path: ./manifests/cluster1
    repoURL: $YOUR_GIT_REPO
    targetRevision: main
```
You create 4 such manifests, one for each cluster, and apply them. And when you push your (non-App-of-Apps) `Application` manifests to the main branch, `ArgoCD` will do something about it. As I see it, it doesn't solve anything. You still have an ugly amount of `yaml` files, plus 4 additional ones that are not so huge. This concept might simplify the deployment process, but it also steals a certain amount of control from you, because now it's not you who is responsible for deploying applications, but Argo.
> I think that GitOps and other automations are important, and it's the only way to do development right now, but you're probably hired as a DevOps Engineer or SRE, or whoever. You're supposed to be able to do something apart from pushing to git. You can't hand all the responsibility over to git and pipelines and live a happy life. One day you will have to execute `kubectl edit deployment`, and then you won't be happy if `ArgoCD` decides to rewrite your changes right after they are applied, because you're not following the Git Flow. You need to have control, and that's why you're paid. Not because you can edit `yaml` files.
### ApplicationSets
It's a nice concept. *In theory*. You create one manifest for all applications in a cluster, or even one manifest for all applications across your clusters. The unique one that will work everywhere. I won't provide an example, sorry, but you can do a lot of templating there, so one manifest can serve four clusters and decrease the amount of code. I'm using `ApplicationSets` myself for my personal stuff, where I don't have any kind of obligations, and no one will sue me for breaking everything down. And actually I did the breaking thing not so long ago. I'm not blaming `ArgoCD` for that, it was entirely my fault. But let's see what I've done. And let me know (anyhow) if you were able to spot the problem before the `kubectl apply` happened.
#### My file structure
I have one `ApplicationSet` for helm releases that looks like this:
./helm-releases.yaml
```YAML
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-releases
  namespace: argo-system
spec:
  generators:
    - git:
        repoURL: git@github.com:allanger/my-repo.git
        revision: HEAD
        files:
          - path: "releases/*"
  template:
    metadata:
      name: "{{ argo.application }}"
      namespace: argo-system
    spec:
      project: "{{ argo.project }}"
      source:
        path: "{{ argo.path }}"
        helm:
          valueFiles:
            - values.yaml
          values: |-
            {{ values }}
        repoURL: "{{ chart.repo }}"
        targetRevision: "{{ chart.version }}"
        chart: "{{ chart.name }}"
      destination:
        server: "{{ argo.cluster }}"
        namespace: "{{ argo.namespace }}"
      ignoreDifferences:
        - group: admissionregistration.k8s.io
          kind: ValidatingWebhookConfiguration
          jqPathExpressions:
            - .webhooks[]?.clientConfig.caBundle
            - .webhooks[]?.failurePolicy
```
And a certain amount of generator files in the `./releases` folder. I'm using the first approach, like this:
```YAML
argo:
  cluster: https://kubernetes.default.svc
  application: cert-manager
  project: system
  namespace: cert-manager
  path: .
chart:
  version: 1.10.1
  name: cert-manager
  repo: https://charts.jetstack.io
values: |
  ...
```
I don't like having values here, so when `ArgoCD` 2.6 was released, I decided to try multi-source applications. I created a new directory, `./releases_v2`, and a new `ApplicationSet` manifest:
./helm-releases-v2.yaml:
```YAML
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-releases
  namespace: argo-system
spec:
  generators:
    - git:
        repoURL: git@github.com:allanger/argo-deployment.git
        revision: HEAD
        files:
          - path: "releases_v2/*"
  template:
    metadata:
      name: "{{ argo.application }}"
      namespace: argo-system
    spec:
      project: "{{ argo.project }}"
      sources:
        - path: "./values"
          repoURL: git@github.com:allanger/argo-deployment.git
          ref: values
        - path: "{{ argo.path }}"
          helm:
            valueFiles:
              - "$values/values/{{ chart.name }}/values.yaml"
          repoURL: "{{ chart.repo }}"
          targetRevision: "{{ chart.version }}"
          chart: "{{ chart.name }}"
      destination:
        server: "{{ argo.cluster }}"
        namespace: "{{ argo.namespace }}"
      ignoreDifferences:
        - group: admissionregistration.k8s.io
          kind: ValidatingWebhookConfiguration
          jqPathExpressions:
            - .webhooks[]?.clientConfig.caBundle
            - .webhooks[]?.failurePolicy
```
And executed `kubectl apply -f helm-releases-v2.yaml`.
And for some reason `ArgoCD` stopped responding. And, actually, everything was gone. Nothing was left in my cluster. *(Spotted it? The new `ApplicationSet` had the same `metadata.name` as the old one, so applying it replaced the old spec, and every `Application` generated from `releases/*` was removed, together with everything those applications had deployed.)* And then I realized what I'd done: "How am I a DevOps engineer after all?". *In case you wonder, I was able to save 100% of the important persistent data that was there, and all the workload was back in 15 minutes, but still...*
One of the most important things about your infrastructure is its sustainability. And if you happen to have a setup like this in your company, and you hire a junior engineer who makes the same mistake, you have no right to punish them; on the contrary, you need to punish yourself for building something that is so easy to destroy. And I know that there are options to avoid resource destruction when `Applications` or `ApplicationSets` are gone, and that you're supposed to use `argocd` and not `kubectl` to manage these resources *(and I don't agree with that at all)*. But I think that adding additional fields to manifests to preserve resources that are eventually created by an operator after applying a CR manifest is rather non-obvious and dangerous out of the box. When I need something to be reliable, I'd rather have a more complicated and less obvious, or maybe not automated at all, process for removing it.
> You'd rather think twice before executing `rm -rf ./something`, than do `git push` and wait until it's executed automatically, wouldn't you?
But `ApplicationSets` are not bad. I'm still using them, but now with additional fields, so I'm not afraid of removing everything accidentally. And yet it's not perfect, because without multi-source applications they don't make any sense for projects bigger than a Minecraft server used by 4 guys, *unless you're vendoring helm charts, of course*.
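For reference, something like the following `ApplicationSet` sync policy fragment is what I mean by "additional fields" (the field is documented in the ArgoCD ApplicationSet docs; verify against your version):

```YAML
spec:
  syncPolicy:
    # If the ApplicationSet is deleted, or a generator stops matching a file,
    # keep the generated Applications instead of cascading the deletion
    preserveResourcesOnDeletion: true
```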
Even when multi-source apps have full support and I can move values to real values files, there is still no way to do `argocd appset diff`, and I'm aware of [this github issue](https://github.com/argoproj/argo-cd/issues/10895#issuecomment-1423566000). You can also read my concerns there about the server-side rendering implementation that they want to build.
So let's assume that the CLI supports multi-source apps, that application sets can be diffed, that your server is not overloaded when 1000 manifests are rendered on each pipeline run just for diffing, and that [helm repos are not DDoSed](https://todoist.com/app/project/2232733866/task/6521274379) *(because it's not nice to DDoS something that is used by a huge amount of users across the world)*. And you've added all the fields to your manifests to make your infra reliable. Sounds nice!
But there is one more problem that I see. What many teams don't think about is that they, as a team, provide services to other teams. So, if you have the clusters `cluster-production`, `cluster-development`, `cluster-demo`, and `cluster-devops`, where should you deploy infra changes first? I think a lot of you would say: to `cluster-development`, because at least it's not facing real customers. And... I totally disagree. You're the team that provides other teams with services, and your real customers are those teams. Of course, you won't treat the production environment the same way you treat the development environment, but it's still not a playground for you. It's a playground for developers, and it should be stable and reliable for them. I'm sure there are many ways to handle this, but I think you should have one more cluster, a `cluster-infra-test`, where you deliver your changes first and where you can test them before they affect other teams. So, it's a 5th `ArgoCD` with a very similar setup *(actually, the setup must repeat all the other setups, so you're sure you're testing what's going to be delivered later)*. And with the `ApplicationSet` and, for example, git generators pointed at the main branch on the "production" environments (`cluster-production`, `cluster-development`, `cluster-demo`, and `cluster-devops`), but here changes must come not only from main but also from other branches *(assuming your workflow is something like: cut a branch, update the infra code, create a pull request, and merge)*, because you need to test everything before it's in the main branch.
So you either have a very complicated `ApplicationSet` *(I'm not even sure it's possible to do with templates)*, or you have different manifests for the test cluster and the rest, so you have to remember to update both every time one is updated, or you have an additional step in a pipeline that gets the `ApplicationSet` from the `cluster-infra-test` and adds a new branch to the generators *(because you must not overwrite and break test environments created by other members of your team)*.
### Really?
Are you ready to go through all of this just to use Argo? Is there really nothing that can stop you from doing that? I even got tired just writing this post. I was stubborn, I wanted to use the best `GitOps Kubernetes` tool, I went through all of this, and I was trying to convince others that it's cool: just a little bit of work and we're happy `Argo` users. But looking back, all I can say is: just use [Helmfile](https://github.com/helmfile/helmfile)! `ArgoCD` literally doesn't solve any issue that `Helmfile` can't (when it comes to infrastructure deployment). And with a huge amount of work and compromises you can achieve a result that is close to what you would have with a proper `helmfile` configuration (which is extremely easy and reliable).
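To give an idea, here is a minimal `helmfile.yaml` sketch for the Prometheus example from above (release names and file paths are illustrative):

```YAML
repositories:
  - name: prometheus-community
    url: https://prometheus-community.github.io/helm-charts

releases:
  - name: prometheus
    namespace: monitoring
    chart: prometheus-community/kube-prometheus-stack
    version: 45.0.0
    values:
      # Plain files in the same repo; encrypted secrets are also possible,
      # e.g. via the helm-secrets plugin
      - values/prometheus/values.yaml
```

Then `helmfile diff` shows what would change and `helmfile apply` applies it, against whatever kube context you point it at, per cluster, with no operator in between.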
Later I will create a repo where I'll show examples with configuration and CI/CD for the different `ArgoCD` approaches and for `helmfile`. So if you don't trust me now, you'll be able to see the difference, or try to convince me that I'm wrong.
> And using `helmfile`, I will of course install `ArgoCD` to my clusters, because it's an awesome tool, without any doubts. But don't manage your infrastructure with it, because your infrastructure is a service that you provide to other teams. And I'll talk about that in one of the next posts.
Thanks,
Oi!
---