---
title: "Don't use ArgoCD for your infrastructure"
date: 2023-02-09T12:47:32+01:00
draft: false
ShowToc: true
---
Of course, it's just a clickbait title. Use whatever works for you. I will just describe why I wouldn't use ArgoCD for the infrastructure.
## Prelude
ArgoCD is an incredibly popular tool, and I see that many DevOps guys (I know it's not a job title, but I feel it's the best description that everybody understands) want to use it everywhere. I wasn't an exception, but I've just changed my mind. I still think that ArgoCD is cool, and you should use it, but not for the infrastructure.

But why?
## One more prelude
Let's assume you are a team that provides something as a service to other teams. Even if you're the only member, it doesn't matter. And let's assume you're working with Kubernetes, or you plan to, otherwise I'm not sure why you would even read this post.
It's very common to use separate clusters for different teams, customers, applications, etc. Let's say you have 3 clusters. Setups may differ: you can use different clusters for different products, environments, or teams, or you can have your own opinion on how to split workloads between clusters. But these (in our case) 3 clusters are used directly by other teams. You may also want a cluster for providing services. Let's assume your company decided to use Gitea as a git provider, and you deployed it to Kubernetes. It may be a controversial example, but I'm not talking about what should run in K8s and what shouldn't, so feel free to substitute any other thing that is supposed to be used across the whole company (GitLab runners, Bitwarden, ElasticSearch, etc.). So it's already 4 clusters. Let's call the fourth one the DevOps cluster.

I assume you need some common stuff deployed to each cluster; let's say Prometheus, Grafana, and Loki.
And now you need to decide how to deploy it. You may have already known about ArgoCD, or you decided to look for best practices and found a lot about ArgoCD. And it sounds perfect. Everybody tends to use it. You can find a lot of information everywhere. People are helpful. The GitHub repo is well maintained.
> **Why Argo CD?**
>
> Application definitions, configurations, and environments should be declarative and version controlled. Application deployment and lifecycle management should be automated, auditable, and easy to understand.
And now you need to first deliver ArgoCD itself, and later start delivering everything with ArgoCD.
Let's first talk about how to deliver Argo. There are different options. For example, you can have one main installation in the DevOps cluster and use it to manage the other clusters. That sounded good to me when I first heard about it. But I wanted to have all configuration as code, and to add other clusters to the main Argo you need to use the `argocd` CLI, so it's either an additional step in the CI/CD, or manual work. I didn't like either option, because I wanted to avoid adding scripts to pipelines, and manual work just wasn't an option. Also, it's not very transparent anymore where all the applications in target clusters are coming from (or maybe I just couldn't find it; I'd rather think I was dumb). One more thing: you obviously can't have several K8s resources with the same name in one namespace, so every `Application` must have a different name. I don't like long names, so it looks ugly to me, especially when your clusters have long names like "the-first-product-production" and your application ends up as "the-first-product-production-grafana". You don't have to use the cluster name in the application name, for sure, but you would like to have some logic there, and this logic must be as obvious as possible. Anyway, these are the three main issues that I faced and couldn't live with, so here comes the second way to deliver Argo: install it to each cluster.
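For context, what `argocd cluster add` creates under the hood is essentially a Secret in the Argo namespace, labelled as a cluster secret. You could manage such a Secret as code instead of calling the CLI, but then you're committing cluster credentials to git, which has its own problems. A sketch with hypothetical names, addresses, and credentials:

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Hypothetical name; any name works, the label below is what matters
  name: cluster-the-first-product-production
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: the-first-product-production
  server: https://1.2.3.4:6443 # hypothetical API server address
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-certificate>"
      }
    }
```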
So I would go with 4 ArgoCD installations. The first step is to install it, which is not a problem at all; there are many ways to do it. After it's installed, we need to start delivering other applications. I'm aware of 3 ways of doing it:
- Use `Application` manifests for applications
- Use an `Application` manifest to manage `Application` manifests from a repo (the App of Apps pattern, or something like that)
- Use `ApplicationSet` manifests to make ArgoCD render `Application` manifests and apply them
## Application
The first option is really straightforward, isn't it? All we need to do is create manifests. ArgoCD devs have just published version 2.6 with multi-source applications support, but currently I can't say it's usable. The main issue for me is that the `argocd` CLI doesn't work with multi-source applications, which makes the whole thing pointless to me: without the CLI I can't implement CD, and then I see no reason to use them at all. I could use the auto-sync option, but I won't do that; I'll come back to this point and describe why later, maybe in the next post. So I can't use multi-source applications right now. Let's look at the list of applications that I need to install one more time:
To all clusters:

- Prometheus
- Grafana
- Loki

To the DevOps cluster only:

- Gitea
There are many ways to install applications to K8s. But actually, I think there is only one real way: Helm. Why? Because each of those applications is a huge pile of manifests that you need to combine, install, and maintain, and you probably won't write those manifests yourself. There are other options to install apps, but all of them seem super complicated, and I doubt that you want to spend 8 hours per day editing YAML files. At least I don't, so I'm choosing Helm.
I need to say that I'm not 100% happy with Helm. There are some issues that seem very important to me, but it's good enough to use. Maybe we can talk about them later.
Let's try the first approach (an `Application` per application). First, the package:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  destination:
    namespace: monitoring # Let's not touch namespace management this time; assume we already solved this issue
    server: https://kubernetes.default.svc
  project: monitoring
  source:
    chart: kube-prometheus-stack
    helm:
      valueFiles:
        - values.yaml
    path: .
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: 45.0.0
```
But what about values? A single-source application will not be able to find values files in your repo if you use a remote chart, so you have two options (that I'm aware of):

- Add values directly to your source, like this:

  ```yaml
  spec:
    source:
      helm:
        values: |
          your-values: here
  ```

- Create a CMP (Config Management Plugin) for handling helm packages and values
The second way is good but complicated, because it's a self-written tool that you have to implement, make work with Argo, and maintain, without any guarantee that it will keep working after ArgoCD is updated. I was using Argo with a custom CMP; it's no fun.
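For reference, a sidecar CMP is configured with a plugin manifest along these lines. The plugin name and the one-liner are hypothetical sketches; a real plugin would also need to fetch the remote chart and pick the right values file:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: helm-with-external-values # hypothetical name
spec:
  generate:
    # Render the chart with whatever values logic you need;
    # this is a sketch, not a production setup
    command: [sh, -c]
    args: ["helm template $ARGOCD_APP_NAME . -f values.yaml"]
```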
But anyway, the `Application`-per-application way is not scalable, because you have to create a manifest for each cluster, and it's not secure, because you can't easily encrypt the data. Also, if you've seen the values for kube-prometheus-stack, you know that they are huge. So now you have 4 huge manifests with unencrypted secrets. And that's only one app, so it probably looks like this:
```
manifests/
  cluster1/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster2/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster3/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster-devops/
    prometheus.yaml
    loki.yaml
    grafana.yaml
    gitea.yaml
```
In my experience, each `Application` like this, with a proper configuration, will contain about 150-200 lines, so that's about 1950-2600 lines of code to install 4 applications across the clusters (13 manifests in total). One of them is really special, while the others will most probably differ only in a few lines, e.g. for ingresses and passwords.
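As a quick sanity check of that arithmetic:

```python
# Rough estimate of the amount of YAML in the layout above,
# at 150-200 lines per Application manifest (numbers from my experience)
apps_per_cluster = {
    "cluster1": 3,        # prometheus, loki, grafana
    "cluster2": 3,
    "cluster3": 3,
    "cluster-devops": 4,  # + gitea
}
manifests = sum(apps_per_cluster.values())
low, high = manifests * 150, manifests * 200
print(f"{manifests} manifests, ~{low}-{high} lines of YAML")
# 13 manifests, ~1950-2600 lines of YAML
```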
I don't think that's the way to go. To solve this problem, many people vendor the charts into the same git repo where they store the values, using helm-freeze for example. So it looks like this:
```
helm-freeze.yaml
vendored_charts/
  prometheus/...
  grafana/...
  loki/...
  gitea/...
manifests/
  cluster1/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster2/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster3/
    prometheus.yaml
    loki.yaml
    grafana.yaml
  cluster-devops/
    prometheus.yaml
    loki.yaml
    grafana.yaml
    gitea.yaml
values/
  prometheus/...
  grafana/...
  loki/...
  gitea/...
```
Yes, now you can use values from files, you can encrypt secrets, and your `Application` manifests are not that huge anymore. But I'm strongly against vendoring external charts. Why? First, it's my ideology; briefly: if you don't trust packagers, you don't use their packages. Vendoring charts into a git repo also means that you need an extra step to download them. With helm-freeze, for example, you need to execute `helm-freeze sync`. It's either a pre-commit hook, or a manual execution, or a step in CI/CD. I don't like any of these options, for different reasons, but if I stop at every little point, this article will never be finished. So if it's interesting, feel free to ask.
I would give up already here. I don't understand why you would need to suffer that much just to use such a great tool.
## App of Apps
It's actually pretty much the same thing. But instead of applying `Application` manifests one by one, you create an additional `Application` manifest that ArgoCD will use to generate the others.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: applications
  namespace: argocd
spec:
  destination:
    namespace: argo
    server: https://kubernetes.default.svc
  project: system
  source:
    path: ./manifests/cluster1
    repoURL: $YOUR_GIT_REPO
    targetRevision: main
```
You will create 4 manifests, one for each cluster, and apply them. And when you push your (not App of Apps) `Application` manifests to the main branch, ArgoCD will do something about it. As I see it, this doesn't solve anything. You still have an ugly amount of YAML files, plus 4 additional ones that are not so huge. This concept might simplify the deployment process, but it will also take a certain amount of control away from you, because now it's not you who is responsible for deploying applications, but Argo.
I think that GitOps and other automations are important, and it's the only sane way to do development right now, but you're probably hired as a DevOps engineer, or an SRE, or whoever. You're supposed to be able to do something apart from pushing to git. You can't hand over all the responsibility to git and pipelines and live a happy life. One day you will have to execute `kubectl edit deployment`, and then you won't be happy if ArgoCD decides to rewrite your changes right after they are applied, because you're not following the Git Flow. You need to have control, and that's why you're paid. Not because you can edit YAML files.
## ApplicationSets
It's a nice concept, in theory. You create one manifest for all applications in a cluster, or even one manifest for all applications across all your clusters: the unique one that will work everywhere. I won't provide an example, sorry, but you can do a lot of templating there, so one manifest will work for four clusters and decrease the amount of code. I'm using ApplicationSets myself, for my personal stuff, where I don't have any kind of obligations and no one will sue me for breaking everything down. And actually I did the breaking thing not so long ago. I'm not blaming ArgoCD for that; it was entirely my fault. But let's see what I did. And let me know (anyhow) if you were able to spot the problem before the `kubectl apply` happened.
## My file structure
I have one `ApplicationSet` for helm releases that looks like this:

`./helm-releases.yaml`:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-releases
  namespace: argo-system
spec:
  generators:
    - git:
        repoURL: git@github.com:allanger/my-repo.git
        revision: HEAD
        files:
          - path: "releases/*"
  template:
    metadata:
      name: "{{ argo.application }}"
      namespace: argo-system
    spec:
      project: "{{ argo.project }}"
      source:
        path: "{{ argo.path }}"
        helm:
          valueFiles:
            - values.yaml
          values: |-
            {{ values }}
        repoURL: "{{ chart.repo }}"
        targetRevision: "{{ chart.version }}"
        chart: "{{ chart.name }}"
      destination:
        server: "{{ argo.cluster }}"
        namespace: "{{ argo.namespace }}"
      ignoreDifferences:
        - group: admissionregistration.k8s.io
          kind: ValidatingWebhookConfiguration
          jqPathExpressions:
            - .webhooks[]?.clientConfig.caBundle
            - .webhooks[]?.failurePolicy
```
And a certain number of generator files in the `./releases` folder. I'm using the first approach (values directly in the manifest), like this:
```yaml
argo:
  cluster: https://kubernetes.default.svc
  application: cert-manager
  project: system
  namespace: cert-manager
  path: .
chart:
  version: 1.10.1
  name: cert-manager
  repo: https://charts.jetstack.io
values: |
  ...
```
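To make the mechanics concrete, here's a rough Python sketch (my own illustration, not ArgoCD code) of what the git file generator does: every file under `releases/` becomes a flat set of dotted parameters, which are then substituted into the `{{ ... }}` placeholders of the template.

```python
import re

def flatten(params: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys,
    e.g. {"argo": {"project": "system"}} -> {"argo.project": "system"}."""
    flat = {}
    for key, value in params.items():
        if isinstance(value, dict):
            flat.update(flatten(value, f"{prefix}{key}."))
        else:
            flat[f"{prefix}{key}"] = value
    return flat

def render(template: str, params: dict) -> str:
    """Replace {{ dotted.key }} placeholders with generator parameters,
    leaving unknown placeholders untouched."""
    return re.sub(
        r"\{\{\s*([\w.]+)\s*\}\}",
        lambda m: str(params.get(m.group(1), m.group(0))),
        template,
    )

# Contents of a hypothetical releases/cert-manager.yaml, as a dict
release = {
    "argo": {"application": "cert-manager", "project": "system", "namespace": "cert-manager"},
    "chart": {"name": "cert-manager", "version": "1.10.1", "repo": "https://charts.jetstack.io"},
}

template = "name: {{ argo.application }}, chart: {{ chart.name }}@{{ chart.version }}"
print(render(template, flatten(release)))
# name: cert-manager, chart: cert-manager@1.10.1
```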
I don't like having values there, so when ArgoCD 2.6 was released, I decided to try multi-source applications. I created a new directory, `./releases_v2`, and a new `ApplicationSet` manifest.

`./helm-releases-v2.yaml`:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-releases
  namespace: argo-system
spec:
  generators:
    - git:
        repoURL: git@github.com:allanger/argo-deployment.git
        revision: HEAD
        files:
          - path: "releases_v2/*"
  template:
    metadata:
      name: "{{ argo.application }}"
      namespace: argo-system
    spec:
      project: "{{ argo.project }}"
      sources:
        - path: "./values"
          repoURL: git@github.com:allanger/argo-deployment.git
          ref: values
        - path: "{{ argo.path }}"
          helm:
            valueFiles:
              - "$values/values/{{ chart.name }}/values.yaml"
          repoURL: "{{ chart.repo }}"
          targetRevision: "{{ chart.version }}"
          chart: "{{ chart.name }}"
      destination:
        server: "{{ argo.cluster }}"
        namespace: "{{ argo.namespace }}"
      ignoreDifferences:
        - group: admissionregistration.k8s.io
          kind: ValidatingWebhookConfiguration
          jqPathExpressions:
            - .webhooks[]?.clientConfig.caBundle
            - .webhooks[]?.failurePolicy
```
And executed `kubectl apply -f helm-releases-v2.yaml`. And for some reason ArgoCD stopped responding. And, actually, everything was gone. Nothing was left in my cluster. That's when I realized what I'd done and thought: "How am I a DevOps engineer after all?". In case you wonder, I was able to save 100% of the important persistent data, and all the workloads were back in 15 minutes, but still...
One of the most important things about your infrastructure is its sustainability. And if you happen to have a setup like this in your company, and you hire a junior engineer who makes the same mistake, you have no right to punish them; on the contrary, you need to punish yourself for building something that is so easy to destroy. I know there are options to avoid destroying resources when `Application` or `ApplicationSet` manifests are gone, and I know the argument that you need to use `argocd` and not `kubectl` to manage these resources (and I don't agree with that at all). But I think that adding extra fields to manifests to preserve resources that are eventually created by an operator after applying a CR manifest is rather unobvious and dangerous out of the box. When I need something to be reliable, I'd rather have a more complicated and less obvious, or maybe not automated at all, process for removing it.
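For completeness, the kind of extra field I mean is, for example, the `preserveResourcesOnDeletion` flag in the `ApplicationSet` sync policy; a fragment (generators and template omitted):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-releases
  namespace: argo-system
spec:
  syncPolicy:
    # Keep the generated Applications (and everything they manage)
    # when the ApplicationSet itself is deleted or replaced
    preserveResourcesOnDeletion: true
  # generators and template as before
```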
You'd rather think twice before executing `rm -rf ./something` than do a `git push` and wait until it's executed automatically, wouldn't you?
But `ApplicationSet` manifests are not bad. I'm still using them, but now with those additional fields, so I'm not afraid of accidentally removing everything. And yet it's not perfect, because without multi-source applications they don't make any sense for projects bigger than a Minecraft server used by 4 guys, unless you're vendoring helm charts, of course.
Even when multi-source apps are fully supported and I can move the inlined values to real values files, there is still no way to do `argocd appset diff`. I'm aware of this GitHub issue, and there you can also read my concerns about the server-side rendering implementation that they want to add.
So let's assume that the CLI supports multi-source apps, that ApplicationSets can be diffed, that your server is not overloaded when 1000 manifests are being rendered on each pipeline run just for diffing, and that helm repos are not getting DDoSed (because it's not nice to DDoS something that is used by a huge number of users across the world). And you've added all the fields to the manifests to make your infra reliable. Sounds nice!
But there is one more problem that I see. What many teams don't think about is that they, as a team, provide services to other teams. So if you have the clusters `cluster-production`, `cluster-development`, `cluster-demo`, and `cluster-devops`, where should you deploy infra changes first? I think a lot of you would say to `cluster-development`, because at least it's not facing real customers. And... I totally disagree. You're the team that provides other teams with services, and your real customers are those teams. Of course, you won't treat the production environment the same way you treat the development environment, but it's still not a playground for you. It's a playground for developers, and it should be stable and reliable for them.

I'm sure there are many ways to handle this, but I think you should have one more cluster, a `cluster-infra-test`, where you deliver your changes first and where you can test them before they affect other teams. So it's a 5th ArgoCD with a very similar setup (actually, the setup must repeat all the other setups, so that you're sure you're testing what's going to be delivered later). On the "production" environments (`cluster-production`, `cluster-development`, `cluster-demo`, and `cluster-devops`) the `ApplicationSet` git generators point to the main branch, but here changes must come not only from main but also from other branches (assuming your workflow is something like: cut a branch, update the infra code, create a pull request, and merge), because you need to test everything before it's in the main branch. So you have either a very complicated `ApplicationSet` (I'm not even sure it's possible to do with templates), or different manifests for the test cluster and the rest, so you have to remember to update both every time one is updated, or an additional step in a pipeline that gets the `ApplicationSet` from `cluster-infra-test` and adds a new branch to the generators (because you must not overwrite and break test environments created by other members of your team).
## Really?
Are you ready to go through all of this just to use Argo? Is there really nothing that can stop you from doing that? I even got tired of writing this post. I was stubborn, I wanted to use the best GitOps Kubernetes tool, I went through all of this, and I was trying to convince others that it's cool: just a little more work, and we're happy Argo users. But looking back, all I can say is: just use Helmfile! ArgoCD solves literally no issue that Helmfile can't (when it comes to infrastructure deployment). And with a huge amount of work and compromises you can achieve a result that will be close to what you would have with a proper helmfile configuration (which is extremely easy and reliable).
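To back that claim up a little, here's a minimal helmfile sketch for the Prometheus example from earlier. The file layout and environment names are my assumptions, not a prescription:

```yaml
repositories:
  - name: prometheus-community
    url: https://prometheus-community.github.io/helm-charts

environments:
  cluster1: {}
  cluster-devops: {}

releases:
  - name: prometheus
    namespace: monitoring
    chart: prometheus-community/kube-prometheus-stack
    version: 45.0.0
    values:
      # Shared values plus per-cluster overrides (hypothetical paths)
      - values/prometheus/common.yaml
      - values/prometheus/{{ .Environment.Name }}.yaml
```

Then `helmfile -e cluster1 diff` shows you exactly what would change, and `helmfile -e cluster1 apply` deploys it: the two things that took so much effort with Argo.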
Later I will create a repo where I show all the examples with configuration and CI/CD for the different ArgoCD approaches and a helmfile. So if you don't trust me now, you'll be able to see the difference, or try to convince me that I'm wrong.
And using helmfile, I will of course install ArgoCD to my clusters, because it's an awesome tool, without any doubt. But don't manage your infrastructure with it, because it's a part of your infrastructure, and it's a service that you provide to other teams. I'll talk about that in one of the next posts.
Thanks,
Oi!