2023-04-17 17:37:45 +02:00


| title | date | draft | ShowToc |
|---|---|---|---|
| Don't use ArgoCD for your infrastructure | 2023-02-09T12:47:32+01:00 | false | true |

| cover.image | cover.caption | cover.relative | cover.responsiveImages |
|---|---|---|---|
| cover.png | ArgoCD | false | false |

Of course, it's just a clickbait title. Use whatever works for you. I'll just describe why I wouldn't use ArgoCD for the infrastructure.

Prelude

ArgoCD is an incredibly popular tool, and I see that many DevOps guys (I know that it's not a job definition, but I feel like it's the best description that everybody can understand) want to use it everywhere. I wasn't an exception, but I've just changed my mind. I still think that ArgoCD is cool, and you should use it, but not for the infrastructure.

But why?

One more prelude

Let's assume you are a team that provides something as a service to other teams. Even if you're the only member, it doesn't matter. And let's assume you're working with Kubernetes or plan to work with it, otherwise I'm not sure why you would even read this post.

It's very common to use separate clusters for different teams, customers, applications, etc. Let's say you have 3 clusters.

3 clusters and you

Setups may differ: you can use different clusters for different products, environments or teams, or you may have your own opinion on how to split workloads between clusters. But these (in our case) 3 clusters are used directly by other teams. You may also want a cluster for providing services. Let's assume your company decided to use Gitea as its git provider, and you deployed it to Kubernetes. It may be a very controversial example, but I'm not talking about what should or shouldn't run in K8s, so feel free to substitute anything else that is supposed to be used across the whole company (GitLab Runners, Bitwarden, ElasticSearch, etc.). So it's already 4 clusters. Let's call the fourth one the DevOps Cluster.

3 Clusters and gitea

I assume you need some common stuff deployed to each cluster; let's say Prometheus, Grafana and Loki.

And now you need to decide how to deploy it. You may already know about ArgoCD, or you looked for Best Practices and found a lot about ArgoCD. And it sounds perfect. Everybody tends to use it. You can find a lot of information everywhere. People are helpful. The GitHub repo is well maintained.

Why Argo CD?

Application definitions, configurations, and environments should be declarative and version controlled. Application deployment and lifecycle management should be automated, auditable, and easy to understand.

And now you first need to deliver ArgoCD itself, and only then start delivering everything else with it.

Let's first talk about how to deliver Argo. There are different options. For example, you can have one main installation in the DevOps Cluster and use it to manage the other clusters. That sounded good to me when I first heard about it. But I wanted to have all configuration as code, and the usual way to add other clusters to the main Argo is the argocd cli (argocd cluster add), so it's either an additional step in CI/CD or manual work. I didn't like either option, because I wanted to avoid adding scripts to pipelines, and manual work just wasn't an option. Also, it's no longer transparent where all the applications in the target clusters are coming from (or maybe I just couldn't find out; I'd rather think that I was dumb). One more thing: you obviously can't have several K8s resources with the same name in one namespace, so every Application must have a unique name. I don't like long names, they look ugly to me, especially when your clusters have long names like "the-first-product-production", and your application ends up as "the-first-product-production-grafana". You don't have to use the cluster name in the application name, for sure, but you would want some logic there, and that logic must be as obvious as possible. Anyway, these are the three main issues that I faced and can't live with, so here comes the second way to deliver Argo: install it to each cluster.

So I would go with 4 ArgoCD installations. The first step is to install it, which is not a problem at all; there are many ways to do it. And after it's installed, we need to start delivering other applications. I'm aware of 3 ways of doing it:

  1. Use Application manifests for applications
  2. Use Application manifests to manage Application manifests from repo (the App of Apps pattern, or something like that)
  3. Use ApplicationSet manifests to make ArgoCD render Application manifests and apply them

Application

The first option is really straightforward, isn't it? All we need to do is create manifests. ArgoCD devs have just published version 2.6 with support for multi-source applications, but currently I can't say it's usable. The main issue for me is that the argocd cli doesn't work with them, which makes the whole thing pointless to me: without the cli I can't implement CD, so I see no reason to use them at all. I could use the AutoSync option, but I won't, and I'll come back to this point later and describe why, maybe in the next post. So I can't use multi-source applications right now. Let's look at the list of applications that I need to install one more time:

To all clusters:

  • Prometheus
  • Grafana
  • Loki

To the DevOps cluster only:

  • Gitea

There are many ways to install applications to K8s, but actually I think there is only one real way: helm. Why? Because each of those applications is a huge set of manifests that you need to combine, install and maintain, and you probably won't write them yourself. There are other options to install apps, but all of them seem super complicated, and I doubt that you want to spend 8 hours per day editing yaml files. At least I don't, so I'm choosing helm.

I need to say that I'm not 100% happy with helm. There are some issues that seem very important to me, but it's good enough to use. Maybe we can talk about them later.

Let's try the first approach (an Application per application). First, the package itself:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  destination:
    namespace: monitoring # Let's not touch namespace management this time. Let's assume we already solved this issue
    server: https://kubernetes.default.svc
  project: monitoring
  source:
    chart: kube-prometheus-stack
    helm:
      valueFiles:
      - values.yaml
    path: .
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: 45.0.0
```

But what about values? A single-source Application can't find values files in your repo if it uses a remote chart, so you have two options (that I'm aware of):

  1. Add values directly to your source, like this:
     ```yaml
     spec:
       source:
         helm:
           values: |
             your-values: here
     ```
  2. Create a CMP (Config Management Plugin) for handling helm packages and values

The second way is good, but complicated, because it's a self-written tool that you have to implement, that has to work with Argo, that you have to maintain, and with no guarantee that it will keep working after ArgoCD is updated. I was using Argo with a custom CMP, and it's no fun.
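To give a sense of what "self-written" means here, this is roughly what a minimal sidecar plugin config (plugin.yaml) could look like. It's only a sketch under assumptions: the plugin name and the remote-chart.yaml file with its name/repo/version keys are hypothetical, and the sidecar image is assumed to ship helm and yq v4. ARGOCD_APP_NAME and ARGOCD_APP_NAMESPACE are environment variables Argo CD passes to plugins.

```yaml
# plugin.yaml, mounted into a CMP sidecar of argocd-repo-server (sketch)
apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: helm-remote-chart  # hypothetical plugin name
spec:
  version: v1.0
  # Use this plugin for any app directory that contains remote-chart.yaml,
  # a hypothetical file holding the chart's name, repo and version.
  discover:
    fileName: "./remote-chart.yaml"
  generate:
    command: [sh, -c]
    args:
      - |
        # Render the remote chart with the values.yaml stored next to it in git.
        # Assumes helm and yq v4 are installed in the sidecar image.
        helm template "$ARGOCD_APP_NAME" "$(yq '.name' remote-chart.yaml)" \
          --repo "$(yq '.repo' remote-chart.yaml)" \
          --version "$(yq '.version' remote-chart.yaml)" \
          --namespace "$ARGOCD_APP_NAMESPACE" \
          --values values.yaml
```

And that's before you've built the sidecar image, wired it into the repo server, and checked it against every new Argo CD release.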

But anyway, the Application way is not scalable, because you have to create a manifest for each cluster, and it's not secure, because you can't easily encrypt the data. Also, if you've seen the values for kube-prometheus-stack, you know that they are huge. So now you have 4 huge manifests with unencrypted secrets, and that's only for one app. For all of them it probably looks like this:

```
manifests/
 cluster1/
   prometheus.yaml
   loki.yaml
   grafana.yaml
 cluster2/
   prometheus.yaml
   loki.yaml
   grafana.yaml
 cluster3/
   prometheus.yaml
   loki.yaml
   grafana.yaml
 cluster-devops/
   prometheus.yaml
   loki.yaml
   grafana.yaml
   gitea.yaml
```

In my experience, each Application like this with a proper configuration contains about 150 to 200 lines of code, so with 13 manifests (3 in each of the first three clusters, 4 in the DevOps cluster) you have about 1950 to 2600 lines of code to install 4 applications. One of them is really special, and the others will most probably differ in only several lines, e.g. for ingress and passwords.

I don't think that's the way to go. To solve this problem, many people save charts to the same git repo where they store values, using helm-freeze for example. So it looks like this:

```
helm-freeze.yaml
vendored_charts/
  prometheus/...
  grafana/...
  loki/...
  gitea/...
manifests/
 cluster1/
   prometheus.yaml
   loki.yaml
   grafana.yaml
 cluster2/
   prometheus.yaml
   loki.yaml
   grafana.yaml
 cluster3/
   prometheus.yaml
   loki.yaml
   grafana.yaml
 cluster-devops/
   prometheus.yaml
   loki.yaml
   grafana.yaml
   gitea.yaml
values/
  prometheus/...
  grafana/...
  loki/...
  gitea/...
```

Yes, now you can use values from files, you can encrypt secrets, and your Applications are not that huge anymore. But I'm strongly against vendoring external charts. Why? First, it's my ideology: briefly, if you don't trust the packagers, don't use their packages. Vendoring charts into a git repo also means that you need to add a manual step to download them. With helm-freeze, for example, you need to execute helm-freeze sync. It's either a pre-commit hook, a manual execution, or a step in CI/CD. I don't like any of these options, for different reasons, but if I stop on every little point, this article will never be finished. So if you're interested, feel free to ask.

I would already give up here. I don't understand why you need to suffer that much just to use such a great tool.

App of Apps

It's actually pretty much the same thing, but instead of applying Application manifests one by one, you create one additional Application manifest that ArgoCD uses to generate the others.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: applications
  namespace: argocd
spec:
  destination:
    namespace: argo
    server: https://kubernetes.default.svc
  project: system
  source:
    path: ./manifests/cluster1
    repoURL: $YOUR_GIT_REPO
    targetRevision: main
```

You create 4 manifests like this, one for each cluster, and apply them. Then, when you push your (not App of Apps) Application manifests to the main branch, ArgoCD will do something about it. As I see it, it doesn't solve anything. You still have an ugly amount of yaml files, plus 4 additional ones that are not so huge. This concept might simplify the deployment process, but it also takes a certain amount of control away from you, because now it's not you who is responsible for deploying applications, but Argo.

I think that GitOps and other automations are important, and it's the only way to do development right now, but you're probably hired as a DevOps Engineer or SRE, or whoever. You're supposed to be able to do something apart from pushing to git. You can't hand over all the responsibility to git and pipelines and live a happy life. One day you will have to execute kubectl edit deployment, and you won't be happy if ArgoCD decides to overwrite your changes right after they are applied because you're not following the Git Flow. You need to have control, and that's why you're paid. Not because you can edit yaml files.
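The overwrite described above comes from an Application's automated sync policy. A fragment like this one (values chosen for illustration) is what makes Argo CD revert a manual kubectl edit:

```yaml
# Fragment of an Application spec
spec:
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from git
      selfHeal: true  # re-sync when the live state drifts from git, e.g. after kubectl edit
```

Without selfHeal, Argo CD only marks the Application as OutOfSync after a manual change and waits for the next commit or a manual sync.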

ApplicationSets

It's a nice concept. In theory. You create one manifest for all applications in a cluster, or even one manifest for all applications across your clusters: the unique one that works everywhere. I won't provide an example, sorry, but you can do a lot of templating there, so one manifest will work for four clusters and decrease the amount of code. I'm using ApplicationSets myself for my personal stuff, where I don't have any kind of obligations and no one will sue me for breaking everything. And actually, I did break everything not so long ago. I'm not blaming ArgoCD for that, it was entirely my fault. But let's see what I did. And let me know (anyhow) if you were able to spot the problem before the kubectl apply happened.

My file structure

I have one ApplicationSet for helm releases that looks like this:

./helm-releases.yaml

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-releases
  namespace: argo-system
spec:
  generators:
    - git:
        repoURL: git@github.com:allanger/my-repo.git
        revision: HEAD
        files:
          - path: "releases/*"
  template:
    metadata:
      name: "{{ argo.application }}"
      namespace: argo-system
    spec:
      project: "{{ argo.project }}"
      source:
        path: "{{ argo.path }}"
        helm:
          valueFiles:
            - values.yaml
          values: |-
            {{ values }}
        repoURL: "{{ chart.repo }}"
        targetRevision: "{{ chart.version }}"
        chart: "{{ chart.name }}"
      destination:
        server: "{{ argo.cluster }}"
        namespace: "{{ argo.namespace }}"
      ignoreDifferences:
        - group: admissionregistration.k8s.io
          kind: ValidatingWebhookConfiguration
          jqPathExpressions:
            - .webhooks[]?.clientConfig.caBundle
            - .webhooks[]?.failurePolicy
```

And a number of generator files in the ./releases folder. Here I'm using the first approach from above (values directly in the source), like this:

```yaml
argo:
  cluster: https://kubernetes.default.svc
  application: cert-manager
  project: system
  namespace: cert-manager
  path: .
chart:
  version: 1.10.1
  name: cert-manager
  repo: https://charts.jetstack.io
values: |
  ...
```

I don't like having values here, and when ArgoCD 2.6 was released, I decided to try multi-source applications, so I created a new directory, ./releases_v2, and a new ApplicationSet manifest:

./helm-releases-v2.yaml:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-releases
  namespace: argo-system
spec:
  generators:
    - git:
        repoURL: git@github.com:allanger/argo-deployment.git
        revision: HEAD
        files:
          - path: "releases_v2/*"
  template:
    metadata:
      name: "{{ argo.application }}"
      namespace: argo-system
    spec:
      project: "{{ argo.project }}"
      sources:
        - path: "./values"
          repoURL: git@github.com:allanger/argo-deployment.git
          ref: values
        - path: "{{ argo.path }}"
          helm:
            valueFiles:
              - "$values/values/{{ chart.name }}/values.yaml"
          repoURL: "{{ chart.repo }}"
          targetRevision: "{{ chart.version }}"
          chart: "{{ chart.name }}"
      destination:
        server: "{{ argo.cluster }}"
        namespace: "{{ argo.namespace }}"
      ignoreDifferences:
        - group: admissionregistration.k8s.io
          kind: ValidatingWebhookConfiguration
          jqPathExpressions:
            - .webhooks[]?.clientConfig.caBundle
            - .webhooks[]?.failurePolicy
```

Then I executed kubectl apply -f helm-releases-v2.yaml.

And for some reason ArgoCD stopped responding. And, actually, everything was gone. Nothing was left in my cluster. And after I realized what I had done, I thought: "How am I a DevOps engineer after all?" In case you're wondering, I was able to save 100% of the important persistent data that was there, and all the workloads were back in 15 minutes, but still...

One of the most important things about your infrastructure is its sustainability. And if you happen to have a setup like this in your company, and a junior engineer you hired makes the same mistake, you have no right to punish them; on the contrary, you should blame yourself for building something that is so easy to destroy. And I know that there are options to avoid destroying resources when Applications or ApplicationSets are gone, and that you're supposed to use argocd rather than kubectl to manage these resources (which I don't agree with at all). But I think that having to add extra fields to manifests just to preserve resources that an operator eventually creates from a CR is not obvious, and the default behaviour is dangerous. When I need something to be reliable, I'd rather have a more complicated, less obvious, or even non-automated process for removing it.

You'd rather think twice before executing rm -rf ./something than do a git push and wait until it's executed automatically, wouldn't you?

But ApplicationSets are not bad. I'm still using them, but now with additional fields, so I'm not afraid of removing everything accidentally. And yet it's not perfect, because without multi-source applications they don't make much sense for anything bigger than a Minecraft server used by 4 guys, unless you're vendoring helm charts, of course.
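One example of such a field, as a minimal sketch (check the ApplicationSet docs for your Argo CD version): with preserveResourcesOnDeletion set, the ApplicationSet controller doesn't put the resources finalizer on generated Applications, so deleting one of them doesn't cascade to the workloads it deployed.

```yaml
# Fragment of an ApplicationSet spec (sketch)
spec:
  syncPolicy:
    # Keep child resources (Deployments, PVCs, ...) in the cluster when an
    # Application generated by this ApplicationSet is deleted.
    preserveResourcesOnDeletion: true
```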

Even when multi-source apps are fully supported and I can move values into real values files, there is still no way to do argocd appset diff, and I'm aware of this GitHub issue. You can read my concerns about the server-side rendering they want to implement there.

So let's assume that the cli supports multi-source apps, ApplicationSets can be diffed, your server is not overloaded when 1000 manifests are rendered on each pipeline run just for diffing, and helm repos are not DDoSed (because it's not nice to DDoS something that is used by a huge number of users across the world). And you've added all the fields to your manifests to make your infra reliable. Sounds nice!

But there is one more problem that I see. What many teams don't think about is that they, as a team, provide services to other teams. So, if you have the clusters cluster-production, cluster-development, cluster-demo, and cluster-devops, where should you deploy infra changes first? I think a lot of you would say: to cluster-development, because at least it's not facing real customers. And... I totally don't agree. You're the team that provides other teams with services, and your real customers are those teams. Of course, you won't treat the production environment the same way you treat the development environment, but the latter is still not a playground for you. It's a playground for developers, and it must be stable and reliable for them. I'm sure that there are many ways to handle it, but I think you should have one more cluster, cluster-infra-test, where you deliver your changes first and where you can test them before they affect other teams. So it's a 5th ArgoCD with a very similar setup (actually, the setup must repeat all the other setups, so you're sure you're testing what's going to be delivered later). With ApplicationSets, the git generators on the "production" environments (cluster-production, cluster-development, cluster-demo, and cluster-devops) can, for example, point to the main branch, but here changes must come not only from main but also from other branches (assuming your workflow is something like this: cut a branch, update the infra code, create a pull request, and merge), because you need to test everything before it's in the main branch.
So you have either a very complicated ApplicationSet (I'm not even sure that it's possible to do with templates), or different manifests for the test cluster and the rest, so you have to remember to update both every time one of them is updated, or an additional step in a pipeline that gets the ApplicationSet from cluster-infra-test and adds a new branch to its generators (because you must not overwrite and break test environments created by other members of your team).

Really?

Are you ready to go through all of this just to use Argo? Is there really nothing that can stop you from doing that? I even got tired of writing this post. I was stubborn: I wanted to use the best GitOps Kubernetes tool, I went through all of this, and I was trying to convince others that it's cool. Just a little amount of work, and we're happy Argo users. But looking back, all I can say is: just use Helmfile! ArgoCD doesn't solve a single issue that Helmfile can't solve (when it comes to infrastructure deployment). And with a huge amount of work and compromises you can achieve a result that will be close to what you would have with a proper helmfile configuration (which is extremely easy and reliable).
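To make the comparison a bit more concrete, here is a minimal helmfile sketch for the setup from the beginning of this post: one environment per cluster, the monitoring stack everywhere, and Gitea only on the DevOps cluster. The layout under values/ and secrets/ and the Gitea chart repository URL are assumptions, and the secrets entry needs the helm-secrets plugin.

```yaml
# helmfile.yaml.gotmpl (sketch; the .gotmpl extension enables templating)
environments:
  cluster1: {}
  cluster2: {}
  cluster3: {}
  cluster-devops: {}
---
repositories:
  - name: prometheus-community
    url: https://prometheus-community.github.io/helm-charts
  - name: grafana
    url: https://grafana.github.io/helm-charts
  - name: gitea-charts
    url: https://dl.gitea.com/charts/  # assumption: verify the current Gitea chart repo

releases:
  - name: prometheus
    namespace: monitoring
    chart: prometheus-community/kube-prometheus-stack
    version: 45.0.0
    values:
      - values/prometheus/common.yaml
      # Per-cluster overrides; every referenced file must exist.
      - values/prometheus/{{ .Environment.Name }}.yaml
  - name: grafana
    namespace: monitoring
    chart: grafana/grafana
    values:
      - values/grafana/common.yaml
    secrets:
      # Encrypted values, decrypted by the helm-secrets plugin.
      - secrets/grafana/{{ .Environment.Name }}.yaml
  - name: loki
    namespace: monitoring
    chart: grafana/loki
    values:
      - values/loki/common.yaml
  - name: gitea
    namespace: gitea
    chart: gitea-charts/gitea
    # Only deployed to the DevOps cluster.
    installed: {{ eq .Environment.Name "cluster-devops" }}
    values:
      - values/gitea/common.yaml
```

With this, `helmfile -f helmfile.yaml.gotmpl -e cluster-devops diff` shows what would change in that cluster before anything is touched (it needs the helm-diff plugin), and `helmfile -f helmfile.yaml.gotmpl -e cluster-devops apply` rolls it out. A release is only uninstalled when you explicitly set installed: false and apply, or run helmfile destroy.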

Later I will create a repo where I show all the examples with configuration and CI/CD for the different ArgoCD approaches and for helmfile. So if you don't trust me now, you'll be able to see the difference yourself, or try to convince me that I'm wrong.

And I will, of course, install ArgoCD to my clusters using helmfile, because it's an awesome tool, without any doubt. But don't manage your infrastructure with it, because it's part of your infrastructure, a service that you provide to other teams. I'll talk about that in one of the next posts.
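As a sketch of that idea, Argo CD simply becomes one more release in a helmfile. The argo-helm repository and the argo/argo-cd chart are the upstream ones; the values path is an assumption, and you'd pin a chart version you've tested.

```yaml
# helmfile.yaml (sketch): Argo CD installed like any other piece of infra
repositories:
  - name: argo
    url: https://argoproj.github.io/argo-helm

releases:
  - name: argocd
    namespace: argocd
    chart: argo/argo-cd
    # version: pin a chart version you have tested
    values:
      - values/argocd/common.yaml  # hypothetical values file
```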

Thanks,

Oi!