Debugging Argo CD and OIDC logins
Over the past years, a constantly recurring pain in my daily job has been spinning up a fresh Kubernetes cluster and needing to sign into Argo CD for the first time. We are greeted with the following failure:
failed to get token: oauth2: "invalid_client" "Invalid client credentials."
We know the issue is with Argo CD itself, since our identity provider is used for logging into 5+ different applications within the same cluster.
Now I have lost hope that this login issue will be resolved by updating Argo CD, at least without someone (me, I guess) pinpointing the root cause.
Let me take you through my journey of discovery, or you can skip straight to The Root Cause.
The Test Setup
First I created an empty kind cluster and started building a minimal setup for an OIDC login into Argo CD, using the following products:
- Argo CD
- OIDC identity provider | Dex
- Ingress | ingress-nginx
This means this simple cluster looks like this:
Knowing I would need to run and re-run setups, teardowns, and tests many, many, many times to see if the issue was consistently reproduced, I figured it would be a good idea to organize everything in a Makefile.
Some of the code snippets in this post contain variables that, when run by make, are replaced with actual values by envsubst. This ensures they share the same value across all of the configuration (a small illustration of the substitution follows the list below).
Example variables:
- $ARGO - The host for Argo CD
- $CLIENT_SECRET - A shared secret
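As a small illustration of the mechanism (template.yaml is a hypothetical file, not part of the setup), running

DEX=dex.host envsubst '$DEX' < template.yaml

turns a line such as issuer: http://$DEX into issuer: http://dex.host.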
Let us review the rest of the relevant configuration before we mimic the OIDC login flow.
Dex
Dex is a widely used OpenID Connect provider that can be configured with static clients and static user logins, which is handy for our test setup.
The installation uses its Helm chart, and for completeness you may review the values file used below.
View Dex values file
envFrom:
  - secretRef:
      name: dex-client-secrets
config:
  issuer: http://$DEX
  staticClients:
    - id: $CLIENT_ID
      name: ArgoCD
      secretEnv: CLIENT_SECRET
      redirectURIs:
        - http://$ARGO/auth/callback
  enablePasswordDB: true
  staticPasswords:
    - email: "admin@example.com"
      # bcrypt hash of the string "password": $(echo password | htpasswd -BinC 10 admin | cut -d: -f2)
      hash: "$2a$10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
      username: "admin"
      userID: "08a8684b-db88-4b73-90a9-3cd1661f5466"
  oauth2:
    skipApprovalScreen: true
    passwordConnector: local
  storage:
    type: sqlite3
    config:
      file: /var/dex/dex.db
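The chart is not installed with helm install here; the Makefile at the end of the post renders it with helm template and applies the result, roughly like this (paths simplified):

helm template dex dex --version 0.19.1 \
  --repo https://charts.dexidp.io \
  --namespace dex -f values-dex.yaml > dex.yaml
kubectl apply -n dex -f dex.yaml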
Argo CD
Argo CD is installed using its Helm chart and with mostly default values. The important non-default settings are:
- Disabling the built-in Dex
- Providing OIDC configuration to use the Dex we installed
That makes the values file look like this:
configs:
  params:
    server.insecure: true
  cm:
    url: http://$ARGO
    admin.enabled: false
    oidc.config: |
      name: Dex
      issuer: http://$DEX
      clientID: $CLIENT_ID
      clientSecret: $argocd-client-secrets:clientSecret
      requestedScopes:
        - openid
        - profile
        - email
        - groups
dex:
  enabled: false
Kind
It was not trouble-free to set up Kind. You must configure port mappings to make anything inside the Kind cluster available on the outside.
You could use port forwarding instead of configuring Kind, but managing those ports becomes painful when you want to run multiple make recipes as test cases.
The easier solution was to create the Kind cluster with the following configuration:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
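With the configuration above saved as kind.config, creating the cluster is a one-liner (the same command the Makefile runs):

kind create cluster --config kind.config --name debugging-argocd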
Ingress
There is a guide to setting up ingress on Kind, and we only needed simple host mapping. You can review the ingress manifests below.
View ingress manifests
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd
  namespace: argocd
spec:
  rules:
    - host: $ARGO
      http:
        paths:
          - pathType: ImplementationSpecific
            backend:
              service:
                name: argocd-server
                port:
                  number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dex
  namespace: dex
spec:
  rules:
    - host: $DEX
      http:
        paths:
          - pathType: ImplementationSpecific
            backend:
              service:
                name: dex
                port:
                  number: 5556
Client Secrets
For the OIDC login flow to function, both Argo CD and Dex need to know a shared client secret, and both applications can be configured to read this client secret from Kubernetes secrets.
The Makefile contains a variable that is used as the shared secret to generate two Kubernetes secrets with the same value, one for each application.
For instance the secret provided for Dex looks like this:
apiVersion: v1
kind: Secret
metadata:
  name: dex-client-secrets
  namespace: dex
type: Opaque
stringData:
  CLIENT_SECRET: $CLIENT_SECRET
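For the Argo CD side (manifests/argocd-secret.yaml in the Makefile), a sketch based on the secret name and key referenced from the values file, plus the label requirement discussed in the conclusion, looks like this:

apiVersion: v1
kind: Secret
metadata:
  name: argocd-client-secrets
  namespace: argocd
  labels:
    # Argo CD only picks up secrets carrying this label for $<name>:<key> references
    app.kubernetes.io/part-of: argocd
type: Opaque
stringData:
  clientSecret: $CLIENT_SECRET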
Bypass HTTPS/SSL requirements
It is common to run into issues with SSL when testing locally because you want to keep everything simple while you debug. It is also common for there to be a way to turn the SSL requirement off.
Currently we only have port 80/http mapped in our Kind cluster. The ingress happily routes traffic on port 80 towards our applications. Dex does not - at least not by default - enforce SSL. Argo CD needs to be configured to allow unencrypted http. This is done with the following line in its values file:
server.insecure: true
Bypass DNS requirements
Both applications we have set up are reached via the hostnames dex.host and argocd.host, and these hostnames do not exist in any DNS anywhere.
From the Outside
I wanted to make sure running the make recipes for testing would work in isolation, so adding the hostnames to the /etc/hosts file was a no-go.
It was a good thing that I went looking for a different way to handle hostnames, because I learned a new thing about curl from this list of name resolving tricks.
The --resolve option can override the resolution of a hostname:port combination to a specific address (or addresses). This means I can run a curl command towards the Argo CD running in my Kind cluster with the following command:
curl -v \
--resolve argocd.host:80:127.0.0.1 \
http://argocd.host/
Running curl in verbose mode shows that the option simply adds fake records to the DNS cache for the current invocation, producing the following logging output:
* Added argocd.host:80:127.0.0.1 to DNS cache
* Hostname argocd.host was found in DNS cache
* Trying 127.0.0.1:80...
* Connected to argocd.host (127.0.0.1) port 80
...
On the Inside
As you will see later, Argo CD needs to resolve the hostname for Dex internally in the Kind cluster. We do not have much choice on the inside of the cluster and will need to manipulate CoreDNS.
Thankfully, manipulating CoreDNS is as easy as providing a custom ConfigMap with its configuration.
We grab and expand the current configuration so that CoreDNS resolves the hostname for Dex to the service named http.dex.svc.
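A minimal sketch of such a ConfigMap - assuming the stock kind Corefile with a rewrite rule added for the Dex hostname - looks roughly like this (the manifest is run through envsubst, hence $DEX):

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # resolve the external Dex hostname to the in-cluster service
        rewrite name $DEX http.dex.svc.cluster.local
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }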
Note that http.dex.svc is a service we will have to add ourselves. You can review the service manifest below.
View service manifest
apiVersion: v1
kind: Service
metadata:
  name: http
  namespace: dex
spec:
  selector:
    app.kubernetes.io/instance: dex
    app.kubernetes.io/name: dex
  ports:
    - appProtocol: http
      name: http
      port: 80
      protocol: TCP
      targetPort: http
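To sanity-check the CoreDNS change, the hostname can be resolved from a throwaway pod inside the cluster; something along these lines (pod name and image are arbitrary):

kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -- nslookup dex.host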
Mimicking OIDC Login Flow
After preparing the setup and configuration we are now ready to run tests. Obviously, I ran the setup and test recipes many times to get the setup and configuration working, too.
My very first step before debugging this issue was to find some way of using curl to mimic OIDC logins, and I had found this StackOverflow answer. I was able to piece together the commands that lead to a token using the curl commands from that answer (in verbose mode), supported by inspection of logins into an Argo CD in the wild.
I am letting the diagram below do most of the explaining with HTTP verbs and paths, but there will be more explanation after the diagram.
Additional explanations for the diagram
#2 The cookies must be captured since Argo CD uses them for verification in #8.
#5 The html for the login form must be captured since we need to POST to the action endpoint of that form in #6.
#6 This step includes the username and password in the POST payload.
#9 This step was the problematic DNS lookup that required tinkering with CoreDNS.
#11 The cookies must be captured since they now contain the JWT.
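If you prefer commands to diagrams, the test recipe from the Makefile at the end of the post boils down to roughly this sequence (Makefile variables expanded; error handling and the decoding of HTML-escaped ampersands in the form action omitted):

jar=cookie.jar ; touch $jar
curl="curl -sf --resolve argocd.host:80:127.0.0.1 --resolve dex.host:80:127.0.0.1 --cookie $jar --cookie-jar $jar"

# roughly steps 1-5: start the login at Argo CD, follow the redirects to Dex, save the login form and cookies
$curl -Lo login.html http://argocd.host/auth/login
action=$(grep -o 'action="[^"]*"' login.html | cut -d'"' -f2)

# roughly steps 6-7: POST the static credentials to the form's action endpoint and capture the redirect target
$curl -D headers.log -XPOST -d "login=admin@example.com&password=password" "http://dex.host$action"
endpoint=$(grep ^Location headers.log | cut -d' ' -f2 | tr -d '\r')

# roughly steps 8-11: follow the redirect; Argo CD exchanges the code for a token (the DNS lookup from #9)
# and sets the argocd.token cookie containing the JWT
$curl -o /dev/null "$endpoint"
grep argocd.token $jar | cut -f7-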
Conclusion
At this point, while getting the OIDC login into a working state, I had already stumbled over the problem that leads to the failure message:
failed to get token: oauth2: "invalid_client" "Invalid client credentials."
The root cause I found is only relevant when Argo CD is configured to use “SSO clientSecret with secret references”, see Argo CD documentation for details.
We are referencing such a secret on the clientSecret: $argocd-client-secrets:clientSecret line of the Argo CD values file.
The problem is that secrets and config maps carrying the app.kubernetes.io/part-of: argocd label are queried and substituted into the Argo CD configuration only under certain conditions:
- Initially during startup, after which the result is cached
- During CRUD operations related to the list of managed clusters
- When performing CLI operations
You may check my setup and test cases by reviewing the Makefile I have been referencing, or run its recipes:
# start out working and then break it
make working test
make break
make test
# start out broken and then fix it
make broken test
make fix
make test
Show Makefile
cluster := debugging-argocd
argo := argocd.host
dex := dex.host
argoVersion := 7.4.4
dexVersion := 0.19.1
ingressVersion := 4.11.2
clientId := argo
clientSecret := some-secret-here
tmp := /tmp/$(cluster)
jar := $(tmp)/cookie.jar
kubectl := kubectl --context kind-$(cluster)
curl := curl -sf --resolve $(argo):80:127.0.0.1 --resolve $(dex):80:127.0.0.1 --cookie $(jar) --cookie-jar $(jar)
_ := $(shell mkdir -p $(tmp) ; find $(tmp) -delete -mindepth 1)

verify-deps:
	which docker kind kubectl helm curl jq envsubst base64 > /dev/null

clean: verify-deps
	-kind -q delete cluster --name $(cluster)
	-find $(tmp) -delete -mindepth 1

_setup: verify-deps clean
	ARGO=$(argo) DEX=$(dex) CLIENT_ID=$(clientId) CLIENT_SECRET=$(clientSecret) envsubst '$$ARGO,$$DEX,$$CLIENT_ID,$$CLIENT_SECRET' < values/argocd.yaml > $(tmp)/values-argocd.yaml
	ARGO=$(argo) DEX=$(dex) CLIENT_ID=$(clientId) CLIENT_SECRET=$(clientSecret) envsubst '$$ARGO,$$DEX,$$CLIENT_ID,$$CLIENT_SECRET' < values/dex.yaml > $(tmp)/values-dex.yaml
	ARGO=$(argo) DEX=$(dex) CLIENT_ID=$(clientId) CLIENT_SECRET=$(clientSecret) envsubst '$$ARGO,$$DEX,$$CLIENT_ID,$$CLIENT_SECRET' < manifests/ingress.yaml > $(tmp)/ingress.yaml
	ARGO=$(argo) DEX=$(dex) CLIENT_ID=$(clientId) CLIENT_SECRET=$(clientSecret) envsubst '$$ARGO,$$DEX,$$CLIENT_ID,$$CLIENT_SECRET' < manifests/coredns.yaml > $(tmp)/coredns.yaml
	ARGO=$(argo) DEX=$(dex) CLIENT_ID=$(clientId) CLIENT_SECRET=$(clientSecret) envsubst '$$ARGO,$$DEX,$$CLIENT_ID,$$CLIENT_SECRET' < manifests/dex-secret.yaml > $(tmp)/dex-secret.yaml
	ARGO=$(argo) DEX=$(dex) CLIENT_ID=$(clientId) CLIENT_SECRET=$(clientSecret) envsubst '$$ARGO,$$DEX,$$CLIENT_ID,$$CLIENT_SECRET' < manifests/argocd-secret.yaml > $(tmp)/argocd-secret.yaml
	curl -sfL https://raw.githubusercontent.com/kubernetes/ingress-nginx/helm-chart-$(ingressVersion)/deploy/static/provider/kind/deploy.yaml > $(tmp)/ingress-nginx.yaml
	helm template argocd argo-cd --version $(argoVersion) --repo https://argoproj.github.io/argo-helm -n argocd -f $(tmp)/values-argocd.yaml --create-namespace > $(tmp)/argocd.yaml
	helm template dex dex --version $(dexVersion) --repo https://charts.dexidp.io -n dex -f $(tmp)/values-dex.yaml --create-namespace > $(tmp)/dex.yaml
	kind -q create cluster --config kind.config --name $(cluster)
	@echo
	$(kubectl) create namespace ingress-nginx
	$(kubectl) create namespace dex
	$(kubectl) create namespace argocd
	@echo
	$(kubectl) apply -ningress-nginx -f $(tmp)/ingress-nginx.yaml
	sleep 5 # `kubectl wait` requires the resource to exist
	$(kubectl) wait --namespace ingress-nginx \
	  --for=condition=ready pod \
	  --selector=app.kubernetes.io/component=controller \
	  --timeout=90s
	@echo
	$(kubectl) apply --filename $(tmp)/ingress.yaml
	$(kubectl) apply --filename $(tmp)/coredns.yaml
	$(kubectl) rollout restart -n kube-system deployment/coredns
	@echo
	$(kubectl) apply --filename $(tmp)/dex-secret.yaml
	$(kubectl) apply --filename manifests/dex-service.yaml
	$(kubectl) apply -ndex -f $(tmp)/dex.yaml
	sleep 5 # `kubectl wait` requires the resource to exist
	$(kubectl) wait -n dex \
	  --for=condition=Ready pod \
	  --selector=app.kubernetes.io/name=dex \
	  --timeout=90s
	@echo

working: _setup
	$(kubectl) apply --filename $(tmp)/argocd-secret.yaml
	$(kubectl) apply -nargocd -f $(tmp)/argocd.yaml
	sleep 5 # `kubectl wait` requires the resource to exist
	$(kubectl) wait -n argocd \
	  --for=condition=Ready pod \
	  --selector=app.kubernetes.io/name=argocd-server \
	  --timeout=90s
	@sleep 2
	@echo

broken: _setup
	$(kubectl) apply -nargocd -f $(tmp)/argocd.yaml
	sleep 5 # `kubectl wait` requires the resource to exist
	$(kubectl) wait -n argocd \
	  --for=condition=Ready pod \
	  --selector=app.kubernetes.io/name=argocd-server \
	  --timeout=90s
	$(kubectl) apply --filename $(tmp)/argocd-secret.yaml
	@sleep 2
	@echo

test:
	touch $(jar)
	@echo
	$(curl) -Lo $(tmp)/login.html http://$(argo)/auth/login
	grep -o 'action="[^"]*"' < $(tmp)/login.html | cut -d\" -f2 | sed 's/&amp;/\&/g' > $(tmp)/path
	@echo
	$(curl) -D $(tmp)/header.log -XPOST -d "login=admin@example.com&password=password" "http://$(dex)$$(cat $(tmp)/path)"
	grep ^Location $(tmp)/header.log | cut -d' ' -f2 | tr -d '\r' > $(tmp)/endpoint
	@echo
	$(curl) -o /dev/null "$$(cat $(tmp)/endpoint)"
	@echo
	grep argocd.token $(jar) | cut -f7- | tee $(tmp)/token
	@echo
	@echo Token payload:
	(cut -d. -f2 < $(tmp)/token|tr -d '\n'; echo '===') | base64 -d | jq
	@echo

fix:
	$(kubectl) rollout restart -nargocd deployment
	$(kubectl) rollout restart -nargocd sts

_break:
	@head -c 12 /dev/random | base64 | base64 > $(tmp)/random-secret

break: _break
	$(kubectl) patch -nargocd secret argocd-client-secrets --type='json' -p='[{"op" : "replace" ,"path" : "/data/clientSecret" ,"value" : "$(shell cat $(tmp)/random-secret)"}]'
	$(kubectl) patch -ndex secret dex-client-secrets --type='json' -p='[{"op" : "replace" ,"path" : "/data/CLIENT_SECRET" ,"value" : "$(shell cat $(tmp)/random-secret)"}]'
	$(kubectl) rollout restart -ndex deployment

logs:
	$(kubectl) logs -nargocd -l app.kubernetes.io/name=argocd-server --since=1m
	$(kubectl) logs -ndex -l app.kubernetes.io/name=dex --since=1m
The Root Cause
Summarized, Argo CD does not detect when any of your secrets have changed.
Therefore it is imperative that all secrets are created prior to installing Argo CD, or that the Argo CD deployments are restarted whenever secrets are updated.
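In practice the restart is exactly what the fix recipe in the Makefile does whenever a referenced secret changes:

kubectl rollout restart -n argocd deployment
kubectl rollout restart -n argocd sts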
Not really a satisfying solution, but good enough for now.