GKE Autopilot and Workload Identity

Feb 1, 2023

Workload Identity enables GKE workloads to impersonate IAM service accounts, allowing them to access Google Cloud services.

The easiest way to try out Workload Identity is with a GKE Autopilot cluster, which provides sane, production-ready defaults, one of which is enabling Workload Identity on the cluster. This post walks through a quick Workload Identity demo with a GKE Autopilot-based workload.

The Need for Workload Identity

Consider a GKE cluster with one application running in it. This application needs to access a Google Cloud Storage bucket.

An application would use a service account with appropriate IAM roles to access the Cloud Storage bucket. Prior to Workload Identity, there were two ways to expose this service account to an application running inside a GKE pod:

1. Create a Google Cloud Service Account, grant it the required roles, then export the service account key as a JSON file. Mount this JSON file into the pods (as a config map or secret) and use the credentials. The big problem with this approach is that the keys cannot be rotated easily and have to be managed in a very cumbersome way.
2. Use the underlying Compute Engine service account of the nodes. This approach is not granular: every pod on a node ends up using the same credentials.

This is where Workload Identity fits in and solves the issues of the previous approaches: the credentials are granular to the specific workload in the pod, and they are not directly exposed to the workload, so they can be rotated easily.

Basics

Google Cloud service clients running on a Google Cloud application runtime such as GCE, Cloud Functions, or Cloud Run typically retrieve their credentials from a metadata server.

The metadata server responds at the hostname “metadata.google.internal”. A call to retrieve the token for a service account from the metadata server looks something like this:

curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
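The reply to that call is a small JSON document with `access_token`, `expires_in`, and `token_type` fields. As a minimal sketch of what a client does with it, the functions below pull the token out of the reply and pass it as a bearer token to the Cloud Storage JSON API. The function names are illustrative and not part of any tool, and the project ID is whatever project the caller supplies.

```shell
# Print the access_token field from the metadata server's JSON reply (read from stdin).
# sed is used here only to avoid depending on extra tools such as jq.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask the metadata server for a token. This only works on a Google Cloud runtime.
fetch_access_token() {
  curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
    | extract_access_token
}

# List the buckets of a project with the Cloud Storage JSON API.
# $1 = project ID
list_buckets() {
  curl -s -H "Authorization: Bearer $(fetch_access_token)" \
    "https://storage.googleapis.com/storage/v1/b?project=$1"
}
```

In practice the Google Cloud client libraries and the gcloud CLI perform this lookup on the application's behalf, so application code rarely needs to call the metadata server directly.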

Workload Identity ultimately makes a metadata server available to GKE workloads. This way, an application can use the familiar approach of retrieving its credentials from the metadata server and calling Google Cloud services.

Since GKE is a Kubernetes distribution, it understands Kubernetes Service Accounts, whereas Google Cloud services understand IAM Service Accounts. Workload Identity maps one to the other and exposes the IAM Service Account via a metadata server visible to the pod.

All the logic to map the Kubernetes Service Account to the IAM Service Account is embedded in the metadata server, which runs as a daemonset and resolves calls to the “metadata.google.internal” hostname.

Mapping Process

So how does the metadata server know how to map the Kubernetes Service Account to an IAM Service Account? This is done through explicit mappings.

The first is an annotation on the Kubernetes Service Account pointing to the IAM Service Account:

kubectl annotate serviceaccount sample-ksa \
    --namespace default \
    iam.gke.io/gcp-service-account=sample-gsa@myproj.iam.gserviceaccount.com
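The same annotation can also be kept in a manifest rather than applied imperatively. The sketch below is one possible way to do that: `render_ksa_manifest` is a hypothetical helper name, and it only prints the YAML so that it can be reviewed or piped to kubectl.

```shell
# Print a ServiceAccount manifest that carries the Workload Identity annotation.
# $1 = Kubernetes Service Account name, $2 = namespace, $3 = IAM Service Account email
render_ksa_manifest() {
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: $1
  namespace: $2
  annotations:
    iam.gke.io/gcp-service-account: $3
EOF
}
```

For example, `render_ksa_manifest sample-ksa default sample-gsa@myproj.iam.gserviceaccount.com | kubectl apply -f -` would create or update the service account with the annotation in place.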

This alone is not sufficient, however. The metadata server needs a way to obtain a token for the IAM Service Account. This is done by creating a policy binding that allows the Kubernetes Service Account to retrieve a token on behalf of the IAM Service Account:

gcloud iam service-accounts add-iam-policy-binding sample-gsa@myproj.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:myproj.svc.id.goog[default/sample-ksa]"
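The member string in that binding follows a fixed pattern: the project's workload pool (`PROJECT_ID.svc.id.goog`) followed by the namespace and Kubernetes Service Account name in square brackets. A small sketch that assembles it from its parts, with illustrative function names, might look like this:

```shell
# Build the IAM member string for a Kubernetes Service Account.
# $1 = project ID, $2 = namespace, $3 = Kubernetes Service Account name
workload_identity_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Allow the Kubernetes Service Account to act as the IAM Service Account.
# $1 = IAM Service Account email, $2 = project ID, $3 = namespace, $4 = Kubernetes Service Account name
bind_workload_identity_user() {
  gcloud iam service-accounts add-iam-policy-binding "$1" \
    --role roles/iam.workloadIdentityUser \
    --member "$(workload_identity_member "$2" "$3" "$4")"
}
```

Getting the namespace or the service account name wrong here is an easy mistake, since the binding is accepted either way and the failure only shows up later when the pod requests a token.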

Finally, the IAM Service Account needs the right roles to call Google Cloud services. For example, if the objects in a Cloud Storage bucket need to be listed, an appropriate role can be granted this way:

gcloud projects add-iam-policy-binding myproj \
    --member "serviceAccount:sample-gsa@myproj.iam.gserviceaccount.com" \
    --role "roles/storage.objectViewer"
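A project-level binding applies to every bucket in the project. As a hedged alternative, a role such as `roles/storage.objectViewer` can be granted on a single bucket instead. In the sketch below, the function name is illustrative and the bucket name is whatever bucket the caller passes in (for example a hypothetical `my-bucket`).

```shell
# Grant read access to the objects of one bucket only.
# $1 = bucket name (hypothetical, e.g. my-bucket), $2 = IAM Service Account email
grant_bucket_viewer() {
  gcloud storage buckets add-iam-policy-binding "gs://$1" \
    --member "serviceAccount:$2" \
    --role "roles/storage.objectViewer"
}
```

Scoping the role to one bucket keeps the workload's access closer to what it actually needs, at the cost of one binding per bucket.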

Demo

1. Let’s start by provisioning an Autopilot cluster:

gcloud container clusters create-auto workload-demo-cluster \
    --region us-west1 \
    --project=myproj
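Once the cluster is up, it can be worth confirming that Workload Identity is really enabled on it. A minimal sketch, assuming the cluster's workload pool is reported in the `workloadIdentityConfig.workloadPool` field of `gcloud container clusters describe`, with illustrative function names:

```shell
# The workload pool of a project is named PROJECT_ID.svc.id.goog.
# $1 = project ID
expected_workload_pool() {
  printf '%s.svc.id.goog\n' "$1"
}

# Succeeds when the cluster reports the expected workload pool.
# $1 = cluster name, $2 = region, $3 = project ID
check_workload_identity() {
  actual="$(gcloud container clusters describe "$1" \
    --region "$2" --project "$3" \
    --format "value(workloadIdentityConfig.workloadPool)")"
  [ "$actual" = "$(expected_workload_pool "$3")" ]
}
```

Calling `check_workload_identity workload-demo-cluster us-west1` with the project ID used at creation time should succeed on an Autopilot cluster; an empty value would suggest Workload Identity is not enabled.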

2. Create an IAM Service Account and grant it a role that allows listing Cloud Storage buckets (roles/storage.admin is broader than strictly needed, but keeps the demo simple):

gcloud iam service-accounts create sample-gsa \
    --project=myproj

gcloud projects add-iam-policy-binding myproj \
    --member "serviceAccount:sample-gsa@myproj.iam.gserviceaccount.com" \
    --role "roles/storage.admin"

3. Create a Kubernetes Service Account:

kubectl create serviceaccount sample-ksa

4. Annotate the Kubernetes Service Account with the IAM Service Account, and allow the Kubernetes Service Account to get a token on behalf of the IAM Service Account:

kubectl annotate serviceaccount sample-ksa \
    iam.gke.io/gcp-service-account=sample-gsa@myproj.iam.gserviceaccount.com

gcloud iam service-accounts add-iam-policy-binding sample-gsa@myproj.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:myproj.svc.id.goog[default/sample-ksa]"
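Both halves of the mapping can be inspected before any pod is started. The sketch below uses illustrative function names: the first reads the annotation back from the Kubernetes Service Account, and the second prints the IAM policy of the IAM Service Account, where the `roles/iam.workloadIdentityUser` binding should appear.

```shell
# Print the IAM Service Account email annotated on a Kubernetes Service Account.
# $1 = Kubernetes Service Account name, $2 = namespace
annotated_gsa() {
  kubectl get serviceaccount "$1" --namespace "$2" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
}

# Print the IAM policy attached to an IAM Service Account.
# $1 = IAM Service Account email
show_gsa_policy() {
  gcloud iam service-accounts get-iam-policy "$1" --format json
}
```

For the demo values, `annotated_gsa sample-ksa default` should print `sample-gsa@myproj.iam.gserviceaccount.com` if the annotation was applied as shown.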

5. Now spin up a pod with the associated Kubernetes Service Account:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-test
spec:
  containers:
  - image: google/cloud-sdk:slim
    name: workload-identity-test
    command: ["sleep","infinity"]
  serviceAccountName: sample-ksa
EOF
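On Autopilot a new pod may stay in the Pending state for a while as capacity is provisioned for it, so it can help to wait until the pod is ready before connecting. A minimal sketch, with an illustrative function name and a caller-chosen timeout:

```shell
# Block until the pod reports Ready, or fail once the timeout expires.
# $1 = pod name, $2 = timeout (for example 300s)
wait_for_pod() {
  kubectl wait --for=condition=Ready "pod/$1" --timeout="$2"
}
```

For this demo that would be something like `wait_for_pod workload-identity-test 300s`.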

6. Connect to the pod. The gcloud auth list command shows that the pod is indeed associated with the right IAM Service Account:

kubectl exec -it workload-identity-test \
  -- /bin/bash

root@workload-identity-test:/# gcloud auth list
                    Credentialed Accounts
ACTIVE  ACCOUNT
*       sample-gsa@myproj.iam.gserviceaccount.com

7. Verify that the pod can list the Cloud Storage buckets in the project:

gcloud storage ls
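The same check can be scripted from outside the pod by asking the metadata server which service account it is serving. This is a sketch under two assumptions: that curl is available in the container image, and that the function names, which are illustrative, do not clash with anything in the shell.

```shell
# Print the service account email that the metadata server reports inside a pod.
# $1 = pod name
pod_identity() {
  kubectl exec "$1" -- curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
}

# Succeeds when the pod runs as the expected IAM Service Account.
# $1 = pod name, $2 = expected IAM Service Account email
verify_pod_identity() {
  [ "$(pod_identity "$1")" = "$2" ]
}
```

For example, `verify_pod_identity workload-identity-test sample-gsa@myproj.iam.gserviceaccount.com && echo "identity ok"` would confirm the mapping without opening an interactive shell.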

Conclusion

At the end of the day, the underlying implementation of Workload Identity is complicated, with the mechanics of the metadata server and how it exchanges the Kubernetes Service Account for an IAM Service Account. However, all of this is abstracted away well in a GKE Autopilot environment. From a workload's perspective, all that is needed is the right roles and a simple mapping via an annotation, and you are set.