Short-lived OIDC for CI: kill every long-lived GitHub Actions token

AWS OIDC, GCP WIF, Azure federated credentials

Published
8 min read

A versatile DevSecOps Engineer specialized in creating secure, scalable, and efficient systems that bridge development and operations. My expertise lies in automating complex processes, integrating AI-driven solutions, and ensuring seamless, secure delivery pipelines. With a deep understanding of cloud infrastructure, CI/CD, and cybersecurity, I thrive on solving challenges at the intersection of innovation and security, driving continuous improvement in both technology and team dynamics.


GitHub OIDC to AWS/GCP/Azure federated credentials killed the need for long-lived PATs in CI. Most orgs still have PATs in use in 2026. Short-lived OIDC is the one-day win.

Narrative arc

The PAT failure modes -> the OIDC-to-cloud federation pattern -> scope-down per workflow -> the quarterly audit loop.

What most people believe (and why it's wrong)

"We rotate PATs every 90 days." 90 days is 129,600 minutes for an attacker to exfiltrate data.

Rotation is better than no rotation. The frame shift is that OIDC federation makes the token lifetime minutes, not months, for every cloud destination a workflow touches. The effort to switch is lower than most teams estimate.
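To put numbers on the frame shift, here is a minimal arithmetic sketch comparing the worst-case validity window of a 90-day token with a 15-minute session. The function name `exposure_minutes` is illustrative, not from any tool.

```python
from datetime import timedelta


def exposure_minutes(lifetime: timedelta) -> float:
    """Worst-case minutes a stolen credential stays valid."""
    return lifetime.total_seconds() / 60


pat_window = exposure_minutes(timedelta(days=90))
oidc_window = exposure_minutes(timedelta(minutes=15))

print(f"90-day PAT: {pat_window:,.0f} minutes")               # 129,600
print(f"15-minute OIDC session: {oidc_window:,.0f} minutes")  # 15
print(f"reduction factor: {pat_window / oidc_window:,.0f}x")  # 8,640x
```

The comparison is worst case only: it assumes the token leaks right after issuance and is never revoked early.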

The timeline / evidence

  • 2024-09, CSA + Astrix publish NHI survey: credential leakage, stale access, and undermanaged OAuth apps.

  • 2026-Q1, NHI Reality Report: avg enterprise has 250,000 NHIs, 71% not rotated in policy window, 97% excessive privilege.

  • 2026-Q1, Ratios: 40-100:1 NHI-to-human in enterprise, 144:1 in cloud-native, 500:1 in hyper-automated orgs.

  • 2026-04-23, Red Hat ships zero-trust workload identity manager on OpenShift using upstream SPIRE.

  • 2026, OWASP NHI Top 10 (draft) formalizes ownership, rotation, scope-minimization, and attestation as platform controls.

The decision tree / matrix / runbook

  1. Does every cloud destination (AWS/GCP/Azure) accept OIDC from your CI?

  2. Is the trust policy scoped per repo and per workflow file?

  3. Is the session duration under 1 hour, ideally 15 minutes?

  4. Is there an audit of remaining long-lived PATs? Quarterly sweep.

  5. Is the migration tracked as a platform KPI with a deadline?
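The five questions above can be turned into a small self-assessment, sketched here under the assumption that a team records its answers by hand. The `CiIdentityPosture` fields and the `runbook_gaps` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CiIdentityPosture:
    """Hand-recorded answers to the five runbook questions."""
    all_clouds_accept_oidc: bool
    trust_scoped_per_repo_and_workflow: bool
    session_minutes: int
    quarterly_pat_audit: bool
    tracked_as_kpi_with_deadline: bool


def runbook_gaps(p: CiIdentityPosture) -> list[str]:
    """Return one message per runbook question that is not yet satisfied."""
    gaps = []
    if not p.all_clouds_accept_oidc:
        gaps.append("1: a cloud destination still lacks OIDC federation")
    if not p.trust_scoped_per_repo_and_workflow:
        gaps.append("2: trust policy is not scoped per repo and workflow")
    if p.session_minutes >= 60:
        gaps.append("3: session duration is not under 1 hour")
    if not p.quarterly_pat_audit:
        gaps.append("4: no quarterly sweep of long-lived PATs")
    if not p.tracked_as_kpi_with_deadline:
        gaps.append("5: migration is not a KPI with a deadline")
    return gaps


if __name__ == "__main__":
    posture = CiIdentityPosture(True, False, 720, True, False)
    for gap in runbook_gaps(posture):
        print(gap)
```

An empty list means every question has a "yes"; anything else is the backlog.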

The reference architecture

The short-lived OIDC pattern lets every workflow mint a 15-minute cloud credential bound to the workflow's OIDC identity. No long-lived PATs in org secrets; no static keys on disk.

Architecture notes:

  • GitHub Actions id-token: write permission per workflow.

  • AWS IAM trust policy scoped to repo + workflow file.

  • GCP Workload Identity Federation pool + provider per CI identity.

  • Azure federated credentials per workflow.

  • Quarterly sweep of long-lived tokens; alarm on new ones.
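As a sketch of what "scoped" means on the AWS side, the snippet below builds a trust policy document for GitHub's OIDC issuer, pinned to one repository and one branch through the `sub` claim. The account ID, org, and repo values are placeholders, and `github_trust_policy` is an illustrative helper. Pinning to a specific workflow file would need an additional claim such as `job_workflow_ref`, which this sketch does not cover.

```python
import json

# GitHub's OIDC issuer host, used as the condition-key prefix in AWS IAM.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_trust_policy(account_id: str, org: str, repo: str,
                        branch: str = "main") -> dict:
    """Build an IAM trust policy that only one repo + branch can satisfy."""
    provider_arn = (
        f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # Exact match only: no wildcards in either claim.
                "StringEquals": {
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    f"{GITHUB_OIDC_HOST}:sub":
                        f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account, org, and repo.
    print(json.dumps(github_trust_policy("123456789012", "acme", "payments"),
                     indent=2))
```

Using `StringEquals` rather than `StringLike` is the design choice that matters: an exact match cannot be widened by accident.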


Real-world example, how this plays out in production

The setup. Falconnet Banking (challenger bank) discovered the 2026 NHI ratio the hard way: a leaked GitHub Actions PAT. Every long-lived cloud token was deleted within three weeks. The team treated identity as a platform product, not a ticket queue: SPIFFE as the substrate, IaaS attestation as the trust root, OIDC as the cloud bridge, cert-manager and SPIRE for rotation, and a Backstage catalogue for ownership. Migration was per-workload; the legacy bridge sidecar carried the workloads that could not yet read from the Workload API.

The lesson the team wrote on the whiteboard. Identity in a 144-to-1 world is platform engineering; rotation is automation, not on-call. This piece walks through the SPIRE deployment, the federation setup per cloud, the rotation pipeline, and the tests that proved a workload could obtain a verifiable identity at startup with no static secret.

End-to-end implementation guide

A precise build order from zero to production with the manifests and scripts the team actually shipped. Every block below corresponds to a file in code/, so you can read each step in isolation, then run the suite together.

Step 1: Stand up SPIRE with the cloud node attestor

SPIRE is the upstream SPIFFE implementation. The server config below trusts the cloud's metadata service to attest each node and issues an SVID per workload. Production runs SPIRE in HA with three replicas; the agent is a DaemonSet.

# Short-lived OIDC for CI: kill every long-lived GitHub Actions token
# SPIRE server config enabling IaaS-level node attestation + OIDC federation.
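# SPIRE reads HCL (or JSON), so the blocks use braces and "=" assignments.
server {
  bind_address = "0.0.0.0"
  bind_port    = "8081"
  trust_domain = "example.com"
  data_dir     = "/var/lib/spire/server"
}

plugins {
  # A DataStore plugin is also required for the server to start; it is omitted from this excerpt.

  # One node attestor per cloud, each trusting that cloud's instance identity.
  NodeAttestor "aws_iid" {
    plugin_data {
      region = "us-east-1"
    }
  }
  NodeAttestor "gcp_iit" {
    plugin_data {
      projectid_allow_list = ["my-gcp-project"]
    }
  }
  NodeAttestor "azure_msi" {
    plugin_data {
      tenants = {
        "00000000-0000-0000-0000-000000000000" = {
          resource_id = "https://acme.com/spire"
        }
      }
    }
  }

  # Signing keys live in AWS KMS rather than on the server's disk.
  KeyManager "aws_kms" {
    plugin_data {
      region          = "us-east-1"
      key_policy_file = "/run/spire/kms-policy.json"
    }
  }
}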

Step 2: Federate SPIFFE into AWS, GCP, and Azure IAM

SPIRE exposes an OIDC discovery endpoint; each cloud's IAM trusts it as an external identity provider. The Terraform module below wires the AWS half; the equivalent for GCP and Azure ships in the same module set. The result: one SPIFFE ID maps to a narrow IAM role per cloud.

resource "aws_iam_openid_connect_provider" "spire" {
  url             = "https://oidc.spire.example.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["..."]
}
resource "aws_iam_role" "workload" {
  name = "workload-payments"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.spire.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = { "oidc.spire.example.com:sub" = "spiffe://example.com/payments" }
      }
    }]
  })
}
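The `sub` condition above is an exact string match on a SPIFFE ID, so a malformed ID fails silently at assume-role time. A minimal sketch of a pre-flight check follows; it covers only the basic `spiffe://trust-domain/path` shape, not the full SPIFFE ID specification, and `parse_spiffe_id` is an illustrative helper rather than part of any SPIFFE library.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path), rejecting other shapes."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID (scheme {parts.scheme!r})")
    if not parts.netloc:
        raise ValueError("SPIFFE ID has no trust domain")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs carry no query or fragment")
    return parts.netloc, parts.path


if __name__ == "__main__":
    domain, path = parse_spiffe_id("spiffe://example.com/payments")
    print(domain, path)  # example.com /payments
```

Running this against the value in the Terraform condition and against the ID the workload actually presents catches trust-domain typos before IAM does.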

Step 3: Replace long-lived GitHub Actions tokens with short-lived OIDC

AWS, GCP, and Azure all accept OIDC tokens from GitHub Actions. The workflow below mints a 15-minute AWS credential per run; no long-lived cloud secret is stored in the repo or in org secrets. Quarterly sweeps audit and delete any surviving long-lived PATs.

# Short-lived OIDC for CI: kill every long-lived GitHub Actions token
# GitHub Actions: short-lived OIDC to AWS, no long-lived PATs.
name: deploy
on: { push: { branches: [main] } }
permissions:
  id-token: write
  contents: read
jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-deploy
          aws-region: us-east-1
          role-duration-seconds: 900
      - run: aws sts get-caller-identity
# Tech components referenced: GitHub Actions OIDC, AWS IAM Roles with OIDC trust, GCP Workload Identity Federation, Azure federated credentials, actions/attest-build-provenance, Vault JWT auth method.

Step 4: Wire cert-manager and SPIRE for 24-hour rotation

Rotation is a platform service, not a ticket. cert-manager handles Kubernetes-side certs; SPIRE handles workload SVIDs with a sub-hour TTL. The Certificate below issues a 24-hour mTLS cert and renews it 8 hours before expiry; the legacy bridge sidecar carries workloads that still want a .env-shaped secret.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata: { name: api-mtls, namespace: payments }
spec:
  secretName: api-mtls
  duration: 24h
  renewBefore: 8h
  privateKey:
    rotationPolicy: Always
    algorithm: ECDSA
    size: 256
  issuerRef: { name: spire-ca, kind: ClusterIssuer }
  dnsNames: [api.payments.svc]

Step 5: Surface ownership in the platform catalogue

An NHI without an owner is a 2028 incident. The Backstage catalogue entry per NHI captures owner, class, rotation cadence, and platform API endpoints (whoOwns, expiresAt). On reorgs, the ownership graph re-resolves; expired ownership is a blocking condition for new deploys.

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: payments-api-svid
  annotations:
    nhi.class: workload
    nhi.rotation: 24h
    nhi.spiffe-id: "spiffe://example.com/payments"
spec:
  type: nhi
  owner: payments-team
  lifecycle: production
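The "expired ownership blocks deploys" rule can be sketched as a gate over the catalogue entry. The source does not show how the team implemented it, so everything here is an assumption: the entity is a plain dict mirroring the YAML above, `active_owners` stands in for whatever the IdP or catalogue returns, and `deploy_blockers` is a hypothetical name.

```python
REQUIRED_ANNOTATIONS = ("nhi.class", "nhi.rotation", "nhi.spiffe-id")

# Dict form of the catalogue entry shown above.
PAYMENTS_SVID = {
    "metadata": {
        "name": "payments-api-svid",
        "annotations": {
            "nhi.class": "workload",
            "nhi.rotation": "24h",
            "nhi.spiffe-id": "spiffe://example.com/payments",
        },
    },
    "spec": {"type": "nhi", "owner": "payments-team"},
}


def deploy_blockers(entity: dict, active_owners: set[str]) -> list[str]:
    """List the reasons this NHI should block a deploy; empty means clear."""
    blockers = []
    owner = entity.get("spec", {}).get("owner")
    if not owner:
        blockers.append("no owner declared")
    elif owner not in active_owners:
        blockers.append(f"owner '{owner}' no longer resolves to an active team")
    annotations = entity.get("metadata", {}).get("annotations", {})
    for key in REQUIRED_ANNOTATIONS:
        if key not in annotations:
            blockers.append(f"missing annotation {key}")
    return blockers


if __name__ == "__main__":
    # After a reorg, payments-team has been renamed and no longer resolves.
    print(deploy_blockers(PAYMENTS_SVID, {"ledger-team"}))
```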

Testing the implementation

This is the test plan that gates the rollout. Run each test in a non-production cluster first; expand to staging once the green-path tests pass and the negative tests reject bad input the way the policy says they will.

Test 1: Workload obtains an SVID at startup

kubectl exec -n payments deploy/api -- /opt/spire-agent/bin/spire-agent api fetch

Expected: Returns a valid SVID with the SPIFFE ID spiffe://example.com/payments.

Test 2: Federated AWS role assumed without a static key

kubectl exec -n payments deploy/api -- aws sts get-caller-identity

Expected: Assumes the federated role; ARN ends in :assumed-role/workload-payments/....

Test 3: Cert auto-rotates without a restart

kubectl get cert api-mtls -n payments -o jsonpath='{.status.renewalTime}'

Expected: renewalTime falls 8 hours before the certificate's expiry (16 hours after issuance for a 24h duration); pod uptime is unchanged across renewal.
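To make Test 3 an assertion rather than a judgment call, the expected renewal time can be derived from the Certificate spec: issuance time plus `duration` minus `renewBefore`. The sketch below assumes the timestamp arrives in the `YYYY-MM-DDTHH:MM:SSZ` form Kubernetes uses for status times; the function names and the sample timestamp are illustrative.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp such as 2026-01-01T00:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def expected_renewal_time(not_before: datetime, duration: timedelta,
                          renew_before: timedelta) -> datetime:
    """Renewal is scheduled renew_before ahead of expiry."""
    return not_before + duration - renew_before


if __name__ == "__main__":
    issued = parse_k8s_time("2026-01-01T00:00:00Z")  # placeholder timestamp
    renewal = expected_renewal_time(
        issued, timedelta(hours=24), timedelta(hours=8)
    )
    print(renewal.isoformat())  # 2026-01-01T16:00:00+00:00
```

With `duration: 24h` and `renewBefore: 8h`, each cert is live for 16 hours before its replacement is requested.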

Tech components

GitHub Actions OIDC, AWS IAM Roles with OIDC trust, GCP Workload Identity Federation, Azure federated credentials, actions/attest-build-provenance, Vault JWT auth method.

Production observability and gotchas

  • Track NHIs without an owner weekly; the target is zero.

  • SVID issuance rate per workload; a workload not refreshing within TTL is misconfigured.

  • Federated-role assumption count vs static-key fallbacks; static-key use is an exception.

  • Cert-renewal failures; an expired cert is the loudest possible failure mode.

  • Quarterly audit of the ownership graph against IdP group membership; reorg drift is the common cause.

Failure modes

  1. Trust policy uses a wildcard subject claim; any workflow in the org assumes the role. Scope to repo:org/repo:ref:refs/heads/main or tighter.

  2. Session duration is left at the 1-hour AWS default or raised toward the 12-hour maximum; an attacker gets a long window. Set it to 15 minutes (900 seconds), the shortest session AWS STS allows.

  3. Legacy tool requires static AWS keys; team reintroduces a PAT. Document exception with sunset date; sandbox the tool.
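The first two failure modes are mechanical enough to lint for. The sketch below walks an IAM trust policy document, loaded as a dict, and flags a missing or wildcarded `sub` condition plus a session ceiling above one hour. `trust_policy_findings` is a hypothetical helper, and treating every `*` as a finding is deliberately strict: a narrow wildcard may be legitimate but still deserves a human look.

```python
def trust_policy_findings(policy: dict, max_session_seconds: int) -> list[str]:
    """Flag wildcard or missing sub conditions and long session ceilings."""
    findings = []
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        # Collect every value conditioned on a ":sub" key, whatever the operator.
        sub_values = []
        for operator_block in stmt.get("Condition", {}).values():
            for key, value in operator_block.items():
                if key.endswith(":sub"):
                    sub_values += value if isinstance(value, list) else [value]
        if not sub_values:
            findings.append("no condition on the sub claim")
        for sub in sub_values:
            if "*" in sub:
                findings.append(f"wildcard in sub claim: {sub}")
    if max_session_seconds > 3600:
        findings.append(f"session ceiling is {max_session_seconds}s (over 1 hour)")
    return findings


if __name__ == "__main__":
    risky = {"Statement": [{
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {"StringLike": {
            "token.actions.githubusercontent.com:sub": "repo:acme/*",
        }},
    }]}
    for finding in trust_policy_findings(risky, 43200):
        print(finding)
```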

When NOT to do this

Read-only CI jobs with no cloud-state changes may be acceptable without OIDC, but these are rare. For any workflow that writes to cloud state, OIDC is the default.

What to ship this quarter

  • Migrate every AWS / GCP / Azure call to short-lived OIDC.

  • Scope trust policies per repo and per workflow.

  • Set session duration to 15 minutes, the shortest AWS allows.

  • Audit and delete long-lived PATs org-wide.

  • Track remaining long-lived tokens as a weekly platform KPI.
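The weekly KPI needs a definition of "long-lived" that a script can apply. A minimal sketch follows, assuming a token inventory has already been exported as records with `name`, `created_at`, and `expires_at` fields. These field names are hypothetical and not tied to any particular API; adapt them to whatever your export produces.

```python
from datetime import datetime, timedelta, timezone


def long_lived_tokens(tokens: list[dict],
                      max_lifetime: timedelta = timedelta(hours=1)) -> list[str]:
    """Return names of tokens that never expire or outlive max_lifetime."""
    flagged = []
    for token in tokens:
        expires_at = token.get("expires_at")
        if expires_at is None:
            flagged.append(token["name"])  # no expiry at all
        elif expires_at - token["created_at"] > max_lifetime:
            flagged.append(token["name"])
    return flagged


if __name__ == "__main__":
    start = datetime(2026, 1, 1, tzinfo=timezone.utc)  # placeholder date
    inventory = [
        {"name": "deploy-pat", "created_at": start,
         "expires_at": start + timedelta(days=90)},
        {"name": "legacy-bot", "created_at": start, "expires_at": None},
        {"name": "oidc-session", "created_at": start,
         "expires_at": start + timedelta(minutes=15)},
    ]
    flagged = long_lived_tokens(inventory)
    print(len(flagged), flagged)  # 2 ['deploy-pat', 'legacy-bot']
```

The count this returns is the number to chart week over week; the target is zero.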

Further reading

  1. GitHub Docs, "Security hardening with OpenID Connect": the primary reference.

  2. AWS Docs, "Configuring OpenID Connect in Amazon Web Services": the AWS trust-policy reference.

  3. GCP Docs, "Workload Identity Federation": the GCP federation reference.

  4. references.md

S08: Identity for the 144-to-1 World

Part 1 of 1

Non-human identities outnumber humans 144 to 1 in Q1 2026. Service accounts, agents, bots, sidecars, CI jobs. Your IdP was built for a 1:1 world. The old patterns (long-lived PATs, static .env, keyless-only-at-the-edge) fall apart at this scale. Identity is a platform product, not an IT ticket. SPIFFE as the lingua franca, short-lived OIDC as the cloud bridge, IaaS attestation as the trust root, cert-manager as the rotation service. This series wires the full stack.