How Attackers Bypass Your “Compliant” CI/CD Pipeline (And How to Redesign It)
Your pipeline passed compliance. The attacker passed through it.

Modern CI/CD pipelines are often treated as untouchable “trusted builds”, locked down by code review and best practices. That trust is a myth. A pipeline is a prime attack surface, containing everything an attacker needs: deployment keys, API tokens, container registries, test artifacts, and an implicit trust that code executed in the pipeline is safe. Attackers know this. Real incidents (SolarWinds, Codecov, large GitHub Actions supply-chain campaigns) have shown how a compromised build system becomes a stealthy delivery vehicle for malware and secret exfiltration. In short: a “compliant” pipeline can still be subverted from the inside.
This post takes the red-team viewpoint: real bypass paths (compromised runners, poisoned caches, dependency confusion, indirect pipeline poisoning), why they work, and a concrete, cloud-agnostic redesign for GitHub Actions running Python microservices. You’ll get code, an implementation plan, testing approaches and a real-world-inspired story to make the risk concrete.
Quick summary (TL;DR for the impatient)
Treat the build environment as untrusted. Ephemeral runners, least privilege and strict workflow triggers reduce risk.
Caches, third-party actions and dependencies are blind trust boundaries: validate, sign, or remove them for critical builds.
Use attestations (SLSA/in-toto), signed artifacts (cosign/Sigstore), SBOMs and reproducible builds to prove an artifact’s provenance.
Harden runners, avoid running unreviewed code on privileged runners and adopt runtime detection (Falco, Trivy) for defense in depth.
Red-team POV: real bypass paths
Attack Path #1 - Compromised Runners (Persistent Backdoors)
Self-hosted or shared runners are especially dangerous. If an attacker can make the runner execute untrusted code (through a malicious PR or specially crafted workflow trigger), they can persist on the host, harvest environment variables and secrets, tamper with build outputs and later push poisoned artifacts. GitHub-hosted runners are safer mainly because they are ephemeral; self-hosted runners can survive malicious jobs and give an attacker ongoing access.
What attackers do: exfiltrate GITHUB_TOKEN or other secrets, install persistence (systemd/cron), modify artifacts mid-build or pivot into internal networks reachable from the runner.
Why it works: runners have access to code, build caches and sometimes cloud credentials. Workflows often run scripts and test suites with no strong isolation between "build logic" and "developer-supplied code".
Attack Path #2 - Poisoned Caches & Artifacts
Build caches and artifact storage are performance shortcuts and trust shortcuts. Many cache systems will extract an archive and place its contents into the workspace without validating each file’s origin or integrity. An attacker with temporary access can push a crafted cache archive that overwrites files or injects malicious files that later get executed.
What attackers do: obtain cache tokens or a write path, upload a malicious tarball (containing backdoors or altered dependencies), then wait for downstream builds to restore that cache and run the tainted content.
Why it works: cache restore steps and some action-marketplace items assume the cache is benign; there is little to no path-level validation when extracting.
Attack Path #3 - Dependency Confusion & Malicious Packages
If your pipeline pulls dependencies from remote registries (PyPI, npm, etc.) using ambiguous names, attackers can publish a public package that shadows your internal one. When build scripts or tests run pip install mycorp-utils, the public malicious package can be fetched and executed, sometimes via post-install hooks.
What attackers do: publish a malicious package with the same name as an internal package or craft a trojanized version of a transitive dependency.
Why it works: developers or CI scripts use permissive installation rules and don’t pin exact sources or hashes.
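One way to close the ambiguity is to make pip resolve every name against a single internal index. A minimal pip.conf sketch, assuming a hypothetical internal mirror URL:

```ini
# /etc/pip.conf (or ~/.config/pip/pip.conf)
# The URL below is a placeholder for your private registry.
[global]
index-url = https://pypi.internal.example.com/simple/
# Do NOT add extra-index-url pointing at public PyPI here:
# mixing indexes reintroduces dependency-confusion resolution.
```

The mirror then decides which public packages are proxied through, so a public package can no longer shadow an internal name.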
Attack Path #4 - Indirect Pipeline Poisoning (PPE)
Attackers can change artifacts that the pipeline executes without modifying the workflow itself. For example, if the pipeline runs pytest, an attacker who commits a malicious test will have their code executed by CI. The YAML looks clean; the real problem is that the runnable artifacts called by the YAML are controlled by code that may not be strictly reviewed.
What attackers do: commit scripts, tests or makefiles that contain exfiltration or persistence code; then the pipeline runs them as part of normal testing or packaging.
Why it works: workflows invoke project scripts without verifying the content or authorship of those scripts.
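A workflow that must run developer-supplied tests can at least run them with no token scopes at all. A sketch (job and step names are illustrative) using the plain pull_request trigger and an empty permissions block:

```yaml
# Untrusted PR code gets zero GITHUB_TOKEN scopes and no secrets.
name: PR tests (untrusted)
on:
  pull_request:
permissions: {}
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python -m pytest
```

If a malicious test runs anyway, it finds no token worth stealing; privileged work stays in a separate workflow that never executes PR-controlled scripts.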
Why this goes viral (and scares execs)
Hacker framing: the attack is easy to explain as “the build baked a backdoor”. It’s visceral and scary.
Security fear + practical fixes: readers want practical, implementable mitigations; this post gives them, from baked-in platform config to SLSA-level attestations.
Observable risk: artifacts are shipped and can be used to breach customers and partners; the stakes are huge, which drives virality.
Concrete case study: Python microservices on GitHub Actions (cloud-agnostic)
Scenario: two microservices (service-auth and service-data) packaged as Docker containers; GitHub repo with main branch and PR workflow protections. The pipeline builds, tests, pushes images and deploys to a Kubernetes cluster (any cloud or on-prem).
Repo layout (example)
├── .github/workflows/ci-cd.yml
├── service-auth/
│ ├── Dockerfile
│ ├── app.py
│ ├── requirements.txt
│ └── tests/
├── service-data/
│ ├── Dockerfile
│ ├── main.py
│ ├── requirements.txt
│ └── tests/
└── k8s/
├── deployment-auth.yaml
└── deployment-data.yaml
Vulnerable pipeline (what an attacker abuses)
A naive workflow:
name: CI/CD Pipeline
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myregistry/service-auth:latest ./service-auth
      - run: docker build -t myregistry/service-data:latest ./service-data
      - run: docker push myregistry/service-auth:latest
      - run: docker push myregistry/service-data:latest
      - run: kubectl apply -f k8s/
Weaknesses exploited: a shared runner environment, no scanning, no signing, unchecked caches and transitive dependencies, main-branch pushes that may accept automated commits, and tests/scripts executed without provenance.
Redesigned secure pipeline: principles first
Ephemeral & Isolated Runners
Prefer GitHub-hosted runners (runs-on: ubuntu-latest) for sensitive jobs. If self-hosted runners are necessary, host them in isolated ephemeral VMs/containers with daily rebuilds and no network access to internal systems.
Least Privilege & Narrow Permissions
Use the permissions: field in GitHub Actions to restrict tokens. Avoid exposing secrets to PRs from forks. Require maintainers to trigger privileged jobs manually or via protected labels.
Pin & Verify Action Versions
Pin marketplace actions to immutable commit SHAs (avoid floating tags like @v3 where possible). Maintain an allowlist of trusted actions.
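For example, a pinned reference looks like the following (the SHA is a placeholder; resolve the real one from the action's repository):

```yaml
# Resolve the commit behind a release tag with:
#   git ls-remote https://github.com/actions/checkout v4.1.1
- uses: actions/checkout@<full-40-char-commit-sha>   # v4.1.1
```

A tag like @v3 can be silently moved to point at new code; a full commit SHA cannot.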
No Blind Cache Restore for Critical Paths
Avoid cache restore for anything that could alter build behaviour in security-sensitive pipelines; otherwise scope cache keys narrowly and validate contents.
Consider disabling caches for the critical path; accept slower builds for stronger security.
Dependency Verification
Pin dependencies and use hash verification in requirements.txt (pip supports --require-hashes). Use private package registries or package proxying (e.g., mirror PyPI internally).
Implement dependency reviews for new packages.
Artifact Signing & Attestations
Sign built images with cosign (Sigstore) and publish signatures to a transparency log (Rekor). Generate SBOMs and store them alongside the artifact.
Adopt SLSA/in-toto-style attestations tying the artifact back to the exact build inputs.
Split Build & Deploy
Build artifacts in one pipeline and store them as signed immutable images (by digest).
Run a separate, approved release pipeline that only takes signed artifacts and deploys them.
Runtime Detection
Use runtime security (Falco) and container scanning (Trivy) to detect anomalies in running workloads.
Implementation: step-by-step
Below is a focused, practical pipeline that implements the secure recommendations. It’s opinionated and intended as a starting point.
A - Harden GitHub Actions workflow (ci-cd.yml)
name: Secure CI/CD
on:
  push:
    branches: [ main ]
permissions:
  contents: read
  id-token: write   # for OIDC token exchange to cloud (no long-lived creds)
  packages: write
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
jobs:
  build:
    name: Build, Test, Sign and Attest
    runs-on: ubuntu-latest
    environment: ci
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install build deps
        run: |
          python -m pip install --upgrade pip setuptools wheel
      - name: Run unit tests (isolated)
        run: |
          python -m pytest ./service-auth/tests -q
          python -m pytest ./service-data/tests -q
      - name: Build container images (immutable tag per commit)
        run: |
          docker build -t myregistry/service-auth:ci-${{ github.sha }} ./service-auth
          docker build -t myregistry/service-data:ci-${{ github.sha }} ./service-data
      - name: Scan auth image with Trivy
        uses: aquasecurity/trivy-action@master   # pin to a commit SHA in production
        with:
          image-ref: myregistry/service-auth:ci-${{ github.sha }}
          exit-code: '1'   # fail the build on findings
      - name: Scan data image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myregistry/service-data:ci-${{ github.sha }}
          exit-code: '1'
      - name: Push images to registry
        run: |
          docker push myregistry/service-auth:ci-${{ github.sha }}
          docker push myregistry/service-data:ci-${{ github.sha }}
      - name: Sign images with cosign (keyless via OIDC)
        run: |
          # cosign 2.x signs keyless by default when an OIDC identity is available
          cosign sign --yes myregistry/service-auth:ci-${{ github.sha }}
          cosign sign --yes myregistry/service-data:ci-${{ github.sha }}
      - name: Create SBOMs
        run: |
          # example using syft
          syft docker:myregistry/service-auth:ci-${{ github.sha }} -o spdx-json > sbom-auth-${{ github.sha }}.json
          syft docker:myregistry/service-data:ci-${{ github.sha }} -o spdx-json > sbom-data-${{ github.sha }}.json
      - name: Upload artifacts (SBOM & attestations)
        uses: actions/upload-artifact@v4
        with:
          name: sboms-and-attestations-${{ github.sha }}
          path: |
            sbom-auth-${{ github.sha }}.json
            sbom-data-${{ github.sha }}.json
      - name: Create in-toto attestation
        run: |
          # pseudo-command: tie it into your in-toto workflow
          in-toto-record --step build --materials . --products "myregistry/service-auth@sha256:..." --subject "sha256:${{ github.sha }}"
  promote:
    name: Promote signed artifacts to production (manual & audited)
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://your-k8s-console.example
    permissions:
      contents: read
      id-token: write
      packages: read
    steps:
      - name: Download SBOM & Attestations
        uses: actions/download-artifact@v4
        with:
          name: sboms-and-attestations-${{ github.sha }}
      - name: Verify cosign signatures & Rekor
        run: |
          # cosign 2.x keyless verification pins the expected signer identity;
          # adjust the regexp to your org/repo
          cosign verify \
            --certificate-oidc-issuer https://token.actions.githubusercontent.com \
            --certificate-identity-regexp 'https://github\.com/your-org/.*' \
            myregistry/service-auth:ci-${{ github.sha }}
          cosign verify \
            --certificate-oidc-issuer https://token.actions.githubusercontent.com \
            --certificate-identity-regexp 'https://github\.com/your-org/.*' \
            myregistry/service-data:ci-${{ github.sha }}
      - name: Deploy (only signed images by digest)
        env:
          KUBECONFIG: ${{ secrets.KUBECONFIG }}
        run: |
          kubectl set image deployment/auth auth=myregistry/service-auth@sha256:<digest>
          kubectl set image deployment/data data=myregistry/service-data@sha256:<digest>
          kubectl rollout status deployment/auth
          kubectl rollout status deployment/data
Notes on the workflow above
We build once, sign the images with cosign using keyless OIDC (no long-lived signing keys required), generate SBOMs (using syft) and upload attestation artifacts for later verification. Deployment is a separate job that verifies signatures before applying changes. This prevents “build-time tampering” from automatically reaching production.
Use fetch-depth: 0 so the build has full history for reproducibility checks where necessary.
B - Lockdown dependency installs (requirements.txt with hashes)
Use pip’s --require-hashes option to guarantee packages haven’t changed:
# requirements.txt
flask==2.0.1 \
--hash=sha256:abcdef...
requests==2.25.1 \
--hash=sha256:123456...
Install with: pip install --require-hashes -r requirements.txt
Generate hashes with pip-compile or pip hash when creating a lockfile.
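A small CI guard can enforce this before any install runs. The checker below is my own sketch (not a pip feature): it fails the build if any requirement line lacks a --hash option.

```python
import re
import sys

def unhashed_requirements(text):
    """Return requirement names whose entry carries no --hash option.

    Handles pip-compile style files where hashes continue a requirement
    via trailing backslashes.
    """
    missing = []
    # Join backslash-continued lines so each requirement is one logical line
    logical = re.sub(r"\\\s*\n", " ", text)
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments and pip options like --index-url
        if "--hash=" not in line:
            missing.append(line.split("==")[0].strip())
    return missing

if __name__ == "__main__":
    bad = unhashed_requirements(open(sys.argv[1]).read())
    if bad:
        print("requirements without hashes:", ", ".join(bad))
        sys.exit(1)
```

Run it as an early workflow step against each service's requirements.txt; a non-zero exit stops the job before pip executes anything.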
C - Limit runner reach & secrets exposure
Use the permissions: block to limit GITHUB_TOKEN scopes. For any job that uses a secret, prefer OIDC-based short-lived tokens (GitHub OIDC) to obtain cloud credentials dynamically, rather than storing long-lived secrets in GitHub Secrets.
Disable secret access in workflows triggered by untrusted events.
D - Protect caches or avoid them for sensitive builds
If you must cache, scope the cache keys narrowly and validate the contents of restored caches before use.
For the highest assurance builds, disable cache restore and accept longer build times.
E - Runtime detection
Add Falco or similar runtime detection inside the cluster to watch for suspicious syscalls, spawned shells inside containers or unexpected outbound traffic:
- Falco rules can alert on curl/wget launched from unexpected processes, writes to /etc/cron*, or the creation of suspicious network connections.
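A Falco rule along these lines might look like the following sketch (the condition macros spawned_process and container come from Falco's default ruleset; tune the process list to your workloads):

```yaml
- rule: Unexpected network tool in container
  desc: curl or wget spawned inside a container that normally never fetches
  condition: >
    spawned_process and container and proc.name in (curl, wget)
  output: "Network tool launched in container (cmd=%proc.cmdline container=%container.name)"
  priority: WARNING
```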
Tests and validation techniques
Red-team your CI
- Spin up a disposable repo that simulates forks and PR flows; test whether a crafted PR can exfiltrate secrets or alter caches. Attempt to poison a cache and verify whether a later build restores malicious files. This is a pragmatic way to test proofs-of-concept.
Reproducible-build checks
- Rebuild the same commit twice in different clean environments and compare digests/hashes. If artifacts differ, investigate non-determinism sources.
Attestation verification
- For each released artifact, verify cosign signatures and Rekor entries. Write an automated check that fails CI if the Rekor proof is missing.
SBOM and vulnerability scanning
- Scan generated SBOMs for known CVEs and compare SBOMs across builds. Use Trivy as part of CI and a daily scheduled job.
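Comparing SBOMs across builds can be automated by diffing their package sets. A sketch assuming syft's SPDX-JSON output, which lists packages with "name" and "versionInfo" fields:

```python
def sbom_packages(doc):
    """Extract (name, version) pairs from a parsed SPDX-JSON document."""
    return {
        (p.get("name", ""), p.get("versionInfo", ""))
        for p in doc.get("packages", [])
    }

def sbom_diff(old, new):
    """Packages added/removed between two builds; surprises here warrant review."""
    old_pkgs, new_pkgs = sbom_packages(old), sbom_packages(new)
    return {"added": sorted(new_pkgs - old_pkgs),
            "removed": sorted(old_pkgs - new_pkgs)}
```

An unexplained "added" entry in a build that touched no requirements file is exactly the kind of signal a dependency-confusion or cache-poisoning attack produces.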
Runtime anomaly detection
- Deploy Falco rules in staging and production. Trigger synthetic anomalies and confirm logging/alerting pipelines work.
Periodic key & secret audits
- Rotate any secret that ever was accessible to runners. Keep a tight inventory of service accounts with deploy permissions.
Real-world life story (Inspired by real incidents)
The Midnight Commit: How one build led to a breach
This story is inspired by real supply chain incidents (publicly reported SolarWinds and Codecov investigations and later GitHub Actions campaigns). Names and some details have been fictionalized for clarity.
It was 02:08 local time when an engineer on the on-call rotation noticed a small alert: the outbound network firewall had logged an odd POST to an external IP from a CI runner. The runner had just completed a nightly build of the company’s payment microservice. At first it looked like a failed telemetry call but the payload contained a blob that, when decoded, revealed multiple environment-like variables.
A frantic investigation followed. The team discovered that three weeks prior, an innocuous cache-key collision had allowed a malicious archive to be stored in the build cache. A contractor had merged a tiny patch into a tooling repo that populated a tarball on the main branch; the cache key was shared across repos. The attacker’s archive contained a small Python module that would, under CI execution, scan the environment for tokens and POST them to an external collector.
How did it evade detection? The malicious code sat inside a file that only test runners executed; the normal code-review checks focused on YAML workflow changes and missed a new test file in a deep tests/ package. The organization’s CI used caches aggressively and had a few self-hosted runners accessible from the corporate network. The attacker had exploited a misconfigured pull-request trigger on a low-privilege repo to place the payload into the cache, then waited for the payment-service build to restore the cache. When the collector received the first tokens, the attacker quickly used them to pull private images and access a staging environment. From there, lateral movement found a misconfigured database user, and a handful of customer records were exposed in a breach that could have been orders of magnitude worse.
After-action findings catalyzed fast change: the company turned off shared caches for critical pipelines, replaced self-hosted runners with ephemeral hosted runners for production jobs, introduced SBOMs and cosign signatures into the build flow and validated that only signed artifacts could be promoted to production. The next year they ran a red-team exercise that recreated the attack; this time the detection pipeline caught the exfil attempt in minutes.
The verdict was simple: it wasn’t the test suite that was the problem; it was the trust they had given to automation. Once they treated the build as untrusted, the attack surface shrank dramatically.
Conclusion: actionable checklist
Use ephemeral GitHub-hosted runners for sensitive jobs; isolate any self-hosted runners.
Pin and audit GitHub Actions and third-party actions. Use immutable references where possible.
Disable or narrowly scope caches for sensitive builds; validate cache contents.
Pin dependency versions and use hash verification; prefer private mirrors.
Sign artifacts (cosign/Sigstore), publish attestations (Rekor) and adopt SLSA/in-toto where practical.
Split build and promotion pipelines; only signed artifacts should be promoted.
Add runtime detection (Falco) and container scanning (Trivy).
Run red-team CI tests: simulate cache poisoning, compromised runner and dependency confusion.
References
SolarWinds: New Findings From Our Investigation of SUNBURST (SolarWinds blog)
“The Untold Story of the Boldest Supply-Chain Hack Ever” (Wired) - deep reporting on SolarWinds
CodeQL / GitHub Security; Cache poisoning technique description (actions cache poisoning)
Adnan Khan - “The monsters in your build cache: GitHub Actions cache poisoning” (analysis & PoC)
GitGuardian / CSO coverage: GhostAction campaign and GitHub Actions supply-chain campaigns
Alex Birsan - “Dependency Confusion” (how public packages can shadow internal ones)
GitHub OIDC / Workload Identity Federation guides (GCP example)






