Vault Sprawl Risk Patterns and a Secrets Governance Model for Multi-Team CI/CD

Victor Jimenez
Software Engineer & AI Agent Builder

Vault sprawl in multi-team CI/CD is usually a governance failure, not a tooling failure. The model that works in practice has four parts: short-lived identity-based access (OIDC/workload identity), path ownership boundaries, policy-as-code with review gates, and measurable rotation/usage controls per team.

The problem

As teams scale, secrets handling drifts into four repeating failure patterns:

| Sprawl pattern | What breaks | Typical incident |
| --- | --- | --- |
| One shared Vault namespace for many teams | No clear ownership, broad blast radius | Team A pipeline can read Team B secrets |
| Long-lived CI tokens in repo/org secrets | Rotations lag, credentials leak and persist | Exposed token keeps working for weeks |
| Inconsistent secret paths/names | Automation and auditing become brittle | Rotation scripts miss critical paths |
| Manual exceptions outside policy review | Shadow access accumulates | Emergency grants never removed |

Blast Radius

Kubernetes documentation still warns that Secrets are stored unencrypted in etcd by default and need encryption at rest plus strict RBAC to be safe. The same pattern appears in CI: if identity and policy are weak, secret stores become high-value failure hubs instead of controls.

The solution

Governance blueprint

| Control plane | Standard | Enforce in CI |
| --- | --- | --- |
| Identity | OIDC/workload identity only for CI | Block static token auth in pipelines |
| Authorization | Team-scoped Vault paths + least privilege | Validate policy diffs on PR |
| Lifecycle | TTL defaults + max TTL + mandatory rotation SLA | Fail builds for expired owners/rotation metadata |
| Observability | Audit logs mapped to repo/team/service | Daily drift report to platform + team owners |

Reference policy contract

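vault/policies/team-payments-ci.hcl
# Read-only access to the payments team's production KV v2 secrets.
path "kv/data/payments/prod/*" {
  capabilities = ["read"]
}

# Reading this path issues dynamic, short-lived database credentials.
path "database/creds/payments-ci" {
  capabilities = ["read"]
}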
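
The "Validate policy diffs on PR" control can be approximated with a small path check. The sketch below is one assumption about how such a gate might look, not a description of the author's tooling: `TEAM_PREFIXES` and `out_of_scope_paths` are hypothetical names, the ownership map is hard-coded for illustration, and the regex only extracts quoted `path "..."` headers rather than parsing HCL fully.

```python
import re

# Hypothetical ownership map: team name -> path prefixes that team may reference.
TEAM_PREFIXES = {
    "payments": ("kv/data/payments/", "database/creds/payments-"),
}

# Matches the quoted path in lines such as: path "kv/data/payments/prod/*" {
PATH_RE = re.compile(r'^\s*path\s+"([^"]+)"', re.MULTILINE)


def out_of_scope_paths(policy_hcl: str, team: str) -> list[str]:
    """Return every policy path that falls outside the team's owned prefixes."""
    allowed = TEAM_PREFIXES.get(team, ())
    paths = PATH_RE.findall(policy_hcl)
    # str.startswith accepts a tuple; an unknown team owns nothing,
    # so all of its paths are reported.
    return [p for p in paths if not p.startswith(allowed)]
```

Run against the policy above with team `payments`, the function returns an empty list. A stanza for a path such as `kv/data/identity/prod/*` would be reported, and a PR check could fail on any non-empty result.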

Migration from deprecated pattern

- # Deprecated: static CI secrets for Vault/cloud auth
- env:
-   VAULT_TOKEN: ${{ secrets.VAULT_TOKEN }}
+ # Replacement: OIDC federation + dynamic secrets + bounded TTL
+ permissions:
+   id-token: write

OIDC authentication flow
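
In code, the flow has two hops: the CI job asks its platform for a signed OIDC token, then exchanges that token at Vault for a short-lived client token. The sketch below assumes GitHub Actions as the issuer and Vault's JWT auth method mounted at `auth/jwt`. The function names, the role passed in, and the 900-second TTL ceiling are illustrative assumptions, and error handling is omitted.

```python
import json
import os
import urllib.parse
import urllib.request


def github_oidc_request(audience: str) -> urllib.request.Request:
    """Build the request that asks the GitHub Actions runner for an OIDC JWT.

    The two environment variables are only present when the job declares
    `permissions: id-token: write`.
    """
    base = os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"]
    bearer = os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]
    url = f"{base}&audience={urllib.parse.quote(audience, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"bearer {bearer}"})


def vault_login_request(vault_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Build the login call for Vault's JWT auth method (assumed mount: auth/jwt)."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/jwt/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_vault_login(response_body: bytes, max_ttl_seconds: int) -> str:
    """Return the client token, rejecting one that outlives the allowed TTL."""
    auth = json.loads(response_body)["auth"]
    if auth["lease_duration"] > max_ttl_seconds:
        raise ValueError(f"token TTL {auth['lease_duration']}s exceeds {max_ttl_seconds}s")
    return auth["client_token"]


def login(vault_addr: str, role: str, audience: str, max_ttl_seconds: int = 900) -> str:
    """Run both hops: fetch the OIDC JWT, then exchange it for a Vault token."""
    with urllib.request.urlopen(github_oidc_request(audience)) as resp:
        jwt = json.load(resp)["value"]
    with urllib.request.urlopen(vault_login_request(vault_addr, role, jwt)) as resp:
        return parse_vault_login(resp.read(), max_ttl_seconds)
```

Nothing long-lived is stored in the repository: the JWT is minted per job, and the Vault token it buys expires on its own. The TTL check in `parse_vault_login` is a client-side guard only; the binding limit belongs in the Vault role configuration.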

Secret lifecycle states

Operating rules that prevent re-sprawl

  1. One team owner for each secret path prefix (kv/data/<team>/<env>/...).
  2. Every secret includes metadata: owner, rotation_sla_days, source_system.
  3. PR checks reject policy changes without owner approval.
  4. Any manual break-glass access auto-expires and creates a follow-up ticket.
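
Rule 2 only holds if something checks it. A minimal sketch of such a gate follows, assuming each secret's metadata arrives as a plain dict of strings and that the last rotation date is already known (for example from the secret's version history). `metadata_violations` and the sample values in the usage note are hypothetical.

```python
from datetime import date, timedelta

# The three metadata keys required by the operating rules above.
REQUIRED_KEYS = ("owner", "rotation_sla_days", "source_system")


def metadata_violations(path: str, metadata: dict, last_rotated: date, today: date) -> list[str]:
    """List missing metadata keys and flag a secret that is past its rotation SLA."""
    problems = [f"{path}: missing {key}" for key in REQUIRED_KEYS if not metadata.get(key)]
    sla = metadata.get("rotation_sla_days")
    if sla:
        # Metadata values are often stored as strings, so coerce before doing date math.
        due = last_rotated + timedelta(days=int(sla))
        if today > due:
            problems.append(f"{path}: rotation overdue since {due.isoformat()}")
    return problems
```

A CI job could call this for every path a pipeline reads and fail the build on any non-empty result, which is what turns the rotation SLA into a gate rather than documentation.
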
Fastest Risk Reduction

OIDC plus short-lived credentials is the fastest risk reduction move in CI/CD. Start there before adding more tooling.

Migration checklist

  • Audit all static CI tokens across repos
  • Configure OIDC/workload identity for all CI pipelines
  • Establish team-scoped Vault path ownership
  • Write and review Vault policies as code
  • Set TTL defaults and max TTL for all secret paths
  • Add rotation SLA metadata to all secrets
  • Configure CI to fail on expired owners/rotation metadata
  • Set up daily drift reports for platform and team owners
  • Remove all static long-lived tokens from CI secrets
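
The first checklist item can be started with a scan of workflow files. This sketch assumes GitHub Actions workflows under `.github/workflows` and a naming convention in which long-lived credentials contain `TOKEN`, `SECRET_ID`, or `ACCESS_KEY`. The pattern, the allowlist, and the function names are assumptions to adapt, and a name-based scan will miss secrets that do not follow the convention.

```python
import re
from pathlib import Path

# Assumed naming convention for long-lived credentials referenced as CI secrets.
STATIC_SECRET_RE = re.compile(
    r"\$\{\{\s*secrets\.([A-Z0-9_]*(?:TOKEN|SECRET_ID|ACCESS_KEY)[A-Z0-9_]*)\s*\}\}"
)

# GITHUB_TOKEN is issued per job and expires with it, so it is not a static secret.
SHORT_LIVED = {"GITHUB_TOKEN"}


def find_static_secrets(workflow_text: str) -> list[str]:
    """Return the sorted, de-duplicated secret names that look long-lived."""
    names = set(STATIC_SECRET_RE.findall(workflow_text))
    return sorted(names - SHORT_LIVED)


def audit_repo(repo_root: str) -> dict[str, list[str]]:
    """Map each workflow file under .github/workflows to its suspicious secret names."""
    findings = {}
    workflows = Path(repo_root) / ".github" / "workflows"
    for wf in sorted(workflows.glob("*.y*ml")):
        hits = find_static_secrets(wf.read_text(encoding="utf-8"))
        if hits:
            findings[str(wf)] = hits
    return findings
```

Running `audit_repo` across every checked-out repository gives a per-file inventory that can be worked down to zero as pipelines move to OIDC.
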

Why this matters for Drupal and WordPress

Drupal and WordPress deployments often run on platform or agency CI: Pantheon, Acquia, WP Engine, or custom pipelines that build, test, and deploy sites, contrib modules, and plugins. Those pipelines need DB credentials, API keys, and sometimes Vault (or a similar store) for secrets. Multi-tenant and multi-team setups suffer the same sprawl: shared namespaces, long-lived tokens in GitHub Actions or GitLab CI, and no clear path ownership. Applying this governance model (OIDC for CI, team-scoped paths, policy-as-code, rotation SLAs as CI gates) reduces risk for any team that deploys Drupal or WordPress from CI. If you maintain contrib modules or plugins and use a shared secrets store, push for identity-based access and required rotation metadata so one leaked token doesn't expose every site or environment.

What I learned

  • Worth trying when many teams share one secrets platform: enforce path ownership before adding more tooling.
  • OIDC plus short-lived credentials is the fastest risk reduction move in CI/CD.
  • Avoid in production: emergency policy exceptions without expiry and ticketed cleanup.
  • Rotation SLAs are only useful when encoded as CI gates, not documentation.
