Drupal Entity Reference Integrity: Auto-Fix Broken References at Scale

5 min read
Victor Jimenez
Software Engineer & AI Agent Builder

The first version of drupal-entity-reference-integrity could scan a Drupal site and report broken entity references. That was the easy part. The hard part -- actually fixing them at scale without taking the site down -- was missing. This upgrade adds auto-fix mode, batch processing, and the test coverage needed to trust both in production.

What Changed

The original module scanned entity reference fields and printed a report. Useful for audits, useless for remediation. You still had to fix every broken reference by hand.

The upgraded module introduces two capabilities that make it production-ready:

  • Auto-fix mode -- pass the --fix flag to the Drush command and the module removes broken references from entity fields and saves the entities automatically. No manual editing. No exporting config and grepping through YAML.
  • Batch processing -- the --batch-size flag (default 50) controls how many entities are loaded and processed per cycle. Large sites with tens of thousands of nodes no longer risk memory exhaustion or timeouts, because the scanner processes one batch, flushes, and moves to the next.

The Drush command still supports table, CSV, and JSON output for reporting. You can run a dry scan first to review what would change, then re-run with --fix to apply corrections.

Tech Stack

| Component | Technology | Why |
|-----------|------------|-----|
| CMS | Drupal 10 and 11 | No deprecated API calls, no version-locked service defs |
| CLI | Drush command | --fix, --batch-size, --format flags |
| Testing | PHPUnit (9 tests) | Mocked entity storage, no database required |
| Output | Table, CSV, JSON | Whatever downstream tools need |
| License | MIT | Open for adoption |

Dry Scan First, Then Fix

The --fix flag is intentionally opt-in. The default behavior is still a non-destructive scan. Always run a dry scan first, review the report, and then re-run with --fix. This is especially important on large sites where you want to verify the scope of changes before applying them.

Batch Size Matters on Large Sites

The default batch size of 50 is conservative. On sites with tens of thousands of nodes, you may want to increase it for speed or decrease it if memory is tight. Watch PHP's memory_limit and the Drush timeout settings.
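One way to choose a value is to time read-only scans at a few batch sizes before committing to one. The helper below is a rough sketch, not something the module ships: `compare_batch_sizes` is a hypothetical name, the candidate sizes are arbitrary, and it assumes the options take the usual `--name=value` form. It never passes --fix, so nothing is written.

```shell
# Hypothetical helper (not shipped with the module): time a dry scan at
# several batch sizes. Assumes --batch-size takes the usual --name=value form.
compare_batch_sizes() {
  local size start end

  # Show the memory limit PHP applies to Drush runs on this machine.
  drush php:eval 'echo "memory_limit: ", ini_get("memory_limit"), PHP_EOL;'

  for size in "$@"; do
    start=$(date +%s)
    # No --fix here, so each run is a read-only scan.
    if drush entity-reference-integrity:scan --batch-size="$size" --format=json > /dev/null; then
      end=$(date +%s)
      echo "batch-size=$size finished in $((end - start))s"
    else
      echo "batch-size=$size failed; try a smaller value"
    fi
  done
}

# Usage: compare_batch_sizes 25 50 100 200
```

A failed run at a larger size is a hint that memory or time limits were hit, so the next smaller size that finishes cleanly is a reasonable starting point.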

drush-scan.sh

```shell
# Scan and report; no changes applied
drush entity-reference-integrity:scan --format=table
drush entity-reference-integrity:scan --format=json > report.json
drush entity-reference-integrity:scan --format=csv > report.csv
```
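Once the report looks right, the apply step is the same command with --fix added. The wrapper below sketches the whole scan, review, apply sequence under a few assumptions: `scan_then_fix` and its confirmation prompt are illustrative and not part of the module, and the options are assumed to take the usual `--name=value` form.

```shell
# Hypothetical wrapper (not part of the module): dry scan, confirm, then fix.
scan_then_fix() {
  local batch_size="${1:-50}"       # 50 is the documented default
  local report="${2:-report.json}"
  local answer

  # Step 1: non-destructive scan; keep the report for review.
  drush entity-reference-integrity:scan --format=json \
    --batch-size="$batch_size" > "$report" || return 1

  # Step 2: require explicit confirmation before any entity is saved.
  printf 'Review %s, then type "yes" to apply fixes: ' "$report"
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted; nothing was changed."
    return 0
  fi

  # Step 3: re-run with --fix to remove the broken references.
  drush entity-reference-integrity:scan --fix --batch-size="$batch_size"
}
```

Calling `scan_then_fix 100 report.json` would scan in batches of 100, write the report, and only proceed to the write step after you type "yes". Anything else leaves the site untouched.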

Test Coverage

Nine PHPUnit tests validate the scanner, the auto-fix logic, the batch-processing boundaries, and the output formatting. Entity storage, field definitions, and entity references are all mocked, so the tests run without a database. Every code path that touches entity data is covered.

Test coverage areas

| Area | What is tested |
|------|----------------|
| Scanner | Detecting broken references across field types |
| Auto-fix | Removing broken refs and saving entities |
| Batch processing | Boundary handling, flush between batches |
| Output formatting | Table, CSV, JSON serialization |
| Mocking | Entity storage, field definitions, references |

Why this matters for Drupal and WordPress

Broken entity references are one of the most common data integrity problems on Drupal sites with complex content models -- paragraphs referencing deleted media, taxonomy terms pointing to removed vocabularies, or node references to content that was bulk-deleted. This module automates what agencies otherwise do manually with database queries. WordPress sites face the equivalent problem with orphaned post meta, broken ACF relationship fields, and dangling WooCommerce product references. The batch-scan-then-fix pattern translates directly to a WP-CLI command or plugin that audits wp_postmeta for stale foreign keys.
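On the WordPress side, the same pattern could look roughly like the sketch below. It is an illustration, not an existing plugin or command: `audit_orphaned_postmeta` is a hypothetical name, and it treats any postmeta row whose post_id matches no post as an orphan. It relies on `wp db prefix` and `wp db query`, reports a count by default, and deletes in batches only when called with --fix. Back up the database before trying anything like this.

```shell
# Hypothetical WP-CLI helper illustrating the same scan-then-fix pattern.
# Dry run by default; pass --fix (and optionally a batch size) to delete.
audit_orphaned_postmeta() {
  local mode="${1:-scan}"
  local batch_size="${2:-50}"
  local prefix orphans remaining

  prefix="$(wp db prefix)" || return 1
  # Postmeta rows whose post_id no longer matches any post.
  orphans="FROM ${prefix}postmeta WHERE post_id NOT IN (SELECT ID FROM ${prefix}posts)"

  remaining="$(wp db query "SELECT COUNT(*) $orphans" --skip-column-names)" || return 1
  echo "Orphaned postmeta rows: $remaining"
  [ "$mode" = "--fix" ] || return 0

  # Delete one batch at a time so no single statement touches too many rows.
  while [ "$remaining" -gt 0 ]; do
    wp db query "DELETE $orphans LIMIT $batch_size" || return 1
    remaining="$(wp db query "SELECT COUNT(*) $orphans" --skip-column-names)" || return 1
  done
  echo "Orphaned postmeta rows removed."
}
```

Running `audit_orphaned_postmeta` alone only prints the count. `audit_orphaned_postmeta --fix 100` would then remove the orphans 100 rows at a time.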

Technical Takeaway

Detection without remediation is a report that sits in someone's inbox. Auto-fix changes the workflow: scan, review, apply. Batch processing makes this viable on real sites where entity counts are in the tens of thousands. The --fix flag is intentionally opt-in -- the default behavior is still a non-destructive scan -- so the module is safe to add to existing sites without risk of unintended writes.
