Stop Shipping Blind RAG: SearXNG for Drupal AI Assistants That Respect Privacy

· 4 min read
Victor Jimenez
Software Engineer & AI Agent Builder

SearXNG in Drupal AI assistants matters because it gives you controllable, inspectable retrieval without funneling your org's prompts and context into yet another black-box "trust us" API.

I am tired of teams shipping AI features that cannot answer the most basic engineering question: "Where did this answer come from, and who saw the query?"

The problem

If your assistant fetches from random hosted search APIs, you inherit:

  • unclear logging policies
  • unknown ranking behavior
  • compliance headaches you only notice during incident review

Compliance Risk

For Drupal teams in regulated orgs and nonprofits, this is not an edge case. It is Tuesday. If your assistant cannot show source provenance, it is a demo, not a tool.

Hosted vs. self-hosted retrieval

| Aspect            | Hosted Search API   | SearXNG (Self-Hosted)   |
|-------------------|---------------------|-------------------------|
| Query logging     | Provider-controlled | You control logs        |
| Ranking behavior  | Opaque              | Configurable per source |
| Source provenance | Often hidden        | Full URL trail          |
| Compliance audit  | Depends on vendor   | Infrastructure-level    |
| Operational cost  | API fees            | Self-hosted ops         |
| Provider lock-in  | Yes                 | No                      |

The solution

Use SearXNG as the retrieval layer for Drupal AI assistants so search is:

  • self-hostable
  • provider-agnostic
  • auditable at the infra layer

The important point is control, not novelty. "New" is cheap. "Debuggable in production" is expensive.

Half-measures are worse

If you do this halfway, you get the worst of both worlds: self-hosted ops burden and still-noisy retrieval. Tune sources, rate limits, and filtering early.
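
One way to approach the filtering step is a domain allowlist applied to results before they reach the model. The sketch below is an illustration, not part of any Drupal or SearXNG API. It assumes each result is a dict with a `url` key, as in SearXNG's JSON output. `ALLOWED_DOMAINS`, `host_allowed`, and `filter_results` are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the domains your organization trusts.
ALLOWED_DOMAINS = {"drupal.org", "wikipedia.org"}


def host_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


def filter_results(results, allowed=ALLOWED_DOMAINS):
    """Drop results from unlisted domains and remove duplicate URLs."""
    seen = set()
    kept = []
    for item in results:
        url = item.get("url", "")
        if url in seen or not host_allowed(url, allowed):
            continue
        seen.add(url)
        kept.append(item)
    return kept
```

Matching on the parsed hostname rather than a substring keeps look-alike domains such as `evil-drupal.org` out of the allowlist.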

Configuration approach

searxng/settings.yml
search:
  safe_search: 2
  default_lang: en
  formats:
    - json
    - html

engines:
  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg
  - name: wikipedia
    engine: wikipedia
    shortcut: wp
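
With `json` listed under `formats`, the instance can answer `/search?q=...&format=json`. Here is a minimal sketch of calling that endpoint and keeping only the fields an assistant needs for citation. The base URL `http://localhost:8080` and the helper names are assumptions for illustration. The fields read from each result (`title`, `url`, `content`, `engine`) are the ones SearXNG normally returns in its JSON output, but it is worth checking them against your own instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a SearXNG instance reachable at this base URL.
SEARXNG_URL = "http://localhost:8080"


def build_search_url(query, base_url=SEARXNG_URL, language="en"):
    """Build a /search URL that asks SearXNG for JSON output."""
    params = {"q": query, "format": "json", "language": language}
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


def parse_results(payload, limit=5):
    """Keep only the fields needed to cite a source."""
    results = []
    for item in payload.get("results", [])[:limit]:
        results.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("content", ""),
            "engine": item.get("engine", ""),
        })
    return results


def search(query, timeout=10):
    """Run one query against the instance and return trimmed results."""
    request = Request(build_search_url(query), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_results(json.load(response))
```

Keeping `engine` alongside each URL lets you see later which upstream source contributed to an answer.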

Maintained module check

For Drupal, the AI module ecosystem is actively maintained, and a SearXNG integration fits that trajectory. This is not an abandoned side-quest module held together by hope and stale issue comments.

Deployment checklist

  • Deploy SearXNG instance (Docker recommended)
  • Configure search engines and rate limits
  • Disable unused engines to reduce noise
  • Wire Drupal AI assistant to SearXNG JSON API
  • Add source provenance to assistant responses
  • Set up audit logging for all queries
  • Test with production-like query patterns
  • Monitor search quality and source diversity weekly
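
The provenance and audit-logging items in the checklist can be sketched as two small helpers. This illustrates the idea and is not an existing Drupal API: `audit_record` and `with_provenance` are hypothetical names, and results are assumed to be dicts with `title` and `url` keys. Hashing the query is one possible privacy choice; some teams will need the raw text in the log instead.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(query, results, user_id):
    """Build one JSON-serializable audit entry for a retrieval call."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        # Assumption: store a hash so the log shows what ran without the raw text.
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "sources": [item["url"] for item in results],
    }


def with_provenance(answer, results):
    """Append a numbered source list to the assistant's answer."""
    lines = [answer, "", "Sources:"]
    for index, item in enumerate(results, start=1):
        lines.append(f"[{index}] {item['title']} - {item['url']}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [{"title": "Drupal", "url": "https://www.drupal.org/"}]
    print(json.dumps(audit_record("what is drupal", demo, "editor-1")))
    print(with_provenance("Drupal is an open source CMS.", demo))
```

Writing one JSON object per retrieval call makes the log easy to ship to whatever log pipeline you already run.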
Why this matters for Drupal and WordPress

Drupal's AI Initiative is actively building search integration patterns, and SearXNG fits into the Drupal AI module ecosystem as a privacy-first retrieval backend. For Drupal agencies serving government, healthcare, or nonprofit clients, self-hosted search reduces the compliance risk of sending user queries directly to third-party APIs: upstream engines see requests from your SearXNG instance, not from identifiable users, and the query logs stay on infrastructure you control. WordPress developers building AI-powered plugins face the same challenge. The same SearXNG JSON API pattern can be called from a WordPress plugin with wp_remote_get(), giving WordPress AI tools the same auditable retrieval without vendor lock-in.

What I learned

  • SearXNG is worth trying when legal/privacy constraints make hosted retrieval a non-starter.
  • Avoid "default everything" configs in production; noisy sources ruin answer quality fast.
  • If your assistant cannot show source provenance, it is a demo, not a tool.
  • Self-hosted search is extra ops work, but at least the tradeoff is honest and measurable.
