Stop Shipping Blind RAG: SearXNG for Drupal AI Assistants That Respect Privacy

February 24, 2026 · 4 min read

Software Engineer & AI Agent Builder

SearXNG in Drupal AI assistants matters because it gives you controllable, inspectable retrieval without funneling your org's prompts and context into yet another black-box "trust us" API.

I am tired of teams shipping AI features that cannot answer the most basic engineering question: "Where did this answer come from, and who saw the query?"

The problem

If your assistant fetches from random hosted search APIs, you inherit:

unclear logging policies
unknown ranking behavior
compliance headaches you only notice during incident review

Compliance Risk

For Drupal teams in regulated orgs and nonprofits, this is not an edge case. It is Tuesday. If your assistant cannot show source provenance, it is a demo, not a tool.

Hosted vs. self-hosted retrieval

Aspect	Hosted Search API	SearXNG (Self-Hosted)
Query logging	Provider-controlled	You control logs
Ranking behavior	Opaque	Configurable per source
Source provenance	Often hidden	Full URL trail
Compliance audit	Depends on vendor	Infrastructure-level
Operational cost	API fees	Self-hosted ops
Provider lock-in	Yes	No

The solution

Use SearXNG as the retrieval layer for Drupal AI assistants so search is:

self-hostable
provider-agnostic
auditable at the infra layer

The important point is control, not novelty. "New" is cheap. "Debuggable in production" is expensive.

Architecture

Half-measures are worse

If you do this halfway, you get the worst of both worlds: self-hosted ops burden and still-noisy retrieval. Tune sources, rate limits, and filtering early.
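For the filtering part, one blunt but effective option is to drop any result whose host is not on an explicit allowlist before it reaches the model. The sketch below is an assumption on my part, not something SearXNG or the Drupal AI module ships: the function name and the allowlist are hypothetical, each result is assumed to be an array with a `url` key, and `str_ends_with()` needs PHP 8.0 or later.

```php
<?php

declare(strict_types=1);

/**
 * Keeps only results whose URL host is on the allowlist, or is a subdomain
 * of an allowlisted entry.
 *
 * Hypothetical helper: assumes each result is an array with a 'url' key.
 */
function filter_results_by_host(array $results, array $allowedHosts): array
{
    $allowed = array_map('strtolower', $allowedHosts);

    return array_values(array_filter(
        $results,
        static function (array $result) use ($allowed): bool {
            $host = parse_url((string) ($result['url'] ?? ''), PHP_URL_HOST);
            if (!is_string($host) || $host === '') {
                // Missing or unparseable URL means no provenance, so drop it.
                return false;
            }
            $host = strtolower($host);
            foreach ($allowed as $entry) {
                if ($host === $entry || str_ends_with($host, '.' . $entry)) {
                    return true;
                }
            }
            return false;
        }
    ));
}
```

Matching on the exact host or a dot-prefixed suffix keeps `en.wikipedia.org` while rejecting lookalikes such as `evilwikipedia.org`.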

Configuration approach

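Minimal SearXNG Config

searxng/settings.yml
search:
  # 2 = strict filtering
  safe_search: 2
  default_lang: en
  # json must be listed here or the JSON API request is rejected
  formats:
    - json
    - html

# Keep the engine list short; every extra source adds noise
engines:
  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg
  - name: wikipedia
    engine: wikipedia
    shortcut: wp

Drupal Integration Pattern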

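src/Service/SearxngSearchProvider.php
use GuzzleHttp\ClientInterface;

// SearchProviderInterface and SearchResults are the module's own types, defined elsewhere.
class SearxngSearchProvider implements SearchProviderInterface {

  // Assumed constructor: Drupal's http_client service and the SearXNG base URL are injected.
  public function __construct(
    protected ClientInterface $httpClient,
    protected string $baseUrl,
  ) {}

  public function search(string $query, array $options = []): SearchResults {
    // Ask SearXNG for JSON; this needs 'json' under search.formats in settings.yml.
    $response = $this->httpClient->request('GET', $this->baseUrl . '/search', [
      'query' => [
        'q' => $query,
        'format' => 'json',
        'categories' => $options['categories'] ?? 'general',
      ],
    ]);

    // normalizeResults() (not shown) maps the HTTP response to SearchResults.
    return $this->normalizeResults($response);
  }
}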
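The provider leaves normalizeResults() undefined, so here is a minimal sketch of what that step could look like. Treat it as an assumption, not the module's implementation: the function name is hypothetical, it returns plain arrays instead of the SearchResults type, and the field names (`results`, `title`, `url`, `content`, `engine`) are what I expect from SearXNG's JSON output.

```php
<?php

declare(strict_types=1);

/**
 * Maps a SearXNG JSON response body to a flat list of results with provenance.
 *
 * Hypothetical helper; field names are assumed from SearXNG's JSON output.
 *
 * @return list<array{title: string, url: string, snippet: string, engine: string}>
 */
function normalize_searxng_results(string $json, int $limit = 5): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $normalized = [];

    foreach ($data['results'] ?? [] as $item) {
        // Skip entries without a URL: they cannot be cited as a source.
        if (!is_array($item) || empty($item['url'])) {
            continue;
        }
        $normalized[] = [
            'title' => (string) ($item['title'] ?? ''),
            'url' => (string) $item['url'],
            'snippet' => (string) ($item['content'] ?? ''),
            'engine' => (string) ($item['engine'] ?? 'unknown'),
        ];
        if (count($normalized) >= $limit) {
            break;
        }
    }

    return $normalized;
}
```

Inside the provider this would be called with the response body, for example `normalize_searxng_results((string) $response->getBody())`. Keeping `url` and `engine` on every result is what makes provenance possible later.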

Maintained module check

The Drupal AI module ecosystem is actively maintained, and SearXNG integration fits that trajectory. This is not an abandoned side-quest module held together by hope and stale issue comments.

Deployment checklist

Deploy SearXNG instance (Docker recommended)
Configure search engines and rate limits
Disable unused engines to reduce noise
Wire Drupal AI assistant to SearXNG JSON API
Add source provenance to assistant responses
Set up audit logging for all queries
Test with production-like query patterns
Monitor search quality and source diversity weekly
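The provenance and audit-logging items in the checklist can be sketched in a few lines. This is an illustration under my own assumptions: both function names are hypothetical, results are assumed to be arrays with `title` and `url` keys, and where the audit line ends up (file, syslog, Drupal logger) is left to you.

```php
<?php

declare(strict_types=1);

/**
 * Appends a numbered source list to an assistant answer.
 *
 * Hypothetical helper: expects results as arrays with 'title' and 'url' keys.
 */
function append_sources(string $answer, array $results): string
{
    if ($results === []) {
        return $answer . "\n\nSources: none retrieved.";
    }
    $lines = [];
    foreach (array_values($results) as $i => $result) {
        $lines[] = sprintf('[%d] %s (%s)', $i + 1, $result['title'], $result['url']);
    }

    return $answer . "\n\nSources:\n" . implode("\n", $lines);
}

/**
 * Builds one JSON line for an audit log: when, what was asked, which URLs came back.
 */
function build_audit_record(string $query, array $results, ?DateTimeImmutable $now = null): string
{
    $now ??= new DateTimeImmutable('now', new DateTimeZone('UTC'));

    return json_encode([
        'timestamp' => $now->format(DATE_ATOM),
        'query' => $query,
        'source_urls' => array_column($results, 'url'),
    ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
}
```

One JSON object per line keeps the log greppable during incident review. Whether the raw query text belongs in that log is a policy decision for your org, so check it against your retention rules before turning it on.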

Why this matters for Drupal and WordPress

Drupal's AI Initiative is actively building search integration patterns, and SearXNG fits directly into the Drupal AI module ecosystem as a privacy-first retrieval backend. For Drupal agencies serving government, healthcare, or nonprofit clients, self-hosted search reduces the compliance risk of sending user queries directly to third-party APIs. WordPress developers building AI-powered plugins face the same challenge. The SearXNG JSON API integration pattern shown here carries over to a WordPress plugin using wp_remote_get(), giving WordPress AI tools the same auditable retrieval without vendor lock-in.
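A rough sketch of the WordPress side, assuming a SearXNG instance with the `json` format enabled. The function name, the timeout, and the error code are hypothetical. The `wp_*` calls and `WP_Error` are core WordPress APIs, so this only runs inside WordPress.

```php
<?php

/**
 * Queries a SearXNG instance from WordPress and returns the raw result list.
 *
 * Sketch only: myplugin_searxng_search() and its parameters are hypothetical.
 *
 * @return array|WP_Error Result arrays on success, WP_Error on failure.
 */
function myplugin_searxng_search(string $base_url, string $query, string $categories = 'general')
{
    // Same query parameters as the Drupal provider: q, format, categories.
    $url = rtrim($base_url, '/') . '/search?' . http_build_query([
        'q' => $query,
        'format' => 'json',
        'categories' => $categories,
    ]);

    $response = wp_remote_get($url, ['timeout' => 10]);
    if (is_wp_error($response)) {
        return $response;
    }
    if (wp_remote_retrieve_response_code($response) !== 200) {
        return new WP_Error('searxng_http_error', 'SearXNG returned a non-200 response.');
    }

    $data = json_decode(wp_remote_retrieve_body($response), true);

    return is_array($data) ? ($data['results'] ?? []) : [];
}
```

The returned arrays carry the same `url` and `title` fields as on the Drupal side, so the provenance and audit steps from the checklist apply unchanged.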

What I learned

SearXNG is worth trying when legal/privacy constraints make hosted retrieval a non-starter.
Avoid "default everything" configs in production; noisy sources ruin answer quality fast.
If your assistant cannot show source provenance, it is a demo, not a tool.
Self-hosted search is extra ops work, but at least the tradeoff is honest and measurable.

References

Drupal AI Initiative: SearXNG - Privacy-First Web Search for Drupal AI Assistants

Looking for an Architect who doesn't just write code, but builds the AI systems that multiply your team's output? View my enterprise CMS case studies at victorjimenezdev.github.io or connect with me on LinkedIn.
