# Research: Integrating Brave Search API into Annie

**Date:** 2026-03-23
**Status:** Research Complete
**Question:** Can Brave Search API replace or augment SearXNG to reduce Annie's search latency for complex queries?
**Context:** Annie uses self-hosted SearXNG (localhost:8888) for web search. Complex queries trigger 9+ sequential SearXNG searches via the research sub-agent, taking 5+ minutes. Brave Search API offers a direct, fast alternative with an LLM-optimized endpoint.

---

## TL;DR -- The Verdict

**Use Brave Search LLM Context as the primary search backend. Keep SearXNG as fallback.**

| Approach | Latency (single query) | Cost | Quality | Recommendation |
|----------|----------------------|------|---------|----------------|
| SearXNG (current) | 200-800ms per query, meta-search variability | $0 (self-hosted) | Good snippets, no content extraction | Keep as fallback |
| Brave Web Search | <1s at p95 | $0.005/query | Title + URL + snippet | Good for simple lookups |
| **Brave LLM Context** | **<600ms at p90** | **$0.005/query** | **Pre-extracted text chunks, tables, code** | **Primary for all queries** |
| SearXNG + fetch_webpage | 200ms + 5s per URL | $0 | Full page text (slow) | Replace with LLM Context |

**Key insight:** Brave LLM Context eliminates the need for a separate `fetch_webpage` step. Instead of search (snippets) -> fetch_webpage (full text) -> LLM reasoning, you get search + extracted content in a single API call at <600ms. This is the biggest win -- it collapses two tool rounds into one.

**Estimated cost:** ~$3-5/month at current usage (600-1000 queries/month), well within the $5 free monthly credit.

---

## 1. Brave Search API Deep Dive

### 1.1 Endpoints

| Endpoint | URL | What It Returns |
|----------|-----|-----------------|
| **Web Search** | `GET /res/v1/web/search` | Title, URL, description snippet, age |
| **LLM Context** | `GET/POST /res/v1/llm/context` | Pre-extracted page content: text chunks, tables, code blocks, sources |
| Images | `GET /res/v1/images/search` | Image results |
| News | `GET /res/v1/news/search` | News articles |
| Videos | `GET /res/v1/videos/search` | Video results |
| Place Search | `GET /res/v1/local/pois` | 200M+ points of interest |

### 1.2 Authentication

Single header: `X-Subscription-Token: <API_KEY>`

All endpoints use the same key. No OAuth, no bearer tokens, no complex auth flows.

### 1.3 Rate Limits & Pricing

**Search Plan:** $5 per 1,000 requests (covers Web, LLM Context, Images, News, Videos).

**Free credit:** $5/month that renews -- covers 1,000 queries/month for free.

**Rate limit headers in every response:**
- `X-RateLimit-Policy`: Window sizes (e.g., `1;w=1, 15000;w=2592000` = 1 req/sec, 15K/month)
- `X-RateLimit-Remaining`: Requests left in each window
- `X-RateLimit-Reset`: Seconds until window resets

**Rate limit exceeded:** HTTP 429. Only successful (non-error) responses count against quota.

**Per-second limit:** 1 req/sec on the free tier. This matters -- sequential rapid-fire searches need throttling or will 429.
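The policy header is straightforward to parse if we later want the client to self-throttle from live limits rather than a hardcoded delay. A minimal sketch (header format taken from the example above; the function name is an assumption):

```python
def parse_rate_limit_policy(header: str) -> list[tuple[int, int]]:
    """Parse an X-RateLimit-Policy value like '1;w=1, 15000;w=2592000'
    into (request_limit, window_seconds) pairs."""
    pairs = []
    for part in header.split(","):
        limit_str, window_str = part.strip().split(";w=")
        pairs.append((int(limit_str), int(window_str)))
    return pairs
```

For the free-tier example, `parse_rate_limit_policy("1;w=1, 15000;w=2592000")` yields `[(1, 1), (15000, 2592000)]` -- one request per second and 15,000 per 30-day window.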

### 1.4 LLM Context Endpoint -- The Key Differentiator

This is why Brave is worth adding. The LLM Context endpoint does NOT return snippets -- it returns **actual extracted page content**.

**Request:**
```
GET https://api.search.brave.com/res/v1/llm/context?q=tallest+mountains
Headers: Accept: application/json, X-Subscription-Token: BSA...
```

**Response structure:**
```json
{
  "grounding": {
    "generic": [
      {
        "url": "https://example.com/article",
        "title": "Article Title",
        "snippets": [
          "First extracted text chunk with actual content...",
          "Second chunk, could be a table or code block...",
          "Third chunk with more relevant information..."
        ]
      }
    ]
  },
  "sources": [
    {
      "url": "https://example.com/article",
      "hostname": "example.com",
      "date": "2026-03-20"
    }
  ]
}
```

**What makes this different from regular Web Search:**

| Feature | Web Search (`/web/search`) | LLM Context (`/llm/context`) |
|---------|--------------------------|------------------------------|
| Content depth | Snippet (1-2 sentences) | Full extracted text chunks |
| Tables/code | No | Yes (extracted and structured) |
| Freshness filter | Yes (day/week/month/year) | No |
| Date range filter | Yes (date_after/date_before) | No |
| UI language | Yes | No |
| Per-result size | ~100 chars | ~500-2000 chars per snippet |
| Best for | Quick factual lookups | RAG, complex analysis, grounding |

**Latency:** <600ms at p90 (Brave reports this as under 130ms of added overhead on top of a regular Web Search call).

### 1.5 Web Search Endpoint

For when LLM Context is overkill (simple factual lookups like "weather in Bangalore"):

**Request parameters:**
- `q` (required): Search query
- `count`: 1-10 results (default 5)
- `country`: 2-letter ISO code (e.g., "IN", "US")
- `search_lang`: Language code (e.g., "en", "kn" for Kannada)
- `freshness`: `day`, `week`, `month`, `year`
- `date_after`, `date_before`: YYYY-MM-DD date range

**Response:** `web.results[]` with `title`, `url`, `description`, `age`.
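A hedged sketch of turning that response into the same numbered-line format Annie's LLM already receives from `search_web()` -- the response shape follows the summary above, while the function name and formatting convention are assumptions:

```python
def format_web_results(data: dict, max_results: int = 5) -> str:
    """Format Brave Web Search results (web.results[]) into numbered lines."""
    results = data.get("web", {}).get("results", [])[:max_results]
    lines = []
    for i, r in enumerate(results, 1):
        # 'age' is optional; include it only when present
        age = f" [{r['age']}]" if r.get("age") else ""
        lines.append(
            f"{i}. {r.get('title', '')} ({r.get('url', '')}){age}: {r.get('description', '')}"
        )
    return "\n".join(lines)
```

Keeping the output shape identical to the SearXNG formatter means the LLM prompt and downstream parsing need no changes regardless of which provider served the query.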

---

## 2. Current Annie Search Architecture

### 2.1 Data Flow (Current)

```
User: "What's the latest news on NVIDIA stock?"
  |
  v
LLM calls web_search(query="NVIDIA stock news")
  |
  v
tools.search_web() --> SearXNG localhost:8888 --> 5 snippets (title + 1-2 sentences)
  |
  v
LLM reads snippets, optionally calls fetch_webpage(url) for details
  |
  v
tools.fetch_webpage() --> httpx GET --> readability-lxml --> 2000 chars
  |
  v
LLM synthesizes answer
```

**Problem 1: Two-step search.** LLM often needs to search, then fetch 1-3 URLs, then synthesize. Each fetch_webpage is 5-10s (HTTP + HTML parsing + readability).

**Problem 2: Research sub-agent fan-out.** The `invoke_researcher` sub-agent (subagent_tools.py) calls `_search_web()` + potentially multiple fetches. For complex queries, this means 9+ sequential SearXNG calls, each 200-800ms, plus fetch_webpage calls at 5s each.

### 2.2 Files That Touch Search

| File | Role | What Changes |
|------|------|-------------|
| `tools.py` | Core search/fetch functions + Pipecat handlers | Add `search_brave()` function |
| `text_llm.py` | Tool definitions (CLAUDE_TOOLS, OPENAI_TOOLS) + `_execute_tool()` | Add brave tool execution path |
| `bot.py` | Voice pipeline tool registration (FunctionSchema) | Add brave tool schema for voice |
| `subagent_tools.py` | Research sub-agent web search helper | Replace `_search_web()` with Brave |
| `docker-compose.yml` | SearXNG container | Keep as-is (fallback) |

### 2.3 Current Tool Registration Pattern

**Voice (bot.py):** `FunctionSchema` objects in `_BASE_TOOL_SCHEMAS` list, registered on `OpenAILLMContext`.

**Text chat (text_llm.py):** `CLAUDE_TOOLS` (Anthropic format) and `OPENAI_TOOLS` (auto-generated from CLAUDE_TOOLS). Tool dispatch in `_execute_tool()` with if/elif chain.

**Key observation:** Both paths call the same `search_web()` function from `tools.py`. Replacing the backend in `tools.py` propagates to both voice and text chat automatically.

---

## 3. OpenClaw/NemoClaw Patterns Worth Adopting

From `vendor/openclaw/extensions/brave/src/brave-web-search-provider.ts`:

### 3.1 Dual-Mode Support

OpenClaw supports both `web` and `llm-context` modes via a `mode` config field. The tool description changes dynamically based on mode:
- Web mode: "Returns titles, URLs, and snippets for fast research"
- LLM Context mode: "Returns pre-extracted page content optimized for LLM grounding"

**Adopt:** We should support both modes and default to `llm-context`.

### 3.2 In-Memory Cache with TTL

OpenClaw caches search results by composite key: `[provider, mode, query, count, country, lang, freshness, dates]`. Default TTL: 15 minutes. Cache is checked before every API call.

**Adopt:** Simple dict cache with TTL in tools.py. Prevents duplicate searches for same query within a session.

### 3.3 Missing API Key Handling

OpenClaw returns a structured error with setup instructions when the API key is missing. The error includes the config command and docs URL.

**Adopt:** Graceful degradation -- if `BRAVE_API_KEY` is not set, fall back to SearXNG silently.

### 3.4 Language Normalization

OpenClaw validates and normalizes Brave's language codes (e.g., `ja` -> `jp`, `zh` -> `zh-hans`). Invalid codes return a structured error rather than a raw API failure.

**Skip for now:** Annie operates in English. Add language support later if needed.

---

## 4. Tiered Search Strategy -- Recommendation

### Option A: LLM Decides (Two Separate Tools)

Give the LLM both `web_search` (SearXNG) and `brave_search` (Brave API).

**Pros:** Maximum flexibility. LLM can choose based on query complexity.
**Cons:** Doubles the tool surface area. 9B Nano model already sometimes struggles with tool selection. Two search tools = more confusion. Also wastes prompt tokens on two tool descriptions.

**Verdict: Reject.** Adding tools to a 9B voice model is expensive (prompt space) and risky (tool confusion).

### Option B: Automatic Escalation (SearXNG First, Brave Fallback)

Try SearXNG first. If it fails/times out/returns zero results, escalate to Brave.

**Pros:** Zero cost for queries SearXNG handles well. Brave only used when needed.
**Cons:** Adds latency (SearXNG timeout + Brave latency). Complex failure detection logic. Still two HTTP calls in the failure path. Optimizing for the wrong thing (saving $0.005/query at the cost of user experience).

**Verdict: Reject.** The latency penalty of trying SearXNG first defeats the purpose.

### Option C: Always Brave (Simplest)

Replace SearXNG backend with Brave API. One tool, one backend.

**Pros:** Simplest. Fastest. One API call. One code path to maintain.
**Cons:** $3-5/month cost. Brave API outage = no search at all. Per-second rate limit (1 req/sec on free tier) could bottleneck rapid sequential searches.

**Verdict: Close, but needs a fallback.**

### Option D: Brave Primary + SearXNG Fallback (RECOMMENDED)

Use Brave LLM Context as the primary backend. If Brave fails (429, 500, timeout, no API key), transparently fall back to SearXNG.

```
search_web(query) flow:
  1. BRAVE_API_KEY set? --> Brave LLM Context API
     - Success --> return extracted content
     - 429/500/timeout --> fall through to SearXNG
  2. SearXNG fallback --> return snippets (degraded but functional)
```

**Pros:**
- Fastest path for the common case (single Brave API call at <600ms)
- LLM Context eliminates most `fetch_webpage` calls (content already extracted)
- SearXNG fallback means never fully offline
- Same tool name (`web_search`) -- zero changes to LLM tool definitions
- ~$3-5/month, within free credit
- SearXNG Docker container can stay running as passive fallback

**Cons:**
- Two code paths to maintain (but SearXNG path already exists)
- 1 req/sec Brave rate limit could bottleneck research sub-agent

**Rate limit mitigation:** The research sub-agent can batch queries with 1s delay between calls, or we upgrade to a paid tier with higher per-second limits when needed.
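The 1s-delay mitigation can be sketched as a small async pacer that serializes Brave calls -- the class name and placement (wrapping the Brave call in `tools.py` or the sub-agent loop) are assumptions:

```python
import asyncio
import time

class BravePacer:
    """Serialize calls so they never exceed one request per interval."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Lock ensures concurrent callers queue up rather than race
        async with self._lock:
            delay = self.min_interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()
```

Each Brave request path would `await pacer.wait()` before the HTTP call; SearXNG fallback calls skip the pacer entirely.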

**Verdict: RECOMMENDED.** Best UX, minimal code change, graceful degradation.

---

## 5. Implementation Plan

### 5.1 Phase 1: Core Integration (1-2 hours)

**File: `services/annie-voice/tools.py`**

Add Brave search function alongside existing SearXNG:

```python
# New constants (httpx, os, time, logger, MAX_RESULTS, and MAX_TEXT_CHARS
# are assumed to already be imported/defined in tools.py)
BRAVE_API_KEY = os.getenv("BRAVE_API_KEY", "")
BRAVE_LLM_CONTEXT_URL = "https://api.search.brave.com/res/v1/llm/context"
BRAVE_WEB_SEARCH_URL = "https://api.search.brave.com/res/v1/web/search"
BRAVE_TIMEOUT = 8.0  # generous timeout, Brave p90 is <600ms

# In-memory cache (query -> (result, expiry_time))
_brave_cache: dict[str, tuple[str, float]] = {}
BRAVE_CACHE_TTL = 900  # 15 minutes, matches OpenClaw

async def search_brave_llm_context(query: str) -> str | None:
    """Search via Brave LLM Context API. Returns extracted content or None on failure."""
    if not BRAVE_API_KEY:
        return None

    # Check cache
    cache_key = f"llm:{query}"
    if cache_key in _brave_cache:
        result, expiry = _brave_cache[cache_key]
        if time.time() < expiry:
            return result
        del _brave_cache[cache_key]

    try:
        async with httpx.AsyncClient(timeout=httpx.Timeout(BRAVE_TIMEOUT)) as client:
            resp = await client.get(
                BRAVE_LLM_CONTEXT_URL,
                params={"q": query},
                headers={
                    "Accept": "application/json",
                    "X-Subscription-Token": BRAVE_API_KEY,
                },
            )
            resp.raise_for_status()
            data = resp.json()

        results = data.get("grounding", {}).get("generic", [])
        if not results:
            return None

        lines = []
        for i, r in enumerate(results[:MAX_RESULTS], 1):
            title = r.get("title", "")
            snippets = r.get("snippets", [])
            content = " ".join(snippets)[:MAX_TEXT_CHARS]
            url = r.get("url", "")
            lines.append(f"{i}. {title} ({url}): {content}")

        result = "\n".join(lines)
        _brave_cache[cache_key] = (result, time.time() + BRAVE_CACHE_TTL)
        return result
    except Exception as e:
        logger.warning("Brave LLM Context failed: {}, falling back to SearXNG", e)
        return None
```

**Modify existing `search_web()` to use Brave-first strategy:**

```python
async def search_web(query: str, base_url: str = SEARXNG_BASE_URL) -> str:
    """Search web: Brave LLM Context (primary) -> SearXNG (fallback)."""
    # Try Brave first
    brave_result = await search_brave_llm_context(query)
    if brave_result:
        return brave_result

    # Fallback to SearXNG
    return await _search_searxng(query, base_url)
```

Rename current `search_web` to `_search_searxng` (private, fallback only).

### 5.2 Phase 2: Dashboard Observability (30 min)

Update `handle_web_search` in `tools.py` to emit provider info in creature events:

```python
emit_event("chimera", "start",
           data={"query": query, "provider": "brave" if brave_result else "searxng"},
           reasoning=f"{'Brave LLM Context' if brave_result else 'SearXNG'} query: '{query}'")
```

This lets the dashboard show which provider served each search.

### 5.3 Phase 3: Sub-Agent Optimization (30 min)

**File: `services/annie-voice/subagent_tools.py`**

Replace `_search_web()` to use the same Brave-first strategy:

```python
async def _search_web(query: str) -> str:
    """Web search via Brave LLM Context (primary) or SearXNG (fallback)."""
    from tools import search_web
    try:
        return await asyncio.wait_for(search_web(query), timeout=10.0)
    except asyncio.TimeoutError:
        return ""
```

This is the biggest win: the research sub-agent currently fires 9+ SearXNG queries. With Brave LLM Context, each query returns pre-extracted content, likely reducing the number of required queries from 9 to 3-4 (since fetch_webpage calls become unnecessary).

### 5.4 Phase 4: Brave Web Search for Simple Queries (Optional, 30 min)

Add a `search_brave_web()` for lightweight queries where LLM Context is overkill (weather, time, simple facts). This saves content extraction overhead and returns faster.

Gate the choice inside `search_web()`:

```python
async def search_web(query: str, base_url: str = SEARXNG_BASE_URL) -> str:
    """Search web with provider cascade."""
    if BRAVE_API_KEY:
        # Use LLM Context for complex queries, Web Search for simple ones
        if _is_simple_query(query):  # weather, time, stock price, etc.
            return await search_brave_web(query) or await _search_searxng(query, base_url)
        return await search_brave_llm_context(query) or await _search_searxng(query, base_url)
    return await _search_searxng(query, base_url)
```

This is optional and can be deferred -- LLM Context works fine for simple queries too, just returns more data than needed.

### 5.5 Configuration

**Environment variable:** `BRAVE_API_KEY` (same as OpenClaw convention).

**Where to set it:**
- `.env` in `services/annie-voice/` (dev)
- `start.sh` export (production on Titan)

**No changes needed to:**
- Tool definitions (CLAUDE_TOOLS, OPENAI_TOOLS, _BASE_TOOL_SCHEMAS) -- same `web_search` tool
- LLM system prompt -- same tool name and behavior
- Docker compose -- SearXNG stays as fallback

---

## 6. Caching Strategy

### 6.1 In-Memory Cache

```python
import time

_brave_cache: dict[str, tuple[str, float]] = {}
BRAVE_CACHE_TTL = 900  # 15 minutes

def _cache_get(key: str) -> str | None:
    if key in _brave_cache:
        result, expiry = _brave_cache[key]
        if time.time() < expiry:
            return result
        del _brave_cache[key]
    return None

def _cache_set(key: str, value: str) -> None:
    # Evict expired entries periodically
    now = time.time()
    if len(_brave_cache) > 100:
        _brave_cache.clear()  # simple eviction, good enough for single-user
    _brave_cache[key] = (value, now + BRAVE_CACHE_TTL)
```

**Why not Redis/disk?** Annie is single-user, single-process. In-memory dict is simpler, faster, and good enough. OpenClaw uses in-memory too.

### 6.2 Cache Key Design

Composite key: `f"{mode}:{query.lower().strip()}"`. Keep it simple -- no country/language params needed since Annie operates in English from India.

---

## 7. Failure Modes & Fallback Behavior

| Failure | Detection | Fallback |
|---------|-----------|----------|
| No API key | `BRAVE_API_KEY` empty | SearXNG directly |
| Brave 429 (rate limit) | HTTP status | SearXNG for this query |
| Brave 500 (server error) | HTTP status | SearXNG for this query |
| Brave timeout (>8s) | httpx.TimeoutException | SearXNG for this query |
| Brave 401 (bad key) | HTTP status + log WARNING | SearXNG + log to investigate |
| SearXNG also fails | Exception | Return "Search unavailable" error |
| Brave returns 0 results | Empty `grounding.generic[]` | Fall through to SearXNG |

**Important:** Every Brave failure logs a warning. Repeated failures should also be visible in the dashboard: chimera creature events will report `provider: "searxng"` even though an API key is set.

---

## 8. Cost Projection

| Usage Pattern | Queries/Month | Net Monthly Cost | Notes |
|---------------|---------------|------------------|-------|
| Light (current) | 300-500 | $0 | Raw cost $1.50-2.50, fully covered by the $5 free credit |
| Moderate (research sub-agent) | 600-1000 | $0 | Raw cost $3-5, covered by free credit |
| Heavy (research-heavy days) | 1000-2000 | $0-5 | Raw cost $5-10; credit covers the first 1,000 queries |
| With cache hits (~30%) | 700-1400 effective | $0-2 | Raw cost $3.50-7; cache eliminates repeat queries |

**Budget cap:** Set usage limit in Brave dashboard to $10/month. Alert if approaching.
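A back-of-envelope check of these figures, assuming the free credit nets directly against metered usage:

```python
PRICE_PER_QUERY = 5.00 / 1000   # $5 per 1,000 requests
FREE_CREDIT = 5.00              # renews monthly

def net_monthly_cost(queries: int) -> float:
    """Dollars owed after the free credit is applied."""
    return round(max(0.0, queries * PRICE_PER_QUERY - FREE_CREDIT), 2)

for q in (500, 1000, 2000):
    print(f"{q} queries -> ${net_monthly_cost(q):.2f}")
```

Anything at or under 1,000 queries/month nets to $0; only sustained heavy usage above that starts drawing on the budget cap.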

---

## 9. What This Does NOT Change

- **Tool name stays `web_search`** -- no LLM retraining, no prompt changes
- **SearXNG Docker container stays running** -- passive fallback, zero maintenance
- **fetch_webpage tool stays available** -- the LLM can still fetch specific URLs if needed, though it will need to do so less often since LLM Context already provides extracted content
- **Voice pipeline unchanged** -- same Pipecat handlers, same tool schemas
- **No new dependencies** -- httpx is already used everywhere

---

## 10. Testing Plan

### 10.1 Unit Tests (in `tests/test_tools.py`)

```python
# Test Brave LLM Context with mocked httpx
async def test_search_brave_llm_context_success():
    """Brave API returns extracted content."""

async def test_search_brave_llm_context_no_api_key():
    """Returns None when BRAVE_API_KEY not set."""

async def test_search_brave_llm_context_timeout():
    """Returns None on timeout, logs warning."""

async def test_search_brave_llm_context_429():
    """Returns None on rate limit, logs warning."""

async def test_search_brave_llm_context_empty_results():
    """Returns None when grounding.generic is empty."""

async def test_search_web_brave_primary():
    """search_web uses Brave when available."""

async def test_search_web_searxng_fallback():
    """search_web falls back to SearXNG when Brave fails."""

async def test_brave_cache_hit():
    """Cached result returned within TTL."""

async def test_brave_cache_expiry():
    """Expired cache entry triggers fresh API call."""
```

### 10.2 Integration Test

```bash
# Set BRAVE_API_KEY and test real API
BRAVE_API_KEY=BSA... python -c "
import asyncio
from tools import search_web
result = asyncio.run(search_web('latest NVIDIA news'))
print(result)
print(f'Length: {len(result)} chars')
"
```

### 10.3 Acceptance Criteria

- [ ] `web_search` returns Brave LLM Context content when API key is set
- [ ] `web_search` falls back to SearXNG when Brave fails
- [ ] `web_search` works without `BRAVE_API_KEY` (SearXNG only)
- [ ] Dashboard chimera events show `provider` field
- [ ] Research sub-agent uses Brave (fewer total searches per task)
- [ ] Cache prevents duplicate Brave API calls within 15 min
- [ ] All existing tests pass (no tool interface changes)

---

## 11. Migration Sequence

1. **Get API key:** Sign up at https://brave.com/search/api/, generate key under Search plan
2. **Implement Phase 1:** Modify `tools.py` -- add `search_brave_llm_context()`, modify `search_web()`
3. **Test locally:** Run with `BRAVE_API_KEY=BSA... python bot.py`, test via text chat
4. **Add to .env:** Add `BRAVE_API_KEY` to production `.env` on Titan
5. **Deploy:** `git push`, pull on Titan, restart Annie Voice
6. **Monitor:** Watch dashboard chimera events for `provider` field, check Brave dashboard for usage
7. **Phase 2-3:** Dashboard observability + sub-agent optimization (same session or next)

---

## 12. Open Questions

1. **Per-second rate limit on free tier:** The free tier is 1 req/sec. The research sub-agent fires searches sequentially -- will it hit this limit? If so, add a 1s delay between searches or upgrade to a higher tier.

2. **LLM Context content quality:** How good is the extracted content compared to our readability-lxml pipeline? Need to compare side-by-side on a few real queries before fully trusting it.

3. **Brave API stability:** How reliable is Brave's API? OpenClaw uses it as a primary provider, which suggests it is production-grade. Monitor for a week before removing SearXNG from docker-compose.

4. **Brave Search inside SearXNG:** SearXNG already supports Brave as a backend engine (see `searxng/settings.yml.new` lines 2576-2604). An alternative approach is to configure SearXNG to use Brave as its backend, getting Brave results through the existing SearXNG pipeline. However, this loses the LLM Context endpoint (SearXNG only proxies Web Search), so direct integration is still preferred.

---

## References

- [Brave Search API Portal](https://brave.com/search/api/)
- [Brave API Documentation](https://api-dashboard.search.brave.com/documentation)
- [Brave LLM Context Endpoint](https://api-dashboard.search.brave.com/documentation/services/llm-context)
- [Brave API Rate Limiting](https://api-dashboard.search.brave.com/documentation/guides/rate-limiting)
- [Brave API Pricing](https://api-dashboard.search.brave.com/documentation/pricing)
- [OpenClaw Brave Provider](vendor/openclaw/extensions/brave/src/brave-web-search-provider.ts) -- reference implementation
- [OpenClaw Brave Docs](vendor/openclaw/docs/tools/brave-search.md) -- config patterns
- [SearXNG Brave Engine](https://docs.searxng.org/dev/engines/online/brave.html) -- SearXNG has built-in Brave support
