# Research: Robust WhatsApp Automation Alternatives

**Date:** 2026-04-07
**Status:** RESEARCHED — recommendation ready
**Motivation:** Session 11 required fixing 7 DOM selectors during deployment. Every WhatsApp Web update risks breaking Annie's WhatsApp channel. Rajesh flagged Playwright selector-chasing as fragile.

## Current Architecture (what breaks)

Annie's WhatsApp channel (`services/whatsapp-agent/`) uses:
- **Reading**: Playwright polls `.message-in` elements, parses `data-pre-plain-text` for sender/timestamp
- **Sending**: Playwright clicks `span[data-icon="plus-rounded"]`, uses `expect_file_chooser` for images, clicks `span[data-icon="send"]`
- **Session**: Persistent Chromium profile on Panda, QR code detection, Telegram alerts
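
The reading path above mostly comes down to splitting the `data-pre-plain-text` prefix into a timestamp and a sender. Here is a minimal sketch. It assumes the attribute looks like `[21:05, 4/7/2026] Rajesh: `. The time and date formats follow the browser locale, so both are kept as raw strings. `parse_pre_plain_text` and `MessageMeta` are hypothetical names, not the actual `wa_web.py` code.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Assumed attribute shape: "[21:05, 4/7/2026] Rajesh: " (24h) or
# "[9:05 PM, 4/7/2026] Rajesh: " (12h). Formats follow the browser locale,
# so time and date are returned as raw strings rather than parsed.
PRE_PLAIN_TEXT = re.compile(
    r"^\[(?P<time>[^,\]]+),\s*(?P<date>[^\]]+)\]\s*(?P<sender>.+?):\s*$"
)


class MessageMeta(NamedTuple):
    time: str
    date: str
    sender: str


def parse_pre_plain_text(value: str) -> MessageMeta | None:
    """Split a data-pre-plain-text value into time, date and sender (None if no match)."""
    match = PRE_PLAIN_TEXT.match(value)
    if match is None:
        return None
    # The lazy sender group still allows colons inside names ("Dr. A: B").
    return MessageMeta(match["time"].strip(), match["date"].strip(), match["sender"].strip())
```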

**What breaks:** `data-icon` values change, `data-testid` attributes have been stripped entirely (confirmed 2026-04-07), and modal DOM structures get rearranged. The reading side is semi-stable (`.message-in`, `#pane-side`, `#main` survive updates). **The sending side is fragile:** the attachment flow, media preview, and send button selectors broke 7 times in session 11.

## Approaches Evaluated

### 1. WAHA (WhatsApp HTTP API) — Self-hosted Docker

**What it is:** NestJS server with 3 swappable engines behind a unified REST API.

| Feature | Detail |
|---------|--------|
| **Engines** | WEBJS (Puppeteer + whatsapp-web.js), NOWEB (Baileys WebSocket), GOWS (Go WebSocket) |
| **Image sending** | `POST /api/sendImage` with chatId + base64/URL. No selectors in your code |
| **Ban risk** | WEBJS=low (real browser), NOWEB=medium (protocol fingerprinting), GOWS=medium |
| **Selector stability** | NOWEB/GOWS: no selectors at all. WEBJS: whatsapp-web.js team handles updates |
| **Self-hosting** | Docker, 2 CPU / 4 GB RAM (WEBJS with Chromium), ~200 MB for NOWEB. amd64 official, ARM64 community image |
| **License** | Core free (Apache 2.0). Plus version (media conversion, multi-session) is paid |
| **GitHub** | devlikeapro/waha, ~6.4K stars, last commit 2026-04-06, releases 2-4x/month |
| **Session** | QR via REST API, session persistence, health endpoints |
| **Key advantage** | Engine-swappable: if WEBJS breaks, switch to NOWEB with zero code changes |

**Verdict:** Best abstraction layer. The NOWEB engine eliminates selectors entirely. The REST API is language-agnostic (plain Python httpx calls). It can run on Panda using the ARM64 community image, where the NOWEB engine needs only ~200 MB.
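
WAHA exposes session state over REST, so the QR detection and Telegram alerting Annie does today could become a status poll. This sketch rests on several assumptions:

- WAHA listens on its default port 3000.
- The session is named `default`.
- `GET /api/sessions/{session}` returns JSON with a `status` field (`STARTING`, `SCAN_QR_CODE`, `WORKING`, `FAILED`, `STOPPED`).

The action names are hypothetical. The sketch uses stdlib `urllib` to stay dependency-free; `httpx.get(url, headers=...)` is the equivalent call.

```python
from __future__ import annotations

import json
import urllib.request

WAHA_URL = "http://localhost:3000"  # assumption: WAHA's default port, container on Panda

# Hypothetical mapping from WAHA session status to what the agent should do.
STATUS_ACTIONS = {
    "WORKING": "ok",
    "STARTING": "wait",
    "SCAN_QR_CODE": "alert_qr",  # the case today's Telegram QR alert covers
    "FAILED": "restart",
    "STOPPED": "restart",
}


def session_action(status: str) -> str:
    """Map a WAHA session status to an agent action; unknown statuses escalate."""
    return STATUS_ACTIONS.get(status, "alert_unknown")


def fetch_session_status(session: str = "default", api_key: str | None = None) -> str:
    """Read the session's "status" field from GET /api/sessions/{session}."""
    req = urllib.request.Request(f"{WAHA_URL}/api/sessions/{session}")
    if api_key:  # only needed if WAHA was started with an API key
        req.add_header("X-Api-Key", api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["status"]
```

Each poll would call `session_action(fetch_session_status())`. An `"alert_qr"` result would trigger the existing Telegram alert.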

### 2. Baileys (WhiskeySockets/Baileys) — Protocol-level

| Feature | Detail |
|---------|--------|
| **How it works** | Reverse-engineers WhatsApp's binary Noise protocol over WebSocket. No browser |
| **Image sending** | Direct protocol: encrypt client-side, upload to WhatsApp CDN, send message with media URL |
| **Ban risk** | SIGNIFICANT. Multiple 2025-2026 reports: bans on status uploads, bulk messaging, repeated number bans |
| **Maintenance** | Active (last commit 2026-03-27). v7.0.0-rc.9 had breaking changes. Protocol updates 2-4x/year |
| **Language** | TypeScript/npm only. **Python alternative:** Neonize (krypton-byte/neonize, 364 stars, Apache 2.0), Python bindings over the Go library Whatsmeow rather than a port of Baileys itself |
| **GitHub** | 8.9K stars, 308 open issues, MIT license |

**Verdict:** Maximum capability but highest ban risk. Because it speaks the protocol directly instead of driving a browser, WhatsApp can fingerprint it as a non-browser client. NOT recommended for Rajesh's personal number. Neonize (Python/Go) is interesting but inherits the same ban risk.

### 3. whatsapp-web.js — Browser + Internal Module Injection

| Feature | Detail |
|---------|--------|
| **How it works** | Puppeteer + hooks into WhatsApp's internal webpack modules (`WAWebCmd`, `WAWebCollections`) |
| **Image sending** | Via WhatsApp's internal `sendMessage` JS function, NOT DOM clicks. `MessageMedia` object (base64) |
| **Ban risk** | Lower than Baileys (real browser session), roughly comparable to manual WhatsApp Web use |
| **Key insight** | Does NOT chase DOM selectors. Calls WhatsApp's own internal JS APIs via `page.evaluate()` |
| **Maintenance** | Very active (last commit 2026-04-06). Regular releases (v1.34.6 Jan 2026) |
| **GitHub** | 21.6K stars, most popular WhatsApp automation library |

**Verdict:** The architectural lesson here is critical — it uses a browser like our Playwright but **avoids selectors by hooking into WhatsApp's internal JavaScript modules**. This is why it survives updates that break raw selector-based automation.

### 4. WPPConnect / wa-js — Webpack Module Export

| Feature | Detail |
|---------|--------|
| **How it works** | Extracts 100+ internal functions from WhatsApp's webpack bundle |
| **Functions** | `addAndSendMsgToChat`, `createChat`, `sendTextMsgToChat`, etc. |
| **Stability** | Operates at function API level, not DOM. Most resilient browser-based approach |
| **GitHub** | WPPConnect: 3.3K stars, wa-js: 713 stars |

**Verdict:** Most granular control. Could be injected into our existing Playwright page to replace DOM selectors for sending. Reading would stay via DOM (proven stable).

### 5. ADB Intent-Based Sending (Android)

| Feature | Detail |
|---------|--------|
| **Text only** | `am start -a android.intent.action.VIEW -d "https://api.whatsapp.com/send?phone=...&text=..."` — lands on confirmation screen |
| **Image + contact targeting** | `am start -a android.intent.action.SEND -t "image/*" --es jid "$phone@s.whatsapp.net" --eu android.intent.extra.STREAM "content://..."` — opens share preview, one tap to send |
| **Ban risk** | Zero (real app, real UI interaction) |
| **Contact picker bypass** | YES — `jid` extra format `$phone@s.whatsapp.net` targets specific contact |
| **Automation gap** | Still requires one uiautomator2 (u2) tap on the Send button after the intent fires |
| **Deep link limitation** | `whatsapp://send` works for text ONLY, cannot attach images |

**Verdict:** Zero ban risk, but NOT fire-and-forget. Still needs u2 for the final Send tap. Better than pure u2 scraping (bypasses navigation) but still brittle for the last mile. Good fallback option.
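
The image + contact-targeting intent from the table can be driven from Python by building the `adb` argv. This is a sketch with a hypothetical `share_image_argv` helper. It mirrors the `am start` command above and does not perform the final Send tap, which still has to come from u2.

```python
from __future__ import annotations

import re
import shlex


def share_image_argv(phone: str, content_uri: str, serial: str | None = None) -> list[str]:
    """Hypothetical helper: adb argv for the ACTION_SEND intent with a jid extra."""
    digits = re.sub(r"\D", "", phone)
    if not digits:
        raise ValueError(f"no digits in phone number: {phone!r}")
    device_cmd = [
        "am", "start",
        "-a", "android.intent.action.SEND",
        "-t", "image/*",
        "--es", "jid", f"{digits}@s.whatsapp.net",  # preselects the contact
        "--eu", "android.intent.extra.STREAM", content_uri,
    ]
    argv = ["adb"]
    if serial:  # target one device when several are attached
        argv += ["-s", serial]
    # adb shell hands this string to the device shell, so quote every token
    # (unquoted, the shell could glob-expand "image/*").
    argv += ["shell", shlex.join(device_cmd)]
    return argv
```

Run it with `subprocess.run(share_image_argv(...), check=True)`. The share preview it opens still needs the u2 tap on Send, which is the automation gap noted in the table.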

### 6. AI-Based Mobile Automation (DroidRun)

- EUR 2.1M funded, uses Android Accessibility Services + LLM to parse UI tree
- 43% success rate across 65 real-world tasks (2026 benchmarks)
- **Not production-grade** for reliable automation yet. Watch for future development.

## Selector Stability Ranking (Community Consensus)

From most to least stable:

1. **Internal webpack modules** (whatsapp-web.js, WPPConnect) — survive most updates
2. **`data-icon` attributes** (`span[data-icon="send"]`): tied to icon rendering, so they change infrequently
3. **Structural IDs** (`#main`, `#pane-side`, `#side`) — reasonably stable
4. **`aria-label`** — changes with i18n and redesigns
5. **`data-testid`** — **STRIPPED entirely** as of 2026-04-07 (confirmed in our codebase)
6. **Class names** — most fragile, obfuscated/hashed, change every build

Our current code uses tiers 2-3 for reading (stable) and tiers 2-5 for sending (fragile).

## Comparison Table

| Approach | Ban Risk | Selector Fragility | Image Send | Maintenance | Python Support | Migration Effort |
|----------|----------|-------------------|------------|-------------|----------------|-----------------|
| **Current (Playwright)** | Low | HIGH (7 breaks in session 11) | DOM clicks | Us | Native | N/A |
| **WAHA NOWEB** | Medium | NONE (WebSocket) | REST API | WAHA team | httpx | Medium |
| **WAHA WEBJS** | Low | LOW (wwebjs handles) | REST API | WAHA team | httpx | Medium |
| **Baileys/Neonize** | HIGH | NONE (WebSocket) | Protocol | Community | Neonize | High |
| **whatsapp-web.js** | Low | LOW (internal JS) | Internal JS | Active | None (Node) | High |
| **WPPConnect wa-js** | Low | VERY LOW (webpack) | Internal JS | Active | None (Node) | Medium |
| **ADB Intent + u2** | Zero | LOW (one button) | Intent + tap | Us | u2 | Low |
| **Hybrid: Read=Playwright, Send=wa-js** | Low | READ=medium, SEND=very low | Internal JS | Us (read) + wa-js team (send) | Playwright (Py) + JS inject | **LOW** |

## Recommendation

### Primary: Hybrid — Playwright reading + wa-js injection for sending

**Why this wins for her-os:**

1. **Reading stays Playwright** — `.message-in`, `data-pre-plain-text`, `#pane-side` are the most stable selectors (tier 2-3). Reading has not broken once across 6 sessions.

2. **Sending uses wa-js injection** — Instead of clicking DOM elements for attachment/send, inject WPPConnect's `wa-js` library into the Playwright page and call `WPP.chat.sendFileMessage(chatId, content)` directly. This bypasses all modal/attachment/preview selectors.

3. **Zero new infrastructure**: no Docker container (WAHA), no new process, no Node.js runtime. The prebuilt wa-js bundle is injected into the existing Playwright Chromium page (e.g. with `page.add_script_tag()`), and its functions are then called through `page.evaluate()`.

4. **Ban risk unchanged** — Still a real browser session, same as current approach.

5. **Migration is surgical** — Only `wa_sender.py` changes. `wa_web.py` (reading), `wa_browser.py` (session), `agent.py` (poll loop) stay untouched.
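
The wa-js approach can be sketched with Playwright's Python API in three steps:

1. Inject the prebuilt bundle into the page Annie already has open.
2. Wait for `WPP.isReady`.
3. Call `WPP.chat.sendTextMessage()` through `page.evaluate()`.

This is an untested sketch. The bundle path `vendor/wppconnect-wa.js` (a vendored copy of the `dist` build of the `@wppconnect/wa-js` package) and the helper names are hypothetical. Chat IDs are assumed to take the `<digits>@c.us` form used in the wa-js README. If WhatsApp's Content-Security-Policy blocks the injected script, launching the Playwright context with `bypass_csp=True` is the usual workaround.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; the caller already has Playwright installed
    from playwright.async_api import Page

# Hypothetical vendored copy of the wa-js dist bundle.
WA_JS_BUNDLE = Path("vendor/wppconnect-wa.js")


def to_chat_id(phone: str) -> str:
    """Turn '+91 98765 43210' into the '<digits>@c.us' chat ID form."""
    digits = re.sub(r"\D", "", phone)
    if not digits:
        raise ValueError(f"no digits in phone number: {phone!r}")
    return f"{digits}@c.us"


async def ensure_wa_js(page: Page, timeout_ms: float = 60_000) -> None:
    """Inject wa-js into the logged-in WhatsApp Web page unless it is already there."""
    # A page reload drops injected scripts, so check on every call.
    if not await page.evaluate("() => typeof window.WPP !== 'undefined'"):
        await page.add_script_tag(path=WA_JS_BUNDLE)
    # Resolves immediately if WPP.isReady is already true.
    await page.wait_for_function("() => window.WPP?.isReady", timeout=timeout_ms)


async def send_text(page: Page, phone: str, text: str):
    """Send text via WhatsApp's own internal send path instead of DOM clicks."""
    await ensure_wa_js(page)
    # Return only the message id so Playwright gets a plain value back.
    return await page.evaluate(
        "([chatId, body]) => WPP.chat.sendTextMessage(chatId, body).then((r) => r.id)",
        [to_chat_id(phone), text],
    )
```

The `typeof window.WPP` check keeps a slow-loading bundle from being injected twice. `wait_for_function` still covers the case where wa-js is present but not yet ready.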

### Fallback: WAHA NOWEB engine

If the hybrid approach proves too complex or wa-js injection is unreliable:
- Deploy WAHA Docker on Panda (NOWEB engine, ARM64 community image, ~200 MB)
- Replace `wa_sender.py` with httpx calls to `POST /api/sendImage`, `POST /api/sendText`
- Keep Playwright for reading (or switch to WAHA webhooks for incoming messages)
- Tradeoff: NOWEB has medium ban risk vs our current low risk
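
The fallback sender could look like this sketch. It builds the JSON bodies for `POST /api/sendText` and `POST /api/sendImage` and posts them to WAHA. Verify these assumptions against the deployed WAHA's Swagger docs:

- default port 3000
- a session named `default`
- a `file` object carrying `mimetype`, `filename` and base64 `data`
- the optional `X-Api-Key` header

It uses stdlib `urllib`. Swapping in `httpx.post(url, json=payload, headers=...)` is a one-line change.

```python
from __future__ import annotations

import base64
import json
import urllib.request
from typing import Any

WAHA_URL = "http://localhost:3000"  # assumption: WAHA's default port, container on Panda
SESSION = "default"                 # assumption: one session named "default"


def send_text_payload(chat_id: str, text: str) -> dict[str, Any]:
    """JSON body for POST /api/sendText."""
    return {"session": SESSION, "chatId": chat_id, "text": text}


def send_image_payload(chat_id: str, image: bytes, filename: str,
                       mimetype: str = "image/jpeg", caption: str = "") -> dict[str, Any]:
    """JSON body for POST /api/sendImage, with the image inlined as base64."""
    return {
        "session": SESSION,
        "chatId": chat_id,
        "file": {
            "mimetype": mimetype,
            "filename": filename,
            "data": base64.b64encode(image).decode("ascii"),
        },
        "caption": caption,
    }


def waha_post(path: str, payload: dict[str, Any], api_key: str | None = None) -> Any:
    """POST a JSON payload to WAHA and return the decoded JSON response."""
    req = urllib.request.Request(
        WAHA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:
        req.add_header("X-Api-Key", api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Example call: `waha_post("/api/sendText", send_text_payload("919876543210@c.us", "Hi from Annie"))`.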

### NOT Recommended

- **Baileys/Neonize direct** — Ban risk too high for personal number
- **Full WAHA replacement** — Over-engineered for our single-account, 5 msgs/day use case
- **ADB intent only** — Still needs u2 tap, doesn't fully solve the problem
- **DroidRun** — Not production-grade (43% success rate)

## Migration Path (wa-js injection)

1. **Research wa-js API** — Document `WPP.chat.sendTextMessage()`, `WPP.chat.sendFileMessage()`, `WPP.chat.find()` APIs
2. **Proof of concept** — Inject wa-js into existing Playwright page, send one text message
3. **Migrate text sending** — Replace `wa_sender.send_message()` internals with wa-js call
4. **Migrate image sending** — Replace `wa_sender.send_image()` file chooser flow with `WPP.chat.sendFileMessage()`
5. **Remove fragile selectors** — Delete `attach_button`, `attach_photos`, `media_preview`, `media_caption_box`, `media_send_button` from SELECTORS dict
6. **Keep reading selectors** — `.message-in`, `data-pre-plain-text`, `compose_box` stay (proven stable)
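
Step 4 replaces the file-chooser flow with a single `WPP.chat.sendFileMessage()` call. Following the example in the wa-js README, that call takes the image as a base64 `data:` URL plus an options object (`type: 'image'`, optional `caption`). Below is a sketch of what the new `send_image` might look like. It assumes wa-js has already been injected into the page (for example with Playwright's `page.add_script_tag()`). The signature is hypothetical, not the current `wa_sender.py` API.

```python
from __future__ import annotations

import base64
import mimetypes
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only
    from playwright.async_api import Page


def image_data_url(data: bytes, mimetype: str) -> str:
    """Encode image bytes as a base64 data: URL, the content form shown in the wa-js README."""
    return f"data:{mimetype};base64,{base64.b64encode(data).decode('ascii')}"


async def send_image(page: Page, chat_id: str, image_path: str | Path, caption: str = ""):
    """Hypothetical send_image(): no attach, preview, caption-box or send-button selectors."""
    path = Path(image_path)
    mimetype = mimetypes.guess_type(path.name)[0] or "image/jpeg"
    data_url = image_data_url(path.read_bytes(), mimetype)
    # Fail fast (30 s) if wa-js is not loaded, rather than falling back to DOM clicks.
    await page.wait_for_function("() => window.WPP?.isReady", timeout=30_000)
    return await page.evaluate(
        """([chatId, dataUrl, text]) =>
            WPP.chat.sendFileMessage(chatId, dataUrl, { type: 'image', caption: text || undefined })
              .then((r) => r.id)""",
        [chat_id, data_url, caption],
    )
```

No attachment, preview or send-button selectors are involved, which is what allows step 5 to delete them.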

## Out of Scope: WhatsApp Cloud API

WhatsApp Business Cloud API (official Meta) requires a **separate business number**, so it cannot use Rajesh's personal number. It is the best long-term option if a second number is acceptable, but it is out of scope under the current constraints (single personal number, zero ban risk).

## References

- WAHA: devlikeapro/waha (GitHub, 6.4K stars)
- Baileys: WhiskeySockets/Baileys (GitHub, 8.9K stars)
- whatsapp-web.js: pedroslopez/whatsapp-web.js (GitHub, 21.6K stars)
- WPPConnect/wa-js: wppconnect-team/wa-js (GitHub, 713 stars)
- Neonize: krypton-byte/neonize (GitHub, 364 stars, Python bindings over the Go library Whatsmeow)
- Healenium: healenium/healenium (GitHub, 198 stars, self-healing selectors)
- DroidRun: droidrun/droidrun (GitHub, EUR 2.1M funded, 43% task success)
