# Research: ADB Automation Stack for Annie — Android Phone Control

**Date:** 2026-03-31
**Context:** Annie needs to programmatically control a Google Pixel 9a (Android 15/16) from Titan
(Ubuntu/aarch64 NVIDIA DGX Spark) over home WiFi. Use cases: UPI payments, shopping apps
(Swiggy, Zomato, Uber, Porter, Amazon, Flipkart), calls on behalf of Rajesh.
**Status:** Research complete. Architecture recommendation at bottom.

---

## 1. ADB (Android Debug Bridge) — The Foundation Layer

ADB is the universal bedrock of all Android automation. Every other tool in this document sits on top of it.

### 1.1 Wireless ADB: Two Modes

**Mode A — Legacy TCP/IP (`adb tcpip 5555`)**

- Requires a one-time USB cable bootstrap: plug in, run `adb tcpip 5555`, unplug
- Phone now listens on port 5555 over WiFi
- Connect from Titan: `adb connect <phone-ip>:5555`
- No pairing code needed after initial USB setup
- **Vulnerability**: Unauthenticated — anyone on the LAN can connect once enabled
- **Reboot behavior**: TCP/IP mode is lost on reboot. Must re-plug USB each time.
- **Android 15**: Works but is the legacy path

**Mode B — Native Wireless Debugging (Android 11+, recommended)**

- No USB cable needed once initial pairing is done
- Enabled in: Settings → Developer Options → Wireless Debugging
- Two-step process:
  1. `adb pair <phone-ip>:<pairing-port>` (one-time, enter 6-digit code shown on phone)
  2. `adb connect <phone-ip>:<debug-port>` (ongoing, different port than pairing port)
- **Security**: TLS-encrypted + RSA key authentication. Only paired hosts can connect.
- **Reboot behavior**: Wireless Debugging toggle turns OFF on reboot by default. See §1.2 for workarounds.
- **Android 15/Pixel 9a**: Gains the trusted-WiFi auto-reconnect feature as it rolls out (via an Android 16 QPR update; Pixels get it first, using mDNS discovery)

**Which to use:** Native Wireless Debugging (Mode B) for production. It is encrypted, key-authenticated, and survives network drops. Legacy tcpip mode is fine for a quick dev setup but should not be the permanent path.
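The Mode B flow scripts cleanly from Titan. A minimal sketch, assuming `adb` is on PATH — the pairing step stays manual because it needs the on-screen code; the helper below only handles the ongoing connect, whose port differs from the pairing port:

```python
import subprocess

def parse_connect_output(out: str) -> bool:
    """True if `adb connect` reported success."""
    # adb prints "connected to host:port" or "already connected to host:port"
    # on success, and "failed to connect ..." / "cannot connect ..." on failure.
    out = out.strip().lower()
    return out.startswith(("connected to", "already connected to"))

def adb_connect(host: str, port: int) -> bool:
    """Run `adb connect host:port` from Titan and parse the result."""
    proc = subprocess.run(["adb", "connect", f"{host}:{port}"],
                          capture_output=True, text=True)
    return parse_connect_output(proc.stdout)
```

Parsing stdout is necessary because `adb connect` exits 0 even when the connection fails.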

### 1.2 ADB Persistence Across Reboots — The Problem and Solutions

The critical pain point: after every Pixel 9a reboot, Wireless Debugging is disabled and Annie loses her ADB connection.

**Option 1 — Shizuku (recommended, no root)**
- Install Shizuku from Play Store
- Shizuku v13.6+ (Jul 2025) supports auto-start without root on Android 13+ when on trusted WiFi
- Shizuku acts as an ADB-level privileged daemon that survives across some reboot scenarios
- Combine with **Automate app** flow: triggers Shizuku on boot + WiFi connect + wake from doze
- Limitation: still requires one manual setup after a cold reboot on non-rooted devices

**Option 2 — adb-auto-enable (GitHub: mouldybread/adb-auto-enable)**
- Small app that re-enables ADB wireless + `tcpip 5555` automatically on device boot
- Uses an Android service that starts on boot
- No root required, but phone must be trusted (past initial ADB pairing)

**Option 3 — Google's Native Auto-Reconnect (rolling out)**
- Android 16 QPR / platform updates bringing mDNS-based auto-reconnect on trusted WiFi networks
- Once shipped to Pixel 9a (Android 16 QPR3 or later), Wireless Debugging will stay on persistently
- Not yet fully available as of March 2026 but actively being rolled out

**Option 4 — USB always-connected (fallback)**
- Keep a USB cable from Titan to phone permanently
- Boot → USB ADB connects instantly, reliable, zero config
- Downside: cable management, port wear over years
- `adb tcpip 5555` can be triggered from Titan via a cron/startup script automatically

**Recommended for Annie**: Option 1 (Shizuku + Automate) as primary, USB cable as physical fallback for cold reboots. When Google's native auto-reconnect ships to Pixel 9a, switch fully to that.
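Whichever option is primary, a small watchdog on Titan papers over transient drops by polling `adb devices` and reconnecting. A sketch — the `PHONE` serial is a placeholder, and `parse_devices` just parses the standard `adb devices` output:

```python
import subprocess
import time

PHONE = "192.168.1.42:42345"   # placeholder — the phone's wireless-debug IP:port

def parse_devices(adb_devices_output: str) -> dict[str, str]:
    """Map serial -> state ('device', 'offline', 'unauthorized') from `adb devices`."""
    states: dict[str, str] = {}
    for line in adb_devices_output.splitlines()[1:]:   # skip the header line
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states

def watchdog(poll_s: float = 30.0) -> None:
    """Reconnect whenever the phone drops out of the `device` state."""
    while True:
        out = subprocess.run(["adb", "devices"],
                             capture_output=True, text=True).stdout
        if parse_devices(out).get(PHONE) != "device":
            subprocess.run(["adb", "connect", PHONE])   # best-effort reconnect
        time.sleep(poll_s)
```

Run it under systemd (or a cron @reboot entry) so it survives Titan restarts too.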

### 1.3 ADB Commands for Phone Control

**Tap and Gestures:**
```bash
adb shell input tap 540 960              # Tap at pixel coordinates (x, y)
adb shell input swipe 540 1600 540 400 300   # Swipe up (from → to, duration ms)
adb shell input swipe 100 960 900 960 200    # Swipe right (fling)
adb shell input touchscreen swipe 540 1600 540 400   # Long swipe
adb shell input touchscreen tap 540 960     # Same as input tap
```

**Text Input:**
```bash
adb shell input text "hello"             # Simple text (no spaces, no special chars)
adb shell input text "hello\ world"      # Escape spaces with backslash (input text also treats %s as a space)
# For complex text with special chars, use clipboard approach:
# 1. Set clipboard via uiautomator2 or Appium
# 2. Then Ctrl+V: adb shell input keyevent 279
```

**Hardware Buttons:**
```bash
adb shell input keyevent KEYCODE_HOME       # Home button
adb shell input keyevent KEYCODE_BACK       # Back button
adb shell input keyevent KEYCODE_RECENTS    # Recent apps
adb shell input keyevent KEYCODE_POWER      # Power/wake
adb shell input keyevent KEYCODE_VOLUME_UP
adb shell input keyevent KEYCODE_ENTER
adb shell input keyevent KEYCODE_DPAD_UP    # D-pad navigation
```

**Screen Content:**
```bash
# Screenshot — slow (save + pull, ~1-2s)
adb shell screencap -p /sdcard/sc.png && adb pull /sdcard/sc.png

# Screenshot — fast (stream direct, ~0.3-0.7s over WiFi)
adb exec-out screencap -p > /tmp/screenshot.png

# UI hierarchy dump (XML with all element properties)
adb shell uiautomator dump /sdcard/ui.xml && adb pull /sdcard/ui.xml

# Or stream directly:
adb exec-out uiautomator dump /dev/tty
```

**App Management:**
```bash
adb install app.apk                          # Install APK
adb uninstall com.package.name               # Uninstall
adb shell am start -n com.package.name/.MainActivity   # Launch app
adb shell am start -a android.intent.action.VIEW -d "tel:+911234567890"  # Dial
adb shell pm list packages                   # List installed packages
adb shell pm list packages | grep gpay       # Find GPay package
adb shell am force-stop com.package.name     # Kill app
adb shell monkey -p com.package.name -c android.intent.category.LAUNCHER 1  # Launch without knowing the activity name
```

**Screen State:**
```bash
adb shell dumpsys window | grep -E 'mCurrentFocus|mFocusedApp'  # What's on screen
adb shell dumpsys activity | grep mResumedActivity               # Current activity
adb shell dumpsys display | grep mScreenState                    # Screen on/off
adb shell input keyevent KEYCODE_WAKEUP                          # Wake screen
adb shell svc power stayon true                                   # Keep screen on while charging
```
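The screen-state probes above combine naturally into a wake-and-wait helper. A sketch; it assumes the `mScreenState=...` line appears in `dumpsys display` output as shown above (the exact wording can vary across Android builds):

```python
import subprocess
import time

def screen_is_on(dumpsys_display: str) -> bool:
    """Parse `dumpsys display` output for the screen-state line."""
    for line in dumpsys_display.splitlines():
        if "mScreenState" in line:
            return "ON" in line.split("=", 1)[-1].upper()
    return False

def wake_and_wait(timeout_s: float = 5.0) -> bool:
    """Send KEYCODE_WAKEUP, then poll until the display reports ON."""
    subprocess.run(["adb", "shell", "input", "keyevent", "KEYCODE_WAKEUP"])
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(["adb", "shell", "dumpsys", "display"],
                             capture_output=True, text=True).stdout
        if screen_is_on(out):
            return True
        time.sleep(0.2)
    return False
```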

### 1.4 `input tap` vs `sendevent` — Which to Use?

| | `adb shell input tap` | `sendevent` |
|--|--|--|
| Layer | Java framework (high-level) | Linux kernel input subsystem (low-level) |
| Latency per call | ~200ms | ~100ms |
| Reliability | Moderate — can miss fast UI | Higher for sustained gestures |
| Complexity | Simple coordinates | Requires knowing device's input event format |
| Multi-touch | Not supported natively | Fully supported |
| Recommendation | **Use for Annie** | Only for advanced gestures |

**Verdict for Annie**: Use `adb shell input tap/swipe/text` for all basic automation. Use `sendevent` only if you need precise multi-touch gestures (e.g., pinch-to-zoom on a map). The ~100ms speed difference is irrelevant for AI-driven automation where the bottleneck is decision-making, not command execution.
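For the rare pinch case, the `sendevent` sequence is easier to generate than to hand-write. A sketch of a type-B multi-touch "two fingers down" frame — `/dev/input/event4` is a placeholder (find the real touchscreen node with `adb shell getevent -lp`), and the numeric codes are from `linux/input-event-codes.h`:

```python
# Kernel input event codes (linux/input-event-codes.h)
EV_SYN, EV_ABS = 0, 3
ABS_MT_SLOT, ABS_MT_POSITION_X, ABS_MT_POSITION_Y, ABS_MT_TRACKING_ID = 47, 53, 54, 57

def two_finger_down(dev: str, p1: tuple[int, int], p2: tuple[int, int]) -> list[str]:
    """Emit `sendevent` lines that place two fingers on a type-B touchscreen."""
    cmds = []
    for slot, (x, y) in enumerate((p1, p2)):
        cmds += [
            f"sendevent {dev} {EV_ABS} {ABS_MT_SLOT} {slot}",
            f"sendevent {dev} {EV_ABS} {ABS_MT_TRACKING_ID} {slot + 1}",
            f"sendevent {dev} {EV_ABS} {ABS_MT_POSITION_X} {x}",
            f"sendevent {dev} {EV_ABS} {ABS_MT_POSITION_Y} {y}",
        ]
    cmds.append(f"sendevent {dev} {EV_SYN} 0 0")  # SYN_REPORT: commit the frame
    return cmds
```

Subsequent frames that move each slot's X/Y apart, each ending in a SYN_REPORT, produce the zoom; lifting a finger is a tracking ID of -1 for its slot.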

### 1.5 ADB WiFi Latency

- **Raw command latency over WiFi 6 (home LAN)**: 150-300ms per `adb shell input tap` command
- **Screenshot via `adb exec-out screencap`**: 300-700ms over WiFi (PNG encoding on phone + transfer)
- **uiautomator dump**: 500ms-2s (XML generation is slow)
- **Comparison with USB**: USB is ~10x faster for bulk transfers, but the per-command latency difference is only 50-150ms
- **For AI automation**: Irrelevant. Annie's LLM decision time (1-5s) swamps ADB command latency

### 1.6 ADB Security on Home Network

**Is it encrypted?**
- Legacy `adb tcpip 5555`: **NOT encrypted**. Plaintext TCP. Anyone on the LAN can connect. Do NOT use on guest WiFi or shared networks.
- Native Wireless Debugging (Android 11+): **TLS encrypted + RSA key authenticated**. Only hosts with your RSA key (stored in `~/.android/adbkey`) can connect.

**Best practices for Annie's home network:**
```bash
# Verify your ADB keys
ls ~/.android/adbkey*              # Should have adbkey and adbkey.pub

# Use native wireless debugging (not tcpip 5555) on home network
# Network-level isolation: put phone on dedicated IoT VLAN if paranoid
# Never enable ADB when connected to guest/public WiFi on phone

# Check what hosts are authorized on the phone
adb shell cat /data/misc/adb/adb_keys   # (requires root to view)
```

---

## 2. python-uiautomator2 (openatx/uiautomator2) — Recommended Primary

**GitHub:** https://github.com/openatx/uiautomator2
**PyPI:** `uiautomator2` — latest v3.2.9 (Jan 2026)
**Maintenance:** Actively maintained. v3.x released Jan 2026 after the maintainer resumed development following a gap.

### 2.1 Architecture

```
Titan (Python) ←→ HTTP/JSON-RPC ←→ atx-agent (on phone) ←→ UIAutomator2 APK ←→ Android UI
```

- **atx-agent**: A Go binary that runs as an HTTP server on the phone. Auto-installed via `python -m uiautomator2 init`.
- **app-uiautomator.apk**: Tiny test APK that exposes UIAutomator2 framework over HTTP.
- **Communication**: Pure HTTP — no ADB for each command after initial setup. Much faster.
- **Connection**: Can use ADB (USB or WiFi) for initial setup, then pure TCP/HTTP for automation.

### 2.2 Setup on Titan (aarch64)

```bash
# Install Python client
pip install uiautomator2

# Install components on phone (one-time, phone connected via ADB)
python -m uiautomator2 init

# This installs:
# - atx-agent binary on phone
# - app-uiautomator.apk (UIAutomator service)
# - app-uiautomator-test.apk (test runner)
# - minicap (optional, for fast screenshots)
# - minitouch (optional, for fast touch input)
```

aarch64 compatibility: The Python client is pure Python and installs fine on Titan. The `atx-agent` binary runs on the Android phone (not on Titan), so Titan's architecture doesn't matter.

### 2.3 Python Usage

```python
import uiautomator2 as u2

# Connect via ADB (phone connected via WiFi ADB)
d = u2.connect()          # auto-detect via ADB
d = u2.connect('192.168.1.42')   # connect via phone IP directly (after atx-agent running)

# Basic interactions
d.screen.on()             # Wake screen
d(text="Settings").click()                    # Find by text and click
d(resourceId="com.android.settings:id/search").click()   # Find by resource ID
d(description="Search").click()               # Find by content-desc (accessibility)
d(className="android.widget.Button").click()  # Find by class

# Type text
d.send_keys("hello world")   # Type into focused field (handles spaces)
d.clear_text()                # Clear text field

# Gestures
d.click(540, 960)             # Tap at coordinates
d.double_click(540, 960)      # Double tap
d.long_click(540, 960)        # Long press
d.swipe(540, 1600, 540, 400, duration=0.5)   # Swipe up
d.swipe_ext("up")             # Named direction swipe

# Screenshots
img = d.screenshot()          # PIL Image object
img.save("/tmp/screen.png")
screenshot_bytes = d.screenshot(format='raw')   # Raw bytes, faster

# UI inspection
dump = d.dump_hierarchy()     # XML of current screen
info = d(text="Buy Now").info  # Element properties (bounds, enabled, etc.)

# App lifecycle
d.app_start("com.google.android.apps.nbu.paisa.user")   # Start GPay
d.app_stop("com.google.android.apps.nbu.paisa.user")    # Stop app
d.app_current()               # Returns currently running app info

# Wait for element
d(text="OK").wait(timeout=10)        # Wait up to 10s for element
d.wait_activity(".MainActivity", timeout=5)   # Wait for activity

# Scroll
d(scrollable=True).scroll.to(text="Target Item")   # Scroll to element
d(className="android.widget.ListView").scroll(steps=10)  # Scroll N steps

# Press buttons
d.press("home")
d.press("back")
d.press("recent")
```

### 2.4 Element Finding Strategies (priority order)

1. `resourceId` — most reliable: `d(resourceId="com.app.package:id/button_confirm")`
2. `text` — good for visible labels: `d(text="Place Order")`
3. `description` — accessibility label: `d(description="Send money")`
4. `className` + index: `d(className="android.widget.Button", index=0)`
5. `xpath` — last resort (slow): `d.xpath('//android.widget.Button[@text="OK"]').click()`
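The priority order can be encoded once as a fallback chain rather than re-decided at every call site. A sketch against the selector API shown above (`find_first` is a hypothetical helper, not part of uiautomator2):

```python
def find_first(d, *, resource_id=None, text=None, description=None, timeout=3):
    """Try selectors in reliability order; return the first matching element or None."""
    selectors = [
        {"resourceId": resource_id},   # 1. most reliable
        {"text": text},                # 2. visible label
        {"description": description},  # 3. accessibility label
    ]
    for sel in selectors:
        if all(v is not None for v in sel.values()):
            el = d(**sel)
            if el.wait(timeout=timeout):   # uiautomator2: wait returns bool
                return el
    return None
```

Useful when an app update renames a resource ID but keeps the button label: the same call keeps working.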

### 2.5 Performance

- **Command latency**: 50-150ms per action (HTTP to atx-agent, faster than raw ADB)
- **Screenshot**: ~100-300ms via atx-agent (uses minicap when available, much faster than screencap)
- **UI dump**: 200-800ms (UIAutomator XML generation)
- **Connection overhead**: Near zero — HTTP persistent connection after setup

### 2.6 Pros and Cons

| Pros | Cons |
|------|------|
| Pure Python, no Node.js, no Java required | Banking apps may detect UIAutomator instrumentation |
| Faster than Appium (no WebDriver overhead) | atx-agent binary must run on phone (auto-starts) |
| Works without display on Titan | Phone screen must be on (or woken) for UI interaction |
| aarch64 compatible (Python is Python) | Limited cross-platform (Android only) |
| Active maintenance (v3.x, Jan 2026) | Some complex gestures need workarounds |
| Good documentation, large Chinese community | |
| Find elements by text, ID, description, xpath | |

---

## 3. Appium — The Industry Standard (but heavier)

### 3.1 What is Appium?

Appium is a cross-platform mobile automation framework. For Android, it wraps UIAutomator2 internally as its driver. The stack:

```
Python test code → Appium Client → Appium Server (Node.js) → UIAutomator2 Driver → Android device
```

Appium is the "test framework" layer above UIAutomator2. UIAutomator2 actually does the work.

### 3.2 Architecture vs uiautomator2

```
uiautomator2:  Python ←→ HTTP/JSON-RPC ←→ atx-agent ←→ UIAutomator2
Appium:        Python ←→ WebDriver ←→ Appium Server (Node.js) ←→ UIAutomator2 Driver ←→ UIAutomator2
```

Appium adds two extra hops: WebDriver protocol + Node.js server. This is why uiautomator2 is faster.

### 3.3 Setup on Titan (aarch64)

```bash
# 1. Install Node.js (ARM64 build from NodeSource)
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs

# 2. Install Appium 2.x globally
npm install -g appium

# 3. Install UIAutomator2 driver
appium driver install uiautomator2

# 4. Install Java (required by UIAutomator2 driver)
sudo apt install default-jdk

# 5. Set Android SDK path
export ANDROID_HOME=~/Android/sdk
export PATH=$PATH:$ANDROID_HOME/platform-tools

# 6. Start Appium server
appium --port 4723 &

# 7. Python client
pip install Appium-Python-Client
```

aarch64 compatibility: Node.js 20+ has full ARM64 builds. Appium itself is pure JavaScript. The UIAutomator2 driver deploys an APK to the Android device, which is ARM (phone), not Titan. Compatible.

### 3.4 Python Usage

```python
from appium import webdriver
from appium.webdriver.common.appiumby import AppiumBy
from appium.options.android import UiAutomator2Options

options = UiAutomator2Options()
options.platform_name = "Android"
options.device_name = "Pixel9a"
options.automation_name = "UiAutomator2"
options.app_package = "com.google.android.apps.nbu.paisa.user"  # GPay
options.app_activity = ".activity.HomeActivity"
options.no_reset = True    # Don't reset app state between sessions

driver = webdriver.Remote("http://localhost:4723", options=options)

# Find elements
send_btn = driver.find_element(AppiumBy.ACCESSIBILITY_ID, "Send money")
pay_btn = driver.find_element(AppiumBy.ANDROID_UIAUTOMATOR,
    'new UiSelector().text("Pay")')
confirm = driver.find_element(AppiumBy.XPATH,
    '//android.widget.Button[@text="Confirm"]')

# Interact
send_btn.click()
amount_field = driver.find_element(AppiumBy.ID, "amount_input")
amount_field.send_keys("500")
driver.press_keycode(66)   # ENTER key

# Take screenshot
driver.save_screenshot("/tmp/gpay.png")

# Cleanup
driver.quit()
```

### 3.5 Appium vs uiautomator2 — When to Use Which

| Criterion | uiautomator2 | Appium |
|-----------|-------------|--------|
| Setup complexity | Low (pip install) | High (Node.js, Java, SDK) |
| Speed | Faster (~50-150ms/cmd) | Slower (~150-300ms/cmd) |
| Cross-platform | Android only | Android + iOS (same code) |
| Python support | Native | Via appium-python-client |
| Server required | No (embedded HTTP) | Yes (Node.js server) |
| Resources on Titan | Minimal | ~200MB Node.js + Java |
| Banking apps | Same detection risk | Same detection risk |
| iOS support | No | Yes |
| Maintenance | Active (2026) | Active (industry standard) |

**Verdict for Annie**: Use **uiautomator2** as the primary library. It is faster, lighter, needs no Node.js server, and is architecturally simpler for Annie's single-device control use case. Appium would only be worth it if Annie ever needs to control an iOS device too.

---

## 4. scrcpy and py-scrcpy-client — Visual Control Layer

### 4.1 What scrcpy Provides

scrcpy (GitHub: Genymobile/scrcpy) is the gold standard for real-time Android screen mirroring and control. It uses a custom binary protocol over ADB, dramatically faster than screencap.

Key capability for Annie: **near-instant screenshots at ~33ms** via its binary streaming protocol.

### 4.2 Headless Mode on Titan

```bash
# Install scrcpy on Titan (ARM64 build available)
sudo apt install scrcpy

# Run with no display (server-side automation, no GUI needed)
scrcpy --no-playback --record=/tmp/capture.mkv   # Record without showing a window (--no-display on scrcpy <2.0)

# Screenshot via scrcpy (using --no-display + screen buffer access)
# Better approach: use py-scrcpy-client (see §4.4)
```

scrcpy's `--no-playback` mode (named `--no-display` before scrcpy 2.0) works on servers with no display/GPU. It runs the scrcpy server component on the phone and connects over ADB TCP — no X11, no Wayland, no virtual display needed on Titan.

### 4.3 scrcpy-mcp — AI Integration

**GitHub: JuanCF/scrcpy-mcp** provides an MCP server exposing 34 tools for AI control of Android:
- Screenshot (via scrcpy binary protocol, ~33ms)
- Tap, swipe, type
- App launch/stop
- UI element inspection (falls back to ADB uiautomator dump)
- Shell command execution

Annie could use this as an MCP tool within Claude Code calls. However, for Annie's Python-native architecture, py-scrcpy-client is more appropriate.

### 4.4 py-scrcpy-client — Python Library

**GitHub: leng-yue/py-scrcpy-client**
**PyPI:** `pyscrcpy`

```python
# Install
pip install pyscrcpy

# Connect and capture frames
import pyscrcpy

client = pyscrcpy.Client(device="192.168.1.42:5555")
client.start(threaded=True)

# Get the current frame (~33ms latency)
frame = client.last_frame  # numpy array (H, W, 3)

# Send touch events (fast binary protocol, not ADB input)
client.control.touch(540, 960, pyscrcpy.ACTION_DOWN)
client.control.touch(540, 960, pyscrcpy.ACTION_UP)

# Send key
client.control.keycode(pyscrcpy.KEYCODE_HOME)

client.stop()
```

**Limitation**: py-scrcpy-client requires Python <3.11 (as of last release). Check for newer forks if on 3.12.

**adbnativeblitz alternative (PyPI: `adbnativeblitz`)**: Claims screenshots as fast as scrcpy but 100% native ADB, no scrcpy server needed. Worth evaluating if py-scrcpy-client has Python version conflicts.

### 4.5 Screenshot Speed Comparison

| Method | Latency | Notes |
|--------|---------|-------|
| `adb screencap + pull` | 1-2s | Worst: writes to phone storage, then transfers |
| `adb exec-out screencap -p` | 300-700ms | Better: streams directly, no phone storage write |
| uiautomator2 (minicap) | 100-300ms | Good: minicap uses H.264 stream |
| scrcpy / py-scrcpy-client | ~33ms | Best: binary protocol, real-time stream |
| adbnativeblitz | 50-150ms | Near-scrcpy speed, pure ADB |

**Recommendation for Annie**: Use scrcpy/py-scrcpy-client for real-time vision tasks (e.g., reading what's on screen before tapping). Use uiautomator2 for element-finding and interaction (structured access). The two complement each other.

---

## 5. Tasker + AutoInput — On-Device Automation

Tasker is the most powerful on-device automation app for Android ($3.49 one-time).

### 5.1 Remote Triggering via ADB Intent

```bash
# Send broadcast intent to trigger a Tasker task named "annie_order_coffee"
adb shell am broadcast -a net.dinglisch.android.taskerm.ACTION_TASK -e task_name "annie_order_coffee"

# Alternatively, Tasker can listen for HTTP requests (via AutoRemote plugin, ~$3)
# AutoRemote runs an HTTP server on the phone; POST to trigger tasks
curl -X POST "http://192.168.1.42:4444/send?key=AUTOREMOTE_KEY&message=order_coffee"
```

### 5.2 AutoInput Plugin — UI Automation

AutoInput (by João Dias, $3) enables Tasker to:
- Click any UI element by text, ID, or position
- Type into any text field
- Read screen content
- Works as an accessibility service — can see ALL app content including banking apps

```
Tasker task:
1. AutoInput Action → Click element "Send Money" in GPay
2. Wait 500ms
3. AutoInput Action → Input text in "Amount" field: "500"
4. AutoInput Action → Click "Confirm"
```
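Triggering that task from Titan is one broadcast over ADB; building the argv in Python keeps the quoting sane. A sketch — `tasker_broadcast_cmd` is a hypothetical helper, how the task consumes the extras depends on its Tasker-side setup, and Tasker's "Allow External Access" preference must be on:

```python
import subprocess

def tasker_broadcast_cmd(task_name: str, **extras: str) -> list[str]:
    """Build the `adb shell am broadcast` argv for Tasker's external-access intent."""
    cmd = ["adb", "shell", "am", "broadcast",
           "-a", "net.dinglisch.android.taskerm.ACTION_TASK",
           "-e", "task_name", task_name]
    for key, value in extras.items():
        cmd += ["-e", key, value]       # string extras the task can read
    return cmd

def trigger(task_name: str, **extras: str) -> None:
    subprocess.run(tasker_broadcast_cmd(task_name, **extras), check=True)
```

Passing an argv list (instead of one `shell=True` string) avoids quoting bugs when an extra value contains spaces.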

### 5.3 Pros and Cons vs ADB-Based Automation

| | Tasker + AutoInput | ADB/uiautomator2 |
|--|--|--|
| Banking app support | **YES** (accessibility service) | Partially (detection risk) |
| Remote trigger | Via HTTP/intent | Via Python directly |
| Visual scripting | Yes (GUI editor) | No (code only) |
| Reliability | High (on-device) | Medium (network RTT) |
| Programming integration | Poor (not Python) | Native Python |
| Setup | Install 2 apps | Python script |
| Latency | Low (on-device) | 150-300ms per command |
| Debugging | Tasker logs | Python print/logging |

**Verdict for Annie**: Tasker + AutoInput is compelling for UPI and banking apps that detect instrumentation. However, it's a secondary automation layer — Annie can't drive Tasker from Python as naturally as uiautomator2. Consider for banking-specific flows triggered via ADB intent or HTTP.

---

## 6. MacroDroid — Simpler Alternative to Tasker

MacroDroid (free / $9.99 premium) runs an HTTP webhook server on the phone.

```bash
# Trigger MacroDroid macro remotely
curl "http://192.168.1.42:1234/macro?name=order_coffee"
```

**Comparison with Tasker:**
- MacroDroid: Simpler setup, built-in HTTP server, good for simple trigger → action flows
- Tasker: More powerful, better plugin ecosystem, needed for complex logic + AutoInput

**Verdict for Annie**: MacroDroid is useful for simple "wake phone and open app" triggers. For complex UI automation, Tasker + AutoInput wins. For Annie's Python-centric architecture, use MacroDroid only for the simplest trigger cases.

---

## 7. Android Accessibility Service — Nuclear Option

### 7.1 What It Is

A custom Android Accessibility Service is an APK you write and install that Android grants system-level UI access to — it can read and interact with ALL apps including banking, UPI, and system apps.

### 7.2 Capabilities

- Read the **complete UI hierarchy** of any app (including apps that block UIAutomator)
- Perform gestures: tap, swipe, long-press
- Inject text into any text field
- Monitor screen changes in real-time (unlike screencap polling)
- Works even when apps detect UIAutomator (accessibility service uses a different API path)

### 7.3 How to Deploy via ADB

```bash
# 1. Build the accessibility service APK (Android Studio project)
./gradlew assembleDebug

# 2. Install via ADB
adb install app/build/outputs/apk/debug/app-debug.apk

# 3. Enable via ADB (no UI tap needed)
adb shell settings put secure enabled_accessibility_services \
  com.yourpackage/.YourAccessibilityService
adb shell settings put secure accessibility_enabled 1

# 4. Send commands to your service via broadcast
adb shell am broadcast -a com.yourpackage.AUTOMATION \
  --es action "tap" --ei x 540 --ei y 960
```

### 7.4 Does This Trigger Banking App Warnings?

**Critical findings from NPCI's 2025 security framework:**

- NPCI mandates banks block apps on **rooted devices** — Pixel 9a with locked bootloader is fine
- NPCI mandates blocks for **USB debug / ADB connections** — this is the KEY problem
- GPay/PhonePe will **detect and block** if USB debugging (ADB) is enabled
- They check: `Settings.Global.ADB_ENABLED == 1`
- However: an **accessibility service** (not ADB) goes through a different API path — accessibility-based tapping is the same mechanism screen readers use, and apps cannot block it outright without breaking accessibility compliance
- The risk: some banking apps detect accessibility services and show a warning or refuse to work

**Practical reality (2025):**
- Most Indian UPI apps (GPay, PhonePe, Paytm, BHIM) work with accessibility services enabled
- They are more likely to block if they detect specific automation accessibility services (e.g., ones with "Automation" in the name)
- A service named "Annie Voice Assistant" is less likely to trigger detection than "UIAutomator Server"
- **Guaranteed path**: Disable ADB when actually performing UPI transactions. Use a script: disable ADB → open app → perform UPI → re-enable ADB.

### 7.5 The ADB-Disable UPI Flow

This is the key architectural insight for UPI payments:

```python
# On Titan (Annie's tool implementation) — sketch; `device`,
# `trigger_tasker_via_wifi`, and `wait_for_tasker_completion` are
# hypothetical helpers built on uiautomator2 + Tasker/AutoRemote
import time

def perform_upi_payment(amount: float, upi_id: str) -> bool:
    # 1. Disable ADB so banking apps don't detect it
    device.shell("settings put global adb_enabled 0")
    time.sleep(2)

    # 2. Open GPay via a pre-set Tasker macro triggered earlier
    # (Tasker doesn't require ADB once set up; it's a local automation)
    trigger_tasker_via_wifi(task="gpay_payment", amount=amount, upi_id=upi_id)

    # 3. Wait for Tasker to report completion
    ok = wait_for_tasker_completion(timeout=120)

    # 4. Re-enable ADB
    # NOTE: Can't re-enable via adb shell if ADB is off.
    # Solution: a scheduled Tasker task re-enables ADB after payment completes,
    # OR use the USB connection to re-enable.
    return ok
```

**Better approach**: Keep ADB wireless debugging ON but in **native mode** (TLS encrypted). NPCI security checks may focus on `adb_enabled` (legacy USB/tcpip debugging) rather than the newer `adb_wifi_enabled` global. Test both.

---

## 8. AI Vision-Based Control — The DroidRun/AppAgent Architecture

For complex navigation where element IDs are unknown or change between app versions, vision-based control is the most robust approach.

### 8.1 DroidRun — Best Structured Approach

**GitHub:** https://github.com/droidrun/droidrun
**PyPI:** `droidrun`
**Status:** 3.8k stars, 43-91% AndroidWorld benchmark, actively developed (2025/2026)

Architecture:
```
Annie asks "send ₹500 to Rajesh's UPI"
→ DroidRun manager extracts UI structure (accessibility tree as structured text)
→ LLM receives structured text (NOT a screenshot) + decides action
→ DroidRun executor performs action via ADB/UIAutomator2
→ Loop until task complete
```

Key advantage: Uses **accessibility APIs** for a structured-text UI representation — the LLM gets text, not images. Much faster and cheaper than vision-based approaches (AppAgent: ~180s/task vs DroidRun's manager-executor loop: ~30-60s).

```python
# DroidRun usage (simplified)
from droidrun import DroidAgent

agent = DroidAgent(
    llm="anthropic/claude-3-5-haiku",   # Fast + cheap for simple steps
    device_id="192.168.1.42:5555"
)

result = await agent.run("Open GPay and send ₹500 to user@upi")
```

DroidRun supports Anthropic, OpenAI, Gemini, Ollama — Annie can route this to Nemotron on Beast or Claude for complex flows.

### 8.2 AppAgent — Vision-Based (Slower but More Universal)

**GitHub:** TencentQQGYLab/AppAgent

- Takes screenshot → draws numbered bounding boxes on UI elements → sends to vision LLM → gets click target
- Works on ANY app including those that block accessibility APIs
- Latency: ~180s per task (screenshot → GPT-4V → action → repeat)
- Not suitable for Annie's real-time use cases

**Verdict**: AppAgent is too slow (~180s per task). Use DroidRun for AI-driven flows.

### 8.3 Combined Vision Architecture for Annie

```
For structured apps (most):  uiautomator2 element finding → fast, reliable
For dynamic/new apps:        DroidRun accessibility text → LLM → action
For visual verification:     adb exec-out screencap → Nemotron VLM on Titan
For banking (sensitive):     Tasker + AutoInput (accessibility service path)
```
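That routing table can live in code as a small dispatcher so every tool call picks a layer consistently. An illustrative sketch — the banking package set and the flags are assumptions, not a complete classification:

```python
# Hypothetical layer router matching the table above
BANKING_PACKAGES = {
    "com.google.android.apps.nbu.paisa.user",  # GPay
    "com.phonepe.app",
    "net.one97.paytm",
}

def pick_layer(package: str, *, known_selectors: bool, needs_visual_check: bool) -> str:
    """Choose the automation layer for a given app/task."""
    if package in BANKING_PACKAGES:
        return "tasker+autoinput"   # accessibility path, no ADB at pay time
    if needs_visual_check:
        return "scrcpy+vlm"         # frame grab -> vision model on Titan
    if known_selectors:
        return "uiautomator2"       # structured element finding, fast
    return "droidrun"               # LLM over accessibility tree
```

The ordering matters: banking always wins, and vision verification trumps selector-based interaction for "check what's on screen" tasks.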

---

## 9. Security Summary

### 9.1 ADB Security Model

| Mode | Encrypted | Authenticated | Recommended |
|------|-----------|--------------|-------------|
| `adb tcpip 5555` (legacy) | No | No (after initial auth) | Home network only, risky |
| Native Wireless Debugging | TLS | RSA keys | Yes |
| USB | N/A | USB host trust | Most secure |

### 9.2 Key Security Practices

1. **Use Native Wireless Debugging** (not legacy tcpip 5555) — it's TLS-encrypted
2. **ADB keys live in** `~/.android/adbkey` on Titan — protect this file
3. **Revoke all keys** on phone if Titan is compromised: Settings → Developer Options → Revoke USB Debugging Authorizations
4. **Network isolation**: Put Annie's phone on its own VLAN or subnet, not with laptops/guest devices
5. **Never enable ADB on public WiFi** — the phone's WiFi SSID should be home-only
6. **For UPI/banking**: Test whether disabling ADB (or using native wireless debugging) allows banking apps to run. If they still block, use Tasker + AutoInput path (which doesn't involve ADB at payment time).
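Practices 1-3 lend themselves to a startup preflight on Titan: refuse to start automation if the ADB keys are missing or too permissive. A hypothetical sketch:

```python
import os
import stat
from pathlib import Path

def adb_key_preflight(android_dir: Path = Path.home() / ".android") -> list[str]:
    """Return a list of security problems; an empty list means the preflight passed."""
    problems = []
    key = android_dir / "adbkey"
    if not key.exists():
        problems.append("adbkey missing — pair the device first")
    else:
        mode = stat.S_IMODE(os.stat(key).st_mode)
        if mode & 0o077:   # any group/other permission bits set
            problems.append(f"adbkey is group/world readable (mode {oct(mode)})")
    if not (android_dir / "adbkey.pub").exists():
        problems.append("adbkey.pub missing")
    return problems
```

Call it at PhoneAgent startup and log (or refuse to run) on any non-empty result.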

---

## 10. Recommended Architecture for Annie

### 10.1 Layer Stack

```
Annie (Python, Titan)
│
├── Layer 0: Connection
│   ├── Primary: Native Wireless ADB (TLS encrypted, port auto-assigned)
│   ├── Reconnect: Shizuku + Automate app on phone boot
│   └── Fallback: USB cable to Titan
│
├── Layer 1: Phone Control (uiautomator2)  ← PRIMARY for most apps
│   ├── pip install uiautomator2
│   ├── Element finding: resourceId > text > description > xpath
│   ├── Screenshots: minicap (fast, ~100ms)
│   └── Gestures: tap, swipe, type, scroll
│
├── Layer 2: AI Navigation (DroidRun)  ← For complex/unknown UIs
│   ├── pip install droidrun
│   ├── LLM: Nemotron (Titan) for simple steps, Claude for complex
│   └── Uses accessibility tree as structured text (not screenshots)
│
├── Layer 3: Vision Verification (scrcpy / adbnativeblitz)  ← For "check what's on screen"
│   ├── pip install pyscrcpy  (or adbnativeblitz for pure ADB)
│   ├── last_frame at ~33ms
│   └── Feed frame to Nemotron VLM or OCR for verification
│
└── Layer 4: Banking/UPI  ← Special path
    ├── Tasker + AutoInput on phone (accessibility service)
    ├── Trigger from Annie via: adb shell am broadcast OR HTTP to AutoRemote
    ├── Task performs UPI action on-device (no ADB needed at transaction time)
    └── Result returned to Annie via Tasker HTTP POST to Annie server
```

### 10.2 Python Integration Design

```python
# services/annie-voice/phone_agent.py (new file ~200 lines)

import base64
import io
import subprocess

import uiautomator2 as u2
from PIL import Image

class PhoneAgent:
    """Annie's Android phone controller."""

    def __init__(self, phone_ip: str = "192.168.1.42"):
        self._phone_ip = phone_ip
        self._device: u2.Device | None = None

    def connect(self) -> bool:
        """Connect to phone via uiautomator2."""
        try:
            self._device = u2.connect(self._phone_ip)
            self._device.screen.on()   # Wake screen
            return True
        except Exception:
            # Fallback: re-establish the ADB connection, then retry
            subprocess.run(["adb", "connect", f"{self._phone_ip}:5555"], check=True)
            self._device = u2.connect(self._phone_ip)
            return True

    def screenshot(self) -> Image.Image:
        """Take screenshot. Returns PIL Image."""
        return self._device.screenshot()

    def screenshot_bytes(self) -> bytes:
        """Fast screenshot as bytes (for LLM vision input)."""
        # Direct exec-out is faster when uiautomator2 has issues
        result = subprocess.run(
            ["adb", "exec-out", "screencap", "-p"],
            capture_output=True
        )
        return result.stdout

    def tap(self, x: int, y: int) -> None:
        self._device.click(x, y)

    def tap_element(self, text: str | None = None, resource_id: str | None = None,
                    description: str | None = None) -> bool:
        """Click UI element by any selector."""
        if resource_id:
            el = self._device(resourceId=resource_id)
        elif text:
            el = self._device(text=text)
        elif description:
            el = self._device(description=description)
        else:
            raise ValueError("Need at least one selector")

        if el.exists(timeout=5):
            el.click()
            return True
        return False

    def type_text(self, text: str) -> None:
        self._device.send_keys(text)

    def launch_app(self, package: str) -> None:
        self._device.app_start(package)

    def press_home(self) -> None:
        self._device.press("home")

    def trigger_tasker_task(self, task_name: str, **kwargs) -> None:
        """Trigger Tasker task for banking/UPI flows."""
        # List form avoids shell=True quoting/injection bugs with extras
        cmd = ["adb", "shell", "am", "broadcast",
               "-a", "net.dinglisch.android.taskerm.ACTION_TASK",
               "-e", "task_name", task_name]
        for key, value in kwargs.items():
            cmd += ["--es", key, str(value)]
        subprocess.run(cmd, check=True)

# Annie tool definition (in tools.py)
async def control_phone(action: str, **kwargs) -> dict:
    """
    Control Annie's Android phone.

    Actions:
    - screenshot: take a screenshot, returns base64 image
    - tap: tap at coordinates or element
    - type: type text into focused field
    - launch_app: open an app by package name
    - home: go to home screen
    - upi_payment: trigger UPI payment via Tasker (safe path)
    """
    agent = get_phone_agent()  # singleton

    if action == "screenshot":
        img = agent.screenshot()
        # encode to base64 for LLM consumption
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return {"image_b64": base64.b64encode(buf.getvalue()).decode()}

    elif action == "tap":
        if "text" in kwargs or "resource_id" in kwargs:
            success = agent.tap_element(**kwargs)
        else:
            agent.tap(kwargs["x"], kwargs["y"])
            success = True
        return {"success": success}

    # ... etc
```

### 10.3 Error Handling Patterns

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def phone_session(agent: PhoneAgent):
    """Ensure phone is connected and screen is on."""
    # Recover the connection *before* yielding: a @contextmanager generator
    # may only yield once, so reconnection cannot live in an except clause
    # wrapped around the yield.
    if not agent.is_connected():
        try:
            agent.connect()
        except Exception:
            subprocess.run(["adb", "connect", f"{agent._phone_ip}:5555"], check=True)
            agent.connect()
    try:
        agent.wake_screen()
        yield agent
    finally:
        agent.wake_screen()   # Leave screen on for next command


# Usage in Annie tool
async def swiggy_order(restaurant: str, items: list) -> str:
    with phone_session(get_phone_agent()) as phone:
        # Open Swiggy
        phone.launch_app("in.swiggy.android")

        # Wait for home screen
        phone.wait_for_element(text=restaurant, timeout=10)
        phone.tap_element(text=restaurant)

        for item in items:
            phone.tap_element(text=item)
            phone.tap_element(text="Add")

        phone.tap_element(description="Cart")
        phone.tap_element(text="Place Order")

        # Verify order placed
        if phone.element_exists(text="Order Placed"):
            return "Order placed successfully"

        # Fall back to vision verification
        screenshot = phone.screenshot_bytes()
        return await verify_order_via_vision(screenshot)
```
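Transient failures — WiFi hiccups, slow UI renders after an app launch — are common over wireless ADB. A generic retry decorator (a sketch of my own, not part of uiautomator2) can wrap the flakier calls:

```python
import functools
import time


def retry(times: int = 3, delay: float = 1.0):
    """Retry a flaky phone operation a few times before giving up."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:   # noqa: BLE001 - deliberately broad
                    last_exc = exc
                    time.sleep(delay)
            raise last_exc
        return wrapper
    return decorator
```

Call sites that tap elements right after `launch_app` are the natural candidates, since the target view often isn't rendered yet on the first attempt.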

### 10.4 Installation Checklist for Titan + Pixel 9a

**On Titan (one-time):**
```bash
# 1. Install ADB
sudo apt install adb

# 2. Pair with phone (phone must show Developer Options → Wireless Debugging)
adb pair <phone-ip>:<pairing-port>   # Enter 6-digit code shown on phone

# 3. Connect
adb connect <phone-ip>:<debug-port>

# 4. Verify
adb devices

# 5. Install uiautomator2
pip install uiautomator2

# 6. Initialize uiautomator2 on phone (installs atx-agent APK)
python -m uiautomator2 init

# 7. Test
python -c "import uiautomator2 as u2; d=u2.connect(); print(d.info)"

# 8. Install DroidRun (for AI-driven navigation)
pip install droidrun

# 9. Install scrcpy (for fast screenshots)
sudo apt install scrcpy

# 10. Install adbnativeblitz (alternative fast screenshots, pure Python)
pip install adbnativeblitz
```
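Step 4's check can also be done programmatically, so Annie can verify the link before each session. A small sketch — the parsing helper and function names are my own, not part of any library:

```python
import subprocess


def parse_adb_devices(output: str) -> dict[str, str]:
    """Map serial -> state ('device', 'offline', 'unauthorized') from `adb devices`."""
    devices = {}
    for line in output.splitlines()[1:]:   # skip the "List of devices attached" header
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def phone_is_connected(addr: str) -> bool:
    """True if the given host:port shows up as a healthy 'device'."""
    out = subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
    return parse_adb_devices(out).get(addr) == "device"
```

A state of `unauthorized` here usually means the pairing dialog was dismissed on the phone; `offline` usually means the wireless-debugging session dropped and needs an `adb connect` retry.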

**On Pixel 9a (one-time):**
```
1. Settings → About phone → tap Build number 7 times
2. Settings → Developer Options → ON
3. Developer Options → Wireless Debugging → ON
4. Install Shizuku from Play Store
5. Set up Shizuku with Wireless Debugging
6. Install Tasker ($3.49) + AutoInput ($2.99) for banking flows
7. Settings → Battery → Charging optimization → Limit to 80%
8. Settings → Developer Options → Stay awake → ON (while charging)
9. Settings → Display → Screen timeout → 30 minutes (or "Never" in dev options)
10. Disable phantom process killing (set via ADB; not a Developer Options toggle):
    adb shell settings put global settings_enable_monitor_phantom_procs false
```

---

## 11. Known Limitations and Gotchas

### 11.1 Banking Apps (GPay, PhonePe, BHIM)

**What blocks them:**
- Root detection (not a problem: Pixel 9a with locked bootloader)
- ADB `adb_enabled` setting detection — some check `Settings.Global.ADB_ENABLED`
- UIAutomator server detection — some apps detect `com.github.uiautomator` running

**Mitigations:**
1. **Test with native wireless debugging** — may not set `adb_enabled` global flag the same way
2. **Tasker + AutoInput path** — accessibility service is the most compatible with banking apps
3. **Disable ADB briefly**: `adb shell settings put global adb_enabled 0` before payment, then re-enable via scheduled Tasker task after
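Mitigation 3 has an ordering constraint worth spelling out: once `adb_enabled` is set to 0, the ADB connection itself drops, so the Tasker trigger must be broadcast *before* the flag is cleared, and re-enabling has to happen on-device (e.g. a scheduled Tasker task with `WRITE_SECURE_SETTINGS` granted once over ADB). A sketch of the sequence — the function names are mine, and the runner is injectable so the ordering can be tested without a phone:

```python
import subprocess
from typing import Callable

TASKER_ACTION = "net.dinglisch.android.taskerm.ACTION_TASK"


def run_adb(*args: str) -> None:
    subprocess.run(["adb", *args], check=True)


def upi_payment_stealth(task_name: str, run: Callable[..., None] = run_adb) -> None:
    """Trigger a Tasker UPI task, then hide the ADB flag for its duration.

    The Tasker task itself must start with a short wait, perform the payment,
    and finally re-enable ADB (settings put global adb_enabled 1) on-device,
    since Titan cannot reach the phone while ADB is off.
    """
    # 1. Fire the on-device task while the ADB link is still up
    run("shell", "am", "broadcast", "-a", TASKER_ACTION,
        "-e", "task_name", task_name)
    # 2. Clear the adb_enabled flag that banking apps check
    run("shell", "settings", "put", "global", "adb_enabled", "0")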

**NPCI 2025 security mandate analysis:**
- Root: blocked — not relevant (locked bootloader)
- USB debugging: blocked — test whether wireless debugging triggers this check
- Dynamic instrumentation (Frida): blocked — not using Frida
- Accessibility service: **allowed** (cannot be blocked without breaking screen readers)

### 11.2 Screen Must Be On

All UIAutomator-based automation requires the screen to be awake. Use:
```python
d.screen_on()              # uiautomator2
# or
adb shell input keyevent KEYCODE_WAKEUP
adb shell svc power stayon true   # keep screen on while charging
```

### 11.3 App Updates Break Selectors

When apps update, resource IDs and element layouts change. Strategies:
1. Use `text` selectors over `resourceId` when text is stable
2. Add fallback selectors: try ID first, fall back to text, fall back to position
3. DroidRun's LLM-based navigation adapts automatically to UI changes — preferred for long-term stability

### 11.4 uiautomator2 v3.x Breaking Changes

v3 (Jan 2026) introduced breaking changes from v2.x. If you have existing v2.x scripts, check the migration guide at `uiautomator2/docs/2to3.md` in the GitHub repo.

---

## 12. Decision Matrix

| Use Case | Recommended Tool | Why |
|----------|-----------------|-----|
| Basic app navigation | uiautomator2 | Fast, element-aware |
| UPI payments (GPay, PhonePe) | Tasker + AutoInput | Bypasses ADB detection |
| Shopping apps (Swiggy, Zomato) | uiautomator2 | Clean APIs, stable element IDs |
| Complex multi-step unknown app | DroidRun | LLM-adaptive, handles UI changes |
| "What's on screen now?" | scrcpy / adb exec-out | Fast screenshot |
| Reboot persistence | Shizuku + Automate | No root needed |
| Making phone calls | uiautomator2 / ADB `am start -a android.intent.action.CALL` | Simple |
| Installing test apps | ADB `adb install` | Direct |
| Reading notifications | uiautomator2 dump | Pull notification shade |

---

## 13. Open Questions (for implementation)

1. **UPI identity**: Which phone number + bank account will Annie use? Legal implications of AI operating banking apps (NPCI 2025 framework explicitly mentions "autonomous AI" as a risk category).
2. **ADB detection test**: Does Pixel 9a's native wireless debugging set `adb_enabled=1` in global settings? Run `adb shell settings get global adb_enabled` when wireless debugging is ON.
3. **Play Integrity pass**: Confirm Pixel 9a passes STRONG Play Integrity with wireless debugging ON (it should — locked bootloader + stock OS).
4. **Tasker trigger latency**: How fast does ADB intent → Tasker → AutoInput complete? Measure round-trip.
5. **uiautomator2 v3 compatibility**: Test `python -m uiautomator2 init` on Pixel 9a with Android 15/16.
6. **Screen always on**: Confirm `svc power stayon true` survives Pixel 9a software updates.
7. **DroidRun with Nemotron**: Test DroidRun with Ollama/Nemotron backend (it supports Ollama) to avoid Claude API costs for routine navigation.
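Question 4 can be answered with a small harness: timestamp, fire the broadcast, and wait for Tasker's HTTP POST-back. A sketch under assumptions — the port and endpoint are arbitrary, the Tasker task is assumed to end with an HTTP Request action aimed at Titan, and the trigger is injectable so the harness itself can be exercised without a phone:

```python
import http.server
import subprocess
import threading
import time
from typing import Callable, Optional


def adb_trigger(task_name: str) -> None:
    subprocess.run(["adb", "shell", "am", "broadcast",
                    "-a", "net.dinglisch.android.taskerm.ACTION_TASK",
                    "-e", "task_name", task_name], check=True)


def measure_roundtrip(task_name: str, trigger: Callable[[str], None] = adb_trigger,
                      port: int = 18765, timeout: float = 10.0) -> Optional[float]:
    """Seconds from ADB broadcast to Tasker's POST-back, or None on timeout."""
    done = threading.Event()

    class Hook(http.server.BaseHTTPRequestHandler):
        def do_POST(self):
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            done.set()
            self.send_response(200)
            self.end_headers()

        def log_message(self, *args):   # silence per-request logging
            pass

    server = http.server.HTTPServer(("", port), Hook)   # all interfaces, phone must reach us
    threading.Thread(target=server.serve_forever, daemon=True).start()
    start = time.monotonic()
    trigger(task_name)   # Tasker task should POST to http://<titan-ip>:18765/
    elapsed = time.monotonic() - start if done.wait(timeout) else None
    server.shutdown()
    return elapsed
```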

---

## Sources

- [Android Debug Bridge — Android Developers](https://developer.android.com/tools/adb)
- [Google fixing wireless ADB auto-reconnect — Android Authority](https://www.androidauthority.com/android-wireless-adb-auto-reconnect-3624945/)
- [adb-auto-enable (mouldybread) — GitHub](https://github.com/mouldybread/adb-auto-enable)
- [openatx/uiautomator2 — GitHub](https://github.com/openatx/uiautomator2)
- [uiautomator2 v3.2.9 — PyPI](https://pypi.org/project/uiautomator2/3.2.9/)
- [DeepWiki: uiautomator2 installation](https://deepwiki.com/openatx/uiautomator2/2.1-installation-and-setup)
- [Shizuku auto-start Android 13+ — GitHub Discussion](https://github.com/RikkaApps/Shizuku/discussions/462)
- [Shizuku 2025 — mobile-hacker.com](https://www.mobile-hacker.com/2025/07/14/shizuku-unlocking-advanced-android-capabilities-without-root/)
- [Android 11 Wireless Debugging guide](https://en.androidsis.com/Complete-guide-to-wireless-adb-on-android-11-or-higher/)
- [Appium Python client — BrowserStack](https://www.browserstack.com/guide/appium-with-python-for-app-testing)
- [Appium UIAutomator2 driver — GitHub](https://github.com/appium/appium-uiautomator2-driver)
- [uiautomator2 vs Appium — Medium](https://medium.com/@vaibhavc121/choosing-the-right-mobile-automation-framework-beyond-appium-186205ef2e1c)
- [Genymobile/scrcpy — GitHub](https://github.com/Genymobile/scrcpy)
- [scrcpy-mcp MCP server — GitHub](https://github.com/JuanCF/scrcpy-mcp)
- [py-scrcpy-client — GitHub](https://github.com/leng-yue/py-scrcpy-client)
- [adbnativeblitz — PyPI](https://pypi.org/project/adbnativeblitz/)
- [ADB exec-out screencap speed — codegenes.net](https://www.codegenes.net/blog/using-adb-to-capture-the-screen/)
- [ADB sendevent vs input tap — Repeato](https://www.repeato.app/understanding-adb-shell-input-events/)
- [ADB security (Abusing ADB) — Trustwave](https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/abusing-the-android-debug-bridge/)
- [Tasker + AutoInput plugin](https://play.google.com/store/apps/details?id=com.joaomgcd.autoinput)
- [Android Accessibility Service — Android Developers](https://developer.android.com/guide/topics/ui/accessibility/service)
- [NPCI Mobile Security Framework 2025 — CertCube](https://blog.certcube.com/npcis-comprehensive-mobile-application-security-framework-for-upi-2025/)
- [Accessibility Permission impact — BrowserStack](https://www.browserstack.com/guide/accessibility-permission-in-android)
- [DroidRun — GitHub](https://github.com/droidrun/droidrun)
- [DroidRun — Documentation](https://docs.droidrun.ai/v3/overview)
- [Mobile AI agents benchmark 2026 — AImultiple](https://aimultiple.com/mobile-ai-agent)
- [AppAgent (TencentQQGYLab) — GitHub](https://github.com/TencentQQGYLab/AppAgent)
- [Tasker ADB Shell integration — XDA](https://xdaforums.com/t/tasker-secure-settings-adb-access-how-safe-is-it.4120197/)
- [MacroDroid vs Tasker comparison](https://www.androidauthority.com)
- [DGX Spark aarch64 compatibility — NVIDIA Forums](https://forums.developer.nvidia.com/t/architecture-and-library-compatibility-on-aarch64/350389)
- [Appium system requirements](https://appium.io/docs/en/latest/quickstart/requirements/)
