# Research: Automating Food Delivery Apps (Swiggy, Zomato, Zepto) via ADB

**Date:** 2026-04-02
**Context:** Annie controls a Pixel 9a (Android 16, API 36, serial 62271XEBF9DFD4) via USB ADB from Panda (DGX Spark, aarch64). uiautomator2 is the planned primary framework. This doc covers practical automation of Swiggy, Zomato, and Zepto — from launching the app to completing checkout.
**Prerequisite:** Read `docs/RESEARCH-ADB-AUTOMATION-STACK.md` for the full ADB/uiautomator2/DroidRun/scrcpy/Tasker foundation. This doc focuses specifically on food ordering flows.

---

## 1. App Identity & Launch Commands

### 1.1 Package Names (confirmed from Google Play Store)

| App | Package Name | Play Store ID |
|-----|-------------|---------------|
| Swiggy | `in.swiggy.android` | [Play Store](https://play.google.com/store/apps/details?id=in.swiggy.android) |
| Zomato | `com.application.zomato` | [Play Store](https://play.google.com/store/apps/details?id=com.application.zomato) |
| Zepto | `com.zeptoconsumerapp` | [Play Store](https://play.google.com/store/apps/details?id=com.zeptoconsumerapp) |

### 1.2 Launch via uiautomator2

```python
import uiautomator2 as u2

d = u2.connect()  # USB-connected Pixel 9a

# Launch apps
d.app_start("in.swiggy.android")       # Swiggy
d.app_start("com.application.zomato")   # Zomato
d.app_start("com.zeptoconsumerapp")     # Zepto

# Force stop
d.app_stop("in.swiggy.android")

# Check what's running
print(d.app_current())
```
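`d.app_start()` returns before the app is actually in front. The flows below cover that gap with fixed `time.sleep(2)` calls. A polling alternative is sketched here. It assumes only that `d.app_current()` returns a dict with a `"package"` key, as in uiautomator2. The helper name `wait_for_foreground` is hypothetical.

```python
import time


def wait_for_foreground(d, package: str, timeout: float = 15.0, poll: float = 0.5) -> bool:
    """Poll d.app_current() until `package` is the foreground app.

    Returns True as soon as it is in front, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if d.app_current().get("package") == package:
                return True
        except Exception:
            # app_current() can fail briefly while activities are switching
            pass
        time.sleep(poll)
    return False
```

Usage: call `wait_for_foreground(d, "in.swiggy.android")` right after `d.app_start("in.swiggy.android")` instead of sleeping for a fixed time.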

### 1.3 Launch via raw ADB (alternative)

```bash
adb shell monkey -p in.swiggy.android -c android.intent.category.LAUNCHER 1
adb shell monkey -p com.application.zomato -c android.intent.category.LAUNCHER 1
adb shell monkey -p com.zeptoconsumerapp -c android.intent.category.LAUNCHER 1
```

### 1.4 Tech Stack of Each App

| App | Android Tech | UI Framework | Automation Impact |
|-----|-------------|--------------|-------------------|
| Swiggy | Kotlin (native) + React Native (UI components) | Hybrid: Native + RN | Most elements have native accessibility IDs. RN components may have less predictable IDs but are still visible in the accessibility tree. |
| Zomato | Kotlin (native) | Native | Best for automation — native elements have stable resource IDs. |
| Zepto | React Native (primary) | RN-dominant | More dynamic IDs, but uiautomator2 still sees the rendered native views. Text-based selectors more reliable than resource IDs. |

**Key insight:** All three apps render to native Android views under the hood, even when built with React Native. uiautomator2 sees native `android.widget.*` views regardless of the framework used. The accessibility tree works for all of them.

---

## 2. Practical Ordering Flow — Screen by Screen

### 2.1 Swiggy: App Open to Order Placed

**Typical flow (8-12 screens/interactions):**

```
1. Launch app                          → Home screen (restaurants, categories, search)
2. Dismiss popup (if any)             → Coupon/offer popup, location confirmation
3. Search for restaurant / tap name    → Restaurant list or direct restaurant page
4. Browse menu                        → Scrollable list: categories + items
5. Tap "ADD" on item                  → May show customization popup (size, toppings)
6. Select customizations → "Add Item" → Item added to cart (bottom bar appears)
7. Repeat steps 4-6 for more items
8. Tap cart / "View Cart"             → Cart page (items, prices, offers)
9. Apply coupon (optional)            → Coupon list or code entry
10. Tap "Place Order" / "Proceed"     → Payment selection screen
11. Select payment method (UPI/saved) → Payment confirmation
12. Confirm order                     → "Order Placed" confirmation screen
```

**Automation time estimate:** 30-90 seconds via uiautomator2 (excluding LLM decision time). With DroidRun/LLM: 2-5 minutes including perception-reasoning-action loops.

### 2.2 Zomato: App Open to Order Placed

**Typical flow (8-12 screens/interactions):**

```
1. Launch app                          → Home with banners, categories, restaurants
2. Dismiss popup (location/promo)      → Common first-launch popup
3. Search restaurant or browse         → Search bar at top, restaurant cards
4. Select restaurant                   → Restaurant page: menu categories, items
5. Tap "Add" on item                   → Customization sheet (variants, add-ons)
6. Confirm customization               → Item in cart, floating cart bar
7. Repeat for more items
8. Tap "View Cart" / cart icon         → Cart with itemized breakdown, savings, Pro benefits
9. Apply offer (optional)              → Offer code or auto-applied
10. Tap "Place Order"                  → Payment screen (remembers last payment method)
11. Confirm payment                    → Order confirmed, live tracking begins
```

**Zomato UX note:** Cart is always visible even during restaurant discovery. The payment flow remembers your last payment method (e.g., UPI). Fewer taps for repeat orders.

### 2.3 Zepto: App Open to Order Placed

**Typical flow (6-10 screens/interactions, faster):**

```
1. Launch app                          → Home with categories, search, quick picks
2. Search for item or browse category  → Product grid
3. Tap "Add" on item                   → Quantity selector (+ / -)
4. Repeat for more items
5. Tap cart icon                       → Cart review (delivery time, bill details)
6. Tap "Proceed to Pay"               → Payment selection
7. Confirm payment                     → Order placed, live tracking (10-min delivery)
```

**Zepto note:** Simpler flow (grocery, not restaurant). No restaurant selection or customization. Typically 6-8 interactions from open to checkout. Fastest to automate.

### 2.4 Common UI Patterns to Handle

| Pattern | Where | How to Handle |
|---------|-------|---------------|
| **Location permission popup** | First launch / app update | `d(text="Allow").click()` or `d(resourceId="...permission_allow").click()` |
| **Promo/coupon popup** | Home screen entry | `d(description="Close").click()` or `d(text="SKIP").click()` or tap outside |
| **Login required** | If session expired | Handle OTP flow (see Section 3) |
| **"Your location?"** | If GPS off or address change | Set default address beforehand in app settings |
| **Item customization sheet** | After tapping "Add" | Select options, then confirm — or use `d(text="Add Item").click()` |
| **Cart conflict (different restaurant)** | Swiggy/Zomato if cart has items from another restaurant | `d(text="Start Afresh").click()` to clear old cart |
| **Surge pricing popup** | Peak hours | `d(text="Continue").click()` or `d(text="OK, Got It").click()` |
| **Minimum order value** | Cart below threshold | Add more items or switch to another restaurant |
| **Delivery unavailable** | Late night / restaurant closed | Check before attempting order; read screen text |

---

## 3. Login & OTP Handling

### 3.1 The Login Flow

All three apps use **phone number + OTP** as the primary login method:

```
1. Open app → "Login" or "Sign Up" screen
2. Enter phone number (10-digit Indian mobile)
3. Tap "Get OTP" / "Send OTP"
4. Wait for SMS → Enter 4-6 digit OTP
5. Authenticated → Home screen
```

### 3.2 Staying Logged In (Preferred Strategy)

**Best approach: Log in manually once, then never log out.**

- These apps maintain persistent sessions (weeks to months)
- Session tokens stored in app data survive reboots
- As long as you don't clear app data or uninstall, the session persists
- `d.app_start()` and even `d.app_stop()` preserve the session
- **Only `d.app_clear()` would destroy it** — NEVER call this

**Risk of session expiry:**
- App update forcing re-login (rare, ~1-2x per year)
- Server-side session invalidation (very rare)
- Phone factory reset

### 3.3 Automated OTP Handling (When Needed)

If re-login is needed, Annie can read the OTP from SMS via ADB:

```bash
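# Read the most recent SMS from Swiggy/Zomato/Zepto (first output line only).
# The remote command is one quoted string because `adb shell` re-splits its
# arguments on the phone; --projection takes colon-separated column names.
adb shell "content query --uri content://sms/inbox \
  --projection address:body:date \
  --sort 'date DESC' \
  --where \"address LIKE '%SWIGGY%' OR address LIKE '%ZOMATO%' OR address LIKE '%ZEPTO%'\"" \
  | head -1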
```

**Python approach:**

```python
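import re
import subprocess
import time

# One string = one remote command: `adb shell` joins its arguments with spaces and
# re-splits them on the phone, so "date DESC" must be quoted for the remote shell.
SMS_QUERY = (
    "content query --uri content://sms/inbox "
    "--projection address:body:date --sort 'date DESC'"
)
# Each row prints as "Row: N address=..., body=..., date=<epoch ms>"; bodies may span lines.
ROW_RE = re.compile(r"address=(?P<address>.*?), body=(?P<body>.*), date=(?P<date>\d+)\s*$", re.S)


def read_otp_from_sms(
    sender_pattern: str, timeout: int = 60, serial: str = "62271XEBF9DFD4"
) -> str | None:
    """Wait for an OTP SMS that arrives after this call starts and return the code.

    Args:
        sender_pattern: Regex pattern for sender (e.g., "SWIGGY|Swiggy")
        timeout: Max wait time in seconds
        serial: ADB serial of the phone
    """
    # Ignore SMS older than this call (minus 30s, assuming Panda and the phone
    # clocks agree to within that), so a stale OTP from an earlier login is skipped.
    not_before_ms = int(time.time() * 1000) - 30_000
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = subprocess.run(
            ["adb", "-s", serial, "shell", SMS_QUERY],
            capture_output=True, text=True,
        )
        for chunk in re.split(r"^Row: \d+ ", result.stdout, flags=re.M):
            row = ROW_RE.match(chunk)
            if not row or int(row["date"]) < not_before_ms:
                continue
            if re.search(sender_pattern, row["address"], re.IGNORECASE):
                # Extract 4-6 digit OTP from body
                match = re.search(r"\b(\d{4,6})\b", row["body"])
                if match:
                    return match.group(1)
        time.sleep(2)
    return None


# Usage (d is the uiautomator2 device from Section 1.2)
otp = read_otp_from_sms(r"SWIGGY|JD-SWIGGY|VM-SWIGGY")
if otp:
    d.send_keys(otp)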
```

**Important:** `adb shell content query --uri content://sms/inbox` is widely reported to work on a USB-connected device **without root**. The query runs as the ADB `shell` user, so it is checked against the permissions the platform grants that user rather than any app's permissions. This still has to be confirmed on the Pixel 9a itself (see Section 11.1, item 4).

**Android 16 consideration:** Google keeps tightening SMS access for apps, but those app-level runtime restrictions do not govern ADB shell queries. The open risk is a platform release that narrows what the `shell` user may read. As long as USB debugging is enabled and authorized and the on-device check passes, this path is usable.

### 3.4 Alternative OTP Approaches

| Approach | Complexity | Reliability |
|----------|-----------|-------------|
| **ADB SMS content query** (recommended) | Low | High — direct DB query |
| **Read OTP from notification** | Medium | Good — dump notification shade via uiautomator2 |
| **Clipboard copy** | Low | Fragile — depends on Android auto-copy behavior |
| **Watch logcat for SMS** | Medium | Unreliable on Android 16 |
| **Tasker intercept** | Medium | Good — Tasker can read SMS and forward to Annie via HTTP |

**Recommended approach:** ADB SMS content query for simplicity. Tasker SMS intercept as fallback.

---

## 4. Screen Understanding: Accessibility Tree vs Vision

### 4.1 Accessibility Tree (uiautomator2 dump) — Primary

```python
# Dump the entire UI hierarchy as XML
xml = d.dump_hierarchy()

# This returns structured XML like (resource-id shown is illustrative):
# <node resource-id="in.swiggy.android:id/search_bar"
#       text="Search for restaurant or dish"
#       class="android.widget.EditText"
#       bounds="[32,180][1048,256]"
#       clickable="true" />
```

**Strengths:**
- Fast (~200-800ms)
- Structured: every element has bounds, text, resource-id, class, description
- No LLM needed for simple navigation — pattern-match on text/ID
- Works with all three apps (native Android views)

**Weaknesses:**
- Some elements lack useful text or resource-id (especially in RN/WebView sections)
- Custom-drawn views (Canvas, custom animations) are invisible
- Dynamically loaded content may not appear until scrolled into view

### 4.2 Screenshot + Vision LLM — Secondary

```python
# Take screenshot
screenshot = d.screenshot()  # PIL Image
screenshot.save("/tmp/screen.png")

# Send to vision LLM for understanding
# Could use: Claude Vision, Qwen-VL on Titan, Nemotron VLM
# `vision_llm` is a placeholder for whichever client is chosen; `await`
# requires this to run inside an async function.
response = await vision_llm.analyze(
    image=screenshot,
    prompt="What screen am I on? List all tappable elements with their approximate positions."
)
```

**Strengths:**
- Works on ANY app, including custom-drawn UIs
- Can understand visual layout, images, icons, colors
- Can read text that isn't in the accessibility tree (e.g., text in images)
- Better for "what is this screen showing?" type questions

**Weaknesses:**
- Slower (screenshot ~300ms + LLM inference 1-5s)
- More expensive (vision API calls or GPU VRAM for local VLM)
- Coordinates from LLM are approximate — may need retry/correction
- Overkill for standard UI elements that have proper accessibility labels

### 4.3 Recommended Hybrid Approach

```
Step 1: dump_hierarchy() → parse XML
Step 2: If target element found by text/id → click it (fast path, ~300ms)
Step 3: If not found → take screenshot → send to vision LLM → get coordinates → tap
Step 4: After any action → dump_hierarchy() to verify result
```
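The four steps above can be sketched as a single helper. The uiautomator2 selector lookup covers steps 1 and 2. The vision fallback is a hypothetical callable that takes a screenshot and a label and returns `(x, y)` or `None`. An optional `expect_text` handles step 4. All names here are illustrative.

```python
from typing import Any, Callable, Optional

# Hypothetical vision hook: (screenshot, target label) -> (x, y) or None
VisionLocator = Callable[[Any, str], Optional[tuple[int, int]]]


def tap_hybrid(
    d,
    text: str,
    locate_with_vision: Optional[VisionLocator] = None,
    expect_text: Optional[str] = None,
    timeout: float = 2.0,
) -> bool:
    """Tap `text` via the accessibility tree, falling back to a vision locator."""
    element = d(text=text)
    if element.exists(timeout=timeout):        # Steps 1-2: tree lookup, fast path
        element.click()
    elif locate_with_vision is not None:        # Step 3: screenshot -> VLM -> tap
        point = locate_with_vision(d.screenshot(), text)
        if point is None:
            return False
        d.click(*point)
    else:
        return False
    # Step 4: verify the expected next screen, if the caller named one
    return expect_text is None or d(text=expect_text).exists(timeout=timeout)
```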

**For food ordering specifically:** The accessibility tree is sufficient 90%+ of the time. Restaurant names, menu items, prices, "Add" buttons, "Cart", "Place Order" — all have text labels. Vision is only needed for:
- Identifying which restaurant to pick from a visual list
- Reading promotional banners or offers
- Verifying order confirmation when text labels are ambiguous
- Handling unexpected popups with non-standard UI

### 4.4 Performance Comparison for Food Ordering

| Approach | Time per Screen | LLM Cost | Reliability |
|----------|----------------|----------|-------------|
| Accessibility tree only | ~0.5s | None | 90% (misses custom views) |
| Vision only (screenshot + VLM) | ~3-5s | API call per screen | 85% (coordinate imprecision) |
| Hybrid (tree first, vision fallback) | ~0.5s typical, ~4s fallback | Minimal | 95%+ |
| DroidRun (tree + LLM reasoning) | ~3-8s per step | LLM call per step | ~91% (AndroidWorld benchmark) |

---

## 5. DroidRun: AI-Powered Automation in Depth

### 5.1 Architecture

DroidRun uses a two-component system:

```
Panda (Python agent)                          Pixel 9a (phone)
├── DroidRun Python SDK                       ├── droidrun-portal APK
│   ├── Manager (high-level LLM)              │   ├── Accessibility Service
│   │   └── Sets goals, creates task plan     │   │   └── Reads live UI hierarchy
│   ├── Executor (fast LLM)                   │   ├── Content Provider (ADB query)
│   │   └── Decides single next action        │   └── Socket Server (HTTP API)
│   └── Perception                            └── Receives tap/swipe/type commands
│       ├── Accessibility tree (primary)
│       └── Screenshot (fallback)
```

**Manager-Executor loop:**
1. Manager receives goal ("Order biryani from Swiggy")
2. Manager creates high-level task list
3. Executor reads current screen state (accessibility tree)
4. Executor asks fast LLM: "Given this screen and task, what's the single next action?"
5. Executor performs action (tap, type, swipe)
6. Manager re-evaluates: is the task progressing? Adjust plan if needed?
7. Loop until goal is met or max steps reached
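The loop above is sketched generically below to make the control flow concrete. This is not DroidRun's internal code. `plan`, `next_action`, `observe` and `act` are hypothetical callables standing in for the Manager LLM, the Executor LLM, the accessibility-tree read and the device action.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    done: bool   # True if the Manager reported no remaining tasks
    steps: int   # actions performed
    history: list[str] = field(default_factory=list)


def manager_executor_loop(
    goal: str,
    plan: Callable[[str, list[str]], list[str]],  # Manager: (goal, history) -> remaining tasks
    next_action: Callable[[str, str], str],       # Executor LLM: (task, screen) -> one action
    observe: Callable[[], str],                   # e.g. lambda: d.dump_hierarchy()
    act: Callable[[str], None],                   # performs the tap/type/swipe
    max_steps: int = 30,
) -> LoopResult:
    history: list[str] = []
    for step in range(max_steps):
        tasks = plan(goal, history)  # steps 2 and 6: (re)plan from progress so far
        if not tasks:
            return LoopResult(True, step, history)  # goal met
        action = next_action(tasks[0], observe())  # steps 3-4
        act(action)  # step 5
        history.append(action)
    # step 7: budget exhausted; one last check in case the final action finished the goal
    return LoopResult(not plan(goal, history), max_steps, history)
```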

### 5.2 Setup on Panda

```bash
# Install DroidRun
pip install droidrun

# Setup portal app on phone (one-time)
droidrun setup

# Enable accessibility service on phone
# Settings → Accessibility → DroidRun Portal → Enable
# (must be done manually once)
```

### 5.3 Using with Local LLMs (Ollama/Nemotron)

DroidRun officially supports Ollama. Annie can route to Nemotron on Beast or Titan:

```python
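import asyncio

from droidrun.agent import DroidAgent

# NOTE: these constructor arguments are illustrative. DroidRun's API has changed
# across releases; check the DroidRun Ollama guide for the current signature.
agent = DroidAgent(
    llm_provider="ollama",
    model="nemotron:latest",    # or any Ollama model
    ollama_base_url="http://titan:11434",  # Ollama on Titan
    device_serial="62271XEBF9DFD4",
)

result = asyncio.run(agent.run("Open Swiggy, search for biryani, order from the first restaurant"))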
```

**LLM routing for Annie:**
- **Simple navigation** (launch app, tap known buttons): No LLM needed, use uiautomator2 directly
- **Dynamic decisions** (pick a restaurant, choose from options): Nemotron Nano on Titan (fast, cheap)
- **Complex multi-step** (full order with error recovery): Nemotron Super on Beast or Claude API

### 5.4 DroidRun vs DroidClaw vs mobile-use

| Feature | DroidRun | DroidClaw | mobile-use (Minitap) |
|---------|----------|-----------|---------------------|
| **GitHub** | droidrun/droidrun | unitedbyai/droidclaw | minitap-ai/mobile-use |
| **Stars** | ~3.8k | ~1k | ~2k |
| **Primary input** | Accessibility tree + screenshot | Accessibility tree (uiautomator dump) + screenshot | Accessibility tree |
| **LLM support** | OpenAI, Anthropic, Gemini, Ollama, DeepSeek | Groq, Ollama, OpenAI, Bedrock | OpenAI, Anthropic, Gemini |
| **Local LLM** | Yes (Ollama) | Yes (Ollama) | Not documented |
| **AndroidWorld score** | 91.4% | Not benchmarked | **100%** (claimed) |
| **Portal app** | Yes (custom accessibility service) | No (uses raw uiautomator dump) | No (uses raw ADB) |
| **Architecture** | Manager-Executor loop | Single perceive-reason-act loop | Agent loop |
| **Recovery** | Manager re-plans on failure | 3-step stall detection | Not documented |
| **Remote access** | Standard ADB | Tailscale built-in | Standard ADB |
| **Best for** | Production reliability | Simple always-on agents | Highest benchmark accuracy |
| **Risk** | Portal app needs manual accessibility enable | Simpler but less robust | Newer project, less battle-tested |

### 5.5 Recommendation for Annie

**Primary: uiautomator2 with scripted flows.** For food ordering, the UI patterns are predictable enough that scripted automation (with text/ID selectors) is fastest and most reliable. No LLM needed per step.

**Secondary: DroidRun for unfamiliar/changing UIs.** When app updates break selectors or Annie encounters an unexpected screen, DroidRun's LLM-adaptive approach self-corrects. Use with Ollama/Nemotron to avoid API costs.

**Not recommended: DroidClaw** for Annie's use case. DroidClaw is designed for "spare phone as always-on agent" scenarios. Annie already has a dedicated phone and server infrastructure; DroidRun's architecture is more sophisticated and better benchmarked.

---

## 6. Anti-Automation Measures & ToS

### 6.1 Do These Apps Detect Automation?

| Detection Method | Swiggy | Zomato | Zepto | Impact |
|-----------------|--------|--------|-------|--------|
| **Play Integrity API** | Likely | Likely | Likely | Checks device integrity (root, bootloader). A stock Pixel 9a with a locked bootloader should get the STRONG verdict (`MEETS_STRONG_INTEGRITY`). |
| **UIAutomator detection** | Unknown | Unknown | Unknown | Apps *can* look for the `com.github.uiautomator` package via `PackageManager` (subject to Android 11+ package-visibility rules). Unlikely for food apps (not banking). |
| **ADB enabled detection** | Unlikely | Unlikely | Unlikely | Checking `Settings.Global.ADB_ENABLED` is typical of banking/UPI apps; food apps are not known to do it. |
| **Rate limiting** | Yes (API) | Yes (API) | Yes (API) | Backend rate limits on order frequency. Normal personal use (1-3 orders/day) won't trigger. |
| **CAPTCHA** | Rare | Rare | Rare | May appear after suspicious activity (rapid repeated logins). Normal use won't trigger. |
| **Behavioral analysis** | Unlikely | Unlikely | Unlikely | These apps aren't banking — no reason to invest in behavioral biometrics. |
| **App Access Risk (Play Integrity)** | Possible | Possible | Possible | `appAccessRiskVerdict` can detect apps with screen capture/overlay. DroidRun Portal's accessibility service *could* be flagged. Risk is low for food apps. |

**Bottom line:** Food delivery apps have minimal anti-automation compared to banking/UPI apps. Their primary concern is preventing scraping of restaurant/pricing data (API-side), not preventing users from automating their own orders.

### 6.2 Pixel 9a Advantage

The Pixel 9a running stock Android 16 with a locked bootloader is the **ideal** automation target:
- Passes STRONG Play Integrity verdict
- No root detection issues
- Stock ROM means no "modified system" flags
- Google's own phone — apps are less likely to flag it

### 6.3 Terms of Service

**Swiggy ToS (key excerpt):** Prohibits "use any robot, spider, scraper or other automated means to access the Platform." This language targets commercial scraping, not personal order automation.

**Zomato ToS (key excerpt):** Similar prohibition on bots, spiders, and automated access tools.

**Zepto ToS:** Similar language.

**Practical assessment:**
- **Legal risk: Very low.** You are automating your own orders on your own phone. You are not scraping data, not creating fake accounts, not accessing others' data. This is functionally identical to using Android's built-in accessibility features.
- **Account ban risk: Very low.** These companies have no financial incentive to ban paying customers who order normally. The ToS language exists to prevent commercial scraping/abuse.
- **Ethical position:** Annie is a personal assistant ordering food for her owner on his own account, from his own phone. This is the same as asking a family member to order for you.

### 6.4 Mitigation Strategies (Defense in Depth)

Even though risk is low, apply these for safety:

1. **Human-like timing:** Add 0.5-2s random delays between interactions. Don't tap at machine speed.
2. **Avoid `pm list packages`** and other package enumeration from the automation. Ordering never needs it, so skipping it costs nothing.
3. **Limit order frequency:** 1-3 orders per day is normal human behavior.
4. **Don't scrape:** Don't programmatically read menus/prices beyond what's needed for the current order.
5. **Keep apps updated:** Outdated app versions may be flagged.
6. **Use the same device consistently:** Don't swap devices frequently for the same account.
7. **Human fallback:** If Annie encounters a CAPTCHA or verification challenge, alert Rajesh.
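Mitigations 1 and 3 are simple to enforce in code. In the sketch below, `human_pause` draws a uniform random delay, and `DailyOrderCap` keeps a sliding 24-hour window of order timestamps. Both names are hypothetical. The cap lives in memory only, so a real version would need to persist timestamps across Annie restarts.

```python
import random
import time
from collections import deque
from typing import Callable


def human_pause(low: float = 0.5, high: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random delay in [low, high] seconds (mitigation 1) and return it."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


class DailyOrderCap:
    """Sliding 24-hour cap on placed orders (mitigation 3). In memory only."""

    def __init__(self, max_orders: int = 3, window_s: float = 24 * 3600,
                 clock: Callable[[], float] = time.time):
        self.max_orders = max_orders
        self.window_s = window_s
        self.clock = clock
        self.placed: deque[float] = deque()  # timestamps of placed orders, oldest first

    def allow(self) -> bool:
        """True if another order fits inside the window."""
        now = self.clock()
        while self.placed and now - self.placed[0] >= self.window_s:
            self.placed.popleft()  # forget orders that fell out of the window
        return len(self.placed) < self.max_orders

    def record(self) -> None:
        """Call once an order is confirmed."""
        self.placed.append(self.clock())
```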

---

## 7. Accessibility Service Approach (Alternative to uiautomator2)

### 7.1 How It Differs

| Aspect | uiautomator2 | Custom Accessibility Service |
|--------|-------------|----------------------------|
| **Runs on** | Phone (atx-agent) + Python (Panda) | Phone only (APK) |
| **Detection risk** | Medium (atx-agent is a known package) | Low (looks like a legitimate accessibility app) |
| **Setup** | `pip install` + `python -m uiautomator2 init` | Build APK + install + enable in settings |
| **Maintenance** | Python scripts on Panda | Java/Kotlin code compiled to APK |
| **Speed** | 50-150ms per command over HTTP | Near-instant (same process space) |
| **Banking app compat** | Lower (UIAutomator detection) | Higher (accessibility services are expected) |
| **Programming model** | Python (familiar, flexible) | Java/Kotlin (less familiar for Annie's stack) |
| **Real-time screen events** | Poll-based (dump hierarchy) | Event-driven (onAccessibilityEvent callback) |

### 7.2 When to Use Accessibility Service

For food ordering apps, **uiautomator2 is sufficient and preferred.** An accessibility service is only needed when:
- An app actively detects and blocks UIAutomator (banking/UPI apps)
- You need event-driven screen monitoring (not needed for ordering)
- You need to operate without ADB at transaction time

### 7.3 Google Play Policy

Google restricts accessibility services on Play Store apps:
- "Autonomous AI actions" are prohibited for Play Store apps
- However, this only applies to apps **published on the Play Store**
- A sideloaded APK (installed via ADB) is not subject to Play Store policies
- Annie's accessibility service APK would be sideloaded, not on the Play Store

### 7.4 DroidRun Portal as a Compromise

DroidRun Portal **is** an accessibility service. By using DroidRun, Annie gets the benefits of an accessibility service (system-level UI access) packaged as a maintained open-source tool, without needing to write a custom APK. This is the best of both worlds for non-banking apps.

---

## 8. Practical Implementation Patterns

### 8.1 Resilient Element Finding

```python
import time


def find_and_tap(d, targets: list[dict], timeout: float = 10) -> bool:
    """Try multiple selectors with fallback.

    Args:
        d: uiautomator2 device
        targets: List of selector dicts, tried in order.
                 e.g., [{"text": "Add"}, {"description": "Add to cart"},
                        {"resourceId": "in.swiggy.android:id/add_btn"}]
        timeout: Overall time budget in seconds

    Returns: True if an element was found and tapped, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        for selector in targets:
            el = d(**selector)
            if el.exists(timeout=0.5):
                el.click()
                return True
        time.sleep(0.5)
    return False
```

### 8.2 Popup Dismissal Pattern

```python
import time

KNOWN_DISMISSALS = [
    {"text": "SKIP"},
    {"text": "Not Now"},
    {"text": "No Thanks"},
    {"text": "Maybe Later"},
    {"description": "Close"},
    {"description": "Dismiss"},
    # resourceId is an exact match in uiautomator2; regexes need resourceIdMatches
    {"resourceIdMatches": ".*close.*"},
    {"resourceIdMatches": ".*dismiss.*"},
]

def dismiss_popups(d, max_attempts: int = 3) -> int:
    """Dismiss common popups. Returns number dismissed."""
    dismissed = 0
    for _ in range(max_attempts):
        found = False
        for selector in KNOWN_DISMISSALS:
            el = d(**selector)
            if el.exists(timeout=0.3):
                el.click()
                dismissed += 1
                found = True
                time.sleep(0.5)
                break
        if not found:
            break
    return dismissed
```

### 8.3 Scroll-to-Find Pattern

```python
import time


def scroll_and_find(d, text: str, max_scrolls: int = 10) -> bool:
    """Scroll down until element with text is visible."""
    for _ in range(max_scrolls):
        if d(text=text).exists(timeout=0.5):
            return True
        d.swipe_ext("up", scale=0.6)  # scroll down (swipe up)
        time.sleep(0.5)
    return False
```

### 8.4 Full Swiggy Order Flow (Example)

```python
import time

# Uses find_and_tap(), dismiss_popups() and scroll_and_find() from 8.1-8.3.
# uiautomator2 is synchronous, so this is a plain function; from async code
# call it with `await asyncio.to_thread(order_from_swiggy, d, ...)`.
def order_from_swiggy(
    d,
    restaurant: str,
    items: list[str],
    payment_method: str = "upi",  # not used yet: assumes a pre-configured default
) -> dict:
    """Complete Swiggy order flow.

    Returns: {"status": "success"|"failed"|"uncertain", "details": str}
    """
    # Step 1: Launch
    d.app_start("in.swiggy.android")
    time.sleep(2)
    dismiss_popups(d)

    # Step 2: Search for restaurant
    search_bar = d(text="Search for restaurant or dish")
    if not search_bar.exists(timeout=5):
        search_bar = d(description="Search")
    if not search_bar.exists(timeout=2):
        return {"status": "failed", "details": "Search bar not found"}
    search_bar.click()
    time.sleep(0.5)
    d.send_keys(restaurant)
    time.sleep(1)  # wait for results

    # Step 3: Select restaurant from results (TextView only, so the search
    # field that now contains the typed query is not matched)
    result = d(textContains=restaurant, className="android.widget.TextView")
    if not result.exists(timeout=5):
        return {"status": "failed", "details": f"Restaurant '{restaurant}' not found"}
    result.click()
    time.sleep(2)

    # Step 4: Add items
    for item_name in items:
        if not scroll_and_find(d, item_name):
            return {"status": "failed", "details": f"Item '{item_name}' not found"}
        item_el = d(text=item_name)
        # "ADD" is typically to the right of the item name; right() returns the
        # nearest matching element on that side, or None
        add_btn = item_el.right(text="ADD")
        if add_btn is not None and add_btn.exists(timeout=2):
            add_btn.click()
        # Fallback: first visible "ADD" button (may belong to a different item)
        elif not find_and_tap(d, [{"text": "ADD"}, {"text": "Add"}]):
            return {"status": "failed", "details": f"No ADD button for '{item_name}'"}
        time.sleep(1)
        # Accept default customizations if a sheet opened, then clear promos
        find_and_tap(d, [{"textContains": "Add Item"}], timeout=2)
        dismiss_popups(d)

    # Step 5: Go to cart
    if not find_and_tap(d, [
        {"textContains": "View Cart"},
        {"textContains": "Checkout"},
        {"description": "Cart"},
    ]):
        return {"status": "failed", "details": "Cart button not found"}
    time.sleep(2)

    # Step 6: Place order
    # (the Section 11.3 approval gate belongs here, before money moves)
    if not find_and_tap(d, [
        {"text": "Place Order"},
        {"text": "Proceed to Pay"},
        {"textContains": "Place Order"},
    ]):
        return {"status": "failed", "details": "Place Order button not found"}
    time.sleep(2)

    # Step 7: Confirm payment (assumes payment method is pre-configured).
    # Not treated as fatal: some flows pay without a separate confirm tap.
    find_and_tap(d, [
        {"text": "Pay"},
        {"textContains": "Confirm"},
        {"textContains": "Pay Now"},
    ])
    time.sleep(3)

    # Step 8: Verify order placed
    if d(textContains="Order Placed").exists(timeout=10):
        return {"status": "success", "details": "Order placed successfully"}
    if d(textContains="Order Confirmed").exists(timeout=5):
        return {"status": "success", "details": "Order confirmed"}

    # Fallback: screenshot for manual verification
    d.screenshot().save("/tmp/order_result.png")
    return {"status": "uncertain", "details": "Check /tmp/order_result.png"}
```

### 8.5 Performance Budget

| Step | uiautomator2 (scripted) | DroidRun (LLM-driven) |
|------|------------------------|----------------------|
| App launch | 2s | 2s |
| Popup dismissal | 1-3s | 3-5s |
| Search + select restaurant | 3-5s | 8-15s |
| Add items (3 items) | 5-10s | 15-30s |
| Cart + checkout | 3-5s | 8-12s |
| Payment confirmation | 2-3s | 5-8s |
| **Total** | **16-28s** | **41-72s** |

Add 0.5-2s random delays for human-like behavior: **scripted total ~25-40s**.

---

## 9. Other AI Android Agent Frameworks (Landscape)

Beyond DroidRun, several notable projects exist:

| Project | Approach | Best For |
|---------|----------|----------|
| **AutoGLM Phone 9B** (Zhipu AI) | Vision model + ADB coordinates. 9B parameter VLM that outputs tap coordinates from screenshots. 89.7% on common tasks, 36.2% on AndroidLab. | Self-hosted vision-based automation. Could run on Beast (~18 GB of weights in BF16, roughly 5-6 GB at INT4). |
| **mobile-use** (Minitap) | Agent framework, 100% on AndroidWorld (claimed). Uses accessibility tree. | Highest accuracy benchmark, but newer/less battle-tested. |
| **DroidBot-GPT** (MobileLLM) | Screenshot + view hierarchy snapshot → ChatGPT → action. Academic project. | Research reference, not production-ready. |
| **Droid-MCP** | MCP server for ADB. Exposes tap/swipe/screenshot as MCP tools. | MCP integration with Claude Desktop or similar. |
| **DroidMind** | MCP server for Android device management via ADB. | Device management + automation via AI. |
| **agent-device** (Callstack) | CLI for controlling iOS/Android for AI agents. | Cross-platform agent control. |

**For Annie's use case:** DroidRun is the best fit — it has Ollama support (local LLMs), good benchmark scores, active development, and the accessibility-tree-first approach is faster and cheaper than vision-only.

---

## 10. OTP & Authentication Strategy Summary

### 10.1 Decision Tree

```
App needs login?
├── No (session valid) → Proceed with order (99% of the time)
└── Yes (session expired)
    ├── Read OTP via ADB SMS content query
    │   └── adb shell content query --uri content://sms/inbox
    ├── If that fails → Read OTP from notification shade
    │   └── d.open_notification() → find OTP text → extract digits
    └── If all automated fails → Alert Rajesh via Telegram
        └── "Annie needs help: Swiggy login required, please enter OTP"
```

### 10.2 Pre-Authentication Checklist

Before first automated use:
1. Manually log in to Swiggy, Zomato, Zepto on the Pixel 9a
2. Set default delivery address in each app
3. Save preferred payment method (UPI ID or card)
4. Enable "Stay logged in" / "Remember me" options
5. Grant location permission (set to "Always" or "While using")
6. Disable battery optimization for each app (prevents background kill)

---

## 11. Open Questions & Next Steps

### 11.1 To Test on the Actual Device

1. **Session persistence:** Log in to all 3 apps, wait 1 week, verify sessions are still active.
2. **Resource IDs:** Run `d.dump_hierarchy()` on each app's key screens. Map stable selectors for: search bar, restaurant names, menu items, "Add" buttons, cart, "Place Order", payment confirmation.
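3. **uiautomator2 v3 on Android 16:** Run `python -m uiautomator2 init` on Pixel 9a (API 36). Verify the on-device agent installs and starts (uiautomator2 3.x reportedly replaced atx-agent with a pushed `u2.jar` server, so check which one the installed version uses).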
4. **ADB SMS query:** Verify `content query --uri content://sms/inbox` works on Android 16 without root.
5. **DroidRun Portal:** Install `droidrun-portal` APK, enable accessibility service, test basic commands.
6. **Play Integrity:** Run Play Integrity check with uiautomator2 atx-agent installed. Verify STRONG verdict still passes.
7. **Timing baseline:** Measure actual latency for a scripted Swiggy order (search → add → checkout) vs DroidRun-driven order.

### 11.2 Implementation Plan

**Phase 1 — Scripted Flows (1-2 days):**
- Install uiautomator2 on Panda, init on Pixel 9a
- Map UI selectors for Swiggy, Zomato, Zepto (dump hierarchy on key screens)
- Implement `order_from_swiggy()`, `order_from_zomato()`, `order_from_zepto()` functions
- Wire as Annie tools (Telegram trigger: "Annie, order biryani from Swiggy")

**Phase 2 — LLM Fallback (1 day):**
- Install DroidRun on Panda
- Install droidrun-portal on Pixel 9a, enable accessibility
- Implement fallback: if scripted flow fails → hand off to DroidRun agent with Nemotron
- Add screenshot-based verification after order placement

**Phase 3 — Robustness (ongoing):**
- Add popup dismissal database (grow KNOWN_DISMISSALS as new popups appear)
- Add retry logic with exponential backoff
- Monitor for app updates that break selectors → alert Rajesh
- Add payment confirmation safety gate (require Rajesh approval before final "Place Order" if amount > threshold)
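The Phase 3 retry item could start from a generic helper like the one below: doubling delays capped at `max_delay`, with random jitter. `retry_with_backoff` is a hypothetical name. One design caution: wrap only idempotent steps (launch, search, finding a button) this way. Blindly retrying the final payment tap could place a duplicate order.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**k (jittered, capped) and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids lockstep retries
    raise RuntimeError("unreachable")
```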

### 11.3 Safety Gates

**CRITICAL — Never auto-confirm orders above a threshold without approval:**

```python
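from typing import Awaitable, Callable

APPROVAL_THRESHOLD = 1000  # INR


async def confirm_order_with_approval(
    amount: float,
    details: str,
    telegram_ask: Callable[[str], Awaitable[bool]],  # Annie's Telegram yes/no prompt (not shown)
) -> bool:
    """Return True if the order may be placed."""
    if amount <= APPROVAL_THRESHOLD:
        return True  # Auto-approve small orders
    # Ask Rajesh via Telegram; anything other than an explicit True (e.g., a timeout) means no
    approved = await telegram_ask(
        f"Annie wants to place an order for Rs.{amount:.0f}:\n{details}\n\nApprove?"
    )
    return approved is True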
```

---

## Sources

- [Swiggy — Google Play Store](https://play.google.com/store/apps/details?id=in.swiggy.android)
- [Zomato — Google Play Store](https://play.google.com/store/apps/details?id=com.application.zomato)
- [Zepto — Google Play Store](https://play.google.com/store/apps/details?id=com.zeptoconsumerapp)
- [DroidRun — GitHub](https://github.com/droidrun/droidrun)
- [DroidRun Portal — GitHub](https://github.com/droidrun/droidrun-portal)
- [DroidRun Documentation](https://docs.droidrun.ai/v2/overview)
- [DroidRun Ollama Guide](https://droidrun.mintlify.app/v3/guides/ollama)
- [DroidRun AndroidWorld Benchmark](https://droidrun.ai/benchmark/)
- [DroidRun State-of-the-Art Method](https://droidrun.ai/benchmark/method/)
- [DroidClaw — GitHub](https://github.com/unitedbyai/droidclaw)
- [mobile-use (Minitap) — GitHub](https://github.com/minitap-ai/mobile-use)
- [AutoGLM Phone 9B — Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B)
- [Open-AutoGLM — GitHub](https://github.com/zai-org/Open-AutoGLM)
- [Droid-MCP — PyPI](https://pypi.org/project/droid-mcp/)
- [DroidMind — GitHub](https://github.com/hyperb1iss/droidmind)
- [DroidBot-GPT — GitHub](https://github.com/MobileLLM/DroidBot-GPT)
- [droidrunnerd (HTTP queue server) — GitHub](https://github.com/8ff/droidrunnerd)
- [openatx/uiautomator2 — GitHub](https://github.com/openatx/uiautomator2)
- [uiautomator2 API Docs](https://uiautomator2.readthedocs.io/en/latest/api.html)
- [Decoding UI/UX of India's Food Delivery Apps — Medium](https://adadithya.medium.com/decoding-ui-ux-of-indias-food-delivery-apps-ea395c0c0432)
- [Swiggy vs Zomato UX — Medium](https://medium.com/@diginext27/swiggy-vs-zomato-who-nails-the-food-delivery-ux-in-india-3ccb520476c8)
- [Swiggy Tech Stack — Quora](https://www.quora.com/What-is-the-technology-stack-of-Swiggy)
- [Play Integrity API — Android Developers](https://developer.android.com/google/play/integrity/overview)
- [Play Integrity App Access Risk — Google](https://support.google.com/googleplay/android-developer/answer/11395166)
- [Appdome MobileBOT Defense](https://appdome.com/how-to/mobile-bot-detection/mobile-bot-defense/mobilebot-detection)
- [Android Accessibility Service — Android Developers](https://developer.android.com/guide/topics/ui/accessibility/service)
- [Google Play Accessibility Service Policy](https://support.google.com/googleplay/android-developer/answer/10964491)
- [Android 16 Behavior Changes](https://developer.android.com/about/versions/16/behavior-changes-16)
- [ADB Content Query for SMS — PhoneSploit-Pro Discussion](https://github.com/AzeemIdrisi/PhoneSploit-Pro/discussions/74)
- [From OpenClaw to DroidClaw — Medium](https://medium.com/write-a-catalyst/from-openclaw-to-droidclaw-your-friendly-guide-to-android-ai-automation-2b17de9ed67f)
- [Mobile AI Agents Tested 2026 — AImultiple](https://aimultiple.com/mobile-ai-agent)
