# Research: Paper-to-Notebook (VizuaraAI/paper-to-notebook)

**Date:** 2026-03-27 | **Session:** 373 | **Verdict: REJECTED — use Claude Code CLI instead**

## What It Does

Converts research papers (PDF or arXiv URL) into executable Jupyter notebooks with real PyTorch implementations. Users upload a paper and get back a runnable `.ipynb`.

**Repo:** `github.com/VizuaraAI/paper-to-notebook`

## Architecture

A thin orchestration layer over the Gemini 2.5 Pro API, structured as a four-step LLM pipeline:

| Step | Purpose | Max Tokens |
|------|---------|------------|
| **Analyze** | Extract metadata, algorithms, equations from PDF | 8,192 |
| **Design** | Plan toy implementation (architecture, training config) | 8,192 |
| **Generate** | Produce complete notebook as JSON cell array (11 sections) | 65,536 |
| **Validate** | Review cells for undefined vars, missing imports, placeholders | 65,536 |

**Key design choice:** No local PDF parsing. Raw PDF bytes are sent directly to Gemini's multimodal API via `types.Part.from_bytes()`. The LLM handles all text extraction.

### Tech Stack
- **Backend:** FastAPI + `google-genai` + `nbformat`
- **Frontend:** Next.js 14 + Tailwind (not relevant for Annie)
- **LLM:** Gemini 2.5 Pro (hardcoded, multimodal PDF input)
- **Dependencies:** `google-genai`, `nbformat`, `httpx`, `fastapi`

### Core Logic (~400 lines of Python)
- 4 prompt templates (the actual IP — ~200 lines of prompt engineering)
- `call_gemini_with_retry()` — retry with escalating backoff delays (5s, 15s, 30s)
- `parse_llm_json()` — strips markdown fences, calls Gemini to repair malformed JSON
- `build_notebook()` — ~20 lines of nbformat assembly
- arXiv URL handling — regex ID extraction, async PDF download via httpx
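
The retry and arXiv helpers above are small enough to sketch with the standard library. This is a minimal Python sketch, not the repo's code: `call_with_retry`, `extract_arxiv_id`, and `arxiv_pdf_url` are hypothetical names. It assumes one initial attempt plus one retry per delay, and the regex only covers new-style arXiv IDs (`YYMM.NNNNN`, optional `vN`), not old-style ones such as `hep-th/9901001`.

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Delay schedule taken from the summary above; the rest is a hypothetical sketch.
RETRY_DELAYS: Tuple[float, ...] = (5.0, 15.0, 30.0)


async def call_with_retry(
    call: Callable[[], Awaitable[T]],
    delays: Tuple[float, ...] = RETRY_DELAYS,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Try `call`; after each failure wait the next delay, then try again.

    Makes len(delays) + 1 attempts in total; the last attempt's error propagates.
    """
    for delay in delays:
        try:
            return await call()
        except Exception:  # simplification: real code should retry only transient errors
            await sleep(delay)
    return await call()


# New-style arXiv IDs only (YYMM.NNNN or YYMM.NNNNN, optional version suffix).
ARXIV_ID_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)")


def extract_arxiv_id(url: str) -> Optional[str]:
    match = ARXIV_ID_RE.search(url)
    return match.group(1) if match else None


def arxiv_pdf_url(arxiv_id: str) -> str:
    # arXiv serves the PDF for an ID at /pdf/<id>
    return f"https://arxiv.org/pdf/{arxiv_id}"
```

Injecting `sleep` keeps the backoff testable without real waits; the resulting URL is what an `httpx.AsyncClient` download would fetch.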

## Cost Analysis

| Model | Input (per 1M) | Output (per 1M) | Est. per Paper |
|-------|----------------|------------------|----------------|
| Gemini 2.5 Pro | $1.25 | $10.00 | ~$1.71 |
| Gemini 2.5 Flash | $0.30 | $2.50 | ~$0.43 |
| Claude Sonnet | $3.00 | $15.00 | ~$2.79 |

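Estimates assume ~200K input tokens plus ~146K output tokens across the 4 pipeline steps. The output figure is close to the sum of the per-step caps (8,192 + 8,192 + 65,536 + 65,536 = 147,456), so it approximates the output ceiling for one run, excluding retries and JSON-repair calls.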

Gemini free tier is available but rate-limited.
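
The per-paper column follows directly from the token assumptions: cost = (input tokens × input price + output tokens × output price) / 1M. A quick Python check, with prices copied from the table above and the token counts as stated:

```python
# Per-1M-token prices (USD) copied from the cost table.
PRICES = {
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Sonnet": (3.00, 15.00),
}

INPUT_TOKENS = 200_000   # ~200K input across the 4 steps
OUTPUT_TOKENS = 146_000  # ~146K output across the 4 steps


def cost_per_paper(input_price: float, output_price: float,
                   input_tokens: int = INPUT_TOKENS,
                   output_tokens: int = OUTPUT_TOKENS) -> float:
    """Dollar cost for one paper, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model, (inp, out) in PRICES.items():
    print(f"{model}: ${cost_per_paper(inp, out):.3f}")
```

Flash works out to $0.425 under these assumptions, which the table rounds to ~$0.43; Pro and Sonnet come out at $1.71 and $2.79.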

## Why We Rejected It

1. **It's just a Gemini API wrapper.** The entire "product" is 4 prompts + JSON parsing + nbformat assembly. No novel algorithms, no local ML models, no special PDF processing.

2. **Adds an unnecessary dependency.** It would require the `google-genai` SDK and a Gemini API key, adding a third LLM provider alongside the Claude API and Nemotron. That is a maintenance burden for minimal value.

3. **Claude Code CLI already does this.** Claude Code can read PDFs (multimodal input), write PyTorch code, create .ipynb files (which are plain JSON), and run and validate the output. Given a complex task, it tends to follow the same analyze→design→generate→validate pattern on its own.

4. **Already accessible via Telegram.** The existing `/claude` command routes to Claude Code on Titan. No new tool needed:
   ```
   /claude Read the paper at arxiv.org/abs/XXXX.XXXXX and create a
   Jupyter notebook implementing it in PyTorch. Save to /tmp/paper_impl.ipynb
   ```

## What's Worth Stealing

### Patterns for Paper-to-Notebook Specifically

- **The 4-step prompt chain pattern** is sound. Breaking paper→notebook into analyze/design/generate/validate produces better results than a single massive prompt.
- **The 11-section notebook structure** (title, problem intuition, imports, dataset, model, loss, baseline, main algorithm, inference, experiment, visualizations, summary) is a good template.
- **Prompt constraint techniques** that force complete output: "NO placeholders like `#TODO` or `pass`", "use REAL PyTorch (`torch.nn.Module`, actual training loops)", explicit section numbering with exact count.
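
The "explicit section numbering with exact count" technique can be sketched as a small prompt builder. This is an illustrative Python sketch, not the repo's actual prompt: `build_generate_prompt` and its wording are assumptions, and the two constraint lines paraphrase the ones quoted above.

```python
from typing import Sequence

# Constraint wording paraphrased from the repo's prompts as summarized above.
CONSTRAINTS = (
    "NO placeholders like `# TODO` or `pass`.",
    "Use REAL PyTorch (`torch.nn.Module`, actual training loops).",
)


def build_generate_prompt(paper_summary: str, sections: Sequence[str]) -> str:
    """Build a generation prompt that pins the exact number and order of sections."""
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(sections, start=1))
    rules = "\n".join(f"- {c}" for c in CONSTRAINTS)
    return (
        f"Paper analysis:\n{paper_summary}\n\n"
        f"Produce a notebook with EXACTLY {len(sections)} sections, in this order:\n"
        f"{numbered}\n\n"
        f"Rules:\n{rules}\n"
        "Return only a JSON array of cells."
    )
```

Deriving the count from the list itself keeps the stated number and the enumerated sections from drifting apart.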

### Patterns Transferable to Annie (Quick Wins)

1. **LLM-Based JSON Repair** — When the LLM returns malformed JSON, call it again with "You are a JSON repair tool. Return only valid JSON." plus the broken text. This is cheap (small input and output) and would save Annie from silent failures when Beast occasionally emits malformed tool calls. Annie currently has no recovery path for this.

2. **Concurrency Semaphore for Expensive Ops** — Wrap Claude API calls in `asyncio.Semaphore(N)`. Annie has a GPU queue for model inference but no backpressure on Claude API calls, so multiple simultaneous Telegram users or subagents can fire unbounded requests.
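
Both quick wins fit in a few dozen lines of `asyncio`. A minimal sketch, assuming a hypothetical LLM interface that is just an async function from prompt text to reply text; `parse_json_with_repair`, `strip_fences`, and `BoundedLLM` are illustrative names, not existing Annie or repo APIs. The repair prompt is the one quoted above.

```python
import asyncio
import json
import re
from typing import Any, Awaitable, Callable

# Hypothetical LLM interface: async prompt text -> reply text.
LLM = Callable[[str], Awaitable[str]]

REPAIR_PROMPT = "You are a JSON repair tool. Return only valid JSON.\n\n"
FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n?(.*?)\n?\s*```\s*$", re.DOTALL)


def strip_fences(text: str) -> str:
    """Remove a surrounding ```json ... ``` fence if the reply has one."""
    match = FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


async def parse_json_with_repair(llm: LLM, text: str) -> Any:
    """Parse JSON; on failure, ask the LLM once to repair it."""
    try:
        return json.loads(strip_fences(text))
    except json.JSONDecodeError:
        repaired = await llm(REPAIR_PROMPT + text)
        return json.loads(strip_fences(repaired))  # a second failure propagates


class BoundedLLM:
    """Wraps an LLM so at most `limit` calls are in flight at once."""

    def __init__(self, llm: LLM, limit: int) -> None:
        self._llm = llm
        self._sem = asyncio.Semaphore(limit)

    async def __call__(self, prompt: str) -> str:
        async with self._sem:  # excess callers wait here (backpressure)
            return await self._llm(prompt)
```

Passing a `BoundedLLM` instance as the `llm` argument makes the repair call share the same concurrency limit as every other call.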

### Patterns Transferable to Annie (Architectural)

3. **Multi-Step Chained Pipeline** — Annie's subagents are single-shot (one prompt, one response). This repo chains steps where each output becomes the next input. Generalizes to: deep research (gather → synthesize → draft → review), email drafting (context → tone → draft → critique), scheduling (constraints → propose → validate → confirm). A generic `run_pipeline(steps)` utility would enable these.

4. **Draft Availability Mid-Pipeline** — The notebook is usable after step 3 (generate), before step 4 (validate), so the user gets a draft immediately while validation runs in the background. Annie currently blocks until the entire operation completes; for long-running tasks, surfacing intermediate results feels much more responsive.

5. **Validation as Separate LLM Step** — The LLM reviews its own output for undefined variables, missing imports, and logical flow. Transferable to any Annie tool that generates code (`execute_python`, Claude Code delegation, subagent drafts).
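
Patterns 3 to 5 combine naturally in one small utility. A minimal sketch of the generic `run_pipeline(steps)` idea, assuming each step is an async function that takes the previous step's output as a string; `Step`, `publishes_draft`, and `on_draft` are hypothetical names, and the toy stages stand in for real LLM calls. Validation is simply the last step, and any step flagged `publishes_draft` surfaces its output before the pipeline finishes.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical step signature: previous step's output in, this step's output out.
StepFn = Callable[[str], Awaitable[str]]
DraftCallback = Callable[[str, str], Awaitable[None]]


@dataclass
class Step:
    name: str
    run: StepFn
    publishes_draft: bool = False  # surface this step's output before the pipeline ends


async def run_pipeline(
    steps: Sequence[Step],
    initial_input: str,
    on_draft: Optional[DraftCallback] = None,
) -> str:
    """Run steps in order, chaining each step's output into the next step."""
    current = initial_input
    for step in steps:
        current = await step.run(current)
        if step.publishes_draft and on_draft is not None:
            await on_draft(step.name, current)  # e.g. send the draft to Telegram now
    return current


# Toy stages standing in for LLM calls.
async def analyze(paper: str) -> str:
    return f"analysis({paper})"


async def generate(analysis: str) -> str:
    return f"notebook({analysis})"


async def validate(notebook: str) -> str:
    return f"validated({notebook})"


PAPER_PIPELINE = [
    Step("analyze", analyze),
    Step("generate", generate, publishes_draft=True),
    Step("validate", validate),
]

if __name__ == "__main__":
    print(asyncio.run(run_pipeline(PAPER_PIPELINE, "paper.pdf")))
```

Here the draft callback is awaited before the next step starts; if delivery is slow, it could instead be scheduled with `asyncio.create_task` so validation is not held up.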

## Revised Decision: Kernel-Orchestrated Agent

**Original verdict** was "just use Claude Code CLI." But further exploration revealed that Annie's kernel `AgentOrchestrator` (`orchestrator.py`) natively supports multi-step chained pipelines — making a first-class paper-to-notebook agent trivial to build (~200 lines, zero kernel extensions).

### Architecture

```
Telegram: "Convert this paper: https://arxiv.org/abs/XXXX.XXXXX"
  → Annie intent recognition → kernel TaskQueue.submit(priority=NORMAL)
  → PaperToNotebookOrchestrator (3 stages, Claude API):
      Stage 1: ANALYZE  — httpx downloads PDF, Claude extracts structure/algorithms
      Stage 2: GENERATE — Claude produces complete notebook cells (chained from stage 1)
      Stage 3: VALIDATE — Claude self-reviews for undefined vars, missing imports, fixes
  → .ipynb saved to ~/.her-os/annie/task_results/<task-id>.ipynb
  → Telegram bot sends file attachment
```
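
The save step at the end of the diagram amounts to wrapping the generated cells in notebook JSON. The document's stack uses `nbformat` for this; the sketch below instead writes a minimal nbformat 4.5 JSON document with the standard library so it runs without extra dependencies. It assumes the generate stage returns cells as dicts with `cell_type` (`"code"` or `"markdown"`) and `source` keys; `build_notebook` and `save_notebook` are hypothetical names, and `results_dir` would be `~/.her-os/annie/task_results/` in the design above.

```python
import json
import uuid
from pathlib import Path
from typing import Any, Dict, List


def build_notebook(cells: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble generated cells into a minimal nbformat 4.5 notebook dict."""
    nb_cells = []
    for cell in cells:
        kind = cell["cell_type"]
        if kind not in ("code", "markdown"):  # this sketch handles only these two
            raise ValueError(f"unsupported cell_type: {kind!r}")
        nb_cell: Dict[str, Any] = {
            "cell_type": kind,
            "id": uuid.uuid4().hex[:8],  # nbformat 4.5 requires a per-cell id
            "metadata": {},
            "source": cell["source"],
        }
        if kind == "code":
            nb_cell["execution_count"] = None  # never executed
            nb_cell["outputs"] = []
        nb_cells.append(nb_cell)
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
            "language_info": {"name": "python"},
        },
        "cells": nb_cells,
    }


def save_notebook(cells: List[Dict[str, Any]], task_id: str, results_dir: Path) -> Path:
    """Write <task_id>.ipynb under results_dir and return its path."""
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_dir / f"{task_id}.ipynb"
    path.write_text(json.dumps(build_notebook(cells), indent=1), encoding="utf-8")
    return path
```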

### Components Needed

| Component | File | Lines |
|-----------|------|-------|
| Orchestrator subclass | `paper_notebook_agent.py` (new) | ~150 |
| Agent YAML definition | `agents/paper-to-notebook.yaml` (new) | ~15 |
| Tool schema + dispatch | `text_llm.py` (modify) | ~20 |
| ToolAdapter | `tool_adapters.py` (modify) | ~15 |
| Dependency | `requirements.txt` (+nbformat) | 1 |

### Why This Works Natively

- `AgentOrchestrator` (`orchestrator.py:86-283`) — template method pattern, sequential stages with output chaining, per-stage retries, approval gates
- `AgentDiscovery` (`agent_discovery.py:124-272`) — YAML-driven, hot-reload on file change
- `TaskQueue` (`task_queue.py:32-327`) — priority scheduling, persistence, voice gating
- `task_delivery.py` — result files for Telegram pickup
- `httpx` already in requirements (for arXiv PDF download)
- Claude `AsyncAnthropic` already wired in `subagent_tools.py`

### Prompts to Steal from VizuaraAI Repo

The 4 prompt templates in `backend/app.py` (lines 60-255) are MIT-licensed. Key elements:
- 11-section notebook structure (title → problem intuition → imports → dataset → model → loss → baseline → main algorithm → inference → experiment → visualizations → summary)
- Constraint forcing: "NO placeholders", "use REAL PyTorch", explicit section count
- Validation checklist: undefined variables, missing imports, syntax errors, logical flow

We collapse their 4 steps to 3 by merging analyze and design into one Claude call, on the expectation that Claude handles both well in a single pass.
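
Parts of the validation checklist can be pre-screened locally before spending an LLM call. This heuristic Python sketch is an addition, not something the VizuaraAI repo does: `precheck_cells` is a hypothetical name, it uses only `ast` from the standard library, and its undefined-name check is deliberately rough (it ignores use-before-definition inside a cell and scoping rules). Its findings are hints to feed into the Claude validation stage, not a replacement for it.

```python
import ast
import builtins
from typing import List


def precheck_cells(code_cells: List[str]) -> List[str]:
    """Cheap heuristic checks on code cells, in notebook order."""
    problems: List[str] = []
    defined = set(dir(builtins))  # names visible before any cell runs
    for i, src in enumerate(code_cells, start=1):
        try:
            tree = ast.parse(src)
        except SyntaxError as exc:
            problems.append(f"cell {i}: syntax error on line {exc.lineno}: {exc.msg}")
            continue
        if "TODO" in src:
            problems.append(f"cell {i}: contains a TODO placeholder")
        loads = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Pass):
                problems.append(f"cell {i}: bare `pass` on line {node.lineno}")
            elif isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    defined.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    loads.append(node)
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                defined.add(node.name)
            elif isinstance(node, ast.arg):  # function and lambda parameters
                defined.add(node.arg)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    defined.add((alias.asname or alias.name).split(".")[0])
            elif isinstance(node, ast.ExceptHandler) and node.name:
                defined.add(node.name)
        # Check loads only after the whole cell, so later cells see earlier definitions.
        for node in loads:
            if node.id not in defined:
                problems.append(f"cell {i}: `{node.id}` may be undefined (line {node.lineno})")
    return problems
```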

## Stealable Patterns (General)

Quick wins (#1 JSON repair, #2 concurrency semaphore) are adoptable independently of this project. See "Patterns Transferable to Annie" sections above.