---
name: paper-research
version: 0.2.0
description: Autonomous research loop that files results into a paper.farzai.com vault via the bound MCP server. Takes a topic, runs iterative WebSearch + WebFetch queries, extracts claims and entities, then files structured source notes, topic pages for recurring concepts, and a synthesis note using create_collection / create_note / add_to_collection. Searches the vault for prior knowledge before the web. When invoked without a topic, calls suggest_research_topics to surface the vault's frontier. Output goes into the vault, not the chat. Triggers on "/paper-research", "paper-research", "research [topic] into paper", "deep dive [topic]", "investigate [topic] and file", "research and file to paper", "go research [topic]", "build a research session on [topic]".
allowed-tools: Read Write Glob Grep WebFetch WebSearch
---

# paper-research: Autonomous Research Loop → paper.farzai.com

You are a research agent. You take a topic, run iterative web searches, extract structured findings, and file everything into a paper.farzai.com vault via the bound MCP server. The user gets vault pages and a `/app/vaults/<vaultId>/research/<topic-slug>` URL, not a chat answer.

This skill is a port of the claude-obsidian `autoresearch` pattern to the paper.farzai.com MCP surface. The configurable program in `references/program.md` controls max rounds, max pages, confidence rules, and source-preference rules.

---

## Before You Start

1. Read `references/program.md` — it sets the loop constraints (max rounds, max pages per session, confidence scoring rules, source-preference rules). Treat any value in that file as authoritative; the defaults below are last-resort fallbacks.

2. Detect which MCP server you are bound to. paper.farzai.com exposes two:
   - **personal vault** server (URL contains `/api/mcp/v/<vaultId>`) — has `create_note`, `create_collection`, `add_to_collection`. This is the target for research sessions.
   - **global vault** server (URL contains `/api/mcp/global`): accepts no direct writes; changes land only through PRs. Research sessions do NOT file directly to global; if only `paper-global` is bound, ask the user to also bind a personal server (`paper-personal`) and stop.

3. Probe `get_vault` on the personal server. On 401, tell the user to reconnect at `/app/connections` (Claude Code's OAuth flow will issue a new access token) and stop.

4. Confirm the topic with the user before kicking off the loop. The topic becomes the **topic-slug** — a lowercase kebab-case identifier no longer than 60 characters, derived from the topic phrase. Examples:
   - "Model Context Protocol" → `model-context-protocol`
   - "How LLMs handle long contexts" → `long-context-handling`

   **4a — No topic given.** If the user invoked `/paper-research` with no topic, ask the server which notes are at the frontier of the vault:

   ```
   suggest_research_topics({ limit: 5 })
   → { suggestions: [{ path, title, out, in, score }, …] }
   ```

   Present the candidate list as a numbered prompt:

   > Frontier notes in your vault. Which one should I research?
   > 1. <Title> — /<path>
   > 2. ...
   > Type a number 1-5, type a free-text topic to override, or "cancel".

   - Pick 1-5 → use the selected note's title as the topic.
   - Free text → use that.
   - "cancel" → ask: "What topic should I research?"

   **4b — Explicit topic given.** Use it verbatim; skip 4a.
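As a rough illustration of the slug rule in step 4, here is a minimal sketch of a mechanical kebab-case conversion. The function name `toSlug` and its `maxLen` parameter are hypothetical and not part of the skill or the server. It reproduces the first example above, but not the second: `long-context-handling` is an editorial shortening that needs judgment, so treat the function as a baseline to refine by hand.

```typescript
// Mechanical baseline only: lowercase, collapse every run of
// non-alphanumeric characters into one hyphen, then cap the length.
function toSlug(phrase: string, maxLen: number = 60): string {
  const kebab = phrase
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // spaces and punctuation become "-"
    .replace(/^-+|-+$/g, "");    // strip leading and trailing hyphens
  // Cutting at maxLen can leave a hyphen at the end, so strip it again.
  return kebab.slice(0, maxLen).replace(/-+$/, "");
}

const exampleTopicSlug = toSlug("Model Context Protocol"); // "model-context-protocol"
```

The same helper could be reused with a smaller cap wherever a shorter slug is required. An empty result (for example, a topic made only of punctuation) should be treated as a prompt to ask the user for a different phrase.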

---

## Path Conventions (load-bearing)

paper.farzai.com enforces a strict path schema on notes (lowercased, `[a-z0-9-_.]` per segment, max depth 8). The skill MUST file research notes into the following paths:

| Kind | Path |
|---|---|
| Source note | `research/<topic-slug>/sources/<source-slug>` |
| Synthesis note | `research/<topic-slug>/synthesis` |
| Topic page | `topics/<topic-slug>` (vault-root, cap-independent) |
| Research collection | slug `research-<topic-slug>` (NO slashes — collection slugs are flat) |

The `/app/vaults/<vaultId>/research` view filters collections by the `research-` slug prefix. The detail page at `/app/vaults/<vaultId>/research/<topic-slug>` looks up the collection by slug `research-<topic-slug>` and lists every note under `research/<topic-slug>/sources/`.
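The path table can be expressed as a few small builders plus a local pre-check. This is a sketch under assumptions: the helper names are hypothetical, "max depth 8" is read as "at most 8 slash-separated segments", and the server may enforce further rules that this check does not cover. The server's own validation remains authoritative.

```typescript
// Allowed characters per segment, as stated in the path schema.
const SEGMENT_PATTERN = /^[a-z0-9._-]+$/;
const MAX_DEPTH = 8; // assumption: depth means number of segments

function isValidNotePath(path: string): boolean {
  const segments = path.split("/");
  if (segments.length > MAX_DEPTH) return false;
  // An empty segment (leading, trailing, or doubled slash) fails the pattern.
  return segments.every(function (segment) {
    return SEGMENT_PATTERN.test(segment);
  });
}

function sourcePath(topicSlug: string, sourceSlug: string): string {
  return "research/" + topicSlug + "/sources/" + sourceSlug;
}

function synthesisPath(topicSlug: string): string {
  return "research/" + topicSlug + "/synthesis";
}

function topicPagePath(slug: string): string {
  return "topics/" + slug;
}

// Collection slugs are flat, so the prefix is joined with a hyphen.
function collectionSlug(topicSlug: string): string {
  return "research-" + topicSlug;
}
```

Running `isValidNotePath` before each `create_note` call is a cheap way to catch a bad slug locally instead of spending a request on a `path_invalid` response.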

**Soft cap**: the server rejects new source notes with error code `research_session_full` once a topic has 25 active source notes. Plan ahead — pick the 5-15 best sources per session, not 25+.

---

## Frontmatter Contract

paper.farzai.com validates frontmatter via a known-keys allowlist (`type`, `tags`, `status`, `aliases`, `description`, `canonical`, `created`, `updated`, `published`, `publishedAt`, `extra`). Unknown keys are NOT rejected — they are bucketed into `extra` automatically. Use this to attach research metadata.
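To make the bucketing behaviour concrete, the sketch below mimics it client-side. It is not the server's implementation: the function name `bucketFrontmatter` is hypothetical, and how the server merges an explicitly supplied `extra` object with unknown top-level keys is an assumption here.

```typescript
// Known keys, copied from the allowlist described above.
const KNOWN_KEYS = [
  "type", "tags", "status", "aliases", "description", "canonical",
  "created", "updated", "published", "publishedAt", "extra",
];

type Frontmatter = { [key: string]: unknown };

// Illustration only: known keys stay at the top level, everything else
// is moved under `extra`.
function bucketFrontmatter(input: Frontmatter): Frontmatter {
  const result: Frontmatter = {};
  const extra: Frontmatter = {};

  // Assumption: an explicit `extra` object is carried over as-is.
  const given = input["extra"];
  if (given !== null && typeof given === "object" && !Array.isArray(given)) {
    const givenExtra = given as Frontmatter;
    Object.keys(givenExtra).forEach(function (key) {
      extra[key] = givenExtra[key];
    });
  }

  Object.keys(input).forEach(function (key) {
    if (key === "extra") return;
    if (KNOWN_KEYS.indexOf(key) >= 0) {
      result[key] = input[key];
    } else {
      extra[key] = input[key];
    }
  });

  if (Object.keys(extra).length > 0) result["extra"] = extra;
  return result;
}
```

Under this reading, passing `{ type: "source", topic: "mcp" }` would keep `type` at the top level and place `topic` under `extra`, which is why the research metadata keys in the templates below can be written flat.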

### Source note frontmatter

```yaml
type: source
tags: [research, <topic-tag>]
description: <one-line description>
canonical: <original URL>
created: <ISO-8601 UTC>
# Unknown keys land in `extra`:
topic: <topic-slug>
url: <original URL>
retrieved_at: <ISO-8601 UTC>
confidence: high | medium | low
source_type: paper | docs | post | spec | repo | thread | other
```

`canonical: <url>` makes the source's origin visible in the editor metadata panel; `url:` (in extra) lets downstream queries find it.

### Synthesis note frontmatter

```yaml
type: note
tags: [research, synthesis, <topic-tag>]
description: <one-line summary of the synthesis>
created: <ISO-8601 UTC>
status: developing | complete
# In extra:
topic: <topic-slug>
sources: <count>
rounds: <count>
```

### Topic page frontmatter

```yaml
type: concept   # or 'entity' — both valid FrontmatterSchema values
tags: [topic, research, <topic-tag>]
description: <one-line definition>
created: <ISO-8601 UTC>
# In extra:
first_seen_in: <session-topic-slug>
mentioned_in: [<source-slug-1>, <source-slug-2>, ...]
mention_count: <number>
```

---

## Workflow

### Step 1 — Create the collection

```
create_collection({
  slug: "research-<topic-slug>",
  name: "Research: <Topic Display Name>",
  description: "Autoresearch session on <topic>",
})
```

Error handling: if `slug_taken` comes back, the session already exists. Ask the user whether to (a) resume by adding more sources to the existing topic, or (b) pick a new topic-slug.

### Step 2 — Search vault for prior knowledge

Before WebSearch, ask the server what the vault already knows. This gives Round 1 a context primer + lets the synthesis link back to existing notes.

```
search_notes({ query: "<topic display name>", tags: ["research"], limit: 20 })
search_notes({ query: "<topic display name>", limit: 20 })   # broad fallback
```

Keep only the hits you judge relevant to the topic and store them as `priorHits` for Step 7. If `search_unavailable: true` comes back (Meili down), proceed with the web search. If both queries return 0 hits, log "no prior knowledge found, proceeding with web search" and move on; this is not an error.
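A minimal sketch of combining the two `search_notes` responses into `priorHits` follows. The response shape (`hits`, `path`, `title`) and the function name `mergePriorHits` are assumptions for illustration; only `search_unavailable` is taken from the text above. Relevance filtering is still a judgment call and is not modelled here.

```typescript
// Hypothetical shapes: the real search_notes response may differ.
interface Hit { path: string; title: string; }
interface SearchResult { hits: Hit[]; search_unavailable?: boolean; }

function mergePriorHits(
  tagged: SearchResult,
  broad: SearchResult
): { priorHits: Hit[]; log: string } {
  if (tagged.search_unavailable === true || broad.search_unavailable === true) {
    return { priorHits: [], log: "search unavailable, proceeding with web search" };
  }
  // De-duplicate by path; the tagged query's hits come first.
  const seen = new Set<string>();
  const priorHits: Hit[] = [];
  tagged.hits.concat(broad.hits).forEach(function (hit) {
    if (!seen.has(hit.path)) {
      seen.add(hit.path);
      priorHits.push(hit);
    }
  });
  const log = priorHits.length === 0
    ? "no prior knowledge found, proceeding with web search"
    : priorHits.length + " prior notes found";
  return { priorHits: priorHits, log: log };
}
```

Keeping the tagged results first means notes already marked `research` take precedence when the same path appears in both responses.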

### Step 3 — Decompose the topic

Break the topic into **3-5 distinct search angles**. Each angle is one facet — e.g., for "Model Context Protocol" you might pick:
1. Official spec + primitives
2. Reference implementations (SDKs, transports)
3. Use cases + ecosystem (which agents use it, why)
4. Comparisons to similar protocols (LSP, OpenAI tool-use, etc.)
5. Critiques + open issues

Skip angles that overlap; aim for breadth.

### Step 4 — Search & fetch (rounds)

For each round (max 3, per `program.md`):

```
For each angle:
  - WebSearch 2-3 queries → pick top 2-3 results
  - WebFetch each top result
  - Extract from each fetch: key claims, entities, concepts, open questions
```

Round 1 is broad. Round 2 fills gaps + contradictions identified in Round 1. Round 3 (optional) is a final pass for remaining major gaps.

Stop when the max page count from `program.md` is hit, OR every angle has at least one usable source, OR the max number of rounds (default 3) has run.
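The stop rule is a plain disjunction of three conditions, which can be sketched as a predicate. The interface and field names below are hypothetical; the limits are expected to come from `references/program.md`.

```typescript
interface LoopState {
  roundsRun: number;        // rounds completed so far
  pagesFetched: number;     // pages fetched in this session
  anglesWithSource: number; // angles that have at least one usable source
  totalAngles: number;      // angles chosen in Step 3
}

interface LoopLimits {
  maxRounds: number;
  maxPages: number;
}

// True as soon as any one of the three stop conditions holds.
function shouldStop(state: LoopState, limits: LoopLimits): boolean {
  const pageCapHit = state.pagesFetched >= limits.maxPages;
  const allAnglesCovered = state.anglesWithSource >= state.totalAngles;
  const roundCapHit = state.roundsRun >= limits.maxRounds;
  return pageCapHit || allAnglesCovered || roundCapHit;
}
```

Checking the predicate after every fetch, not only at round boundaries, keeps the session from overshooting the page cap in the middle of a round.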

### Step 5 — File source notes + track mentions

While extracting sources, keep an in-memory tally of every `[[X]]` mention across source bodies. Step 6 uses this to decide which wikilinks deserve a standalone topic page:

```
mentionCounts = new Map()   // X → { count, sources: [source-slug, ...] }
for each source filed:
  for each [[X]] in source body:
    mentionCounts[X] ??= { count: 0, sources: [] }
    mentionCounts[X].count++
    mentionCounts[X].sources.push(<source-slug>)
```
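The tally above, written out as a TypeScript sketch. The function name `tallyMentions` and the input shape are hypothetical. It treats `[[Target|label]]` as a mention of `Target`, which is an assumption about how aliased wikilinks should count. As in the pseudocode, every occurrence is counted, so a source that mentions `[[X]]` twice contributes two to the count and appears twice in `sources`.

```typescript
interface MentionInfo { count: number; sources: string[]; }

function tallyMentions(
  filed: { slug: string; body: string }[]
): Map<string, MentionInfo> {
  const mentionCounts = new Map<string, MentionInfo>();
  filed.forEach(function (source) {
    // Matches [[Target]] and [[Target|label]]; group 1 is the target.
    const wikilink = /\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g;
    let match = wikilink.exec(source.body);
    while (match !== null) {
      const target = (match[1] || "").trim();
      if (target !== "") {
        let info = mentionCounts.get(target);
        if (info === undefined) {
          info = { count: 0, sources: [] };
          mentionCounts.set(target, info);
        }
        info.count += 1;
        info.sources.push(source.slug);
      }
      match = wikilink.exec(source.body);
    }
  });
  return mentionCounts;
}
```

If the threshold should instead mean "mentioned in at least N distinct sources", de-duplicating `sources` per target and comparing its length would be the natural variant.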

For every distinct source kept after deduplication:

```
create_note({
  path: "research/<topic-slug>/sources/<source-slug>",
  title: "<Source Title>",
  body: <body — see below>,
  frontmatter: <source-frontmatter — see above>,
})
```

The **source-slug** is a stable kebab-case derivative of the source title, ≤40 chars. Example: "Anthropic Introducing MCP" → `anthropic-introducing-mcp`.

**Source note body** (markdown):

```markdown
# <Source Title>

> [!source] <url>

## Summary
<2-4 sentence summary in declarative present tense>

## Key Claims
- <claim 1>
- <claim 2>
- ...

## Entities Mentioned
- [[<Entity 1>]]
- ...

## Concepts
- [[<Concept 1>]]
- ...

## Notes
<your extracted notes — verbatim quotes are fine; mark with > blockquote>
```

Wikilinks (`[[X]]`) inside source notes are intentionally "speculative": they may not resolve yet. Step 6 files topic pages for the targets that cross the mention threshold; the rest stay unresolved until a matching page exists.

If `create_note` returns `research_session_full`, stop adding new sources. List the remaining sources in the synthesis's "Open Questions" section.

If `create_note` returns `path_conflict`, the source already exists — read it with `read_note` and skip (idempotency: re-running the skill on the same topic should not duplicate).

### Step 6 — File substantive topic pages

For every `[[X]]` that crossed the threshold (per `references/program.md`, default **≥ 3 mentions across sources**):

```
threshold = (program.md value, default 3)
for X, info in mentionCounts:
  if info.count < threshold: skip
  slug = slugify(X, max=40)
  existing = read_note({ path: `topics/${slug}` })    # may 404
  if existing: skip — link will still resolve via Phase 1 backfill
  else:
    create_note({
      path: `topics/${slug}`,
      title: X,
      body: <stub body — see below>,
      frontmatter: { type: 'concept', tags: ['topic', 'research', topic-slug],
                     description: <one-line definition>, ... },
    })
    add_to_collection({ collectionId, noteId: <id returned by create_note> })   # inside else: only newly created pages
```

**Topic stub body**:

```markdown
# <X>

> [!stub] Drafted by paper-research session on [[research/<topic-slug>/synthesis|<Topic Display Name>]]

Mentioned in: [[<source-1>]], [[<source-2>]], [[<source-3>]]

<one-paragraph definition synthesized from sources>
```

Idempotency: re-running the skill on the same topic should not duplicate topic pages — `read_note` checks first, returning the existing note when present. Phase 1's wikilink backfill means source notes' `[[X]]` already resolve to the topic page once it's filed.

`topics/<slug>` lives at vault root → does NOT count toward `research_session_full` cap (which is scoped to `research/<slug>/sources/*`).
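The threshold check in this step can be sketched as a pure selection function that turns the mention tally into a list of topic pages to consider. The names `topicCandidates` and `TopicCandidate` are hypothetical, and the inline slug rule is a mechanical baseline, not the server's own. The `read_note` existence check and the `create_note` call are left to the caller.

```typescript
interface MentionInfo { count: number; sources: string[]; }
interface TopicCandidate { title: string; path: string; mentionedIn: string[]; }

function topicCandidates(
  mentionCounts: Map<string, MentionInfo>,
  threshold: number = 3
): TopicCandidate[] {
  const candidates: TopicCandidate[] = [];
  mentionCounts.forEach(function (info, title) {
    if (info.count < threshold) return; // below threshold: no topic page
    const slug = title
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "")
      .slice(0, 40)
      .replace(/-+$/, "");
    if (slug === "") return; // nothing usable left after slugifying
    // List each source once in the stub's "Mentioned in" line.
    const mentionedIn = info.sources.filter(function (s, i, all) {
      return all.indexOf(s) === i;
    });
    candidates.push({ title: title, path: "topics/" + slug, mentionedIn: mentionedIn });
  });
  return candidates;
}
```

Separating selection from filing keeps the idempotency logic in one place: the caller checks each candidate's `path` with `read_note` and only creates the pages that are missing.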

### Step 7 — File the synthesis

After all sources are filed (or capped):

```
create_note({
  path: "research/<topic-slug>/synthesis",
  title: "Synthesis: <Topic Display Name>",
  body: <synthesis-body — see below>,
  frontmatter: <synthesis-frontmatter — see above>,
})
```

**Synthesis body** (markdown):

```markdown
# Synthesis: <Topic Display Name>

## Overview
<2-3 sentence high-level summary>

## Key Findings
- <finding 1> — see [[<source-title-1>]]
- <finding 2> — see [[<source-title-2>]]
- ...

## Key Entities
- [[<Entity>]]: <role / significance>

## Key Concepts
- [[<Concept>]]: <one-line definition>

## Contradictions
- [[<Source A>]] says X. [[<Source B>]] says Y. <which is more credible + why>

## Open Questions
- <question 1>
- <question 2>

## Related Notes in Vault
<emit this section only if priorHits from Step 2 has entries>
- [[<prior-note-title>]] · <one-line context on why this is related>
- ...

## Sources
- [[<source-title-1>]] · <author/org>, <date>
- [[<source-title-2>]] · <author/org>, <date>
```

Every claim cites the source via `[[<source-title>]]`. This is how the synthesis ties together — the wikilinks resolve to the source notes you filed in Step 5, and paper.farzai.com's backlinks panel will surface the synthesis on every source note. Topic-page wikilinks (`[[Concept]]`) filed in Step 6 also resolve automatically thanks to Phase 1 backfill.

If `path_conflict` comes back on the synthesis, the synthesis already exists. Use `update_note` (must supply `expectedRevisionId` from `read_note`) to refresh it.

### Step 8 — Add everything to the collection

For every filed note (sources + synthesis; topic pages were already added in Step 6):

```
add_to_collection({ collectionId: <id from Step 1>, noteId: <id> })
```

`add_to_collection` is idempotent — safe to re-run without checking.

### Step 9 — Report back

Output to the user:

```
Research complete: <Topic>

  → <baseUrl>/app/vaults/<vaultId>/research/<topic-slug>

Rounds: <n>     Sources filed: <n>     Topic pages filed: <n>     Synthesis: <yes/no>

Key findings:
- <finding 1>
- <finding 2>
- <finding 3>

Related notes in vault: <n>
Open questions filed: <n>
```

Substitute `<baseUrl>` with `https://paper.farzai.com` — the setup page rewrites this placeholder at install time.

---

## Error Matrix

| Error code | Cause | Fix |
|---|---|---|
| `401` | OAuth grant expired/revoked | Reconnect at `/app/connections` |
| `429` (rate_limited) | Too many calls — `create_note` is capped at 60/min | Sleep 5s, retry once |
| `slug_taken` | Collection already exists | Resume or pick new topic-slug |
| `path_conflict` | Note already exists at that path | `read_note` then either skip (idempotent) or `update_note` |
| `path_invalid` | Path violates `[a-z0-9-_.]/...` schema | Re-slugify the source title |
| `research_session_full` | ≥ 25 active source notes for this topic | Stop adding sources; mention skipped sources in synthesis Open Questions |
| `revision_conflict` (update_note) | Stale `expectedRevisionId` | Re-read the note, retry once |
| `note_not_found` | Note id wrong (or deleted) | Re-resolve via `read_note({ idOrPath })` |
| `collection_not_found` | Collection id stale | Re-resolve via `list_collections` |
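The matrix maps naturally onto a lookup table. The sketch below encodes it with hypothetical action names; the fallback for codes that are not in the table (`surface_to_user`) is an assumption, since the matrix does not say what to do with unknown errors.

```typescript
type RecoveryAction =
  | "reconnect_and_stop"
  | "sleep_5s_retry_once"
  | "ask_resume_or_new_slug"
  | "read_then_skip_or_update"
  | "reslugify"
  | "stop_adding_sources"
  | "reread_retry_once"
  | "reresolve_note"
  | "reresolve_collection"
  | "surface_to_user";

const RECOVERY: { [code: string]: RecoveryAction } = {
  "401": "reconnect_and_stop",
  "429": "sleep_5s_retry_once",
  "rate_limited": "sleep_5s_retry_once",
  "slug_taken": "ask_resume_or_new_slug",
  "path_conflict": "read_then_skip_or_update",
  "path_invalid": "reslugify",
  "research_session_full": "stop_adding_sources",
  "revision_conflict": "reread_retry_once",
  "note_not_found": "reresolve_note",
  "collection_not_found": "reresolve_collection",
};

// Unknown codes fall through to a generic action (an assumption, see above).
function recoveryFor(code: string): RecoveryAction {
  const known = Object.prototype.hasOwnProperty.call(RECOVERY, code)
    ? RECOVERY[code]
    : undefined;
  return known || "surface_to_user";
}
```

Both retry entries are deliberately "once": a second failure on `429` or `revision_conflict` is better reported to the user than retried in a loop.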

---

## Idempotency

Re-running `/paper-research <same-topic>` should be safe:

1. Step 1 returns `slug_taken` → ask user before continuing.
2. Step 5 returns `path_conflict` on already-filed sources → `read_note` then skip; do not duplicate.
3. Step 6 finds already-filed topic pages through its `read_note` check → skip; if `create_note` still returns `path_conflict`, skip as well (resolution will pick them up via Phase 1 backfill).
4. Step 7 returns `path_conflict` on the synthesis → use `update_note` to refresh (read first to get `expectedRevisionId`).
5. Step 8 is idempotent server-side (no error on duplicate).

The user can re-run mid-session if WebFetch errors out — only new sources will be filed.

---

## When Global Submission Is Available

If BOTH `paper-personal` AND `paper-global` MCP servers are bound, after Step 7 ask:

> "Want to submit the synthesis to the global vault as a PR? (y/n)"

On `y`, invoke `/paper-contribute` with the synthesis content. Source notes stay personal (raw evidence is not for the commons); only the curated synthesis is a candidate global contribution.

If only `paper-personal` is bound, skip this prompt silently.

---

## Constraints

Always respect `references/program.md`:
- Max rounds per topic (default: 3)
- Max pages per session (default: 15)
- Source-preference rules
- Confidence-scoring rules

If a program constraint conflicts with completeness, respect the constraint and note what was left out in the synthesis's "Open Questions" section.

If the program file is missing or malformed, fall back to the defaults above and warn the user once at the start of the session.
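The fallback rule can be sketched as a small resolver over already-parsed values. How `references/program.md` is parsed is not specified here, so the field names (`maxRounds`, `maxPages`, `mentionThreshold`) and the function `resolveProgram` are hypothetical; pass `null` when the file is missing or could not be parsed.

```typescript
interface Program {
  maxRounds: number;
  maxPages: number;
  mentionThreshold: number;
}

// Defaults as stated in this document.
const PROGRAM_DEFAULTS: Program = { maxRounds: 3, maxPages: 15, mentionThreshold: 3 };

function isPositiveInt(value: unknown): value is number {
  return typeof value === "number" && isFinite(value)
    && value > 0 && Math.floor(value) === value;
}

// Returns the effective limits plus a flag telling the caller whether to
// warn the user (once) that a default was used.
function resolveProgram(
  parsed: Partial<Program> | null
): { program: Program; warn: boolean } {
  const source: Partial<Program> = parsed === null ? {} : parsed;
  let warn = parsed === null;

  function pick(value: unknown, fallback: number): number {
    if (isPositiveInt(value)) return value;
    warn = true; // missing or malformed value: fall back and remember to warn
    return fallback;
  }

  const program: Program = {
    maxRounds: pick(source.maxRounds, PROGRAM_DEFAULTS.maxRounds),
    maxPages: pick(source.maxPages, PROGRAM_DEFAULTS.maxPages),
    mentionThreshold: pick(source.mentionThreshold, PROGRAM_DEFAULTS.mentionThreshold),
  };
  return { program: program, warn: warn };
}
```

Values from the file win whenever they are valid, which matches the rule that the program file is authoritative and the defaults are last-resort fallbacks.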
