crossmem
crossmem is a local-first citation and knowledge pipeline. It captures academic papers (arXiv PDFs today, YouTube and more coming), compiles them into structured wiki notes with verbatim quotes and provenance metadata, and serves them to AI agents via MCP.
What it does
- Capture — downloads a paper, extracts metadata from arXiv + CrossRef + OpenAlex, generates a deterministic cite key
- Compile — parses the PDF (via Marker or pdftotext), splits into paragraph-level chunks with bounding-box provenance, runs a local LLM (Ollama) to add paraphrase and implication per chunk
- Verify — re-hashes every chunk’s text against its stored SHA-256; detects silent drift
- Cite & Recall — MCP tools that let Claude (or any MCP client) look up citations and search your wiki
Design principles
- Verbatim quotes are ground truth. The LLM only touches paraphrase/implication fields, never the original text.
- Provenance is first-class. Every chunk carries page, section, bounding box, SHA-256 hash, and byte range back to the source PDF.
- Metadata is cross-verified. Title, authors, and year must agree across at least two canonical sources (arXiv, CrossRef, OpenAlex). Disagreements surface as warnings, not silent picks.
- Everything runs locally. No cloud APIs. Ollama for LLM, Marker for PDF parsing, all on your Mac.
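The "at least two sources must agree" rule can be pictured as a small reconciler that has sources vote per field. This is an illustrative sketch, not crossmem's implementation; the function and field names are hypothetical.

```python
from collections import Counter

def reconcile(field, values_by_source):
    """Pick a field value only when >= 2 sources agree; otherwise warn.

    values_by_source: e.g. {"arxiv": "2017", "crossref": "2017", "openalex": "2018"}
    Returns (value, warnings) -- disagreements surface as warnings, never silent picks.
    """
    counts = Counter(v for v in values_by_source.values() if v)
    if not counts:
        return None, [f"{field}: no source provided a value"]
    value, n = counts.most_common(1)[0]
    if n >= 2:
        return value, []
    return value, [f"{field}: sources disagree: {dict(values_by_source)}"]

year, warnings = reconcile("year", {"arxiv": "2017", "crossref": "2017", "openalex": "2017"})
```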
Installation
From source
```bash
cargo install --path .
```
Or directly from GitHub:
```bash
cargo install --git https://github.com/crossmem/crossmem-rs
```
Dependencies
crossmem requires the following tools for PDF capture and compilation:
| Tool | Purpose | Install |
|---|---|---|
| `pdftotext` | Fallback PDF text extraction | `brew install poppler` |
| `marker` | PDF parsing with bounding boxes (default) | `pip install marker-pdf` or `uvx marker-pdf` |
| Ollama | Local LLM for paraphrase/implication | ollama.com |
Ollama model setup
crossmem uses llama3.2:3b by default. Pull the model before your first compile:
```bash
ollama pull llama3.2:3b
```
Override the model with the CROSSMEM_OLLAMA_MODEL environment variable:
```bash
CROSSMEM_OLLAMA_MODEL=mistral crossmem compile vaswani2017attention
```
Verify installation
```bash
crossmem --version
```
Quick Start
Capture a paper, compile it, and cite it — in under 30 seconds.
1. Capture
Download an arXiv paper and extract metadata:
```bash
crossmem capture https://arxiv.org/abs/1706.03762
```
Output:
```
[capture] arxiv_id: 1706.03762
[capture] title: Attention Is All You Need
[capture] cite_key: vaswani2017attention
[capture] saved to ~/crossmem/raw/...
```
2. Compile
Parse the PDF into chunks and run the LLM pass:
```bash
crossmem compile vaswani2017attention
```
This produces a wiki note at ~/crossmem/wiki/<timestamp>_vaswani2017attention.md with:
- YAML frontmatter (title, authors, year, DOI, cite_key)
- Five citation formats (APA, MLA, Chicago, IEEE, BibTeX)
- Per-chunk verbatim quotes with paraphrase, implication, and provenance metadata
3. Cite via MCP
Add crossmem to Claude Code:
```bash
claude mcp add crossmem -- crossmem mcp serve
```
Then ask Claude:
Cite vaswani2017attention in APA format.
Claude calls crossmem_cite and returns:
Vaswani, A., & Shazeer, N. (2017). Attention Is All You Need. arXiv preprint arXiv:1706.03762.
4. Search your wiki
Ask Claude:
What do I have on self-attention mechanisms?
Claude calls crossmem_recall and returns matching excerpts ranked by relevance, with cite keys and deep links to your wiki files.
Writing a Paper with crossmem
An end-to-end playbook for AI agents (Claude Code, Cursor, etc.) and their human authors. You have crossmem installed and the MCP server registered. You want to cite prior work correctly and quote-faithfully.
1. One-time setup
Install crossmem and its dependencies:
```bash
# Install crossmem
cargo install --path .
# Or, from the repo directly:
# cargo install --git https://github.com/crossmem/crossmem-rs

# Local LLM for paraphrase/implication generation
ollama pull llama3.2:3b

# PDF parser (preferred — produces bounding boxes)
pip install marker-pdf

# Fallback: brew install poppler (provides pdftotext)
```
Register the MCP server so your agent can call crossmem_cite and crossmem_recall:
Claude Code:
```bash
claude mcp add crossmem -- crossmem mcp serve
```
Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:
```json
{
  "mcpServers": {
    "crossmem": {
      "command": "crossmem",
      "args": ["mcp", "serve"]
    }
  }
}
```
2. Capturing a paper
```bash
crossmem capture https://arxiv.org/abs/1706.03762
```
Output:
```
[capture] arxiv_id: 1706.03762
[capture] title: Attention Is All You Need
[capture] cite_key: vaswani2017attention
[capture] saved to ~/crossmem/raw/1776227254_vaswani2017attention.pdf
```
This does three things:
- Downloads the PDF to `~/crossmem/raw/<timestamp>_<cite_key>.pdf`
- Fetches metadata from arXiv, CrossRef, and OpenAlex — reconciles across all three
- Generates a deterministic cite key via the pattern DSL
Then compile it into a wiki entry:
```bash
crossmem compile vaswani2017attention
```
This parses the PDF (Marker by default), splits it into chunks, runs each through Ollama for paraphrase and implication, and emits the wiki note at ~/crossmem/wiki/<timestamp>_vaswani2017attention.md.
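Conceptually, each compiled chunk couples verbatim text with a hash and provenance. A minimal sketch of building such a record (the `section_id` scheme and the helper itself are assumptions inferred from chunk IDs like `p4s32c1`; this is not crossmem's source):

```python
import hashlib

def section_id(section):
    # e.g. "3.2 Scaled Dot-Product Attention" -> "32" (assumed id scheme)
    return section.split(" ", 1)[0].replace(".", "")

def make_chunk(page, section, chunk_idx, text, bbox, byte_range):
    """Build a chunk record; the verbatim text is hashed so drift can be detected later."""
    return {
        "id": f"p{page}s{section_id(section)}c{chunk_idx}",
        "text": text,  # verbatim; the LLM never rewrites this field
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "provenance": {"page": page, "section": section,
                       "bbox": bbox, "byte_range": byte_range},
    }

chunk = make_chunk(4, "3.2 Scaled Dot-Product Attention", 1,
                   'We call our particular attention "Scaled Dot-Product Attention".',
                   [72.0, 340.5, 523.8, 412.1], [18342, 19104])
chunk["id"]  # -> "p4s32c1"
```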
Note: YouTube ingestion is design-only — see YouTube Ingestion Pipeline.
Capturing non-arXiv papers
Most journal papers (e.g. JCP, Nature, PRL) are not on arXiv. crossmem capture supports them through DOI lookup and local PDF import.
If you have a DOI — CrossRef metadata is fetched automatically:
```bash
# DOI URL
crossmem capture https://doi.org/10.1063/5.0012345

# Bare DOI
crossmem capture 10.1063/5.0012345
```
If the paper is open-access, the PDF downloads via Unpaywall. Otherwise you’ll get instructions to download it manually.
If you already have the PDF — the most common path for paywalled journals:
```bash
# With DOI (recommended — gets full CrossRef metadata)
crossmem capture ~/Downloads/smith2023.pdf --doi 10.1063/5.0012345

# Without DOI — extracts what it can from PDF metadata
crossmem capture ~/Downloads/smith2023.pdf --cite-key smith2023transport
```
Direct PDF URL — for preprint servers, institutional repos:
```bash
crossmem capture https://chemrxiv.org/paper.pdf --doi 10.1234/chemrxiv.5678
```
All paths produce the same raw/ + .meta.json output. Then compile as usual:
```bash
crossmem compile smith2023transport
```
For a JCP submission with 24 references, a typical workflow is:
```bash
# Capture each reference — most will be local PDFs with DOIs
for pdf in ~/papers/jcp-refs/*.pdf; do
  # grep -E (not GNU-only -P) so this works with the BSD grep that ships with macOS
  doi=$(pdftotext "$pdf" - 2>/dev/null | head -5 | grep -oE '10\.[0-9]{4,9}/[^[:space:]]+' | head -1)
  crossmem capture "$pdf" --doi "$doi"
done

# Then compile each one
for meta in ~/crossmem/raw/*.meta.json; do
  key=$(jq -r .cite_key "$meta")
  crossmem compile "$key"
done
```
3. The compiled wiki entry — what the agent sees
Frontmatter
```yaml
---
cite_key: vaswani2017attention
title: "Attention Is All You Need"
authors:
  - "Ashish Vaswani"
  - "Noam Shazeer"
year: 2017
arxiv_id: "1706.03762"
doi: "10.48550/arXiv.1706.03762"
captured_at: "1776227254"
raw: "~/crossmem/raw/1776227254_vaswani2017attention.pdf"
pdf_sha256: "9a8f3b..."
parser: "marker"
chunks: 47
meta:
  sources: ["arxiv", "crossref", "openalex"]
  reconciled: true
  warnings: []
---
```
After the frontmatter, five citation formats are pre-generated: APA, MLA, Chicago, IEEE, and BibTeX.
Chunks
Each chunk carries verbatim text, LLM-generated derivatives, and full provenance:
<!-- chunk id=p4s32c1 -->
> The dominant sequence transduction models are based on complex recurrent or
> convolutional neural networks that include an encoder and a decoder.
**Paraphrase:** Prior sequence models relied on RNNs or CNNs in an encoder-decoder setup.
**Implication:** This dependency on recurrence was the bottleneck the Transformer aimed to eliminate.
```yaml
provenance:
  page: 4
  section: "3.2 Scaled Dot-Product Attention"
  bbox: [72.0, 340.5, 523.8, 412.1]
  text_sha256: "5f3e1c..."
  byte_range: [18342, 19104]
```
Hard rule for agents: The > blockquote is the verbatim original extracted from the PDF. When citing, the agent MUST copy from this blockquote. NEVER fabricate or rephrase quotes. The Paraphrase and Implication fields exist for the agent’s reasoning and search — they do not belong in the paper as attributed quotes.
4. Agent prompts that actually work
Finding relevant chunks
“Search my library for how transformer attention was originally motivated. Return cite_keys and page numbers.”
Agent calls:
crossmem_recall("transformer attention motivation", limit=5)
Returns a ranked list of {cite_key, title, section, excerpt}. The agent picks the most relevant hits and reports them.
Quoting with provenance
“Write a paragraph introducing self-attention. Quote vaswani2017attention page 2 verbatim, then paraphrase in my voice. Include BibTeX.”
Agent workflow:
- Calls `crossmem_recall("self-attention vaswani2017attention")` to find the right chunk
- Reads the wiki file to locate the page-2 chunk
- Copies the `>` blockquote verbatim into the draft as a block quote
- Writes a surrounding paraphrase in the author’s voice (informed by the `Paraphrase` field, not copying it)
- Calls `crossmem_cite("vaswani2017attention", "bibtex")` for the BibTeX entry
- Embeds the `text_sha256` and page reference as a LaTeX comment so `crossmem verify` can trace provenance:
```latex
% crossmem: vaswani2017attention p4s32c1 sha256=5f3e1c...
\begin{quote}
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder.
\end{quote}
\cite{vaswani2017attention}
```
Citing multiple papers
“Compare how Vaswani 2017 and Devlin 2019 frame the importance of pre-training.”
Agent calls crossmem_recall("pre-training importance"), gets hits from both papers, reads the relevant chunks, and writes a comparison paragraph quoting both — each quote traced to its chunk ID.
Running a drift check
After the human edits the draft (or the agent revises it), verify that no quotes have been accidentally mutated:
```bash
crossmem verify
```
Output when clean:
```
[verify] checked 94 chunks across 3 wiki entries
[verify] 0 drifts detected
```
Output when a quote was altered:
```
[verify] DRIFT in vaswani2017attention chunk p4s32c1
  expected: 5f3e1c...
  actual:   a1b2c3...
[verify] 1 drift detected
```
Exit code 1 means drift — the agent or human must restore the original quote from the wiki.
Building the bib file
Collect all \cite{...} keys from a LaTeX draft and emit a single .bib:
```bash
grep -oP '\\cite\{[^}]+\}' draft.tex \
  | sed 's/\\cite{//;s/}//' \
  | tr ',' '\n' \
  | sort -u \
  | while read key; do crossmem mcp serve <<< "{\"method\":\"tools/call\",\"params\":{\"name\":\"crossmem_cite\",\"arguments\":{\"cite_key\":\"$key\",\"format\":\"bibtex\"}}}"; done
```
Or, have the agent do it: “Collect every cite key from my draft and produce a references.bib file using crossmem_cite.”
5. What crossmem protects against
| Failure mode | How crossmem prevents it |
|---|---|
| Hallucinated citation metadata | Multi-source reconciliation: arXiv + CrossRef + OpenAlex, ≥2 must agree. Disagreements surface as warnings in frontmatter. |
| Hallucinated quotes | Agent contract: never compose original text, only copy the > blockquote. crossmem verify catches any post-hoc mutation via SHA-256 re-hashing. |
| Wrong page numbers | Every chunk carries page, section, and bbox — the reader can trace back to the exact PDF region. |
| Lost context | byte_range preserves the exact location in the raw PDF. Chunks retain their section heading for navigation. |
| Cite key collisions | Deterministic pattern DSL with a–z suffix tiebreaker (then _<count> if all 26 are taken). |
6. Limits
Be honest about what crossmem cannot do today:
- Scanned / image-only PDFs: Marker’s OCR quality varies. Chunks from poorly scanned pages may have garbled text.
- Math-heavy pages: The pipeline does not run Nougat or other math-aware extractors. Equations may appear as lossy Unicode approximations or be missing entirely.
- Non-arXiv sources: Journal papers captured via DOI or local PDF have single-source metadata (CrossRef only), so there is no cross-verification. Books and conference proceedings with non-standard DOIs may produce incomplete frontmatter.
- Single-author workflow: There is no shared library, sync, or multi-user conflict resolution. Each machine has its own `~/crossmem/` directory.
- Ollama dependency: Compile requires a running Ollama instance. If Ollama is down or the model is missing, compile will fail.
7. Minimal paper-writing session
A scripted walkthrough — capture two papers, write an intro paragraph, verify.
```bash
# Capture two papers
crossmem capture https://arxiv.org/abs/1706.03762
crossmem compile vaswani2017attention
crossmem capture https://arxiv.org/abs/1810.04805
crossmem compile devlin2019bert
```
Now prompt the agent:
“Write an introductory paragraph for my Related Work section. It should cite both vaswani2017attention and devlin2019bert, quoting one key sentence from each verbatim. Output LaTeX with \cite commands and the BibTeX entries.”
The agent:
- Calls `crossmem_recall("attention mechanism transformer", limit=5)` and `crossmem_recall("pre-training bidirectional", limit=5)`
- Reads the wiki entries for both papers, selects one chunk each
- Produces:
```latex
The Transformer architecture replaced recurrence with self-attention:
\begin{quote}
``The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder.''
\end{quote}
\cite{vaswani2017attention}. Building on this,
BERT demonstrated that bidirectional pre-training could be applied to a wide
range of NLP tasks:
\begin{quote}
``We introduce a new language representation model called BERT, which stands
for Bidirectional Encoder Representations from Transformers.''
\end{quote}
\cite{devlin2019bert}.

% crossmem: vaswani2017attention p1s0c1 sha256=...
% crossmem: devlin2019bert p1s0c1 sha256=...
```
- Calls `crossmem_cite("vaswani2017attention", "bibtex")` and `crossmem_cite("devlin2019bert", "bibtex")` to emit `references.bib`
Finally, verify nothing drifted:
```bash
crossmem verify
# [verify] checked 94 chunks across 2 wiki entries
# [verify] 0 drifts detected
```
The quotes in your LaTeX match the raw PDFs. Ship it.
crossmem capture
Download a paper and extract metadata.
Usage
```bash
crossmem capture <input> [--doi <doi>] [--cite-key <key>]
```
Input types
| Input | Example | Detection |
|---|---|---|
| Local PDF file | /path/to/paper.pdf | Path exists on disk |
| arXiv URL or bare ID | https://arxiv.org/abs/1706.03762, 1706.03762 | arXiv URL pattern or bare numeric ID |
| DOI URL or bare DOI | https://doi.org/10.1038/nature12373, 10.1038/nature12373 | DOI URL prefix or 10.NNNN/... pattern |
| Direct PDF URL | https://example.com/paper.pdf | HTTPS URL ending in .pdf |
Inputs are matched in the order above — first match wins.
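The first-match-wins order can be sketched as a small classifier. The regexes below approximate the patterns in the table; they are illustrative, not crossmem's actual implementation.

```python
import os
import re

def detect_input(s):
    """Classify a capture input; checks run in documented order, first match wins."""
    if os.path.exists(s):
        return "local_pdf"   # path exists on disk
    if re.match(r"^(https?://arxiv\.org/abs/)?\d{4}\.\d{4,5}(v\d+)?$", s):
        return "arxiv"       # arXiv URL or bare numeric ID
    if re.match(r"^(https?://doi\.org/)?10\.\d{4,9}/\S+$", s):
        return "doi"         # DOI URL prefix or 10.NNNN/... pattern
    if re.match(r"^https://\S+\.pdf$", s):
        return "pdf_url"     # direct HTTPS URL ending in .pdf
    raise ValueError(f"unrecognized input: {s}")
```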
Flags
| Flag | Description |
|---|---|
| `--doi <doi>` | Attach a DOI to the capture. For local files and PDF URLs, fetches CrossRef metadata for this DOI. |
| `--cite-key <key>` | Override the auto-generated cite key. |
What it does
arXiv input (existing behavior)
- Extracts the arXiv ID from the URL
- Fetches metadata from arXiv API
- Cross-checks metadata against CrossRef and OpenAlex (reconciliation)
- Downloads the PDF from arXiv
- Generates a cite key using the configured pattern DSL
- Saves PDF + `.meta.json` sidecar
DOI input
- Fetches metadata from CrossRef API
- Tries Unpaywall API for an open-access PDF URL (requires the `CROSSMEM_UNPAYWALL_EMAIL` env var)
- If no open-access PDF is found, prints instructions to download manually and use local file capture
Local PDF file
- Copies (not moves) the PDF to `~/crossmem/raw/<timestamp>_<cite_key>.pdf`
- If `--doi` given: fetches CrossRef metadata
- If no `--doi`: tries extracting embedded PDF metadata via `pdfinfo` (Title, Author, CreationDate)
- If no metadata found and no `--cite-key`: errors with instructions
Direct PDF URL
- Downloads the PDF
- Then follows the same metadata path as local file (CrossRef via `--doi`, or `pdfinfo` fallback)
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid input, download failure, metadata fetch failure) |
| 2 | Missing arguments |
Environment variables
| Variable | Description |
|---|---|
| `CROSSMEM_UNPAYWALL_EMAIL` | Email address for Unpaywall API (required for DOI→PDF lookup) |
See config.toml for cite key configuration.
Examples
arXiv paper
```
$ crossmem capture https://arxiv.org/abs/1706.03762
[capture] arxiv_id: 1706.03762
[capture] title: Attention Is All You Need
cite_key: vaswani2017attention
```
Journal paper via DOI
```
$ crossmem capture 10.1063/5.0012345
[capture] DOI: 10.1063/5.0012345
cite_key: smith2023molecular
```
Local PDF with DOI metadata
```
$ crossmem capture ~/Downloads/paper.pdf --doi 10.1063/5.0012345
[capture] Local file: /Users/me/Downloads/paper.pdf
[capture] Fetching CrossRef metadata for DOI 10.1063/5.0012345
cite_key: smith2023molecular
```
Local PDF with manual cite key
```
$ crossmem capture ~/Downloads/paper.pdf --cite-key jones2024transport
[capture] Local file: /Users/me/Downloads/paper.pdf
cite_key: jones2024transport
```
Direct PDF URL
```
$ crossmem capture https://example.com/papers/preprint.pdf --doi 10.1234/example
cite_key: doe2024example
```
Storage layout
```
~/crossmem/raw/
  <timestamp>_<cite_key>.pdf        # Raw PDF
  <timestamp>_<cite_key>.meta.json  # Metadata sidecar
```
The .meta.json file contains the reconciled metadata used by compile:
```json
{
  "cite_key": "smith2023molecular",
  "title": "Molecular dynamics simulation of transport",
  "authors": ["John Smith", "Jane Doe"],
  "year": 2023,
  "arxiv_id": "",
  "doi": "10.1063/5.0012345",
  "container_title": "The Journal of Chemical Physics",
  "sources": ["crossref"],
  "reconciled": true,
  "warnings": []
}
```
crossmem compile
Parse a captured PDF into structured wiki chunks with LLM-generated paraphrase and implication.
Usage
```bash
crossmem compile <cite_key>
```
Arguments
| Argument | Description |
|---|---|
| `<cite_key>` | The cite key printed by `crossmem capture`. Example: `vaswani2017attention` |
What it does
- Finds the raw PDF and `.meta.json` for the given cite key in `~/crossmem/raw/`
- Parses the PDF using Marker (preferred) or `pdftotext` (fallback)
- Splits content into paragraph-level chunks with bounding-box provenance
- Computes a SHA-256 hash for each chunk’s verbatim text
- Sends each chunk to Ollama for paraphrase and implication generation
- Generates five citation formats (APA, MLA, Chicago, IEEE, BibTeX)
- Emits the final wiki note to `~/crossmem/wiki/<timestamp>_<cite_key>.md`
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (cite key not found, Ollama unreachable, parse failure) |
Environment variables
| Variable | Default | Description |
|---|---|---|
| `CROSSMEM_OLLAMA_MODEL` | `llama3.2:3b` | Ollama model used for paraphrase/implication generation |
Example
```
$ crossmem compile vaswani2017attention
[compile] loading raw PDF for vaswani2017attention
[compile] parsing with Marker (MPS)...
[compile] 47 chunks extracted
[compile] compiling chunk 1/47...
...
[compile] wiki saved to ~/crossmem/wiki/1776227300_vaswani2017attention.md
```
PDF parsing tiers
| Tier | Parser | When used | Bounding boxes |
|---|---|---|---|
| 0 | pdftotext -layout | Fallback when Marker unavailable | No |
| 1 | Marker (MPS) | Default for arXiv papers | Yes (polygon per block) |
The parser tier is recorded in the wiki frontmatter as the parser field.
LLM contract
The LLM (Ollama) is only allowed to generate paraphrase and implication fields. It never touches:
- Original verbatim text (from PDF extractor)
- Metadata fields (from reconciler)
- Citation strings (deterministic generator)
- Provenance data (from parser)
crossmem verify
Verify chunk integrity by re-hashing verbatim text against stored SHA-256 hashes.
Usage
```bash
crossmem verify [cite_key]
```
Arguments
| Argument | Description |
|---|---|
| `[cite_key]` | Optional. If provided, only verify chunks for this cite key. If omitted, verify all wiki entries. |
What it does
- Walks `~/crossmem/wiki/` for all `.md` files (or the one matching `cite_key`)
- For each wiki entry with a `cite_key` in frontmatter:
  - Extracts all `<!-- chunk id=... -->` blocks
  - Finds the `text_sha256` in each chunk’s provenance YAML
  - Re-computes SHA-256 from the verbatim quoted text (`> ...` lines)
  - Reports any mismatches as “DRIFT”
- Prints summary: total chunks checked, total drifts detected
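The core of the check is a re-hash and compare. A self-contained sketch (wiki-file parsing omitted; the exact convention for joining quote lines before hashing is an assumption, not crossmem's documented behavior):

```python
import hashlib

def rehash_quote(blockquote_lines):
    # Strip the '> ' markers and join with newlines; the joining convention is assumed.
    text = "\n".join(line.lstrip("> ").rstrip() for line in blockquote_lines)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def check_chunk(cite_key, chunk_id, blockquote_lines, stored_sha256):
    """Return None if the quote is intact, or a DRIFT report if it was mutated."""
    actual = rehash_quote(blockquote_lines)
    if actual == stored_sha256:
        return None
    return (f"DRIFT: {cite_key} chunk {chunk_id}\n"
            f"  expected: {stored_sha256}\n"
            f"  actual:   {actual}")

lines = ['> We call our particular attention "Scaled Dot-Product Attention".']
stored = rehash_quote(lines)  # pretend this hash was stored at compile time
assert check_chunk("vaswani2017attention", "p4s32c1", lines, stored) is None
```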
Exit codes
| Code | Meaning |
|---|---|
| 0 | All chunks verified, no drift |
| 1 | Error or drift detected |
Example
```
$ crossmem verify vaswani2017attention
[verify] Checking vaswani2017attention
Verified 47 chunks, 0 drift(s) detected.

$ crossmem verify
[verify] Checking vaswani2017attention
[verify] Checking lecun2015deep
Verified 93 chunks, 0 drift(s) detected.
```
When drift is detected:
```
DRIFT: vaswani2017attention chunk p4s32c1
  expected: 5f3e1c...
  actual:   a8b2d4...
Verified 47 chunks, 1 drift(s) detected.
```
crossmem mcp serve
Start the MCP (Model Context Protocol) server on stdio.
Usage
```bash
crossmem mcp serve
```
What it does
Starts an MCP server that communicates over stdin/stdout, providing two tools to any MCP client:
- `crossmem_cite` — look up a citation by cite key
- `crossmem_recall` — search the wiki for matching entries
The server loads wiki entries from ~/crossmem/wiki/ and serves them to the connected client.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Clean shutdown |
| 1 | Server error |
Environment variables
| Variable | Default | Description |
|---|---|---|
| `RUST_LOG` | `warn` | Log level (logs go to stderr, not stdout — stdout is the MCP transport) |
Adding to Claude Code
```bash
claude mcp add crossmem -- crossmem mcp serve
```
This registers crossmem as an MCP server that Claude Code will start automatically.
Adding to Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
```json
{
  "mcpServers": {
    "crossmem": {
      "command": "crossmem",
      "args": ["mcp", "serve"]
    }
  }
}
```
crossmem serve
Run the HTTP/WebSocket relay bridge that connects CLI tools and agents to the crossmem Chrome extension.
Usage
```bash
crossmem serve
crossmem          # 'serve' is the default when no subcommand is given
```
What it does
Starts an HTTP + WebSocket server on 127.0.0.1:7600 (configurable). The Chrome extension connects via WebSocket; CLI tools and agents send commands via HTTP.
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/status` | GET | Connection status: connected extensions, pending command count |
| `/command` | POST | Send a command to the extension and wait for its response |
| `/dialog_response` | POST | Send a dialog response back to the extension |
| `/capture` | POST | Screen recording capture handler |
| `/` or `/ws` | WS | Extension WebSocket connection |
Exit codes
| Code | Meaning |
|---|---|
| 0 | Clean shutdown (SIGINT or SIGTERM) |
| 1 | Bind failure (port already in use) |
Environment variables
| Variable | Default | Description |
|---|---|---|
| `BRIDGE_PORT` | 7600 | Port to listen on |
| `RUST_LOG` | `info` | Log level |
Example
```
$ crossmem serve
[bridge] crossmem-bridge v0.1.0
[bridge] HTTP → http://127.0.0.1:7600/status
[bridge] HTTP → http://127.0.0.1:7600/command
[bridge] WS   → ws://127.0.0.1:7600/
[bridge] waiting for extension...
```
Sending a command
```bash
curl -X POST http://127.0.0.1:7600/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"navigate","params":{"url":"https://example.com"}}'
```
Checking status
```bash
curl -s http://127.0.0.1:7600/status | jq .
```
Chrome extension
The bridge is designed to work with the crossmem Chrome extension. The extension connects via WebSocket and executes commands using chrome.scripting.executeScript.
Supported actions: navigate, click, type, wait, extract, screenshot, summarize, tab_info, ping.
For multi-agent use, add "agentId": "my-agent" to commands to isolate tab control.
MCP Integration
crossmem exposes two tools via the Model Context Protocol (MCP), allowing AI agents to look up citations and search your wiki without leaving the conversation.
Tools
| Tool | Description |
|---|---|
| `crossmem_cite` | Look up a citation by cite key and return it in a specified format |
| `crossmem_recall` | Search the wiki for entries matching a query |
Setup
Claude Code
```bash
claude mcp add crossmem -- crossmem mcp serve
```
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
```json
{
  "mcpServers": {
    "crossmem": {
      "command": "crossmem",
      "args": ["mcp", "serve"]
    }
  }
}
```
Agent usage prompts
Once crossmem is registered as an MCP server, you can ask your agent things like:
- “Cite vaswani2017attention in APA format.”
- “Give me the BibTeX for vaswani2017attention.”
- “What do I have on attention mechanisms?”
- “Search my wiki for papers about transformer architectures.”
- “Find all papers by Vaswani in my library.”
How it works
The MCP server (crossmem mcp serve) runs on stdio. It loads all .md files from ~/crossmem/wiki/ on startup, parses their YAML frontmatter and body, and responds to tool calls by searching this in-memory index.
Logs go to stderr (not stdout), so they don’t interfere with the MCP JSON-RPC transport.
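On the wire, a tool call is ordinary MCP JSON-RPC. A representative `tools/call` request for `crossmem_cite` might look like this (the envelope follows the MCP specification; the `id` is arbitrary):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "crossmem_cite",
    "arguments": { "cite_key": "vaswani2017attention", "format": "apa" }
  }
}
```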
crossmem_cite
Look up a citation by cite key and return it in the requested format.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `cite_key` | string | yes | — | Citation key, e.g. `vaswani2017attention` |
| `format` | string | no | `bibtex` | One of: `bibtex`, `apa`, `mla`, `chicago`, `ieee` |
Returns
The formatted citation string extracted from the wiki file’s citation section.
Success
Vaswani, A., & Shazeer, N. (2017). Attention Is All You Need. arXiv preprint arXiv:1706.03762.
Cite key not found
If the cite key doesn’t match any wiki entry, returns the top 5 fuzzy matches:
```
Error: cite_key 'vaswani' not found. Did you mean:
  - vaswani2017attention — Attention Is All You Need
```
Format not found
If the cite key exists but the wiki file is missing the requested citation section:
```
Error: cite_key 'vaswani2017attention' found but no APA citation section in wiki file.
File: /Users/you/crossmem/wiki/1776227300_vaswani2017attention.md
```
Fuzzy matching
When an exact match fails, the tool scores candidates by:
- Full cite key substring match (+10)
- Full title substring match (+5)
- Per-token cite key match (+3 each)
- Per-token title match (+2 each)
The top 5 candidates are returned as suggestions.
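The rubric maps directly to a small scoring function. This is a sketch of the documented weights; the real tool's tokenization and tie-breaking may differ.

```python
def fuzzy_score(query, cite_key, title):
    """Score a candidate per the documented rubric: substring and per-token matches."""
    q = query.lower()
    key, ttl = cite_key.lower(), title.lower()
    score = 0
    if q in key:
        score += 10          # full cite key substring match
    if q in ttl:
        score += 5           # full title substring match
    for tok in q.split():
        if tok in key:
            score += 3       # per-token cite key match
        if tok in ttl:
            score += 2       # per-token title match
    return score

def suggest(query, entries, n=5):
    """Return the top-n positively scoring (cite_key, title) candidates."""
    ranked = sorted(entries, key=lambda e: fuzzy_score(query, e[0], e[1]), reverse=True)
    return [e for e in ranked if fuzzy_score(query, e[0], e[1]) > 0][:n]

fuzzy_score("vaswani", "vaswani2017attention", "Attention Is All You Need")  # -> 13
```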
crossmem_recall
Search the crossmem wiki for entries matching a query. Returns matching excerpts ranked by relevance.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | yes | — | Search query string |
| `limit` | integer | no | 5 | Max results to return (capped at 20) |
Returns
A ranked list of matching wiki entries, each with:
- Index number
- Cite key and title
- Section where the match was found
- Excerpt (up to 400 characters) with surrounding context
- Deep link to the wiki file
Example response
```
1. [vaswani2017attention] Attention Is All You Need
   section: p.4 §3.2 Scaled Dot-Product Attention
   excerpt: ...We call our particular attention "Scaled Dot-Product Attention". The input consists of queries and keys of dimension dk...
   link: file:///Users/you/crossmem/wiki/1776227300_vaswani2017attention.md

2. [lecun2015deep] Deep Learning
   section: p.12 §4 Attention Mechanisms
   excerpt: ...Attention mechanisms have become an integral part of sequence modeling...
   link: file:///Users/you/crossmem/wiki/1776300000_lecun2015deep.md
```
No results
No results for query: 'quantum computing'
Scoring
Results are ranked by total token frequency: each whitespace-delimited query token is counted across the entry’s title and body. Higher count = higher rank.
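That ranking rule is simple enough to sketch (case folding is assumed; the 400-character excerpting and deep links are omitted details):

```python
def recall_rank(query, entries, limit=5):
    """Rank entries by total query-token frequency across title and body."""
    def score(entry):
        words = (entry["title"] + " " + entry["body"]).lower().split()
        return sum(words.count(tok) for tok in query.lower().split())
    ranked = sorted(entries, key=score, reverse=True)
    return [e for e in ranked if score(e) > 0][:min(limit, 20)]
```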
config.toml
crossmem reads configuration from ~/.crossmem/config.toml. If the file doesn’t exist, it’s created with defaults on first run.
Location
~/.crossmem/config.toml
Full reference
```toml
[cite_key]
pattern = "[auth:lower][year][shorttitle:1:nopunct]"
```
Sections
[cite_key]
| Key | Default | Description |
|---|---|---|
| `pattern` | `[auth:lower][year][shorttitle:1:nopunct]` | Pattern DSL for generating cite keys. See cite_key Pattern DSL. |
Environment variables
These are not in config.toml but affect crossmem’s behavior:
| Variable | Default | Description |
|---|---|---|
| `CROSSMEM_OLLAMA_MODEL` | `llama3.2:3b` | Ollama model for compile pass |
| `BRIDGE_PORT` | 7600 | Bridge server port |
| `RUST_LOG` | `info` (bridge) / `warn` (MCP) | Log level filter |
Data directories
crossmem stores all data under ~/crossmem/:
```
~/crossmem/
  raw/   # Downloaded PDFs + metadata JSON sidecars
  wiki/  # Compiled wiki notes (markdown)
```
cite_key Pattern DSL
crossmem generates citation keys using a pattern DSL inspired by Better BibTeX. The pattern is configured in ~/.crossmem/config.toml:
```toml
[cite_key]
pattern = "[auth:lower][year][shorttitle:1:nopunct]"
```
Syntax
A pattern is a string of tokens (inside [brackets]) and literal characters (outside brackets).
```
[field:modifier1:modifier2]literal_text[field2]
```
Tokens
| Token | Description | Example output |
|---|---|---|
| `auth` | First author’s last name | Vaswani |
| `authors` | All authors’ last names concatenated | VaswaniShazeer |
| `year` | Publication year | 2017 |
| `shorttitle` | First N significant words from title (stop words filtered) | attention |
| `title` | Full title | Attention Is All You Need |
shorttitle behavior
shorttitle filters out common stop words (a, an, the, is, are, was, for, of, with, …) and takes the first N remaining words. N is specified as a numeric modifier.
Example with title “Attention Is All You Need”:
- `[shorttitle:1]` → `attention`
- `[shorttitle:3]` → `attentionneed` (after filtering “Is”, “All”, “You”)
Modifiers
Modifiers are appended to the token with : separators and applied in order:
| Modifier | Description | Example |
|---|---|---|
| `lower` | Lowercase | VASWANI → vaswani |
| `upper` | Uppercase | vaswani → VASWANI |
| `nopunct` | Remove all non-alphanumeric characters | hello-world! → helloworld |
| `condense` | Remove all whitespace | hello world → helloworld |
| `N` (digit) | For `shorttitle`: take the first N significant words. For other fields: take the first N whitespace-delimited words. | `[shorttitle:1]` → first significant word |
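The tokens and modifiers above can be captured in a tiny interpreter. This is an illustrative sketch, not crossmem's source: the stop-word list is abbreviated (with "all"/"you" included to match the doc's example), and `shorttitle` is lowercased and condensed as the examples imply.

```python
import re

# Abbreviated stop-word list; "all"/"you" included per the shorttitle example.
STOP = {"a", "an", "the", "is", "are", "was", "for", "of", "with", "all", "you"}

def apply_pattern(pattern, meta):
    """Expand [token:mod:...] segments; text outside brackets is kept literally."""
    def expand(m):
        tok, *mods = m.group(1).split(":")
        if tok == "auth":
            val = meta["authors"][0].split()[-1]      # first author's last name
        elif tok == "authors":
            val = "".join(a.split()[-1] for a in meta["authors"])
        elif tok == "year":
            val = str(meta["year"])
        elif tok == "title":
            val = meta["title"]
        elif tok == "shorttitle":
            # stop words filtered, first N significant words, lowercased and joined
            words = [w for w in meta["title"].split() if w.lower() not in STOP]
            n = next((int(x) for x in mods if x.isdigit()), len(words))
            val = "".join(words[:n]).lower()
        else:
            raise ValueError(f"unknown token: {tok}")
        for mod in mods:                              # modifiers apply in order
            if mod == "lower":
                val = val.lower()
            elif mod == "upper":
                val = val.upper()
            elif mod == "nopunct":
                val = re.sub(r"[^0-9A-Za-z]", "", val)
            elif mod == "condense":
                val = "".join(val.split())
            elif mod.isdigit() and tok != "shorttitle":
                val = "".join(val.split()[:int(mod)])
        return val
    return re.sub(r"\[([^\]]+)\]", expand, pattern)

meta = {"authors": ["Ashish Vaswani", "Noam Shazeer"], "year": 2017,
        "title": "Attention Is All You Need"}
apply_pattern("[auth:lower][year][shorttitle:1:nopunct]", meta)  # -> "vaswani2017attention"
```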
Examples
Default pattern
pattern = "[auth:lower][year][shorttitle:1:nopunct]"
| Paper | Generated key |
|---|---|
| Vaswani et al., “Attention Is All You Need”, 2017 | vaswani2017attention |
| LeCun et al., “Deep Learning”, 2015 | lecun2015deep |
All authors
pattern = "[authors:lower][year]"
| Paper | Generated key |
|---|---|
| Vaswani & Shazeer, “Attention Is All You Need”, 2017 | vaswanishazeer2017 |
With literal separator
pattern = "[auth:lower]_[year]"
| Paper | Generated key |
|---|---|
| Vaswani et al., 2017 | vaswani_2017 |
Full title condensed
pattern = "[title:condense:lower]"
| Paper | Generated key |
|---|---|
| “Attention Is All You Need” | attentionisallyouneed |
Multi-word short title
pattern = "[auth:lower][year][shorttitle:3:nopunct]"
| Paper | Generated key |
|---|---|
| Vaswani et al., “Attention Is All You Need”, 2017 | vaswani2017attentionneed |
Collision resolution
If a generated key collides with an existing entry, crossmem appends a suffix:
- Try `a` through `z`: vaswani2017attention → vaswani2017attentiona
- If all 26 letters are exhausted, append `_<count>`: vaswani2017attention_27
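The suffix scheme is easy to sketch (assuming the existing keys are available as a set; crossmem's actual storage lookup differs):

```python
import string

def resolve_collision(key, existing):
    """Return a unique cite key: base, then a-z suffixes, then _<count>."""
    if key not in existing:
        return key
    for letter in string.ascii_lowercase:
        if key + letter not in existing:
            return key + letter
    # Base + 26 letter variants taken, so the next candidate is number 27.
    count = 27
    while f"{key}_{count}" in existing:
        count += 1
    return f"{key}_{count}"

resolve_collision("vaswani2017attention", {"vaswani2017attention"})  # -> "vaswani2017attentiona"
```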
Wiki Frontmatter
Every wiki note in ~/crossmem/wiki/ starts with YAML frontmatter between --- delimiters.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `cite_key` | string | yes | DSL-generated citation key. Example: `vaswani2017attention` |
| `title` | string | yes | Paper title |
| `authors` | list[string] | yes | List of author names |
| `year` | integer | yes | Publication year |
| `arxiv_id` | string | yes (arXiv) | arXiv identifier, e.g. `1706.03762` |
| `doi` | string | no | DOI (may be preprint DOI) |
| `doi_preprint` | string | no | Preprint DOI (e.g. `10.48550/arXiv.1706.03762`) |
| `doi_published` | string | no | Published version DOI (if the paper was published in a journal) |
| `captured_at` | string | yes | Unix timestamp of capture |
| `raw` | string | yes | Path to the raw PDF file |
| `pdf_sha256` | string | yes | SHA-256 hash of the raw PDF bytes |
| `parser` | string | yes | Parser used: `marker`, `pdftotext` |
| `chunks` | integer | yes | Number of chunks in the document |
| `meta.sources` | list[string] | yes | Metadata sources used: `arxiv`, `crossref`, `openalex` |
| `meta.reconciled` | boolean | yes | Whether metadata was cross-verified across sources |
| `meta.warnings` | list[string] | no | Warnings from metadata reconciliation |
Example
---
cite_key: vaswani2017attention
title: "Attention Is All You Need"
authors:
- "Ashish Vaswani"
- "Noam Shazeer"
- "Niki Parmar"
year: 2017
arxiv_id: "1706.03762"
doi: "10.48550/arXiv.1706.03762"
captured_at: "1776227254"
raw: "~/crossmem/raw/1776227254_vaswani2017attention.pdf"
pdf_sha256: "9a8f3b..."
parser: "marker"
chunks: 47
meta:
  sources: ["arxiv", "crossref", "openalex"]
  reconciled: true
  warnings: []
---
Citations section
After the frontmatter, the wiki body starts with a title heading and a ## Citations section containing five subsections:
# Attention Is All You Need
## Citations
### APA
Vaswani, A., & Shazeer, N. (2017). Attention Is All You Need. arXiv preprint arXiv:1706.03762.
### MLA
Vaswani, Ashish, et al. "Attention Is All You Need." arXiv preprint arXiv:1706.03762 (2017).
### Chicago
Vaswani, Ashish, and Noam Shazeer. "Attention Is All You Need." arXiv preprint arXiv:1706.03762 (2017).
### IEEE
A. Vaswani et al., "Attention Is All You Need," arXiv preprint arXiv:1706.03762, 2017.
### BibTeX
```bibtex
@article{vaswani2017attention,
title={Attention Is All You Need},
author={Ashish Vaswani and Noam Shazeer},
year={2017}
}
```
Chunk Format
After the citations section, each wiki note contains a series of chunks. Each chunk preserves verbatim text from the source PDF along with provenance metadata.
Chunk structure
<!-- chunk id=p4s32c1 -->
> We call our particular attention "Scaled Dot-Product Attention".
> The input consists of queries and keys of dimension dk, and values
> of dimension dv.
**Paraphrase:** The authors name their mechanism "Scaled Dot-Product Attention" and define its inputs.
**Implication:** This naming convention becomes the standard terminology used across the field.
```yaml
provenance:
  page: 4
  section: "3.2 Scaled Dot-Product Attention"
  bbox: [72.0, 340.5, 523.8, 412.1]
  text_sha256: "5f3e1c..."
  byte_range: [18342, 19104]
```
## Chunk ID format
Chunk IDs follow the pattern `p{page}s{section}c{chunk}`:
| Part | Description | Example |
|------|-------------|---------|
| `p{N}` | Page number | `p4` = page 4 |
| `s{N}` | Section number within page | `s32` = section 3.2 |
| `c{N}` | Chunk number within section | `c1` = first chunk |
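As an illustration, a helper that assembles an ID from its parts might look like the sketch below. The function name and the digit-flattening of dotted section numbers (`3.2` → `32`) are assumptions read off the examples above:

```rust
// Hypothetical helper mirroring the `p{page}s{section}c{chunk}` pattern.
// Flattening "3.2" to "32" is an assumption based on the examples.
fn chunk_id(page: u32, section: &str, chunk: u32) -> String {
    let section_digits: String = section.chars().filter(char::is_ascii_digit).collect();
    format!("p{page}s{section_digits}c{chunk}")
}

fn main() {
    println!("{}", chunk_id(4, "3.2", 1)); // p4s32c1
}
```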
## Fields
### Verbatim text
Lines starting with `> ` contain the original text extracted from the PDF. This text is **never modified by the LLM** — it comes directly from the PDF parser.
### Paraphrase
A 1–2 sentence LLM-generated summary of the chunk's content. Generated by Ollama during `crossmem compile`.
### Implication
A 1–2 sentence LLM-generated statement about why this chunk matters to the field. Generated by Ollama during `crossmem compile`.
### Provenance
YAML metadata block attached to each chunk:
| Field | Type | Description |
|-------|------|-------------|
| `page` | integer | Page number in the source PDF |
| `section` | string | Section heading (if detected by parser) |
| `bbox` | `[f64; 4]` | Bounding box `[x_min, y_min, x_max, y_max]` in PDF coordinates. Present when parsed with Marker. |
| `text_sha256` | string | SHA-256 hash of the verbatim text. Used by `crossmem verify` to detect drift. |
| `byte_range` | `[usize; 2]` | `[start, end]` byte offset in the source PDF content stream. Present when available from parser. |
## Chunk types
The `chunk_type` field (internal) classifies each chunk:
| Type | Description |
|------|-------------|
| `page` | Full-page text (from `pdftotext` fallback) |
| `heading` | Section heading |
| `paragraph` | Body paragraph (from Marker block tree) |
| `figure` | Figure caption |
| `table` | Table content |
| `equation` | Mathematical expression |
## Integrity verification
Run `crossmem verify` to re-hash every chunk's verbatim text and compare against the stored `text_sha256`. Any mismatch indicates the wiki file has been modified since compilation.
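The first half of that check, rebuilding the verbatim text from a chunk's blockquote lines, can be sketched as below. This is assumed logic, not the shipped code; the actual hashing would use a SHA-256 implementation (e.g. the `sha2` crate) and is omitted here:

```rust
// Assumed logic: rebuild the verbatim text that `crossmem verify` re-hashes
// and compares against the stored `text_sha256`. SHA-256 itself would come
// from a crate such as `sha2` and is omitted.
fn extract_verbatim(chunk_body: &str) -> String {
    chunk_body
        .lines()
        .filter_map(|line| line.strip_prefix("> "))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let body = "<!-- chunk id=p4s32c1 -->\n\
                > We call our particular attention \"Scaled Dot-Product Attention\".\n\
                > The input consists of queries and keys of dimension dk.\n\
                **Paraphrase:** ...";
    println!("{}", extract_verbatim(body));
}
```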
Pipeline Overview
crossmem’s citation pipeline transforms a URL into a structured, verifiable wiki note.
Pipeline diagram
graph TD
A[crossmem capture URL] --> B[Download PDF]
B --> C[Fetch arXiv metadata]
C --> D[Reconcile: CrossRef + OpenAlex]
D --> E[Generate cite_key via DSL]
E --> F["Save raw PDF + .meta.json"]
G[crossmem compile cite_key] --> H[Load raw PDF + metadata]
H --> I{Marker available?}
I -->|Yes| J[Marker: paragraph chunks + bbox]
I -->|No| K[pdftotext: page-level chunks]
J --> L[Compute SHA-256 per chunk]
K --> L
L --> M[Ollama: paraphrase + implication per chunk]
M --> N[Generate 5 citation formats]
N --> O["Emit wiki markdown to ~/crossmem/wiki/"]
P[crossmem verify] --> Q[Walk wiki files]
Q --> R[Re-hash chunk text]
R --> S{SHA-256 match?}
S -->|Yes| T[OK]
S -->|No| U[DRIFT detected]
V[crossmem mcp serve] --> W[Load wiki entries]
W --> X[crossmem_cite: lookup by key]
W --> Y[crossmem_recall: search by query]
Why capture and compile are separate
capture is lightweight and idempotent: it issues API calls to arXiv, CrossRef, and OpenAlex, downloads the PDF, and writes metadata. You can re-run it to refresh metadata without re-parsing. compile is heavyweight: it invokes Marker (or another PDF parser) and Ollama to produce chunk-level paraphrases and implications. Separating the two lets you swap the PDF parser (Marker → Nougat → GROBID) or change the LLM model without re-downloading anything. It also enables a practical workflow: batch-capture dozens of papers first, then compile them at leisure — or only compile the ones that turn out to be relevant.
Stage details
Capture
- URL parsing — extracts the arXiv ID from various URL formats (`/abs/`, `/pdf/`, bare ID)
- PDF download — fetches the PDF, computes SHA-256, saves to `~/crossmem/raw/`
- Metadata fetch — queries the arXiv API for title, authors, year
- Metadata reconciliation — cross-checks against CrossRef (via DOI) and OpenAlex. Flags disagreements as warnings in frontmatter.
- Cite key generation — applies the configured pattern DSL to the reconciled metadata
Compile
- PDF parsing — Marker (with MPS acceleration) produces paragraph-level blocks with bounding-box coordinates. Falls back to `pdftotext -layout` for page-level extraction.
- Chunk assembly — blocks are grouped into typed chunks (paragraph, heading, figure, table, equation) with unique IDs
- Provenance — each chunk gets page, section, bbox, SHA-256, and byte range
- LLM pass — Ollama generates a paraphrase and implication for each chunk. The LLM never modifies the original text.
- Citation generation — deterministic formatting into APA, MLA, Chicago, IEEE, BibTeX
- Emission — final wiki markdown written to `~/crossmem/wiki/`
Verify
Walks all wiki files, re-extracts verbatim text from > blockquote lines, re-computes SHA-256, and compares against stored text_sha256 in provenance blocks. Reports any drifts.
MCP serve
Loads wiki entries into memory, exposes crossmem_cite (lookup by cite key with fuzzy matching) and crossmem_recall (full-text search with relevance ranking) over stdio MCP transport.
Data Model
Core types
ReconciledMetadata
The metadata reconciler merges data from multiple sources into a single canonical record.
```rust
pub struct ReconciledMetadata {
    pub title: String,
    pub authors: Vec<String>,
    pub year: u16,
    pub arxiv_id: String,
    pub doi: Option<String>,
    pub doi_preprint: Option<String>,
    pub doi_published: Option<String>,
    pub sources: Vec<String>, // e.g. ["arxiv", "crossref", "openalex"]
    pub warnings: Vec<String>,
    pub reconciled: bool,
}
```
ChunkV2
The paragraph-level chunk with full provenance.
```rust
pub struct ChunkV2 {
    pub chunk_type: String,         // "page", "heading", "paragraph", etc.
    pub chunk_id: String,           // e.g. "p1s1c1"
    pub page: usize,
    pub text: String,               // Verbatim extracted text
    pub provenance: Provenance,
    pub paraphrase: Option<String>, // LLM-generated
    pub implication: Option<String>,// LLM-generated
}
```
Provenance
Tracks exactly where a chunk came from in the source PDF.
```rust
pub struct Provenance {
    pub page: usize,
    pub section: Option<String>,
    pub bbox: Option<[f64; 4]>, // [x_min, y_min, x_max, y_max]
    pub text_sha256: String,
    pub byte_range: Option<[usize; 2]>,
}
```
WikiEntry (MCP)
The in-memory representation used by the MCP server.
```rust
struct WikiEntry {
    cite_key: Option<String>,
    title: String,
    authors: Vec<String>,
    year: Option<u16>,
    source: Option<String>,
    date: Option<String>,
    file_path: PathBuf,
    body: String,
}
```
Storage layout
~/crossmem/
├── raw/ # Capture output
│ ├── <timestamp>_<cite_key>.pdf # Raw PDF
│ └── <timestamp>_<cite_key>.meta.json # Reconciled metadata
└── wiki/ # Compile output
└── <timestamp>_<cite_key>.md # Wiki note
Trust boundaries
| Data | Source | Verifiable? |
|---|---|---|
| Title, authors, year, DOI | Metadata reconciler (arXiv + CrossRef + OpenAlex) | Cross-source agreement |
| Cite key, citation strings | Deterministic generator | Pure function, unit-tested |
| Verbatim quote text | PDF extractor (Marker / pdftotext) | SHA-256 hash |
| Bounding box, byte range | PDF extractor | Re-extraction reproducibility |
| Paraphrase, implication | LLM (Ollama) | Not verifiable — advisory only |
Chunk-based Citation v2 Design
Status: Implemented (Phase 2 MVP shipped) Date: 2026-04-15
User requirement
How do we ensure citations are absolutely correct, with zero chance of error?
One-line answer: Verbatim text + bbox provenance is ground truth; LLM only touches paraphrase/implication, never quotes; metadata is cross-verified across ≥2 canonical sources.
Competitor survey
| Tool | What it nails | What it misses |
|---|---|---|
| Zotero + Better BibTeX | Stable cite_key via JS-ish pattern DSL; key regeneration rules; 80%+ academic mind-share | No chunk/page content; just metadata container |
| Marker (datalab-to/marker) | PDF→markdown + polygon bbox per block, --keep_chars for char-level bboxes, JSON tree-per-page | Slower than pdftotext; needs CUDA/MPS |
| Nougat | Transformer-based; beats GROBID on formulas | VLM → hallucination risk on quote fidelity |
| GROBID | 68 fine-grained TEI labels; best on metadata + bibliography refs; 2–5s/page, 90%+ accuracy | Weak on formulas, figures, modern layouts |
| PaperQA2 | Chunk-size configurable; LLM re-rank + contextual summarization; grounded in-text citations | No bbox, chunk = N-char sliding window → page/fragment precision lost |
| Tensorlake RAG | Anchor tokens <c>2.1</c> inlined + bbox stored separately → auditable trail | Proprietary pipeline; design pattern is copyable |
| OpenAlex / CrossRef / Semantic Scholar | Each is a canonical metadata source | Each has gaps; must cross-reconcile |
The industry gold standard for “absolutely correct citation”:
- Parse once with a bbox-aware extractor (Marker-class) → each block has `{page, polygon, text}`.
- Anchor tokens inlined at chunk build time (`<c>p4§3.2</c>`) so the LLM can only emit citation IDs it saw in context.
- Resolve citation IDs → bbox + page at render time; users get a deep link to the exact PDF region.
- Metadata cross-checked across OpenAlex + CrossRef + arXiv; flag inconsistencies instead of silently picking one.
- Quote is verbatim from the PDF text layer, stored with SHA-256 of the source bytes — any LLM-generated “quote” is rejected.
What Phase 1 got right / wrong
Right: pre-gen APA/MLA/Chicago/IEEE/BibTeX, deterministic cite_key, per-page original text preserved verbatim, paraphrase/implication separated from quote.
Wrong / gap:
- Metadata only from arXiv API (no CrossRef/OpenAlex cross-check)
- Quote preservation is page-level, not paragraph/sentence
- No bbox — can’t deep-link into PDF region
- No hash-based verifiability
- cite_key = primitive pattern vs Better BibTeX DSL
- No handling of preprint→published DOI mapping
Phase 2 architecture
2A. Metadata layer (the cite_key + bib trust root)
Pipeline:
arxiv_id → [arxiv API] ┐
→ [CrossRef] ├─→ reconcile → canonical metadata
→ [OpenAlex] ┘ │
├─→ cite_key (Better-BibTeX-style pattern, configurable)
├─→ 5 formats (APA/MLA/Chicago/IEEE/BibTeX)
└─→ DOI + published-version DOI (if preprint)
Rules:
- ≥2 sources must agree on title + first-author + year. Disagreement → emit `meta.warnings` in frontmatter.
- cite_key pattern DSL (ported from Better BibTeX): `[auth:lower][year][shorttitle:1:nopunct]`, configurable via `~/.crossmem/config.toml`.
- Track preprint↔published mapping in `meta.doi_preprint` and `meta.doi_published`.
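The agreement rule can be sketched as follows. This is a simplification using exact string equality; the real reconciler would presumably normalize case and whitespace before comparing:

```rust
// Simplified sketch: a field is accepted when at least two sources report
// the same value (exact equality here; real code would normalize first).
fn agrees<T: PartialEq>(values: &[T]) -> bool {
    values
        .iter()
        .any(|a| values.iter().filter(|b| *b == a).count() >= 2)
}

fn main() {
    let titles = [
        "Attention Is All You Need", // arXiv
        "Attention Is All You Need", // CrossRef
        "attention is all you need", // OpenAlex (case differs)
    ];
    let mut warnings: Vec<String> = Vec::new();
    if !agrees(&titles) {
        warnings.push("title: no two sources agree".into());
    }
    println!("reconciled: {}, warnings: {warnings:?}", warnings.is_empty());
}
```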
2B. PDF parsing layer (the chunk trust root)
Tiered strategy by document type + quality tier:
| Tier | Parser | Use when | Bbox? | Speed |
|---|---|---|---|---|
| 0 | pdftotext -layout | Fallback / pure text | No | instant |
| 1 | Marker (Mac MPS) | Default for arxiv | Yes, polygon/block | 1–3 s/page |
| 2 | GROBID (JVM, local) | Bib-references + structured metadata | Yes, TEI | 2–5 s/page |
| 3 | Nougat (MPS) | Formula-heavy pages | Partial | 5–15 s/page |
Phase 2 default: Marker for the body + GROBID for the bibliography; both run, and their outputs merge into a unified chunk tree.
2C. Chunk schema v2 (bbox + hash provenance)
---
cite_key: vaswani2017attention
meta:
  sources: [arxiv, crossref, openalex]
  reconciled: true
  warnings: []
pdf_sha256: 9a8f...
...
---
## p.4 §3.2 Scaled Dot-Product Attention
<!-- chunk id=p4s32c1 -->
> We call our particular attention "Scaled Dot-Product Attention"...
provenance:
  page: 4
  section: "3.2 Scaled Dot-Product Attention"
  bbox: [72.0, 340.5, 523.8, 412.1]
  text_sha256: 5f3e1c...
  byte_range: [18342, 19104]
**Paraphrase:** …
**Implication:** …
- `text_sha256` = SHA-256 of the verbatim extracted text. Re-running the extractor must reproduce it, else the chunk is flagged stale.
- `bbox` + `page` = deep-link target: `crossmem://pdf/{cite_key}#p=4&bbox=72,340,523,412`.
- `byte_range` = PDF content-stream offset (from Marker); the cheapest way to re-verify without re-extraction.
2D. LLM contract (what model is / isn’t allowed to touch)
| Field | Who writes | Verifiable? |
|---|---|---|
title, authors, year, doi, arxiv_id | Metadata reconciler | Cross-source check |
cite_key, 5 citation strings | Deterministic generator | Pure function, unit-tested |
original (the quote) | PDF extractor | SHA-256 + byte_range |
paraphrase, implication | LLM | Never trusted for provenance |
figure.caption | PDF extractor | bbox + OCR-of-caption-only |
figure.implication | LLM | Same rule: advisory text only |
The pipeline never asks the LLM to produce a quote. If a future feature wants “the key sentence on this page”, the LLM picks a sentence index from a numbered list of extracted sentences, never emits the sentence text.
2E. Paragraph- and figure-level chunking
- Paragraph splitter: Marker’s block tree → `paragraph`-typed blocks become chunks (not pages).
- Figure chunks: Marker `figure` blocks → crop image to `raw/figs/{cite_key}_fig{N}.png`; caption extracted separately; implication runs on the caption only.
- Table chunks: Marker `table` blocks → markdown-table format, implication on the markdown text.
- Equation chunks: Nougat output in LaTeX, stored as `$$…$$`, implication on the LaTeX source.
2F. Idempotence + re-compile
- Re-running `capture` is idempotent on `arxiv_id`: re-downloads only if `pdf_sha256` differs.
- Re-running `compile` re-does the LLM pass only for chunks whose `text_sha256` changed.
- `crossmem verify <cite_key>` walks the wiki, re-extracts, re-hashes, and reports any mismatches.
Implementation order
1. Metadata reconciler (arXiv + CrossRef + OpenAlex merge, warnings on disagreement)
2. cite_key pattern DSL (Better-BibTeX-style, unit-tested)
3. Marker integration via `uvx marker-pdf` CLI (Python sidecar; Rust drives via subprocess + JSON)
4. Chunk schema v2 writer (paragraph/figure/table/equation chunks with bbox + hash)
5. GROBID on-demand for bibliography references
6. `crossmem verify` command
7. Nougat sidecar for math-heavy pages (opt-in)
What this buys the user
Writing a paper citing Vaswani 2017 p.4 §3.2:
Before (Phase 1): User opens wiki, sees page-4 summary paragraph, pastes bibtex. May still need to open PDF to find exact sentence.
After (Phase 2):
- Wiki shows `§3.2` as a dedicated chunk with a verbatim quote.
- Clicking the provenance block opens the PDF at page 4 with the bbox highlighted.
- Cite key `vaswani2017attention` is guaranteed stable across arXiv → NeurIPS preprint → published.
- `crossmem verify` run weekly confirms no wiki has silently drifted from its PDF source.
Sources
- PaperQA2 — chunk-size configurable RAG
- Tensorlake citation grounding — anchor token pattern
- Marker — PDF→markdown with bbox
- GROBID — structured metadata extraction
- Better BibTeX cite keys — pattern DSL
- OpenAlex Work object — DOI canonical
YouTube Ingestion Pipeline — Design Document
Status: Draft Author: crossmem team Date: 2026-04-15 Tracking: crossmem-rs#27
1. Overview
Extend crossmem capture <url> to detect youtube.com / youtu.be hosts and dispatch to a YouTube-specific pipeline that produces time-aligned wiki chunks — the video analog of the PDF chunk pipeline from #24.
The pipeline runs entirely local on an Apple Silicon Mac mini (M2/M4). No cloud APIs.
Pipeline stages
capture (download + extract audio/subs)
→ transcribe (whisper.cpp Metal)
→ keyframes (ffmpeg scene-cut)
→ OCR + VLM caption (per keyframe)
→ compile (Ollama paraphrase/implication per chunk)
→ emit wiki markdown
2. Download Path
Decision: yt-dlp binary
| Option | Pros | Cons |
|---|---|---|
| yt-dlp binary | Battle-tested, handles every edge case, active community, --cookies-from-browser for member-only | External dep, Python-based, updates frequently |
| libyt-dlp bindings | Tighter integration | No stable C API; Python FFI is fragile |
| youtube-rs (pure Rust) | No external dep | Incomplete, breaks on YT changes, no auth, no live/shorts |
yt-dlp wins because YouTube aggressively rotates extraction logic. Maintaining a pure-Rust extractor is a full-time job. yt-dlp is the industry standard for a reason.
Edge cases handled by yt-dlp flags
| Scenario | yt-dlp flags |
|---|---|
| Age-gated | --cookies-from-browser chrome (reads real Chrome cookies) |
| Member-only | Same cookie approach; user must be logged in |
| Live streams | --live-from-start --wait-for-video 30 (wait + download from start) |
| Shorts | Works as normal URLs (youtube.com/shorts/ID → standard extraction) |
| Playlists | --yes-playlist or --no-playlist (user flag; default: single video) |
| Chapters | --embed-chapters + --write-info-json (chapter list in info JSON) |
| Auto captions | --write-auto-subs --sub-lang en |
| Human captions | --write-subs --sub-lang en (preferred over auto when available) |
Download command template
yt-dlp \
--format "bestaudio[ext=m4a]/bestaudio/best" \
--extract-audio --audio-format wav --audio-quality 0 \
--write-info-json \
--write-subs --write-auto-subs --sub-lang "en.*" --sub-format vtt \
--embed-chapters \
--cookies-from-browser chrome \
--output "%(id)s.%(ext)s" \
--paths "$HOME/crossmem/raw/youtube/" \
"$URL"
For keyframe extraction we also need the video file:
yt-dlp \
--format "bestvideo[height<=1080][ext=mp4]/bestvideo[height<=1080]/best" \
--write-info-json \
--output "%(id)s_video.%(ext)s" \
--paths "$HOME/crossmem/raw/youtube/" \
"$URL"
3. Audio Extraction → Transcription
Decision: whisper.cpp with Metal acceleration, large-v3-turbo model
| Engine | Backend | Speed (1h audio, M2) | Accuracy | Notes |
|---|---|---|---|---|
| whisper.cpp | Metal (Apple GPU) | ~6–8 min | WER ~8% (large-v3-turbo) | C/C++, no Python, --print-timestamps for word-level |
| whisper-mlx | MLX (Apple GPU) | ~5–7 min | Same models | Python dep, MLX framework, slightly faster on M4 |
| WhisperKit | CoreML | ~5–6 min | Good | Swift-only, harder to call from Rust |
| insanely-fast-whisper | MPS (PyTorch) | ~10–15 min | Same models | Heavy Python stack, MPS less optimized than Metal |
| faster-whisper | CTranslate2 (CPU) | ~15–25 min | Same models | No Metal/MPS; CPU-only on macOS |
whisper.cpp wins because:
- Native Metal acceleration — no Python runtime
- Easily called from Rust via
std::process::Command(same pattern aspdftotextin cite.rs) - Outputs VTT/SRT/JSON with word-level timestamps
- Active project, models available via Hugging Face in ggml format
Model choice: large-v3-turbo
| Model | Params | VRAM | Disk | Speed (M2, 1h) | WER (en) |
|---|---|---|---|---|---|
| large-v3 | 1.55B | ~3 GB | 3.1 GB | ~12 min | ~7.5% |
| large-v3-turbo | 809M | ~1.6 GB | 1.6 GB | ~6 min | ~8% |
| distil-large-v3 | 756M | ~1.5 GB | 1.5 GB | ~5 min | ~9% |
large-v3-turbo is the sweet spot: half the VRAM of large-v3, nearly the same WER, 2× faster. distil-large-v3 is marginally faster but has slightly worse accuracy on non-native English speakers (common in academic talks).
Transcription command
whisper-cpp \
--model models/ggml-large-v3-turbo.bin \
--file "$HOME/crossmem/raw/youtube/${VIDEO_ID}.wav" \
--output-vtt \
--output-json \
--print-timestamps \
--language en \
--threads 4
Caption priority
1. Human-uploaded subtitles (`.en.vtt` from yt-dlp) — highest quality, use as-is
2. whisper.cpp transcription — always run for timestamp alignment even if subs exist
3. Auto-generated YouTube captions — fallback only; lower quality than whisper
When human subs exist, align them with whisper timestamps for precise time-coding.
Speaker diarization
Decision: Skip for P1, add in P3 if needed.
Rationale:
- Most YouTube content crossmem targets is solo presenter (lectures, conference talks, tutorials)
- pyannote requires Python + HF token + ~2 GB model; adds significant complexity
- sherpa-onnx is lighter but diarization accuracy on overlapping speech is still mediocre
- Can retrofit later: diarization produces `(speaker_id, start, end)` segments that merge with existing transcript chunks
If multi-speaker content becomes common, P3 can add pyannote 3.1 with speaker embedding.
4. Visual Understanding
4a. Keyframe extraction
Decision: ffmpeg scene-cut detection
ffmpeg -i "${VIDEO_ID}_video.mp4" \
-vf "select='gt(scene,0.3)',showinfo" \
-vsync vfr \
-frame_pts 1 \
"${OUTPUT_DIR}/keyframe_%04d.png" \
2>&1 | grep "pts_time" > "${OUTPUT_DIR}/keyframe_times.txt"
| Method | Pros | Cons |
|---|---|---|
| ffmpeg `scene` filter | Zero extra deps, timestamp-aware, tunable threshold | May over/under-extract |
| TransNetV2 | ML-based, higher accuracy | Python + PyTorch dep, overkill for slides |
| PySceneDetect | Good API | Python dep |
ffmpeg is already a required dependency (for audio extraction). Scene threshold 0.3 works well for slide-based content; can tune per-video.
Chapter-aware extraction: If the info JSON contains chapters, also extract one keyframe per chapter boundary (seek to chapter_start + 2s). Merge with scene-cut keyframes, deduplicate within 5s window.
Target: 1 keyframe per 30–120 seconds depending on content type. Cap at 200 keyframes per video.
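The `keyframe_times.txt` produced above is just ffmpeg `showinfo` log lines, so recovering the timestamps is plain string work. A sketch, assuming the typical `showinfo` line layout (`... pts_time:12.0456 ...`):

```rust
// Sketch: extract pts_time values from ffmpeg showinfo log lines. Lines
// without the field are skipped.
fn parse_pts_times(log: &str) -> Vec<f64> {
    log.lines()
        .filter_map(|line| {
            let idx = line.find("pts_time:")?;
            let rest = &line[idx + "pts_time:".len()..];
            rest.split_whitespace().next()?.parse().ok()
        })
        .collect()
}

fn main() {
    let log = "[Parsed_showinfo_1 @ 0x7f] n: 0 pts: 3003 pts_time:0.1001 fmt:rgb24\n\
               [Parsed_showinfo_1 @ 0x7f] n: 1 pts: 90090 pts_time:3.003 fmt:rgb24";
    println!("{:?}", parse_pts_times(log));
}
```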
4b. Per-keyframe VLM caption
Decision: Qwen2.5-VL-7B via Ollama (local)
Ollama already supports multimodal models. The existing Ollama integration in cite.rs targets http://localhost:11434/api/generate — the same endpoint accepts image input with "images": [base64_png].
{
"model": "qwen2.5-vl:7b",
"prompt": "Describe this video frame in one sentence. If it contains a slide, list the title and key bullet points.",
"images": ["<base64_keyframe>"],
"stream": false
}
| Model | VRAM | Speed (per frame, M2) | Quality |
|---|---|---|---|
| Qwen2.5-VL-7B (q4_K_M) | ~5 GB | ~3–5 sec | Good for slides/diagrams |
| LLaVA-1.6-7B | ~5 GB | ~3–5 sec | Slightly worse on text-heavy slides |
| Qwen2.5-VL-3B | ~2.5 GB | ~1–2 sec | Faster but misses fine text |
Qwen2.5-VL-7B is the best local VLM for slide/diagram content. 7B quantized fits comfortably alongside whisper on M2 (16 GB unified memory).
Batching: Process keyframes sequentially (VLM needs full GPU). At ~4 sec/frame × 100 frames = ~7 min. Acceptable.
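A sketch of assembling that request body by hand. Illustrative only: real code would use a JSON library (e.g. serde_json) for proper string escaping, and `vlm_request_body` is a hypothetical name:

```rust
// Illustrative only: JSON assembled by hand; a real implementation would use
// serde_json to get correct escaping of prompt text.
fn vlm_request_body(model: &str, prompt: &str, image_b64: &str) -> String {
    format!(
        r#"{{"model":"{model}","prompt":"{prompt}","images":["{image_b64}"],"stream":false}}"#
    )
}

fn main() {
    let body = vlm_request_body(
        "qwen2.5-vl:7b",
        "Describe this video frame in one sentence.",
        "<base64_keyframe>",
    );
    // POST this body to http://localhost:11434/api/generate
    println!("{body}");
}
```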
4c. OCR on slides
Decision: Apple Vision framework via swift-ffi (primary), Tesseract (fallback)
| Engine | Accuracy | Speed | Dependencies |
|---|---|---|---|
| Apple Vision (VNRecognizeTextRequest) | Excellent, especially printed text | ~0.1s/image | macOS 13+, Swift FFI |
| PaddleOCR | Very good, multi-language | ~0.3s/image | Python + large model |
| Tesseract | Good for English | ~0.5s/image | brew install tesseract |
Apple Vision is the clear winner on macOS: built-in, fast, accurate, no extra deps. Access from Rust via a tiny Swift CLI helper:
// crossmem-ocr (Swift CLI, ~30 lines)
import Vision
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
// ... read image, perform request, print results as JSON
Compile as crossmem-ocr binary, call from Rust via Command::new("crossmem-ocr"). Ship as part of the crossmem install or build from source on first run.
Fallback: If the Swift helper isn’t available (Linux compat someday), fall back to tesseract --oem 1 -l eng.
5. Chunk Schema
Time-aligned chunk (parallel to CompiledChunk in cite.rs)
```rust
pub struct YouTubeChunk {
    pub start_ms: u64,
    pub end_ms: u64,
    pub speaker: Option<String>,          // None until diarization (P3)
    pub transcript: String,               // Whisper or human-sub text for this segment
    pub slide_ocr: Option<String>,        // OCR text if keyframe in this time range
    pub keyframe_path: Option<String>,    // Relative path to keyframe PNG
    pub keyframe_caption: Option<String>, // VLM description of keyframe
    pub paraphrase: String,               // LLM-generated 1-2 sentence summary
    pub implication: String,              // LLM-generated field impact
}
```
Chunk boundaries
Priority order for segmentation:
- Chapters (from info JSON) — if present, each chapter = one chunk
- Scene cuts — if no chapters, split at scene-cut boundaries
- Fixed window — fallback: 60-second segments with sentence-boundary snapping
Within a chapter, if the chapter exceeds 5 minutes, sub-split at scene cuts or 60s intervals.
Minimum chunk: 10 seconds. Maximum chunk: 5 minutes (force-split at sentence boundary).
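The priority order above can be sketched as a boundary selector. Simplified: times are in seconds, and sentence-boundary snapping plus the 10 s / 5 min limits are omitted:

```rust
// Simplified boundary selector for the segmentation priority order.
fn chunk_boundaries(duration_sec: f64, chapter_starts: &[f64], scene_cuts: &[f64]) -> Vec<f64> {
    if !chapter_starts.is_empty() {
        chapter_starts.to_vec() // 1. chapters win
    } else if !scene_cuts.is_empty() {
        scene_cuts.to_vec() // 2. then scene cuts
    } else {
        // 3. fallback: fixed 60-second windows
        let mut out = Vec::new();
        let mut t = 0.0;
        while t < duration_sec {
            out.push(t);
            t += 60.0;
        }
        out
    }
}

fn main() {
    println!("{:?}", chunk_boundaries(150.0, &[], &[])); // fixed windows
    println!("{:?}", chunk_boundaries(150.0, &[0.0, 92.0], &[30.0])); // chapters win
}
```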
Metadata struct
```rust
pub struct YouTubeMetadata {
    pub title: String,
    pub channel: String,
    pub upload_date: String, // YYYY-MM-DD
    pub video_id: String,
    pub duration_sec: u64,
    pub chapters: Vec<Chapter>, // from info JSON
    pub description: String,
    pub tags: Vec<String>,
}

pub struct Chapter {
    pub title: String,
    pub start_sec: f64,
    pub end_sec: f64,
}
```
Cite key
{channel_slug}{year}{first_noun_of_title}
Examples:
- 3Blue1Brown, “But what is a neural network?” (2017) →
3blue1brown2017neural - Andrej Karpathy, “Let’s build GPT from scratch” (2023) →
karpathy2023gpt - Two Minute Papers, “OpenAI Sora” (2024) →
twominutepapers2024sora
channel_slug = channel name lowercased, non-alphanumeric stripped, truncated to 20 chars.
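A sketch of that slug rule. Restricting to ASCII alphanumerics is an assumption; non-Latin channel names would need a real transliteration pass:

```rust
// Sketch of the channel_slug rule: lowercase, strip non-alphanumerics,
// truncate to 20 characters. ASCII-only is an assumption.
fn channel_slug(channel: &str) -> String {
    channel
        .chars()
        .filter(char::is_ascii_alphanumeric)
        .map(|c| c.to_ascii_lowercase())
        .take(20)
        .collect()
}

fn main() {
    println!("{}", channel_slug("3Blue1Brown"));
    println!("{}", channel_slug("Two Minute Papers"));
}
```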
Time-coded deep link
Each chunk carries a provenance URL:
https://youtu.be/{VIDEO_ID}?t={floor(start_ms / 1000)}
6. Citation Formats
APA 7th (online video)
{Channel} [{Channel}]. ({Year}, {Month} {Day}). {Title} [Video]. YouTube. https://www.youtube.com/watch?v={VIDEO_ID}
Example:
3Blue1Brown [3Blue1Brown]. (2017, October 5). But what is a neural network? [Video]. YouTube. https://www.youtube.com/watch?v=aircAruvnKk
MLA 9th
"{Title}." YouTube, uploaded by {Channel}, {Day} {Month} {Year}, www.youtube.com/watch?v={VIDEO_ID}.
Chicago 17th (note-bibliography)
{Channel}. "{Title}." {Month} {Day}, {Year}. Video, {Duration}. https://www.youtube.com/watch?v={VIDEO_ID}.
IEEE
{Channel}, "{Title}," YouTube. [Online Video]. Available: https://www.youtube.com/watch?v={VIDEO_ID}. [Accessed: {Access Date}].
BibTeX
@misc{cite_key,
author = {{Channel}},
title = {{Title}},
year = {Year},
month = {Month},
howpublished = {\url{https://www.youtube.com/watch?v=VIDEO_ID}},
note = {[Video]. YouTube. Accessed: YYYY-MM-DD}
}
7. Wiki Markdown Output
Follows the same structure as the ArXiv wiki notes. Example:
---
cite_key: 3blue1brown2017neural
title: "But what is a neural network?"
channel: "3Blue1Brown"
upload_date: "2017-10-05"
video_id: "aircAruvnKk"
duration_sec: 1140
captured_at: "1776300000"
raw: "~/crossmem/raw/youtube/aircAruvnKk.wav"
chunks: 12
source_type: youtube
---
# But what is a neural network?
## Citations
### APA
...
## Chunks
### 00:00–01:32 — Chapter: Introduction
> [Transcript text, first 400 chars...]
**Slide OCR:** [if keyframe present]
**Keyframe:** `keyframes/aircAruvnKk_0042.png` — "A diagram showing..."
**Paraphrase:** ...
**Implication:** ...
**Source:** [00:00](https://youtu.be/aircAruvnKk?t=0)
8. Orchestration
Decision: Same binary, new module youtube.rs
The existing crossmem capture <url> dispatches on URL. Add host detection:
```rust
// main.rs capture dispatch
if url.contains("arxiv.org") {
    cite::cmd_capture(url).await
} else if url.contains("youtube.com") || url.contains("youtu.be") {
    youtube::cmd_capture(url).await
} else {
    // future: generic handler
}
```
Module structure
src/
cite.rs # existing arxiv pipeline (unchanged)
youtube.rs # new: YouTube capture + compile
youtube/
download.rs # yt-dlp wrapper
transcribe.rs # whisper.cpp wrapper
keyframe.rs # ffmpeg scene-cut + chapter extraction
ocr.rs # Apple Vision / tesseract wrapper
vlm.rs # Ollama multimodal (Qwen2.5-VL) wrapper
chunk.rs # Segmentation + chunk assembly
emit.rs # Wiki markdown emission
shared/
ollama.rs # Extract from cite.rs — shared Ollama client
formats.rs # Citation format builders (generalized)
Shared Ollama code: Factor compile_page_chunk and the HTTP client into shared/ollama.rs. Both cite.rs and youtube.rs call it. The prompt template differs (page text vs transcript chunk), but the HTTP plumbing is identical.
Two-stage flow (same as arxiv)
crossmem capture <youtube-url>
→ downloads audio + video + subs + info JSON
→ extracts metadata, generates cite_key
→ saves to ~/crossmem/raw/youtube/{video_id}/
→ prints cite_key for next step
crossmem compile <cite_key>
→ detects source_type (arxiv vs youtube) from meta JSON
→ runs transcription (whisper.cpp)
→ runs keyframe extraction (ffmpeg)
→ runs OCR + VLM caption per keyframe
→ runs Ollama compile per chunk (paraphrase + implication)
→ emits wiki markdown to ~/crossmem/wiki/
9. Dependency Install UX
Decision: Error with one-liner install instructions on first run
Auto-installing is tempting but violates principle of least surprise. Instead:
$ crossmem capture https://youtube.com/watch?v=abc123
ERROR: missing required dependencies for YouTube ingestion:
✗ yt-dlp — brew install yt-dlp
✗ ffmpeg — brew install ffmpeg
✓ whisper.cpp — found at /opt/homebrew/bin/whisper-cpp
Install all missing:
brew install yt-dlp ffmpeg
Then retry: crossmem capture https://youtube.com/watch?v=abc123
Check order: which yt-dlp && which ffmpeg && which whisper-cpp (or whisper depending on install method).
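A sketch of the check itself, resolving each binary against `$PATH` directly instead of shelling out to `which` (the function name is illustrative):

```rust
use std::path::PathBuf;

// Sketch: resolve a binary against $PATH (same effect as `which`).
fn find_in_path(bin: &str) -> Option<PathBuf> {
    let paths = std::env::var_os("PATH")?;
    std::env::split_paths(&paths)
        .map(|dir| dir.join(bin))
        .find(|candidate| candidate.is_file())
}

fn main() {
    for dep in ["yt-dlp", "ffmpeg", "whisper-cpp"] {
        match find_in_path(dep) {
            Some(p) => println!("✓ {dep} — found at {}", p.display()),
            None => println!("✗ {dep} — brew install {dep}"),
        }
    }
}
```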
whisper.cpp model download: If binary exists but model is missing:
Model not found. Download large-v3-turbo (~1.6 GB):
curl -L -o ~/.cache/whisper/ggml-large-v3-turbo.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
crossmem-ocr Swift helper: Build from source on first YouTube capture:
$ swift build -c release --package-path ./tools/crossmem-ocr
Or provide pre-built binary in releases.
10. Cost Model (All Local)
Estimated wall-clock for M2 Mac mini (16 GB)
| Stage | 1h video | 30 min video | 3h video |
|---|---|---|---|
| yt-dlp download (audio + video) | ~2 min | ~1 min | ~5 min |
| whisper.cpp transcription | ~6 min | ~3 min | ~18 min |
| ffmpeg keyframe extraction | ~1 min | ~30 sec | ~3 min |
| OCR per keyframe (~80 frames) | ~8 sec | ~4 sec | ~20 sec |
| VLM caption per keyframe | ~5 min | ~2.5 min | ~15 min |
| Ollama compile per chunk (~40 chunks) | ~8 min | ~4 min | ~24 min |
| Total | ~22 min | ~11 min | ~65 min |
Bottlenecks
- Ollama compile — sequential LLM calls, ~12 sec/chunk. Could batch with larger context window.
- VLM caption — sequential, ~4 sec/frame. GPU contention with Ollama if run concurrently.
- Whisper — fast on Metal, but locks GPU for duration.
Memory pressure
| Concurrent | Peak VRAM | Safe on 16 GB? |
|---|---|---|
| Whisper alone | ~1.6 GB | Yes |
| Ollama (7B q4) alone | ~5 GB | Yes |
| Whisper + Ollama | ~6.6 GB | Yes |
| Qwen2.5-VL-7B + Ollama text | ~10 GB | Tight but OK |
| All three simultaneous | ~12 GB | Risky — run sequentially |
Strategy: Run stages sequentially. whisper → keyframes → OCR → VLM → compile. No concurrent GPU workloads.
11. Storage Layout
```
~/crossmem/
  raw/
    youtube/
      {video_id}/
        {video_id}.wav          # Audio (whisper input)
        {video_id}_video.mp4    # Video (keyframe source)
        {video_id}.info.json    # yt-dlp metadata
        {video_id}.en.vtt       # Human subs (if available)
        {video_id}.en.auto.vtt  # Auto subs (if available)
        {video_id}.meta.json    # crossmem metadata
        transcript.json         # Whisper output with timestamps
        keyframes/
          frame_0001.png        # Scene-cut keyframes
          frame_0002.png
          keyframe_times.json   # Timestamp → frame mapping
        ocr/
          frame_0001.txt        # OCR output per frame
        captions/
          frame_0001.txt        # VLM caption per frame
  wiki/
    {timestamp}_{cite_key}.md   # Final compiled wiki note
```
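The per-video layout can be expressed as a small path helper. A sketch assuming the directory names shown above; the root, function names, and API are illustrative:

```rust
use std::path::PathBuf;

/// Illustrative helpers mapping a video id to the raw-capture paths in
/// the layout above. Directory names from this doc; API hypothetical.
fn raw_dir(root: &str, video_id: &str) -> PathBuf {
    PathBuf::from(root).join("raw").join("youtube").join(video_id)
}

fn audio_path(root: &str, video_id: &str) -> PathBuf {
    raw_dir(root, video_id).join(format!("{video_id}.wav"))
}

fn keyframe_path(root: &str, video_id: &str, n: u32) -> PathBuf {
    // Zero-padded to four digits, matching frame_0001.png above.
    raw_dir(root, video_id).join("keyframes").join(format!("frame_{n:04}.png"))
}

fn main() {
    println!("{}", audio_path("/Users/me/crossmem", "abc123").display());
}
```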
12. Phased Delivery
P1 — Download + Transcribe (MVP)
- URL detection in `main.rs` capture dispatch
- yt-dlp download wrapper (`youtube/download.rs`)
- whisper.cpp transcription wrapper (`youtube/transcribe.rs`)
- Basic chunk segmentation (chapters or 60s windows)
- Ollama compile pass (reuse from `cite.rs`)
- Wiki markdown emission (transcript-only, no visual)
- Dependency check + error messages
- Tests for metadata parsing, cite_key generation, chunk segmentation
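The P1 fallback segmentation (fixed 60-second windows when a video has no chapters) can be sketched as a greedy pass over whisper segments. The struct and function names are illustrative, not the actual crossmem types:

```rust
/// A transcript segment from whisper: start/end in seconds plus text.
/// Illustrative types; not the actual crossmem structs.
struct Segment { start: f64, end: f64, text: String }
struct Chunk { start: f64, end: f64, text: String }

/// Greedy 60-second windows: a segment joins the current chunk while
/// the chunk's span stays within `window_s`; otherwise a new chunk begins.
fn segment_chunks(segs: &[Segment], window_s: f64) -> Vec<Chunk> {
    let mut chunks: Vec<Chunk> = Vec::new();
    for s in segs {
        match chunks.last_mut() {
            Some(c) if s.end - c.start <= window_s => {
                c.end = s.end;
                c.text.push(' ');
                c.text.push_str(&s.text);
            }
            _ => chunks.push(Chunk { start: s.start, end: s.end, text: s.text.clone() }),
        }
    }
    chunks
}

fn main() {
    let segs = vec![Segment { start: 0.0, end: 45.0, text: "intro".into() }];
    println!("{} chunk(s)", segment_chunks(&segs, 60.0).len());
}
```

Chapter-aware segmentation would replace the fixed window with chapter boundaries from the yt-dlp metadata, but the merge loop stays the same shape.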
P2 — Keyframes + OCR
- ffmpeg scene-cut extraction (`youtube/keyframe.rs`)
- Chapter-aware keyframe selection
- Apple Vision OCR helper (`tools/crossmem-ocr/`)
- Tesseract fallback
- OCR text merged into chunks
- Tests for keyframe timing, OCR integration
P3 — VLM Captions + Diarization
- Ollama multimodal integration for keyframe captioning (`youtube/vlm.rs`)
- Keyframe captions merged into chunks
- Optional: pyannote speaker diarization
- Tests for VLM response parsing
P4 — Polish + Chunk Emission
- Human sub → whisper alignment
- Playlist support (batch capture)
- `crossmem compile --source youtube` flag
- Storage cleanup (delete intermediate files after compile)
- Integration tests with real short video
- Performance benchmarks on M2/M4
13. Open Questions
- Subtitle language detection: Should we auto-detect the video language and pass `--language` to whisper, or always use `en`? For P1, assume English.
- Video retention: Keep the video file after keyframe extraction, or delete to save disk? A 1h 1080p video is ~1–2 GB. Suggest: keep for 7 days, then auto-prune.
- Ollama model for compile pass: Reuse `llama3.2:3b` (same as arxiv), or use a different model better suited for spoken-word paraphrasing? Suggest: same model, same env var.
- Playlist semantics: One wiki note per video, or one per playlist? Suggest: one per video, with a playlist index note linking them.
- Live stream handling: yt-dlp can download from start, but duration is unknown until stream ends. Suggest: P1 skips live, add in P2.
Why crossmem bridge does not use Chrome DevTools Protocol
The incident
On a developer workstation, a suspicious process (PID 73079) spawned from a Claude shell snapshot executed the following sequence:
- `sleep 2400` (wait for Chrome to settle)
- Connect to `ws://localhost:9222` (Chrome DevTools Protocol)
- `Runtime.evaluate` → `Clerk.session.getToken()` to steal the active session token
- `POST` the stolen token to an external API (teaching.monster)
Root cause: a dev tool had launched Chrome with `--remote-debugging-port=9222`.
This single flag exposes every open tab, every origin, every cookie on a
localhost WebSocket with zero authentication. Any local process—malicious or
not—can connect and run arbitrary JavaScript in the context of any page the user
has open. CDP is a debugger; it trusts the caller completely.
What crossmem bridge does differently
crossmem bridge is a Manifest V3 Chrome extension that communicates with local
agents over a WebSocket on localhost:7600. The design differs from CDP in
several concrete ways:
- No `--remote-debugging-port`. The user's Chrome launches normally. There is no app-wide debug backdoor to connect to.
- User-installed extension with Chrome's permission UI. The user explicitly grants the extension host permissions. CDP requires no user consent at all; whatever launched Chrome with the flag decides.
- Whitelisted action set. The bridge accepts a fixed set of named actions: `navigate`, `click`, `type`, `extract`, `screenshot`, `summarize`, `tab_info`, `wait`, `ping`. There is no generic "evaluate arbitrary JS" verb. An attacker who connects to `:7600` can click buttons and read extracted text, but cannot call `Clerk.session.getToken()` or `Network.getAllCookies`.
- Real Chrome profile, no spoofing. The extension runs inside the user's actual Chrome profile: no `--user-data-dir` pointing at a throwaway directory, no Chrome for Testing with broken Keychain integration.
Threat model comparison
| Attack surface | CDP (:9222) | crossmem bridge (:7600) |
|---|---|---|
| Arbitrary JS on any origin | `Runtime.evaluate` — yes | No eval verb — no |
| Dump all cookies | `Network.getAllCookies` — yes | No such action — no |
| Read/modify DOM | Full DOM access | Only via named actions (click, extract) |
| Authentication | None | None (same weakness — see below) |
| User consent | None; whoever launched Chrome decides | Chrome extension install prompt |
The PID 73079 attack required exactly two CDP primitives: Runtime.evaluate and
network access. Neither exists in the crossmem bridge action vocabulary.
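The fixed action vocabulary can be modeled as a closed enum, so an unknown verb fails at parse time rather than reaching any handler. A sketch under the assumption of the nine action names listed above; this is not the extension's actual message format:

```rust
// Illustrative: parsing a bridge message verb into a closed action set.
// Anything outside the whitelist (e.g. an "evaluate" verb) is rejected
// before any handler runs. Action names come from this document.
#[derive(Debug, PartialEq)]
enum Action {
    Navigate, Click, Type, Extract, Screenshot,
    Summarize, TabInfo, Wait, Ping,
}

fn parse_action(verb: &str) -> Result<Action, String> {
    match verb {
        "navigate" => Ok(Action::Navigate),
        "click" => Ok(Action::Click),
        "type" => Ok(Action::Type),
        "extract" => Ok(Action::Extract),
        "screenshot" => Ok(Action::Screenshot),
        "summarize" => Ok(Action::Summarize),
        "tab_info" => Ok(Action::TabInfo),
        "wait" => Ok(Action::Wait),
        "ping" => Ok(Action::Ping),
        other => Err(format!("unknown action: {other}")), // no eval verb exists
    }
}

fn main() {
    println!("{:?}", parse_action("ping"));
}
```

The point of the closed enum is structural: there is no code path from a wire-level verb to arbitrary JS, so adding one would require a deliberate change to the type, not a parsing oversight.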
What this design does NOT protect against
Honesty matters more than marketing. crossmem bridge has real limitations:
- `localhost:7600` is unauthenticated, same as CDP on `:9222`. Any local process can connect. The attack surface is smaller (no eval, no cookie dump), but the network posture is identical.
- `chrome.scripting.executeScript` is arbitrary JS under the hood. The bridge currently uses it to implement actions like `extract` and `click`. If a future action handler passes attacker-controlled input (selectors, payloads) into `executeScript` without sanitization, the constrained action set becomes a confused deputy.
- Supply-chain attack on the extension itself. A malicious MV3 update pushed to the Chrome Web Store bypasses every architectural constraint. The extension IS the trust boundary.
- Planned hardening (not yet implemented):
- Per-request auth token (shared secret between agent and extension)
- Unix domain socket instead of TCP (removes network-reachable surface)
- Strict input validation on action parameters
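The planned per-request token could be verified with a constant-time comparison so a local attacker cannot recover the secret byte-by-byte via timing. A hypothetical sketch of this not-yet-implemented hardening; the function name is illustrative:

```rust
/// Hypothetical sketch of the planned per-request auth token check.
/// Constant-time comparison avoids leaking the secret through timing.
fn token_ok(presented: &str, secret: &str) -> bool {
    let (a, b) = (presented.as_bytes(), secret.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    // XOR-accumulate so every byte is compared regardless of mismatches.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    println!("{}", token_ok("demo-token", "demo-token"));
}
```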
Takeaway
The lesson from PID 73079 is not “use crossmem bridge instead of CDP.” It is: dev automation tooling should not default to opening an app-wide debug backdoor.
CDP is a debugger protocol. It was designed for DevTools, not for agent orchestration. When you expose it on localhost, you hand every local process, including ones you didn't launch, full control over every tab in the browser.
crossmem bridge chose a constrained, consent-gated channel: a user-installed extension exposing a fixed action vocabulary over a local WebSocket. This is a design choice that reduces the blast radius of local-process compromise. It is not magic, and it is not complete. But it means PID 73079’s exact attack vector—connect, eval, exfiltrate—does not work.