Bring in a local PDF or arXiv URL and generate a structured Agent Reading Report.
Evidence-first agentic paper workspace
Paperflow
Read papers, verify claims, ask an Agent with context, and save durable research knowledge. Every generated claim is labeled R0/R1/R2 and traced back to PDF evidence whenever possible.
What is Paperflow
Report first, chat second, evidence always.
Click a reliability-labeled claim to jump back to the PDF page and highlight the source evidence.
Ask the Agent with report and evidence context, then export durable notes to Obsidian.
News
Latest updates
Paperflow now presents a public-facing evidence-first agentic paper workspace with PDF evidence highlighting, responsive PDF search, Agent chat grounding, local-first research memory, and Obsidian export. Future small feature releases will use the v0.1.x format.
Quickstart
Run Paperflow locally
Requirements: Python 3.9+, Node.js 18+, and a DeepSeek API key for real Agent parsing.
git clone https://github.com/shiml20/PaperFlow.git
cd PaperFlow
export DEEPSEEK_API_KEY="your-deepseek-api-key"
cd paperflow
./run-dev.sh --install
Then open http://127.0.0.1:5173, import a PDF or arXiv URL, and open the Workspace.
How to use
From paper to knowledge base
- Import a local PDF or paste an arXiv URL.
- Watch the Agent move from PDF parsing to dynamic partial reports.
- Read the first key findings while the rest of the report continues to fill in.
- Open the completed Reading Report and inspect R0 / R1 / R2 claims.
- Click a claim or evidence item to inspect source text and PDF location.
- Ask the Agent a focused question grounded in the current paper.
- Save or update the Obsidian note.
Core features
A research workspace, not a generic summarizer.
- Chunked full-paper reading with briefing and coordinator synthesis.
- Dynamic partial reports, coverage-aware generation, and live parsing metrics.
- Transparent process output for PDF extraction, DeepSeek stages, persistence, and failure states.
- R0 / R1 / R2 reliability badges in UI and data model.
- Evidence quote, page, section, bbox, and location status for claims.
- PDF.js reader with continuous scroll, zoom, page jump, bbox highlight, and select-to-ask.
- Right-rail Agent panel with transcript, process cards, status, and composer.
- Chat transcripts persisted in SQLite and restored per paper.
- SSE step/final events for streaming paper-scoped chat.
- Metadata import, content deduplication, and six-lane R1 search.
- Field Maps with milestones, timelines, task taxonomy, datasets, methods, and opportunities.
- Agent-enriched lineage graph edges with rationale and confidence.
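The SSE step/final events listed above can be pictured with a small sketch. It assumes each frame carries a JSON payload and that a chat turn emits zero or more `step` frames followed by one `final` frame; the helper names (`format_sse`, `chat_stream`) and the payload fields are hypothetical and not taken from the Paperflow codebase.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events frame: an event name plus a JSON data line."""
    data = json.dumps(payload, ensure_ascii=False)
    # A blank line terminates each SSE frame.
    return f"event: {event}\ndata: {data}\n\n"


def chat_stream(steps, answer):
    """Yield one 'step' frame per intermediate step, then a single 'final' frame."""
    for index, message in enumerate(steps):
        yield format_sse("step", {"index": index, "message": message})
    # Hypothetical payload shape: the real final event may carry more fields.
    yield format_sse("final", {"answer": answer})


frames = list(chat_stream(["retrieving evidence", "drafting answer"], "See page 3."))
```

A generator like this could be handed to a streaming HTTP response with the `text/event-stream` media type, so the browser can render process cards as each `step` frame arrives.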
Reliability model
Every claim has a source contract.
| Level | Meaning | Typical use |
|---|---|---|
| R0 | Strictly grounded in the current paper. | Claims with direct evidence quotes and PDF locations. |
| R1 | Grounded in another paper or external source fetched through search. | Related-work context, benchmark origins, citation-backed comparisons. |
| R2 | Inference, trend judgement, or research opinion. | Opportunities, synthesis, and uncertain claims shown with explicit caution. |
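One way to picture the source contract is as a small data model. The sketch below is illustrative only: the class and field names (`Reliability`, `Evidence`, `Claim`, `validate`) are assumptions that mirror the evidence fields named in the feature list (quote, page, section, bbox), and the rule that R0 claims must carry a quote is an assumed reading of the table, not Paperflow's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Reliability(str, Enum):
    R0 = "R0"  # strictly grounded in the current paper
    R1 = "R1"  # grounded in another paper or a fetched external source
    R2 = "R2"  # inference, trend judgement, or research opinion


@dataclass
class Evidence:
    quote: str
    page: Optional[int] = None
    section: Optional[str] = None
    # Hypothetical layout: (x0, y0, x1, y1) in PDF page coordinates.
    bbox: Optional[Tuple[float, float, float, float]] = None


@dataclass
class Claim:
    text: str
    level: Reliability
    evidence: Optional[Evidence] = None

    def is_locatable(self) -> bool:
        """True when the claim can be traced to a specific PDF page."""
        return self.evidence is not None and self.evidence.page is not None


def validate(claim: Claim) -> None:
    """Enforce the assumed rule that R0 claims carry a direct evidence quote."""
    if claim.level is Reliability.R0:
        if claim.evidence is None or not claim.evidence.quote:
            raise ValueError("R0 claims need a direct evidence quote")
```

Under this sketch, an R2 claim may legitimately have no evidence attached, while an R0 claim without a quote is rejected before it reaches the report.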
DeepSeek setup
Agent configuration
| Variable | Default | Purpose |
|---|---|---|
| DEEPSEEK_API_KEY | none | DeepSeek API key used by the backend PaperAgent. |
| DEEPSEEK_BASE_URL | https://api.deepseek.com/beta | DeepSeek-compatible chat completions endpoint root. |
| DEEPSEEK_MODEL | deepseek-v4-flash | Model used for Reading Report generation. |
| DEEPSEEK_REPORT_READ_TIMEOUT | 180 | Read timeout in seconds for report generation. |
| PAPERFLOW_DATA_DIR | ./data | Optional override for the single local data root. |
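As a rough illustration of how these variables and defaults fit together, the following sketch reads them from the environment. The `AgentConfig` class and `load_config` function are hypothetical names introduced here; only the variable names and default values come from the table, and the backend's real loading logic may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    api_key: Optional[str]
    base_url: str
    model: str
    report_read_timeout: float  # seconds
    data_dir: str


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Resolve settings from the environment, falling back to the documented defaults."""
    return AgentConfig(
        # An unset or empty key is treated as "no key configured".
        api_key=env.get("DEEPSEEK_API_KEY") or None,
        base_url=env.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com/beta"),
        model=env.get("DEEPSEEK_MODEL", "deepseek-v4-flash"),
        report_read_timeout=float(env.get("DEEPSEEK_REPORT_READ_TIMEOUT", "180")),
        data_dir=env.get("PAPERFLOW_DATA_DIR", "./data"),
    )


defaults = load_config({})
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment.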
Architecture
Local-first, agent-backed.
- React + Vite web app for library, report, PDF workspace, Agent chat, Field Map, and runtime configuration.
- FastAPI service with PDF parsing, DeepSeek PaperAgent, R1 search, Field Map, chat, task queue, and Obsidian export.
- Project-level data/ directory containing SQLite metadata, local PDFs, JSON reports, parsed chunks, R1 cache, task snapshots, and Markdown notes.
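To make the SQLite side of the local data root more concrete, here is a minimal sketch of persisting a chat transcript and restoring it per paper, as the core features describe. The table name, column names, and function names are all hypothetical; the actual schema is not documented here.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the transcript table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chat_messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               paper_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def append_message(conn: sqlite3.Connection, paper_id: str, role: str, content: str) -> None:
    """Store one chat message scoped to a paper."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO chat_messages (paper_id, role, content) VALUES (?, ?, ?)",
            (paper_id, role, content),
        )


def load_transcript(conn: sqlite3.Connection, paper_id: str) -> list:
    """Return (role, content) pairs for one paper in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM chat_messages WHERE paper_id = ? ORDER BY id",
        (paper_id,),
    )
    return rows.fetchall()


conn = open_store()
append_message(conn, "paper-a", "user", "What dataset is used?")
append_message(conn, "paper-a", "assistant", "See the evidence on page 4.")
append_message(conn, "paper-b", "user", "Summarize the method.")
```

Keying every row by paper keeps transcripts isolated, so reopening one paper restores only its own conversation.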
Acknowledgements
Credits and inspirations
- Agent integration is built against the DeepSeek API and reuses configuration written by the DeepSeek-TUI CLI when present.
- PDF parsing is powered by PyMuPDF.
- The frontend is built with Vite and React.
- The prompt design was inspired by Peng Sida's open research-learning notes, pengsida/learning_research.
Status
Pre-1.0 milestones
Release history and milestone details now live in STATUS.md.