Evidence-first agentic paper workspace

Paperflow

Read papers, verify claims, ask an Agent with context, and save durable research knowledge. Every generated claim is labeled R0/R1/R2 and traced back to PDF evidence whenever possible.

What is Paperflow

Report first, chat second, evidence always.

Import to Report

Bring in a local PDF or arXiv URL and generate a structured Agent Reading Report.

Claim to Evidence

Click a reliability-labeled claim to jump back to the PDF page and highlight the source evidence.

Ask and Keep

Ask the Agent with report and evidence context, then export durable notes to Obsidian.

News

Latest updates

Paperflow is now public as an evidence-first agentic paper workspace, with PDF evidence highlighting, responsive PDF search, grounded Agent chat, local-first research memory, and Obsidian export. Future small feature releases will follow the v0.1.x version format.

Quickstart

Run Paperflow locally

Requirements: Python 3.9+, Node.js 18+, and a DeepSeek API key for real Agent parsing.

git clone https://github.com/shiml20/PaperFlow.git
cd PaperFlow

export DEEPSEEK_API_KEY="your-deepseek-api-key"
cd paperflow
./run-dev.sh --install

Then open http://127.0.0.1:5173, import a PDF or arXiv URL, and open the Workspace.
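
If the dev script fails early, a quick preflight check of the requirements listed above can narrow down the cause. The following is a minimal sketch and not part of Paperflow: the helper names (parse_major, check_requirements, detect_node_version) are hypothetical, and it only checks the three stated requirements.

```python
import re
import shutil
import subprocess


def parse_major(version_text):
    """Extract the leading major version from text such as 'v18.19.0'."""
    match = re.search(r"(\d+)", version_text)
    return int(match.group(1)) if match else None


def check_requirements(python_version, node_version_text, env):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if tuple(python_version[:2]) < (3, 9):
        problems.append("Python 3.9+ is required")
    node_major = parse_major(node_version_text or "")
    if node_major is None or node_major < 18:
        problems.append("Node.js 18+ is required")
    if not env.get("DEEPSEEK_API_KEY"):
        problems.append("DEEPSEEK_API_KEY is not set")
    return problems


def detect_node_version():
    """Ask the local `node` binary for its version, or return None if absent."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True)
    return result.stdout.strip()
```

To check the current machine, call check_requirements(sys.version_info, detect_node_version(), os.environ) and print any returned problems before running ./run-dev.sh --install.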

How to use

From paper to knowledge base

  1. Import a local PDF or paste an arXiv URL.
  2. Watch the Agent move from PDF parsing to dynamic partial reports.
  3. Read the first key findings while the rest of the report fills in.
  4. Open the completed Reading Report and inspect R0 / R1 / R2 claims.
  5. Click a claim or evidence item to inspect source text and PDF location.
  6. Ask the Agent a focused question grounded in the current paper.
  7. Save or update the Obsidian note.

Core features

A research workspace, not a generic summarizer.

Dynamic Reading Reports
  • Chunked full-paper reading with briefing and coordinator synthesis.
  • Dynamic partial reports, coverage-aware generation, and live parsing metrics.
  • Transparent process output for PDF extraction, DeepSeek stages, persistence, and failure states.
Evidence-first Agentic Workspace
  • R0 / R1 / R2 reliability badges in UI and data model.
  • Evidence quote, page, section, bbox, and location status for claims.
  • PDF.js reader with continuous scroll, zoom, page jump, bbox highlight, and select-to-ask.
Agent Conversation
  • Right-rail Agent panel with transcript, process cards, status, and composer.
  • Chat transcripts persisted in SQLite and restored per paper.
  • SSE step/final events for streaming paper-scoped chat.
Literature Context and Field Maps
  • Metadata import, content deduplication, and six-lane R1 search.
  • Field Maps with milestones, timelines, task taxonomy, datasets, methods, and opportunities.
  • Agent-enriched lineage graph edges with rationale and confidence.
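
The SSE step/final events mentioned under Agent Conversation could be consumed roughly as follows. This is a sketch under stated assumptions: the event names "step" and "final" come from the feature list, but the JSON payload shapes ("label", "answer") and the function names are hypothetical, and the real endpoint may differ.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {'event': ..., 'data': ...} dicts from raw text/event-stream lines."""
    event_name = "message"  # SSE default when no `event:` field is sent
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield {"event": event_name, "data": "\n".join(data_parts)}
            event_name, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional space after the colon
        if field == "event":
            event_name = value
        elif field == "data":
            data_parts.append(value)


def collect_chat(lines: Iterable[str]):
    """Gather progress steps and the final answer from one chat stream."""
    steps, final = [], None
    for event in iter_sse_events(lines):
        payload = json.loads(event["data"])  # assumes JSON payloads
        if event["event"] == "step":
            steps.append(payload)
        elif event["event"] == "final":
            final = payload
            break
    return steps, final


# Hypothetical stream for illustration only.
sample = [
    "event: step\n",
    'data: {"label": "Retrieving evidence"}\n',
    "\n",
    "event: final\n",
    'data: {"answer": "See page 4."}\n',
    "\n",
]
steps, final = collect_chat(sample)
```

Separating the line parser from the step/final handling keeps the parser reusable if more event types are added later.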

Reliability model

Every claim has a source contract.

Level | Meaning | Typical use
R0 | Strictly grounded in the current paper. | Claims with direct evidence quotes and PDF locations.
R1 | Grounded in another paper or external source fetched through search. | Related-work context, benchmark origins, citation-backed comparisons.
R2 | Inference, trend judgement, or research opinion. | Opportunities, synthesis, and uncertain claims shown with explicit caution.
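
One way to picture the source contract is as a small data model with a validation step. The sketch below is illustrative only: the class and field names are hypothetical and do not reflect Paperflow's actual schema, and the rules (R0 needs a quote and a 1-based page, R1 needs an external source, R2 needs nothing) are one reading of the levels above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Reliability(str, Enum):
    R0 = "R0"  # strictly grounded in the current paper
    R1 = "R1"  # grounded in another paper or external source
    R2 = "R2"  # inference, trend judgement, or research opinion


@dataclass(frozen=True)
class Evidence:
    quote: str
    page: int  # assumed 1-based
    section: Optional[str] = None
    bbox: Optional[Tuple[float, float, float, float]] = None  # x0, y0, x1, y1


@dataclass(frozen=True)
class Claim:
    text: str
    level: Reliability
    evidence: Optional[Evidence] = None
    source: Optional[str] = None  # hypothetical: external reference for R1


def contract_problems(claim: Claim) -> List[str]:
    """Return the reasons a claim fails its assumed source contract."""
    problems = []
    if claim.level is Reliability.R0:
        if claim.evidence is None or not claim.evidence.quote.strip():
            problems.append("R0 claim needs an evidence quote")
        elif claim.evidence.page < 1:
            problems.append("R0 evidence needs a 1-based page number")
    elif claim.level is Reliability.R1 and not claim.source:
        problems.append("R1 claim needs an external source")
    return problems  # R2 claims carry no evidence requirement in this sketch
```

For example, contract_problems(Claim("Uses 12 layers", Reliability.R0)) reports a missing evidence quote, while the same claim with an Evidence attached passes.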

DeepSeek setup

Agent configuration

Variable | Default | Purpose
DEEPSEEK_API_KEY | none | DeepSeek API key used by the backend PaperAgent.
DEEPSEEK_BASE_URL | https://api.deepseek.com/beta | DeepSeek-compatible chat completions endpoint root.
DEEPSEEK_MODEL | deepseek-v4-flash | Model used for Reading Report generation.
DEEPSEEK_REPORT_READ_TIMEOUT | 180 | Read timeout in seconds for report generation.
PAPERFLOW_DATA_DIR | ./data | Optional override for the single local data root.
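
The variables and defaults above could be read into a single settings object along these lines. This is a sketch, not the backend's actual loader: AgentConfig and load_config are hypothetical names, and only the variable names and default values are taken from the table.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    api_key: Optional[str]
    base_url: str
    model: str
    report_read_timeout: float  # seconds
    data_dir: Path


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Build the config from environment variables, applying the documented defaults."""
    return AgentConfig(
        api_key=env.get("DEEPSEEK_API_KEY") or None,  # no default; empty counts as unset
        base_url=env.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com/beta").rstrip("/"),
        model=env.get("DEEPSEEK_MODEL", "deepseek-v4-flash"),
        report_read_timeout=float(env.get("DEEPSEEK_REPORT_READ_TIMEOUT", "180")),
        data_dir=Path(env.get("PAPERFLOW_DATA_DIR", "./data")),
    )
```

Passing the environment as a mapping makes the defaults easy to test, for example load_config({"DEEPSEEK_MODEL": "other-model"}) overrides only the model.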

Architecture

Local-first, agent-backed.

Frontend

React + Vite web app for library, report, PDF workspace, Agent chat, Field Map, and runtime configuration.

Backend

FastAPI service with PDF parsing, DeepSeek PaperAgent, R1 search, Field Map, chat, task queue, and Obsidian export.

Storage

Project-level data/ directory containing SQLite metadata, local PDFs, JSON reports, parsed chunks, R1 cache, task snapshots, and Markdown notes.

Acknowledgements

Credits and inspirations

  • Agent integration is built against the DeepSeek API and reuses configuration written by the DeepSeek-TUI CLI when present.
  • PDF parsing is powered by PyMuPDF.
  • The frontend is built with Vite and React.
  • The prompt design was inspired by Peng Sida's open research-learning notes, pengsida/learning_research.

Status

Pre-1.0 milestones

Release history and milestone details now live in STATUS.md.