RPG
Research Process Graph Browser
Explore 150,000+ research elements across 2,600+ Plant Cell papers.
Enter a research question, method, or finding — RPG finds related work and shows how they connect.
Drag & drop a PDF here
or click to browse
PDF files accepted
No graphs yet. Upload a PDF to get started.
Loading samples...
The Plant Cell RPG Library
Browse pre-computed Research Process Graphs from The Plant Cell
Loading The Plant Cell library...
QMF Taxonomy Browser
Explore the hierarchical classification of research sentences across Questions, Methods, and Findings from The Plant Cell
Loading taxonomy...
Researcher Directory
Explore corresponding authors from The Plant Cell — browse expertise, find experts by method, and see the questions they ask and findings they produce.
Loading researchers...
Enter a method — see who uses it and what they find
Search any method, technique, or approach. Our AI understands synonyms and related terms.
Select an example above or type your own query to get started.
Research Questions (Q)
Methods (M)
Findings (F)
Browse Questions, Methods & Findings
Papers
Digital Professor
Enter a research question and get a data-driven research prospectus — recommended methods, expected findings, follow-up questions, and cross-pollination opportunities — all grounded in 2,600+ Plant Cell papers.
RPG — Research Process Graph
Transform scientific papers into interactive, structured knowledge graphs. Upload a PDF and let AI extract the research questions, methods, and findings as a navigable graph you can explore, validate, and share.
Quick Start
Go to the Upload tab and drag-and-drop a research paper (or click to browse). The system extracts the text and sends it to an LLM for graph generation.
Once processing completes you are taken to an interactive graph. Nodes represent research elements; edges show how they connect.
Search nodes, filter by type, chat with an AI about the paper, validate claims against the source text, or generate structured summaries.
Node Types
Every node in the graph is classified as one of three types:
The hypotheses or questions the paper sets out to answer.
Experimental techniques, datasets, tools, or analytical approaches used.
Results, conclusions, or observations reported in the paper.
Edges connect nodes to show relationships — for example, a Method that addresses a Research Question and produces a Finding.
Features
Upload any research paper as a PDF. The text is extracted automatically and an LLM generates the Research Process Graph. Progress is shown in real time.
Powered by Cytoscape.js. Pan, zoom, click on nodes for details, and switch between layouts:
- Hierarchical — top-down DAG (default)
- Breadthfirst — level-based tree layout
- Force-directed — physics-based spring layout
Open the chat panel while viewing a graph and ask natural-language questions about the paper. The AI has access to the extracted graph structure and (for uploaded papers) the full paper text. Responses stream in real time and support Markdown formatting.
Select one or more nodes and click Validate to check whether each claim is supported by the original paper text. Results are colour-coded:
- Supported — fully backed by the paper
- Partially supported — some evidence found
- Not supported — no matching evidence
Relevant excerpts from the paper are shown alongside each result.
Click Summarize to generate a structured overview of the entire paper. You can also select specific nodes first to get a targeted summary of just those elements. The summary streams in a modal and renders as Markdown.
JSON — download the raw graph data for programmatic use.
PDF — export a publication-ready snapshot of the graph with a colour-coded legend.
Use the search bar above the graph to find nodes by label. Toggle Q / M / F checkboxes to show or hide specific node types. Use the Select checkboxes to bulk-select all nodes of a type for validation or summarisation.
Explore pre-loaded sample graphs from the Samples tab to see how RPG works before uploading your own paper. All features except validation (which requires the original PDF text) are available on samples.
Browse and search over 2,600 pre-computed RPGs from The Plant Cell journal (2005–2025) in the The Plant Cell Library tab. Filter by year, sort by date or node count, and paginate through the full catalog. Click any paper to open its interactive graph.
When viewing any library paper, click the Related tab on the left side of the graph to open the Related Papers panel. It ranks every other paper in the library by a weighted Jaccard similarity score computed over shared taxonomy categories (see algorithm below). Click any result card to navigate directly to that paper's graph.
Use the four sliders in the panel to tune how much each dimension contributes to the score:
- Q — Research Question weight: papers asking the same questions are most meaningfully related (default 2.0)
- F — Finding weight: shared findings indicate methodologically convergent research (default 1.5)
- M — Method weight: shared methods alone are a weaker signal (default 1.0)
- L2 specificity bonus: how much more to reward specific (L2) category matches over broad (L1) ones (default 2.0×)
Changes to the sliders trigger a live re-ranking with a short debounce. Click Reset to restore empirically validated defaults.
In the The Plant Cell Library tab, switch to Node Search to search the labels of all ~150,000 Q/M/F nodes across every paper. Type a free-text query (e.g. "WRKY transcription factor stress response") to find similar Research Questions, Methods, or Findings ranked by BM25 relevance. Matching terms are highlighted in each result. Use the All / Questions / Methods / Findings tabs to focus the search. Clicking a result opens the paper's graph.
Explore the hierarchical Q/M/F taxonomy in the Taxonomy tab. Search across category names and descriptions, drill into L1 → L2 sub-categories, and browse thousands of representative example sentences per category.
RPG Similarity Algorithm
The Related Papers feature ranks every library paper against the current paper using a weighted Jaccard similarity score computed over shared taxonomy categories of Q, M, and F nodes. The pipeline has four stages:
Stage 1 — Distinctive vocabulary construction
The taxonomy contains ~252 L2 sub-categories (roughly 10 L1 categories per node type, each with ~8 L2 sub-categories, across the 3 node types), each with a name, description, and hundreds of representative example sentences. For each category, a raw vocabulary is assembled from the union of all tokens in its name, description, and example sentences.
Because plant biology papers share many generic terms (e.g. gene, expression, protein), a raw vocabulary would cause every category to match every paper. A cross-category IDF filter is applied: a token is kept only if it appears in ≤ 25 % of the L1 categories for that node type. This retains the terms that are distinctive to each category while discarding domain-wide noise.
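This filtering step can be sketched in a few lines of Python. The function and data shapes below are illustrative, not the actual implementation:

```python
from collections import Counter

def build_distinctive_vocab(categories, max_df=0.25):
    """Keep only tokens appearing in <= max_df of the categories.

    `categories` maps a category name to the set of raw tokens pooled
    from its name, description, and example sentences.
    """
    # Document frequency: in how many categories does each token occur?
    df = Counter()
    for tokens in categories.values():
        df.update(set(tokens))
    n = len(categories)
    # Drop domain-wide terms; keep category-distinctive ones.
    return {
        name: {t for t in tokens if df[t] / n <= max_df}
        for name, tokens in categories.items()
    }
```

With four categories, a token like "gene" that appears in all of them (document frequency 100 %) is discarded, while a token unique to one category (25 %) survives the filter.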
Stage 2 — Node classification (top-K per node)
For each Q, M, or F node label in a paper, asymmetric coverage is computed against every category's distinctive vocabulary:
coverage(label, category) = |label tokens ∩ category vocabulary| / |label tokens|
Asymmetric coverage is used rather than symmetric Jaccard because category vocabularies are far larger than short node labels; symmetric Jaccard would unfairly penalise even an exact label match. The top-2 L1 and top-2 L2 categories with score > 0 are assigned to each node. Each paper's final category sets are the union of its per-node assignments, capturing all research themes covered by the paper.
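A minimal Python sketch of the classification step (function and category names are illustrative):

```python
def coverage(label_tokens, vocab):
    """Asymmetric coverage: the fraction of the label's tokens found in
    the category vocabulary. The denominator is the short label, not the
    large vocabulary, so vocabulary size does not penalise the score."""
    if not label_tokens:
        return 0.0
    return len(label_tokens & vocab) / len(label_tokens)

def top_k_categories(label_tokens, vocabs, k=2):
    """Return the k best-scoring categories with score > 0."""
    scored = [(name, coverage(label_tokens, v)) for name, v in vocabs.items()]
    scored = [(name, s) for name, s in scored if s > 0]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]
```

In practice this runs once for L1 categories and once for L2 sub-categories, and each paper's category sets are the union over its nodes.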
Stage 3 — Weighted Jaccard similarity
Given two papers A and B, each with six category sets (Q-L1, Q-L2, F-L1, F-L2, M-L1, M-L2), the similarity score is:
Sim(A, B) = Σ_d w_d · Jaccard(A_d, B_d), summed over the six dimensions d,
where Jaccard(X, Y) = |X ∩ Y| / |X ∪ Y|.
Default weights (user-adjustable via sliders):
| Dimension | Default weight | Rationale |
|---|---|---|
| Q-L2 (specific research question) | 4.0 × | Most informative match |
| Q-L1 (broad research question) | 2.0 × | High importance |
| F-L2 (specific finding) | 3.0 × | Converging conclusions |
| F-L1 (broad finding) | 1.5 × | Medium importance |
| M-L2 (specific method) | 2.0 × | Methodological overlap |
| M-L1 (broad method) | 1.0 × | Weakest signal |
L2 weights equal the corresponding L1 weight multiplied by the L2 specificity bonus slider (default 2×). All six weights are normalised to sum to 1 before computing the final score.
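The scoring described above can be sketched as follows. This is a simplified illustration with the default weights hard-coded; the real implementation reads them from the sliders:

```python
def jaccard(x, y):
    """|X ∩ Y| / |X ∪ Y|, defined as 0 for two empty sets."""
    return len(x & y) / len(x | y) if x | y else 0.0

# Defaults from the table above; each L2 weight is its L1 weight × 2.
DEFAULT_WEIGHTS = {
    "Q-L1": 2.0, "Q-L2": 4.0,
    "F-L1": 1.5, "F-L2": 3.0,
    "M-L1": 1.0, "M-L2": 2.0,
}

def similarity(paper_a, paper_b, weights=DEFAULT_WEIGHTS):
    """Weighted Jaccard over the six category sets. Weights are
    normalised to sum to 1, so the score lies in [0, 1]."""
    total = sum(weights.values())
    return sum(
        (w / total) * jaccard(paper_a.get(d, set()), paper_b.get(d, set()))
        for d, w in weights.items()
    )
```

Because the weights are normalised, a paper compared with itself scores exactly 1.0 and two papers with no shared categories score 0.0, which makes the percentage display directly interpretable.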
Stage 4 — Ranking & caching
- All 2,600+ papers are scored against the query paper (≈ 10–30 ms per request).
- Papers with a zero score (no shared categories at all) are excluded.
- Results are paginated (8 per page), and each score is displayed as a percentage in its result card (green ≥ 40 %, blue ≥ 20 %, grey < 20 %).
- When using the default weights, results are cached in memory (LRU, 200-entry cap) for fast repeat access.
- The category index is built once at server startup and cached to disk (_similarity_index.json), so subsequent restarts are nearly instant.
BM25 Node Search Algorithm
The Node Search feature ranks individual Q, M, and F node labels from every paper in The Plant Cell library against a free-text query. It uses BM25 (Best Match 25), a classical information-retrieval algorithm that is highly effective for scientific text: authors use precise, domain-specific terminology, so lexical overlap is a reliable proxy for semantic similarity, and no large language model or GPU is required.
Corpus & index
Approximately 150,000 node labels (Q, M, F) are indexed as individual documents. The index is built once at server startup and cached to disk (_node_search_index.json), so subsequent restarts load in under 2 seconds. Each node stores its paper provenance (paper ID, title, year) alongside the label and its token list.
Tokenisation
Node labels are lowercased, stripped of punctuation, and filtered with an 83-word stopword list (generic English words plus common scientific verbs such as demonstrate, identify, analyze). Unlike the similarity classifier, BM25 keeps the token list (not a set) so that term frequency is preserved.
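A sketch of this tokeniser in Python. The stopword set below is a small illustrative subset of the 83-word list, not the real one:

```python
import re

# Illustrative subset of the 83-word stopword list described above.
STOPWORDS = {"the", "of", "and", "in", "a", "to",
             "demonstrate", "identify", "analyze"}

def tokenise(label):
    """Lowercase, strip punctuation, drop stopwords. Returns a LIST
    (not a set) so term frequency is preserved for BM25."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return [w for w in words if w not in STOPWORDS]
```

Note that repeated content words survive as duplicates in the list, which is exactly what BM25's term-frequency component needs.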
BM25 scoring
For a query Q and document D:
score(D, Q) = Σ_{t ∈ Q} IDF(t) · tf(t, D) · (k1 + 1) / (tf(t, D) + k1 · (1 − b + b · |D| / avgdl))
IDF down-weights terms that are common across many node labels; the term-frequency normalisation rewards relevant repetition while penalising very long labels. Parameters k1 = 1.5 and b = 0.75 are standard BM25 defaults that work well across diverse corpora.
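BM25 scoring can be sketched as below. This brute-force version recomputes document frequencies and average length per query; the real index precomputes them at startup:

```python
import math

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """BM25 score of one document (a token list) against a query,
    given the full corpus of token lists."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query_tokens:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # term appears nowhere: contributes nothing
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc_tokens.count(term)
        # tf saturates via k1; b penalises longer-than-average documents.
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        )
    return score
```

A label that repeats a query term scores higher than one mentioning it once, while a term missing from a label contributes zero.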
Score normalisation & result modes
Within each result set the top-scoring node is assigned 100 % and all others are scaled proportionally, making scores interpretable across different queries. Three filled dots (•••) indicate > 66 %, two dots > 33 %, and one dot any positive score.
In All mode the top 5 results per type are retrieved in a single request and scored independently (so Q top = 100 %, M top = 100 %, etc.). Switching to a specific type activates paginated mode (10 results per page). Typical query latency is under 15 ms for 150,000 nodes.
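The percentage scaling and dot indicator can be sketched as follows (an illustration of the behaviour described above, not the actual implementation):

```python
def normalise_scores(results):
    """Scale raw BM25 scores so the top hit within a result set is 100 %.

    `results` is a list of (label, raw_score) pairs."""
    if not results:
        return []
    top = max(score for _, score in results)
    return [(label, 100.0 * score / top) for label, score in results]

def dots(pct):
    """Visual relevance indicator shown in each result card."""
    if pct > 66:
        return "•••"
    if pct > 33:
        return "••"
    return "•" if pct > 0 else ""
```

In All mode this normalisation runs once per node type, which is why each type's top result shows 100 %.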
Tips & Shortcuts
| Action | How |
|---|---|
| Select a single node | Click on it |
| Multi-select nodes | Hold Shift and click additional nodes |
| Select all nodes of a type | Use the Select: All Q / M / F checkboxes |
| Clear selection | Click Clear selection button |
| Zoom | Scroll wheel or pinch gesture |
| Pan | Click and drag on the background |
| Reset view | Click the Fit button |
| Change layout | Use the layout dropdown (Hierarchical / Breadthfirst / Force-directed) |
| Open AI chat | Click the Chat tab on the right side of the graph |
| Send a chat message | Type and press Enter or click Send |
FAQ
What file formats are supported?
Currently only PDF files are accepted. Text is extracted in your browser using PDF.js before being sent to the server for processing.
What LLM powers the extraction?
Graph extraction and chat are powered by OpenAI models. The extraction pipeline uses a more capable model for accuracy, while chat uses a faster model for responsiveness.
Can I edit the graph after extraction?
Manual graph editing is not currently supported. You can re-upload the paper to regenerate the graph.
Why is validation unavailable for sample graphs?
Validation compares graph nodes against the original paper text. Since sample graphs are pre-loaded without the source PDF, validation cannot be performed.
Is my data stored permanently?
Uploaded PDFs and generated graphs are stored on the server for the duration of the session. They are not shared with other users.