RPG
Research Process Graph Browser
Explore 150,000+ research elements across 2,600+ Plant Cell papers.
Enter a research question, method, or finding — RPG finds related work and shows how they connect.
Drag & drop a PDF here
or click to browse
PDF files accepted
No graphs yet. Upload a PDF to get started.
Loading samples...
The Plant Cell RPG Library
Browse pre-computed Research Process Graphs from The Plant Cell
Loading The Plant Cell library...
QMF Taxonomy Browser
Explore the hierarchical classification of research sentences across Questions, Methods, and Findings from The Plant Cell
Loading taxonomy...
Researcher Directory
Explore corresponding authors from The Plant Cell — browse expertise, find experts by method, and see the questions they ask and findings they produce.
Loading researchers...
Enter a method — see who uses it and what they find
Search any method, technique, or approach. Our AI understands synonyms and related terms.
Select an example above or type your own query to get started.
Research Questions (Q)
Methods (M)
Findings (F)
Browse Questions, Methods & Findings
Papers
Digital Professor
Enter a research question and get a data-driven research prospectus — recommended methods, expected findings, follow-up questions, and cross-pollination opportunities — all grounded in 2,600+ Plant Cell papers.
RPG — Research Process Graph
Transform scientific papers into interactive, structured knowledge graphs. Upload a PDF and let AI extract the research questions, methods, and findings as a navigable graph you can explore, validate, and share.
Quick Start
Go to the Upload tab and drag-and-drop a research paper (or click to browse). The system extracts the text and sends it to an LLM for graph generation.
Once processing completes you are taken to an interactive graph. Nodes represent research elements; edges show how they connect.
Search nodes, filter by type, chat with an AI about the paper, validate claims against the source text, or generate structured summaries.
Node Types
Every node in the graph is classified as one of three types:
The hypotheses or questions the paper sets out to answer.
Experimental techniques, datasets, tools, or analytical approaches used.
Results, conclusions, or observations reported in the paper.
Edges connect nodes to show relationships — for example, a Method that addresses a Research Question and produces a Finding.
Features
Upload any research paper as a PDF. The text is extracted automatically and an LLM generates the Research Process Graph. Progress is shown in real time.
Powered by Cytoscape.js. Pan, zoom, click on nodes for details, and switch between layouts:
- Hierarchical — top-down DAG (default)
- Breadthfirst — level-based tree layout
- Force-directed — physics-based spring layout
Open the chat panel while viewing a graph and ask natural-language questions about the paper. The AI has access to the extracted graph structure and (for uploaded papers) the full paper text. Responses stream in real time and support Markdown formatting.
Select one or more nodes and click Validate to check whether each claim is supported by the original paper text. Results are colour-coded:
- Supported — fully backed by the paper
- Partially supported — some evidence found
- Not supported — no matching evidence
Relevant excerpts from the paper are shown alongside each result.
Click Summarize to generate a structured overview of the entire paper. You can also select specific nodes first to get a targeted summary of just those elements. The summary streams in a modal and renders as Markdown.
JSON — download the raw graph data for programmatic use.
PDF — export a publication-ready snapshot of the graph with a colour-coded legend.
Use the search bar above the graph to find nodes by label. Toggle Q / M / F checkboxes to show or hide specific node types. Use the Select checkboxes to bulk-select all nodes of a type for validation or summarisation.
Explore pre-loaded sample graphs from the Samples tab to see how RPG works before uploading your own paper. All features except validation (which requires the original PDF text) are available on samples.
Browse and search over 2,600 pre-computed RPGs from The Plant Cell journal (2005–2025) in the The Plant Cell Library tab. Filter by year, sort by date or node count, and paginate through the full catalog. Click any paper to open its interactive graph.
When viewing any library paper, click the Related tab on the left side of the graph to open the Related Papers panel. It ranks every other paper in the library by a weighted Jaccard similarity score computed over shared taxonomy categories (see algorithm below). Click any result card to navigate directly to that paper's graph.
Use the four sliders in the panel to tune how much each dimension contributes to the score:
- Q — Research Question weight: papers asking the same questions are most meaningfully related (default 2.0)
- F — Finding weight: shared findings indicate methodologically convergent research (default 1.5)
- M — Method weight: shared methods alone are a weaker signal (default 1.0)
- L2 specificity bonus: how much more to reward specific (L2) category matches over broad (L1) ones (default 2.0×)
Changes to the sliders trigger a live re-ranking with a short debounce. Click Reset to restore empirically validated defaults.
In the The Plant Cell Library tab, switch to Node Search to search the labels of all ~150,000 Q/M/F nodes across every paper. Type a free-text query (e.g. "WRKY transcription factor stress response") to find similar Research Questions, Methods, or Findings ranked by BM25 relevance. Matching terms are highlighted in each result. Use the All / Questions / Methods / Findings tabs to focus the search. Clicking a result opens the paper's graph.
Explore the hierarchical Q/M/F taxonomy in the Taxonomy tab. Search across category names and descriptions, drill into L1 → L2 sub-categories, and browse thousands of representative example sentences per category.
RPG Similarity Algorithm
The Related Papers feature ranks every library paper against the current paper using a weighted Jaccard similarity score computed over shared taxonomy categories of Q, M, and F nodes. The pipeline has four stages:
Stage 1 — Distinctive vocabulary construction
The taxonomy contains ~252 L2 sub-categories (roughly 10 L1 categories per node type, each with ~8 L2 sub-categories, across the 3 node types), each with a name, description, and hundreds of representative example sentences. For each category, a raw vocabulary is assembled from the union of all tokens in its name, description, and example sentences.
Because plant biology papers share many generic terms (e.g. gene, expression, protein), a raw vocabulary would cause every category to match every paper. A cross-category IDF filter is applied: a token is kept only if it appears in ≤ 25 % of the L1 categories for that node type. This retains the terms that are distinctive to each category while discarding domain-wide noise.
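This filtering step can be sketched in a few lines of Python. The function and data shapes below are illustrative, not the actual implementation:

```python
from collections import Counter

def build_distinctive_vocab(categories, max_df=0.25):
    """Keep only tokens appearing in <= max_df of the categories.

    `categories` maps a category name to the set of raw tokens pooled
    from its name, description, and example sentences.
    """
    # Document frequency: in how many categories does each token occur?
    df = Counter()
    for tokens in categories.values():
        df.update(set(tokens))
    n = len(categories)
    # Drop domain-wide terms; keep category-distinctive ones.
    return {
        name: {t for t in tokens if df[t] / n <= max_df}
        for name, tokens in categories.items()
    }
```

With four categories, a token like "gene" that appears in all of them (document frequency 100 %) is discarded, while a token unique to one category (25 %) survives the filter.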
Stage 2 — Node classification (top-K per node)
For each Q, M, or F node label in a paper, asymmetric coverage is computed against every category's distinctive vocabulary:
coverage(label, category) = |label tokens ∩ category vocabulary| / |label tokens|
Asymmetric coverage is used rather than symmetric Jaccard because category vocabularies are far larger than short node labels; symmetric Jaccard would unfairly penalise even an exact label match. The top-2 L1 and top-2 L2 categories with score > 0 are assigned to each node. Each paper's final category sets are the union of its per-node assignments, capturing all research themes covered by the paper.
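A minimal Python sketch of the classification step (function and category names are illustrative):

```python
def coverage(label_tokens, vocab):
    """Asymmetric coverage: the fraction of the label's tokens found in
    the category vocabulary. The denominator is the short label, not the
    large vocabulary, so vocabulary size does not penalise the score."""
    if not label_tokens:
        return 0.0
    return len(label_tokens & vocab) / len(label_tokens)

def top_k_categories(label_tokens, vocabs, k=2):
    """Return the k best-scoring categories with score > 0."""
    scored = [(name, coverage(label_tokens, v)) for name, v in vocabs.items()]
    scored = [(name, s) for name, s in scored if s > 0]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]
```

In practice this runs once for L1 categories and once for L2 sub-categories, and each paper's category sets are the union over its nodes.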
Stage 3 — Weighted Jaccard similarity
Given two papers A and B, each with six category sets (Q-L1, Q-L2, F-L1, F-L2, M-L1, M-L2), the similarity score is:
Sim(A, B) = Σ_d w_d · Jaccard(A_d, B_d), summed over the six dimensions d,
where Jaccard(X, Y) = |X ∩ Y| / |X ∪ Y|.
Default weights (user-adjustable via sliders):
| Dimension | Default weight | Rationale |
|---|---|---|
| Q-L2 (specific research question) | 4.0 × | Most informative match |
| Q-L1 (broad research question) | 2.0 × | High importance |
| F-L2 (specific finding) | 3.0 × | Converging conclusions |
| F-L1 (broad finding) | 1.5 × | Medium importance |
| M-L2 (specific method) | 2.0 × | Methodological overlap |
| M-L1 (broad method) | 1.0 × | Weakest signal |
L2 weights equal the corresponding L1 weight multiplied by the L2 specificity bonus slider (default 2×). All six weights are normalised to sum to 1 before computing the final score.
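The scoring described above can be sketched as follows. This is a simplified illustration with the default weights hard-coded; the real implementation reads them from the sliders:

```python
def jaccard(x, y):
    """|X ∩ Y| / |X ∪ Y|, defined as 0 for two empty sets."""
    return len(x & y) / len(x | y) if x | y else 0.0

# Defaults from the table above; each L2 weight is its L1 weight × 2.
DEFAULT_WEIGHTS = {
    "Q-L1": 2.0, "Q-L2": 4.0,
    "F-L1": 1.5, "F-L2": 3.0,
    "M-L1": 1.0, "M-L2": 2.0,
}

def similarity(paper_a, paper_b, weights=DEFAULT_WEIGHTS):
    """Weighted Jaccard over the six category sets. Weights are
    normalised to sum to 1, so the score lies in [0, 1]."""
    total = sum(weights.values())
    return sum(
        (w / total) * jaccard(paper_a.get(d, set()), paper_b.get(d, set()))
        for d, w in weights.items()
    )
```

Because the weights are normalised, a paper compared with itself scores exactly 1.0 and two papers with no shared categories score 0.0, which makes the percentage display directly interpretable.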
Stage 4 — Ranking & caching
- All 2,600+ papers are scored against the query paper (≈ 10–30 ms per request).
- Papers with a zero score (no shared categories at all) are excluded.
- Results are paginated (8 per page), and each score is displayed as a percentage in its result card (green ≥ 40 %, blue ≥ 20 %, grey < 20 %).
- When using the default weights, results are cached in memory (LRU, 200-entry cap) for fast repeat access.
- The category index is built once at server startup and cached to disk (_similarity_index.json), so subsequent restarts are nearly instant.
BM25 Node Search Algorithm
The Node Search feature ranks individual Q, M, and F node labels from every paper in The Plant Cell library against a free-text query. It uses BM25 (Best Match 25), a classical information-retrieval algorithm that is highly effective for scientific text: authors use precise, domain-specific terminology, so lexical overlap is a reliable proxy for semantic similarity, and no large language model or GPU is required.
Corpus & index
Approximately 150,000 node labels (Q, M, F) are indexed as individual documents. The index is built once at server startup and cached to disk (_node_search_index.json), so subsequent restarts load in under 2 seconds. Each node stores its paper provenance (paper ID, title, year) alongside the label and its token list.
Tokenisation
Node labels are lowercased, stripped of punctuation, and filtered with an 83-word stopword list (generic English words plus common scientific verbs such as demonstrate, identify, analyze). Unlike the similarity classifier, BM25 keeps the token list (not a set) so that term frequency is preserved.
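A sketch of this tokeniser in Python. The stopword set below is a small illustrative subset of the 83-word list, not the real one:

```python
import re

# Illustrative subset of the 83-word stopword list described above.
STOPWORDS = {"the", "of", "and", "in", "a", "to",
             "demonstrate", "identify", "analyze"}

def tokenise(label):
    """Lowercase, strip punctuation, drop stopwords. Returns a LIST
    (not a set) so term frequency is preserved for BM25."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return [w for w in words if w not in STOPWORDS]
```

Note that repeated content words survive as duplicates in the list, which is exactly what BM25's term-frequency component needs.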
BM25 scoring
For a query Q and document D:
score(D, Q) = Σ_{t ∈ Q} IDF(t) · tf(t, D) · (k1 + 1) / (tf(t, D) + k1 · (1 − b + b · |D| / avgdl))
IDF down-weights terms that are common across many node labels; the term-frequency normalisation rewards relevant repetition while penalising very long labels. Parameters k1 = 1.5 and b = 0.75 are standard BM25 defaults that work well across diverse corpora.
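BM25 scoring can be sketched as below. This brute-force version recomputes document frequencies and average length per query; the real index precomputes them at startup:

```python
import math

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """BM25 score of one document (a token list) against a query,
    given the full corpus of token lists."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query_tokens:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # term appears nowhere: contributes nothing
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc_tokens.count(term)
        # tf saturates via k1; b penalises longer-than-average documents.
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        )
    return score
```

A label that repeats a query term scores higher than one mentioning it once, while a term missing from a label contributes zero.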
Score normalisation & result modes
Within each result set the top-scoring node is assigned 100 % and all others are scaled proportionally, making scores interpretable across different queries. Three filled dots (•••) indicate > 66 %, two dots > 33 %, and one dot any positive score.
In All mode the top 5 results per type are retrieved in a single request and scored independently (so Q top = 100 %, M top = 100 %, etc.). Switching to a specific type activates paginated mode (10 results per page). Typical query latency is under 15 ms for 150,000 nodes.
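The percentage scaling and dot indicator can be sketched as follows (an illustration of the behaviour described above, not the actual implementation):

```python
def normalise_scores(results):
    """Scale raw BM25 scores so the top hit within a result set is 100 %.

    `results` is a list of (label, raw_score) pairs."""
    if not results:
        return []
    top = max(score for _, score in results)
    return [(label, 100.0 * score / top) for label, score in results]

def dots(pct):
    """Visual relevance indicator shown in each result card."""
    if pct > 66:
        return "•••"
    if pct > 33:
        return "••"
    return "•" if pct > 0 else ""
```

In All mode this normalisation runs once per node type, which is why each type's top result shows 100 %.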
Tips & Shortcuts
| Action | How |
|---|---|
| Select a single node | Click on it |
| Multi-select nodes | Hold Shift and click additional nodes |
| Select all nodes of a type | Use the Select: All Q / M / F checkboxes |
| Clear selection | Click Clear selection button |
| Zoom | Scroll wheel or pinch gesture |
| Pan | Click and drag on the background |
| Reset view | Click the Fit button |
| Change layout | Use the layout dropdown (Hierarchical / Breadthfirst / Force-directed) |
| Open AI chat | Click the Chat tab on the right side of the graph |
| Send a chat message | Type and press Enter or click Send |
FAQ
What file formats are supported?
Currently only PDF files are accepted. Text is extracted in your browser using PDF.js before being sent to the server for processing.
What LLM powers the extraction?
Graph extraction and chat are powered by OpenAI models. The extraction pipeline uses a more capable model for accuracy, while chat uses a faster model for responsiveness.
Can I edit the graph after extraction?
Manual graph editing is not currently supported. You can re-upload the paper to regenerate the graph.
Why is validation unavailable for sample graphs?
Validation compares graph nodes against the original paper text. Since sample graphs are pre-loaded without the source PDF, validation cannot be performed.
Is my data stored permanently?
Uploaded PDFs and generated graphs are stored on the server for the duration of the session. They are not shared with other users.