Lightweight Retrieval-Augmented Generation and Large Language Model-Based Modeling for Scalable Patient-Trial Matching

arXiv: 2604.22061 · PDF

Authors: Xiaodi Li, Yang Xiao, Munhwan Lee, Konstantinos Leventakos, Young J. Juhn, David Jones, Terence T. Sio, Wei Liu, Maria Vassilaki, Nansu Zong

Primary category: cs.CL · All: cs.AI, cs.CL, cs.LG

Matched keywords: large language model, llm, retrieval, reasoning, serving, fine-tun

TL;DR

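Proposes a lightweight RAG + LLM framework for patient-trial matching: retrieval first selects the key segments of the EHR, an LLM then encodes them, and a lightweight classifier makes the prediction, reaching performance comparable to end-to-end LLMs at substantially lower compute cost.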

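Key points

Patient-trial matching faces scalability challenges from long EHR text and complex eligibility criteria.
The pipeline is explicitly split into three stages: retrieval of relevant segments, LLM encoding, and a lightweight predictor.
A frozen LLM is sufficient to encode structured clinical data; unstructured narratives require fine-tuning.
The lightweight pipeline performs close to end-to-end LLMs at substantially lower compute cost.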

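Method

RAG module: retrieves the segments of a long EHR that are clinically relevant to the eligibility criteria, shortening the input.
LLM encoding: maps the selected segments to informative representations, with the LLM either frozen or fine-tuned.
Representation refinement: applies dimensionality reduction to obtain compact vectors.
Downstream classification: a lightweight predictor (not an end-to-end LLM) makes the matching decision.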
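
The four steps above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the token-overlap retriever, the hashed bag-of-words `encode` (a stand-in for an LLM embedding), the Gaussian random projection, and the nearest-centroid predictor are all hypothetical choices, since the abstract does not name the actual components. The sample records are invented for illustration.

```python
import math
import random
import re
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(segments, criterion, k=2):
    """Step 1: keep the k EHR segments sharing the most tokens with the criterion."""
    query = Counter(tokenize(criterion))

    def overlap(segment):
        counts = Counter(tokenize(segment))
        return sum(min(counts[t], n) for t, n in query.items())

    return sorted(segments, key=overlap, reverse=True)[:k]


def encode(text, dim=64):
    """Step 2 (stand-in): hashed bag-of-words vector instead of an LLM embedding."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def make_projection(in_dim=64, out_dim=8, seed=0):
    """Step 3: fixed Gaussian random projection used as the dimensionality reduction."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def reduce_dim(vec, projection):
    """Project a vector down to len(projection) dimensions."""
    return [sum(w * v for w, v in zip(row, vec)) for row in projection]


def featurize(segments, criterion, projection, k=2):
    """Run retrieval, encoding, and reduction for one patient-criterion pair."""
    selected = retrieve(segments, criterion, k)
    return reduce_dim(encode(criterion + " " + " ".join(selected)), projection)


def fit_centroids(features, labels):
    """Step 4: nearest-centroid classifier, a deliberately tiny predictor."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, feature):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, feature))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Invented toy records: patient A matches the criterion, patient B does not.
criterion = "Type 2 diabetes with HbA1c above 7 percent"
ehr = [
    "2019 visit: seasonal allergies, prescribed antihistamine",
    "HbA1c 8.2 percent, type 2 diabetes on metformin",
    "Family history of hypertension",
    "Diabetes follow-up: HbA1c improved to 7.4 percent",
]
ehr_b = [
    "Annual physical: no chronic conditions reported",
    "Fractured wrist treated in 2021",
]

top = retrieve(ehr, criterion, k=2)          # the two diabetes-related segments
projection = make_projection()
features = [featurize(ehr, criterion, projection),
            featurize(ehr_b, criterion, projection)]
centroids = fit_centroids(features, [1, 0])  # 1 = eligible, 0 = not eligible
```

Only the retrieved segments ever reach the encoder, which is where the input shortening comes from; in a real system each stand-in would be swapped for the corresponding component (a clinical retriever, an LLM encoder, a learned reduction, a trained classifier).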

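Experiments

Public benchmarks: n2c2, SIGIR, TREC 2021/2022.
Real-world multimodal data: Mayo Clinic MCPMD.
Baselines: end-to-end LLM approaches and traditional ML methods.
Metrics: the abstract does not name specific metrics; presumably matching classification performance plus computational cost.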

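Results

Retrieval-based selection significantly reduces the computational burden while preserving clinically meaningful signals.
Structured data: frozen LLM representations are already strong.
Unstructured narratives: fine-tuning is essential.
The lightweight pipeline performs comparably to end-to-end LLMs at substantially lower compute cost. The abstract does not disclose specific numbers.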
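
To see why shortening the input can pay off, one can reason with the standard per-layer cost model of a transformer. This model is an assumption brought in for illustration, not something stated in the abstract. With n input tokens, hidden width d, and constants a, b > 0, keeping only k < n retrieved tokens gives:

```latex
C(n) \approx a\,n^{2}d + b\,n\,d^{2},
\qquad
\left(\frac{k}{n}\right)^{2} \;\le\; \frac{C(k)}{C(n)} \;\le\; \frac{k}{n}.
```

Under this model the saving is at least linear in the fraction of tokens kept and approaches quadratic when the attention term dominates. The savings actually measured in the paper are not given in the abstract.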

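Why it matters

Offers a practical template for deploying LLMs in healthcare: on very long, heterogeneous text such as EHRs, there is no need to feed everything into the LLM; RAG-based selection, representation learning, and a shallow model can suffice. For agent and LLM infrastructure practitioners, this supports "retrieval compression + frozen encoder + lightweight head" as an economical approach to long-context domain tasks, with the caveat from the abstract that unstructured narratives still require fine-tuning.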

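Relation to prior work

Continues the RAG line of work (Lewis et al.) as a way to compress long context.
Contrasts with end-to-end clinical LLMs (e.g., patient-trial matching work in the vein of TrialGPT).
Serves as a counterpoint to traditional clinical NLP and structured-EHR ML methods.
Follows the frozen LLM-as-encoder paradigm (similar to linear probing, or embedding + classifier).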
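
The linear-probe idea mentioned in the last item can be shown in a few lines. This is a minimal sketch under stated assumptions: `frozen_encoder` is a hypothetical fixed feature map standing in for a frozen LLM, the 1-D toy data is invented, and the logistic-regression head is one possible lightweight predictor, not necessarily the one used in the paper. Only the head's weights are updated during training.

```python
import math

FROZEN_WEIGHTS = (1.0, 1.0)  # hypothetical encoder parameters; never updated


def frozen_encoder(x):
    """Stand-in for a frozen LLM encoder: fixed feature map x -> [x, x^2]."""
    return [FROZEN_WEIGHTS[0] * x, FROZEN_WEIGHTS[1] * x * x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression head by full-batch gradient descent."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for feat, y in zip(features, labels):
            # Gradient of the mean log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(w * f for w, f in zip(weights, feat)) + bias) - y
            for j, f in enumerate(feat):
                grad_w[j] += err * f / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(weights, bias, feat):
    """Predict class 1 when the logit is positive."""
    return 1 if sum(w * f for w, f in zip(weights, feat)) + bias > 0 else 0


# Invented toy data: the label is 1 when the input is positive.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

features = [frozen_encoder(x) for x in xs]   # computed once, encoder untouched
weights, bias = train_probe(features, labels)
accuracy = sum(predict(weights, bias, f) == y
               for f, y in zip(features, labels)) / len(labels)
```

Because the encoder is never updated, its outputs can be computed once and cached, so training touches only a handful of parameters. Fine-tuning, which the abstract reports as necessary for unstructured narratives, would additionally update the encoder itself.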

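Open questions

Specific accuracy, recall, and compute-saving factors are not disclosed in the abstract.
How the retriever is chosen, and whether it is fine-tuned for the domain, is not explained.
Generalization to rare diseases or trials with very strict eligibility criteria.
How additional modalities (imaging, genomics) would be integrated into the lightweight pipeline.
Where the cost-effectiveness boundary lies relative to the latest long-context LLMs (million-token contexts).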

Paper figures

Figure 1 (extracted from PDF)

Figure 2 (extracted from PDF)

Original abstract

Patient-trial matching requires reasoning over long, heterogeneous electronic health records (EHRs) and complex eligibility criteria, posing significant challenges for scalability, generalization, and computational efficiency. Existing approaches either rely on full-document processing with large language models (LLMs), which is computationally expensive, or use traditional machine learning methods that struggle to capture unstructured clinical narratives. In this work, we propose a lightweight framework that combines retrieval-augmented generation and large language model-based modeling for scalable patient-trial matching. The framework explicitly separates two key components: retrieval-augmented generation is used to identify clinically relevant segments from long EHRs, reducing input complexity, while large language models are used to encode these selected segments into informative representations. These representations are further refined through dimensionality reduction and modeled using lightweight predictors, enabling efficient and scalable downstream classification. We evaluate the proposed approach on multiple public benchmarks (n2c2, SIGIR, TREC 2021/2022) and a real-world multimodal dataset from Mayo Clinic (MCPMD). Results show that retrieval-based information selection significantly reduces computational burden while preserving clinically meaningful signals. We further demonstrate that frozen LLMs provide strong representations for structured clinical data, whereas fine-tuning is essential for modeling unstructured clinical narratives. Importantly, the proposed lightweight pipeline achieves performance comparable to end-to-end LLM approaches with substantially lower computational cost.