arXiv: 2604.22061 · PDF

Authors: Xiaodi Li, Yang Xiao, Munhwan Lee, Konstantinos Leventakos, Young J. Juhn, David Jones, Terence T. Sio, Wei Liu, Maria Vassilaki, Nansu Zong

Primary category: cs.CL · all: cs.AI, cs.CL, cs.LG

Matched keywords: large language model, llm, retrieval, reasoning, serving, fine-tun


TL;DR

The paper proposes a lightweight framework that combines RAG with LLM-based representation modeling for scalable patient-trial matching, achieving performance comparable to end-to-end LLM approaches on multiple public and real-world clinical datasets at substantially lower computational cost.

Key Ideas

  • Decouples retrieval from representation: RAG selects relevant segments from long EHRs, and the LLM encodes them.
  • Dimensionality reduction plus a lightweight classifier enables efficient downstream classification.
  • Frozen LLMs suffice for structured data, while unstructured clinical narratives require fine-tuning.
  • Scalability is validated on public benchmarks and a real-world multimodal Mayo Clinic dataset.

Approach

The pipeline has two stages: (1) RAG retrieves the clinical segments from long EHRs that are relevant to the trial's eligibility criteria, reducing input length; (2) the LLM encodes these segments into representations, which, after dimensionality reduction, are fed to a lightweight predictor (e.g., a linear or shallow model) for matching classification. A frozen LLM is used for structured fields, while the model is fine-tuned for free-text narratives.
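The two-stage pipeline can be sketched in miniature. Everything here is a stand-in assumption, not the authors' implementation: random vectors replace LLM embeddings, cosine similarity replaces the RAG retriever, PCA via SVD replaces the unspecified dimensionality reduction, and a random linear layer replaces the trained lightweight predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def retrieve_segments(segments, criterion_vec, seg_vecs, k=3):
    """Stage 1 (RAG stand-in): score each EHR segment against the
    trial-criterion embedding by cosine similarity, keep the top-k."""
    sims = seg_vecs @ criterion_vec / (
        np.linalg.norm(seg_vecs, axis=1) * np.linalg.norm(criterion_vec))
    top = np.argsort(sims)[::-1][:k]
    return [segments[i] for i in top], seg_vecs[top]

def reduce_dim(reps, d=2):
    """Dimensionality reduction stand-in: project representations
    onto the top-d principal components (SVD of centered reps)."""
    centered = reps - reps.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return reps @ vt[:d].T

# Toy data: 20 EHR "segments" with 64-dim stand-in LLM embeddings.
segments = [f"segment_{i}" for i in range(20)]
seg_vecs = rng.normal(size=(20, 64))
criterion_vec = rng.normal(size=64)   # embedding of one eligibility criterion

selected, reps = retrieve_segments(segments, criterion_vec, seg_vecs, k=3)
features = reduce_dim(reps, d=2)      # stage 2: encode + reduce
pooled = features.mean(axis=0)        # aggregate to one patient-level vector

# Lightweight predictor stand-in: a linear score squashed to a probability.
w = rng.normal(size=pooled.shape[0])
match_prob = 1.0 / (1.0 + np.exp(-(w @ pooled)))
```

In a real system the retriever, encoder, and predictor would be trained components; the point of the sketch is the decoupling, i.e. that only short retrieved segments ever reach the encoder, and only low-dimensional features reach the classifier.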

Experiments

  • Public benchmarks: n2c2, SIGIR, TREC 2021/2022.
  • Real-world data: the Mayo Clinic multimodal dataset MCPMD.
  • Baselines: end-to-end LLM methods and traditional ML methods.
  • Metrics are not given in the abstract; presumably standard matching-classification metrics (F1, accuracy, etc.).

Results

The abstract gives only qualitative conclusions: retrieval-based selection substantially reduces computation while preserving clinically meaningful signal, and the lightweight pipeline achieves comparable performance at much lower computational cost. No specific numbers are disclosed, so the claims cannot be independently verified.

Why It Matters

Offers a deployable patient-trial matching solution for hospital settings: it avoids the high cost of long-context LLM inference while still exploiting LLM representations, with direct value for clinical AI infrastructure and trial-recruitment automation.

Connections to Prior Work

  • Patient-trial matching: TrialGPT, Criteria2Query, etc.
  • RAG in the medical domain: Clinical-RAG, MedRAG.
  • Frozen LLM representations + lightweight classifiers: the classic probing / linear-evaluation paradigm.
  • Long-EHR modeling: Clinical-Longformer, GatorTron.

Open Questions

  • No quantitative metrics are reported, making it hard to judge how close "comparable" really is.
  • The effect of retriever choice and retrieval recall on downstream performance is not discussed.
  • Fine-tuning details, parameter counts, and inference latency are unspecified.
  • Generalization to non-English or cross-institution EHRs is unverified.
  • Robustness to rare trials and long-tail eligibility criteria remains in question.

Figures

Figure 1 (extracted from PDF)

Figure 2 (extracted from PDF)


Original abstract

Patient-trial matching requires reasoning over long, heterogeneous electronic health records (EHRs) and complex eligibility criteria, posing significant challenges for scalability, generalization, and computational efficiency. Existing approaches either rely on full-document processing with large language models (LLMs), which is computationally expensive, or use traditional machine learning methods that struggle to capture unstructured clinical narratives. In this work, we propose a lightweight framework that combines retrieval-augmented generation and large language model-based modeling for scalable patient-trial matching. The framework explicitly separates two key components: retrieval-augmented generation is used to identify clinically relevant segments from long EHRs, reducing input complexity, while large language models are used to encode these selected segments into informative representations. These representations are further refined through dimensionality reduction and modeled using lightweight predictors, enabling efficient and scalable downstream classification. We evaluate the proposed approach on multiple public benchmarks (n2c2, SIGIR, TREC 2021/2022) and a real-world multimodal dataset from Mayo Clinic (MCPMD). Results show that retrieval-based information selection significantly reduces computational burden while preserving clinically meaningful signals. We further demonstrate that frozen LLMs provide strong representations for structured clinical data, whereas fine-tuning is essential for modeling unstructured clinical narratives. Importantly, the proposed lightweight pipeline achieves performance comparable to end-to-end LLM approaches with substantially lower computational cost.