Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

arXiv: 2604.21536 · PDF

Authors: Nikita Severin, Danil Kartushov, Vladislav Urzhumov, Vladislav Kulikov, Oksana Konovalova, Alexey Grishanov, Anton Klenitskiy, Artem Fatkulin, Alexey Vasilev, Andrey Savchenko, Ilya Makarov

Primary category: cs.IR · All: cs.AI, cs.IR

Matched keywords: large language model, llm, reasoning, inference, serving, fine-tun

TL;DR

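Proposes a knowledge distillation method that transfers textual user profiles generated by a pre-trained LLM into a sequential recommender. No LLM is needed at inference time, so the approach combines semantic understanding with efficiency.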

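Key points

Traditional sequential recommenders model temporal behavior well but lack rich user semantics.
Calling an LLM directly in the online recommendation path makes inference too expensive to deploy.
Having the LLM generate textual user profiles offline and distilling them into the sequential model yields semantic gains without changing the architecture or fine-tuning the LLM.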

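Method

The abstract describes the method only at a high level: a pre-trained LLM generates a textual profile for each user, and that profile serves as the teacher signal distilled into a standard sequential recommender. At serving time only the original sequential model runs, with no LLM calls; the recommender architecture is unchanged and the LLM is not fine-tuned. The abstract does not disclose the distillation loss, the alignment scheme, or the profile-generation prompt.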
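Since the abstract does not disclose the objective, the following is only a minimal sketch of one common way such a distillation could be set up, not the authors' method. It assumes (hypothetically) that each textual profile has been embedded offline into a vector `profile_emb`, that the student exposes a user representation `user_repr` of the same dimension, and that the training loss is next-item cross-entropy plus a cosine alignment term scaled by an illustrative `weight`. All function and parameter names are invented for this example.

```python
import math

def softmax_cross_entropy(logits, target):
    """Next-item cross-entropy for one user, computed in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def distillation_loss(logits, target, user_repr, profile_emb, weight=0.5):
    """Assumed objective: recommendation loss + weight * (1 - cosine alignment).

    logits      -- student scores over the item catalog
    target      -- index of the true next item
    user_repr   -- student's user representation (hypothetical)
    profile_emb -- offline embedding of the LLM-written profile (hypothetical)
    """
    rec = softmax_cross_entropy(logits, target)
    align = 1.0 - cosine(user_repr, profile_emb)
    return rec + weight * align
```

In this sketch the alignment term is used only during training. At serving time the student scores items from `logits` alone, which is consistent with the abstract's claim that no LLM inference is required. If the two vectors had different dimensions, a learned projection would be needed; that detail is omitted here.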

Experiments

The abstract gives no experimental details such as datasets, baselines, or metrics.

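Results

The abstract reports no concrete numbers or comparisons. It only states that the method keeps the inference efficiency of traditional sequential models while adding LLM semantics. The claim needs to be checked against the full paper.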

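Why it matters

For recommendation and AI infrastructure teams, the "offline LLM teacher + lightweight online student" paradigm sidesteps the latency and cost bottleneck of online LLM inference. It is a common but practical pattern for bringing LLM capabilities into high-QPS systems, and it allows incremental rollout on an existing recommendation stack.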
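The offline-teacher, online-student split can be illustrated with a small sketch. Everything here is an assumption for illustration: `build_profiles_offline`, `StudentRecommender`, and `stub_teacher` are hypothetical names, the teacher is a stub rather than a real LLM call, and the student ranks by fixed scores rather than a trained sequential model. The training step that would consume the profiles is omitted; the point is only that the serving path never touches the teacher.

```python
from typing import Callable, Dict, List

def build_profiles_offline(
    histories: Dict[str, List[str]],
    generate_profile: Callable[[List[str]], str],
) -> Dict[str, str]:
    """Offline batch job: one teacher call per user; profiles are stored for training."""
    return {user: generate_profile(items) for user, items in histories.items()}

class StudentRecommender:
    """Placeholder for a sequential recommender; it holds no reference to the teacher."""

    def __init__(self, item_scores: Dict[str, float]):
        self.item_scores = item_scores

    def recommend(self, history: List[str], k: int = 2) -> List[str]:
        # Rank unseen items by score; a real student would use its trained model here.
        seen = set(history)
        candidates = [item for item in self.item_scores if item not in seen]
        return sorted(candidates, key=lambda item: -self.item_scores[item])[:k]

teacher_calls: List[List[str]] = []

def stub_teacher(items: List[str]) -> str:
    """Stand-in for an LLM call; records each invocation so calls can be counted."""
    teacher_calls.append(items)
    return "User who interacted with: " + ", ".join(items)

histories = {"u1": ["a", "b"], "u2": ["c"]}
profiles = build_profiles_offline(histories, stub_teacher)  # offline stage
student = StudentRecommender({"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.1})
top_items = student.recommend(histories["u1"], k=2)         # online stage
```

The teacher is invoked once per user in the batch stage and not at all when `recommend` runs, so serving cost depends only on the student.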

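Relation to prior work

Continues the sequential recommendation line of SASRec and BERT4Rec. Within LLM-for-RecSys it belongs to the "LLM-as-teacher / profile augmentation" subclass and is close to LLMRec, RecFormer, KAR, ONCE, and other work that uses LLM semantics to enhance recommendation. The difference is its emphasis on zero LLM calls at serving time and no architectural changes.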

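Open questions

What are the exact distillation objective and the form of the profile representation?
On which datasets and metrics, and against which baselines, does it improve, and by how much?
How are profiles updated when user interests drift, and how are cold-start users handled?
What does the benefit-cost curve look like across teacher LLMs of different sizes?
How are the privacy and bias risks of generated profiles mitigated?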

Paper figures

Figure 1 (extracted from PDF)

Figure 2 (extracted from PDF)

Original abstract

Sequential recommender systems have achieved significant success in modeling temporal user behavior but remain limited in capturing rich user semantics beyond interaction patterns. Large Language Models (LLMs) present opportunities to enhance user understanding with their reasoning capabilities, yet existing integration approaches create prohibitive inference costs in real time. To address these limitations, we present a novel knowledge distillation method that utilizes textual user profile generated by pre-trained LLMs into sequential recommenders without requiring LLM inference at serving time. The resulting approach maintains the inference efficiency of traditional sequential models while requiring neither architectural modifications nor LLM fine-tuning.