Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation

arXiv: 2604.21264 · PDF

Authors: Minping Chen, Bing Xu, Zulong Chen, Chuanfei Xu, Ying Zhou, Zui Tao, Zeyi Wen

Primary category: cs.AI · All categories: cs.AI

Matched keywords: large language model, llm, rag, chain-of-thought, mixture of experts, moe

TL;DR

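For the Person-Job Fit (PJF) task in online recruitment, the paper uses an LLM as a data-augmentation step that polishes low-quality job descriptions (JDs), and introduces a category-aware MoE to tell similar candidate-job pairs apart. It reports substantial gains both offline and in online tests.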

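Key points

Low-quality JDs and similar candidate-job pairs are the main bottlenecks for PJF.
Rewriting and polishing JDs with an LLM and CoT prompts directly improves feature quality.
A category-aware MoE can learn more discriminative representations for similar samples.
The method is deployed on a real recruitment platform and yields sizable business gains.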
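The JD-rewriting point above can be illustrated with a minimal prompt-building sketch. The abstract does not disclose the actual prompt, so the template wording, the `REWRITTEN:` marker, and the helper names `build_prompt` and `extract_rewrite` are all hypothetical. The call to an LLM is deliberately left out, since no specific model or API is named in the source.

```python
# Hypothetical chain-of-thought template; the paper's real prompt is not public.
COT_TEMPLATE = """You are an experienced recruiter.
Job category: {category}
Original job description:
{jd}

Think step by step:
1. List the responsibilities and requirements that are stated explicitly.
2. Identify what is vague, duplicated, or missing.
3. Rewrite the description in a clear structure, without inventing facts
   that the original does not imply.

Give your reasoning first, then the rewritten description after a line
that contains only REWRITTEN:
"""


def build_prompt(jd: str, category: str) -> str:
    """Fill the template with one raw job description and its category."""
    return COT_TEMPLATE.format(jd=jd.strip(), category=category)


def extract_rewrite(llm_output: str):
    """Return the text after the REWRITTEN: marker, or None if it is absent."""
    _, sep, tail = llm_output.partition("REWRITTEN:")
    return tail.strip() if sep else None


prompt = build_prompt("  Sales, good pay, call us  ", "Sales")
```

Returning `None` when the marker is missing lets a pipeline fall back to the original JD instead of feeding malformed output into the matching model.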

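Method

LLM-based data augmentation: chain-of-thought prompts ask the LLM to polish, complete, and rewrite low-quality JDs, producing text input with a more regular structure.
Category-aware MoE: category embeddings are added to the MoE module so that expert weights are adjusted dynamically per category, letting the model learn differentiated patterns for similar candidate-job pairs.
Overall, the framework uses the LLM for text processing and the MoE for match modeling, and both serve PJF scoring.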
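The category-aware gating idea can be sketched in plain Python. The abstract only states that category embeddings are used to assign expert weights dynamically; how they enter the gate is not specified. This sketch assumes the category embedding is concatenated with the input features before a dense softmax gate, and that each expert is a single linear map. The class name `CategoryAwareMoE`, the dimensions, and the random initialization are illustrative and not taken from the paper.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class CategoryAwareMoE:
    def __init__(self, in_dim, out_dim, n_experts, n_categories, cat_dim, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.experts = [rand(out_dim, in_dim) for _ in range(n_experts)]
        self.cat_emb = rand(n_categories, cat_dim)      # one embedding per category
        self.gate = rand(n_experts, in_dim + cat_dim)   # gate sees features + category

    def gate_weights(self, x, category_id):
        # Assumption: the gate input is [features ; category embedding].
        return softmax(matvec(self.gate, x + self.cat_emb[category_id]))

    def forward(self, x, category_id):
        weights = self.gate_weights(x, category_id)
        outs = [matvec(W, x) for W in self.experts]
        # Weighted sum of expert outputs, one value per output dimension.
        return [sum(w * o[j] for w, o in zip(weights, outs))
                for j in range(len(outs[0]))]


moe = CategoryAwareMoE(in_dim=4, out_dim=3, n_experts=2, n_categories=5, cat_dim=2)
x = [0.5, -1.0, 0.25, 2.0]
w_cat0 = moe.gate_weights(x, 0)
w_cat1 = moe.gate_weights(x, 1)
```

With the same feature vector `x`, the two category ids produce different expert mixtures, which is the mechanism by which otherwise similar candidate-job pairs could be routed differently.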

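Experiments

Data: real candidate-job data from the authors' own recruitment platform.
Evaluation: offline metrics plus online A/B tests.
Metrics: AUC and GAUC (offline); CTCVR and external headhunting expenses (online).
Baselines: existing PJF methods (the abstract does not name specific models).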
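For readers less familiar with GAUC, the sketch below computes it the way it is commonly defined in recommendation work: AUC is computed per group, and the per-group values are averaged with weights equal to each group's sample count, skipping groups that contain only one class. The abstract does not say what the grouping key or the weighting is, so grouping by a hypothetical candidate id and weighting by sample count are assumptions, and the toy data is made up.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None if only one class is present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(group_ids, labels, scores):
    """Sample-count-weighted average of per-group AUC."""
    groups = defaultdict(lambda: ([], []))
    for g, y, s in zip(group_ids, labels, scores):
        groups[g][0].append(y)
        groups[g][1].append(s)
    weighted_sum, total_weight = 0.0, 0
    for ys, ss in groups.values():
        group_auc = auc(ys, ss)
        if group_auc is None:      # single-class group: AUC undefined, skip it
            continue
        weighted_sum += len(ys) * group_auc
        total_weight += len(ys)
    return weighted_sum / total_weight if total_weight else None


# Toy data: three hypothetical candidates; "u3" has only a positive sample.
group_ids = ["u1", "u1", "u1", "u2", "u2", "u3"]
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.5]
global_auc = auc(labels, scores)
grouped_auc = gauc(group_ids, labels, scores)
```

Because GAUC only compares samples within the same group, it can move differently from global AUC, which is one reason the two numbers are reported separately.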

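Results

Offline: a relative improvement of 2.40% in AUC and 7.46% in GAUC.
Online A/B: CTCVR improves by 19.4%.
Business impact: millions of CNY saved in external headhunting expenses.
The claims are consistent with the reported numbers, but the abstract does not disclose details such as dataset size, MoE size, or ablations.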
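Since the offline gains are stated as relative improvements, a short worked example may help keep them apart from absolute AUC points. The abstract gives no absolute AUC values, so the baseline of 0.750 below is purely hypothetical and only illustrates the arithmetic.

```python
def relative_lift(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


baseline_auc = 0.750                       # hypothetical, not from the paper
new_auc = baseline_auc * (1 + 0.0240)      # apply the reported 2.40% relative lift
absolute_gain = new_auc - baseline_auc     # about 0.018 AUC, i.e. 1.8 points

print(f"new AUC ~ {new_auc:.3f}, absolute gain ~ {absolute_gain:.3f}")
print(f"relative lift = {relative_lift(baseline_auc, new_auc):.2%}")
```

Under this assumed baseline, a 2.40% relative lift corresponds to roughly 1.8 absolute AUC points rather than 2.4.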

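Why it matters

It shows a practical path for using an LLM as a "data cleaner" inside a conventional recommendation/matching pipeline rather than as an end-to-end model, with controllable cost and easy deployment.
The category-aware MoE offers a general way to handle hard-negative-like similar samples, which may transfer to scenarios such as advertising and e-commerce recommendation.
It gives engineers who deploy LLMs alongside recommender systems a case with validated online gains.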

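Relation to prior work

Continues the traditional line of Person-Job Fit work (text matching, two-tower models, interaction models, and so on).
Methodologically, it combines two lines: Mixture of Experts (multi-task and multi-domain recommendation work such as MMoE and PLE) and LLM data augmentation (CoT prompting, LLMs for data labeling).
Close in spirit to retrieval-augmentation work that uses LLMs to rewrite queries or JDs.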

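Open questions

Whether LLM rewriting introduces hallucinations or bias, and how it affects fairness and compliance, is not discussed.
How category granularity is defined and how much it affects the MoE is unclear; ablations are missing.
Robustness to cold-start jobs, low-resource languages, and cross-industry transfer is unknown.
Whether the online gains come from LLM rewriting, the MoE, or the combination of the two needs further breakdown.
Inference cost (LLM call frequency, latency) and ROI details are not disclosed.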

Paper figures

Figure 1 (extracted from PDF)

Original abstract

Person-Job Fit (PJF) is a critical component for online recruitment. Existing approaches face several challenges, particularly in handling low-quality job descriptions and similar candidate-job pairs, which impair model performance. To address these challenges, this paper proposes a large language model (LLM) based method with two novel techniques: (1) LLM-based data augmentation, which polishes and rewrites low-quality job descriptions by leveraging chain-of-thought (COT) prompts, and (2) category-aware Mixture of Experts (MoE) that assists in identifying similar candidate-job pairs. This MoE module incorporates category embeddings to dynamically assign weights to the experts and learns more distinguishable patterns for similar candidate-job pairs. We perform offline evaluations and online A/B tests on our recruitment platform. Our method relatively surpasses existing methods by 2.40% in AUC and 7.46% in GAUC, and boosts click-through conversion rate (CTCVR) by 19.4% in online tests, saving millions of CNY in external headhunting expenses.