<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>2026-04-23 on JXIN&#39;s Home</title>
    <link>https://ftxj.github.io/categories/2026-04-23/</link>
    <description>Recent content in 2026-04-23 on JXIN&#39;s Home</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Mon, 27 Apr 2026 05:10:58 +0000</lastBuildDate>
    <atom:link href="https://ftxj.github.io/categories/2026-04-23/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation</title>
      <link>https://ftxj.github.io/posts/2026-04-23/10-enhancing-online-recruitment-with-category-aware-moe-and-llm/</link>
      <pubDate>Mon, 27 Apr 2026 05:10:58 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/10-enhancing-online-recruitment-with-category-aware-moe-and-llm/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.21264v1&#34;&gt;2604.21264&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.21264v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Minping Chen, Bing Xu, Zulong Chen, Chuanfei Xu, Ying Zhou, Zui Tao, Zeyi Wen&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, rag, chain-of-thought, mixture of experts, moe&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;The paper proposes an LLM-enhanced Person-Job Fit (PJF) system combining chain-of-thought data augmentation for low-quality job descriptions with a category-aware Mixture of Experts module to better distinguish similar candidate-job pairs, yielding measurable gains in offline metrics and online A/B tests.&lt;/p&gt;</description>
    </item>
    <item>
      <title>LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs</title>
      <link>https://ftxj.github.io/posts/2026-04-23/09-layerboost-layer-aware-attention-reduction-for-efficient-llm/</link>
      <pubDate>Mon, 27 Apr 2026 05:10:15 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/09-layerboost-layer-aware-attention-reduction-for-efficient-llm/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.22050v1&#34;&gt;2604.22050&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.22050v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Mohamed Ali Souibgui, Jan Fostier, Rodrigo Abadía-Heredia, Bohdan Denysenko, Christian Marschke, Igor Peric&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.LG&lt;/code&gt; · all: cs.CL, cs.LG&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; llm, inference, serving, attention, transformer, throughput, latency&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;LayerBoost is a layer-aware attention reduction method that uses sensitivity analysis to selectively apply softmax, linear sliding-window, or no attention per layer, with quality recovered via a lightweight 10M-token distillation. It improves throughput by up to 68% at high concurrency while preserving quality.&lt;/p&gt;
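&lt;p&gt;The summary does not give the paper&amp;rsquo;s sensitivity metric or thresholds; the sketch below only illustrates the per-layer assignment idea, with the scores and cutoffs as assumed placeholders.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Illustrative sketch, not the paper's code. Assumes per-layer sensitivity
# scores were already measured, e.g. by perturbing each layer's attention
# and recording the change in validation loss.

def assign_attention_types(sensitivities, hi=0.5, lo=0.1):
    # sensitivities: one float per layer (higher = more sensitive).
    # hi, lo: assumed thresholds, not values from the paper.
    plan = []
    for s in sensitivities:
        if s &gt;= hi:
            plan.append('softmax')         # keep full attention
        elif s &gt;= lo:
            plan.append('sliding_window')  # linear sliding-window attention
        else:
            plan.append('none')            # drop attention entirely
    return plan

# Toy example: a 6-layer model whose early layers are most sensitive.
print(assign_attention_types([0.9, 0.7, 0.3, 0.2, 0.05, 0.02]))
# -&gt; ['softmax', 'softmax', 'sliding_window', 'sliding_window', 'none', 'none']
&lt;/code&gt;&lt;/pre&gt;</description>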
    </item>
    <item>
      <title>Lightweight Retrieval-Augmented Generation and Large Language Model-Based Modeling for Scalable Patient-Trial Matching</title>
      <link>https://ftxj.github.io/posts/2026-04-23/08-lightweight-retrieval-augmented-generation-and-large-languag/</link>
      <pubDate>Mon, 27 Apr 2026 05:09:44 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/08-lightweight-retrieval-augmented-generation-and-large-languag/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.22061v1&#34;&gt;2604.22061&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.22061v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xiaodi Li, Yang Xiao, Munhwan Lee, Konstantinos Leventakos, Young J. Juhn, David Jones, Terence T. Sio, Wei Liu, Maria Vassilaki, Nansu Zong&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CL&lt;/code&gt; · all: cs.AI, cs.CL, cs.LG&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, retrieval, reasoning, serving, fine-tun&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;The paper proposes a lightweight framework that combines RAG with LLM representation modeling for scalable patient-trial matching, achieving performance comparable to end-to-end LLMs on multiple public and real-world clinical datasets at a substantially lower computational cost.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Decouple RAG from LLM representation: RAG selects relevant snippets from long EHRs, while the LLM handles encoding.&lt;/li&gt;&#xA;&lt;li&gt;Introduce dimensionality reduction and a lightweight classifier for efficient downstream classification.&lt;/li&gt;&#xA;&lt;li&gt;A frozen LLM suffices for structured data, whereas unstructured clinical narratives require fine-tuning.&lt;/li&gt;&#xA;&lt;li&gt;Scalability is validated on public benchmarks and a real-world multimodal Mayo Clinic dataset.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;The pipeline runs in two stages: (1) RAG retrieves the clinical snippets from long EHRs that are relevant to a trial&amp;rsquo;s eligibility criteria, reducing input length; (2) the LLM encodes these snippets into representations that, after dimensionality reduction, feed a lightweight predictor (e.g., a linear or shallow model) for match classification. A frozen LLM is used for structured fields, while the free-text narrative portion is fine-tuned.&lt;/p&gt;
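&lt;p&gt;A minimal sketch of the encode-reduce-classify stage, assuming the snippet embeddings come from a frozen LLM; the model choices, dimensions, and random toy data below are illustrative, not the paper&amp;rsquo;s configuration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Illustrative sketch of stage 2 (not the paper's code): reduce frozen-LLM
# embeddings of RAG-retrieved EHR snippets, then fit a lightweight predictor.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def fit_matcher(snippet_embeddings, labels, n_components=64):
    # snippet_embeddings: (n_pairs, d) LLM embeddings of patient-trial pairs.
    # labels: (n_pairs,) binary match / no-match labels.
    reducer = PCA(n_components=n_components)    # dimensionality reduction
    clf = LogisticRegression(max_iter=1000)     # lightweight predictor
    clf.fit(reducer.fit_transform(snippet_embeddings), labels)
    return reducer, clf

# Toy data standing in for frozen-LLM embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 768))
y = rng.integers(0, 2, size=200)
reducer, clf = fit_matcher(emb, y)
print(clf.predict(reducer.transform(emb[:5])))
&lt;/code&gt;&lt;/pre&gt;</description>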
    </item>
    <item>
      <title>Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework</title>
      <link>https://ftxj.github.io/posts/2026-04-23/07-emergent-strategic-reasoning-risks-in-ai-a-taxonomy-driven-e/</link>
      <pubDate>Mon, 27 Apr 2026 05:09:07 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/07-emergent-strategic-reasoning-risks-in-ai-a-taxonomy-driven-e/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.22119v1&#34;&gt;2604.22119&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.22119v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Tharindu Kumarage, Lisa Bauer, Yao Ma, Dan Rosen, Yashasvi Raghavendra Guduri, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, Charith Peris&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, reasoning&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;This paper introduces ESRRSim, a taxonomy-driven agentic framework for evaluating Emergent Strategic Reasoning Risks (ESRRs) in LLMs—behaviors like deception, evaluation gaming, and reward hacking. Across 11 reasoning LLMs, detection rates vary from 14.45% to 72.72%.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models</title>
      <link>https://ftxj.github.io/posts/2026-04-23/06-trust-but-verify-introducing-davinci-a-framework-for-dual-at/</link>
      <pubDate>Mon, 27 Apr 2026 05:08:32 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/06-trust-but-verify-introducing-davinci-a-framework-for-dual-at/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.21193v1&#34;&gt;2604.21193&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.21193v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Vipula Rawte, Ryan Rossi, Franck Dernoncourt, Nedim Lipka&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, retrieval, reasoning, inference, ai system&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;DAVinCI is a two-stage framework that combines claim attribution (to internal model components and external sources) with entailment-based verification and confidence calibration, improving factual reliability of LLM outputs by 5–20% over verification-only baselines on FEVER and CLIMATE-FEVER.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Dual approach: pair &lt;strong&gt;attribution&lt;/strong&gt; with &lt;strong&gt;verification&lt;/strong&gt; rather than treating them independently.&lt;/li&gt;&#xA;&lt;li&gt;Attribute claims both to internal LLM components and external retrieved sources.&lt;/li&gt;&#xA;&lt;li&gt;Use entailment reasoning plus confidence recalibration for claim checking.&lt;/li&gt;&#xA;&lt;li&gt;Release a modular implementation pluggable into existing LLM pipelines.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;DAVinCI runs in two stages. Stage 1 attributes each generated claim to (a) internal model components and (b) external evidence sources. Stage 2 verifies each claim via entailment-based reasoning, then recalibrates confidence scores. The abstract does not specify the exact attribution mechanism (e.g., attention tracing, gradient-based, or retrieval citation) or which entailment model is used.&lt;/p&gt;
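&lt;p&gt;Since the attribution mechanism and entailment model are unspecified, the sketch below uses a hypothetical &lt;code&gt;nli_entailment&lt;/code&gt; stand-in and an illustrative temperature-scaling recalibration; none of these names come from the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Minimal sketch of the two-stage flow; nli_entailment is a placeholder for
# any NLI model, and temperature scaling is one assumed calibration choice.
import math

def nli_entailment(premise, hypothesis):
    # Return P(premise entails hypothesis); plug in any entailment model.
    raise NotImplementedError

def verify_claim(claim, internal_evidence, external_evidence, temperature=1.5):
    # Stage 1: attribution -- pool candidate evidence for the claim from
    # internal model components and external retrieved sources.
    evidence = internal_evidence + external_evidence
    # Stage 2: entailment-based verification over the attributed evidence.
    raw = max((nli_entailment(e, claim) for e in evidence), default=0.0)
    # Confidence recalibration (illustrative temperature scaling in logit space).
    p = min(max(raw, 1e-6), 1.0 - 1e-6)
    logit = math.log(p / (1.0 - p))
    calibrated = 1.0 / (1.0 + math.exp(-logit / temperature))
    return calibrated &gt; 0.5, calibrated
&lt;/code&gt;&lt;/pre&gt;</description>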
    </item>
    <item>
      <title>MambaCSP: Hybrid-Attention State Space Models for Hardware-Efficient Channel State Prediction</title>
      <link>https://ftxj.github.io/posts/2026-04-23/05-mambacsp-hybrid-attention-state-space-models-for-hardware-ef/</link>
      <pubDate>Mon, 27 Apr 2026 05:08:03 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/05-mambacsp-hybrid-attention-state-space-models-for-hardware-ef/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.21957v1&#34;&gt;2604.21957&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.21957v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Aladin Djuhera, Haris Gacanin, Holger Boche&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.IT&lt;/code&gt; · all: cs.AI, cs.IT, cs.LG, eess.SP&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, inference, attention, transformer, throughput, latency&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;MambaCSP replaces Transformer/LLM backbones for channel state prediction with a hybrid Mamba SSM augmented by lightweight patch-mixer attention, achieving 9–12% accuracy gains and up to 3× throughput over LLM baselines in MISO-OFDM simulations.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Pure attention-based CSP suffers quadratic sequence cost, limiting real-time wireless use.&lt;/li&gt;&#xA;&lt;li&gt;Selective SSMs (Mamba) offer linear-time alternatives but lack long-range cross-token mixing.&lt;/li&gt;&#xA;&lt;li&gt;Hybrid design: Mamba backbone + periodic patch-mixer attention layers recovers global context cheaply.&lt;/li&gt;&#xA;&lt;li&gt;Hardware efficiency (VRAM, latency, throughput) is treated as a first-class objective alongside accuracy.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;MambaCSP swaps the LLM prediction backbone for a linear-time Mamba selective SSM operating on CSI sequences. Because pure SSMs capture mostly local dependencies, the authors periodically insert lightweight &amp;ldquo;patch-mixer&amp;rdquo; attention layers that inject cross-token interactions across patched CSI tokens. The architecture thus alternates SSM blocks (cheap sequential mixing) with sparse attention (global context), targeting MISO-OFDM channel prediction.&lt;/p&gt;
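&lt;p&gt;An architectural sketch of the alternating pattern only: &lt;code&gt;SSMBlock&lt;/code&gt; below is a simple stand-in for a Mamba selective-SSM layer, and the depth, width, and mixing period are assumed values, not the paper&amp;rsquo;s configuration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Illustrative PyTorch sketch: alternate cheap sequential mixing with sparse
# global attention. Swap SSMBlock for a real Mamba layer in practice.
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    # Placeholder for a linear-time selective-SSM (Mamba) layer.
    def __init__(self, d_model):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.norm = nn.LayerNorm(d_model)
    def forward(self, x):
        return self.norm(x + self.mix(x))

class PatchMixer(nn.Module):
    # Lightweight attention over patched CSI tokens for global context.
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
    def forward(self, x):                        # x: (batch, tokens, d_model)
        y, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + y)

class HybridCSP(nn.Module):
    # Insert a patch-mixer every `mix_every` layers, SSM blocks elsewhere.
    def __init__(self, d_model=128, n_layers=8, mix_every=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            PatchMixer(d_model) if (i + 1) % mix_every == 0 else SSMBlock(d_model)
            for i in range(n_layers)
        )
        self.head = nn.Linear(d_model, d_model)  # next-step CSI prediction
    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)

print(HybridCSP()(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])
&lt;/code&gt;&lt;/pre&gt;</description>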
    </item>
    <item>
      <title>Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation</title>
      <link>https://ftxj.github.io/posts/2026-04-23/04-pre-trained-llms-meet-sequential-recommenders-efficient-user/</link>
      <pubDate>Mon, 27 Apr 2026 05:07:25 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/04-pre-trained-llms-meet-sequential-recommenders-efficient-user/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.21536v1&#34;&gt;2604.21536&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.21536v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Nikita Severin, Danil Kartushov, Vladislav Urzhumov, Vladislav Kulikov, Oksana Konovalova, Alexey Grishanov, Anton Klenitskiy, Artem Fatkulin, Alexey Vasilev, Andrey Savchenko, Ilya Makarov&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.IR&lt;/code&gt; · all: cs.AI, cs.IR&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, reasoning, inference, serving, fine-tun&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;The paper proposes a knowledge distillation method that transfers LLM-generated textual user profiles into sequential recommender systems, enhancing user semantic understanding without incurring LLM inference costs at serving time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents</title>
      <link>https://ftxj.github.io/posts/2026-04-23/03-memanto-typed-semantic-memory-with-information-theoretic-ret/</link>
      <pubDate>Mon, 27 Apr 2026 05:06:52 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/03-memanto-typed-semantic-memory-with-information-theoretic-ret/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.22085v1&#34;&gt;2604.22085&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.22085v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Seyed Moein Abtahi, Rasa Rahnema, Hetkumar Patel, Neel Patel, Majid Fekri, Tara Khani&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, agent, agentic, retrieval, inference, latency&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;Memanto is a memory layer for long-horizon LLM agents that replaces knowledge-graph pipelines with a typed semantic schema plus an information-theoretic retrieval engine, hitting 89.8% on LongMemEval and 87.1% on LoCoMo with single-query retrieval and no ingestion cost.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows</title>
      <link>https://ftxj.github.io/posts/2026-04-23/02-tool-attention-is-all-you-need-dynamic-tool-gating-and-lazy/</link>
      <pubDate>Mon, 27 Apr 2026 05:06:21 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/02-tool-attention-is-all-you-need-dynamic-tool-gating-and-lazy/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.21816v1&#34;&gt;2604.21816&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.21816v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Anuj Sadani, Deepak Kumar&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, reasoning, attention, latency&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;Tool Attention is a middleware layer that replaces MCP&amp;rsquo;s eager schema injection with intent-gated, lazy schema loading — cutting per-turn tool tokens by 95% in simulation and arguing that protocol efficiency, not context length, is the real bottleneck for scalable agentic systems.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;The &amp;ldquo;MCP Tax&amp;rdquo; (10k–60k tokens/turn) inflates KV cache and pushes context past known reasoning-degradation thresholds (~70%).&lt;/li&gt;&#xA;&lt;li&gt;Generalize self-attention into &lt;em&gt;attention over tools&lt;/em&gt;: score, gate, then selectively expose schemas.&lt;/li&gt;&#xA;&lt;li&gt;Protocol-level efficiency is a tighter constraint than raw context window size.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;A middleware sitting between agent and MCP servers with three components:&lt;/p&gt;
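&lt;p&gt;The components themselves are not enumerated in this summary; the sketch below only illustrates the score-gate-expose idea from the TL;DR, with every name and threshold an assumption for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Illustrative sketch of intent gating with lazy schema loading (not the
# paper's code): score registered tools against the turn, expose only the
# top scorers, and fetch full JSON schemas just for those.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def gate_tools(turn_embedding, tool_index, k=3, threshold=0.2):
    # tool_index: {tool_name: embedding of the tool's short description}.
    scores = {name: cosine(turn_embedding, emb) for name, emb in tool_index.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [name for name in top if scores[name] &gt;= threshold]

def build_prompt(turn, gated_names, schema_store):
    # Lazy schema loading: pay the token cost only for gated tools, instead
    # of eagerly injecting every tool schema on every turn.
    return {'user_turn': turn,
            'tools': [schema_store.load(name) for name in gated_names]}
&lt;/code&gt;&lt;/pre&gt;</description>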
    </item>
    <item>
      <title>Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models</title>
      <link>https://ftxj.github.io/posts/2026-04-23/01-nemobot-games-crafting-strategic-ai-gaming-agents-for-intera/</link>
      <pubDate>Mon, 27 Apr 2026 05:05:47 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-23/01-nemobot-games-crafting-strategic-ai-gaming-agents-for-intera/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.21896v1&#34;&gt;2604.21896&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.21896v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Chee Wei Tan, Yuchen Wang, Shangxin Guo&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, rag, reasoning, fine-tun&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;Nemobot is an interactive agentic environment that uses LLMs to build and deploy game-playing agents across Shannon&amp;rsquo;s taxonomy, spanning dictionary-based, solvable, heuristic, and learning-based games, aiming toward self-programming AI.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Extends Shannon&amp;rsquo;s 1950 taxonomy of game-playing machines into an LLM-era paradigm.&lt;/li&gt;&#xA;&lt;li&gt;Four game classes handled distinctly: dictionary, solvable, heuristic, learning-based.&lt;/li&gt;&#xA;&lt;li&gt;Agents combine minimax, crowd-sourced data, RLHF, and self-critique.&lt;/li&gt;&#xA;&lt;li&gt;Programmable environment for tool-augmented generation and fine-tuning.&lt;/li&gt;&#xA;&lt;li&gt;Positions user-in-the-loop customization as a route to self-programming.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;A chatbot-driven agentic engine routes game tasks by class: compressed state-action mappings for dictionary games; exact mathematical reasoning with human-readable explanations for solvable games; hybrid minimax-plus-crowd heuristics for heuristic games; RLHF with self-critique and imitation learning for learning-based games. Nemobot exposes these as programmable, tool-augmented workflows users can customize and fine-tune.&lt;/p&gt;
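&lt;p&gt;A toy dispatch sketch of the routing-by-class idea; the handler names and bodies are placeholders invented for illustration, not Nemobot&amp;rsquo;s API.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Illustrative routing over Shannon's four game classes (not Nemobot's code).

def solve_dictionary(game): ...   # look up a compressed state-action mapping
def solve_exact(game): ...        # exact mathematical reasoning + explanation
def solve_heuristic(game): ...    # minimax blended with crowd-sourced priors
def solve_learning(game): ...     # RLHF agent with self-critique / imitation

ROUTER = {
    'dictionary': solve_dictionary,
    'solvable': solve_exact,
    'heuristic': solve_heuristic,
    'learning': solve_learning,
}

def play(game_class, game):
    return ROUTER[game_class](game)
&lt;/code&gt;&lt;/pre&gt;</description>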
    </item>
  </channel>
</rss>
