2026-05-29 Paper Digest
845 arXiv papers on agent / LLM / AI infra submitted that day matched our topic filter. 10 were hand-picked by Claude — using title + authors + affiliations — and received a full Claude-generated analysis; the remaining 835 are listed at the bottom.
1. Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning
arXiv: 2602.00994 · cs.AI · Claude pick
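In agentic RL, reasoning and tool-use share parameters, which produces conflicting gradient directions and degrades joint optimization. The authors quantify this interference and propose DART, which routes the two kinds of gradients to two separate LoRA adapters, and it outperforms all joint-optimization baselines on 13 benchmarks.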
2. The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF
arXiv: 2605.29491 · cs.AI · Claude pick
Larger LLMs are systematically less robust to instruction-like noise embedded in reference text — a “Curse of Helpfulness” — which the new DistractionIF benchmark quantifies; GRPO-based RL partially recovers up to 15.5% robustness without hurting general instruction following.
3. Tiny Brains, Giant Impact: Uncovering the Keystone Neurons of LLM with Just a Few Prompts
arXiv: 2605.24846 · cs.LG · Claude pick
A tiny, cross-task subset of neurons (< 0.2% of all neurons) called “keystone neurons” can be identified in open-weight LLMs with just four prompts; removing them collapses all model capabilities, while fine-tuning only them matches or exceeds full-parameter fine-tuning.
4. RTP-LLM: High-Performance Alibaba LLM Inference Engine
arXiv: 2605.29639 · cs.OS · Claude pick
RTP-LLM is Alibaba’s production LLM inference engine, serving 100M+ users, that integrates prefill-decode disaggregation, multi-tiered KV cache, speculative decoding, and model-loading optimizations to deliver 4.7×–6.3× faster loading, 35–40% latency reduction, and substantial throughput gains over vLLM and SGLang.
5. GrepSeek: Training Search Agents for Direct Corpus Interaction
arXiv: 2605.29307 · cs.CL · Claude pick
GrepSeek trains a compact LLM to search large text corpora by issuing shell commands (rg, grep) directly against raw text, bypassing pre-computed indices, using a cold-start SFT + GRPO two-stage pipeline and a 7.6× sharded-parallel execution engine.
6. SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
arXiv: 2604.09557 · cs.DC · Claude pick
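SPEED-Bench is a comprehensive evaluation suite designed specifically for speculative decoding. Through semantic-diversity-driven data curation and integration with production-grade engines, it addresses the systematic shortcomings of existing benchmarks in diversity, throughput evaluation, and representativeness of real-world environments.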
7. ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding
arXiv: 2604.13519 · cs.CL · Claude pick
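ToolSpec is a training-free speculative decoding method that uses a finite-state machine over predefined tool schemas to generate draft tokens deterministically, and combines this with retrieval of historical calls, speeding up tool-call generation by up to 4.2×.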
8. RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
arXiv: 2603.18859 · cs.AI · Claude pick
RewardFlow builds a state graph from sampled agentic trajectories and propagates BFS-based rewards from success nodes to intermediate states, providing annotation-free dense process rewards that improve RL training across four agentic benchmarks without any reward model.
9. SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
arXiv: 2605.29796 · cs.AI · Claude pick
SAAS is an RL framework that teaches agentic search models when not to search by dynamically tracking the agent’s evolving knowledge boundary and converting that awareness into discriminative trajectory-level penalties, reducing over-search without accuracy loss.
10. When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems
arXiv: 2605.30102 · cs.MA · Claude pick
This position/workshop paper systematically examines the design space of hybrid multi-agent systems (MAS) that mix cloud-hosted frontier LLMs with on-device SLMs, finding that no single hybrid architecture dominates across tasks and that more cloud compute does not reliably improve performance.
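To make pick 8 (RewardFlow) more concrete, the idea of propagating rewards outward from success nodes with BFS can be sketched as below. This is only an illustration under stated assumptions, not the paper's method: the function name `propagate_rewards`, the `decay` parameter, and the rule "reward = decay ** hop distance to the nearest success state" are all hypothetical choices made for this sketch, since the summary does not give the actual reward formula.

```python
from collections import deque


def propagate_rewards(trajectories, success_states, decay=0.9):
    """Assign each state a reward that shrinks with its hop distance to success.

    Hypothetical sketch: states are hashable labels, and the decay rule is an
    assumption made for illustration, not taken from the RewardFlow paper.
    """
    # Build reversed edges: for each observed transition src -> dst,
    # remember that dst can be reached from src.
    predecessors = {}
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            predecessors.setdefault(dst, set()).add(src)

    # Multi-source BFS starting from every success state (distance 0).
    distance = {s: 0 for s in success_states}
    queue = deque(success_states)
    while queue:
        state = queue.popleft()
        for prev in predecessors.get(state, ()):
            if prev not in distance:
                distance[prev] = distance[state] + 1
                queue.append(prev)

    # States that never lead to success get no entry at all.
    return {state: decay ** d for state, d in distance.items()}


trajectories = [
    ["start", "a", "b", "goal"],  # successful rollout
    ["start", "a", "c"],          # failed rollout: "c" never reaches "goal"
]
rewards = propagate_rewards(trajectories, success_states=["goal"], decay=0.5)
```

In this toy graph, states shared by the failed and the successful rollout ("start", "a") still receive a reward because they lie on a path to "goal", while the dead-end state "c" receives none. That is the sense in which graph-based propagation can yield dense per-state signals without a learned reward model.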
Other matched papers
These papers matched the same topic keywords but were not among Claude’s top-N deep-analysis picks.
- Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling · cs.AI · arXiv 2605.29262 · score 32 · large language model, llm, agent, agentic, retrieval, rag
- ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows · cs.CV · arXiv 2605.14113 · score 30 · large language model, llm, agent, agentic, retrieval, rag
- SURGENT: A Surgical Multi-Agent Assistance System Across the Perioperative Workflow · cs.CL · arXiv 2605.29368 · score 29 · large language model, llm, agent, multi-agent, retrieval, reasoning
- BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices · cs.AI · arXiv 2605.29705 · score 28 · large language model, llm, multi-agent, rag, reasoning, inference
- Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation · cs.AI · arXiv 2605.29873 · score 27 · large language model, llm, reasoning, serving, kv cache, attention
- Learning to Choose: An Empowerment-Guided Multi-Agent System with semantic communication for Adaptive Method Selection · cs.AI · arXiv 2605.30042 · score 23 · large language model, llm, agent, multi-agent, rag, serving
- AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios · cs.AI · arXiv 2605.27995 · score 28 · large language model, llm, agent, tool use, tool-use, reasoning
- CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems · cs.MA · arXiv 2605.29612 · score 27 · large language model, llm, agent, multi-agent, rag, latency
- MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs · cs.AI · arXiv 2605.29512 · score 26 · large language model, llm, agent, multi-agent, reasoning, inference
- CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective · cs.CL · arXiv 2502.03805 · score 26 · large language model, llm, rag, inference, kv cache, attention
- Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems · cs.AI · arXiv 2605.29676 · score 21 · large language model, llm, agent, agentic, ai system
- KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning · cs.AI · arXiv 2605.30002 · score 25 · large language model, llm, agent, agentic, rag, reasoning
- Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization · cs.MA · arXiv 2605.30227 · score 25 · large language model, llm, agent, multi-agent, rag, reasoning
- MediHive: A Decentralized Agent Collective for Medical Reasoning · cs.AI · arXiv 2603.27150 · score 21 · large language model, llm, agent, multi-agent, rag, reasoning
- MemoSight: Unifying Context Compression and Multi Token Prediction for Reasoning Acceleration · cs.AI · arXiv 2604.14889 · score 21 · llm, rag, reasoning, chain-of-thought, inference, serving
- AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials · cs.AI · arXiv 2510.04704 · score 26 · large language model, llm, agent, agentic, retrieval, reasoning
- DynaGraph: Lightweight Multi-Model Interaction Framework via Dynamic Topological Reconfiguration · cs.MA · arXiv 2605.29511 · score 30 · llm, agent, multi-agent, reasoning, inference, gpu
- BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference · cs.LG · arXiv 2605.29233 · score 20 · llm, rag, inference, serving, kv-cache, parallelism
- Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation · cs.CL · arXiv 2605.29861 · score 24 · large language model, llm, agent, multi-agent, tool use
- ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation · cs.CV · arXiv 2604.11080 · score 20 · large language model, llm, rag, inference, quantization, attention
- Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction · cs.CL · arXiv 2605.25297 · score 20 · llm, agent, agentic, reasoning, chain-of-thought, gpu
- DFlash: Block Diffusion for Flash Speculative Decoding · cs.CL · arXiv 2602.06036 · score 20 · large language model, llm, inference, speculative decoding, gpu, latency
- Accelerating Sparse Transformer Inference on GPU · cs.LG · arXiv 2506.06095 · score 20 · large language model, llm, rag, inference, attention, transformer
- VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis · cs.AI · arXiv 2605.28978 · score 19 · large language model, llm, agent, multi-agent, reasoning
- Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models · cs.AI · arXiv 2605.29625 · score 19 · large language model, llm, agent, multi-agent, attention
- Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent · cs.AI · arXiv 2605.29966 · score 19 · large language model, llm, agent, rag, reasoning, fine-tun
- MOOSE-Copilot: A Web-Based Interactive Assistant for Unified Exploratory and Fine-Grained Scientific Hypothesis Discovery · cs.CL · arXiv 2605.29475 · score 19 · large language model, llm, agent, agentic, rag
- Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding · cs.CL · arXiv 2605.29707 · score 19 · llm, inference, serving, speculative decoding, transformer, throughput
- The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems · cs.CL · arXiv 2602.15382 · score 19 · large language model, agent, multi-agent, rag, reasoning, quantization
- Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents · cs.LG · arXiv 2605.28850 · score 19 · large language model, llm, agent, reasoning, transformer, fine-tun
- Robust and Efficient Guardrails with Latent Reasoning · cs.AI · arXiv 2605.29068 · score 18 · large language model, llm, reasoning, inference, throughput, latency
- Teaching Language Models to Check Grounded Claim Factuality with Human Test-Taking Strategies · cs.CL · arXiv 2605.29712 · score 18 · large language model, llm, retrieval, reasoning, inference, fine-tun
- Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots · cs.CR · arXiv 2605.29963 · score 18 · llm, agent, agentic, rag, serving
- Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies · cs.LG · arXiv 2605.30148 · score 18 · large language model, llm, inference, serving, fine-tun
- E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing · cs.LG · arXiv 2512.03109 · score 18 · llm, agent, agentic, reasoning, ai system
- Molecular Lead Optimization via Agentic Tool Planning · cs.LG · arXiv 2605.28862 · score 18 · llm, agent, agentic, reasoning, serving
- FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models · cs.LG · arXiv 2511.11505 · score 18 · rag, inference, serving, parallelism, mixture of experts, moe
- Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization · cs.AI · arXiv 2605.29396 · score 17 · large language model, llm, rag, serving, quantization
- Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation · cs.AI · arXiv 2605.29560 · score 17 · large language model, llm, agent, rag, reasoning
- VikingMem: A Memory Base Management System for Stateful LLM-based Applications · cs.AI · arXiv 2605.29640 · score 17 · large language model, llm, agent, retrieval, latency
- OptSkills: Learning Generalizable Optimization Skills from Problem Archetypes via Cluster-Based Distillation · cs.AI · arXiv 2605.29829 · score 17 · large language model, llm, agent, rag, reasoning
- Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning · cs.AI · arXiv 2605.30039 · score 17 · large language model, llm, rag, serving, fine-tun
- Modularizing Educational LLM-Agency for Fostering Responsible Learning Assistance · cs.AI · arXiv 2605.30187 · score 17 · large language model, llm, agent, agentic
- GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling · cs.CL · arXiv 2605.28835 · score 17 · large language model, llm, multi-agent, rag, fine-tun
- Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning · cs.CL · arXiv 2605.28842 · score 17 · large language model, llm, reasoning, chain-of-thought, serving
- First head-to-head comparison of agentic AI applied to the analysis of simulated data of the Einstein Telescope · cs.AI · arXiv 2605.28916 · score 17 · large language model, agent, agentic, ai system
- Conf-Gen: Conformal Uncertainty Quantification for Generative Models · cs.LG · arXiv 2605.28920 · score 17 · large language model, llm, agent, ai system
- Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents · cs.CL · arXiv 2605.29224 · score 17 · large language model, llm, agent, retrieval, rag
- Training Deliberative Monitors for Black-Box Scheming Detection · cs.CL · arXiv 2605.29601 · score 17 · agent, agentic, reasoning, chain-of-thought, inference, fine-tun
- Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction · cs.CR · arXiv 2605.29960 · score 17 · large language model, llm, agent, rag, attention
- InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents · cs.AI · arXiv 2511.22884 · score 17 · large language model, llm, agent, multi-agent
- Small Agent Group is the Future of Digital Health · cs.AI · arXiv 2602.08013 · score 17 · large language model, llm, agent, retrieval, reasoning
- FundaPod: A Multi-Persona Agent Pod Platform with Knowledge Graph Memory for AI-Assisted Fundamental Investment Research · cs.AI · arXiv 2605.27864 · score 17 · large language model, llm, agent, serving
- Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence · cs.CR · arXiv 2509.23573 · score 17 · large language model, llm, agent, rag, reasoning
- When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making · cs.RO · arXiv 2603.16673 · score 17 · large language model, llm, agent, reasoning, latency
- Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies · cs.CL · arXiv 2605.29062 · score 17 · large language model, llm, agent, multi-agent
- WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction · cs.CV · arXiv 2605.29341 · score 17 · large language model, agent, agentic, retrieval, rag
- ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems · cs.MA · arXiv 2602.08567 · score 17 · large language model, llm, agent, multi-agent
- RAT+: Train Dense, Infer Sparse – Recurrence Augmented Attention for Dilated Inference · cs.LG · arXiv 2602.18196 · score 17 · rag, reasoning, inference, serving, kv cache, attention
- Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching · cs.AI · arXiv 2605.29055 · score 16 · llm, agent, agentic, multi-agent
- DenseSteer: Steering Small Language Models towards Dense Math Reasoning · cs.AI · arXiv 2605.29247 · score 16 · large language model, llm, reasoning, chain-of-thought, inference
- Enhancing Multi-Agent Communication through Attention Steering with Context Relevance · cs.AI · arXiv 2605.30136 · score 16 · llm, agent, multi-agent, reasoning, attention
- Pocket-Dentist: On-Device Dental Image Understanding via Efficient Multimodal Large Language Models · cs.CV · arXiv 2605.29299 · score 16 · large language model, rag, inference, serving, latency
- Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge · cs.CV · arXiv 2605.29402 · score 16 · large language model, llm, retrieval, reasoning, inference
- Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage · cs.CR · arXiv 2605.30040 · score 16 · large language model, llm, rag, reasoning, inference
- AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning · cs.AI · arXiv 2602.23258 · score 16 · agent, multi-agent, retrieval, rag, reasoning, fine-tun
- Grammar-Aware Literate Generative Mathematical Programming with Compiler-in-the-Loop · cs.PL · arXiv 2601.17670 · score 16 · large language model, llm, retrieval, rag, compiler
- CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs · cs.MA · arXiv 2605.09823 · score 16 · llm, agent, multi-agent, serving
- EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI · cs.CL · arXiv 2511.08949 · score 16 · large language model, llm, rag, inference, fine-tun
- Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models · cs.AI · arXiv 2605.29303 · score 15 · large language model, reasoning, serving, fine-tun, post-train
- NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs · cs.AI · arXiv 2605.29716 · score 15 · large language model, llm, reasoning, latency, fine-tun
- Citation-Closure Retrieval and Per-Rule Attribution for Real-World Regulatory Compliance Question Answering · cs.AI · arXiv 2605.29742 · score 15 · large language model, llm, retrieval, rag, reasoning
- Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence · cs.AI · arXiv 2605.29744 · score 15 · large language model, llm, multi-agent, reasoning
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs · cs.AI · arXiv 2605.29756 · score 15 · large language model, llm, quantization, transformer, post-train
- From GPS Points to Travel Patterns: Flexible and Semantic Trajectory Generation with LLMs · cs.AI · arXiv 2605.30014 · score 15 · large language model, llm, rag, quantization, fine-tun
- PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers · cs.AI · arXiv 2605.30094 · score 15 · large language model, llm, agent, rag
- Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models · cs.LG · arXiv 2605.28866 · score 15 · large language model, llm, reasoning, serving
- Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models · cs.CL · arXiv 2605.30251 · score 15 · large language model, llm, rag, serving
- LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback · cs.HC · arXiv 2605.30273 · score 15 · large language model, llm, rag, serving
- RoboWits: Unexpected Challenges for Robotic Creative Problem Solving · cs.RO · arXiv 2605.30326 · score 15 · agent, multi-agent, tool use, reasoning, fine-tun
- PersonaAgent: Bridging Memory and Action for Personalized LLM Agents · cs.AI · arXiv 2506.06254 · score 15 · large language model, llm, agent, rag
- A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models · cs.AI · arXiv 2511.08548 · score 15 · large language model, llm, reasoning, ai system
- SCOPE: Prompt Evolution for Enhancing Agent Effectiveness · cs.AI · arXiv 2512.15374 · score 15 · large language model, llm, agent, rag
- AutoSizer: Automatic Sizing of Analog and Mixed-Signal Circuits via Large Language Model (LLM) Agents · cs.AI · arXiv 2602.02849 · score 15 · large language model, llm, agent, reasoning
- Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs · cs.AI · arXiv 2602.02909 · score 15 · llm, reasoning, chain-of-thought, inference, attention, latency
- MemCollab: Cross-Model Memory Collaboration via Contrastive Trajectory Distillation · cs.AI · arXiv 2603.23234 · score 15 · llm, agent, retrieval, reasoning, inference
- Are LLMs Socially Adaptive? Contrasting Belief Evolution in Large Language Models and Humans · cs.CE · arXiv 2410.10398 · score 15 · large language model, llm, agent, reasoning
- Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems · cs.CY · arXiv 2501.10332 · score 15 · large language model, llm, agent, rag
- GroundAct: Can LLM Agents Ground Actions in Environmental States? · cs.CL · arXiv 2508.05614 · score 15 · llm, agent, tool use, reasoning, fine-tun
- Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting · cs.CR · arXiv 2509.23571 · score 15 · large language model, llm, agent, reasoning
- Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers · cs.CL · arXiv 2601.22139 · score 15 · large language model, llm, reasoning, chain-of-thought, fine-tun
- Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training · cs.LG · arXiv 2603.00454 · score 15 · large language model, llm, serving, fine-tun
- Combating Data Laundering in LLM Training · cs.CR · arXiv 2604.01904 · score 15 · large language model, llm, rag, serving
- The Planetary Cost of AI Acceleration, Part II: The 10th Planetary Boundary and the 6.5-Year Countdown · cs.AI · arXiv 2604.04956 · score 15 · large language model, llm, agent, reasoning
- BIRDS: Characterizing and Understanding Biodiversity Impact of Large Language Model Serving · cs.AI · arXiv 2605.27480 · score 15 · large language model, llm, serving, gpu
- ROVER: Routing Object-Centric Visual Evidence for Grounded Multi-Image Reasoning · cs.CV · arXiv 2605.27959 · score 15 · large language model, llm, rag, reasoning, attention
- RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment · cs.CL · arXiv 2605.28827 · score 15 · large language model, llm, quantization, attention, fine-tun
- Text-Preserving Lossy Text Compression: A Study of Strategic Deletion and LLM Reconstruction · cs.CL · arXiv 2605.29000 · score 15 · large language model, llm, serving, fine-tun
- Revisiting Observation Reduction for Web Agents: Comprehensive Evaluation with a Lightweight Framework · cs.CL · arXiv 2605.29397 · score 15 · llm, agent, rag, inference, latency
- From Blind Guess to Informed Judgment: Teaching LLMs to Evaluate Materials by Building Knowledge-Augmented Preference Signals · cs.CL · arXiv 2605.29555 · score 15 · large language model, llm, retrieval, reasoning, throughput
- SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge? · cs.CL · arXiv 2605.30104 · score 15 · llm, agent, tool-use, reasoning, latency
- Understanding Fact Recall in Language Models: Why Two-Stage Training Encourages Memorization but Mixed Training Teaches Knowledge · cs.CL · arXiv 2505.16178 · score 15 · large language model, llm, retrieval, rag, fine-tun
- Long-Context Modeling with Dynamic Hierarchical Sparse Attention for Memory-Constrained LLM Inference · cs.CL · arXiv 2510.24606 · score 15 · llm, inference, serving, attention, gpu
- How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning · cs.LG · arXiv 2602.02103 · score 15 · large language model, llm, rag, reasoning, chain-of-thought
- K-FinHallu: A Hallucination Detection Benchmark for Multi-Turn RAG in Korean Finance · cs.LG · arXiv 2605.29523 · score 15 · large language model, llm, retrieval, rag, fine-tun
- OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction · cs.LG · arXiv 2605.30247 · score 15 · large language model, llm, retrieval, rag, reasoning
- Echoes within the Reasoning: Stealthy and Effective Watermarking via Chain of Thought · cs.CR · arXiv 2605.28890 · score 15 · large language model, rag, reasoning, chain-of-thought, quantization, fine-tun
- The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane · cs.AI · arXiv 2605.29082 · score 14 · agent, agentic, multi-agent, throughput
- Beyond Consensus: Trace-Level Synthesis in Mixture of Agents · cs.AI · arXiv 2605.29116 · score 14 · llm, agent, reasoning, serving
- Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction · cs.AI · arXiv 2605.29168 · score 14 · llm, retrieval, rag, reasoning, serving
- Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces · cs.AI · arXiv 2605.29288 · score 14 · llm, reasoning, chain-of-thought, serving, fine-tun
- PassNet: Scaling Large Language Models for Graph Compiler Pass Generation · cs.AI · arXiv 2605.29357 · score 14 · large language model, llm, compiler, fine-tun
- Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation · cs.AI · arXiv 2605.29430 · score 14 · llm, agent, agentic, reasoning
- ParaTool: Shifting Tool Representations from Context to Parameters · cs.AI · arXiv 2605.29561 · score 14 · large language model, llm, inference, fine-tun
- AgentSchool: An LLM-Powered Multi-Agent Simulation for Education · cs.AI · arXiv 2605.30144 · score 14 · llm, agent, multi-agent, reasoning
- Hallucination Detection-Guided Preference Optimization for Clinical Summarization · cs.CL · arXiv 2605.28910 · score 14 · large language model, llm, rag, inference
- CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models · cs.LG · arXiv 2605.28919 · score 14 · large language model, reasoning, inference, attention, transformer
- SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers · cs.SE · arXiv 2605.29059 · score 14 · large language model, llm, reasoning, compiler
- Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension · cs.MA · arXiv 2605.29874 · score 14 · llm, agent, multi-agent, rag
- Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents · cs.SE · arXiv 2605.29910 · score 14 · llm, agent, multi-agent, reasoning
- Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency · cs.SE · arXiv 2605.30208 · score 14 · llm, agent, agentic, latency
- Enhancing LLM Medical Coding with Structured External Knowledge · cs.CL · arXiv 2605.27377 · score 14 · llm, agent, agentic, rag
- EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation · cs.CL · arXiv 2605.27390 · score 14 · large language model, retrieval, inference, speculative decoding
- Draft-OPD: On-Policy Distillation for Speculative Draft Models · cs.CL · arXiv 2605.29343 · score 14 · large language model, inference, speculative decoding, fine-tun
- Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting · cs.CL · arXiv 2605.29498 · score 14 · large language model, llm, inference, fine-tun
- ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation · cs.CL · arXiv 2605.29791 · score 14 · large language model, llm, reasoning, inference
- CCS: Clinical Consensus Selection for Radiology Report Generation · cs.CL · arXiv 2605.30131 · score 14 · large language model, llm, retrieval, inference
- Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning · cs.CL · arXiv 2605.30245 · score 14 · large language model, llm, reasoning, inference
- Cognitive Loop of Thought: Reversible Hierarchical Markov Chain for Efficient Mathematical Reasoning · cs.CL · arXiv 2604.06805 · score 14 · llm, rag, reasoning, chain-of-thought, kv cache
- Inferring the Size of Large Language Models From Popular Text Memorization · cs.LG · arXiv 2605.29223 · score 14 · large language model, llm, rag, inference
- On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference · cs.LG · arXiv 2605.29580 · score 14 · large language model, rag, reasoning, inference, fine-tun
- Fingerprinting Inference Systems of Large Language Models · cs.CR · arXiv 2605.29979 · score 14 · large language model, llm, inference, attention
- DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts · cs.LG · arXiv 2605.15422 · score 14 · parallelism, moe, attention, gpu, cuda, post-train
- TC-MIS: Maximal Independent Set on Tensor-cores · cs.DC · arXiv 2605.29604 · score 14 · rag, inference, parallelism, gpu, cuda, throughput
- Provably Secure Agent Guardrail · cs.AI · arXiv 2605.29251 · score 13 · large language model, agent, reasoning, latency
- ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression · cs.AI · arXiv 2605.29350 · score 13 · rag, serving, moe, fine-tun, post-train
- EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics · cs.AI · arXiv 2605.29394 · score 13 · large language model, llm, reasoning, fine-tun
- When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs · cs.AI · arXiv 2605.29420 · score 13 · large language model, llm, retrieval, rag
- VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data · cs.AI · arXiv 2605.29483 · score 13 · agent, agentic, tool use, reasoning
- LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning · cs.AI · arXiv 2605.29649 · score 13 · large language model, llm, rag, reasoning
- TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation · cs.AI · arXiv 2605.29656 · score 13 · large language model, llm, reasoning, chain-of-thought
- Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability · cs.AI · arXiv 2605.29687 · score 13 · large language model, llm, reasoning, chain-of-thought
- Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories · cs.AI · arXiv 2605.29893 · score 13 · llm, agent, tool use, reasoning
- ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure · cs.AI · arXiv 2605.30284 · score 13 · large language model, llm, retrieval, reasoning
- Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models · cs.CL · arXiv 2605.28828 · score 13 · large language model, llm, retrieval, reasoning
- How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines · cs.CL · arXiv 2605.28840 · score 13 · large language model, llm, agent
- GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human · cs.CL · arXiv 2605.28882 · score 13 · large language model, llm, agent
- Sustainable Metal-Organic Framework Water Harvesters in the Artificial Intelligence Era · cs.AI · arXiv 2605.29179 · score 13 · large language model, llm, serving
- KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing · cs.CR · arXiv 2605.29524 · score 13 · large language model, llm, serving
- SCOPE: A Lightweight-training LLM Framework for Air Traffic Control Readback Monitoring · cs.LG · arXiv 2605.29543 · score 13 · large language model, llm, reasoning, latency
- Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content · cs.LG · arXiv 2605.29659 · score 13 · large language model, llm, serving
- Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning · cs.LG · arXiv 2605.29782 · score 13 · large language model, llm, rag, post-train
- Inferring Code Correctness from Specification · cs.SE · arXiv 2605.29822 · score 13 · large language model, llm, reasoning, chain-of-thought
- LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training · cs.LG · arXiv 2605.29888 · score 13 · large language model, llm, reasoning, post-train
- VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies · cs.CV · arXiv 2605.30011 · score 13 · reasoning, chain-of-thought, inference, serving, latency
- Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection · cs.CL · arXiv 2605.30274 · score 13 · large language model, agent, rag, reasoning
- PuzzleClone: A DSL-Powered Framework for Synthesizing Verifiable Data · cs.AI · arXiv 2508.15180 · score 13 · large language model, llm, rag, reasoning
- EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance · cs.AI · arXiv 2509.23730 · score 13 · large language model, llm, rag, reasoning
- Controlling the Risk of Corrupted Contexts for Language Models via Early-Exiting · cs.AI · arXiv 2510.02480 · score 13 · large language model, llm, rag, attention
- CodeEvolve: an open source evolutionary coding agent for algorithmic discovery and optimization · cs.AI · arXiv 2510.14150 · score 13 · large language model, llm, agent
- Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation · cs.AI · arXiv 2604.10511 · score 13 · large language model, llm, reasoning, chain-of-thought
- HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models · cs.AI · arXiv 2605.24140 · score 13 · large language model, llm, reasoning, fine-tun
- Soro: A Lightweight Foundation Model and Chatbot for Tajik · cs.AI · arXiv 2605.27379 · score 13 · large language model, llm, rag, quantization
- The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic · cs.AI · arXiv 2605.28700 · score 13 · large language model, llm, rag, reasoning
- Jailbreaking and Mitigation of Vulnerabilities in Large Language Models · cs.CR · arXiv 2410.15236 · score 13 · large language model, llm, multi-agent
- Less is Enough: Synthesizing Diverse Data in LLM Feature Space with Sparse Autoencoders · cs.CL · arXiv 2602.10388 · score 13 · large language model, llm, rag, post-train
- A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search · cs.CL · arXiv 2602.11171 · score 13 · large language model, llm, rag, fine-tun
- Steering at the Source: Style Modulation Heads for Robust Persona Control · cs.CL · arXiv 2603.13249 · score 13 · large language model, llm, attention, fine-tun
- P$^2$RAG: Efficient Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval · cs.CR · arXiv 2603.14778 · score 13 · large language model, retrieval, rag, serving
- Bridge-RAG: An Abstract Bridge Tree Based Retrieval Augmented Generation Algorithm · cs.IR · arXiv 2603.26668 · score 13 · large language model, llm, retrieval, rag
- Teacher-Guided Policy Optimization for On-Policy Reasoning Distillation under Large Policy Divergence ·
cs.LG· arXiv 2605.13230 · score 13 —large language model, llm, reasoning, post-train - Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning ·
cs.CV· arXiv 2605.16385 · score 13 —llm, rag, reasoning, inference, attention - GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization ·
cs.LG· arXiv 2605.26092 · score 13 —large language model, llm, quantization, transformer - MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models ·
cs.CL· arXiv 2605.28825 · score 13 —large language model, llm, rag, reasoning - Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception ·
cs.CL· arXiv 2605.29064 · score 13 —large language model, llm, agent - Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation ·
cs.CL· arXiv 2605.29992 · score 13 —retrieval, inference, serving, transformer, gpu - PEARL: Training Socratic Tutors with Pedagogically Aligned Reinforcement Learning ·
cs.LG· arXiv 2605.29582 · score 13 —large language model, llm, agent - Catalyst-Agent: Autonomous heterogeneous catalyst screening with an LLM Agent ·
cs.CL· arXiv 2603.01311 · score 13 —llm, agent, tool use, rag - OpenCompass: A Universal Evaluation Platform for Large Language Models ·
cs.CL· arXiv 2605.19276 · score 13 —large language model, llm, rag, reasoning - OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents ·
cs.CL· arXiv 2605.23657 · score 13 —large language model, llm, agent - Feedback-to-Rubrics: Can We Learn Expert Criteria from Inline Comments? ·
cs.LG· arXiv 2605.29857 · score 13 —large language model, llm, serving - Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets ·
cs.LG· arXiv 2605.30289 · score 13 —large language model, retrieval, serving, transformer - SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones? ·
cs.LG· arXiv 2605.30329 · score 13 —large language model, llm, agent - Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection ·
cs.CR· arXiv 2605.29901 · score 13 —large language model, llm, reasoning, attention - Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding ·
cs.LG· arXiv 2511.04934 · score 13 —large language model, llm, ai system - SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs ·
cs.DC· arXiv 2603.00357 · score 13 —llm, training system, parallelism, gpu - Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes ·
cs.AI· arXiv 2605.28965 · score 12 —llm, agent, agentic - Governing Technical Debt in Agentic AI Systems ·
cs.AI· arXiv 2605.29129 · score 12 —agent, agentic, ai system - CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval ·
cs.AI· arXiv 2605.29271 · score 12 —llm, agent, retrieval, fine-tun - Formalizing Mathematics at Scale ·
cs.AI· arXiv 2605.29955 · score 12 —llm, agent, multi-agent - Selective QA over Conflicting Multi-Source Personal Memory: A Diagnostic Testbed and Method Comparison ·
cs.AI· arXiv 2605.30087 · score 12 —llm, agent, rag, reasoning - SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation ·
cs.CL· arXiv 2605.29146 · score 12 —llm, agent, multi-agent - GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models ·
cs.LG· arXiv 2605.29398 · score 12 —large language model, llm, inference - SkillBrew: Multi-Objective Curation of Skill Banks for LLM Agents ·
cs.CL· arXiv 2605.29440 · score 12 —llm, agent, retrieval, rag - Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation ·
cs.CL· arXiv 2605.29502 · score 12 —llm, rag, serving, fine-tun - Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems ·
cs.MA· arXiv 2605.29790 · score 12 —llm, agent, multi-agent - HARP: Hadamard-Preconditioned Adaptive Rotation Processor for Extreme LLM Quantization ·
cs.LG· arXiv 2605.29843 · score 12 —llm, serving, quantization, post-train - Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas ·
cs.MA· arXiv 2605.30003 · score 12 —llm, agent, multi-agent - No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval ·
cs.IR· arXiv 2605.30120 · score 12 —retrieval, rag, serving, throughput, latency - Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms ·
cs.CY· arXiv 2605.30169 · score 12 —agent, agentic, multi-agent - Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning ·
cs.CV· arXiv 2605.30231 · score 12 —llm, rag, reasoning, transformer, fine-tun - Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents ·
cs.AI· arXiv 2602.01869 · score 12 —llm, agent, rag, reasoning - SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data ·
cs.AI· arXiv 2604.26645 · score 12 —agent, agentic, multi-agent - From Rubrics to Reliable Scores: Evidence-Grounded Text Evaluation with LLM Judges ·
cs.CL· arXiv 2601.08654 · score 12 —large language model, llm, inference - Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR ·
cs.CL· arXiv 2602.12642 · score 12 —llm, rag, reasoning, scheduler, post-train - PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration ·
cs.CL· arXiv 2605.29313 · score 12 —llm, agent, multi-agent - Learning Design Skills as Memory Policies for Agentic Photonic Inverse Design ·
cs.CL· arXiv 2605.29421 · score 12 —llm, agent, agentic - UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering ·
cs.CL· arXiv 2605.30076 · score 12 —large language model, llm, inference - Rare Event Analysis of Large Language Models ·
cs.LG· arXiv 2602.06791 · score 12 —large language model, llm, inference - HPC-vQPU: A Service-Export Architecture for Virtual QPUs on Batch-Scheduled HPC Systems ·
cs.DC· arXiv 2605.28845 · score 12 —agent, serving, gpu, scheduler - Mind Your Tone: Does Tone Alter LLM Performance? ·
cs.AI· arXiv 2605.29027 · score 11 —large language model, llm, reasoning - GTA: Generating Long-Horizon Tasks for Web Agents at Scale ·
cs.AI· arXiv 2605.29218 · score 11 —agent, tool-use, retrieval, rag - Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility ·
cs.AI· arXiv 2605.29229 · score 11 —large language model, llm, reasoning - PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing ·
cs.AI· arXiv 2605.29815 · score 11 —large language model, llm, rag - Harnessing non-adversarial robustness in large language models ·
cs.AI· arXiv 2605.29816 · score 11 —large language model, llm, fine-tun - Make LLM Learn to Synthesize from Streaming Experiences through Feedback ·
cs.AI· arXiv 2605.29940 · score 11 —large language model, llm, rag - Anchorless Diversification for Parallel LLM Ideation ·
cs.AI· arXiv 2605.30150 · score 11 —llm, inference, serving - When Should Models Change Their Minds? Contextual Belief Management in Large Language Models ·
cs.AI· arXiv 2605.30219 · score 11 —large language model, llm, rag - S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering ·
cs.CL· arXiv 2605.28831 · score 11 —agent, retrieval, rag, inference - SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation ·
cs.CL· arXiv 2605.28837 · score 11 —large language model, llm, retrieval - GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models ·
cs.CL· arXiv 2605.28848 · score 11 —large language model, llm, retrieval - Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT? ·
cs.LG· arXiv 2605.28860 · score 11 —large language model, llm, fine-tun - Label-Free Reinforcement Learning via Cross-Model Entropy ·
cs.LG· arXiv 2605.29009 · score 11 —large language model, llm, post-train - Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text ·
cs.CL· arXiv 2605.29076 · score 11 —llm, reasoning, inference, fine-tun - CA-AC-MPC: CUDA-Accelerated Actor-Critic Model Predictive Control ·
cs.RO· arXiv 2605.29155 · score 11 —inference, serving, cuda, latency - Parallax: Parameterized Local Linear Attention for Language Modeling ·
cs.LG· arXiv 2605.29157 · score 11 —large language model, llm, attention - UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning ·
cs.CL· arXiv 2605.29170 · score 11 —large language model, llm, reasoning - DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents ·
cs.CL· arXiv 2605.29256 · score 11 —large language model, agent, rag - Beyond Bilingual Transfer: Multilingual Code-Switching in Instruction Tuning ·
cs.CL· arXiv 2605.29414 · score 11 —large language model, llm, rag - Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment ·
cs.CL· arXiv 2605.29458 · score 11 —large language model, llm, reasoning - Projectional Decoding: Towards Semantic-Aware LLM Generation ·
cs.SE· arXiv 2605.30054 · score 11 —large language model, llm, reasoning - MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings ·
cs.CL· arXiv 2605.30295 · score 11 —large language model, llm, reasoning - In-Context Reward Adaptation for Robust Preference Modeling ·
cs.LG· arXiv 2605.30323 · score 11 —large language model, rag, transformer, rlhf - LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models ·
cs.AI· arXiv 2601.06431 · score 11 —large language model, rag, reasoning, attention - IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents ·
cs.AI· arXiv 2604.05157 · score 11 —large language model, agent, rag - Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling ·
cs.AI· arXiv 2604.25098 · score 11 —large language model, llm, reasoning - Hierarchical Task Network Planning with LLM-Generated Heuristics ·
cs.AI· arXiv 2605.07707 · score 11 —large language model, llm, rag - Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives ·
cs.GT· arXiv 2505.21627 · score 11 —large language model, llm, rag - An accuracy-aware extension to LRP-based pruning for CNNs to prevent cascading accuracy degradation in data-scarce transfer learning ·
cs.CV· arXiv 2511.10861 · score 11 —rag, inference, serving, fine-tun - Differential syntactic and semantic encoding in LLMs ·
cs.CL· arXiv 2601.04765 · score 11 —large language model, llm, rag - Thinking Before Constraining: A Unified Decoding Framework for Large Language Models ·
cs.CL· arXiv 2601.07525 · score 11 —large language model, llm, reasoning - Who can we trust? LLM-as-a-jury for Comparative Assessment ·
cs.CL· arXiv 2602.16610 · score 11 —large language model, llm, rag - JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments ·
cs.CV· arXiv 2602.18527 · score 11 —large language model, llm, reasoning - Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data ·
cs.LG· arXiv 2603.19294 · score 11 —large language model, llm, post-train - The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More ·
cs.CL· arXiv 2603.23971 · score 11 —agent, rag, reasoning, inference - SelfGrader: LLM Jailbreak Detection via Anchored Token-Level Logits ·
cs.CR· arXiv 2604.01473 · score 11 —large language model, llm, latency - DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories ·
cs.CL· arXiv 2604.20443 · score 11 —llm, rag, reasoning, inference - Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment ·
cs.CL· arXiv 2605.28822 · score 11 —large language model, llm, fine-tun - The Trust Paradox: How CS Researchers Engage LLM Leaderboards ·
cs.CL· arXiv 2605.28966 · score 11 —large language model, llm, rag - Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization ·
cs.CL· arXiv 2605.29327 · score 11 —large language model, llm, reasoning - FinGuard: Detecting Financial Regulatory Non-Compliance in LLM Interactions ·
cs.CL· arXiv 2605.29427 · score 11 —large language model, llm, fine-tun - Comparative Evaluation of Machine Translation Systems on Images with Text ·
cs.CL· arXiv 2605.29476 · score 11 —large language model, llm, reasoning - Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese ·
cs.CL· arXiv 2605.29667 · score 11 —large language model, llm, rag - Spurious Prompts: Can Irrelevant Prompts Steer Large Language Models? ·
cs.CL· arXiv 2605.29678 · score 11 —large language model, llm, reasoning - Understanding Safety-Sensitive Expert Behavior in Mixture-of-Experts LLMs ·
cs.CL· arXiv 2605.29708 · score 11 —llm, serving, moe - Nine Judges, Two Effective Votes: Correlated Errors Undermine LLM Evaluation Panels ·
cs.CL· arXiv 2605.29800 · score 11 —llm, reasoning, chain-of-thought, inference - Latent Performance Profiling of Large Language Models ·
cs.CL· arXiv 2605.30018 · score 11 —large language model, llm, reasoning - Who Am I? History-Aware Profiles for Student Simulation in Tutoring Dialogues ·
cs.CL· arXiv 2605.30051 · score 11 —large language model, llm, rag - CommunityFact: A Dynamic, Multilingual, Multi-domain Benchmark for Misinformation Detection in the Wild ·
cs.CL· arXiv 2605.30241 · score 11 —llm, retrieval, rag, inference - Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content ·
cs.CR· arXiv 2605.29245 · score 11 —large language model, llm, rag - Understanding the Ability of LLMs to Handle Character-Level Perturbation ·
cs.CL· arXiv 2510.14365 · score 11 —large language model, llm, rag - WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models ·
cs.CL· arXiv 2512.00837 · score 11 —large language model, llm, rag - “Be My Cheese?”: Cultural Nuance Benchmarking for Machine Translation in Multilingual LLMs ·
cs.CL· arXiv 2602.04729 · score 11 —large language model, llm, rag - Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing ·
cs.CL· arXiv 2603.17942 · score 11 —large language model, llm, throughput - HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation ·
cs.CL· arXiv 2604.09629 · score 11 —large language model, llm, fine-tun - When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance ·
cs.CL· arXiv 2605.22975 · score 11 —large language model, llm, rag - HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench ·
cs.LG· arXiv 2601.20255 · score 11 —large language model, llm, fine-tun - On-Policy Replay for Continual Supervised Fine-Tuning ·
cs.LG· arXiv 2605.29495 · score 11 —large language model, llm, fine-tun - Convergence Theory for Iterative LLM-Based Neural Architecture Search: A Parametric Cross-Entropy Framework with Closed-Form Proxy Reliability ·
cs.LG· arXiv 2605.30103 · score 11 —large language model, llm, fine-tun - The Biosecurity Blind Spot: Systematic Dual-use Detection in Open Science Infrastructure ·
cs.DL· arXiv 2605.28843 · score 11 —large language model, llm, rag - Generative Spatiotemporal Intent Sequence Recommendation via Implicit Reasoning in Amap ·
cs.IR· arXiv 2605.28888 · score 11 —llm, reasoning, inference, latency - TabPFN-3: Technical Report ·
cs.LG· arXiv 2605.13986 · score 11 —llm, inference, kv cache - Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies ·
cs.AI· arXiv 2605.29270 · score 10 —llm, agent, retrieval - DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation ·
cs.AI· arXiv 2605.29522 · score 10 —agent, agentic, retrieval - Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents ·
cs.AI· arXiv 2605.30159 · score 10 —llm, agent, reasoning - Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents ·
cs.AI· arXiv 2605.30335 · score 10 —llm, agent, retrieval - No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand ·
cs.CL· arXiv 2605.28836 · score 10 —multi-agent, serving, attention - LogDx-CI: Benchmarking Log Reduction Tools for LLM Root-Cause Diagnosis ·
cs.SE· arXiv 2605.28876 · score 10 —llm, agent, rag - OISD: On-Policy Internal Self-Distillation of Language Models ·
cs.LG· arXiv 2605.29089 · score 10 —reasoning, serving, attention, post-train - unix-ctf: Procedural Environments for Unix-Competence Reinforcement Learning ·
cs.CR· arXiv 2605.29115 · score 10 —llm, agent, fine-tun - Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA ·
cs.SE· arXiv 2605.29277 · score 10 —llm, agent, reasoning - From Prompts to Context: An Ontology-Driven Framework for Human-Generative AI Collaboration ·
cs.HC· arXiv 2605.29675 · score 10 —agent, retrieval, ai system - CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation ·
cs.CL· arXiv 2605.29886 · score 10 —llm, retrieval, rag, reasoning - Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor? ·
cs.CL· arXiv 2605.30152 · score 10 —llm, agent, gpu - On Distributional Reinforcement Learning in Chaotic Dynamical Systems ·
cs.LG· arXiv 2605.30160 · score 10 —llm, multi-agent, rag - Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection ·
cs.CR· arXiv 2605.30189 · score 10 —llm, serving, fine-tun - VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion ·
cs.CV· arXiv 2605.30351 · score 10 —kv cache, attention, throughput, latency - Graph-Enhanced Policy Optimization in LLM Agent Training ·
cs.AI· arXiv 2510.26270 · score 10 —llm, agent, rag - From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning ·
cs.AI· arXiv 2601.21909 · score 10 —llm, reasoning, fine-tun, post-train - SIA: Self Improving AI with Harness & Weight Updates ·
cs.AI· arXiv 2605.27276 · score 10 —agent, agentic, gpu - Scaling Small Agents Through Strategy Auctions ·
cs.MA· arXiv 2602.02751 · score 10 —agent, agentic, rag - Many-Shot CoT-ICL: Making In-Context Learning Truly Learn ·
cs.CL· arXiv 2605.13511 · score 10 —llm, retrieval, reasoning, chain-of-thought - Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation ·
cs.CL· arXiv 2605.29007 · score 10 —llm, agent, reasoning - Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs ·
cs.CL· arXiv 2605.30021 · score 10 —llm, serving, post-train - HEART-Bench: Do LLM Agents Exhibit Human-like Psychology? ·
cs.CL· arXiv 2605.30058 · score 10 —llm, agent, reasoning - DirectorBench: Diagnosing Long-Form Video Generation with Personalized Multi-Agent Evaluation ·
cs.CL· arXiv 2605.30090 · score 10 —llm, multi-agent, rag - When RL Suppresses Its Own Vocabulary: Recovering Reasoning Diversity in Puzzle-to-Math Transfer ·
cs.LG· arXiv 2605.29190 · score 10 —llm, reasoning, chain-of-thought, post-train - Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs ·
cs.CR· arXiv 2605.29737 · score 10 —llm, agent, rag - Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding ·
cs.CL· arXiv 2512.17220 · score 10 —llm, retrieval, rag, reasoning - SEEK: Semantic Evidence Extraction via Adaptive ChunKing for Multilingual Fact-Checking ·
cs.CL· arXiv 2605.26755 · score 10 —llm, rag, serving - DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models ·
cs.IR· arXiv 2605.07210 · score 10 —llm, retrieval, attention, latency - Rethinking Post-Training Recipes for Multimodal Time-Series Forecasting ·
cs.LG· arXiv 2605.29401 · score 10 —llm, reasoning, fine-tun, post-train - Harmless Yet Harmful: Neutral Prompting Attacks for Stealthy Hallucination Steering in Agent Skills ·
cs.CR· arXiv 2605.29354 · score 10 —llm, agent, rag - Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents ·
cs.LG· arXiv 2605.14241 · score 10 —llm, agent, latency - When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis ·
cs.AI· arXiv 2605.29025 · score 9 —large language model, llm - The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models ·
cs.AI· arXiv 2605.29123 · score 9 —reasoning, inference, serving - Opt-Verifier: Unleashing the Power of LLMs for Optimization Modeling via Dual-Side Verification ·
cs.AI· arXiv 2605.29556 · score 9 —large language model, llm - FinVerBench: Benchmark Validity and Calibration in Large Language Model Financial Statement Verification ·
cs.AI· arXiv 2605.29586 · score 9 —large language model, llm - Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation ·
cs.AI· arXiv 2605.29652 · score 9 —large language model, llm - NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs ·
cs.AI· arXiv 2605.29685 · score 9 —large language model, llm - Toward AI Systems That Understand Self and Others: A Multi-Phase Inference Framework for Human Cognitive Diversity and World-Model Alignment ·
cs.AI· arXiv 2605.29930 · score 9 —rag, inference, ai system - Teaching Values to Machines: Simulating Human-Like Behavior in LLMs ·
cs.AI· arXiv 2605.30036 · score 9 —large language model, llm - Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers ·
cs.AI· arXiv 2605.30049 · score 9 —inference, serving, transformer - Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale ·
cs.AI· arXiv 2605.30200 · score 9 —large language model, llm - Demystifying Data Organization for Enhanced LLM Training ·
cs.AI· arXiv 2605.30334 · score 9 —large language model, llm - SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations ·
cs.AI· arXiv 2605.30345 · score 9 —large language model, llm - Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning ·
cs.CL· arXiv 2605.28829 · score 9 —large language model, reasoning, post-train - Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation ·
cs.CL· arXiv 2605.28830 · score 9 —large language model, llm - GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization ·
cs.CR· arXiv 2605.29107 · score 9 —large language model, llm - Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback ·
cs.IR· arXiv 2605.29141 · score 9 —large language model, llm - Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback ·
cs.LG· arXiv 2605.29184 · score 9 —large language model, llm - SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing ·
cs.CR· arXiv 2605.29468 · score 9 —large language model, llm - VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models ·
cs.RO· arXiv 2605.29562 · score 9 —retrieval, inference, serving - Predicting Causal Effects from Natural Language Queries using Structured Representations ·
cs.CL· arXiv 2605.29631 · score 9 —large language model, llm - OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning ·
cs.CV· arXiv 2605.29657 · score 9 —inference, serving, attention - Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models ·
cs.CL· arXiv 2605.29826 · score 9 —large language model, llm - Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering ·
cs.CV· arXiv 2605.29881 · score 9 —rag, inference, attention, throughput - How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency ·
cs.CR· arXiv 2605.30096 · score 9 —large language model, llm - PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding ·
cs.CV· arXiv 2605.30126 · score 9 —rag, inference, serving - How LoRA Remembers? A Parametric Memory Law for LLM Finetuning ·
cs.CL· arXiv 2605.30260 · score 9 —large language model, llm - LLMSurgeon: Diagnosing Data Mixture of Large Language Models ·
cs.CL· arXiv 2605.30348 · score 9 —large language model, llm - Estimating the Empowerment of Language Model Agents ·
cs.AI· arXiv 2509.22504 · score 9 —agent, tool-use, rag - Benchmarking at the Edge of Comprehension ·
cs.AI· arXiv 2602.14307 · score 9 —large language model, llm - SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems ·
cs.AI· arXiv 2603.23853 · score 9 —reasoning, inference, ai system - Automatic Layer Selection for Hallucination Detection ·
cs.AI· arXiv 2605.26366 · score 9 —large language model, llm - Less Is More: Elevating RAG via Performance-Driven Context Compression ·
cs.CL· arXiv 2508.19282 · score 9 —large language model, retrieval, rag - Empathic Prompting: Non-Verbal Context Integration for Multimodal LLM Conversations ·
cs.HC· arXiv 2510.20743 · score 9 —large language model, llm - CORE-T: COherent REtrieval of Tables for Text-to-SQL ·
cs.CL· arXiv 2601.13111 · score 9 —llm, retrieval, inference - Pushing the Limits of Block Rotations in Post-Training Quantization ·
cs.LG· arXiv 2601.22347 · score 9 —inference, quantization, transformer, post-train - CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating ·
cs.CV· arXiv 2605.11723 · score 9 —reasoning, chain-of-thought, inference, fine-tun - Reducing Political Manipulation with Consistency Training ·
cs.CL· arXiv 2605.22771 · score 9 —large language model, llm - Large language models reorganize representational geometry during in-context learning ·
cs.CL· arXiv 2605.28854 · score 9 —large language model, llm - User-Aware Active Knowledge Acquisition for Emotional Support Dialogue ·
cs.CL· arXiv 2605.29715 · score 9 —large language model, rag, reasoning - AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation ·
cs.CL· arXiv 2605.29741 · score 9 —large language model, rag, fine-tun - DySem: Uncovering Dynamic Semantic Components via Multilingual Consensus for Calculating Semantic Textual Similarity ·
cs.CL· arXiv 2605.29751 · score 9 —large language model, llm - EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation ·
cs.CL· arXiv 2605.29847 · score 9 —large language model, llm - Adaptive Targeted Dynamic Chunking for Tokenization-Free Hierarchical Model ·
cs.CL· arXiv 2605.30080 · score 9 —large language model, llm - Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees ·
cs.CL· arXiv 2410.15761 · score 9 —large language model, llm - HaluNet: Learning Hallucination Risk from Internal Signals in LLM Question Answering ·
cs.CL· arXiv 2512.24562 · score 9 —large language model, llm - SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts ·
cs.CL· arXiv 2604.26506 · score 9 —large language model, llm - Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance ·
cs.CV· arXiv 2411.14279 · score 9 —llm, inference, attention - Feature Geometry of LoRA Adapters: A Sparse Autoencoder Analysis of Representational Divergence in Fine-Tuned Language Models ·
cs.LG· arXiv 2605.28896 · score 9 —large language model, transformer, fine-tun - MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference ·
cs.LG· arXiv 2605.30218 · score 9 —llm, inference, latency - Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models ·
cs.LG· arXiv 2602.10520 · score 9 —llm, reasoning, inference - Enhancing LLM Training via Spectral Clipping ·
cs.LG· arXiv 2603.14315 · score 9 —large language model, llm - Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance ·
cs.LG· arXiv 2605.00553 · score 9 —large language model, llm - PRIM: Meta-Learned Bayesian Root Cause Analysis ·
cs.LG· arXiv 2605.08786 · score 9 —rag, inference, transformer, fine-tun - PACE: Geometry-Aware Bridge Transport for Single-Cell Trajectory Inference ·
cs.LG· arXiv 2605.18587 · score 9 —rag, inference, serving - Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory ·
cs.PF· arXiv 2605.29135 · score 9 —large language model, gpu, throughput - ReasonOps: Operator Segmentation for LLM Reasoning Traces ·
cs.AI· arXiv 2605.29192 · score 8 —llm, reasoning, chain-of-thought - BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents ·
cs.AI· arXiv 2605.29225 · score 8 —llm, agent - DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning ·
cs.AI· arXiv 2605.29568 · score 8 —llm, rag, reasoning - GPS-Enhanced Tourist Mobility Modeling with Seasonal Spatial Priors and LLM-Based Activity Chain Generation ·
cs.AI· arXiv 2605.29578 · score 8 —llm, serving - PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game? ·
cs.AI· arXiv 2605.29653 · score 8 —llm, agent - GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents ·
cs.AI· arXiv 2605.29668 · score 8 —llm, agent - Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling ·
cs.AI· arXiv 2605.29697 · score 8 —agent, agentic - Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations ·
cs.AI· arXiv 2605.29786 · score 8 —llm, agent - SkillsInjector: Dynamic Skill Context Construction for LLM Agents ·
cs.AI· arXiv 2605.29794 · score 8 —llm, agent - MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains ·
cs.AI· arXiv 2605.29795 · score 8 —agent, retrieval, rag - AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security ·
cs.AI· arXiv 2605.29801 · score 8 —agent, agentic - OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields ·
cs.AI· arXiv 2605.29833 · score 8 —llm, retrieval, reasoning - Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation ·
cs.AI· arXiv 2605.30000 · score 8 —llm, agent - Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization ·
cs.CL· arXiv 2605.28969 · score 8 —llm, agent - Real-rootedness of the Poincaré polynomials of $\overline{\mathcal M}_{0,n}$: an AI-assisted proof ·
math.AG· arXiv 2605.29151 · score 8 —agent, agentic - OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources ·
cs.CL· arXiv 2605.29250 · score 8 —retrieval, rag, serving - Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles ·
cs.HC· arXiv 2605.29473 · score 8 —llm, retrieval, rag - GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing ·
cs.SE· arXiv 2605.29532 · score 8 —llm, agent - Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory ·
cs.CL· arXiv 2605.29630 · score 8 —agent, retrieval, rag - Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents ·
cs.CL· arXiv 2605.29927 · score 8 —llm, agent - BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models ·
cs.RO· arXiv 2605.30226 · score 8 —rag, serving, post-train - Gram: Assessing sabotage propensities via automated alignment auditing ·
cs.LG· arXiv 2605.30322 · score 8 —agent, agentic - SafeSearch: Automated Red-Teaming of LLM-Based Search Agents ·
cs.AI· arXiv 2509.23694 · score 8 —llm, agent - TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis ·
cs.AI· arXiv 2510.06063 · score 8 —rag, reasoning, serving - Causal-JEPA: Learning World Models through Object-Level Latent Masking ·
cs.AI· arXiv 2602.11389 · score 8 —agent, rag, reasoning - ConceptM$^3$oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology ·
cs.AI· arXiv 2605.24399 · score 8 —reasoning, mixture of experts, moe - CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists ·
cs.AI· arXiv 2605.26029 · score 8 —llm, agent - GRPO is Secretly a Process Reward Model ·
cs.LG· arXiv 2509.21154 · score 8 —llm, rag, reasoning - ReflexGrad: Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing ·
cs.LG· arXiv 2511.14584 · score 8 —llm, agent - Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning ·
cs.LG· arXiv 2602.01058 · score 8 —llm, reasoning, post-train - Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover ·
cs.LG· arXiv 2603.11331 · score 8 —large language model, inference - EvA: An Evidence-First Audio Understanding Paradigm for LALMs ·
cs.SD· arXiv 2603.27667 · score 8 —rag, reasoning, serving - Graph Memory Transformer (GMT) ·
cs.LG· arXiv 2604.23862 · score 8 —serving, attention, transformer - When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks ·
cs.CL· arXiv 2604.27272 · score 8 —llm, serving - Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning ·
cs.LG· arXiv 2605.07804 · score 8 —rag, reasoning, serving - KYA: A Framework-Agnostic Trust Layer for Autonomous Systems with Verifiable Provenance and Hierarchical Policy Composition ·
cs.CR· arXiv 2605.25376 · score 8 —agent, multi-agent - Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges ·
cs.CR· arXiv 2605.26156 · score 8 —llm, serving - FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning ·
cs.CL· arXiv 2605.29317 · score 8 —serving, attention, fine-tun - On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training ·
cs.CL· arXiv 2605.29496 · score 8 —reasoning, chain-of-thought, fine-tun, post-train - GAPD: Gold-Action Policy Distillation for Agentic Reinforcement Learning in Knowledge Base Question Answering ·
cs.CL· arXiv 2605.29584 · score 8 —agent, agentic - Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering ·
cs.CL· arXiv 2605.29648 · score 8 —llm, rag, reasoning - HTAM: Hierarchical Transition-Attended Memory for Operator Optimization ·
cs.CL· arXiv 2605.29734 · score 8 —llm, gpu, cuda - GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German ·
cs.CL· arXiv 2605.30214 · score 8 —llm, rag, reasoning - LoMo: Local Modality Substitution for Deeper Vision-Language Fusion ·
cs.CV· arXiv 2605.30265 · score 8 —rag, reasoning, serving - Procedural Pretraining: Warming Up Language Models with Abstract Data ·
cs.CL· arXiv 2601.21725 · score 8 —llm, reasoning, attention - Ask Now, Use Later: Benchmarking the Proactivity Gap in Long-Lived LLM Agents ·
cs.CL· arXiv 2605.28108 · score 8 —llm, agent - ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents ·
cs.MA· arXiv 2604.07789 · score 8 —agent, agentic - When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL ·
cs.LG· arXiv 2605.28918 · score 8 —llm, agent - Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting ·
cs.LG· arXiv 2605.29727 · score 8 —speculative decoding, gpu, latency - Adapting Automotive Aerodynamics Surrogates to New Vehicle Families via Transfer Learning ·
cs.CE· arXiv 2605.27968 · score 8 —serving, transformer, fine-tun - Anytime-Valid Federated Conformal RAG for LLM Swarms ·
stat.ML· arXiv 2605.29139 · score 8 —llm, retrieval, rag - Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning ·
cs.LG· arXiv 2506.05985 · score 8 —agent, rag, fine-tun - SADA: Safe and Adaptive Aggregation of Multiple Black-Box Predictions in Semi-Supervised Learning ·
stat.ML· arXiv 2509.21707 · score 8 —large language model, inference - A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments ·
cs.LG· arXiv 2512.13517 · score 8 —agent, rag, reasoning - Ciphera: A Decentralised Biometric Identity Framework ·
cs.CR· arXiv 2605.29868 · score 8 —rag, serving, latency - UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents ·
cs.AI· arXiv 2605.29534 · score 7 —agent, inference - MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization ·
cs.AI· arXiv 2605.29951 · score 7 —rag, reasoning, inference - LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation ·
cs.LG· arXiv 2605.29280 · score 7 —inference, serving - Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset ·
cs.CV· arXiv 2605.29462 · score 7 —rag, reasoning, inference - DLM-SWAI: Steering Diffusion Language Models Before They Unmask ·
cs.CL· arXiv 2605.29626 · score 7 —inference, serving - ESPO: Early-Stopping Proximal Policy Optimization ·
cs.LG· arXiv 2605.29860 · score 7 —large language model, reasoning - Unlocking the Working Memory of Large Language Models for Latent Reasoning ·
cs.CL· arXiv 2605.30343 · score 7 —large language model, reasoning - Modeling Hierarchical Thinking in Large Reasoning Models ·
cs.AI· arXiv 2510.22437 · score 7 —reasoning, chain-of-thought, inference - Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models ·
cs.LG· arXiv 2601.01162 · score 7 —large language model, rag - Steering Language Models Before They Speak: Logit-Level Interventions ·
cs.CL· arXiv 2601.10960 · score 7 —inference, serving - From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons ·
cs.CL· arXiv 2605.27387 · score 7 —large language model, attention - LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English ·
cs.CL· arXiv 2605.29048 · score 7 —llm, inference - Kronecker Embeddings: Byte-Level Structured Token Representations for Parameter-Efficient Language Models ·
cs.CL· arXiv 2605.29459 · score 7 —large language model, attention - Leveraging Routing Dynamics in Mixture-of-Experts Models for Efficient Language Adaptation ·
cs.CL· arXiv 2605.29714 · score 7 —rag, moe, fine-tun - Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations ·
cs.CL· arXiv 2601.08064 · score 7 —large language model, rag - Mining or Synthesis? Rethinking Exploration Efficiency in Iterative Alignment of Mathematical Reasoning ·
cs.CL· arXiv 2602.05370 · score 7 —large language model, reasoning - Slide Deck Q&A Quality Assurance App: A Multi-Stage Pipeline for Pedagogical Question Generation ·
cs.CL· arXiv 2605.26428 · score 7 —large language model, rag - When the Same Coefficients Reach Different Places: Asymmetric Realizability in Transplanting Tokenizers across Large Language Models ·
cs.LG· arXiv 2601.00065 · score 7 —large language model, fine-tun - NeuroEdge: Real-Time Hand Gesture Recognition with High-Density EMG Using Deep Learning at the Edge ·
cs.LG· arXiv 2605.29326 · score 7 —rag, inference, latency - Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics ·
cs.LG· arXiv 2605.29351 · score 7 —inference, attention, transformer - AsymVLM: Asymmetric Token Pruning for Efficient Vision-Language Model Inference ·
cs.LG· arXiv 2605.29535 · score 7 —llm, inference - STAP: A Shuffle-Tokenized App Predictor with Ultra Long Context for Vocabulary-Free Mobile App Prediction ·
cs.LG· arXiv 2605.29863 · score 7 —inference, transformer, latency - CLUBench: A Clustering Benchmark ·
cs.LG· arXiv 2605.29933 · score 7 —large language model, rag - Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables ·
cs.LG· arXiv 2605.30229 · score 7 —inference, attention, transformer - Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research ·
stat.ML· arXiv 2605.29249 · score 7 —inference, serving - FPLIER: Federated Pathway-Level Information Extractor ·
cs.LG· arXiv 2605.29587 · score 7 —inference, distributed training - AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training ·
cs.DC· arXiv 2605.29664 · score 7 —serving, parallelism - Fisher-Preserving Guidance: Training-Free Manifold Constraints for Safe Diffusion Control ·
cs.RO· arXiv 2605.29937 · score 7 —inference, serving - SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation ·
cs.CV· arXiv 2605.30116 · score 7 —inference, serving - DiScoFormer: Plug-In Density and Score Estimation with Transformers ·
cs.LG· arXiv 2511.05924 · score 7 —inference, attention, transformer - Learning to Solve PDEs on Neural Shape Representations ·
cs.LG· arXiv 2512.21311 · score 7 —inference, serving - Transformed Latent Variable Multi-Output Gaussian Processes ·
cs.LG· arXiv 2605.05133 · score 7 —inference, serving - CompilerDream: Learning a Compiler World Model for General Code Optimization ·
cs.PL· arXiv 2404.16077 · score 7 —agent, compiler - Understanding and Reducing Metadata-Driven Host Overheads in Sampling-Based GNN Training ·
cs.DC· arXiv 2605.29346 · score 7 —parallelism, gpu, cuda - BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation ·
cs.AI· arXiv 2605.28994 · score 6 —llm, reasoning - Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents ·
cs.AI· arXiv 2605.29174 · score 6 —agent, rag - Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth ·
cs.AI· arXiv 2605.29234 · score 6 —llm, retrieval - OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories ·
cs.AI· arXiv 2605.29253 · score 6 —agent, fine-tun - Xetrieval: Mechanistically Explaining Dense Retrieval ·
cs.AI· arXiv 2605.29507 · score 6 —retrieval, reasoning, chain-of-thought - Uncertainty-Aware Transfer Learning for Cross-Building Energy Forecasting: Toward Robust and Scalable District-Level Energy Management ·
cs.AI· arXiv 2605.29733 · score 6 —rag, transformer, fine-tun - Accelerating Constrained Decoding with Token Space Compression ·
cs.AI· arXiv 2605.29986 · score 6 —llm, latency - Conformal Certification of Reasoning Trace Prefixes ·
cs.AI· arXiv 2605.30085 · score 6 —reasoning, serving - VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing ·
cs.AI· arXiv 2605.30117 · score 6 —serving, attention - MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection ·
cs.AI· arXiv 2605.30288 · score 6 —llm, post-train - Specialty-Specific Medical Language Model for Immune-Mediated Diseases ·
cs.CL· arXiv 2605.28838 · score 6 —llm, transformer - PrismFlow: Residual Dynamics for Flow Matching in Time-Series Generation ·
cs.LG· arXiv 2605.28867 · score 6 —rag, serving - AIRGuard: Guarding Agent Actions with Runtime Authority Control ·
cs.CR· arXiv 2605.28914 · score 6 —agent, reasoning - MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs ·
cs.CL· arXiv 2605.29300 · score 6 —llm, fine-tun - TRACER: Persistent Regularization for Robust Multimodal Finetuning ·
cs.LG· arXiv 2605.29380 · score 6 —rag, serving - Composing Non-Conjugate Factor Graphs with Closed-Form Variational Inference ·
cs.LG· arXiv 2605.29467 · score 6 —inference, mixture of experts - PhoneWorld: Scaling Phone-Use Agent Environments ·
cs.CL· arXiv 2605.29486 · score 6 —agent, rag - Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization ·
cs.LG· arXiv 2605.29547 · score 6 —serving, quantization - COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings ·
cs.SD· arXiv 2605.29628 · score 6 —retrieval, serving - Personalized Turn-Level User Conversation Satisfaction Benchmark ·
cs.CL· arXiv 2605.29711 · score 6 —llm, retrieval - Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions ·
cs.CL· arXiv 2605.29738 · score 6 —llm, reasoning - A unified deeplearning framework for contrast-phase-specific virtual monochromatic imaging ·
eess.IV· arXiv 2605.29753 · score 6 —rag, serving - Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs ·
cs.HC· arXiv 2605.29928 · score 6 —llm, reasoning - Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders ·
cs.CL· arXiv 2605.30022 · score 6 —retrieval, attention, transformer - Do Language Models Track Entities Across State Changes? ·
cs.CL· arXiv 2605.30233 · score 6 —rag, reasoning, transformer - Reinforcement Learning with Robust Rubric Rewards ·
cs.CV· arXiv 2605.30244 · score 6 —llm, reasoning - Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models ·
cs.AI· arXiv 2604.10219 · score 6 —rag, reasoning, attention - Human-Guided Harm Recovery for Computer Use Agents ·
cs.AI· arXiv 2604.18847 · score 6 —agent, rag - Dataset-Driven Channel Masks in Transformers for Multivariate Time Series ·
cs.LG· arXiv 2410.23222 · score 6 —rag, attention, transformer - Obfuscation Rules for Detecting and Detoxifying Korean Toxicity ·
cs.CL· arXiv 2510.10961 · score 6 —llm, attention - Topological Order in Neural Wavefunctions ·
cs.AI· arXiv 2512.01863 · score 6 —llm, attention - The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation ·
cs.IR· arXiv 2512.10388 · score 6 —serving, quantization - Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought ·
cs.CL· arXiv 2603.05488 · score 6 —reasoning, chain-of-thought, attention - SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems ·
cs.CR· arXiv 2604.06811 · score 6 —agent, rag - Intent-aligned Autonomous Spacecraft Guidance via Reasoning Models ·
eess.SY· arXiv 2604.17176 · score 6 —reasoning, serving - SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction ·
cs.CL· arXiv 2605.23440 · score 6 —llm, rag - The Alignment Floor: How Persona Customization Breaks Safety in Weakly-Aligned LLMs ·
cs.HC· arXiv 2605.27382 · score 6 —llm, rlhf - From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale ·
cs.CL· arXiv 2605.28826 · score 6 —llm, rlhf - Slogans or Stance? A Label-Light Diagnostic for Entrepreneurial-Discourse Measurement on Chinese SOE Speeches ·
cs.CL· arXiv 2605.29188 · score 6 —llm, rag - LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents ·
cs.CL· arXiv 2605.29559 · score 6 —agent, fine-tun - A Dual-Path Architecture for Scaling Compute and Capacity in LLMs ·
cs.CL· arXiv 2605.30202 · score 6 —llm, transformer - COMPOSE: Composing Future Theorems from Citations and Formal Structure ·
cs.CL· arXiv 2605.30333 · score 6 —llm, retrieval - RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains ·
cs.LG· arXiv 2605.29156 · score 6 —llm, post-train - Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents ·
cs.CV· arXiv 2605.29447 · score 6 —agent, fine-tun - GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases ·
cs.IR· arXiv 2605.30237 · score 6 —retrieval, rag, fine-tun - Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning ·
cs.CL· arXiv 2508.19202 · score 6 —llm, reasoning - Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context ·
cs.CL· arXiv 2510.06182 · score 6 —retrieval, rag, reasoning - MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark ·
cs.CL· arXiv 2601.04633 · score 6 —rag, reasoning, fine-tun - Do not be greedy, Think Twice: Sampling and Selection for Document-level Information Extraction ·
cs.CL· arXiv 2601.18395 · score 6 —llm, reasoning - Over-Refusal and Representation Subspaces: A Mechanistic Analysis of Task-Conditioned Refusal in Aligned LLMs ·
cs.CL· arXiv 2603.27518 · score 6 —llm, transformer - TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script) ·
cs.CL· arXiv 2605.04583 · score 6 —rag, serving - Beyond Transcripts: A Renewed Perspective on Audio Chaptering ·
cs.SD· arXiv 2602.08979 · score 6 —llm, rag - FedQHD: Closed-Form Function-Space Federated Reinforcement Learning ·
cs.LG· arXiv 2605.29002 · score 6 —agent, rag - Apertus LLM Family Expansion via Distillation and Quantization ·
cs.LG· arXiv 2605.29128 · score 6 —llm, quantization - MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding ·
cs.LG· arXiv 2605.29850 · score 6 —rag, attention, transformer - Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching ·
cs.LG· arXiv 2605.30337 · score 6 —llm, retrieval - An End-to-End PyTorch Interface for Differentiable PDE Solvers: A RANS Model-Correction Study ·
cs.CE· arXiv 2605.28858 · score 6 —llm, rag - Offline Multi-agent Reinforcement Learning via Sequential Score Decomposition ·
cs.LG· arXiv 2505.05968 · score 6 —multi-agent, rag - In-Place Feedback: Reliable Refinement for Multi-Turn Expert-LLM Collaboration ·
cs.LG· arXiv 2510.00777 · score 6 —llm, reasoning - Optimization and Generation in Aerodynamics Inverse Design ·
cs.LG· arXiv 2602.03582 · score 6 —rag, serving - Advancing multi-site emission control: A physics-informed transfer learning framework with mixture of experts for carbon-pollutant synergy ·
cs.LG· arXiv 2604.26571 · score 6 —mixture of experts, moe - SMolLM: Small Language Models Learn Small Molecular Grammar ·
cs.LG· arXiv 2605.06322 · score 6 —llm, transformer - TopoGeoScore: A Self-Supervised Source-Only Geometric Framework for OOD Checkpoint Selection ·
cs.LG· arXiv 2605.08870 · score 6 —rag, serving - Faster Molecular Dynamics with Neural Network Potentials via Distilled Multiple Time-Stepping and Non-Conservative Forces ·
cs.LG· arXiv 2602.14975 · score 6 —serving, fine-tun - LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol ·
eess.IV· arXiv 2603.14644 · score 6 —serving, transformer - IORM: Hierarchical I/O Governance for Thousands of Consolidated Databases on Oracle Exadata ·
cs.DB· arXiv 2605.29006 · score 6 —rag, scheduler, latency - Trends in AI and Human-AI Interaction in Clinical Trials – A Hybrid Human-AI Exploration ·
cs.AI· arXiv 2605.29096 · score 5 —large language model - Context Distillation as Latent Memory Management ·
cs.LG· arXiv 2605.28889 · score 5 —retrieval, inference - The Hamilton-Jacobi Theory of Deep Learning ·
cs.LG· arXiv 2605.28983 · score 5 —inference, transformer - GiPL: Generative augmented iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection ·
cs.CV· arXiv 2605.29539 · score 5 —inference, fine-tun - EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL ·
cs.CL· arXiv 2605.29670 · score 5 —rag, inference - CB-SLICE: Concept-Based Interpretable Error Slice Discovery ·
cs.LG· arXiv 2605.29836 · score 5 —rag, inference - Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion ·
stat.ML· arXiv 2605.30319 · score 5 —rag, inference - You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention ·
cs.AI· arXiv 2605.27580 · score 5 —inference, attention - Relational In-Context Learning via Synthetic Pre-training with Structural Prior ·
cs.LG· arXiv 2603.03805 · score 5 —reasoning, inference - Rethinking Stepwise Model Routing: A Cost-Efficient Table Reasoning Perspective ·
cs.CL· arXiv 2605.29319 · score 5 —reasoning, inference - Evaluating Cross-lingual Knowledge Consistency in Code-Mixed vis-a-vis Indian Languages using IndicKLAR ·
cs.CL· arXiv 2605.29637 · score 5 —large language model - ExCAM: Explainable Cultural Awareness Metrics ·
cs.CL· arXiv 2605.29897 · score 5 —large language model - Causal Interventions on Continuous Variables: A Case Study on Verb Bias in Steering Vectors for In-Context Learning ·
cs.CL· arXiv 2605.29971 · score 5 —large language model - Unleashing Implicit Rewards: Prefix-Value Learning for Distribution-Level Optimization ·
cs.CL· arXiv 2604.13197 · score 5 —reasoning, inference - Moment Matching Q-Learning ·
cs.LG· arXiv 2605.29033 · score 5 —inference, latency - Deep Adaptive Dimension Reduction for Bayesian Inference in Inverse Problems ·
cs.LG· arXiv 2605.29373 · score 5 —inference, fine-tun - A Full-Pipeline Framework for Evaluating Membership Inference Attacks in Machine Learning ·
cs.LG· arXiv 2605.29454 · score 5 —inference, post-train - A Geometric View of SRC: Learning Representations for Stable Residual Inference ·
cs.LG· arXiv 2605.29673 · score 5 —rag, inference - CRB-Guided Framework Design and Resource Allocation for Indoor mmWave ISCC Systems ·
cs.IT· arXiv 2605.29939 · score 5 —inference, latency - TraceCodec: A Compiler-Backed Neural Codec for Stateful Multi-Flow Network Traffic Traces ·
cs.NI· arXiv 2605.29941 · score 5 —rag, compiler - Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series ·
stat.ML· arXiv 2605.30292 · score 5 —rag, inference - KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks ·
cs.LG· arXiv 2411.00278 · score 5 —rag, inference - Diffusion-based learning framework for Constrained Nonconvex Optimization with Weighted Bootstrapped Refinement ·
cs.LG· arXiv 2502.10330 · score 5 —rag, inference - Solved in Unit Domain: JacobiNet for Differentiable Coordinate-Transformed PINNs ·
cs.LG· arXiv 2508.02537 · score 5 —rag, inference - Routing by Reaching: Composition of Pre-trained GFlowNets for Multi-Objective Generation ·
cs.LG· arXiv 2602.21565 · score 5 —inference, fine-tun - Accelerating trajectory optimization with Sobolev-trained diffusion policies ·
cs.LG· arXiv 2604.19011 · score 5 —inference, latency - Order-Agnostic Autoregressive Modelling with Missing Data ·
cs.LG· arXiv 2605.06355 · score 5 —rag, inference - Stage-wise Distortion-Perception Traversal in Zero-shot Inverse Problems with Diffusion Models ·
cs.LG· arXiv 2605.28711 · score 5 —rag, inference - Noise-Aware Differentially Private Variational Inference ·
stat.ML· arXiv 2410.19371 · score 5 —rag, inference - MEC: Machine-Learning-Assisted Generalized Entropy Calibration for Semi-Supervised Mean Estimation ·
stat.ML· arXiv 2604.05446 · score 5 —rag, inference - CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation ·
cs.RO· arXiv 2605.22082 · score 5 —inference, transformer - Stop Suppressing the Tail: Causal Inference for Extreme Events ·
stat.ML· arXiv 2605.27474 · score 5 —rag, inference - Rapid GPU-Based Pangenome Graph Layout ·
cs.DC· arXiv 2409.00876 · score 5 —parallelism, gpu - Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction ·
cs.AI· arXiv 2605.28849 · score 4 —llm - Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction ·
cs.AI· arXiv 2605.28855 · score 4 —llm - The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling ·
cs.AI· arXiv 2605.28864 · score 4 —transformer, fine-tun - Review Arcade: On the Human Alignment and Gameability of LLM Reviews ·
cs.AI· arXiv 2605.28897 · score 4 —llm - Orthogonal Concept Erasure for Diffusion Models ·
cs.AI· arXiv 2605.28902 · score 4 —serving - Adopt $\neq$ Adapt: Longitudinal Analyses of LLM Conversations in the Wild ·
cs.AI· arXiv 2605.29018 · score 4 —llm - Differentiable Belief-based Opponent Shaping ·
cs.AI· arXiv 2605.29042 · score 4 —multi-agent - The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure ·
cs.AI· arXiv 2605.29087 · score 4 —reasoning, chain-of-thought - PRO-CUA: Process-Reward Optimization for Computer Use Agents ·
cs.AI· arXiv 2605.29119 · score 4 —agent - Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark ·
cs.AI· arXiv 2605.29400 · score 4 —reasoning, fine-tun - ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control ·
cs.AI· arXiv 2605.29425 · score 4 —serving - CrystalXRD-Bench: Benchmarking Vision-Language Models for XRD Peak Indexing Across Diverse Crystalline Materials ·
cs.AI· arXiv 2605.29446 · score 4 —rag, reasoning - HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering ·
cs.AI· arXiv 2605.29606 · score 4 —retrieval, rag - Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures ·
cs.AI· arXiv 2605.29629 · score 4 —llm - Benchmarking Positional Encoding Strategies for Transformer-Based EEG Foundation Models ·
cs.AI· arXiv 2605.29754 · score 4 —transformer, fine-tun - Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk ·
cs.AI· arXiv 2605.29788 · score 4 —agent - RAISE: RAG Design as an Architecture Search Problem ·
cs.AI· arXiv 2605.30029 · score 4 —retrieval, rag - BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders ·
cs.AI· arXiv 2605.30162 · score 4 —rag, fine-tun - Persona Conditioning of Brand Recommendations in Retrieval-Augmented Commercial Chat: A Prominence-Stratified Cross-Provider Audit ·
cs.AI· arXiv 2605.30207 · score 4 —retrieval, rag - Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection ·
cs.AI· arXiv 2605.30344 · score 4 —reasoning, fine-tun - Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software ·
cs.AI· arXiv 2605.30353 · score 4 —agent - Self-Play Reinforcement Learning under Imperfect Information in Big 2 ·
cs.LG· arXiv 2605.28863 · score 4 —agent - Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision ·
cs.LG· arXiv 2605.28865 · score 4 —agent - TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models ·
cs.LG· arXiv 2605.28868 · score 4 —retrieval, rag - Representation Alignment Rests on Linear Structure ·
cs.LG· arXiv 2605.28870 · score 4 —llm - Quantum-Enhanced Adversarial Robustness in Artificial Intelligence ·
cs.CR· arXiv 2605.28899 · score 4 —ai system - Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening ·
cs.CR· arXiv 2605.28999 · score 4 —llm - Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning ·
cs.LG· arXiv 2605.29028 · score 4 —rag, fine-tun - Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG ·
cs.CL· arXiv 2605.29084 · score 4 —retrieval, rag - When and How Long? The Readout-Mediator Angle in Temporal Reasoning ·
cs.LG· arXiv 2605.29126 · score 4 —reasoning, attention - Evolutionary Refinement of Generative Graph Topologies: A Hybrid WGAN-GA Approach ·
cs.LG· arXiv 2605.29161 · score 4 —serving - Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits ·
cs.CL· arXiv 2605.29268 · score 4 —llm - Does Distributed Training Undermine Compute Governance? ·
cs.CY· arXiv 2605.29359 · score 4 —distributed training - Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies ·
cs.IR· arXiv 2605.29384 · score 4 —retrieval, rag - On the Optimizer Dependence of Neural Scaling Laws ·
cs.LG· arXiv 2605.29387 · score 4 —llm - How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions ·
cs.SE· arXiv 2605.29442 · score 4 —agent - Honest Lying: Understanding Memory Confabulation in Reflexive Agents ·
cs.LG· arXiv 2605.29463 · score 4 —agent - AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling ·
cs.CV· arXiv 2605.29488 · score 4 —rag, transformer - Brain-IT-VQA: From Brain Signals to Answers ·
cs.CV· arXiv 2605.29588 · score 4 —rag, transformer - Energy-Aware NECO for Single-Pass Pixel-wise Out-of-Distribution Detection in Semantic Segmentation ·
cs.CV· arXiv 2605.29773 · score 4 —serving - Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions ·
eess.AS· arXiv 2605.29862 · score 4 —serving - Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate ·
cs.CL· arXiv 2605.29889 · score 4 —llm - Genetically Aligned Patient Representations Improve Hematological Diagnosis ·
cs.CV· arXiv 2605.29980 · score 4 —retrieval, transformer - Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation ·
cs.SD· arXiv 2605.30031 · score 4 —reasoning, latency - Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models ·
cs.LG· arXiv 2605.30038 · score 4 —fine-tun, post-train - REPOT: Recoverable Program-of-Thought via Checkpoint Repair ·
cs.SE· arXiv 2605.30052 · score 4 —llm - xModel-KD: Cross-modal Knowledge Distillation for 3D Scene Perception using LiDAR ·
cs.CV· arXiv 2605.30111 · score 4 —retrieval, rag - iLoRA: Bayesian Low-Rank Adaptation with Latent Interaction Graphs for Microbiome Diagnosis ·
cs.LG· arXiv 2605.30179 · score 4 —llm - PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions ·
cs.CV· arXiv 2605.30268 · score 4 —agent - Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments ·
cs.RO· arXiv 2605.30280 · score 4 —rag, reasoning - Archon: A Unified Multimodal Model for Holistic Digital Human Generation ·
cs.CV· arXiv 2605.30311 · score 4 —serving - Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes ·
cs.GR· arXiv 2605.30318 · score 4 —llm - Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure ·
cs.AI· arXiv 2602.08783 · score 4 —reasoning, chain-of-thought - FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse Autoformalization ·
cs.AI· arXiv 2603.19828 · score 4 —llm - When Models Learn to Ask Why: Adaptive Causal Reasoning for Trustworthy Medical Vision-Language Models ·
cs.AI· arXiv 2603.23085 · score 4 —reasoning, chain-of-thought - SVSR: A Self-Verification and Self-Rectification Paradigm for Multimodal Reasoning ·
cs.AI· arXiv 2604.10228 · score 4 —reasoning, fine-tun - Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents ·
cs.AI· arXiv 2604.11088 · score 4 —agent - NOVA: Fundamental Limits of Knowledge Discovery Through AI ·
cs.AI· arXiv 2605.15219 · score 4 —ai system - AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence ·
cs.AI· arXiv 2605.21739 · score 4 —llm - MATNet: Multi-Level Fusion Transformer-Based Model for Day-Ahead PV Generation Forecasting ·
cs.LG· arXiv 2306.10356 · score 4 —attention, transformer - Crafting Desirable Climate Trajectories with RL Explored Socio-Environmental Simulations ·
cs.AI· arXiv 2410.07287 · score 4 —agent - VRAG: Learning World Models for Interactive Video Generation ·
cs.CV· arXiv 2505.21996 · score 4 —retrieval, rag - Online Fair Division with Additional Information ·
cs.GT· arXiv 2505.24503 · score 4 —agent - Position: Text Embeddings Should Capture Implicit Semantics, Not Just Surface Meaning ·
cs.CL· arXiv 2506.08354 · score 4 —rag, reasoning - Finding DoRI: Discovery of Retained Images in Diffusion Models ·
cs.CV· arXiv 2507.16880 · score 4 —rag, fine-tun - Scalable RF Simulation in Generative 4D Worlds ·
cs.CV· arXiv 2508.12176 · score 4 —serving - Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy ·
cs.LG· arXiv 2509.21190 · score 4 —rag, transformer - ScheduleStream: Temporal Planning with Samplers for GPU-Accelerated Multi-Arm Task and Motion Planning & Scheduling ·
cs.RO· arXiv 2511.04758 · score 4 —rag, gpu - Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom ·
cs.LG· arXiv 2511.11703 · score 4 —agent - Revisiting the Reliability of Language Models in Instruction-Following ·
cs.SE· arXiv 2512.14754 · score 4 —llm - HD-Prot: A Protein Language Model for Joint Sequence-Structure Modeling with Continuous Structure Tokens ·
cs.CE· arXiv 2512.15133 · score 4 —quantization, fine-tun - NCSAM Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning ·
cs.LG· arXiv 2601.19947 · score 4 —serving - Learn from A Rationalist: Distilling Intermediate Interpretable Rationales ·
cs.LG· arXiv 2601.22531 · score 4 —attention, transformer - AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing ·
cs.CL· arXiv 2603.23069 · score 4 —serving - AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation ·
cs.SE· arXiv 2605.12925 · score 4 —agent - EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents ·
cs.SD· arXiv 2605.13841 · score 4 —agent - Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate ·
cs.LG· arXiv 2605.25134 · score 4 —serving - QuITE: Query-Based Irregular Time Series Embedding ·
cs.LG· arXiv 2605.28166 · score 4 —rag, attention - What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs ·
cs.CL· arXiv 2605.28823 · score 4 —llm - A Modular Architecture for Typologically Controlled Lexicon Generation ·
cs.CL· arXiv 2605.28824 · score 4 —llm - Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models ·
cs.CL· arXiv 2605.28913 · score 4 —reasoning, chain-of-thought - Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization ·
cs.CL· arXiv 2605.29274 · score 4 —llm - Accommodation Goes Both Ways: Studying Linguistic Convergence Between Humans and Language Models ·
cs.CL· arXiv 2605.29278 · score 4 —llm - STAMP: Training Explicit Memory for Mobile GUI Agents in Controllable and Scalable Virtual Environments ·
cs.CL· arXiv 2605.29324 · score 4 —agent - A Study on Question-Answer Dataset for LLM Safety Evaluation with a Focus on Illegal Activities ·
cs.CL· arXiv 2605.29340 · score 4 —llm - BrahmicTokenizer-131K: An Indic-Capable Drop-In Replacement for o200k_base ·
cs.CL· arXiv 2605.29379 · score 4 —serving - Scaling Laws for Agent Harnesses via Effective Feedback Compute ·
cs.CL· arXiv 2605.29682 · score 4 —agent - Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking ·
cs.CL· arXiv 2605.30107 · score 4 —retrieval, rag - CorPipe at CRAC 2026: Empty Nodes and Cross-Lingual Transfer in Multilingual Coreference Resolution ·
cs.CL· arXiv 2605.30133 · score 4 —llm - Resolution Diagnostics for Paired LLM Evaluation ·
cs.CL· arXiv 2605.30315 · score 4 —llm - Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence ·
cs.SE· arXiv 2605.29054 · score 4 —agent - Offloading Score: Measuring AI Reliance Through Counterfactual Workflows ·
cs.SE· arXiv 2605.29392 · score 4 —agent - DiffSpot: Can VLMs Spot Fine-Grained Visual Differences in Web Interfaces? ·
cs.CV· arXiv 2605.29615 · score 4 —agent - How’s it going? Reinforcement learning in language models recruits a functional welfare axis ·
cs.LG· arXiv 2605.30232 · score 4 —fine-tun, post-train - VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents ·
cs.CV· arXiv 2605.30256 · score 4 —agent - Interactive In-Meeting Speaker Correction with Human Feedback ·
cs.CL· arXiv 2509.18377 · score 4 —llm - The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs ·
cs.CL· arXiv 2601.03134 · score 4 —llm - One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them ·
cs.LG· arXiv 2605.28839 · score 4 —attention, transformer - Learning Robust and Task-Invariant Functional Representation from fMRI through Siamese Self-Supervised Learning ·
cs.LG· arXiv 2605.28990 · score 4 —rag, fine-tun - Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning ·
cs.LG· arXiv 2605.29032 · score 4 —agent - Parallel Adaptive Multi-Objective Evolutionary Learning of Discretized Bayesian Network Classifiers for Clinical Data ·
cs.LG· arXiv 2605.29058 · score 4 —serving - Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules ·
cs.LG· arXiv 2605.29075 · score 4 —llm - Solving Integer Linear Programming with Parallel Tempering ·
cs.LG· arXiv 2605.29366 · score 4 —serving - Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging ·
cs.LG· arXiv 2605.29489 · score 4 —llm - Cluster-Level Attention-Guided Parallel Decoding for Masked Diffusion Language Models ·
cs.LG· arXiv 2605.29607 · score 4 —reasoning, attention - MōLe-Λ: Learning the Coupled-Cluster Response State for Energies, Gradients, and Properties ·
cs.LG· arXiv 2605.29622 · score 4 —serving - Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames ·
cs.LG· arXiv 2605.29634 · score 4 —attention, transformer - Momentum Based Reward Design for Low Emission Traffic Signal Control ·
cs.LG· arXiv 2605.29693 · score 4 —rag, throughput - MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated Fusion ·
cs.LG· arXiv 2605.29765 · score 4 —llm - Open Problem: Separating Geometric and Algorithmic Compression via Cayley-Table Completion ·
cs.LG· arXiv 2605.29885 · score 4 —serving - Reducing Experimental Testing in Space Propulsion Film Cooling Analyses by Pixelwise Generative Image Interpolation ·
cs.LG· arXiv 2605.29911 · score 4 —serving - A Fully Convolutional Approach to Denoising Structural Dynamics Data from X-Ray Photon Correlation Spectroscopy ·
cs.LG· arXiv 2605.29975 · score 4 —serving - Improving Adversarial Robustness of Attribution via Implicit Regularization ·
cs.LG· arXiv 2605.29983 · score 4 —attention, transformer - RL2ML: Finite-Rollout Surrogate Objectives from Reinforcement Learning to Maximum Likelihood ·
cs.LG· arXiv 2605.30154 · score 4 —serving - Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents ·
cs.LG· arXiv 2605.30190 · score 4 —agent - Digitally enriching a screening population for pancreatic cancer using routine blood-based measures and clinical histories ·
cs.LG· arXiv 2605.30275 · score 4 —attention, transformer - Towards a Foundation Model for the Martian Atmosphere ·
cs.LG· arXiv 2605.28851 · score 4 —retrieval, rag - Eulerian Gaussian Splatting using Hashed Probability Pyramids ·
cs.CV· arXiv 2605.29136 · score 4 —serving - Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion ·
cs.SD· arXiv 2605.29531 · score 4 —attention, fine-tun - EVL-ECG: Efficient ECG Interpretation With Multi-Aspect Heterogeneous Knowledge Distillation ·
cs.CV· arXiv 2605.29977 · score 4 —reasoning, attention - Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance ·
cs.RO· arXiv 2605.30056 · score 4 —rag, attention - Privacy-Enhanced Zero-Order Federated Learning via xMK-CKKS over Wireless Channels ·
cs.CR· arXiv 2605.30123 · score 4 —serving - SAHG: Sector-Anisotropic Hyperbolic Graph Model for Social Bot Detection ·
cs.SI· arXiv 2605.30166 · score 4 —llm - Looking around you: external information enhances representations for event sequences ·
cs.LG· arXiv 2502.10205 · score 4 —attention, fine-tun - Multi-level Collaborative Distillation Meets Global Workspace Model: A Unified Framework for OCIL ·
cs.LG· arXiv 2508.08677 · score 4 —serving - Horizon Activation Mapping for Neural Networks in Time Series Forecasting ·
cs.LG· arXiv 2601.02094 · score 4 —rag, attention - Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations ·
cs.LG· arXiv 2602.01456 · score 4 —serving - Size Transferability of Graph Transformers with Convolutional Positional Encodings ·
cs.LG· arXiv 2602.15239 · score 4 —attention, transformer - Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models ·
cs.LG· arXiv 2602.19619 · score 4 —llm - Statistical Consistency and Generalization of Contrastive Representation Learning ·
cs.LG· arXiv 2605.02116 · score 4 —retrieval, attention - Building a privacy-preserving Federated Recommender system for mobile devices ·
cs.LG· arXiv 2605.22924 · score 4 —serving - On the Role of Inductive Bias in Time-Series Pretraining: A Case Study in Learning Generalizable Representations for Clinical Time Series ·
cs.LG· arXiv 2605.26194 · score 4 —attention, transformer - Density-aware Sample-specific Attack ·
cs.LG· arXiv 2605.27809 · score 4 —fine-tun, post-train - Adversarial Robustness in One-Stage Learning-to-Defer ·
stat.ML· arXiv 2510.10988 · score 4 —serving - Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds ·
cs.CV· arXiv 2510.27391 · score 4 —attention, transformer - Envy-Free Allocation of Indivisible Goods via Noisy Queries ·
cs.GT· arXiv 2602.06361 · score 4 —agent - RAFI – A Ray/Work Forwarding Infrastructure for Data Parallel Multi-Node/Multi-GPU Computing ·
cs.DC· arXiv 2605.30294 · score 4 —gpu, cuda - A Quick and Exact Method for Distributed Quantile Computation ·
cs.DC· arXiv 2511.12025 · score 4 —rag, latency - A Secure, Manifest-Based Framework for Delegated Privilege Promotion ·
cs.CR· arXiv 2605.28991 · score 4 —serving - LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers ·
cs.LG· arXiv 2605.29005 · score 3 —inference - A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router ·
math.DS· arXiv 2605.29121 · score 3 —moe - Stochastic Lifting for Generating Trajectories of Stochastic Physical Systems ·
cs.LG· arXiv 2605.29194 · score 3 —inference - Causal Disentanglement-Inspired Degradation Representation Learning for Full-Reference Image Quality Assessment ·
cs.CV· arXiv 2604.21654 · score 3 —inference - Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification ·
stat.ML· arXiv 2605.12208 · score 3 —inference - Auditing Training Data in Generative Music Models via Black-Box Membership Inference ·
cs.LG· arXiv 2605.29202 · score 3 —inference - From Short Histories to Long Futures: Horizon-Aware Graph Neural Networks for Long Horizon Forecasting ·
cs.LG· arXiv 2605.29952 · score 3 —inference - Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption ·
cs.LG· arXiv 2605.30089 · score 3 —inference - When, why, and how do diffusion posterior samplers fail? A finite-sample lens ·
cs.LG· arXiv 2605.30330 · score 3 —inference - Mixing Vector Model for Copolymer Inference via Mixed Integer Linear Programming ·
cs.LG· arXiv 2605.29329 · score 3 —inference - Wasserstein Contraction of Coordinate Ascent Variational Inference ·
stat.ML· arXiv 2605.30253 · score 3 —inference - Cooperative Variance Estimation and Bayesian Neural Networks for Disentangling Aleatoric and Epistemic Uncertainties ·
cs.LG· arXiv 2505.02743 · score 3 —inference - Adaptive Exponential Integration for Stable Gaussian Mixture Black-Box Variational Inference ·
cs.LG· arXiv 2601.14855 · score 3 —inference - Riemannian AmbientFlow: Towards Simultaneous Manifold Learning and Generative Modeling from Corrupted Data ·
cs.LG· arXiv 2601.18728 · score 3 —inference - Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning ·
cs.LG· arXiv 2605.01663 · score 3 —inference - Uncertainty Estimation via Hyperspherical Confidence Mapping ·
cs.LG· arXiv 2605.05964 · score 3 —inference - Inpainting physics: self-supervised learning for context-driven fluid simulation ·
cs.LG· arXiv 2605.08832 · score 3 —inference - Matryoshka Concept Bottleneck Models ·
cs.LG· arXiv 2605.20612 · score 3 —inference - Enhancing Membership Inference Attacks on Diffusion Models from a Frequency-Domain Perspective ·
cs.CR· arXiv 2505.20955 · score 3 —inference - Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models ·
stat.ML· arXiv 2605.28488 · score 3 —inference - Constant Depth Threshold Circuits For Exhaustive Epistasis Detection ·
cs.AR· arXiv 2605.29719 · score 3 —parallelism - Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics ·
cs.AI· arXiv 2605.29078 · score 2 —rag - Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI ·
cs.AI· arXiv 2605.29240 · score 2 —attention - Rubric-Guided Process Reward for Stepwise Model Routing ·
cs.AI· arXiv 2605.29310 · score 2 —reasoning - Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet ·
cs.AI· arXiv 2605.29358 · score 2 —transformer - Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion ·
cs.AI· arXiv 2605.29591 · score 2 —reasoning - FHRFormer: A Self-Supervised Masked Transformer Framework for Fetal Heart Rate Time-Series Inpainting and Forecasting ·
cs.AI· arXiv 2605.29695 · score 2 —transformer - From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks ·
cs.AI· arXiv 2605.29768 · score 2 —retrieval - Quantifying and Optimizing Simplicity via Polynomial Representations ·
cs.AI· arXiv 2605.29823 · score 2 —fine-tun - On the Geometry of Games and their Solvers ·
cs.AI· arXiv 2605.29919 · score 2 —rag - A comparative study of transformer-based embeddings for topic coherence ·
cs.CL· arXiv 2605.28832 · score 2 —transformer - Transcribing Children’s Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions ·
cs.CL· arXiv 2605.28833 · score 2 —fine-tun - FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks ·
cs.LG· arXiv 2605.29001 · score 2 —reasoning - Multi-Resolution End-to-End Deep Neural Network for Optimizing Latency-Accuracy Tradeoff in Autonomous Driving ·
cs.RO· arXiv 2605.29138 · score 2 —latency - Toward Ethical Facial Age Estimation: A Generalized Zero-Shot Benchmark Without Training on Children’s Data ·
cs.CV· arXiv 2605.29230 · score 2 —rag - Extreme dynamic symmetry enables omnidirectional and multifunctional robots ·
cs.RO· arXiv 2605.29254 · score 2 —rag - KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs ·
cs.LG· arXiv 2605.29259 · score 2 —rag - Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts ·
cs.LG· arXiv 2605.29283 · score 2 —rag - DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework ·
cs.AI· arXiv 2605.29428 · score 2 —gpu - How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions ·
cs.LG· arXiv 2605.29448 · score 2 —rag - Evolutionary Rule Extraction from Corporate Default Prediction Models ·
cs.NE· arXiv 2605.29478 · score 2 —rag - Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection ·
cs.CR· arXiv 2605.29526 · score 2 —rag - Data filtering methods for training language models ·
cs.CL· arXiv 2605.29807 · score 2 —fine-tun - Evaluating Skill and Stability of ArchesWeather and ArchesWeatherGen under Multi-Decadal Climate Simulations ·
cs.AI· arXiv 2605.29976 · score 2 —rag - Test Time Training for Supervised Causal Learning ·
cs.LG· arXiv 2605.30015 · score 2 —rag - Masked Diffusion Modeling for Anomaly Detection ·
cs.LG· arXiv 2605.30046 · score 2 —rag - A Predictive Law for On-Policy Self-Distillation From World Feedback ·
cs.LG· arXiv 2605.30070 · score 2 —post-train - Self-Trained Verification for Training- and Test-Time Self-Improvement ·
cs.LG· arXiv 2605.30290 · score 2 —reasoning - Reasoning with Sampling: Cutting at Decision Points ·
cs.LG· arXiv 2605.30327 · score 2 —reasoning - TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech ·
cs.AI· arXiv 2601.11178 · score 2 —reasoning - Recurrent Structural Policy Gradient for Partially Observable Mean Field Games ·
cs.AI· arXiv 2602.20141 · score 2 —rag - Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases ·
cs.AI· arXiv 2603.07916 · score 2 —rag - A Foundation Model for Zero-Shot Logical Rule Induction ·
cs.AI· arXiv 2605.04916 · score 2 —reasoning - Learning A Simulation-based Visual Policy for Real-world Peg In Unseen Holes ·
cs.RO· arXiv 2205.04297 · score 2 —fine-tun - A Composable Multimodal Framework for cine CMR-Text-Driven Prediction of Heart Failure Outcomes ·
cs.LG· arXiv 2502.16548 · score 2 —rag - Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data ·
cs.SD· arXiv 2502.20838 · score 2 —rag - Taming Data Challenges in ML-based Security Tasks Using Generative AI ·
cs.CR· arXiv 2507.06092 · score 2 —attention - MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models ·
cs.CV· arXiv 2507.09574 · score 2 —attention - Page image classification for content-specific data processing ·
cs.IR· arXiv 2507.21114 · score 2 —rag - Approximate Proportionality in Online Fair Division ·
cs.GT· arXiv 2508.03253 · score 2 —attention - The Impact of Semantic Pairs on Self-Supervised Representation Learning ·
cs.LG· arXiv 2510.08722 · score 2 —rag - MiAD: Mirage Atom Diffusion for De Novo Crystal Generation ·
cs.LG· arXiv 2511.14426 · score 2 —rag - Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach ·
cs.CV· arXiv 2511.19316 · score 2 —fine-tun - BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models ·
cs.LG· arXiv 2512.00283 · score 2 —rag - E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving ·
cs.CV· arXiv 2512.04733 · score 2 —reasoning - Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models ·
cs.LG· arXiv 2601.14758 · score 2 —post-train - S-MARC: Causal Streaming Reasoning for Full-Duplex Conversational Behavior Modeling ·
cs.CL· arXiv 2602.11065 · score 2 —reasoning - OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model ·
cs.SD· arXiv 2602.12304 · score 2 —attention - Post-Training Language Models for Crosslingual Consistency ·
cs.CL· arXiv 2603.04678 · score 2 —post-train - BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps ·
cs.SD· arXiv 2604.19532 · score 2 —transformer - MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio ·
cs.SD· arXiv 2605.00969 · score 2 —reasoning - Aes3D: Aesthetic Assessment in 3D Gaussian Splatting ·
cs.CV· arXiv 2605.05155 · score 2 —attention - AttenA+: Rectifying Action Inequality in Robotic Foundation Models ·
cs.RO· arXiv 2605.13548 · score 2 —attention - The Distillation Game: Adaptive Attacks & Efficient Defenses ·
cs.LG· arXiv 2605.22737 · score 2 —reasoning - Coarse-to-Fine Domain Incremental Learning with Attentive Distillation for Mining Footprint Segmentation in Multispectral Imagery ·
cs.CV· arXiv 2605.24460 · score 2 —rag - HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos ·
cs.RO· arXiv 2605.24934 · score 2 —rag - Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4 ·
cs.LO· arXiv 2605.25556 · score 2 —rag - Bridging Classification and Reconstruction: Cooperative Time Series Anomaly Detection ·
cs.LG· arXiv 2605.26193 · score 2 —rag - ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation ·
cs.LG· arXiv 2605.28293 · score 2 —rag - From Data to Insights: Exploring Program-of-Thoughts Prompting for Chart Summarization ·
cs.CL· arXiv 2605.28874 · score 2 —reasoning - Prompt-Level Reward Specifications for Open-Ended Post-Training ·
cs.CL· arXiv 2605.29275 · score 2 —post-train - Attention Asymmetry in AI Layoff Discourse on X: A Computational Analysis of Capital vs Labour Amplification ·
cs.CL· arXiv 2605.29367 · score 2 —attention - World Models in Words: Auditing Physical State-Transition Commitments in Vision-Language Models ·
cs.CL· arXiv 2605.29585 · score 2 —reasoning - Metric-Dependent Annotation Saturation for Learning from Label Distributions ·
cs.CL· arXiv 2605.29797 · score 2 —fine-tun - Early Detection of Misinformation for Infodemic Management: A Domain Adaptation Approach ·
cs.CL· arXiv 2406.10238 · score 2 —rag - What Exactly do Children Receive in Language Acquisition? A Case Study on CHILDES with Automated Detection of Filler-Gap Dependencies ·
cs.CL· arXiv 2603.02082 · score 2 —rag - X-GS: An Extensible Framework for Perceiving and Thinking via 3D Gaussian Splatting ·
cs.CV· arXiv 2603.09632 · score 2 —rag - Pre-Registering the Detectable Effect: A Paired-MDE Budget for 4-bit Quantization Benchmarks, with a Pilot Audit ·
cs.LG· arXiv 2605.28873 · score 2 —quantization - Spectral Guidance for Flexible and Efficient Control of Diffusion Models ·
cs.LG· arXiv 2605.28900 · score 2 —rag - Sequential Physics-Constrained Neural Operator Forward Modeling for the Norne Reservoir System ·
cs.LG· arXiv 2605.28909 · score 2 —gpu - Cycle-Space Informed Detection of Autoencoded Blind False Data Injection Attacks on Power Systems ·
cs.LG· arXiv 2605.28912 · score 2 —rag - Designing Active Tether-Net Systems for Space Debris Capture with Graph-Learning-Aided Mixed-Combinatorial Optimization ·
cs.LG· arXiv 2605.29021 · score 2 —fine-tun - Model Merging by Output-Space Projection ·
cs.LG· arXiv 2605.29101 · score 2 —fine-tun - Bridging Chemists and AI: An Expert-Augmented Framework for Interpretable Route Evaluation ·
cs.LG· arXiv 2605.29108 · score 2 —fine-tun - PROTOCOL: Late Interaction Retrieval for Protein Homolog Search ·
cs.LG· arXiv 2605.29158 · score 2 —retrieval - Traditional machine learning vs. deep learning from dynamic graph representations of proteins’ 3D folds in the task of protein structure classification ·
cs.LG· arXiv 2605.29228 · score 2 —rag - Robust Frequency-Calibrated Virtual EEG Channel Generation from Four Frontal Electrodes for Wearable EEG Augmentation ·
cs.LG· arXiv 2605.29263 · score 2 —attention - Information-Directed Offline-to-Online Reinforcement Learning ·
cs.LG· arXiv 2605.29405 · score 2 —rag - Convex Basins in Single-Index Model Loss Landscapes: Applications to Robust Recovery under Strong Adversarial Corruption ·
cs.LG· arXiv 2605.29497 · score 2 —retrieval - Realistic honeypot evaluations for scheming propensity ·
cs.LG· arXiv 2605.29729 · score 2 —rag - Gated Graph Attention Networks with Learnable Temperature ·
cs.LG· arXiv 2605.29803 · score 2 —attention - OVA-IB: One vs All Information Bottleneck for Multi-Modal Alignment ·
cs.LG· arXiv 2605.29900 · score 2 —retrieval - Treatment-Conditioned Diffusion for Forecasting Neurodegenerative Disease Progression ·
cs.LG· arXiv 2605.29932 · score 2 —transformer - Ridge Regression from Poisson Resetting: A Renewal Perspective on Spectral Regularization ·
cs.LG· arXiv 2605.30059 · score 2 —rag - Q-ANCHOR: Federated Quantum Learning with ZNE-guided Correction ·
cs.LG· arXiv 2605.30075 · score 2 —rag - Chess-World-Model: A 10M-Game Benchmark for Exact State Tracking from Chess Move Sequences ·
cs.LG· arXiv 2605.30100 · score 2 —transformer - Striding Across Reynolds Numbers: Representation Geometry in Neural PDE Generalisation ·
cs.LG· arXiv 2605.30112 · score 2 —retrieval - Learning to Extrapolate to New Tasks: A Relational Approach to Task Extrapolation ·
cs.LG· arXiv 2605.30132 · score 2 —fine-tun - Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts ·
cs.LG· arXiv 2605.30184 · score 2 —transformer - ExDBSCAN: Explaining DBSCAN with Counterfactual Reasoning – Additional Material ·
cs.LG· arXiv 2605.30225 · score 2 —reasoning - Neural Operator-Based Surrogate Model for CFD: Helical Coil Steam Generator in Small Modular Reactor ·
cs.LG· arXiv 2605.30277 · score 2 —rag - WASHH: An Anchor-Aware Whale-Guided Selection Hyper-Heuristic for Continuous Optimization and SVC Configuration ·
cs.NE· arXiv 2605.28844 · score 2 —rag - Financially Guided Deep Portfolio Optimization ·
cs.LG· arXiv 2605.28853 · score 2 —attention - Lightweight Complementary-Cue Fusion for Robust Video Face Forgery Detection ·
cs.CV· arXiv 2605.29092 · score 2 —rag - ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving ·
cs.CR· arXiv 2605.29114 · score 2 —reasoning - Real-Time Retargeting Using Controllability Boundary for Chandrayaan-3 Lunar Landing ·
eess.SY· arXiv 2605.29412 · score 2 —rag - Deep Optimal Individualized Treatment Rules for Bivariate Survival Outcomes via Adaptive Prediction-Powered Learning ·
stat.ML· arXiv 2605.29464 · score 2 —rag - The Complexity of Verifying Feedforward Neural Networks in Quantised Settings ·
cs.CC· arXiv 2605.29537 · score 2 —reasoning - Parameter-Efficient Subspace Decoupling ViT for Mitigating Multi-Task Negative Transfer in Histological Scoring ·
cs.CV· arXiv 2605.29852 · score 2 —transformer - Gesture-Aware Indoor THz ISAC Systems for Adaptive Resource Allocation ·
cs.IT· arXiv 2605.29913 · score 2 —rag - Visual Spatial Learning: Single-Field Spatial Interpolation Using Convolutional Neural Networks ·
stat.ML· arXiv 2605.30167 · score 2 —rag - Unveiling the Visual Counting Bottleneck in Vision-Language Models ·
cs.MM· arXiv 2605.30170 · score 2 —reasoning - DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation ·
cs.RO· arXiv 2605.30350 · score 2 —rag - An Empirical Study of the Influence of Adversarial Fine-Tuning on Compressed Neural Networks ·
cs.LG· arXiv 2403.09441 · score 2 —fine-tun - A Quotient Homology Theory of Representation in Neural Networks ·
cs.LG· arXiv 2502.01360 · score 2 —rag - Connecting Independently Trained Modes via Layer-Wise Connectivity ·
cs.LG· arXiv 2505.02604 · score 2 —transformer - Active Learning for Machine Learning Driven Molecular Dynamics ·
cs.LG· arXiv 2509.17208 · score 2 —rag - FedBiCross: Personalized One-Shot Federated Learning on Medical Images ·
cs.LG· arXiv 2601.01901 · score 2 —rag - Achieving Linear Speedup for Composite Federated Learning ·
cs.LG· arXiv 2602.03357 · score 2 —rag - Computationally Efficient Replicable Learning of Parities and Applications ·
cs.LG· arXiv 2602.09499 · score 2 —rag - Collaborative Threshold Watermarking ·
cs.LG· arXiv 2602.10765 · score 2 —fine-tun - Localizing Memorized Regions in Diffusion Models via Coordinate-Wise Curvature Differences ·
cs.LG· arXiv 2605.26756 · score 2 —attention - Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models ·
cs.LG· arXiv 2605.27975 · score 2 —fine-tun - MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball ·
cs.GT· arXiv 2506.04602 · score 2 —rag - A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization ·
math.OC· arXiv 2506.20344 · score 2 —rag - SpeedCP: Fast Kernel-based Conditional Conformal Prediction ·
stat.ME· arXiv 2509.24100 · score 2 —rag - Contrastive Representation Regularization for Vision-Language-Action Models ·
cs.RO· arXiv 2510.01711 · score 2 —rag - Permutation-Invariant Spectral Learning via Dyson Diffusion ·
stat.ML· arXiv 2510.08535 · score 2 —rag - Calibrating Generative Models to Distributional Constraints ·
stat.ML· arXiv 2510.10020 · score 2 —fine-tun - Securing SIM-Assisted Wireless Networks via Quantum Reinforcement Learning ·
cs.NI· arXiv 2602.13238 · score 2 —rag - Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression ·
stat.ME· arXiv 2604.13410 · score 2 —rag - A Deep Learning Model for Battery State Prediction towards Intelligent Energy Management ·
eess.SP· arXiv 2605.00898 · score 2 —rag - Paris 2.0: A Decentralized Diffusion Model for Video Generation ·
cs.CV· arXiv 2605.26064 · score 2 —gpu - Design and Implementation of a Serverless MapReduce Framework for Scalable Data Pipelines ·
cs.DC· arXiv 2605.29573 · score 2 —rag - PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration ·
cs.DC· arXiv 2605.29728 · score 2 —gpu - Capsule: Efficient Player Isolation for Datacenters ·
cs.DC· arXiv 2506.11483 · score 2 —gpu - Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems ·
cs.AR· arXiv 2605.29994 · score 2 —latency - elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search ·
cs.AR· arXiv 2605.30019 · score 2 —latency - Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory ·
cs.AR· arXiv 2603.06951 · score 2 —rag