2026-05-29 Paper Digest

845 arXiv papers on agent / LLM / AI infra submitted that day matched our topic filter. 10 were hand-picked by Claude — using title + authors + affiliations — and received a full Claude-generated analysis; the remaining 835 are listed at the bottom.

1. Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning

arXiv: 2602.00994 · cs.AI · Claude pick

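In agentic RL, reasoning and tool use share parameters, which produces conflicting gradient directions and degrades joint optimization. The authors quantify this interference and propose DART, which routes the two kinds of gradients to two separate LoRA adapters and outperforms all joint-optimization baselines on 13 benchmarks.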




2. The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF

arXiv: 2605.29491 · cs.AI · Claude pick

Larger LLMs are systematically less robust to instruction-like noise embedded in reference text — a “Curse of Helpfulness” — which the new DistractionIF benchmark quantifies; GRPO-based RL partially recovers up to 15.5% robustness without hurting general instruction following.



3. Tiny Brains, Giant Impact: Uncovering the Keystone Neurons of LLM with Just a Few Prompts

arXiv: 2605.24846 · cs.LG · Claude pick

A tiny, cross-task subset of neurons (< 0.2% of all neurons) called “keystone neurons” can be identified in open-weight LLMs with just four prompts; removing them collapses all model capabilities, while fine-tuning only them matches or exceeds full-parameter fine-tuning.



4. RTP-LLM: High-Performance Alibaba LLM Inference Engine

arXiv: 2605.29639 · cs.OS · Claude pick

RTP-LLM is Alibaba’s production LLM inference engine, serving 100M+ users, that integrates prefill-decode disaggregation, multi-tiered KV cache, speculative decoding, and model-loading optimizations to deliver 4.7×–6.3× faster loading, 35–40% latency reduction, and substantial throughput gains over vLLM and SGLang.



5. GrepSeek: Training Search Agents for Direct Corpus Interaction

arXiv: 2605.29307 · cs.CL · Claude pick

GrepSeek trains a compact LLM to search large text corpora by issuing shell commands (rg, grep) directly against raw text, bypassing pre-computed indices, using a cold-start SFT + GRPO two-stage pipeline and a 7.6× sharded-parallel execution engine.
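
The sharded-parallel idea can be sketched roughly as follows: split the corpus into shards, search each shard concurrently, and merge the hits in corpus order. This is only an illustration under stated assumptions. Python's `re` module stands in for `rg`/`grep`, and the names `search_shard` and `sharded_search`, the shard count, and the toy corpus are all hypothetical rather than taken from the paper.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard_id, lines, pattern):
    """Return (shard_id, line_no, line) for every line matching pattern."""
    regex = re.compile(pattern)
    return [(shard_id, n, line) for n, line in enumerate(lines, 1) if regex.search(line)]

def sharded_search(corpus_lines, pattern, num_shards=4):
    """Split the corpus into contiguous shards and search them concurrently."""
    size = max(1, -(-len(corpus_lines) // num_shards))  # ceiling division
    shards = [corpus_lines[i:i + size] for i in range(0, len(corpus_lines), size)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # map() preserves shard order, so hits come back in corpus order.
        results = list(pool.map(search_shard, range(len(shards)), shards,
                                [pattern] * len(shards)))
    return [hit for shard_hits in results for hit in shard_hits]

# Toy corpus (hypothetical); each string plays the role of one line of raw text.
corpus = [
    "GRPO is a policy-gradient method.",
    "ripgrep searches files recursively.",
    "Speculative decoding drafts tokens.",
    "grep prints lines matching a pattern.",
    "KV cache stores attention keys and values.",
]
hits = sharded_search(corpus, r"\bgrep\b|ripgrep", num_shards=2)
```

Threads keep the sketch short; a real engine would more plausibly run one search process per shard, and nothing here reproduces the reported 7.6× figure.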



6. SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

arXiv: 2604.09557 · cs.DC · Claude pick

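SPEED-Bench is a comprehensive evaluation suite built specifically for speculative decoding. Through data curation driven by semantic diversity and integration with production-grade engines, it addresses systematic shortcomings of existing benchmarks in diversity, throughput evaluation, and representativeness of real-world settings.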



7. ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding

arXiv: 2604.13519 · cs.CL · Claude pick

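ToolSpec is a training-free speculative decoding method that uses a finite-state machine over predefined tool schemas to generate draft tokens deterministically and combines this with retrieval of historical calls, speeding up tool-call generation by up to 4.2×.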
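
To make the schema-aware part concrete, here is a much-simplified sketch: the structural pieces of a JSON tool call (braces, tool name, argument keys) follow directly from the schema and can be proposed as draft text, while only the argument values need the model. The flat schema layout and the helper names `draft_skeleton` and `fill` are assumptions for illustration. Target-model verification and the retrieval of historical calls are left out, so this is not the paper's implementation.

```python
import json

def draft_skeleton(schema):
    """Yield (kind, text) segments for a JSON tool call.

    'fixed' segments are fully determined by the schema and can be
    proposed as draft text; 'slot' segments must come from the model.
    """
    yield ("fixed", '{"name": ' + json.dumps(schema["name"]) + ', "arguments": {')
    for i, param in enumerate(schema["parameters"]):
        prefix = ", " if i else ""
        yield ("fixed", prefix + json.dumps(param) + ": ")
        yield ("slot", param)
    yield ("fixed", "}}")

def fill(schema, values):
    """Assemble a call and count characters that needed no model decoding."""
    out, drafted = [], 0
    for kind, text in draft_skeleton(schema):
        if kind == "fixed":
            out.append(text)
            drafted += len(text)
        else:
            # Stand-in for the model producing the argument value.
            out.append(json.dumps(values[text]))
    return "".join(out), drafted

# Hypothetical flat schema: a tool name plus an ordered list of argument names.
schema = {"name": "get_weather", "parameters": ["city", "unit"]}
call, drafted_chars = fill(schema, {"city": "Paris", "unit": "celsius"})
```

In this toy case most of the call's characters come from the fixed skeleton, which is the intuition behind drafting them deterministically instead of decoding them token by token.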



8. RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

arXiv: 2603.18859 · cs.AI · Claude pick

RewardFlow builds a state graph from sampled agentic trajectories and propagates BFS-based rewards from success nodes to intermediate states, providing annotation-free dense process rewards that improve RL training across four agentic benchmarks without any reward model.
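
As a rough illustration of the idea, the sketch below merges sampled trajectories into a state graph and runs a multi-source BFS backward from success states, giving each state a reward that shrinks with its hop distance to success. The decay rule `gamma ** distance`, the function name `propagate_rewards`, and the toy trajectories are assumptions made for this example, not the paper's formulation.

```python
from collections import defaultdict, deque

def propagate_rewards(trajectories, success_states, gamma=0.9):
    """Assign each state a reward gamma**d, where d is its BFS hop
    distance to the nearest success state (hypothetical decay rule)."""
    # Build reverse edges so we can walk backward from success states.
    predecessors = defaultdict(set)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            predecessors[dst].add(src)

    # Multi-source BFS starting from all success states at distance 0.
    distance = {s: 0 for s in success_states}
    queue = deque(success_states)
    while queue:
        state = queue.popleft()
        for prev in predecessors[state]:
            if prev not in distance:
                distance[prev] = distance[state] + 1
                queue.append(prev)

    # States that cannot reach success are absent and get no reward.
    return {s: gamma ** d for s, d in distance.items()}

trajectories = [
    ["s0", "s1", "s2", "goal"],   # successful rollout
    ["s0", "s3", "s1", "s4"],     # failed rollout that shares s0 and s1
]
rewards = propagate_rewards(trajectories, success_states=["goal"])
```

Here `s3` appears only in the failed rollout yet still receives a reward, because the merged graph shows it leads to `s1`, which lies on a successful path; the dead end `s4` receives none.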



arXiv: 2605.29796 · cs.AI · Claude pick

SAAS is an RL framework that teaches agentic search models when not to search by dynamically tracking the agent’s evolving knowledge boundary and converting that awareness into discriminative trajectory-level penalties, reducing over-search without accuracy loss.




10. When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

arXiv: 2605.30102 · cs.MA · Claude pick

This position/workshop paper systematically examines the design space of hybrid multi-agent systems (MAS) that mix cloud-hosted frontier LLMs with on-device SLMs, finding that no single hybrid architecture dominates across tasks and that more cloud compute does not reliably improve performance.



Other matched papers

These papers matched the same topic keywords but were not among Claude’s top-N deep-analysis picks.

  1. Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling · cs.AI · arXiv 2605.29262 · score 32 · large language model, llm, agent, agentic, retrieval, rag
  2. ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows · cs.CV · arXiv 2605.14113 · score 30 · large language model, llm, agent, agentic, retrieval, rag
  3. SURGENT: A Surgical Multi-Agent Assistance System Across the Perioperative Workflow · cs.CL · arXiv 2605.29368 · score 29 · large language model, llm, agent, multi-agent, retrieval, reasoning
  4. BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices · cs.AI · arXiv 2605.29705 · score 28 · large language model, llm, multi-agent, rag, reasoning, inference
  5. Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation · cs.AI · arXiv 2605.29873 · score 27 · large language model, llm, reasoning, serving, kv cache, attention
  6. Learning to Choose: An Empowerment-Guided Multi-Agent System with semantic communication for Adaptive Method Selection · cs.AI · arXiv 2605.30042 · score 23 · large language model, llm, agent, multi-agent, rag, serving
  7. AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios · cs.AI · arXiv 2605.27995 · score 28 · large language model, llm, agent, tool use, tool-use, reasoning
  8. CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems · cs.MA · arXiv 2605.29612 · score 27 · large language model, llm, agent, multi-agent, rag, latency
  9. MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs · cs.AI · arXiv 2605.29512 · score 26 · large language model, llm, agent, multi-agent, reasoning, inference
  10. CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective · cs.CL · arXiv 2502.03805 · score 26 · large language model, llm, rag, inference, kv cache, attention
  11. Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems · cs.AI · arXiv 2605.29676 · score 21 · large language model, llm, agent, agentic, ai system
  12. KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning · cs.AI · arXiv 2605.30002 · score 25 · large language model, llm, agent, agentic, rag, reasoning
  13. Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization · cs.MA · arXiv 2605.30227 · score 25 · large language model, llm, agent, multi-agent, rag, reasoning
  14. MediHive: A Decentralized Agent Collective for Medical Reasoning · cs.AI · arXiv 2603.27150 · score 21 · large language model, llm, agent, multi-agent, rag, reasoning
  15. MemoSight: Unifying Context Compression and Multi Token Prediction for Reasoning Acceleration · cs.AI · arXiv 2604.14889 · score 21 · llm, rag, reasoning, chain-of-thought, inference, serving
  16. AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials · cs.AI · arXiv 2510.04704 · score 26 · large language model, llm, agent, agentic, retrieval, reasoning
  17. DynaGraph: Lightweight Multi-Model Interaction Framework via Dynamic Topological Reconfiguration · cs.MA · arXiv 2605.29511 · score 30 · llm, agent, multi-agent, reasoning, inference, gpu
  18. BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference · cs.LG · arXiv 2605.29233 · score 20 · llm, rag, inference, serving, kv-cache, parallelism
  19. Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation · cs.CL · arXiv 2605.29861 · score 24 · large language model, llm, agent, multi-agent, tool use
  20. ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation · cs.CV · arXiv 2604.11080 · score 20 · large language model, llm, rag, inference, quantization, attention
  21. Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction · cs.CL · arXiv 2605.25297 · score 20 · llm, agent, agentic, reasoning, chain-of-thought, gpu
  22. DFlash: Block Diffusion for Flash Speculative Decoding · cs.CL · arXiv 2602.06036 · score 20 · large language model, llm, inference, speculative decoding, gpu, latency
  23. Accelerating Sparse Transformer Inference on GPU · cs.LG · arXiv 2506.06095 · score 20 · large language model, llm, rag, inference, attention, transformer
  24. VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis · cs.AI · arXiv 2605.28978 · score 19 · large language model, llm, agent, multi-agent, reasoning
  25. Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models · cs.AI · arXiv 2605.29625 · score 19 · large language model, llm, agent, multi-agent, attention
  26. Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent · cs.AI · arXiv 2605.29966 · score 19 · large language model, llm, agent, rag, reasoning, fine-tun
  27. MOOSE-Copilot: A Web-Based Interactive Assistant for Unified Exploratory and Fine-Grained Scientific Hypothesis Discovery · cs.CL · arXiv 2605.29475 · score 19 · large language model, llm, agent, agentic, rag
  28. Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding · cs.CL · arXiv 2605.29707 · score 19 · llm, inference, serving, speculative decoding, transformer, throughput
  29. The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems · cs.CL · arXiv 2602.15382 · score 19 · large language model, agent, multi-agent, rag, reasoning, quantization
  30. Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents · cs.LG · arXiv 2605.28850 · score 19 · large language model, llm, agent, reasoning, transformer, fine-tun
  31. Robust and Efficient Guardrails with Latent Reasoning · cs.AI · arXiv 2605.29068 · score 18 · large language model, llm, reasoning, inference, throughput, latency
  32. Teaching Language Models to Check Grounded Claim Factuality with Human Test-Taking Strategies · cs.CL · arXiv 2605.29712 · score 18 · large language model, llm, retrieval, reasoning, inference, fine-tun
  33. Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots · cs.CR · arXiv 2605.29963 · score 18 · llm, agent, agentic, rag, serving
  34. Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies · cs.LG · arXiv 2605.30148 · score 18 · large language model, llm, inference, serving, fine-tun
  35. E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing · cs.LG · arXiv 2512.03109 · score 18 · llm, agent, agentic, reasoning, ai system
  36. Molecular Lead Optimization via Agentic Tool Planning · cs.LG · arXiv 2605.28862 · score 18 · llm, agent, agentic, reasoning, serving
  37. FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models · cs.LG · arXiv 2511.11505 · score 18 · rag, inference, serving, parallelism, mixture of experts, moe
  38. Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization · cs.AI · arXiv 2605.29396 · score 17 · large language model, llm, rag, serving, quantization
  39. Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation · cs.AI · arXiv 2605.29560 · score 17 · large language model, llm, agent, rag, reasoning
  40. VikingMem: A Memory Base Management System for Stateful LLM-based Applications · cs.AI · arXiv 2605.29640 · score 17 · large language model, llm, agent, retrieval, latency
  41. OptSkills: Learning Generalizable Optimization Skills from Problem Archetypes via Cluster-Based Distillation · cs.AI · arXiv 2605.29829 · score 17 · large language model, llm, agent, rag, reasoning
  42. Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning · cs.AI · arXiv 2605.30039 · score 17 · large language model, llm, rag, serving, fine-tun
  43. Modularizing Educational LLM-Agency for Fostering Responsible Learning Assistance · cs.AI · arXiv 2605.30187 · score 17 · large language model, llm, agent, agentic
  44. GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling · cs.CL · arXiv 2605.28835 · score 17 · large language model, llm, multi-agent, rag, fine-tun
  45. Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning · cs.CL · arXiv 2605.28842 · score 17 · large language model, llm, reasoning, chain-of-thought, serving
  46. First head-to-head comparison of agentic AI applied to the analysis of simulated data of the Einstein Telescope · cs.AI · arXiv 2605.28916 · score 17 · large language model, agent, agentic, ai system
  47. Conf-Gen: Conformal Uncertainty Quantification for Generative Models · cs.LG · arXiv 2605.28920 · score 17 · large language model, llm, agent, ai system
  48. Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents · cs.CL · arXiv 2605.29224 · score 17 · large language model, llm, agent, retrieval, rag
  49. Training Deliberative Monitors for Black-Box Scheming Detection · cs.CL · arXiv 2605.29601 · score 17 · agent, agentic, reasoning, chain-of-thought, inference, fine-tun
  50. Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction · cs.CR · arXiv 2605.29960 · score 17 · large language model, llm, agent, rag, attention
  51. InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents · cs.AI · arXiv 2511.22884 · score 17 · large language model, llm, agent, multi-agent
  52. Small Agent Group is the Future of Digital Health · cs.AI · arXiv 2602.08013 · score 17 · large language model, llm, agent, retrieval, reasoning
  53. FundaPod: A Multi-Persona Agent Pod Platform with Knowledge Graph Memory for AI-Assisted Fundamental Investment Research · cs.AI · arXiv 2605.27864 · score 17 · large language model, llm, agent, serving
  54. Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence · cs.CR · arXiv 2509.23573 · score 17 · large language model, llm, agent, rag, reasoning
  55. When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making · cs.RO · arXiv 2603.16673 · score 17 · large language model, llm, agent, reasoning, latency
  56. Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies · cs.CL · arXiv 2605.29062 · score 17 · large language model, llm, agent, multi-agent
  57. WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction · cs.CV · arXiv 2605.29341 · score 17 · large language model, agent, agentic, retrieval, rag
  58. ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems · cs.MA · arXiv 2602.08567 · score 17 · large language model, llm, agent, multi-agent
  59. RAT+: Train Dense, Infer Sparse – Recurrence Augmented Attention for Dilated Inference · cs.LG · arXiv 2602.18196 · score 17 · rag, reasoning, inference, serving, kv cache, attention
  60. Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching · cs.AI · arXiv 2605.29055 · score 16 · llm, agent, agentic, multi-agent
  61. DenseSteer: Steering Small Language Models towards Dense Math Reasoning · cs.AI · arXiv 2605.29247 · score 16 · large language model, llm, reasoning, chain-of-thought, inference
  62. Enhancing Multi-Agent Communication through Attention Steering with Context Relevance · cs.AI · arXiv 2605.30136 · score 16 · llm, agent, multi-agent, reasoning, attention
  63. Pocket-Dentist: On-Device Dental Image Understanding via Efficient Multimodal Large Language Models · cs.CV · arXiv 2605.29299 · score 16 · large language model, rag, inference, serving, latency
  64. Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge · cs.CV · arXiv 2605.29402 · score 16 · large language model, llm, retrieval, reasoning, inference
  65. Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage · cs.CR · arXiv 2605.30040 · score 16 · large language model, llm, rag, reasoning, inference
  66. AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning · cs.AI · arXiv 2602.23258 · score 16 · agent, multi-agent, retrieval, rag, reasoning, fine-tun
  67. Grammar-Aware Literate Generative Mathematical Programming with Compiler-in-the-Loop · cs.PL · arXiv 2601.17670 · score 16 · large language model, llm, retrieval, rag, compiler
  68. CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs · cs.MA · arXiv 2605.09823 · score 16 · llm, agent, multi-agent, serving
  69. EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI · cs.CL · arXiv 2511.08949 · score 16 · large language model, llm, rag, inference, fine-tun
  70. Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models · cs.AI · arXiv 2605.29303 · score 15 · large language model, reasoning, serving, fine-tun, post-train
  71. NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs · cs.AI · arXiv 2605.29716 · score 15 · large language model, llm, reasoning, latency, fine-tun
  72. Citation-Closure Retrieval and Per-Rule Attribution for Real-World Regulatory Compliance Question Answering · cs.AI · arXiv 2605.29742 · score 15 · large language model, llm, retrieval, rag, reasoning
  73. Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence · cs.AI · arXiv 2605.29744 · score 15 · large language model, llm, multi-agent, reasoning
  74. LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs · cs.AI · arXiv 2605.29756 · score 15 · large language model, llm, quantization, transformer, post-train
  75. From GPS Points to Travel Patterns: Flexible and Semantic Trajectory Generation with LLMs · cs.AI · arXiv 2605.30014 · score 15 · large language model, llm, rag, quantization, fine-tun
  76. PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers · cs.AI · arXiv 2605.30094 · score 15 · large language model, llm, agent, rag
  77. Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models · cs.LG · arXiv 2605.28866 · score 15 · large language model, llm, reasoning, serving
  78. Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models · cs.CL · arXiv 2605.30251 · score 15 · large language model, llm, rag, serving
  79. LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback · cs.HC · arXiv 2605.30273 · score 15 · large language model, llm, rag, serving
  80. RoboWits: Unexpected Challenges for Robotic Creative Problem Solving · cs.RO · arXiv 2605.30326 · score 15 · agent, multi-agent, tool use, reasoning, fine-tun
  81. PersonaAgent: Bridging Memory and Action for Personalized LLM Agents · cs.AI · arXiv 2506.06254 · score 15 · large language model, llm, agent, rag
  82. A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models · cs.AI · arXiv 2511.08548 · score 15 · large language model, llm, reasoning, ai system
  83. SCOPE: Prompt Evolution for Enhancing Agent Effectiveness · cs.AI · arXiv 2512.15374 · score 15 · large language model, llm, agent, rag
  84. AutoSizer: Automatic Sizing of Analog and Mixed-Signal Circuits via Large Language Model (LLM) Agents · cs.AI · arXiv 2602.02849 · score 15 · large language model, llm, agent, reasoning
  85. Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs · cs.AI · arXiv 2602.02909 · score 15 · llm, reasoning, chain-of-thought, inference, attention, latency
  86. MemCollab: Cross-Model Memory Collaboration via Contrastive Trajectory Distillation · cs.AI · arXiv 2603.23234 · score 15 · llm, agent, retrieval, reasoning, inference
  87. Are LLMs Socially Adaptive? Contrasting Belief Evolution in Large Language Models and Humans · cs.CE · arXiv 2410.10398 · score 15 · large language model, llm, agent, reasoning
  88. Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems · cs.CY · arXiv 2501.10332 · score 15 · large language model, llm, agent, rag
  89. GroundAct: Can LLM Agents Ground Actions in Environmental States? · cs.CL · arXiv 2508.05614 · score 15 · llm, agent, tool use, reasoning, fine-tun
  90. Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting · cs.CR · arXiv 2509.23571 · score 15 · large language model, llm, agent, reasoning
  91. Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers · cs.CL · arXiv 2601.22139 · score 15 · large language model, llm, reasoning, chain-of-thought, fine-tun
  92. Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training · cs.LG · arXiv 2603.00454 · score 15 · large language model, llm, serving, fine-tun
  93. Combating Data Laundering in LLM Training · cs.CR · arXiv 2604.01904 · score 15 · large language model, llm, rag, serving
  94. The Planetary Cost of AI Acceleration, Part II: The 10th Planetary Boundary and the 6.5-Year Countdown · cs.AI · arXiv 2604.04956 · score 15 · large language model, llm, agent, reasoning
  95. BIRDS: Characterizing and Understanding Biodiversity Impact of Large Language Model Serving · cs.AI · arXiv 2605.27480 · score 15 · large language model, llm, serving, gpu
  96. ROVER: Routing Object-Centric Visual Evidence for Grounded Multi-Image Reasoning · cs.CV · arXiv 2605.27959 · score 15 · large language model, llm, rag, reasoning, attention
  97. RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment · cs.CL · arXiv 2605.28827 · score 15 · large language model, llm, quantization, attention, fine-tun
  98. Text-Preserving Lossy Text Compression: A Study of Strategic Deletion and LLM Reconstruction · cs.CL · arXiv 2605.29000 · score 15 · large language model, llm, serving, fine-tun
  99. Revisiting Observation Reduction for Web Agents: Comprehensive Evaluation with a Lightweight Framework · cs.CL · arXiv 2605.29397 · score 15 · llm, agent, rag, inference, latency
  100. From Blind Guess to Informed Judgment: Teaching LLMs to Evaluate Materials by Building Knowledge-Augmented Preference Signals · cs.CL · arXiv 2605.29555 · score 15 · large language model, llm, retrieval, reasoning, throughput
  101. SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge? · cs.CL · arXiv 2605.30104 · score 15 · llm, agent, tool-use, reasoning, latency
  102. Understanding Fact Recall in Language Models: Why Two-Stage Training Encourages Memorization but Mixed Training Teaches Knowledge · cs.CL · arXiv 2505.16178 · score 15 · large language model, llm, retrieval, rag, fine-tun
  103. Long-Context Modeling with Dynamic Hierarchical Sparse Attention for Memory-Constrained LLM Inference · cs.CL · arXiv 2510.24606 · score 15 · llm, inference, serving, attention, gpu
  104. How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning · cs.LG · arXiv 2602.02103 · score 15 · large language model, llm, rag, reasoning, chain-of-thought
  105. K-FinHallu: A Hallucination Detection Benchmark for Multi-Turn RAG in Korean Finance · cs.LG · arXiv 2605.29523 · score 15 · large language model, llm, retrieval, rag, fine-tun
  106. OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction · cs.LG · arXiv 2605.30247 · score 15 · large language model, llm, retrieval, rag, reasoning
  107. Echoes within the Reasoning: Stealthy and Effective Watermarking via Chain of Thought · cs.CR · arXiv 2605.28890 · score 15 · large language model, rag, reasoning, chain-of-thought, quantization, fine-tun
  108. The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane · cs.AI · arXiv 2605.29082 · score 14 · agent, agentic, multi-agent, throughput
  109. Beyond Consensus: Trace-Level Synthesis in Mixture of Agents · cs.AI · arXiv 2605.29116 · score 14 · llm, agent, reasoning, serving
  110. Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction · cs.AI · arXiv 2605.29168 · score 14 · llm, retrieval, rag, reasoning, serving
  111. Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces · cs.AI · arXiv 2605.29288 · score 14 · llm, reasoning, chain-of-thought, serving, fine-tun
  112. PassNet: Scaling Large Language Models for Graph Compiler Pass Generation · cs.AI · arXiv 2605.29357 · score 14 · large language model, llm, compiler, fine-tun
  113. Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation · cs.AI · arXiv 2605.29430 · score 14 · llm, agent, agentic, reasoning
  114. ParaTool: Shifting Tool Representations from Context to Parameters · cs.AI · arXiv 2605.29561 · score 14 · large language model, llm, inference, fine-tun
  115. AgentSchool: An LLM-Powered Multi-Agent Simulation for Education · cs.AI · arXiv 2605.30144 · score 14 · llm, agent, multi-agent, reasoning
  116. Hallucination Detection-Guided Preference Optimization for Clinical Summarization · cs.CL · arXiv 2605.28910 · score 14 · large language model, llm, rag, inference
  117. CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models · cs.LG · arXiv 2605.28919 · score 14 · large language model, reasoning, inference, attention, transformer
  118. SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers · cs.SE · arXiv 2605.29059 · score 14 · large language model, llm, reasoning, compiler
  119. Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension · cs.MA · arXiv 2605.29874 · score 14 · llm, agent, multi-agent, rag
  120. Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents · cs.SE · arXiv 2605.29910 · score 14 · llm, agent, multi-agent, reasoning
  121. Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency · cs.SE · arXiv 2605.30208 · score 14 · llm, agent, agentic, latency
  122. Enhancing LLM Medical Coding with Structured External Knowledge · cs.CL · arXiv 2605.27377 · score 14 · llm, agent, agentic, rag
  123. EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation · cs.CL · arXiv 2605.27390 · score 14 · large language model, retrieval, inference, speculative decoding
  124. Draft-OPD: On-Policy Distillation for Speculative Draft Models · cs.CL · arXiv 2605.29343 · score 14 · large language model, inference, speculative decoding, fine-tun
  125. Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting · cs.CL · arXiv 2605.29498 · score 14 · large language model, llm, inference, fine-tun
  126. ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation · cs.CL · arXiv 2605.29791 · score 14 · large language model, llm, reasoning, inference
  127. CCS: Clinical Consensus Selection for Radiology Report Generation · cs.CL · arXiv 2605.30131 · score 14 · large language model, llm, retrieval, inference
  128. Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning · cs.CL · arXiv 2605.30245 · score 14 · large language model, llm, reasoning, inference
  129. Cognitive Loop of Thought: Reversible Hierarchical Markov Chain for Efficient Mathematical Reasoning · cs.CL · arXiv 2604.06805 · score 14 · llm, rag, reasoning, chain-of-thought, kv cache
  130. Inferring the Size of Large Language Models From Popular Text Memorization · cs.LG · arXiv 2605.29223 · score 14 · large language model, llm, rag, inference
  131. On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference · cs.LG · arXiv 2605.29580 · score 14 · large language model, rag, reasoning, inference, fine-tun
  132. Fingerprinting Inference Systems of Large Language Models · cs.CR · arXiv 2605.29979 · score 14 · large language model, llm, inference, attention
  133. DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts · cs.LG · arXiv 2605.15422 · score 14 · parallelism, moe, attention, gpu, cuda, post-train
  134. TC-MIS: Maximal Independent Set on Tensor-cores · cs.DC · arXiv 2605.29604 · score 14 · rag, inference, parallelism, gpu, cuda, throughput
  135. Provably Secure Agent Guardrail · cs.AI · arXiv 2605.29251 · score 13 · large language model, agent, reasoning, latency
  136. ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression · cs.AI · arXiv 2605.29350 · score 13 · rag, serving, moe, fine-tun, post-train
  137. EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics · cs.AI · arXiv 2605.29394 · score 13 · large language model, llm, reasoning, fine-tun
  138. When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs · cs.AI · arXiv 2605.29420 · score 13 · large language model, llm, retrieval, rag
  139. VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data · cs.AI · arXiv 2605.29483 · score 13 · agent, agentic, tool use, reasoning
  140. LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning · cs.AI · arXiv 2605.29649 · score 13 · large language model, llm, rag, reasoning
  141. TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation · cs.AI · arXiv 2605.29656 · score 13 · large language model, llm, reasoning, chain-of-thought
  142. Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability · cs.AI · arXiv 2605.29687 · score 13 · large language model, llm, reasoning, chain-of-thought
  143. Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories · cs.AI · arXiv 2605.29893 · score 13 · llm, agent, tool use, reasoning
  144. ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure · cs.AI · arXiv 2605.30284 · score 13 · large language model, llm, retrieval, reasoning
  145. Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models · cs.CL · arXiv 2605.28828 · score 13 · large language model, llm, retrieval, reasoning
  146. How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines · cs.CL · arXiv 2605.28840 · score 13 · large language model, llm, agent
  147. GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human · cs.CL · arXiv 2605.28882 · score 13 · large language model, llm, agent
  148. Sustainable Metal-Organic Framework Water Harvesters in the Artificial Intelligence Era · cs.AI · arXiv 2605.29179 · score 13 · large language model, llm, serving
  149. KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing · cs.CR · arXiv 2605.29524 · score 13 · large language model, llm, serving
  150. SCOPE: A Lightweight-training LLM Framework for Air Traffic Control Readback Monitoring · cs.LG · arXiv 2605.29543 · score 13 · large language model, llm, reasoning, latency
  151. Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content · cs.LG · arXiv 2605.29659 · score 13 · large language model, llm, serving
  152. Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning · cs.LG · arXiv 2605.29782 · score 13 · large language model, llm, rag, post-train
  153. Inferring Code Correctness from Specification · cs.SE · arXiv 2605.29822 · score 13 · large language model, llm, reasoning, chain-of-thought
  154. LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training · cs.LG · arXiv 2605.29888 · score 13 · large language model, llm, reasoning, post-train
  155. VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies · cs.CV · arXiv 2605.30011 · score 13 · reasoning, chain-of-thought, inference, serving, latency
  156. Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection · cs.CL · arXiv 2605.30274 · score 13 · large language model, agent, rag, reasoning
  157. PuzzleClone: A DSL-Powered Framework for Synthesizing Verifiable Data · cs.AI · arXiv 2508.15180 · score 13 · large language model, llm, rag, reasoning
  158. EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance · cs.AI · arXiv 2509.23730 · score 13 · large language model, llm, rag, reasoning
  159. Controlling the Risk of Corrupted Contexts for Language Models via Early-Exiting · cs.AI · arXiv 2510.02480 · score 13 · large language model, llm, rag, attention
  160. CodeEvolve: an open source evolutionary coding agent for algorithmic discovery and optimization · cs.AI · arXiv 2510.14150 · score 13 · large language model, llm, agent
  161. Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation · cs.AI · arXiv 2604.10511 · score 13 · large language model, llm, reasoning, chain-of-thought
  162. HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models · cs.AI · arXiv 2605.24140 · score 13 · large language model, llm, reasoning, fine-tun
  163. Soro: A Lightweight Foundation Model and Chatbot for Tajik · cs.AI · arXiv 2605.27379 · score 13 · large language model, llm, rag, quantization
  164. The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic · cs.AI · arXiv 2605.28700 · score 13 · large language model, llm, rag, reasoning
  165. Jailbreaking and Mitigation of Vulnerabilities in Large Language Models · cs.CR · arXiv 2410.15236 · score 13 · large language model, llm, multi-agent
  166. Less is Enough: Synthesizing Diverse Data in LLM Feature Space with Sparse Autoencoders · cs.CL · arXiv 2602.10388 · score 13 · large language model, llm, rag, post-train
  167. A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search · cs.CL · arXiv 2602.11171 · score 13large language model, llm, rag, fine-tun
  168. Steering at the Source: Style Modulation Heads for Robust Persona Control · cs.CL · arXiv 2603.13249 · score 13large language model, llm, attention, fine-tun
  169. P$^2$RAG: Efficient Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval · cs.CR · arXiv 2603.14778 · score 13large language model, retrieval, rag, serving
  170. Bridge-RAG: An Abstract Bridge Tree Based Retrieval Augmented Generation Algorithm · cs.IR · arXiv 2603.26668 · score 13large language model, llm, retrieval, rag
  171. Teacher-Guided Policy Optimization for On-Policy Reasoning Distillation under Large Policy Divergence · cs.LG · arXiv 2605.13230 · score 13large language model, llm, reasoning, post-train
  172. Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning · cs.CV · arXiv 2605.16385 · score 13llm, rag, reasoning, inference, attention
  173. GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization · cs.LG · arXiv 2605.26092 · score 13large language model, llm, quantization, transformer
  174. MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models · cs.CL · arXiv 2605.28825 · score 13large language model, llm, rag, reasoning
  175. Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception · cs.CL · arXiv 2605.29064 · score 13large language model, llm, agent
  176. Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation · cs.CL · arXiv 2605.29992 · score 13retrieval, inference, serving, transformer, gpu
  177. PEARL: Training Socratic Tutors with Pedagogically Aligned Reinforcement Learning · cs.LG · arXiv 2605.29582 · score 13large language model, llm, agent
  178. Catalyst-Agent: Autonomous heterogeneous catalyst screening with an LLM Agent · cs.CL · arXiv 2603.01311 · score 13llm, agent, tool use, rag
  179. OpenCompass: A Universal Evaluation Platform for Large Language Models · cs.CL · arXiv 2605.19276 · score 13large language model, llm, rag, reasoning
  180. OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents · cs.CL · arXiv 2605.23657 · score 13large language model, llm, agent
  181. Feedback-to-Rubrics: Can We Learn Expert Criteria from Inline Comments? · cs.LG · arXiv 2605.29857 · score 13large language model, llm, serving
  182. Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets · cs.LG · arXiv 2605.30289 · score 13large language model, retrieval, serving, transformer
  183. SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones? · cs.LG · arXiv 2605.30329 · score 13large language model, llm, agent
  184. Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection · cs.CR · arXiv 2605.29901 · score 13large language model, llm, reasoning, attention
  185. Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding · cs.LG · arXiv 2511.04934 · score 13large language model, llm, ai system
  186. SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs · cs.DC · arXiv 2603.00357 · score 13llm, training system, parallelism, gpu
  187. Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes · cs.AI · arXiv 2605.28965 · score 12llm, agent, agentic
  188. Governing Technical Debt in Agentic AI Systems · cs.AI · arXiv 2605.29129 · score 12agent, agentic, ai system
  189. CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval · cs.AI · arXiv 2605.29271 · score 12llm, agent, retrieval, fine-tun
  190. Formalizing Mathematics at Scale · cs.AI · arXiv 2605.29955 · score 12llm, agent, multi-agent
  191. Selective QA over Conflicting Multi-Source Personal Memory: A Diagnostic Testbed and Method Comparison · cs.AI · arXiv 2605.30087 · score 12llm, agent, rag, reasoning
  192. SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation · cs.CL · arXiv 2605.29146 · score 12llm, agent, multi-agent
  193. GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models · cs.LG · arXiv 2605.29398 · score 12large language model, llm, inference
  194. SkillBrew: Multi-Objective Curation of Skill Banks for LLM Agents · cs.CL · arXiv 2605.29440 · score 12llm, agent, retrieval, rag
  195. Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation · cs.CL · arXiv 2605.29502 · score 12llm, rag, serving, fine-tun
  196. Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems · cs.MA · arXiv 2605.29790 · score 12llm, agent, multi-agent
  197. HARP: Hadamard-Preconditioned Adaptive Rotation Processor for Extreme LLM Quantization · cs.LG · arXiv 2605.29843 · score 12llm, serving, quantization, post-train
  198. Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas · cs.MA · arXiv 2605.30003 · score 12llm, agent, multi-agent
  199. No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval · cs.IR · arXiv 2605.30120 · score 12retrieval, rag, serving, throughput, latency
  200. Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms · cs.CY · arXiv 2605.30169 · score 12agent, agentic, multi-agent
  201. Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning · cs.CV · arXiv 2605.30231 · score 12llm, rag, reasoning, transformer, fine-tun
  202. Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents · cs.AI · arXiv 2602.01869 · score 12llm, agent, rag, reasoning
  203. SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data · cs.AI · arXiv 2604.26645 · score 12agent, agentic, multi-agent
  204. From Rubrics to Reliable Scores: Evidence-Grounded Text Evaluation with LLM Judges · cs.CL · arXiv 2601.08654 · score 12large language model, llm, inference
  205. Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR · cs.CL · arXiv 2602.12642 · score 12llm, rag, reasoning, scheduler, post-train
  206. PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration · cs.CL · arXiv 2605.29313 · score 12llm, agent, multi-agent
  207. Learning Design Skills as Memory Policies for Agentic Photonic Inverse Design · cs.CL · arXiv 2605.29421 · score 12llm, agent, agentic
  208. UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering · cs.CL · arXiv 2605.30076 · score 12large language model, llm, inference
  209. Rare Event Analysis of Large Language Models · cs.LG · arXiv 2602.06791 · score 12large language model, llm, inference
  210. HPC-vQPU: A Service-Export Architecture for Virtual QPUs on Batch-Scheduled HPC Systems · cs.DC · arXiv 2605.28845 · score 12agent, serving, gpu, scheduler
  211. Mind Your Tone: Does Tone Alter LLM Performance? · cs.AI · arXiv 2605.29027 · score 11large language model, llm, reasoning
  212. GTA: Generating Long-Horizon Tasks for Web Agents at Scale · cs.AI · arXiv 2605.29218 · score 11agent, tool-use, retrieval, rag
  213. Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility · cs.AI · arXiv 2605.29229 · score 11large language model, llm, reasoning
  214. PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing · cs.AI · arXiv 2605.29815 · score 11large language model, llm, rag
  215. Harnessing non-adversarial robustness in large language models · cs.AI · arXiv 2605.29816 · score 11large language model, llm, fine-tun
  216. Make LLM Learn to Synthesize from Streaming Experiences through Feedback · cs.AI · arXiv 2605.29940 · score 11large language model, llm, rag
  217. Anchorless Diversification for Parallel LLM Ideation · cs.AI · arXiv 2605.30150 · score 11llm, inference, serving
  218. When Should Models Change Their Minds? Contextual Belief Management in Large Language Models · cs.AI · arXiv 2605.30219 · score 11large language model, llm, rag
  219. S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering · cs.CL · arXiv 2605.28831 · score 11agent, retrieval, rag, inference
  220. SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation · cs.CL · arXiv 2605.28837 · score 11large language model, llm, retrieval
  221. GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models · cs.CL · arXiv 2605.28848 · score 11large language model, llm, retrieval
  222. Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT? · cs.LG · arXiv 2605.28860 · score 11large language model, llm, fine-tun
  223. Label-Free Reinforcement Learning via Cross-Model Entropy · cs.LG · arXiv 2605.29009 · score 11large language model, llm, post-train
  224. Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text · cs.CL · arXiv 2605.29076 · score 11llm, reasoning, inference, fine-tun
  225. CA-AC-MPC: CUDA-Accelerated Actor-Critic Model Predictive Control · cs.RO · arXiv 2605.29155 · score 11inference, serving, cuda, latency
  226. Parallax: Parameterized Local Linear Attention for Language Modeling · cs.LG · arXiv 2605.29157 · score 11large language model, llm, attention
  227. UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning · cs.CL · arXiv 2605.29170 · score 11large language model, llm, reasoning
  228. DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents · cs.CL · arXiv 2605.29256 · score 11large language model, agent, rag
  229. Beyond Bilingual Transfer: Multilingual Code-Switching in Instruction Tuning · cs.CL · arXiv 2605.29414 · score 11large language model, llm, rag
  230. Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment · cs.CL · arXiv 2605.29458 · score 11large language model, llm, reasoning
  231. Projectional Decoding: Towards Semantic-Aware LLM Generation · cs.SE · arXiv 2605.30054 · score 11large language model, llm, reasoning
  232. MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings · cs.CL · arXiv 2605.30295 · score 11large language model, llm, reasoning
  233. In-Context Reward Adaptation for Robust Preference Modeling · cs.LG · arXiv 2605.30323 · score 11large language model, rag, transformer, rlhf
  234. LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models · cs.AI · arXiv 2601.06431 · score 11large language model, rag, reasoning, attention
  235. IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents · cs.AI · arXiv 2604.05157 · score 11large language model, agent, rag
  236. Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling · cs.AI · arXiv 2604.25098 · score 11large language model, llm, reasoning
  237. Hierarchical Task Network Planning with LLM-Generated Heuristics · cs.AI · arXiv 2605.07707 · score 11large language model, llm, rag
  238. Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives · cs.GT · arXiv 2505.21627 · score 11large language model, llm, rag
  239. An accuracy-aware extension to LRP-based pruning for CNNs to prevent cascading accuracy degradation in data-scarce transfer learning · cs.CV · arXiv 2511.10861 · score 11rag, inference, serving, fine-tun
  240. Differential syntactic and semantic encoding in LLMs · cs.CL · arXiv 2601.04765 · score 11large language model, llm, rag
  241. Thinking Before Constraining: A Unified Decoding Framework for Large Language Models · cs.CL · arXiv 2601.07525 · score 11large language model, llm, reasoning
  242. Who can we trust? LLM-as-a-jury for Comparative Assessment · cs.CL · arXiv 2602.16610 · score 11large language model, llm, rag
  243. JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments · cs.CV · arXiv 2602.18527 · score 11large language model, llm, reasoning
  244. Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data · cs.LG · arXiv 2603.19294 · score 11large language model, llm, post-train
  245. The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More · cs.CL · arXiv 2603.23971 · score 11agent, rag, reasoning, inference
  246. SelfGrader: LLM Jailbreak Detection via Anchored Token-Level Logits · cs.CR · arXiv 2604.01473 · score 11large language model, llm, latency
  247. DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories · cs.CL · arXiv 2604.20443 · score 11llm, rag, reasoning, inference
  248. Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment · cs.CL · arXiv 2605.28822 · score 11large language model, llm, fine-tun
  249. The Trust Paradox: How CS Researchers Engage LLM Leaderboards · cs.CL · arXiv 2605.28966 · score 11large language model, llm, rag
  250. Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization · cs.CL · arXiv 2605.29327 · score 11large language model, llm, reasoning
  251. FinGuard: Detecting Financial Regulatory Non-Compliance in LLM Interactions · cs.CL · arXiv 2605.29427 · score 11large language model, llm, fine-tun
  252. Comparative Evaluation of Machine Translation Systems on Images with Text · cs.CL · arXiv 2605.29476 · score 11large language model, llm, reasoning
  253. Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese · cs.CL · arXiv 2605.29667 · score 11large language model, llm, rag
  254. Spurious Prompts: Can Irrelevant Prompts Steer Large Language Models? · cs.CL · arXiv 2605.29678 · score 11large language model, llm, reasoning
  255. Understanding Safety-Sensitive Expert Behavior in Mixture-of-Experts LLMs · cs.CL · arXiv 2605.29708 · score 11llm, serving, moe
  256. Nine Judges, Two Effective Votes: Correlated Errors Undermine LLM Evaluation Panels · cs.CL · arXiv 2605.29800 · score 11llm, reasoning, chain-of-thought, inference
  257. Latent Performance Profiling of Large Language Models · cs.CL · arXiv 2605.30018 · score 11large language model, llm, reasoning
  258. Who Am I? History-Aware Profiles for Student Simulation in Tutoring Dialogues · cs.CL · arXiv 2605.30051 · score 11large language model, llm, rag
  259. CommunityFact: A Dynamic, Multilingual, Multi-domain Benchmark for Misinformation Detection in the Wild · cs.CL · arXiv 2605.30241 · score 11llm, retrieval, rag, inference
  260. Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content · cs.CR · arXiv 2605.29245 · score 11large language model, llm, rag
  261. Understanding the Ability of LLMs to Handle Character-Level Perturbation · cs.CL · arXiv 2510.14365 · score 11large language model, llm, rag
  262. WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models · cs.CL · arXiv 2512.00837 · score 11large language model, llm, rag
  263. “Be My Cheese?”: Cultural Nuance Benchmarking for Machine Translation in Multilingual LLMs · cs.CL · arXiv 2602.04729 · score 11large language model, llm, rag
  264. Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing · cs.CL · arXiv 2603.17942 · score 11large language model, llm, throughput
  265. HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation · cs.CL · arXiv 2604.09629 · score 11large language model, llm, fine-tun
  266. When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance · cs.CL · arXiv 2605.22975 · score 11large language model, llm, rag
  267. HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench · cs.LG · arXiv 2601.20255 · score 11large language model, llm, fine-tun
  268. On-Policy Replay for Continual Supervised Fine-Tuning · cs.LG · arXiv 2605.29495 · score 11large language model, llm, fine-tun
  269. Convergence Theory for Iterative LLM-Based Neural Architecture Search: A Parametric Cross-Entropy Framework with Closed-Form Proxy Reliability · cs.LG · arXiv 2605.30103 · score 11large language model, llm, fine-tun
  270. The Biosecurity Blind Spot: Systematic Dual-use Detection in Open Science Infrastructure · cs.DL · arXiv 2605.28843 · score 11large language model, llm, rag
  271. Generative Spatiotemporal Intent Sequence Recommendation via Implicit Reasoning in Amap · cs.IR · arXiv 2605.28888 · score 11llm, reasoning, inference, latency
  272. TabPFN-3: Technical Report · cs.LG · arXiv 2605.13986 · score 11llm, inference, kv cache
  273. Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies · cs.AI · arXiv 2605.29270 · score 10llm, agent, retrieval
  274. DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation · cs.AI · arXiv 2605.29522 · score 10agent, agentic, retrieval
  275. Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents · cs.AI · arXiv 2605.30159 · score 10llm, agent, reasoning
  276. Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents · cs.AI · arXiv 2605.30335 · score 10llm, agent, retrieval
  277. No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand · cs.CL · arXiv 2605.28836 · score 10multi-agent, serving, attention
  278. LogDx-CI: Benchmarking Log Reduction Tools for LLM Root-Cause Diagnosis · cs.SE · arXiv 2605.28876 · score 10llm, agent, rag
  279. OISD: On-Policy Internal Self-Distillation of Language Models · cs.LG · arXiv 2605.29089 · score 10reasoning, serving, attention, post-train
  280. unix-ctf: Procedural Environments for Unix-Competence Reinforcement Learning · cs.CR · arXiv 2605.29115 · score 10llm, agent, fine-tun
  281. Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA · cs.SE · arXiv 2605.29277 · score 10llm, agent, reasoning
  282. From Prompts to Context: An Ontology-Driven Framework for Human-Generative AI Collaboration · cs.HC · arXiv 2605.29675 · score 10agent, retrieval, ai system
  283. CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation · cs.CL · arXiv 2605.29886 · score 10llm, retrieval, rag, reasoning
  284. Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor? · cs.CL · arXiv 2605.30152 · score 10llm, agent, gpu
  285. On Distributional Reinforcement Learning in Chaotic Dynamical Systems · cs.LG · arXiv 2605.30160 · score 10llm, multi-agent, rag
  286. Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection · cs.CR · arXiv 2605.30189 · score 10llm, serving, fine-tun
  287. VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion · cs.CV · arXiv 2605.30351 · score 10kv cache, attention, throughput, latency
  288. Graph-Enhanced Policy Optimization in LLM Agent Training · cs.AI · arXiv 2510.26270 · score 10llm, agent, rag
  289. From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning · cs.AI · arXiv 2601.21909 · score 10llm, reasoning, fine-tun, post-train
  290. SIA: Self Improving AI with Harness & Weight Updates · cs.AI · arXiv 2605.27276 · score 10agent, agentic, gpu
  291. Scaling Small Agents Through Strategy Auctions · cs.MA · arXiv 2602.02751 · score 10agent, agentic, rag
  292. Many-Shot CoT-ICL: Making In-Context Learning Truly Learn · cs.CL · arXiv 2605.13511 · score 10llm, retrieval, reasoning, chain-of-thought
  293. Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation · cs.CL · arXiv 2605.29007 · score 10llm, agent, reasoning
  294. Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs · cs.CL · arXiv 2605.30021 · score 10llm, serving, post-train
  295. HEART-Bench: Do LLM Agents Exhibit Human-like Psychology? · cs.CL · arXiv 2605.30058 · score 10llm, agent, reasoning
  296. DirectorBench: Diagnosing Long-Form Video Generation with Personalized Multi-Agent Evaluation · cs.CL · arXiv 2605.30090 · score 10llm, multi-agent, rag
  297. When RL Suppresses Its Own Vocabulary: Recovering Reasoning Diversity in Puzzle-to-Math Transfer · cs.LG · arXiv 2605.29190 · score 10llm, reasoning, chain-of-thought, post-train
  298. Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs · cs.CR · arXiv 2605.29737 · score 10llm, agent, rag
  299. Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding · cs.CL · arXiv 2512.17220 · score 10llm, retrieval, rag, reasoning
  300. SEEK: Semantic Evidence Extraction via Adaptive ChunKing for Multilingual Fact-Checking · cs.CL · arXiv 2605.26755 · score 10llm, rag, serving
  301. DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models · cs.IR · arXiv 2605.07210 · score 10llm, retrieval, attention, latency
  302. Rethinking Post-Training Recipes for Multimodal Time-Series Forecasting · cs.LG · arXiv 2605.29401 · score 10llm, reasoning, fine-tun, post-train
  303. Harmless Yet Harmful: Neutral Prompting Attacks for Stealthy Hallucination Steering in Agent Skills · cs.CR · arXiv 2605.29354 · score 10llm, agent, rag
  304. Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents · cs.LG · arXiv 2605.14241 · score 10llm, agent, latency
  305. When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis · cs.AI · arXiv 2605.29025 · score 9large language model, llm
  306. The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models · cs.AI · arXiv 2605.29123 · score 9reasoning, inference, serving
  307. Opt-Verifier: Unleashing the Power of LLMs for Optimization Modeling via Dual-Side Verification · cs.AI · arXiv 2605.29556 · score 9large language model, llm
  308. FinVerBench: Benchmark Validity and Calibration in Large Language Model Financial Statement Verification · cs.AI · arXiv 2605.29586 · score 9large language model, llm
  309. Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation · cs.AI · arXiv 2605.29652 · score 9large language model, llm
  310. NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs · cs.AI · arXiv 2605.29685 · score 9large language model, llm
  311. Toward AI Systems That Understand Self and Others: A Multi-Phase Inference Framework for Human Cognitive Diversity and World-Model Alignment · cs.AI · arXiv 2605.29930 · score 9rag, inference, ai system
  312. Teaching Values to Machines: Simulating Human-Like Behavior in LLMs · cs.AI · arXiv 2605.30036 · score 9large language model, llm
  313. Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers · cs.AI · arXiv 2605.30049 · score 9inference, serving, transformer
  314. Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale · cs.AI · arXiv 2605.30200 · score 9large language model, llm
  315. Demystifying Data Organization for Enhanced LLM Training · cs.AI · arXiv 2605.30334 · score 9large language model, llm
  316. SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations · cs.AI · arXiv 2605.30345 · score 9large language model, llm
  317. Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning · cs.CL · arXiv 2605.28829 · score 9large language model, reasoning, post-train
  318. Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation · cs.CL · arXiv 2605.28830 · score 9large language model, llm
  319. GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization · cs.CR · arXiv 2605.29107 · score 9large language model, llm
  320. Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback · cs.IR · arXiv 2605.29141 · score 9large language model, llm
  321. Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback · cs.LG · arXiv 2605.29184 · score 9large language model, llm
  322. SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing · cs.CR · arXiv 2605.29468 · score 9large language model, llm
  323. VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models · cs.RO · arXiv 2605.29562 · score 9retrieval, inference, serving
  324. Predicting Causal Effects from Natural Language Queries using Structured Representations · cs.CL · arXiv 2605.29631 · score 9large language model, llm
  325. OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning · cs.CV · arXiv 2605.29657 · score 9inference, serving, attention
  326. Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models · cs.CL · arXiv 2605.29826 · score 9large language model, llm
  327. Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering · cs.CV · arXiv 2605.29881 · score 9rag, inference, attention, throughput
  328. How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency · cs.CR · arXiv 2605.30096 · score 9large language model, llm
  329. PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding · cs.CV · arXiv 2605.30126 · score 9rag, inference, serving
  330. How LoRA Remembers? A Parametric Memory Law for LLM Finetuning · cs.CL · arXiv 2605.30260 · score 9large language model, llm
  331. LLMSurgeon: Diagnosing Data Mixture of Large Language Models · cs.CL · arXiv 2605.30348 · score 9large language model, llm
  332. Estimating the Empowerment of Language Model Agents · cs.AI · arXiv 2509.22504 · score 9agent, tool-use, rag
  333. Benchmarking at the Edge of Comprehension · cs.AI · arXiv 2602.14307 · score 9large language model, llm
  334. SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems · cs.AI · arXiv 2603.23853 · score 9reasoning, inference, ai system
  335. Automatic Layer Selection for Hallucination Detection · cs.AI · arXiv 2605.26366 · score 9large language model, llm
  336. Less Is More: Elevating RAG via Performance-Driven Context Compression · cs.CL · arXiv 2508.19282 · score 9large language model, retrieval, rag
  337. Empathic Prompting: Non-Verbal Context Integration for Multimodal LLM Conversations · cs.HC · arXiv 2510.20743 · score 9large language model, llm
  338. CORE-T: COherent REtrieval of Tables for Text-to-SQL · cs.CL · arXiv 2601.13111 · score 9llm, retrieval, inference
  339. Pushing the Limits of Block Rotations in Post-Training Quantization · cs.LG · arXiv 2601.22347 · score 9inference, quantization, transformer, post-train
  340. CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating · cs.CV · arXiv 2605.11723 · score 9reasoning, chain-of-thought, inference, fine-tun
  341. Reducing Political Manipulation with Consistency Training · cs.CL · arXiv 2605.22771 · score 9large language model, llm
  342. Large language models reorganize representational geometry during in-context learning · cs.CL · arXiv 2605.28854 · score 9large language model, llm
  343. User-Aware Active Knowledge Acquisition for Emotional Support Dialogue · cs.CL · arXiv 2605.29715 · score 9large language model, rag, reasoning
  344. AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation · cs.CL · arXiv 2605.29741 · score 9large language model, rag, fine-tun
  345. DySem: Uncovering Dynamic Semantic Components via Multilingual Consensus for Calculating Semantic Textual Similarity · cs.CL · arXiv 2605.29751 · score 9large language model, llm
  346. EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation · cs.CL · arXiv 2605.29847 · score 9large language model, llm
  347. Adaptive Targeted Dynamic Chunking for Tokenization-Free Hierarchical Model · cs.CL · arXiv 2605.30080 · score 9large language model, llm
  348. Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees · cs.CL · arXiv 2410.15761 · score 9large language model, llm
  349. HaluNet: Learning Hallucination Risk from Internal Signals in LLM Question Answering · cs.CL · arXiv 2512.24562 · score 9large language model, llm
  350. SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts · cs.CL · arXiv 2604.26506 · score 9large language model, llm
  351. Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance · cs.CV · arXiv 2411.14279 · score 9llm, inference, attention
  352. Feature Geometry of LoRA Adapters: A Sparse Autoencoder Analysis of Representational Divergence in Fine-Tuned Language Models · cs.LG · arXiv 2605.28896 · score 9large language model, transformer, fine-tun
  353. MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference · cs.LG · arXiv 2605.30218 · score 9llm, inference, latency
  354. Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models · cs.LG · arXiv 2602.10520 · score 9llm, reasoning, inference
  355. Enhancing LLM Training via Spectral Clipping · cs.LG · arXiv 2603.14315 · score 9large language model, llm
  356. Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance · cs.LG · arXiv 2605.00553 · score 9large language model, llm
  357. PRIM: Meta-Learned Bayesian Root Cause Analysis · cs.LG · arXiv 2605.08786 · score 9rag, inference, transformer, fine-tun
  358. PACE: Geometry-Aware Bridge Transport for Single-Cell Trajectory Inference · cs.LG · arXiv 2605.18587 · score 9rag, inference, serving
  359. Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory · cs.PF · arXiv 2605.29135 · score 9large language model, gpu, throughput
  360. ReasonOps: Operator Segmentation for LLM Reasoning Traces · cs.AI · arXiv 2605.29192 · score 8llm, reasoning, chain-of-thought
  361. BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents · cs.AI · arXiv 2605.29225 · score 8llm, agent
  362. DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning · cs.AI · arXiv 2605.29568 · score 8llm, rag, reasoning
  363. GPS-Enhanced Tourist Mobility Modeling with Seasonal Spatial Priors and LLM-Based Activity Chain Generation · cs.AI · arXiv 2605.29578 · score 8llm, serving
  364. PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game? · cs.AI · arXiv 2605.29653 · score 8llm, agent
  365. GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents · cs.AI · arXiv 2605.29668 · score 8llm, agent
  366. Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling · cs.AI · arXiv 2605.29697 · score 8agent, agentic
  367. Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations · cs.AI · arXiv 2605.29786 · score 8llm, agent
  368. SkillsInjector: Dynamic Skill Context Construction for LLM Agents · cs.AI · arXiv 2605.29794 · score 8llm, agent
  369. MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains · cs.AI · arXiv 2605.29795 · score 8agent, retrieval, rag
  370. AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security · cs.AI · arXiv 2605.29801 · score 8agent, agentic
  371. OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields · cs.AI · arXiv 2605.29833 · score 8llm, retrieval, reasoning
  372. Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation · cs.AI · arXiv 2605.30000 · score 8llm, agent
  373. Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization · cs.CL · arXiv 2605.28969 · score 8llm, agent
  374. Real-rootedness of the Poincaré polynomials of $\overline{\mathcal M}_{0,n}$: an AI-assisted proof · math.AG · arXiv 2605.29151 · score 8agent, agentic
  375. OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources · cs.CL · arXiv 2605.29250 · score 8retrieval, rag, serving
  376. Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles · cs.HC · arXiv 2605.29473 · score 8llm, retrieval, rag
  377. GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing · cs.SE · arXiv 2605.29532 · score 8llm, agent
  378. Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory · cs.CL · arXiv 2605.29630 · score 8agent, retrieval, rag
  379. Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents · cs.CL · arXiv 2605.29927 · score 8llm, agent
  380. BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models · cs.RO · arXiv 2605.30226 · score 8rag, serving, post-train
  381. Gram: Assessing sabotage propensities via automated alignment auditing · cs.LG · arXiv 2605.30322 · score 8agent, agentic
  382. SafeSearch: Automated Red-Teaming of LLM-Based Search Agents · cs.AI · arXiv 2509.23694 · score 8llm, agent
  383. TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis · cs.AI · arXiv 2510.06063 · score 8rag, reasoning, serving
  384. Causal-JEPA: Learning World Models through Object-Level Latent Masking · cs.AI · arXiv 2602.11389 · score 8agent, rag, reasoning
  385. ConceptM$^3$oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology · cs.AI · arXiv 2605.24399 · score 8reasoning, mixture of experts, moe
  386. CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists · cs.AI · arXiv 2605.26029 · score 8llm, agent
  387. GRPO is Secretly a Process Reward Model · cs.LG · arXiv 2509.21154 · score 8llm, rag, reasoning
  388. ReflexGrad: Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing · cs.LG · arXiv 2511.14584 · score 8llm, agent
  389. Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning · cs.LG · arXiv 2602.01058 · score 8llm, reasoning, post-train
  390. Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover · cs.LG · arXiv 2603.11331 · score 8large language model, inference
  391. EvA: An Evidence-First Audio Understanding Paradigm for LALMs · cs.SD · arXiv 2603.27667 · score 8rag, reasoning, serving
  392. Graph Memory Transformer (GMT) · cs.LG · arXiv 2604.23862 · score 8serving, attention, transformer
  393. When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks · cs.CL · arXiv 2604.27272 · score 8llm, serving
  394. Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning · cs.LG · arXiv 2605.07804 · score 8rag, reasoning, serving
  395. KYA: A Framework-Agnostic Trust Layer for Autonomous Systems with Verifiable Provenance and Hierarchical Policy Composition · cs.CR · arXiv 2605.25376 · score 8agent, multi-agent
  396. Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges · cs.CR · arXiv 2605.26156 · score 8llm, serving
  397. FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning · cs.CL · arXiv 2605.29317 · score 8serving, attention, fine-tun
  398. On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training · cs.CL · arXiv 2605.29496 · score 8reasoning, chain-of-thought, fine-tun, post-train
  399. GAPD: Gold-Action Policy Distillation for Agentic Reinforcement Learning in Knowledge Base Question Answering · cs.CL · arXiv 2605.29584 · score 8agent, agentic
  400. Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering · cs.CL · arXiv 2605.29648 · score 8llm, rag, reasoning
  401. HTAM: Hierarchical Transition-Attended Memory for Operator Optimization · cs.CL · arXiv 2605.29734 · score 8llm, gpu, cuda
  402. GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German · cs.CL · arXiv 2605.30214 · score 8llm, rag, reasoning
  403. LoMo: Local Modality Substitution for Deeper Vision-Language Fusion · cs.CV · arXiv 2605.30265 · score 8rag, reasoning, serving
  404. Procedural Pretraining: Warming Up Language Models with Abstract Data · cs.CL · arXiv 2601.21725 · score 8llm, reasoning, attention
  405. Ask Now, Use Later: Benchmarking the Proactivity Gap in Long-Lived LLM Agents · cs.CL · arXiv 2605.28108 · score 8llm, agent
  406. ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents · cs.MA · arXiv 2604.07789 · score 8agent, agentic
  407. When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL · cs.LG · arXiv 2605.28918 · score 8llm, agent
  408. Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting · cs.LG · arXiv 2605.29727 · score 8speculative decoding, gpu, latency
  409. Adapting Automotive Aerodynamics Surrogates to New Vehicle Families via Transfer Learning · cs.CE · arXiv 2605.27968 · score 8serving, transformer, fine-tun
  410. Anytime-Valid Federated Conformal RAG for LLM Swarms · stat.ML · arXiv 2605.29139 · score 8llm, retrieval, rag
  411. Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning · cs.LG · arXiv 2506.05985 · score 8agent, rag, fine-tun
  412. SADA: Safe and Adaptive Aggregation of Multiple Black-Box Predictions in Semi-Supervised Learning · stat.ML · arXiv 2509.21707 · score 8large language model, inference
  413. A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments · cs.LG · arXiv 2512.13517 · score 8agent, rag, reasoning
  414. Ciphera: A Decentralised Biometric Identity Framework · cs.CR · arXiv 2605.29868 · score 8rag, serving, latency
  415. UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents · cs.AI · arXiv 2605.29534 · score 7agent, inference
  416. MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization · cs.AI · arXiv 2605.29951 · score 7rag, reasoning, inference
  417. LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation · cs.LG · arXiv 2605.29280 · score 7inference, serving
  418. Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset · cs.CV · arXiv 2605.29462 · score 7rag, reasoning, inference
  419. DLM-SWAI: Steering Diffusion Language Models Before They Unmask · cs.CL · arXiv 2605.29626 · score 7inference, serving
  420. ESPO: Early-Stopping Proximal Policy Optimization · cs.LG · arXiv 2605.29860 · score 7large language model, reasoning
  421. Unlocking the Working Memory of Large Language Models for Latent Reasoning · cs.CL · arXiv 2605.30343 · score 7large language model, reasoning
  422. Modeling Hierarchical Thinking in Large Reasoning Models · cs.AI · arXiv 2510.22437 · score 7reasoning, chain-of-thought, inference
  423. Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models · cs.LG · arXiv 2601.01162 · score 7large language model, rag
  424. Steering Language Models Before They Speak: Logit-Level Interventions · cs.CL · arXiv 2601.10960 · score 7inference, serving
  425. From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons · cs.CL · arXiv 2605.27387 · score 7large language model, attention
  426. LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English · cs.CL · arXiv 2605.29048 · score 7llm, inference
  427. Kronecker Embeddings: Byte-Level Structured Token Representations for Parameter-Efficient Language Models · cs.CL · arXiv 2605.29459 · score 7large language model, attention
  428. Leveraging Routing Dynamics in Mixture-of-Experts Models for Efficient Language Adaptation · cs.CL · arXiv 2605.29714 · score 7rag, moe, fine-tun
  429. Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations · cs.CL · arXiv 2601.08064 · score 7large language model, rag
  430. Mining or Synthesis? Rethinking Exploration Efficiency in Iterative Alignment of Mathematical Reasoning · cs.CL · arXiv 2602.05370 · score 7large language model, reasoning
  431. Slide Deck Q&A Quality Assurance App: A Multi-Stage Pipeline for Pedagogical Question Generation · cs.CL · arXiv 2605.26428 · score 7large language model, rag
  432. When the Same Coefficients Reach Different Places: Asymmetric Realizability in Transplanting Tokenizers across Large Language Models · cs.LG · arXiv 2601.00065 · score 7large language model, fine-tun
  433. NeuroEdge: Real-Time Hand Gesture Recognition with High-Density EMG Using Deep Learning at the Edge · cs.LG · arXiv 2605.29326 · score 7rag, inference, latency
  434. Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics · cs.LG · arXiv 2605.29351 · score 7inference, attention, transformer
  435. AsymVLM: Asymmetric Token Pruning for Efficient Vision-Language Model Inference · cs.LG · arXiv 2605.29535 · score 7llm, inference
  436. STAP: A Shuffle-Tokenized App Predictor with Ultra Long Context for Vocabulary-Free Mobile App Prediction · cs.LG · arXiv 2605.29863 · score 7inference, transformer, latency
  437. CLUBench: A Clustering Benchmark · cs.LG · arXiv 2605.29933 · score 7large language model, rag
  438. Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables · cs.LG · arXiv 2605.30229 · score 7inference, attention, transformer
  439. Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research · stat.ML · arXiv 2605.29249 · score 7inference, serving
  440. FPLIER: Federated Pathway-Level Information Extractor · cs.LG · arXiv 2605.29587 · score 7inference, distributed training
  441. AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training · cs.DC · arXiv 2605.29664 · score 7serving, parallelism
  442. Fisher-Preserving Guidance: Training-Free Manifold Constraints for Safe Diffusion Control · cs.RO · arXiv 2605.29937 · score 7inference, serving
  443. SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation · cs.CV · arXiv 2605.30116 · score 7inference, serving
  444. DiScoFormer: Plug-In Density and Score Estimation with Transformers · cs.LG · arXiv 2511.05924 · score 7inference, attention, transformer
  445. Learning to Solve PDEs on Neural Shape Representations · cs.LG · arXiv 2512.21311 · score 7inference, serving
  446. Transformed Latent Variable Multi-Output Gaussian Processes · cs.LG · arXiv 2605.05133 · score 7inference, serving
  447. CompilerDream: Learning a Compiler World Model for General Code Optimization · cs.PL · arXiv 2404.16077 · score 7agent, compiler
  448. Understanding and Reducing Metadata-Driven Host Overheads in Sampling-Based GNN Training · cs.DC · arXiv 2605.29346 · score 7parallelism, gpu, cuda
  449. BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation · cs.AI · arXiv 2605.28994 · score 6llm, reasoning
  450. Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents · cs.AI · arXiv 2605.29174 · score 6agent, rag
  451. Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth · cs.AI · arXiv 2605.29234 · score 6llm, retrieval
  452. OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories · cs.AI · arXiv 2605.29253 · score 6agent, fine-tun
  453. Xetrieval: Mechanistically Explaining Dense Retrieval · cs.AI · arXiv 2605.29507 · score 6retrieval, reasoning, chain-of-thought
  454. Uncertainty-Aware Transfer Learning for Cross-Building Energy Forecasting: Toward Robust and Scalable District-Level Energy Management · cs.AI · arXiv 2605.29733 · score 6rag, transformer, fine-tun
  455. Accelerating Constrained Decoding with Token Space Compression · cs.AI · arXiv 2605.29986 · score 6llm, latency
  456. Conformal Certification of Reasoning Trace Prefixes · cs.AI · arXiv 2605.30085 · score 6reasoning, serving
  457. VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing · cs.AI · arXiv 2605.30117 · score 6serving, attention
  458. MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection · cs.AI · arXiv 2605.30288 · score 6llm, post-train
  459. Specialty-Specific Medical Language Model for Immune-Mediated Diseases · cs.CL · arXiv 2605.28838 · score 6llm, transformer
  460. PrismFlow: Residual Dynamics for Flow Matching in Time-Series Generation · cs.LG · arXiv 2605.28867 · score 6rag, serving
  461. AIRGuard: Guarding Agent Actions with Runtime Authority Control · cs.CR · arXiv 2605.28914 · score 6agent, reasoning
  462. MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs · cs.CL · arXiv 2605.29300 · score 6llm, fine-tun
  463. TRACER: Persistent Regularization for Robust Multimodal Finetuning · cs.LG · arXiv 2605.29380 · score 6rag, serving
  464. Composing Non-Conjugate Factor Graphs with Closed-Form Variational Inference · cs.LG · arXiv 2605.29467 · score 6inference, mixture of experts
  465. PhoneWorld: Scaling Phone-Use Agent Environments · cs.CL · arXiv 2605.29486 · score 6agent, rag
  466. Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization · cs.LG · arXiv 2605.29547 · score 6serving, quantization
  467. COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings · cs.SD · arXiv 2605.29628 · score 6retrieval, serving
  468. Personalized Turn-Level User Conversation Satisfaction Benchmark · cs.CL · arXiv 2605.29711 · score 6llm, retrieval
  469. Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions · cs.CL · arXiv 2605.29738 · score 6llm, reasoning
  470. A unified deeplearning framework for contrast-phase-specific virtual monochromatic imaging · eess.IV · arXiv 2605.29753 · score 6rag, serving
  471. Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs · cs.HC · arXiv 2605.29928 · score 6llm, reasoning
  472. Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders · cs.CL · arXiv 2605.30022 · score 6retrieval, attention, transformer
  473. Do Language Models Track Entities Across State Changes? · cs.CL · arXiv 2605.30233 · score 6rag, reasoning, transformer
  474. Reinforcement Learning with Robust Rubric Rewards · cs.CV · arXiv 2605.30244 · score 6llm, reasoning
  475. Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models · cs.AI · arXiv 2604.10219 · score 6rag, reasoning, attention
  476. Human-Guided Harm Recovery for Computer Use Agents · cs.AI · arXiv 2604.18847 · score 6agent, rag
  477. Dataset-Driven Channel Masks in Transformers for Multivariate Time Series · cs.LG · arXiv 2410.23222 · score 6rag, attention, transformer
  478. Obfuscation Rules for Detecting and Detoxifying Korean Toxicity · cs.CL · arXiv 2510.10961 · score 6llm, attention
  479. Topological Order in Neural Wavefunctions · cs.AI · arXiv 2512.01863 · score 6llm, attention
  480. The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation · cs.IR · arXiv 2512.10388 · score 6serving, quantization
  481. Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought · cs.CL · arXiv 2603.05488 · score 6reasoning, chain-of-thought, attention
  482. SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems · cs.CR · arXiv 2604.06811 · score 6agent, rag
  483. Intent-aligned Autonomous Spacecraft Guidance via Reasoning Models · eess.SY · arXiv 2604.17176 · score 6reasoning, serving
  484. SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction · cs.CL · arXiv 2605.23440 · score 6llm, rag
  485. The Alignment Floor: How Persona Customization Breaks Safety in Weakly-Aligned LLMs · cs.HC · arXiv 2605.27382 · score 6llm, rlhf
  486. From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale · cs.CL · arXiv 2605.28826 · score 6llm, rlhf
  487. Slogans or Stance? A Label-Light Diagnostic for Entrepreneurial-Discourse Measurement on Chinese SOE Speeches · cs.CL · arXiv 2605.29188 · score 6llm, rag
  488. LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents · cs.CL · arXiv 2605.29559 · score 6agent, fine-tun
  489. A Dual-Path Architecture for Scaling Compute and Capacity in LLMs · cs.CL · arXiv 2605.30202 · score 6llm, transformer
  490. COMPOSE: Composing Future Theorems from Citations and Formal Structure · cs.CL · arXiv 2605.30333 · score 6llm, retrieval
  491. RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains · cs.LG · arXiv 2605.29156 · score 6llm, post-train
  492. Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents · cs.CV · arXiv 2605.29447 · score 6agent, fine-tun
  493. GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases · cs.IR · arXiv 2605.30237 · score 6retrieval, rag, fine-tun
  494. Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning · cs.CL · arXiv 2508.19202 · score 6llm, reasoning
  495. Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context · cs.CL · arXiv 2510.06182 · score 6retrieval, rag, reasoning
  496. MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark · cs.CL · arXiv 2601.04633 · score 6rag, reasoning, fine-tun
  497. Do not be greedy, Think Twice: Sampling and Selection for Document-level Information Extraction · cs.CL · arXiv 2601.18395 · score 6llm, reasoning
  498. Over-Refusal and Representation Subspaces: A Mechanistic Analysis of Task-Conditioned Refusal in Aligned LLMs · cs.CL · arXiv 2603.27518 · score 6llm, transformer
  499. TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script) · cs.CL · arXiv 2605.04583 · score 6rag, serving
  500. Beyond Transcripts: A Renewed Perspective on Audio Chaptering · cs.SD · arXiv 2602.08979 · score 6llm, rag
  501. FedQHD: Closed-Form Function-Space Federated Reinforcement Learning · cs.LG · arXiv 2605.29002 · score 6agent, rag
  502. Apertus LLM Family Expansion via Distillation and Quantization · cs.LG · arXiv 2605.29128 · score 6llm, quantization
  503. MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding · cs.LG · arXiv 2605.29850 · score 6rag, attention, transformer
  504. Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching · cs.LG · arXiv 2605.30337 · score 6llm, retrieval
  505. An End-to-End PyTorch Interface for Differentiable PDE Solvers: A RANS Model-Correction Study · cs.CE · arXiv 2605.28858 · score 6llm, rag
  506. Offline Multi-agent Reinforcement Learning via Sequential Score Decomposition · cs.LG · arXiv 2505.05968 · score 6multi-agent, rag
  507. In-Place Feedback: Reliable Refinement for Multi-Turn Expert-LLM Collaboration · cs.LG · arXiv 2510.00777 · score 6llm, reasoning
  508. Optimization and Generation in Aerodynamics Inverse Design · cs.LG · arXiv 2602.03582 · score 6rag, serving
  509. Advancing multi-site emission control: A physics-informed transfer learning framework with mixture of experts for carbon-pollutant synergy · cs.LG · arXiv 2604.26571 · score 6mixture of experts, moe
  510. SMolLM: Small Language Models Learn Small Molecular Grammar · cs.LG · arXiv 2605.06322 · score 6llm, transformer
  511. TopoGeoScore: A Self-Supervised Source-Only Geometric Framework for OOD Checkpoint Selection · cs.LG · arXiv 2605.08870 · score 6rag, serving
  512. Faster Molecular Dynamics with Neural Network Potentials via Distilled Multiple Time-Stepping and Non-Conservative Forces · cs.LG · arXiv 2602.14975 · score 6serving, fine-tun
  513. LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol · eess.IV · arXiv 2603.14644 · score 6serving, transformer
  514. IORM: Hierarchical I/O Governance for Thousands of Consolidated Databases on Oracle Exadata · cs.DB · arXiv 2605.29006 · score 6rag, scheduler, latency
  515. Trends in AI and Human-AI Interaction in Clinical Trials – A Hybrid Human-AI Exploration · cs.AI · arXiv 2605.29096 · score 5large language model
  516. Context Distillation as Latent Memory Management · cs.LG · arXiv 2605.28889 · score 5retrieval, inference
  517. The Hamilton-Jacobi Theory of Deep Learning · cs.LG · arXiv 2605.28983 · score 5inference, transformer
  518. GiPL: Generative augmented iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection · cs.CV · arXiv 2605.29539 · score 5inference, fine-tun
  519. EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL · cs.CL · arXiv 2605.29670 · score 5rag, inference
  520. CB-SLICE: Concept-Based Interpretable Error Slice Discovery · cs.LG · arXiv 2605.29836 · score 5rag, inference
  521. Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion · stat.ML · arXiv 2605.30319 · score 5rag, inference
  522. You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention · cs.AI · arXiv 2605.27580 · score 5inference, attention
  523. Relational In-Context Learning via Synthetic Pre-training with Structural Prior · cs.LG · arXiv 2603.03805 · score 5reasoning, inference
  524. Rethinking Stepwise Model Routing: A Cost-Efficient Table Reasoning Perspective · cs.CL · arXiv 2605.29319 · score 5reasoning, inference
  525. Evaluating Cross-lingual Knowledge Consistency in Code-Mixed vis-a-vis Indian Languages using IndicKLAR · cs.CL · arXiv 2605.29637 · score 5large language model
  526. ExCAM: Explainable Cultural Awareness Metrics · cs.CL · arXiv 2605.29897 · score 5large language model
  527. Causal Interventions on Continuous Variables: A Case Study on Verb Bias in Steering Vectors for In-Context Learning · cs.CL · arXiv 2605.29971 · score 5large language model
  528. Unleashing Implicit Rewards: Prefix-Value Learning for Distribution-Level Optimization · cs.CL · arXiv 2604.13197 · score 5reasoning, inference
  529. Moment Matching Q-Learning · cs.LG · arXiv 2605.29033 · score 5inference, latency
  530. Deep Adaptive Dimension Reduction for Bayesian Inference in Inverse Problems · cs.LG · arXiv 2605.29373 · score 5inference, fine-tun
  531. A Full-Pipeline Framework for Evaluating Membership Inference Attacks in Machine Learning · cs.LG · arXiv 2605.29454 · score 5inference, post-train
  532. A Geometric View of SRC: Learning Representations for Stable Residual Inference · cs.LG · arXiv 2605.29673 · score 5rag, inference
  533. CRB-Guided Framework Design and Resource Allocation for Indoor mmWave ISCC Systems · cs.IT · arXiv 2605.29939 · score 5inference, latency
  534. TraceCodec: A Compiler-Backed Neural Codec for Stateful Multi-Flow Network Traffic Traces · cs.NI · arXiv 2605.29941 · score 5rag, compiler
  535. Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series · stat.ML · arXiv 2605.30292 · score 5rag, inference
  536. KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks · cs.LG · arXiv 2411.00278 · score 5rag, inference
  537. Diffusion-based learning framework for Constrained Nonconvex Optimization with Weighted Bootstrapped Refinement · cs.LG · arXiv 2502.10330 · score 5rag, inference
  538. Solved in Unit Domain: JacobiNet for Differentiable Coordinate-Transformed PINNs · cs.LG · arXiv 2508.02537 · score 5rag, inference
  539. Routing by Reaching: Composition of Pre-trained GFlowNets for Multi-Objective Generation · cs.LG · arXiv 2602.21565 · score 5inference, fine-tun
  540. Accelerating trajectory optimization with Sobolev-trained diffusion policies · cs.LG · arXiv 2604.19011 · score 5inference, latency
  541. Order-Agnostic Autoregressive Modelling with Missing Data · cs.LG · arXiv 2605.06355 · score 5rag, inference
  542. Stage-wise Distortion-Perception Traversal in Zero-shot Inverse Problems with Diffusion Models · cs.LG · arXiv 2605.28711 · score 5rag, inference
  543. Noise-Aware Differentially Private Variational Inference · stat.ML · arXiv 2410.19371 · score 5rag, inference
  544. MEC: Machine-Learning-Assisted Generalized Entropy Calibration for Semi-Supervised Mean Estimation · stat.ML · arXiv 2604.05446 · score 5rag, inference
  545. CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation · cs.RO · arXiv 2605.22082 · score 5inference, transformer
  546. Stop Suppressing the Tail: Causal Inference for Extreme Events · stat.ML · arXiv 2605.27474 · score 5rag, inference
  547. Rapid GPU-Based Pangenome Graph Layout · cs.DC · arXiv 2409.00876 · score 5parallelism, gpu
  548. Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction · cs.AI · arXiv 2605.28849 · score 4llm
  549. Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction · cs.AI · arXiv 2605.28855 · score 4llm
  550. The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling · cs.AI · arXiv 2605.28864 · score 4transformer, fine-tun
  551. Review Arcade: On the Human Alignment and Gameability of LLM Reviews · cs.AI · arXiv 2605.28897 · score 4llm
  552. Orthogonal Concept Erasure for Diffusion Models · cs.AI · arXiv 2605.28902 · score 4serving
  553. Adopt $\neq$ Adapt: Longitudinal Analyses of LLM Conversations in the Wild · cs.AI · arXiv 2605.29018 · score 4llm
  554. Differentiable Belief-based Opponent Shaping · cs.AI · arXiv 2605.29042 · score 4multi-agent
  555. The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure · cs.AI · arXiv 2605.29087 · score 4reasoning, chain-of-thought
  556. PRO-CUA: Process-Reward Optimization for Computer Use Agents · cs.AI · arXiv 2605.29119 · score 4agent
  557. Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark · cs.AI · arXiv 2605.29400 · score 4reasoning, fine-tun
  558. ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control · cs.AI · arXiv 2605.29425 · score 4serving
  559. CrystalXRD-Bench: Benchmarking Vision-Language Models for XRD Peak Indexing Across Diverse Crystalline Materials · cs.AI · arXiv 2605.29446 · score 4rag, reasoning
  560. HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering · cs.AI · arXiv 2605.29606 · score 4retrieval, rag
  561. Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures · cs.AI · arXiv 2605.29629 · score 4llm
  562. Benchmarking Positional Encoding Strategies for Transformer-Based EEG Foundation Models · cs.AI · arXiv 2605.29754 · score 4transformer, fine-tun
  563. Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk · cs.AI · arXiv 2605.29788 · score 4agent
  564. RAISE: RAG Design as an Architecture Search Problem · cs.AI · arXiv 2605.30029 · score 4retrieval, rag
  565. BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders · cs.AI · arXiv 2605.30162 · score 4rag, fine-tun
  566. Persona Conditioning of Brand Recommendations in Retrieval-Augmented Commercial Chat: A Prominence-Stratified Cross-Provider Audit · cs.AI · arXiv 2605.30207 · score 4retrieval, rag
  567. Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection · cs.AI · arXiv 2605.30344 · score 4reasoning, fine-tun
  568. Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software · cs.AI · arXiv 2605.30353 · score 4agent
  569. Self-Play Reinforcement Learning under Imperfect Information in Big 2 · cs.LG · arXiv 2605.28863 · score 4agent
  570. Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision · cs.LG · arXiv 2605.28865 · score 4agent
  571. TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models · cs.LG · arXiv 2605.28868 · score 4retrieval, rag
  572. Representation Alignment Rests on Linear Structure · cs.LG · arXiv 2605.28870 · score 4llm
  573. Quantum-Enhanced Adversarial Robustness in Artificial Intelligence · cs.CR · arXiv 2605.28899 · score 4ai system
  574. Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening · cs.CR · arXiv 2605.28999 · score 4llm
  575. Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning · cs.LG · arXiv 2605.29028 · score 4rag, fine-tun
  576. Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG · cs.CL · arXiv 2605.29084 · score 4retrieval, rag
  577. When and How Long? The Readout-Mediator Angle in Temporal Reasoning · cs.LG · arXiv 2605.29126 · score 4reasoning, attention
  578. Evolutionary Refinement of Generative Graph Topologies: A Hybrid WGAN-GA Approach · cs.LG · arXiv 2605.29161 · score 4serving
  579. Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits · cs.CL · arXiv 2605.29268 · score 4llm
  580. Does Distributed Training Undermine Compute Governance? · cs.CY · arXiv 2605.29359 · score 4distributed training
  581. Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies · cs.IR · arXiv 2605.29384 · score 4retrieval, rag
  582. On the Optimizer Dependence of Neural Scaling Laws · cs.LG · arXiv 2605.29387 · score 4llm
  583. How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions · cs.SE · arXiv 2605.29442 · score 4agent
  584. Honest Lying: Understanding Memory Confabulation in Reflexive Agents · cs.LG · arXiv 2605.29463 · score 4agent
  585. AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling · cs.CV · arXiv 2605.29488 · score 4rag, transformer
  586. Brain-IT-VQA: From Brain Signals to Answers · cs.CV · arXiv 2605.29588 · score 4rag, transformer
  587. Energy-Aware NECO for Single-Pass Pixel-wise Out-of-Distribution Detection in Semantic Segmentation · cs.CV · arXiv 2605.29773 · score 4serving
  588. Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions · eess.AS · arXiv 2605.29862 · score 4serving
  589. Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate · cs.CL · arXiv 2605.29889 · score 4llm
  590. Genetically Aligned Patient Representations Improve Hematological Diagnosis · cs.CV · arXiv 2605.29980 · score 4retrieval, transformer
  591. Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation · cs.SD · arXiv 2605.30031 · score 4reasoning, latency
  592. Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models · cs.LG · arXiv 2605.30038 · score 4fine-tun, post-train
  593. REPOT: Recoverable Program-of-Thought via Checkpoint Repair · cs.SE · arXiv 2605.30052 · score 4llm
  594. xModel-KD: Cross-modal Knowledge Distillation for 3D Scene Perception using LiDAR · cs.CV · arXiv 2605.30111 · score 4retrieval, rag
  595. iLoRA: Bayesian Low-Rank Adaptation with Latent Interaction Graphs for Microbiome Diagnosis · cs.LG · arXiv 2605.30179 · score 4llm
  596. PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions · cs.CV · arXiv 2605.30268 · score 4agent
  597. Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments · cs.RO · arXiv 2605.30280 · score 4rag, reasoning
  598. Archon: A Unified Multimodal Model for Holistic Digital Human Generation · cs.CV · arXiv 2605.30311 · score 4serving
  599. Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes · cs.GR · arXiv 2605.30318 · score 4llm
  600. Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure · cs.AI · arXiv 2602.08783 · score 4reasoning, chain-of-thought
  601. FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse Autoformalization · cs.AI · arXiv 2603.19828 · score 4llm
  602. When Models Learn to Ask Why: Adaptive Causal Reasoning for Trustworthy Medical Vision-Language Models · cs.AI · arXiv 2603.23085 · score 4reasoning, chain-of-thought
  603. SVSR: A Self-Verification and Self-Rectification Paradigm for Multimodal Reasoning · cs.AI · arXiv 2604.10228 · score 4reasoning, fine-tun
  604. Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents · cs.AI · arXiv 2604.11088 · score 4agent
  605. NOVA: Fundamental Limits of Knowledge Discovery Through AI · cs.AI · arXiv 2605.15219 · score 4ai system
  606. AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence · cs.AI · arXiv 2605.21739 · score 4llm
  607. MATNet: Multi-Level Fusion Transformer-Based Model for Day-Ahead PV Generation Forecasting · cs.LG · arXiv 2306.10356 · score 4attention, transformer
  608. Crafting Desirable Climate Trajectories with RL Explored Socio-Environmental Simulations · cs.AI · arXiv 2410.07287 · score 4agent
  609. VRAG: Learning World Models for Interactive Video Generation · cs.CV · arXiv 2505.21996 · score 4retrieval, rag
  610. Online Fair Division with Additional Information · cs.GT · arXiv 2505.24503 · score 4agent
  611. Position: Text Embeddings Should Capture Implicit Semantics, Not Just Surface Meaning · cs.CL · arXiv 2506.08354 · score 4rag, reasoning
  612. Finding DoRI: Discovery of Retained Images in Diffusion Models · cs.CV · arXiv 2507.16880 · score 4rag, fine-tun
  613. Scalable RF Simulation in Generative 4D Worlds · cs.CV · arXiv 2508.12176 · score 4serving
  614. Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy · cs.LG · arXiv 2509.21190 · score 4rag, transformer
  615. ScheduleStream: Temporal Planning with Samplers for GPU-Accelerated Multi-Arm Task and Motion Planning & Scheduling · cs.RO · arXiv 2511.04758 · score 4rag, gpu
  616. Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom · cs.LG · arXiv 2511.11703 · score 4agent
  617. Revisiting the Reliability of Language Models in Instruction-Following · cs.SE · arXiv 2512.14754 · score 4llm
  618. HD-Prot: A Protein Language Model for Joint Sequence-Structure Modeling with Continuous Structure Tokens · cs.CE · arXiv 2512.15133 · score 4quantization, fine-tun
  619. NCSAM Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning · cs.LG · arXiv 2601.19947 · score 4serving
  620. Learn from A Rationalist: Distilling Intermediate Interpretable Rationales · cs.LG · arXiv 2601.22531 · score 4attention, transformer
  621. AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing · cs.CL · arXiv 2603.23069 · score 4serving
  622. AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation · cs.SE · arXiv 2605.12925 · score 4agent
  623. EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents · cs.SD · arXiv 2605.13841 · score 4agent
  624. Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate · cs.LG · arXiv 2605.25134 · score 4serving
  625. QuITE: Query-Based Irregular Time Series Embedding · cs.LG · arXiv 2605.28166 · score 4rag, attention
  626. What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs · cs.CL · arXiv 2605.28823 · score 4llm
  627. A Modular Architecture for Typologically Controlled Lexicon Generation · cs.CL · arXiv 2605.28824 · score 4llm
  628. Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models · cs.CL · arXiv 2605.28913 · score 4reasoning, chain-of-thought
  629. Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization · cs.CL · arXiv 2605.29274 · score 4llm
  630. Accommodation Goes Both Ways: Studying Linguistic Convergence Between Humans and Language Models · cs.CL · arXiv 2605.29278 · score 4llm
  631. STAMP: Training Explicit Memory for Mobile GUI Agents in Controllable and Scalable Virtual Environments · cs.CL · arXiv 2605.29324 · score 4agent
  632. A Study on Question-Answer Dataset for LLM Safety Evaluation with a Focus on Illegal Activities · cs.CL · arXiv 2605.29340 · score 4llm
  633. BrahmicTokenizer-131K: An Indic-Capable Drop-In Replacement for o200k_base · cs.CL · arXiv 2605.29379 · score 4serving
  634. Scaling Laws for Agent Harnesses via Effective Feedback Compute · cs.CL · arXiv 2605.29682 · score 4agent
  635. Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking · cs.CL · arXiv 2605.30107 · score 4retrieval, rag
  636. CorPipe at CRAC 2026: Empty Nodes and Cross-Lingual Transfer in Multilingual Coreference Resolution · cs.CL · arXiv 2605.30133 · score 4llm
  637. Resolution Diagnostics for Paired LLM Evaluation · cs.CL · arXiv 2605.30315 · score 4llm
  638. Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence · cs.SE · arXiv 2605.29054 · score 4agent
  639. Offloading Score: Measuring AI Reliance Through Counterfactual Workflows · cs.SE · arXiv 2605.29392 · score 4agent
  640. DiffSpot: Can VLMs Spot Fine-Grained Visual Differences in Web Interfaces? · cs.CV · arXiv 2605.29615 · score 4agent
  641. How’s it going? Reinforcement learning in language models recruits a functional welfare axis · cs.LG · arXiv 2605.30232 · score 4fine-tun, post-train
  642. VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents · cs.CV · arXiv 2605.30256 · score 4agent
  643. Interactive In-Meeting Speaker Correction with Human Feedback · cs.CL · arXiv 2509.18377 · score 4llm
  644. The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs · cs.CL · arXiv 2601.03134 · score 4llm
  645. One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them · cs.LG · arXiv 2605.28839 · score 4attention, transformer
  646. Learning Robust and Task-Invariant Functional Representation from fMRI through Siamese Self-Supervised Learning · cs.LG · arXiv 2605.28990 · score 4rag, fine-tun
  647. Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning · cs.LG · arXiv 2605.29032 · score 4agent
  648. Parallel Adaptive Multi-Objective Evolutionary Learning of Discretized Bayesian Network Classifiers for Clinical Data · cs.LG · arXiv 2605.29058 · score 4serving
  649. Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules · cs.LG · arXiv 2605.29075 · score 4llm
  650. Solving Integer Linear Programming with Parallel Tempering · cs.LG · arXiv 2605.29366 · score 4serving
  651. Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging · cs.LG · arXiv 2605.29489 · score 4llm
  652. Cluster-Level Attention-Guided Parallel Decoding for Masked Diffusion Language Models · cs.LG · arXiv 2605.29607 · score 4reasoning, attention
  653. MōLe-Λ: Learning the Coupled-Cluster Response State for Energies, Gradients, and Properties · cs.LG · arXiv 2605.29622 · score 4serving
  654. Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames · cs.LG · arXiv 2605.29634 · score 4attention, transformer
  655. Momentum Based Reward Design for Low Emission Traffic Signal Control · cs.LG · arXiv 2605.29693 · score 4rag, throughput
  656. MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated Fusion · cs.LG · arXiv 2605.29765 · score 4llm
  657. Open Problem: Separating Geometric and Algorithmic Compression via Cayley-Table Completion · cs.LG · arXiv 2605.29885 · score 4serving
  658. Reducing Experimental Testing in Space Propulsion Film Cooling Analyses by Pixelwise Generative Image Interpolation · cs.LG · arXiv 2605.29911 · score 4serving
  659. A Fully Convolutional Approach to Denoising Structural Dynamics Data from X-Ray Photon Correlation Spectroscopy · cs.LG · arXiv 2605.29975 · score 4serving
  660. Improving Adversarial Robustness of Attribution via Implicit Regularization · cs.LG · arXiv 2605.29983 · score 4attention, transformer
  661. RL2ML: Finite-Rollout Surrogate Objectives from Reinforcement Learning to Maximum Likelihood · cs.LG · arXiv 2605.30154 · score 4serving
  662. Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents · cs.LG · arXiv 2605.30190 · score 4agent
  663. Digitally enriching a screening population for pancreatic cancer using routine blood-based measures and clinical histories · cs.LG · arXiv 2605.30275 · score 4attention, transformer
  664. Towards a Foundation Model for the Martian Atmosphere · cs.LG · arXiv 2605.28851 · score 4retrieval, rag
  665. Eulerian Gaussian Splatting using Hashed Probability Pyramids · cs.CV · arXiv 2605.29136 · score 4serving
  666. Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion · cs.SD · arXiv 2605.29531 · score 4attention, fine-tun
  667. EVL-ECG: Efficient ECG Interpretation With Multi-Aspect Heterogeneous Knowledge Distillation · cs.CV · arXiv 2605.29977 · score 4reasoning, attention
  668. Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance · cs.RO · arXiv 2605.30056 · score 4rag, attention
  669. Privacy-Enhanced Zero-Order Federated Learning via xMK-CKKS over Wireless Channels · cs.CR · arXiv 2605.30123 · score 4serving
  670. SAHG: Sector-Anisotropic Hyperbolic Graph Model for Social Bot Detection · cs.SI · arXiv 2605.30166 · score 4llm
  671. Looking around you: external information enhances representations for event sequences · cs.LG · arXiv 2502.10205 · score 4attention, fine-tun
  672. Multi-level Collaborative Distillation Meets Global Workspace Model: A Unified Framework for OCIL · cs.LG · arXiv 2508.08677 · score 4serving
  673. Horizon Activation Mapping for Neural Networks in Time Series Forecasting · cs.LG · arXiv 2601.02094 · score 4rag, attention
  674. Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations · cs.LG · arXiv 2602.01456 · score 4serving
  675. Size Transferability of Graph Transformers with Convolutional Positional Encodings · cs.LG · arXiv 2602.15239 · score 4attention, transformer
  676. Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models · cs.LG · arXiv 2602.19619 · score 4llm
  677. Statistical Consistency and Generalization of Contrastive Representation Learning · cs.LG · arXiv 2605.02116 · score 4retrieval, attention
  678. Building a privacy-preserving Federated Recommender system for mobile devices · cs.LG · arXiv 2605.22924 · score 4serving
  679. On the Role of Inductive Bias in Time-Series Pretraining: A Case Study in Learning Generalizable Representations for Clinical Time Series · cs.LG · arXiv 2605.26194 · score 4attention, transformer
  680. Density-aware Sample-specific Attack · cs.LG · arXiv 2605.27809 · score 4fine-tun, post-train
  681. Adversarial Robustness in One-Stage Learning-to-Defer · stat.ML · arXiv 2510.10988 · score 4serving
  682. Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds · cs.CV · arXiv 2510.27391 · score 4attention, transformer
  683. Envy-Free Allocation of Indivisible Goods via Noisy Queries · cs.GT · arXiv 2602.06361 · score 4agent
  684. RAFI – A Ray/Work Forwarding Infrastructure for Data Parallel Multi-Node/Multi-GPU Computing · cs.DC · arXiv 2605.30294 · score 4gpu, cuda
  685. A Quick and Exact Method for Distributed Quantile Computation · cs.DC · arXiv 2511.12025 · score 4rag, latency
  686. A Secure, Manifest-Based Framework for Delegated Privilege Promotion · cs.CR · arXiv 2605.28991 · score 4serving
  687. LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers · cs.LG · arXiv 2605.29005 · score 3inference
  688. A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router · math.DS · arXiv 2605.29121 · score 3moe
  689. Stochastic Lifting for Generating Trajectories of Stochastic Physical Systems · cs.LG · arXiv 2605.29194 · score 3inference
  690. Causal Disentanglement-Inspired Degradation Representation Learning for Full-Reference Image Quality Assessment · cs.CV · arXiv 2604.21654 · score 3inference
  691. Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification · stat.ML · arXiv 2605.12208 · score 3inference
  692. Auditing Training Data in Generative Music Models via Black-Box Membership Inference · cs.LG · arXiv 2605.29202 · score 3inference
  693. From Short Histories to Long Futures: Horizon-Aware Graph Neural Networks for Long Horizon Forecasting · cs.LG · arXiv 2605.29952 · score 3inference
  694. Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption · cs.LG · arXiv 2605.30089 · score 3inference
  695. When, why, and how do diffusion posterior samplers fail? A finite-sample lens · cs.LG · arXiv 2605.30330 · score 3inference
  696. Mixing Vector Model for Copolymer Inference via Mixed Integer Linear Programming · cs.LG · arXiv 2605.29329 · score 3inference
  697. Wasserstein Contraction of Coordinate Ascent Variational Inference · stat.ML · arXiv 2605.30253 · score 3inference
  698. Cooperative Variance Estimation and Bayesian Neural Networks for Disentangling Aleatoric and Epistemic Uncertainties · cs.LG · arXiv 2505.02743 · score 3inference
  699. Adaptive Exponential Integration for Stable Gaussian Mixture Black-Box Variational Inference · cs.LG · arXiv 2601.14855 · score 3inference
  700. Riemannian AmbientFlow: Towards Simultaneous Manifold Learning and Generative Modeling from Corrupted Data · cs.LG · arXiv 2601.18728 · score 3inference
  701. Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning · cs.LG · arXiv 2605.01663 · score 3inference
  702. Uncertainty Estimation via Hyperspherical Confidence Mapping · cs.LG · arXiv 2605.05964 · score 3inference
  703. Inpainting physics: self-supervised learning for context-driven fluid simulation · cs.LG · arXiv 2605.08832 · score 3inference
  704. Matryoshka Concept Bottleneck Models · cs.LG · arXiv 2605.20612 · score 3inference
  705. Enhancing Membership Inference Attacks on Diffusion Models from a Frequency-Domain Perspective · cs.CR · arXiv 2505.20955 · score 3inference
  706. Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models · stat.ML · arXiv 2605.28488 · score 3inference
  707. Constant Depth Threshold Circuits For Exhaustive Epistasis Detection · cs.AR · arXiv 2605.29719 · score 3parallelism
  708. Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics · cs.AI · arXiv 2605.29078 · score 2rag
  709. Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI · cs.AI · arXiv 2605.29240 · score 2attention
  710. Rubric-Guided Process Reward for Stepwise Model Routing · cs.AI · arXiv 2605.29310 · score 2reasoning
  711. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet · cs.AI · arXiv 2605.29358 · score 2transformer
  712. Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion · cs.AI · arXiv 2605.29591 · score 2reasoning
  713. FHRFormer: A Self-Supervised Masked Transformer Framework for Fetal Heart Rate Time-Series Inpainting and Forecasting · cs.AI · arXiv 2605.29695 · score 2transformer
  714. From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks · cs.AI · arXiv 2605.29768 · score 2retrieval
  715. Quantifying and Optimizing Simplicity via Polynomial Representations · cs.AI · arXiv 2605.29823 · score 2fine-tun
  716. On the Geometry of Games and their Solvers · cs.AI · arXiv 2605.29919 · score 2rag
  717. A comparative study of transformer-based embeddings for topic coherence · cs.CL · arXiv 2605.28832 · score 2transformer
  718. Transcribing Children’s Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions · cs.CL · arXiv 2605.28833 · score 2fine-tun
  719. FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks · cs.LG · arXiv 2605.29001 · score 2reasoning
  720. Multi-Resolution End-to-End Deep Neural Network for Optimizing Latency-Accuracy Tradeoff in Autonomous Driving · cs.RO · arXiv 2605.29138 · score 2latency
  721. Toward Ethical Facial Age Estimation: A Generalized Zero-Shot Benchmark Without Training on Children’s Data · cs.CV · arXiv 2605.29230 · score 2rag
  722. Extreme dynamic symmetry enables omnidirectional and multifunctional robots · cs.RO · arXiv 2605.29254 · score 2rag
  723. KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs · cs.LG · arXiv 2605.29259 · score 2rag
  724. Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts · cs.LG · arXiv 2605.29283 · score 2rag
  725. DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework · cs.AI · arXiv 2605.29428 · score 2gpu
  726. How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions · cs.LG · arXiv 2605.29448 · score 2rag
  727. Evolutionary Rule Extraction from Corporate Default Prediction Models · cs.NE · arXiv 2605.29478 · score 2rag
  728. Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection · cs.CR · arXiv 2605.29526 · score 2rag
  729. Data filtering methods for training language models · cs.CL · arXiv 2605.29807 · score 2fine-tun
  730. Evaluating Skill and Stability of ArchesWeather and ArchesWeatherGen under Multi-Decadal Climate Simulations · cs.AI · arXiv 2605.29976 · score 2rag
  731. Test Time Training for Supervised Causal Learning · cs.LG · arXiv 2605.30015 · score 2rag
  732. Masked Diffusion Modeling for Anomaly Detection · cs.LG · arXiv 2605.30046 · score 2rag
  733. A Predictive Law for On-Policy Self-Distillation From World Feedback · cs.LG · arXiv 2605.30070 · score 2post-train
  734. Self-Trained Verification for Training- and Test-Time Self-Improvement · cs.LG · arXiv 2605.30290 · score 2reasoning
  735. Reasoning with Sampling: Cutting at Decision Points · cs.LG · arXiv 2605.30327 · score 2reasoning
  736. TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech · cs.AI · arXiv 2601.11178 · score 2reasoning
  737. Recurrent Structural Policy Gradient for Partially Observable Mean Field Games · cs.AI · arXiv 2602.20141 · score 2rag
  738. Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases · cs.AI · arXiv 2603.07916 · score 2rag
  739. A Foundation Model for Zero-Shot Logical Rule Induction · cs.AI · arXiv 2605.04916 · score 2reasoning
  740. Learning A Simulation-based Visual Policy for Real-world Peg In Unseen Holes · cs.RO · arXiv 2205.04297 · score 2fine-tun
  741. A Composable Multimodal Framework for cine CMR-Text-Driven Prediction of Heart Failure Outcomes · cs.LG · arXiv 2502.16548 · score 2rag
  742. Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data · cs.SD · arXiv 2502.20838 · score 2rag
  743. Taming Data Challenges in ML-based Security Tasks Using Generative AI · cs.CR · arXiv 2507.06092 · score 2attention
  744. MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models · cs.CV · arXiv 2507.09574 · score 2attention
  745. Page image classification for content-specific data processing · cs.IR · arXiv 2507.21114 · score 2rag
  746. Approximate Proportionality in Online Fair Division · cs.GT · arXiv 2508.03253 · score 2attention
  747. The Impact of Semantic Pairs on Self-Supervised Representation Learning · cs.LG · arXiv 2510.08722 · score 2rag
  748. MiAD: Mirage Atom Diffusion for De Novo Crystal Generation · cs.LG · arXiv 2511.14426 · score 2rag
  749. Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach · cs.CV · arXiv 2511.19316 · score 2fine-tun
  750. BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models · cs.LG · arXiv 2512.00283 · score 2rag
  751. E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving · cs.CV · arXiv 2512.04733 · score 2reasoning
  752. Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models · cs.LG · arXiv 2601.14758 · score 2post-train
  753. S-MARC: Causal Streaming Reasoning for Full-Duplex Conversational Behavior Modeling · cs.CL · arXiv 2602.11065 · score 2reasoning
  754. OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model · cs.SD · arXiv 2602.12304 · score 2attention
  755. Post-Training Language Models for Crosslingual Consistency · cs.CL · arXiv 2603.04678 · score 2post-train
  756. BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps · cs.SD · arXiv 2604.19532 · score 2transformer
  757. MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio · cs.SD · arXiv 2605.00969 · score 2reasoning
  758. Aes3D: Aesthetic Assessment in 3D Gaussian Splatting · cs.CV · arXiv 2605.05155 · score 2attention
  759. AttenA+: Rectifying Action Inequality in Robotic Foundation Models · cs.RO · arXiv 2605.13548 · score 2attention
  760. The Distillation Game: Adaptive Attacks & Efficient Defenses · cs.LG · arXiv 2605.22737 · score 2reasoning
  761. Coarse-to-Fine Domain Incremental Learning with Attentive Distillation for Mining Footprint Segmentation in Multispectral Imagery · cs.CV · arXiv 2605.24460 · score 2rag
  762. HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos · cs.RO · arXiv 2605.24934 · score 2rag
  763. Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4 · cs.LO · arXiv 2605.25556 · score 2rag
  764. Bridging Classification and Reconstruction: Cooperative Time Series Anomaly Detection · cs.LG · arXiv 2605.26193 · score 2rag
  765. ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation · cs.LG · arXiv 2605.28293 · score 2rag
  766. From Data to Insights: Exploring Program-of-Thoughts Prompting for Chart Summarization · cs.CL · arXiv 2605.28874 · score 2reasoning
  767. Prompt-Level Reward Specifications for Open-Ended Post-Training · cs.CL · arXiv 2605.29275 · score 2post-train
  768. Attention Asymmetry in AI Layoff Discourse on X: A Computational Analysis of Capital vs Labour Amplification · cs.CL · arXiv 2605.29367 · score 2attention
  769. World Models in Words: Auditing Physical State-Transition Commitments in Vision-Language Models · cs.CL · arXiv 2605.29585 · score 2reasoning
  770. Metric-Dependent Annotation Saturation for Learning from Label Distributions · cs.CL · arXiv 2605.29797 · score 2fine-tun
  771. Early Detection of Misinformation for Infodemic Management: A Domain Adaptation Approach · cs.CL · arXiv 2406.10238 · score 2rag
  772. What Exactly do Children Receive in Language Acquisition? A Case Study on CHILDES with Automated Detection of Filler-Gap Dependencies · cs.CL · arXiv 2603.02082 · score 2rag
  773. X-GS: An Extensible Framework for Perceiving and Thinking via 3D Gaussian Splatting · cs.CV · arXiv 2603.09632 · score 2rag
  774. Pre-Registering the Detectable Effect: A Paired-MDE Budget for 4-bit Quantization Benchmarks, with a Pilot Audit · cs.LG · arXiv 2605.28873 · score 2quantization
  775. Spectral Guidance for Flexible and Efficient Control of Diffusion Models · cs.LG · arXiv 2605.28900 · score 2rag
  776. Sequential Physics-Constrained Neural Operator Forward Modeling for the Norne Reservoir System · cs.LG · arXiv 2605.28909 · score 2gpu
  777. Cycle-Space Informed Detection of Autoencoded Blind False Data Injection Attacks on Power Systems · cs.LG · arXiv 2605.28912 · score 2rag
  778. Designing Active Tether-Net Systems for Space Debris Capture with Graph-Learning-Aided Mixed-Combinatorial Optimization · cs.LG · arXiv 2605.29021 · score 2fine-tun
  779. Model Merging by Output-Space Projection · cs.LG · arXiv 2605.29101 · score 2fine-tun
  780. Bridging Chemists and AI: An Expert-Augmented Framework for Interpretable Route Evaluation · cs.LG · arXiv 2605.29108 · score 2fine-tun
  781. PROTOCOL: Late Interaction Retrieval for Protein Homolog Search · cs.LG · arXiv 2605.29158 · score 2retrieval
  782. Traditional machine learning vs. deep learning from dynamic graph representations of proteins’ 3D folds in the task of protein structure classification · cs.LG · arXiv 2605.29228 · score 2rag
  783. Robust Frequency-Calibrated Virtual EEG Channel Generation from Four Frontal Electrodes for Wearable EEG Augmentation · cs.LG · arXiv 2605.29263 · score 2attention
  784. Information-Directed Offline-to-Online Reinforcement Learning · cs.LG · arXiv 2605.29405 · score 2rag
  785. Convex Basins in Single-Index Model Loss Landscapes: Applications to Robust Recovery under Strong Adversarial Corruption · cs.LG · arXiv 2605.29497 · score 2retrieval
  786. Realistic honeypot evaluations for scheming propensity · cs.LG · arXiv 2605.29729 · score 2rag
  787. Gated Graph Attention Networks with Learnable Temperature · cs.LG · arXiv 2605.29803 · score 2attention
  788. OVA-IB: One vs All Information Bottleneck for Multi-Modal Alignment · cs.LG · arXiv 2605.29900 · score 2retrieval
  789. Treatment-Conditioned Diffusion for Forecasting Neurodegenerative Disease Progression · cs.LG · arXiv 2605.29932 · score 2transformer
  790. Ridge Regression from Poisson Resetting: A Renewal Perspective on Spectral Regularization · cs.LG · arXiv 2605.30059 · score 2rag
  791. Q-ANCHOR: Federated Quantum Learning with ZNE-guided Correction · cs.LG · arXiv 2605.30075 · score 2rag
  792. Chess-World-Model: A 10M-Game Benchmark for Exact State Tracking from Chess Move Sequences · cs.LG · arXiv 2605.30100 · score 2transformer
  793. Striding Across Reynolds Numbers: Representation Geometry in Neural PDE Generalisation · cs.LG · arXiv 2605.30112 · score 2retrieval
  794. Learning to Extrapolate to New Tasks: A Relational Approach to Task Extrapolation · cs.LG · arXiv 2605.30132 · score 2fine-tun
  795. Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts · cs.LG · arXiv 2605.30184 · score 2transformer
  796. ExDBSCAN: Explaining DBSCAN with Counterfactual Reasoning – Additional Material · cs.LG · arXiv 2605.30225 · score 2reasoning
  797. Neural Operator-Based Surrogate Model for CFD: Helical Coil Steam Generator in Small Modular Reactor · cs.LG · arXiv 2605.30277 · score 2rag
  798. WASHH: An Anchor-Aware Whale-Guided Selection Hyper-Heuristic for Continuous Optimization and SVC Configuration · cs.NE · arXiv 2605.28844 · score 2rag
  799. Financially Guided Deep Portfolio Optimization · cs.LG · arXiv 2605.28853 · score 2attention
  800. Lightweight Complementary-Cue Fusion for Robust Video Face Forgery Detection · cs.CV · arXiv 2605.29092 · score 2rag
  801. ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving · cs.CR · arXiv 2605.29114 · score 2reasoning
  802. Real-Time Retargeting Using Controllability Boundary for Chandrayaan-3 Lunar Landing · eess.SY · arXiv 2605.29412 · score 2rag
  803. Deep Optimal Individualized Treatment Rules for Bivariate Survival Outcomes via Adaptive Prediction-Powered Learning · stat.ML · arXiv 2605.29464 · score 2rag
  804. The Complexity of Verifying Feedforward Neural Networks in Quantised Settings · cs.CC · arXiv 2605.29537 · score 2reasoning
  805. Parameter-Efficient Subspace Decoupling ViT for Mitigating Multi-Task Negative Transfer in Histological Scoring · cs.CV · arXiv 2605.29852 · score 2transformer
  806. Gesture-Aware Indoor THz ISAC Systems for Adaptive Resource Allocation · cs.IT · arXiv 2605.29913 · score 2rag
  807. Visual Spatial Learning: Single-Field Spatial Interpolation Using Convolutional Neural Networks · stat.ML · arXiv 2605.30167 · score 2rag
  808. Unveiling the Visual Counting Bottleneck in Vision-Language Models · cs.MM · arXiv 2605.30170 · score 2reasoning
  809. DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation · cs.RO · arXiv 2605.30350 · score 2rag
  810. An Empirical Study of the Influence of Adversarial Fine-Tuning on Compressed Neural Networks · cs.LG · arXiv 2403.09441 · score 2fine-tun
  811. A Quotient Homology Theory of Representation in Neural Networks · cs.LG · arXiv 2502.01360 · score 2rag
  812. Connecting Independently Trained Modes via Layer-Wise Connectivity · cs.LG · arXiv 2505.02604 · score 2transformer
  813. Active Learning for Machine Learning Driven Molecular Dynamics · cs.LG · arXiv 2509.17208 · score 2rag
  814. FedBiCross: Personalized One-Shot Federated Learning on Medical Images · cs.LG · arXiv 2601.01901 · score 2rag
  815. Achieving Linear Speedup for Composite Federated Learning · cs.LG · arXiv 2602.03357 · score 2rag
  816. Computationally Efficient Replicable Learning of Parities and Applications · cs.LG · arXiv 2602.09499 · score 2rag
  817. Collaborative Threshold Watermarking · cs.LG · arXiv 2602.10765 · score 2fine-tun
  818. Localizing Memorized Regions in Diffusion Models via Coordinate-Wise Curvature Differences · cs.LG · arXiv 2605.26756 · score 2attention
  819. Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models · cs.LG · arXiv 2605.27975 · score 2fine-tun
  820. MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball · cs.GT · arXiv 2506.04602 · score 2rag
  821. A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization · math.OC · arXiv 2506.20344 · score 2rag
  822. SpeedCP: Fast Kernel-based Conditional Conformal Prediction · stat.ME · arXiv 2509.24100 · score 2rag
  823. Contrastive Representation Regularization for Vision-Language-Action Models · cs.RO · arXiv 2510.01711 · score 2rag
  824. Permutation-Invariant Spectral Learning via Dyson Diffusion · stat.ML · arXiv 2510.08535 · score 2rag
  825. Calibrating Generative Models to Distributional Constraints · stat.ML · arXiv 2510.10020 · score 2fine-tun
  826. Securing SIM-Assisted Wireless Networks via Quantum Reinforcement Learning · cs.NI · arXiv 2602.13238 · score 2rag
  827. Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression · stat.ME · arXiv 2604.13410 · score 2rag
  828. A Deep Learning Model for Battery State Prediction towards Intelligent Energy Management · eess.SP · arXiv 2605.00898 · score 2rag
  829. Paris 2.0: A Decentralized Diffusion Model for Video Generation · cs.CV · arXiv 2605.26064 · score 2gpu
  830. Design and Implementation of a Serverless MapReduce Framework for Scalable Data Pipelines · cs.DC · arXiv 2605.29573 · score 2rag
  831. PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration · cs.DC · arXiv 2605.29728 · score 2gpu
  832. Capsule: Efficient Player Isolation for Datacenters · cs.DC · arXiv 2506.11483 · score 2gpu
  833. Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems · cs.AR · arXiv 2605.29994 · score 2latency
  834. elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search · cs.AR · arXiv 2605.30019 · score 2latency
  835. Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory · cs.AR · arXiv 2603.06951 · score 2rag