<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>2026-04-22 Paper Digest on JXIN&#39;s Home</title>
    <link>https://ftxj.github.io/posts/2026-04-22/</link>
    <description>Recent content in 2026-04-22 Paper Digest on JXIN&#39;s Home</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Mon, 27 Apr 2026 05:17:00 +0000</lastBuildDate>
    <atom:link href="https://ftxj.github.io/posts/2026-04-22/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks</title>
      <link>https://ftxj.github.io/posts/2026-04-22/10-co-evolving-llm-decision-and-skill-bank-agents-for-long-hori/</link>
      <pubDate>Mon, 27 Apr 2026 05:17:00 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/10-co-evolving-llm-decision-and-skill-bank-agents-for-long-hori/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20987v1&#34;&gt;2604.20987&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20987v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, retrieval, rag, reasoning&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;COSPLAY is a co-evolution framework pairing an LLM decision agent with a learnable skill bank: the decision agent retrieves skills to act, while a skill-pipeline agent mines reusable skills from unlabeled rollouts. An 8B model beats four frontier LLM baselines by &amp;gt;25% average reward on six game environments.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction</title>
      <link>https://ftxj.github.io/posts/2026-04-22/09-agentic-ai-for-personalized-physiotherapy-a-multi-agent-fram/</link>
      <pubDate>Mon, 27 Apr 2026 05:16:25 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/09-agentic-ai-for-personalized-physiotherapy-a-multi-agent-fram/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.21154v1&#34;&gt;2604.21154&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.21154v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, agent, agentic, multi-agent, rag&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;Proposes a four-agent system that parses clinical notes, generates patient-specific exercise videos, tracks poses in real time, and delivers corrective feedback for at-home physiotherapy. The paper is largely architectural, presenting a prototype and evaluation plan rather than clinical results.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Tele-rehabilitation gap stems from static video libraries and generic avatars ignoring patient-specific constraints.&lt;/li&gt;&#xA;&lt;li&gt;A Multi-Agent System (MAS) can close the loop by combining generative video, pose estimation, and autonomous feedback.&lt;/li&gt;&#xA;&lt;li&gt;Four specialized micro-agents cover extraction, synthesis, vision, and diagnostics.&lt;/li&gt;&#xA;&lt;li&gt;Unstructured clinical notes can be turned into kinematic constraints that condition downstream generation.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;A pipeline of four micro-agents:&lt;/p&gt;</description>
    </item>
    <item>
      <title>EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation</title>
      <link>https://ftxj.github.io/posts/2026-04-22/08-evoagent-an-evolvable-agent-framework-with-skill-learning-an/</link>
      <pubDate>Mon, 27 Apr 2026 05:15:48 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/08-evoagent-an-evolvable-agent-framework-with-skill-learning-an/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20133v2&#34;&gt;2604.20133&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20133v2&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Aimin Zhang, Jiajing Guo, Fuwei Jia, Chen Lv, Boyu Wang, Fangzheng Li&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, multi-agent, rag&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;EvoAgent is an evolvable LLM agent framework combining structured skill learning, hierarchical sub-agent delegation, and a three-layer memory. On real-world foreign-trade tasks with GPT5.2, it lifts a five-dimensional LLM-as-Judge score by ~28%.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Skills modeled as multi-file structured capability units with triggers and evolutionary metadata.&lt;/li&gt;&#xA;&lt;li&gt;User-feedback-driven closed loop for continuous skill generation and optimization.&lt;/li&gt;&#xA;&lt;li&gt;Three-stage skill matching plus three-layer memory architecture for long-term accumulation.&lt;/li&gt;&#xA;&lt;li&gt;Hierarchical sub-agent delegation enabling dynamic task decomposition.&lt;/li&gt;&#xA;&lt;li&gt;Agent performance depends on model–architecture synergy, not just base model strength.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;Each skill is a structured artifact (multiple files) carrying triggering logic and evolutionary metadata, so the system can decide when to invoke it and how to mutate it over time. A three-stage matcher selects skills for an incoming task; a three-layer memory separates short-term, working, and long-term context. 
A hierarchical delegation mechanism spawns sub-agents for decomposed subtasks, and a user-feedback closed loop drives skill creation and refinement.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving</title>
      <link>https://ftxj.github.io/posts/2026-04-22/07-dual-cluster-memory-agent-resolving-multi-paradigm-ambiguity/</link>
      <pubDate>Mon, 27 Apr 2026 05:14:56 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/07-dual-cluster-memory-agent-resolving-multi-paradigm-ambiguity/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20183v1&#34;&gt;2604.20183&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20183v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xinyu Zhang, Yuchen Wan, Boxuan Zhang, Zesheng Yang, Lingling Zhang, Bifan Wei, Jun Liu&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CL&lt;/code&gt; · all: cs.CL&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, rag, reasoning, inference&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;DCM-Agent is a training-free framework that resolves structural ambiguity in LLM-based optimization problem solving by maintaining dual clusters of historical solutions (modeling + coding), distilled into Approach/Checklist/Pitfall knowledge, and using them for memory-augmented inference.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Optimization problems suffer from multi-paradigm ambiguity that confuses LLMs.&lt;/li&gt;&#xA;&lt;li&gt;Split memory into two clusters: modeling and coding.&lt;/li&gt;&#xA;&lt;li&gt;Distill each cluster into three structured knowledge types: Approach, Checklist, Pitfall.&lt;/li&gt;&#xA;&lt;li&gt;Use memory at inference for path navigation, error repair, and adaptive switching.&lt;/li&gt;&#xA;&lt;li&gt;Observed &amp;ldquo;knowledge inheritance&amp;rdquo;: memory from larger models lifts smaller models.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;The Dual-Cluster Memory Construction step routes prior solutions into modeling vs. coding clusters, then distills generalizable guidance into structured Approach / Checklist / Pitfall entries. 
At inference, the agent retrieves relevant memory to pick a reasoning path, detects and repairs errors, and adaptively switches paradigms. The entire pipeline is training-free, relying on prompting plus a structured memory bank.&lt;/p&gt;</description>
    </item>
    <item>
      <title>FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving</title>
      <link>https://ftxj.github.io/posts/2026-04-22/06-faser-fine-grained-phase-management-for-speculative-decoding/</link>
      <pubDate>Mon, 27 Apr 2026 05:14:26 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/06-faser-fine-grained-phase-management-for-speculative-decoding/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20503v1&#34;&gt;2604.20503&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20503v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Wenyan Chen, Chengzhi Lu, Yanying Lin, Dmitrii Ustiugov&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.DC&lt;/code&gt; · all: cs.DC&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; llm, inference, serving, speculative decoding, gpu, throughput, latency&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;FASER is a fine-grained speculative-decoding scheduler for dynamic LLM serving that tunes speculative length per request, prunes rejected tokens early, and spatially overlaps draft and verification phases, yielding up to 53% higher throughput and 1.92× lower latency over SOTA in vLLM.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Coarse-grained, batch-level speculative decoding (SD) wastes GPU cycles under both low and high load.&lt;/li&gt;&#xA;&lt;li&gt;Speculative length should be a per-request knob inside a continuous batch, not a global constant.&lt;/li&gt;&#xA;&lt;li&gt;Verification can be chunked into &amp;ldquo;frontiers&amp;rdquo; and overlapped with drafting via spatial multiplexing.&lt;/li&gt;&#xA;&lt;li&gt;Rejected tokens can be pruned mid-verification to avoid wasted compute.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;FASER extends vLLM with three mechanisms: (1) dynamic per-request speculative length based on acceptance behavior within a continuous batch; (2) early pruning that terminates verification for tokens already rejected, reclaiming GPU work; (3) frontier-based verification that splits the verify pass into chunks and co-executes them with draft kernels using fine-grained 
spatial multiplexing for low interference.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows</title>
      <link>https://ftxj.github.io/posts/2026-04-22/05-cooperative-profiles-predict-multi-agent-llm-team-performanc/</link>
      <pubDate>Mon, 27 Apr 2026 05:13:53 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/05-cooperative-profiles-predict-multi-agent-llm-team-performanc/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20658v1&#34;&gt;2604.20658&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20658v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Shivani Kumar, Adarsh Bharathwaj, David Jurgens&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CL&lt;/code&gt; · all: cs.CL&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, multi-agent, reasoning, gpu&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;Authors benchmark 35 open-weight LLMs on six behavioral-economics games and show that the resulting &amp;ldquo;cooperative profiles&amp;rdquo; predict downstream team performance in AI-for-Science workflows under shared budget constraints, offering a cheap diagnostic for multi-agent deployment.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Cooperative disposition is a distinct, measurable LLM property, not reducible to general capability.&lt;/li&gt;&#xA;&lt;li&gt;Behavioral-economics games isolate cooperation mechanisms that transfer to realistic multi-agent science tasks.&lt;/li&gt;&#xA;&lt;li&gt;Models favoring multiplicative team production over greedy strategies yield better scientific reports.&lt;/li&gt;&#xA;&lt;li&gt;Game-based screening can precede expensive multi-agent rollouts.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Evaluate 35 open-weight LLMs across six behavioral-economics games targeting distinct cooperation mechanisms (coordination, investment, resource sharing).&lt;/li&gt;&#xA;&lt;li&gt;Derive per-model &amp;ldquo;cooperative profiles&amp;rdquo; from game behavior.&lt;/li&gt;&#xA;&lt;li&gt;Deploy LLM teams in an AI-for-Science pipeline: collaboratively analyze data, build models, 
    and write scientific reports under shared budgets (e.g., GPU/credit caps).&lt;/li&gt;&#xA;&lt;li&gt;Regress downstream outcomes on cooperative profile features while controlling for confounds (likely model size and general-ability benchmarks).&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;experiments&#34;&gt;Experiments&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Models: 35 open-weight LLMs.&lt;/li&gt;&#xA;&lt;li&gt;Games: six behavioral-economics tasks (the abstract is not specific, but these likely include public-goods, trust, and coordination variants).&lt;/li&gt;&#xA;&lt;li&gt;Downstream task: multi-agent AI-for-Science workflow with shared constraints.&lt;/li&gt;&#xA;&lt;li&gt;Metrics: report accuracy, quality, and completion.&lt;/li&gt;&#xA;&lt;li&gt;Baselines / controls: general-ability factors partialled out.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;results&#34;&gt;Results&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Cooperative profiles robustly predict downstream accuracy, quality, and completion.&lt;/li&gt;&#xA;&lt;li&gt;Effect persists after controlling for multiple confounding factors.&lt;/li&gt;&#xA;&lt;li&gt;Headline numerical effect sizes not given in the abstract.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;why-it-matters&#34;&gt;Why It Matters&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Provides a fast, inexpensive screening tool for multi-agent LLM deployments where coordination and budget-sharing matter.&lt;/li&gt;&#xA;&lt;li&gt;Reframes multi-agent selection beyond raw benchmark scores toward cooperative disposition.&lt;/li&gt;&#xA;&lt;li&gt;Useful for agent/infra teams building scientific, engineering, or tool-using LLM collectives.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;connections-to-prior-work&#34;&gt;Connections to Prior Work&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Behavioral-economics probes of LLMs (trust games, ultimatum, public-goods studies).&lt;/li&gt;&#xA;&lt;li&gt;Multi-agent LLM frameworks (AutoGen, MetaGPT, ChatDev, 
AI-Scientist).&lt;/li&gt;&#xA;&lt;li&gt;Work on LLM &amp;ldquo;personality&amp;rdquo; / social-preference elicitation.&lt;/li&gt;&#xA;&lt;li&gt;Emergent cooperation and game-theoretic evaluations in RL agents.&lt;/li&gt;&#xA;&lt;li&gt;Scientific-writing and data-analysis agent benchmarks.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;open-questions&#34;&gt;Open Questions&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Which specific games carry the most predictive signal, and do they generalize beyond AI-for-Science?&lt;/li&gt;&#xA;&lt;li&gt;Does the cooperative profile stay stable under prompting, fine-tuning, or RLHF interventions?&lt;/li&gt;&#xA;&lt;li&gt;Are closed-weight frontier models (GPT-4.x, Claude, Gemini) consistent with the 35-model findings?&lt;/li&gt;&#xA;&lt;li&gt;Can cooperative disposition be deliberately trained or aligned, and at what cost to single-agent capability?&lt;/li&gt;&#xA;&lt;li&gt;How do heterogeneous teams (mixing cooperators and defectors) behave versus homogeneous ones?&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;figures&#34;&gt;Figures&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; Page 2 (rendered)&lt;/p&gt;</description>
    </item>
    <item>
      <title>Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models</title>
      <link>https://ftxj.github.io/posts/2026-04-22/04-breaking-mcp-with-function-hijacking-attacks-novel-threats-f/</link>
      <pubDate>Mon, 27 Apr 2026 05:13:18 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/04-breaking-mcp-with-function-hijacking-attacks-novel-threats-f/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20994v1&#34;&gt;2604.20994&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20994v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis, Seshu Tirupathi, John D. Kelleher&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CR&lt;/code&gt; · all: cs.AI, cs.CL, cs.CR&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, reasoning, attention&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;This paper introduces Function Hijacking Attacks (FHA), a novel adversarial technique that manipulates agentic LLMs&amp;rsquo; tool selection to force invocation of attacker-chosen functions, achieving 70-100% attack success rates across five models on the BFCL benchmark, largely independent of query semantics.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems</title>
      <link>https://ftxj.github.io/posts/2026-04-22/03-automatic-ontology-construction-using-llms-as-an-external-la/</link>
      <pubDate>Mon, 27 Apr 2026 05:12:44 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/03-automatic-ontology-construction-using-llms-as-an-external-la/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20795v1&#34;&gt;2604.20795&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20795v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Pavel Salovskii, Iuliia Gorshkova&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, retrieval, rag, reasoning, inference&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;The paper proposes a hybrid architecture augmenting LLMs with an external RDF/OWL ontological memory layer, automatically constructed from heterogeneous sources, to enable persistent, verifiable, and semantically grounded reasoning beyond vector-based RAG.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;LLMs suffer from weak long-term memory, poor structure, and unreliable multi-step reasoning.&lt;/li&gt;&#xA;&lt;li&gt;An external ontology (RDF/OWL knowledge graph) acts as verifiable memory and planning substrate.&lt;/li&gt;&#xA;&lt;li&gt;Automated pipeline builds and maintains the ontology from documents, APIs, and dialogue logs.&lt;/li&gt;&#xA;&lt;li&gt;SHACL/OWL constraints turn inference into a generation–verification–correction loop.&lt;/li&gt;&#xA;&lt;li&gt;Hybrid inference combines vector retrieval, graph reasoning, and external tool calls.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;The pipeline extracts entities and relations from heterogeneous inputs, normalizes them, and generates RDF triples. Triples are validated against SHACL shapes and OWL axioms, then merged into a continuously updated knowledge graph. 
At inference time, the LLM conditions on a composite context fusing vector-retrieved passages, graph subqueries, and tool outputs. Generated answers are checked against ontology constraints; violations trigger correction, yielding a closed verify-and-repair loop.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HaS: Accelerating RAG through Homology-Aware Speculative Retrieval</title>
      <link>https://ftxj.github.io/posts/2026-04-22/02-has-accelerating-rag-through-homology-aware-speculative-retr/</link>
      <pubDate>Mon, 27 Apr 2026 05:12:02 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/02-has-accelerating-rag-through-homology-aware-speculative-retr/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20452v1&#34;&gt;2604.20452&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20452v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Peng Peng, Weiwei Lin, Wentai Wu, Xinyang Wang, Yongheng Liu&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.IR&lt;/code&gt; · all: cs.CL, cs.IR&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, retrieval, rag, inference, latency&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;HaS accelerates Retrieval-Augmented Generation by speculatively retrieving from a restricted scope, then validating candidates via &amp;ldquo;homologous query re-identification&amp;rdquo;: checking whether the incoming query matches a previously seen one. This bypasses full-database search for repeat-like queries, cutting latency by 24–37% with 1–2% accuracy loss.&lt;/p&gt;</description>
    </item>
    <item>
      <title>SAKE: Self-aware Knowledge Exploitation-Exploration for Grounded Multimodal Named Entity Recognition</title>
      <link>https://ftxj.github.io/posts/2026-04-22/01-sake-self-aware-knowledge-exploitation-exploration-for-groun/</link>
      <pubDate>Mon, 27 Apr 2026 05:11:32 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-22/01-sake-self-aware-knowledge-exploitation-exploration-for-groun/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20146v1&#34;&gt;2604.20146&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20146v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Jielong Tang, Xujie Yuan, Jiayang Liu, Jianxing Yu, Xiao Dong, Lin Chen, Yunlai Teng, Shimin Di, Jian Yin&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.IR&lt;/code&gt; · all: cs.CL, cs.IR&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, tool-use, retrieval, reasoning, chain-of-thought, serving, fine-tun&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;SAKE is an end-to-end agentic framework for Grounded Multimodal Named Entity Recognition (GMNER) that blends internal MLLM knowledge with external retrieval via self-aware reasoning, deciding when to invoke search tools to handle long-tailed and unseen entities on social media.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
