<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>2026-04-21 Paper Digest on JXIN&#39;s Home</title>
    <link>https://ftxj.github.io/posts/2026-04-21/</link>
    <description>Recent content in 2026-04-21 Paper Digest on JXIN&#39;s Home</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Mon, 27 Apr 2026 05:22:40 +0000</lastBuildDate>
    <atom:link href="https://ftxj.github.io/posts/2026-04-21/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps</title>
      <link>https://ftxj.github.io/posts/2026-04-21/10-cyber-defense-benchmark-agentic-threat-hunting-evaluation-fo/</link>
      <pubDate>Mon, 27 Apr 2026 05:22:40 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/10-cyber-defense-benchmark-agentic-threat-hunting-evaluation-fo/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19533v3&#34;&gt;2604.19533&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19533v3&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Alankrit Chona, Igor Kozlov, Ambuj Kumar&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CR&lt;/code&gt; · all: cs.AI, cs.CR&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, rag&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;Cyber Defense Benchmark evaluates LLM agents on open-ended threat hunting over raw Windows logs via iterative SQL queries. Across five frontier models, all fail dramatically — the best (Claude Opus 4.6) flags only 3.8% of malicious events, and none meet the &amp;gt;=50% per-tactic recall bar for unsupervised SOC deployment.&lt;/p&gt;
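&lt;p&gt;A minimal sketch of the deployment criterion quoted above: recall computed per tactic over the labelled malicious events, with every tactic required to clear the &amp;gt;=50% bar. The event tuples and helper names (&lt;code&gt;per_tactic_recall&lt;/code&gt;, &lt;code&gt;meets_deployment_bar&lt;/code&gt;) are illustrative assumptions, not the benchmark&amp;rsquo;s actual harness.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# Per-tactic recall: of the known-malicious events under each tactic,
# what fraction did the agent flag? The deployment bar requires every
# tactic to clear 50%. Field names here are assumptions.
from collections import defaultdict

def per_tactic_recall(malicious_events, flagged_ids):
    """malicious_events: list of (event_id, tactic); flagged_ids: ids the agent flagged."""
    total, hit = defaultdict(int), defaultdict(int)
    for event_id, tactic in malicious_events:
        total[tactic] += 1
        hit[tactic] += event_id in flagged_ids
    return {t: hit[t] / total[t] for t in total}

def meets_deployment_bar(recalls, bar=0.5):
    return all(r &amp;gt;= bar for r in recalls.values())

events = [(1, "persistence"), (2, "persistence"), (3, "exfiltration")]
recalls = per_tactic_recall(events, flagged_ids={1, 3})
print(recalls, meets_deployment_bar(recalls))
&lt;/code&gt;&lt;/pre&gt;</description>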
    </item>
    <item>
      <title>TRN-R1-Zero: Text-rich Network Reasoning via LLMs with Reinforcement Learning Only</title>
      <link>https://ftxj.github.io/posts/2026-04-21/09-trn-r1-zero-text-rich-network-reasoning-via-llms-with-reinfo/</link>
      <pubDate>Mon, 27 Apr 2026 05:21:53 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/09-trn-r1-zero-text-rich-network-reasoning-via-llms-with-reinfo/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19070v1&#34;&gt;2604.19070&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19070v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yilun Liu, Ruihong Qiu, Zi Huang&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CL&lt;/code&gt; · all: cs.CL, cs.LG&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, reasoning, chain-of-thought, inference, fine-tun, post-train&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;TRN-R1-Zero is a post-training framework that uses reinforcement learning alone to teach base LLMs to reason over text-rich networks, avoiding supervised fine-tuning or distillation while generalising across node, edge, and graph-level tasks.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;RL-only post-training for text-rich network (TRN) reasoning — no SFT, no CoT distillation from larger teachers.&lt;/li&gt;&#xA;&lt;li&gt;Neighbour-aware Group Relative Policy Optimisation (N-GRPO) that shapes rewards via a novel &amp;ldquo;margin gain&amp;rdquo; metric measuring neighbour informativeness.&lt;/li&gt;&#xA;&lt;li&gt;Node-level training transfers zero-shot to edge- and graph-level tasks, beyond typical cross-domain transfer.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;The authors extend GRPO with neighbourhood awareness: for each candidate response, rewards are dynamically adjusted by a margin gain metric capturing how much neighbouring node signals contribute to the correct answer, pushing the LLM to actually use relational context rather than text alone. Training applies RL to base LLMs using node-level supervision signals only.&lt;/p&gt;
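&lt;p&gt;A hedged sketch of the reward shaping described above. The group-relative normalisation is standard GRPO; the concrete margin-gain definition (answer confidence with neighbours minus confidence without) and the weight &lt;code&gt;alpha&lt;/code&gt; are assumptions, since the abstract does not spell them out.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# N-GRPO-style advantages: scale each sampled response's task reward by a
# margin-gain term measuring how much neighbour context helped, then
# group-normalise as in plain GRPO.
import statistics

def margin_gain(p_with_nbrs, p_without_nbrs):
    # Positive when neighbour signals raise the probability of the correct answer.
    return p_with_nbrs - p_without_nbrs

def n_grpo_advantages(rewards, gains, alpha=1.0):
    shaped = [r * (1.0 + alpha * g) for r, g in zip(rewards, gains)]
    mu = statistics.mean(shaped)
    sd = statistics.pstdev(shaped) or 1.0  # guard against a zero-variance group
    return [(s - mu) / sd for s in shaped]

gains = [margin_gain(0.9, 0.6), margin_gain(0.4, 0.5), margin_gain(0.8, 0.8)]
print(n_grpo_advantages([1.0, 0.0, 1.0], gains))
&lt;/code&gt;&lt;/pre&gt;</description>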
    </item>
    <item>
      <title>Detoxification for LLM: From Dataset Itself</title>
      <link>https://ftxj.github.io/posts/2026-04-21/08-detoxification-for-llm-from-dataset-itself/</link>
      <pubDate>Mon, 27 Apr 2026 05:21:19 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/08-detoxification-for-llm-from-dataset-itself/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19124v1&#34;&gt;2604.19124&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19124v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Wei Shao, Yihang Wang, Gaoyu Zhu, Ziqiang Cheng, Lei Yu, Jiafeng Guo, Xueqi Cheng&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CL&lt;/code&gt; · all: cs.CL&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, inference, serving, fine-tun, post-train&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;The paper proposes HSPD, a pipeline that detoxifies LLM pretraining corpora at the source by rewriting toxic spans with a Soft Contrastive Decoding (SoCD) method, yielding a drop-in replacement dataset that cuts downstream model toxicity while preserving semantics.&lt;/p&gt;
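&lt;p&gt;The abstract names Soft Contrastive Decoding (SoCD) without defining it; below is a hedged sketch of the generic contrastive-decoding recipe such a rewriter plausibly builds on: steer next-token choices away from the preferences of a toxicity-prone model. The mixing rule and &lt;code&gt;alpha&lt;/code&gt; are assumptions.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# Contrastive decoding for span rewriting: subtract a scaled copy of the
# toxic model's logits from the base model's logits before sampling.
import math

def contrastive_logits(base_logits, toxic_logits, alpha=0.5):
    return [b - alpha * t for b, t in zip(base_logits, toxic_logits)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

vocab = ["jerk", "person", "friend"]
base = [2.0, 1.5, 1.0]    # base LM next-token logits (assumed values)
toxic = [3.0, 0.5, 0.2]   # logits from a toxicity-conditioned model
probs = softmax(contrastive_logits(base, toxic))
print(max(zip(probs, vocab)))  # the non-toxic alternative now wins
&lt;/code&gt;&lt;/pre&gt;</description>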
    </item>
    <item>
      <title>SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving</title>
      <link>https://ftxj.github.io/posts/2026-04-21/07-saw-int4-system-aware-4-bit-kv-cache-quantization-for-real-w/</link>
      <pubDate>Mon, 27 Apr 2026 05:20:49 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/07-saw-int4-system-aware-4-bit-kv-cache-quantization-for-real-w/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19157v1&#34;&gt;2604.19157&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19157v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Jinda Jia, Jisen Li, Zhongzhu Zhou, Jung Hwan Heo, Jue Wang, Tri Dao, Shuaiwen Leon Song, Ben Athiwaratkun, Chenfeng Xu, Tianyi Zhang, Xiaoxia Wu&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.LG&lt;/code&gt; · all: cs.LG&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; llm, serving, kv-cache, quantization, attention, throughput, latency&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;SAW-INT4 proposes token-wise INT4 KV-cache quantization with block-diagonal Hadamard rotation, which the authors argue is the simplest scheme compatible with paged memory and fused attention in real LLM serving. A fused rotation-quantization kernel matches plain INT4 throughput while recovering nearly all accuracy lost to naive INT4.&lt;/p&gt;
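&lt;p&gt;A toy sketch of the recipe named in the TL;DR: rotate each KV vector with a block-diagonal Hadamard transform to spread outliers, then quantize per token to INT4 with one scale. The 4-wide blocks and symmetric round-to-nearest are illustrative assumptions; the fused kernel is not shown.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# Block-diagonal Hadamard rotation followed by per-token INT4 quantization.
import numpy as np

H4 = np.array([[1, 1, 1, 1],
               [1, -1, 1, -1],
               [1, 1, -1, -1],
               [1, -1, -1, 1]], dtype=np.float32) / 2.0  # orthonormal 4x4 Hadamard

def rotate_blockwise(v, block=4):
    return np.concatenate([H4 @ v[i:i + block] for i in range(0, len(v), block)])

def int4_quant(v):
    scale = np.abs(v).max() / 7.0 or 1.0           # symmetric INT4 range [-8, 7]
    q = np.clip(np.round(v / scale), -8, 7)
    return q.astype(np.int8), scale

def int4_dequant(q, scale, block=4):
    v = q.astype(np.float32) * scale
    # Invert the orthonormal rotation block by block.
    return np.concatenate([H4.T @ v[i:i + block] for i in range(0, len(v), block)])

key = np.array([0.1, -0.2, 8.0, 0.05, 0.3, -0.1, 0.2, 0.0], dtype=np.float32)  # one outlier
q, s = int4_quant(rotate_blockwise(key))
print(np.abs(key - int4_dequant(q, s)).max())  # rotation limits the outlier's damage
&lt;/code&gt;&lt;/pre&gt;</description>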
    </item>
    <item>
      <title>If you&#39;re waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems</title>
      <link>https://ftxj.github.io/posts/2026-04-21/06-if-you-re-waiting-for-a-sign-that-might-not-be-it-mitigating/</link>
      <pubDate>Mon, 27 Apr 2026 05:20:15 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/06-if-you-re-waiting-for-a-sign-that-might-not-be-it-mitigating/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19844v1&#34;&gt;2604.19844&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19844v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Jiamin Chang, Minhui Xue, Ruoxi Sun, Shuchao Pang, Salil S. Kanhere, Hammond Pearce&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CV&lt;/code&gt; · all: cs.AI, cs.CV&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; agent, agentic, multi-agent, serving, ai system&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;This paper identifies &amp;ldquo;trust boundary confusion&amp;rdquo; in Vision-Language Agentic Systems (VLAS), where agents fail to distinguish legitimate environmental signals (e.g., traffic lights) from adversarial visual injections. The authors propose a multi-agent defense that separates perception from decision-making, improving robustness while preserving responsiveness to genuine cues.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Statistics, Not Scale: Modular Medical Dialogue with Bayesian Belief Engine</title>
      <link>https://ftxj.github.io/posts/2026-04-21/05-statistics-not-scale-modular-medical-dialogue-with-bayesian/</link>
      <pubDate>Mon, 27 Apr 2026 05:19:41 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/05-statistics-not-scale-modular-medical-dialogue-with-bayesian/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.20022v1&#34;&gt;2604.20022&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.20022v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yusuf Kesmen, Fay Elhassan, Jiayi Ma, Julien Stalhandske, David Sasu, Alexandra Kulinkina, Akhil Arora, Lars Klein, Mary-Anne Hartley&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.LG&lt;/code&gt; · all: cs.AI, cs.CL, cs.LG&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, rag, reasoning, inference&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;BMBE splits medical dialogue into an LLM &amp;ldquo;sensor&amp;rdquo; that parses utterances and a deterministic Bayesian engine that handles all diagnostic inference, yielding calibrated, private, and robust diagnoses while beating frontier standalone LLMs at a fraction of the cost.&lt;/p&gt;
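&lt;p&gt;A minimal sketch of the division of labour described above: the LLM acts only as a &amp;ldquo;sensor&amp;rdquo; that turns an utterance into structured findings, while a deterministic Bayesian engine owns all diagnostic inference. The disease priors, likelihood table, and keyword parser are illustrative stand-ins.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# Sequential Bayes updates over candidate diagnoses, driven by findings
# extracted from free text. llm_sensor is a stand-in for the LLM parser.

PRIORS = {"flu": 0.05, "malaria": 0.01, "cold": 0.20}
LIKELIHOOD = {  # assumed P(finding | disease) table
    "fever": {"flu": 0.90, "malaria": 0.95, "cold": 0.30},
    "cough": {"flu": 0.80, "malaria": 0.20, "cold": 0.85},
}

def llm_sensor(utterance):
    # A real system would prompt an LLM; here, naive keyword spotting.
    return [s for s in LIKELIHOOD if s in utterance.lower()]

def bayes_update(priors, findings):
    post = dict(priors)
    for f in findings:
        post = {d: p * LIKELIHOOD[f][d] for d, p in post.items()}
        z = sum(post.values())
        post = {d: p / z for d, p in post.items()}  # renormalise after each finding
    return post

print(bayes_update(PRIORS, llm_sensor("I have a fever and a bad cough")))
&lt;/code&gt;&lt;/pre&gt;</description>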
    </item>
    <item>
      <title>A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding</title>
      <link>https://ftxj.github.io/posts/2026-04-21/04-a-mar-agent-based-multimodal-art-retrieval-for-fine-grained/</link>
      <pubDate>Mon, 27 Apr 2026 05:19:13 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/04-a-mar-agent-based-multimodal-art-retrieval-for-fine-grained/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19689v1&#34;&gt;2604.19689&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19689v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, retrieval, reasoning, ai system&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;A-MAR is an agent-based multimodal retrieval framework that decomposes artwork queries into structured reasoning plans, then conditions retrieval on each step to produce grounded, interpretable explanations. It outperforms static retrieval and MLLM baselines on SemArt, Artpedia, and a new ArtCoT-QA benchmark.&lt;/p&gt;
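&lt;p&gt;A hedged sketch of plan-conditioned retrieval as the TL;DR describes it: decompose the query into reasoning steps, retrieve against each step, and keep the per-step evidence as the explanation. The planner and lexical retriever below are hypothetical stand-ins for the paper&amp;rsquo;s LLM planner and multimodal retriever.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# Plan-then-retrieve loop: each reasoning step conditions its own retrieval,
# so the final answer carries a step-by-step evidence trace.

def plan_steps(query):
    # Stand-in for the LLM planner; a real system would prompt an LLM here.
    return [f"identify the subject of: {query}",
            f"find iconographic context for: {query}",
            f"link style and period for: {query}"]

def retrieve(step, corpus, k=1):
    # Toy lexical retriever; A-MAR conditions a multimodal retriever per step.
    scored = sorted(corpus, key=lambda doc: -sum(w in doc for w in step.split()))
    return scored[:k]

def a_mar_answer(query, corpus):
    # Grounded, per-step evidence rather than one opaque retrieval hit.
    return {step: retrieve(step, corpus) for step in plan_steps(query)}

corpus = ["saint jerome iconography lion", "baroque period style", "subject portrait man"]
print(a_mar_answer("who is the subject of this portrait", corpus))
&lt;/code&gt;&lt;/pre&gt;</description>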
    </item>
    <item>
      <title>Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms</title>
      <link>https://ftxj.github.io/posts/2026-04-21/03-rethinking-scale-deployment-trade-offs-of-small-language-mod/</link>
      <pubDate>Mon, 27 Apr 2026 05:18:43 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/03-rethinking-scale-deployment-trade-offs-of-small-language-mod/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19299v1&#34;&gt;2604.19299&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19299v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xinlin Wang, Mats Brorsson&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.CL&lt;/code&gt; · all: cs.AI, cs.CL&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, agent, multi-agent, tool use, reasoning, latency, fine-tun&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;This paper presents the first large-scale empirical study of sub-10B open-source SLMs across three deployment paradigms—base, single-agent with tools, and multi-agent collaboration—finding that single-agent systems offer the best cost/performance balance while multi-agent setups add overhead with limited gains.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;SLMs (&amp;lt;10B params) are viable LLM alternatives when their weaknesses are compensated for by agent paradigms rather than by pure scaling or fine-tuning.&lt;/li&gt;&#xA;&lt;li&gt;Tool-augmented single agents systematically outperform base SLMs at modest extra cost.&lt;/li&gt;&#xA;&lt;li&gt;Multi-agent collaboration yields diminishing returns relative to its computational overhead.&lt;/li&gt;&#xA;&lt;li&gt;Deployment efficiency is a first-class design criterion for trustworthy SLM systems.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;The authors benchmark open-source SLMs under three paradigms: (1) bare base model, (2) a single agent equipped with external tools, and (3) a multi-agent collaborative system. They compare performance and cost across these configurations, though the abstract does not specify which tools, orchestration framework, or agent protocols are used.&lt;/p&gt;
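&lt;p&gt;A skeleton of the three-way comparison the study runs, under stated assumptions: the &lt;code&gt;run_*&lt;/code&gt; callables are hypothetical stand-ins (the abstract does not specify tools or orchestration), and token count serves as a crude cost proxy.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# Harness skeleton: score each deployment paradigm on accuracy and token cost.
from dataclasses import dataclass

@dataclass
class Result:
    correct: bool
    tokens: int  # proxy for deployment cost

def evaluate(paradigm_fn, tasks):
    results = [paradigm_fn(t) for t in tasks]
    acc = sum(r.correct for r in results) / len(results)
    return acc, sum(r.tokens for r in results)

# Placeholder paradigms; real runs would call the SLM and its agent stack.
def run_base(task): return Result(correct=False, tokens=300)         # bare SLM
def run_single_agent(task): return Result(correct=True, tokens=900)  # SLM + tools
def run_multi_agent(task): return Result(correct=True, tokens=4000)  # agent team

tasks = range(10)
for name, fn in [("base", run_base), ("single-agent", run_single_agent),
                 ("multi-agent", run_multi_agent)]:
    acc, cost = evaluate(fn, tasks)
    print(f"{name}: acc={acc:.0%} tokens={cost}")  # cost/performance trade-off
&lt;/code&gt;&lt;/pre&gt;</description>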
    </item>
    <item>
      <title>GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models</title>
      <link>https://ftxj.github.io/posts/2026-04-21/02-grasprune-global-gating-for-budgeted-structured-pruning-of-l/</link>
      <pubDate>Mon, 27 Apr 2026 05:18:10 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/02-grasprune-global-gating-for-budgeted-structured-pruning-of-l/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19398v1&#34;&gt;2604.19398&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19398v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Ziyang Wang, Jiangfeng Xiao, Chuan Xiao, Ruoxiang Li, Rui Mao, Jianbin Qin&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AI&lt;/code&gt; · all: cs.AI&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, rag, inference, kv cache, attention, gpu, latency, fine-tun&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;GRASPrune is a post-pretraining structured pruning framework that jointly prunes FFN channels and KV head groups under a single global budget using projected straight-through gate learning, producing a smaller dense checkpoint without fine-tuning the backbone.&lt;/p&gt;
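&lt;p&gt;A hedged sketch of projected straight-through gate learning as the TL;DR names it: learnable gates over prunable units (FFN channels, KV head groups), binarised with a straight-through estimator and projected so the kept units fit one global budget. The shapes, the greedy projection, and the unit costs are assumptions.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# Straight-through gates plus a global-budget projection over prunable units.
import torch

def straight_through(gate_logits):
    hard = (gate_logits &amp;gt; 0).float()
    # Forward pass uses hard 0/1 gates; gradients flow through the soft sigmoid.
    return hard + gate_logits.sigmoid() - gate_logits.sigmoid().detach()

def project_to_budget(gate_logits, costs, budget):
    # Keep the highest-scoring units whose summed cost fits the global budget.
    order = torch.argsort(gate_logits, descending=True)
    keep, spent = torch.zeros_like(gate_logits), 0.0
    for i in order:
        if spent + costs[i] &amp;lt;= budget:
            keep[i] = 1.0
            spent += float(costs[i])
    return keep

logits = torch.randn(8, requires_grad=True)   # 8 prunable units (channels/head groups)
costs = torch.ones(8)                         # parameter cost per unit
mask = project_to_budget(logits, costs, budget=5.0)
out = (straight_through(logits) * mask).sum() # differentiable w.r.t. the gate logits
out.backward()
print(mask, logits.grad)
&lt;/code&gt;&lt;/pre&gt;</description>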
    </item>
    <item>
      <title>ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration</title>
      <link>https://ftxj.github.io/posts/2026-04-21/01-chipcraftbrain-validation-first-rtl-generation-via-multi-age/</link>
      <pubDate>Mon, 27 Apr 2026 05:17:33 +0000</pubDate>
      <guid>https://ftxj.github.io/posts/2026-04-21/01-chipcraftbrain-validation-first-rtl-generation-via-multi-age/</guid>
      <description>&lt;p&gt;&lt;strong&gt;arXiv:&lt;/strong&gt; &lt;a href=&#34;https://arxiv.org/abs/2604.19856v1&#34;&gt;2604.19856&lt;/a&gt; · &lt;a href=&#34;https://arxiv.org/pdf/2604.19856v1&#34;&gt;PDF&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Cagri Eryilmaz&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Primary category:&lt;/strong&gt; &lt;code&gt;cs.AR&lt;/code&gt; · all: cs.AI, cs.AR, cs.LG&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Matched keywords:&lt;/strong&gt; large language model, llm, agent, agentic, multi-agent, retrieval, rag, reasoning&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;&#xA;&lt;p&gt;ChipCraftBrain is a multi-agent RTL generation framework combining PPO-driven orchestration, symbolic-neural reasoning, and knowledge retrieval. It hits 97.2% pass@1 on VerilogEval-Human and 94.7% on a 302-problem CVDP subset, outperforming MAGE and matching ChipAgents while using far fewer attempts than NVIDIA&amp;rsquo;s ACE-RTL.&lt;/p&gt;&#xA;&lt;h2 id=&#34;key-ideas&#34;&gt;Key Ideas&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Adaptive orchestration of six specialized agents via a PPO policy over a 168-dim state (with an MPC world-model alternative).&lt;/li&gt;&#xA;&lt;li&gt;Hybrid symbolic-neural architecture: algorithmic solvers for K-maps/truth tables, neural agents for waveforms and general RTL.&lt;/li&gt;&#xA;&lt;li&gt;Knowledge-augmented retrieval from 321 patterns + 971 open-source reference implementations with focus-aware lookup.&lt;/li&gt;&#xA;&lt;li&gt;Hierarchical spec decomposition into dependency-ordered sub-modules with interface synchronization.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;&#xA;&lt;p&gt;A PPO-trained controller learns to route tasks among six agents based on the problem state. Symbolic solvers handle combinational logic exactly; neural agents handle timing/waveforms. A retrieval module injects reference patterns. Complex specs are decomposed hierarchically with cross-module interface synchronization before code generation and validation.&lt;/p&gt;
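&lt;p&gt;A sketch of the orchestration layer under stated assumptions: a PPO policy maps the 168-dim problem state to one of six agents. The network shape and the clipped surrogate loss are textbook PPO; the state features and reward signal are not specified in the summary above.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
# PPO-style router: a small policy network picks which of six specialised
# agents should handle the current RTL sub-task.
import torch
import torch.nn as nn

N_AGENTS, STATE_DIM = 6, 168

policy = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.Tanh(),
                       nn.Linear(128, N_AGENTS))  # logits over the six agents

def select_agent(state):
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    return action, dist.log_prob(action)

def ppo_loss(logp_new, logp_old, advantage, clip=0.2):
    # Standard clipped surrogate objective used to train the router.
    ratio = (logp_new - logp_old).exp()
    return -torch.min(ratio * advantage,
                      torch.clamp(ratio, 1 - clip, 1 + clip) * advantage).mean()

state = torch.randn(STATE_DIM)  # placeholder 168-dim problem state
action, logp = select_agent(state)
loss = ppo_loss(logp, logp.detach(), advantage=torch.tensor(1.0))
print(f"route to agent {action.item()}, loss={loss.item():.3f}")
&lt;/code&gt;&lt;/pre&gt;</description>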
    </item>
  </channel>
</rss>
