<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>MCMedia RSS Feed - Computer_Science</title>
<link>https://news.mcmedia.cam/feed/Computer_Science</link>
<description>MCMedia News RSS Feed - Computer_Science specific feed</description>
<docs>https://news.mcmedia.cam/rss-info.html</docs>
<generator>MCMedia RSS Generator v1.0</generator>
<lastBuildDate>Sun, 08 Mar 2026 19:10:39 +0000</lastBuildDate>
<atom:link href="https://news.mcmedia.cam/rss/rss_computer_science.xml" rel="self" type="application/rss+xml"/>
<item>
  <title>Data-Driven Optimization of Multi-Generational Cellular Networks: A Performance Classification Framework for Strategic Infrastructure Management</title>
  <link>https://arxiv.org/abs/2603.04425</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04425v1 Announce Type: new Abstract: The exponential growth in mobile data demand necessitates intelligent management of telecommunications infrastructure to ensure Quality of Service (QoS) and operational efficiency. This paper presents a comprehensive analysis of a multigenerational cellular network dataset, sourced from the OpenCelliD project, to identify patterns in network deployment, utilization, and infrastructure gaps. The methodology involves geographical, temporal, and performance analysis of 1,818 cell tower entries, predominantly Long Term Evolution (LTE), across three countries with a significant concentration in Pakistan. Key findings reveal the long-term persistence of legacy 2G/3G infrastructure in major urban centers, the existence of a substantial number of under-utilized towers representing opportunities for cost savings, and the identification of specific &quot;non-4G demand zones&quot; where active user bases are served by outdated technologies. By introducing a signal-density metric, we distinguish between absolute over-utilization and localized congestion. The results provide actionable intelligence for Mobile Network Operators (MNOs) to guide strategic LTE upgrades, optimize resource allocation, and bridge the digital divide in underserved regions.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>LEXA: Legal Case Retrieval via Graph Contrastive Learning with Contextualised LLM Embeddings</title>
  <link>https://arxiv.org/abs/2405.11791</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2405.11791v3 Announce Type: replace Abstract: Legal case retrieval (LCR) is a specialised information retrieval task aimed at identifying relevant cases given a query case. LCR holds pivotal significance in facilitating legal practitioners to locate legal precedents. Existing LCR methods predominantly rely on traditional lexical models or language models; however, they typically overlook the domain-specific structural information embedded in legal documents. Our previous work CaseGNN successfully harnesses text-attributed graphs and graph neural networks to incorporate structural legal information. Nonetheless, three key challenges remain in enhancing the representational capacity of CaseGNN: (1) The under-utilisation of rich edge information in text-attributed case graph (TACG). (2) The insufficiency of training signals for graph contrastive learning. (3) The lack of contextualised legal information in node and edge features. In this paper, the LEXA model, an extension of CaseGNN, is proposed to overcome these limitations by jointly leveraging rich edge information, enhanced training signals, and contextualised embeddings derived from large language models (LLMs). Specifically, an edge-updated graph attention layer (EUGAT) is proposed to comprehensively update node and edge features during graph modelling, resulting in a full utilisation of structural information of legal cases. Moreover, LEXA incorporates a novel graph contrastive learning objective with graph augmentation to provide additional training signals, thereby strengthening the model&#39;s legal comprehension capabilities. What&#39;s more, LLMs are employed to generate node and edge features for TACG. Extensive experiments on two benchmark datasets demonstrate that LEXA not only significantly improves CaseGNN but also achieves supreme performance compared to state-of-the-art LCR methods.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>History-Deterministic B\&quot;uchi Automata are Succinct</title>
  <link>https://arxiv.org/abs/2603.05380</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05380v1 Announce Type: new Abstract: We describe a history-deterministic B\&quot;uchi automaton that has strictly less states than every language-equivalent deterministic B\&quot;uchi automaton. This solves a problem that had been open since the introduction of history-determinism and actively investigated for over a decade. Our example automaton has 65 states, and proving its succinctness requires the combination of theoretical insights together with the aid of computers.</description>
  <dc:source>Computer_Science/cs.FL_(Formal_Languages_and_Automata_Theory)</dc:source>
</item>
<item>
  <title>EchoGuard: An Agentic Framework with Knowledge-Graph Memory for Detecting Manipulative Communication in Longitudinal Dialogue</title>
  <link>https://arxiv.org/abs/2603.04815</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04815v1 Announce Type: new Abstract: Manipulative communication, such as gaslighting, guilt-tripping, and emotional coercion, is often difficult for individuals to recognize. Existing agentic AI systems lack the structured, longitudinal memory to track these subtle, context-dependent tactics, often failing due to limited context windows and catastrophic forgetting. We introduce EchoGuard, an agentic AI framework that addresses this gap by using a Knowledge Graph (KG) as the agent&#39;s core episodic and semantic memory. EchoGuard employs a structured Log-Analyze-Reflect loop: (1) users log interactions, which the agent structures as nodes and edges in a personal, episodic KG (capturing events, emotions, and speakers); (2) the system executes complex graph queries to detect six psychologically-grounded manipulation patterns (stored as a semantic KG); and (3) an LLM generates targeted Socratic prompts grounded by the subgraph of detected patterns, guiding users toward self-discovery. This framework demonstrates how the interplay between agentic architectures and Knowledge Graphs can empower individuals in recognizing manipulative communication while maintaining personal autonomy and safety. We present the theoretical foundation, framework design, a comprehensive evaluation strategy, and a vision to validate this approach.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Adaptive Personalized Federated Reinforcement Learning for RIS-Assisted Aerial Relays in SAGINs with Fluid Antennas</title>
  <link>https://arxiv.org/abs/2603.04788</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04788v1 Announce Type: new Abstract: Space-air-ground integrated networks (SAGINs) interconnect satellites, uncrewed aerial vehicles (UAVs), and ground devices to enable flexible and ubiquitous wireless services. The integration of reconfigurable intelligent surfaces (RISs) and fluid antenna systems (FASs) further enhances radio environment controllability. However, the tight integration of cross-layer facilities and radio enhancement technologies leads to pronounced environmental dynamics and heterogeneity, posing fundamental challenges for system modeling and optimization in large-scale SAGINs. This paper investigates a SAGIN in which low Earth orbit (LEO) satellite constellations communicate with multiple ground hotspots via RIS-assisted UAV relays, serving both FAS-equipped and conventional users. A system model is developed that explicitly captures satellite mobility, UAV trajectories, RIS phase control, and heterogeneous user reception capabilities. Accordingly, a multi-hotspot downlink rate maximization problem is studied, whose solvability is analyzed through a hierarchical Stackelberg game. To address heterogeneous and time-varying multi-hotspot environments, an adaptive personalized federated reinforcement learning (FRL) algorithm is proposed for adaptive optimization of UAV trajectories and RIS phase controls. Simulation results demonstrate superior performance and validate the effectiveness of personalization in dynamic heterogeneous SAGIN scenarios.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>U-Parking: Distributed UWB-Assisted Autonomous Parking System with Robust Localization and Intelligent Planning</title>
  <link>https://arxiv.org/abs/2603.04898</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04898v1 Announce Type: cross Abstract: This demonstration presents U-Parking, a distributed Ultra-Wideband (UWB)-assisted autonomous parking system. By integrating Large Language Models (LLMs)-assisted planning with robust fusion localization and trajectory tracking, it enables reliable automated parking in challenging indoor environments, as validated through real-vehicle demonstrations.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>V2N-Based Algorithm and Communication Protocol for Autonomous Non-Stop Intersections</title>
  <link>https://arxiv.org/abs/2603.05165</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05165v1 Announce Type: new Abstract: Intersections are critical areas for road safety and traffic efficiency, accounting for a significant portion of vehicle crashes and fatalities. While connected and autonomous vehicle (CAV) technologies offer a promising solution for autonomous intersection management, many existing proposals either rely on computationally heavy centralized controllers or overlook the practical impairments of real-world communication networks. This paper introduces seamless mobility of vehicles over intersections (Moveover), a novel algorithm comprising a vehicle-to-network (V2N) communication protocol designed to let vehicles cross autonomous intersections without stopping. Moveover delegates trajectory and speed profile selection to individual vehicles, allowing each CAV to optimize them according to its unique kinematic characteristics. Simultaneously, a local intersection controller prevents collisions through deterministic conflict zone reservations. The algorithm is rigorously evaluated under both ideal and non-ideal networking conditions, specifically modeling 4G and 5G communication delays, across multiple layouts including single-lane, multi-lane, and roundabouts. Furthermore, we test Moveover on a real urban map with multiple intersections. Simulation results demonstrate that Moveover significantly outperforms baseline strategies, offering substantial improvements in travel times and reduced pollutant emissions.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery</title>
  <link>https://arxiv.org/abs/2510.22503</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2510.22503v2 Announce Type: replace-cross Abstract: Materials discovery requires navigating vast chemical and structural spaces while satisfying multiple, often conflicting, objectives. We present LLM-guided Evolution for MAterials discovery (LLEMA), a unified framework that couples the scientific knowledge embedded in large language models with chemistry-informed evolutionary rules and memory-based refinement. At each iteration, an LLM proposes crystallographically specified candidates under explicit property constraints; a surrogate-augmented oracle estimates physicochemical properties; and a multi-objective scorer updates success/failure memories to guide subsequent generations. Evaluated on 14 realistic tasks that span electronics, energy, coatings, optics, and aerospace, LLEMA discovers candidates that are chemically plausible, thermodynamically stable, and property-aligned, achieving higher hit rates and improved Pareto front quality relative to generative and LLM-only baselines. Ablation studies confirm the importance of rule-guided generation, memory-based refinement, and surrogate prediction. By enforcing synthesizability and multi-objective trade-offs, LLEMA provides a principled approach to accelerating practical materials discovery. Project website: https://scientific-discovery.github.io/llema-project/</description>
  <dc:source>Computer_Science/cs.NE_(Neural_and_Evolutionary_Computing)</dc:source>
</item>
<item>
  <title>EmboTeam: Grounding LLM Reasoning into Reactive Behavior Trees via PDDL for Embodied Multi-Robot Collaboration</title>
  <link>https://arxiv.org/abs/2601.11063</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2601.11063v2 Announce Type: replace-cross Abstract: In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (LLMs) show promise in instruction parsing and preliminary planning, they exhibit limitations in long-term reasoning and dynamic multi-robot coordination. We propose EmboTeam, a novel embodied multi-robot task planning framework that addresses these issues through a three-stage cascaded architecture: 1) It leverages an LLM to parse instructions and generate Planning Domain Definition Language (PDDL) problem descriptions, thereby transforming commands into formal planning problems; 2) It combines the semantic reasoning of LLMs with the search capabilities of a classical planner to produce optimized action sequences; 3) It compiles the resulting plan into behavior trees for reactive control. The framework supports dynamically sized heterogeneous robot teams via a shared blackboard mechanism for communication and state synchronization. To validate our approach, we introduce the MACE-THOR benchmark dataset, comprising 42 complex tasks across 8 distinct household layouts. Experiments show EmboTeam improves the task success rate from 12% to 55% and goal condition recall from 32% to 72% over the LaMMA-P baseline.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding</title>
  <link>https://arxiv.org/abs/2512.03194</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2512.03194v3 Announce Type: replace-cross Abstract: Large robot fleets are now common in warehouses and other logistics settings, where small control gains translate into large operational impacts. In this article, we address task scheduling for lifelong Multi-Agent Pickup-and-Delivery (MAPD) and propose a hybrid method that couples learning-based global guidance with lightweight optimization. A graph neural network policy trained via reinforcement learning outputs a desired distribution of free agents over an aggregated warehouse graph. This signal is converted into region-to-region rebalancing through a minimum-cost flow, and finalized by small, local assignment problems, preserving accuracy while keeping per-step latency within a 1 s compute budget. We call this approach GRAND: a hierarchical algorithm that relies on Guidance, Rebalancing, and Assignment to explicitly leverage the workspace Network structure and Dispatch agents to tasks. On congested warehouse benchmarks from the League of Robot Runners (LoRR) with up to 500 agents, our approach improves throughput by up to 10% over the 2024 winning scheduler while maintaining real-time execution. The results indicate that coupling graph-structured learned guidance with tractable solvers reduces congestion and yields a practical, scalable blueprint for high-throughput scheduling in large fleets.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Foam-Agent: Towards Automated Intelligent CFD Workflows</title>
  <link>https://arxiv.org/abs/2505.04997</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2505.04997v2 Announce Type: replace-cross Abstract: Computational fluid dynamics (CFD) has been the main workhorse of computational physics. Yet its steep learning curve and fragmented, multi-stage workflow create significant barriers. To address these challenges, we present Foam-Agent, a multi-agent framework leveraging large language models (LLMs) to automate the end-to-end CFD workflow from a single natural language prompt. Foam-Agent orchestrates the comprehensive simulation workflow from mesh generation and high-performance computing job scripting to post-processing visualization. The system integrates retrieval-augmented generation with dependency-aware scheduling to synthesize high-fidelity simulation configurations. Furthermore, Foam-Agent adopts the Model Context Protocol to expose its core functions as discrete, callable tools. This allows for flexible integration and use by any other agentic systems. Evaluated on 110 simulation tasks, Foam-Agent achieved a state-of-the-art execution success rate of 88.2% without expert intervention. These results demonstrate how specialized multi-agent systems can effectively reduce expertise barriers and streamline complex fluid simulations.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Conflict-Based Search as a Protocol: A Multi-Agent Motion Planning Protocol for Heterogeneous Agents, Solvers, and Independent Tasks</title>
  <link>https://arxiv.org/abs/2510.00425</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2510.00425v2 Announce Type: replace Abstract: Imagine the future construction site, hospital, or office with dozens of robots bought from different manufacturers. How can we enable these different robots to effectively move in a shared environment, given that each robot may have its own independent motion planning system? This work shows how we can get efficient collision-free movements between algorithmically heterogeneous agents by using Conflict-Based Search (Sharon et al. 2015) as a protocol. At its core, the CBS Protocol requires one specific single-agent motion planning API; finding a collision-free path that satisfies certain space-time constraints. Given such an API, CBS uses a central planner to find collision-free paths - independent of how the API is implemented. We demonstrate how this protocol enables multi-agent motion planning for a heterogeneous team of agents completing independent tasks with a variety of single-agent planners including: Heuristic Search (e.g., A*), Sampling Based Search (e.g., RRT), Optimization (e.g., Direct Collocation), Diffusion, and Reinforcement Learning.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Real-Time BDI Agents: a model and its implementation</title>
  <link>https://arxiv.org/abs/2205.00979</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2205.00979v2 Announce Type: replace Abstract: The BDI model proved to be effective for developing applications requiring high-levels of autonomy and to deal with the complexity and unpredictability of real-world scenarios. The model, however, has significant limitations in reacting and handling contingencies within the given real-time constraints. Without an explicit representation of time, existing real-time BDI implementations overlook the temporal implications during the agent&#39;s decision process that may result in delays or unresponsiveness of the system when it gets overloaded. In this paper, we redefine the BDI agent control loop inspired by well established algorithms for real-time systems to ensure a proper reaction of agents and their effective application in typical real-time domains. Our model proposes an effective real-time management of goals, plans, and actions with respect to time constraints and resources availability. We propose an implementation of the model for a resource-collection video-game and we validate the approach against a set of significant scenarios.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>The effect of a toroidal opinion space on opinion bi-polarisation</title>
  <link>https://arxiv.org/abs/2603.05337</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05337v1 Announce Type: cross Abstract: Many models of opinion dynamics include measures of distance between opinions. Such models are susceptible to boundary effects where the choice of the topology of the opinion space may influence the dynamics. In this paper we study an opinion dynamics model following the seminal model by Axelrod, with the goal of understanding the effect of a toroidal opinion space. To do this we systematically compare two versions of the model: one with toroidal opinion space and one with cubic opinion space. In their most basic form the two versions of our model result in similar dynamics (consensus is attained eventually). However, as we include bounded confidence and eventually per agent weighting of opinion elements the dynamics become quite contrasting. The toroidal opinion space consistently allows for a greater number of groups in steady state than the cubic opinion space model. Furthermore, the outcome of the dynamics in the toroidal opinion space model are more sensitive to the inclusion of extensions than in the cubic opinion space model.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus</title>
  <link>https://arxiv.org/abs/2603.05129</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05129v1 Announce Type: cross Abstract: Diagnosing hepatic diseases accurately and interpretably is critical, yet it remains challenging in real-world clinical settings. Existing AI approaches for clinical diagnosis often lack transparency, structured reasoning, and deployability. Recent efforts have leveraged large language models (LLMs), retrieval-augmented generation (RAG), and multi-agent collaboration. However, these approaches typically retrieve evidence from a single source and fail to support iterative, role-specialized deliberation grounded in structured clinical data. To address this, we propose MedCoRAG (i.e., Medical Collaborative RAG), an end-to-end framework that generates diagnostic hypotheses from standardized abnormal findings and constructs a patient-specific evidence package by jointly retrieving and pruning UMLS knowledge graph paths and clinical guidelines. It then performs Multi-Agent Collaborative Reasoning: a Router Agent dynamically dispatches Specialist Agents based on case complexity; these agents iteratively reason over the evidence and trigger targeted re-retrievals when needed, while a Generalist Agent synthesizes all deliberations into a traceable consensus diagnosis that emulates multidisciplinary consultation. Experimental results on hepatic disease cases from MIMIC-IV show that MedCoRAG outperforms existing methods and closed-source models in both diagnostic performance and reasoning interpretability.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Jagarin: A Three-Layer Architecture for Hibernating Personal Duty Agents on Mobile</title>
  <link>https://arxiv.org/abs/2603.05069</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05069v1 Announce Type: cross Abstract: Personal AI agents face a fundamental deployment paradox on mobile: persistent background execution drains battery and violates platform sandboxing policies, yet purely reactive agents miss time-sensitive obligations until the user remembers to ask. We present Jagarin, a three-layer architecture that resolves this paradox through structured hibernation and demand-driven wake. The first layer, DAWN (Duty-Aware Wake Network), is an on-device heuristic engine that computes a composite urgency score from four signals: duty-typed optimal action windows, user behavioral engagement prediction, opportunity cost of inaction, and cross-duty batch resonance. It uses adaptive per-user thresholds to decide when a sleeping agent should nudge or escalate. The second layer, ARIA (Agent Relay Identity Architecture), is a commercial email identity proxy that routes the full commercial inbox -- obligations, promotional offers, loyalty rewards, and platform updates -- to appropriate DAWN handlers by message category, eliminating cold-start and removing manual data entry. The third layer, ACE (Agent-Centric Exchange), is a protocol framework for direct machine-readable communication from institutions to personal agents, replacing human-targeted email as the canonical channel. Together, these three layers form a complete stack from institutional signal to on-device action, without persistent cloud state, continuous background execution, or privacy compromise. A working Flutter prototype is demonstrated on Android, combining all three layers with an ephemeral cloud agent invoked only on user-initiated escalation.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>RepoLaunch: Automating Build&amp;Test Pipeline of Code Repositories on ANY Language and ANY Platform</title>
  <link>https://arxiv.org/abs/2603.05026</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05026v1 Announce Type: cross Abstract: Building software repositories typically requires significant manual effort. Recent advances in large language model (LLM) agents have accelerated automation in software engineering (SWE). We introduce RepoLaunch, the first agent capable of automatically resolving dependencies, compiling source code, and extracting test results for repositories across arbitrary programming languages and operating systems. To demonstrate its utility, we further propose a fully automated pipeline for SWE dataset creation, where task design is the only human intervention. RepoLaunch automates the remaining steps, enabling scalable benchmarking and training of coding agents and LLMs. Notably, several works on agentic benchmarking and training have recently adopted RepoLaunch for automated task generation.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems</title>
  <link>https://arxiv.org/abs/2603.05000</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05000v1 Announce Type: cross Abstract: Autonomous Mobility-on-Demand (AMoD) systems promise to revolutionize urban transportation by providing affordable on-demand services to meet growing travel demand. However, realistic AMoD markets will be competitive, with multiple operators competing for passengers through strategic pricing and fleet deployment. While reinforcement learning has shown promise in optimizing single-operator AMoD control, existing work fails to capture competitive market dynamics. We investigate the impact of competition on policy learning by introducing a multi-operator reinforcement learning framework where two operators simultaneously learn pricing and fleet rebalancing policies. By integrating discrete choice theory, we enable passenger allocation and demand competition to emerge endogenously from utility-maximizing decisions. Experiments using real-world data from multiple cities demonstrate that competition fundamentally alters learned behaviors, leading to lower prices and distinct fleet positioning patterns compared to monopolistic settings. Notably, we demonstrate that learning-based approaches are robust to the additional stochasticity of competition, with competitive agents successfully converging to effective policies while accounting for partially unobserved competitor strategies.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Auction-Based RIS Allocation With DRL: Controlling the Cost-Performance Trade-Off</title>
  <link>https://arxiv.org/abs/2603.04433</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04433v1 Announce Type: cross Abstract: We study the allocation of reconfigurable intelligent surfaces (RISs) in a multi-cell wireless network, where base stations compete for control of shared RIS units deployed at the cell edges. These RISs, provided by an independent operator, are dynamically leased to the highest bidder using a simultaneously ascending auction format. Each base station estimates the utility of acquiring additional RISs based on macroscopic channel parameters, enabling a scalable and low-overhead allocation mechanism. To optimize the bidding behavior, we integrate deep reinforcement learning (DRL) agents that learn to maximize performance while adhering to budget constraints. Through simulations in clustered cell-edge environments, we demonstrate that reinforcement learning (RL)-based bidding significantly outperforms heuristic strategies, achieving optimal trade-offs between cost and spectral efficiency. Furthermore, we introduce a tunable parameter that governs the bidding aggressiveness of RL agents, enabling a flexible control of the trade-off between network performance and expenditure. Our results highlight the potential of combining auction-based allocation with adaptive RL mechanisms for efficient and fair utilization of RISs in next-generation wireless networks.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Dual-Interaction-Aware Cooperative Control Strategy for Alleviating Mixed Traffic Congestion</title>
  <link>https://arxiv.org/abs/2603.03848</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03848v1 Announce Type: cross Abstract: As Intelligent Transportation System (ITS) develops, Connected and Automated Vehicles (CAVs) are expected to significantly reduce traffic congestion through cooperative strategies, such as in bottleneck areas. However, the uncertainty and diversity in the behaviors of Human-Driven Vehicles (HDVs) in mixed traffic environments present major challenges for CAV cooperation. This paper proposes a Dual-Interaction-Aware Cooperative Control (DIACC) strategy that enhances both local and global interaction perception within the Multi-Agent Reinforcement Learning (MARL) framework for Connected and Automated Vehicles (CAVs) in mixed traffic bottleneck scenarios. The DIACC strategy consists of three key innovations: 1) A Decentralized Interaction-Adaptive Decision-Making (D-IADM) module that enhances actor&#39;s local interaction perception by distinguishing CAV-CAV cooperative interactions from CAV-HDV observational interactions. 2) A Centralized Interaction-Enhanced Critic (C-IEC) that improves critic&#39;s global traffic understanding through interaction-aware value estimation, providing more accurate guidance for policy updates. 3) A reward design that employs softmin aggregation with temperature annealing to prioritize interaction-intensive scenarios in mixed traffic. Additionally, a lightweight Proactive Safety-based Action Refinement (PSAR) module applies rule-based corrections to accelerate training convergence. Experimental results demonstrate that DIACC significantly improves traffic efficiency and adaptability compared to rule-based and benchmark MARL models.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning</title>
  <link>https://arxiv.org/abs/2603.04833</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04833v1 Announce Type: new Abstract: Communication can improve coordination in partially observed multi-agent reinforcement learning (MARL), but learning \emph{when} and \emph{who} to communicate with requires choosing among many possible sender-recipient pairs, and the effect of any single message on future reward is hard to isolate. We introduce \textbf{SCoUT} (\textbf{S}calable \textbf{Co}mmunication via \textbf{U}tility-guided \textbf{T}emporal grouping), which addresses both these challenges via temporal and agent abstraction within traditional MARL. During training, SCoUT resamples \textit{soft} agent groups every \(K\) environment steps (macro-steps) via Gumbel-Softmax; these groups are latent clusters that induce an affinity used as a differentiable prior over recipients. Using the same assignments, a group-aware critic predicts values for each agent group and maps them to per-agent baselines through the same soft assignments, reducing critic complexity and variance. Each agent is trained with a three-headed policy: environment action, send decision, and recipient selection. To obtain precise communication learning signals, we derive counterfactual communication advantages by analytically removing each sender&#39;s contribution from the recipient&#39;s aggregated messages. This counterfactual computation enables precise credit assignment for both send and recipient-selection decisions. At execution time, all centralized training components are discarded and only the per-agent policy is run, preserving decentralized execution. Project website, videos and code: \hyperlink{https://scout-comm.github.io/}{https://scout-comm.github.io/}</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Strategic Interactions in Multi-Level Stackelberg Games with Non-Follower Agents and Heterogeneous Leaders</title>
  <link>https://arxiv.org/abs/2603.04628</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04628v1 Announce Type: new Abstract: Strategic interaction in congested systems is commonly modelled using Stackelberg games, where competing leaders anticipate the behaviour of self-interested followers. A key limitation of existing models is that they typically ignore agents who do not directly participate in market competition, yet both contribute to and adapt to congestion. Although such non-follower agents do not generate revenue or respond to market incentives, their behaviour reshapes congestion patterns, which in turn affects the decisions of leaders and followers through shared resources. We argue that overlooking non-followers leads to systematically distorted equilibrium predictions in congestion-coupled markets. To address this, we introduce a three-level Stackelberg framework with heterogeneous leaders differing in decision horizons and feasible actions, strategic followers, and non-follower agents that captures bidirectional coupling between infrastructure decisions, competition, and equilibrium congestion. We instantiate the framework in the context of electric vehicle (EV) charging infrastructure, where charging providers compete with rivals, while EV and non-EV traffic jointly shape congestion. The model illustrates how explicitly accounting for non-followers and heterogeneous competitors qualitatively alters strategic incentives and equilibrium outcomes. Beyond EV charging, the framework applies to a broad class of congestion-coupled multi-agent systems in mobility, energy, and computing markets.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>On Solving String Equations via Powers and Parikh Images</title>
  <link>https://arxiv.org/abs/2603.05273</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05273v1 Announce Type: new Abstract: We present a new approach for solving string equations as extensions of Nielsen transformations. Key to our work are the combination of three techniques: a power operator for strings; generalisations of Parikh images; and equality decomposition. Using these methods allows us to solve complex string equations, including less commonly encountered SMT inputs over strings.</description>
  <dc:source>Computer_Science/cs.LO_(Logic_in_Computer_Science)</dc:source>
</item>
<item>
  <title>Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning</title>
  <link>https://arxiv.org/abs/2602.09229</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.09229v2 Announce Type: replace-cross Abstract: Cosine similarity is prevalent in contrastive learning, yet it assumes embedding magnitude is noise. We systematically study magnitude learning through a framework that independently controls query-side and document-side normalization. First, magnitude learning benefits retrieval and Retrieval-Augmented Generation (RAG) where queries and documents have distinct roles, but not Semantic Textual Similarity (STS) or CLIP where inputs are interchangeable. Second, query and document magnitudes serve different roles: document magnitude scales inference scores, while query magnitude modulates training gradients. Normalizing one side consistently outperforms both sides, and the Fisher Information Matrix condition number predicts which side to normalize. Third, magnitude learning improves out-of-domain generalization more than in-domain performance, with gains up to +72\% vs +7\%, requiring retrieval-specialized pre-training or sufficient data. These findings provide practical guidance for retrieval and RAG across text and vision domains.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>Oracle-efficient Hybrid Learning with Constrained Adversaries</title>
  <link>https://arxiv.org/abs/2603.04546</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04546v1 Announce Type: new Abstract: The Hybrid Online Learning Problem, where features are drawn i.i.d. from an unknown distribution but labels are generated adversarially, is a well-motivated setting positioned between statistical and fully-adversarial online learning. Prior work has presented a dichotomy: algorithms that are statistically-optimal, but computationally intractable (Wu et al., 2023), and algorithms that are computationally-efficient (given an ERM oracle), but statistically-suboptimal (Wu et al., 2024). This paper takes a significant step towards achieving statistical optimality and computational efficiency simultaneously in the Hybrid Learning setting. To do so, we consider a structured setting, where the Adversary is constrained to pick labels from an expressive, but fixed, class of functions $R$. Our main result is a new learning algorithm, which runs efficiently given an ERM oracle and obtains regret scaling with the Rademacher complexity of a class derived from the Learner&#39;s hypothesis class $H$ and the Adversary&#39;s label class $R$. As a key corollary, we give an oracle-efficient algorithm for computing equilibria in stochastic zero-sum games when action sets may be high-dimensional but the payoff function exhibits a type of low-dimensional structure. Technically, we develop a number of tools for the design and analysis of our learning algorithm, including a novel Frank-Wolfe reduction with &quot;truncated entropy regularizer&quot; and a new tail bound for sums of &quot;hybrid&quot; martingale difference sequences.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Augmenting representations with scientific papers</title>
  <link>https://arxiv.org/abs/2603.04516</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04516v1 Announce Type: new Abstract: Astronomers have acquired vast repositories of multimodal data, including images, spectra, and time series, complemented by decades of literature that analyzes astrophysical sources. Still, these data sources are rarely systematically integrated. This work introduces a contrastive learning framework designed to align X-ray spectra with domain knowledge extracted from scientific literature, facilitating the development of shared multimodal representations. Establishing this connection is inherently complex, as scientific texts encompass a broader and more diverse physical context than spectra. We propose a contrastive pipeline that achieves a 20% Recall@1% when retrieving texts from spectra, proving that a meaningful alignment between these modalities is not only possible but capable of accelerating the interpretation of rare or poorly understood sources. Furthermore, the resulting shared latent space effectively encodes physically significant information. By fusing spectral and textual data, we improve the estimation of 20 physical variables by 16-18% over unimodal spectral baselines. Our results indicate that a Mixture of Experts (MoE) strategy, which leverages both unimodal and shared representations, yields superior performance. Finally, outlier analysis within the multimodal latent space identifies high-priority targets for follow-up investigation, including a candidate pulsating ULX (PULX) and a gravitational lens system. Importantly, this framework can be extended to other scientific domains where aligning observational data with existing literature is possible.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Debiasing Sequential Recommendation with Time-aware Inverse Propensity Scoring</title>
  <link>https://arxiv.org/abs/2603.04986</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04986v1 Announce Type: new Abstract: Sequential Recommendation (SR) predicts users next interactions by modeling the temporal order of their historical behaviors. Existing approaches, including traditional sequential models and generative recommenders, achieve strong performance but primarily rely on explicit interactions such as clicks or purchases while overlooking item exposures. This ignorance introduces selection bias, where exposed but unclicked items are misinterpreted as disinterest, and exposure bias, where unexposed items are treated as irrelevant. Effectively addressing these biases requires distinguishing between items that were &quot;not exposed&quot; and those that were &quot;not of interest&quot;, which cannot be reliably inferred from correlations in historical data. Counterfactual reasoning provides a natural solution by estimating user preferences under hypothetical exposure, and Inverse Propensity Scoring (IPS) is a common tool for such estimation. However, conventional IPS methods are static and fail to capture the sequential dependencies and temporal dynamics of user behavior. To overcome these limitations, we propose Time aware Inverse Propensity Scoring (TIPS). Unlike traditional static IPS, TIPS effectively accounts for sequential dependencies and temporal dynamics, thereby capturing user preferences more accurately. Extensive experiments show that TIPS consistently enhances recommendation performance as a plug-in for various sequential recommenders. Our code will be publicly available upon acceptance.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>MEC Task Offloading in AIoT: A User-Centric DRL Model Splitting Inference Scheme</title>
  <link>https://arxiv.org/abs/2504.16729</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2504.16729v2 Announce Type: replace Abstract: With the rapid development of the Artificial Intelligence of Things (AIoT), mobile edge computing (MEC) becomes an essential technology underpinning AIoT applications. However, multi-angle resource constraints, multi-user task competition, and the complexity of task offloading decisions in dynamic MEC environments present new technical challenges. Therefore, a user-centric deep reinforcement learning (DRL) model splitting inference scheme is proposed to address the problem. This scheme combines model splitting inference technology and designs a UCMS_MADDPG-based offloading algorithm to realize efficient model splitting inference responses in the dynamic MEC environment with multi-angle resource constraints. Specifically, we formulate a joint optimization problem that integrates resource allocation, server selection, and task offloading, aiming to minimize the weighted sum of task execution delay and energy consumption. We also introduce a user-server co-selection algorithm to address the selection issue between users and servers. Furthermore, we design an algorithm centered on user pre-decision to coordinate the outputs of continuous and discrete hybrid decisions, and introduce a priority sampling mechanism based on reward-error trade-off to optimize the experience replay mechanism of the network. Simulation results show that the proposed UCMS_MADDPG-based offloading algorithm demonstrates superior overall performance compared with other benchmark algorithms in dynamic environments.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Analysis of Proactive Uncoordinated Techniques to Mitigate Interference in FMCW Automotive Radars</title>
  <link>https://arxiv.org/abs/2603.04944</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04944v1 Announce Type: new Abstract: Modern vehicles increasingly rely on advanced driver-assistance systems (ADAS), with radars playing a key role due to their cost-effectiveness and reliable performance. However, the growing number of radars operating in the same spectrum raises concerns about mutual interference, which could lead to system malfunctions and potential safety risks. This study focuses on a scenario in which all vehicles are equipped with frequency-modulated continuous-wave (FMCW) radars, and it assesses the impact of interference on radar functionality - expressed in terms of probability of failure - by considering both direct and reflected signals. The radars may employ one of the following proactive mitigation methods to reduce the impact of interference, all of which require no inter-vehicle coordination but differ in complexity: (i) random carrier-frequency hopping on a frame-by-frame basis, (ii) random carrier-frequency hopping on a chirp-by-chirp basis, and (iii) a directional, compass-based method specifically addressing interference from opposite directions, which can be combined with either of the two previous methods. In this work, we assume realistic simulated road traffic scenarios and develop a novel model that captures correlated interference and accounts for the main radar setting parameters. Results reveal that dense scenarios pose a high risk of radar malfunctions. Among the analyzed methods, chirp-by-chirp frequency hopping emerges as the most effective approach to mitigate interference and ensure system reliability, but only when combined with a sufficiently large bandwidth. The compass-based method, on the other hand, shows limited effectiveness and appears not worth the additional system complexity.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Selfish Cooperation Towards Low-Altitude Economy: Integrated Multi-Service Deployment with Resilient Federated Reinforcement Learning</title>
  <link>https://arxiv.org/abs/2603.04779</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04779v1 Announce Type: new Abstract: The low-altitude economy (LAE) is a rapidly emerging paradigm that builds a service-centric economic ecosystem through large-scale and sustainable uncrewed aerial vehicle (UAV)-enabled service provisioning, reflecting the transition of the 6G era from technological advancement toward commercial deployment. The significant market potential of LAE attracts an increasing number of service providers (SPs), resulting in intensified competition in service deployment. In this paper, we study a realistic LAE scenario in which multiple SPs dynamically deploy UAVs to deliver multiple services to user hotspots, aiming to jointly optimize communication and computation resource allocation. To resolve deployment competition among SPs, an authenticity-guaranteed auction mechanism is designed, and game-theoretic analysis is conducted to establish the solvability of the proposed resource allocation problem. Furthermore, a resilient federated reinforcement learning (FRL)-based solution is developed with strong fault tolerance, effectively countering transmission errors and malicious competition while facilitating potential cooperation among self-interested SPs. Simulation results demonstrate that the proposed approach significantly improves service performance and robustness compared with baseline methods, providing a practical and scalable solution for competitive LAE service deployment.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Body-scale NFC for wearables: human-centric body-scale NFC networking for ultra-low-power wearable devices (Demo of UTokyo Kawahara Lab 2025)</title>
  <link>https://arxiv.org/abs/2603.04777</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04777v1 Announce Type: new Abstract: Near Field Communication (NFC) is a promising technology for ultra-low-power wearables, yet its short communication range limits its use to narrow-area, point-to-point interactions. We propose a body-scale NFC networking system that extends NFC coverage around the body, enabling surface-to-multipoint communication with distributed NFC sensor tags. This demonstration introduces two key technologies: Meander NFC and picoRing NFC. First, Meander NFC expands a clothing-based NFC networking area up to body scale while enabling a stable readout of small NFC tags occupying 1% of the coverage area. Meander NFC uses a meander coil which creates a spatially confined inductive field along the textile surface, ensuring robust coupling with small tags while preventing undesired electromagnetic body coupling. Second, picoRing NFC solves the weak inductive coupling caused by distance and size mismatches. By leveraging middle-range NFC and coil optimization, picoRing NFC extends the communication range to connect these disparate nodes between the ring and wristband.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Transformer-Based Multipath Congestion Control: A Decoupled Approach for Wireless Uplinks</title>
  <link>https://arxiv.org/abs/2603.04550</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04550v1 Announce Type: new Abstract: The proliferation of artificial intelligence applications on edge devices necessitates efficient transport protocols that leverage multi-homed connectivity across heterogeneous networks. While Multipath TCP enables bandwidth aggregation, its in-kernel congestion control mechanisms lack the programmability and flexibility needed for achieving efficient transmission. Additionally, inherent measurement noise renders network state partially observable, challenging data-driven approaches like deep reinforcement learning (DRL). To address these challenges, we propose a Transformer-based Congestion Control Optimization (TCCO) framework for multipath transport. TCCO employs a decoupled architecture that offloads control decisions to an external decision engine via a lightweight in-kernel client and user-space proxy, enabling edge devices to leverage external computational resources while maintaining TCP/IP compatibility. The Transformer-based DRL agent in the external decision engine uses self-attention to capture temporal dependencies, filter noise, and coordinate control across subflows through a unified policy. Extensive evaluation on both simulated and real dual-band Wi-Fi testbeds demonstrates that TCCO achieves superior adaptability and performance than state-of-the-art baselines, validating the feasibility and effectiveness of TCCO for wireless networks.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models</title>
  <link>https://arxiv.org/abs/2603.04444</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04444v1 Announce Type: new Abstract: As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing -- selecting the right model for each query at inference time -- has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The central innovation is composable signal orchestration: the system extracts heterogeneous signal types from each request -- from sub-millisecond heuristic features (keyword patterns, language detection, context length, role-based authorization) to neural classifiers (domain, embedding similarity, factual grounding, modality) -- and composes them through configurable Boolean decision rules into deployment-specific routing policies. Different deployment scenarios -- multi-cloud enterprise, privacy-regulated, cost-optimized, latency-sensitive -- are expressed as different signal-decision configurations over the same architecture, without code changes. Matched decisions drive semantic model routing: over a dozen of selection algorithms analyze request characteristics to find the best model cost-effectively, while per-decision plugin chains enforce privacy and safety constraints (jailbreak detection, PII filtering, hallucination detection via the three-stage HaluGate pipeline). The system provides OpenAI API support for stateful multi-turn conversations, multi-endpoint and multi-provider routing across heterogeneous backends (vLLM, OpenAI, Anthropic, Azure, Bedrock, Gemini, Vertex AI), and a pluggable authorization factory supporting multiple auth providers. Deployed in production as an Envoy external processor, the architecture demonstrates that composable signal orchestration enables a single routing framework to serve diverse deployment scenarios with differentiated cost, privacy, and safety policies.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Towards Green Connectivity: An AI-Driven Mesh Architecture for Sustainable and Scalable Wireless Networks</title>
  <link>https://arxiv.org/abs/2603.04442</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04442v1 Announce Type: new Abstract: Traditional macro-cell and micro-cell infrastructures suffer from severe inefficiencies, with current macro-cell networks operating at less than 5 percent energy efficiency, leading to nearly 95 percent of RF power wasted in covering vacant areas. The problem becomes particularly acute in high-density scenarios such as the Hajj, where approximately 7,000 temporary diesel-powered towers are deployed each year, consuming 56 million liters of fuel and emitting around 148,000 tons of CO2, yet still experiencing failure rates of nearly 40 percent at peak demand. To overcome these limitations, we propose an AI-driven mesh architecture based on three integrated enablers: (i) proximity-based deployment of low-power nodes within 250 to 300 meters of users, yielding a 38 dB link-budget gain and up to 6000 times efficiency improvement; (ii) spatial frequency reuse, which partitions cells into multiple non-interfering zones and achieves nearly 20 times capacity gain; and (iii) predictive network intelligence leveraging LSTMs to forecast traffic 5 seconds ahead, enabling smarter allocation and reducing congestion by about 60 percent. System-level evaluations combining propagation modeling and validated link-budget analysis demonstrate that this architecture delivers up to an 84 times improvement in useful energy delivery, reduces deployment costs by nearly 74 percent, and eliminates diesel dependence through solar-powered operations, thereby enabling sustainable, green connectivity for both rural and ultra-dense urban environments.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Energy Efficiency Testing and Modeling of a Commercial O-RAN System</title>
  <link>https://arxiv.org/abs/2603.04435</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04435v1 Announce Type: new Abstract: Network energy efficiency is of critical importance to mobile network operators for economic and ecological reasons. The advent of the O-RAN architecture has brought disaggregation and virtualization, and in order to achieve the highest energy savings gains, we need rigorous measurement, analysis, and modeling of energy consumption at both the component and system levels. However, there remains a lack of publicly-available, quantitative data characterizing the behavior of commercial-grade O-RAN systems. In this white paper, we present a detailed energy-efficiency characterization and modeling of a commercial O-RAN system based on comprehensive power and performance measurements, using a network deployment that faithfully replicates a production O-RAN network deployed by a wireless carrier. The results are drawn from an energy test campaign conducted through a joint collaboration between the Open RAN Center for Integration and Deployment (ORCID) Lab Testing and Evaluation (T&amp;E) Project and the Open Networking Foundation / Rutgers WINLAB Energy Efficiency R&amp;D project. The test environment includes an O-RAN system with an AWS-hosted O-CU, a dedicated-server O-DU, and six high-power, multi-band O-RUs. Our results identify the dominant factors influencing power consumption across the O-RAN stack and quantify energy usage variation under different operational and traffic scenarios. These measurements can be used by operators to parameterize power-consumption models, ultimately supporting data-driven energy optimization and more sustainable operation of commercial O-RAN networks.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Periodic Scheduling of Grouped Time-Triggered Signals on a Single Resource</title>
  <link>https://arxiv.org/abs/2603.04434</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04434v1 Announce Type: new Abstract: Time-triggered messages are of crucial importance in modern communication networks. Offline-generated schedules, which specify start times for periodic messages, enable us to achieve deterministic behavior in critical applications. In automotive and avionics domains, so-called signals (measurements and commands) are periodically generated and communicated (via messages) among sensors, controllers, and actuators. However, the message contains not only the useful signal data, but also necessary metadata, e.g., message ID. Metadata is stored as a header or tail and extends the message size; when the signal is very short (as it often is in applications), sending each in a separate message is inefficient. Thus, several signals are grouped into a single message, depending on their periodicity and length, and sent with just one header. Such an approach increases the utilization of the communication resource (link or bus), since less bandwidth is wasted on headers (Kuaban et al. 2021). However, grouping the signals into messages is complicated. The maximum size of the message (including the metadata) is finite, since longer messages have a lower probability of successful delivery. Also, longer messages are less flexible for scheduling in a periodic setting. This is similar to the work of Huan et al. (2019), where the compromise between energy efficiency and latency for IoT devices was investigated. In this paper, we study the fundamental problem of grouping time-triggered signals into messages and periodic scheduling of messages on a single resource.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Arterial Network Traffic State Prediction with Connected Vehicle Data: An Abnormality-Aware Spatiotemporal Network</title>
  <link>https://arxiv.org/abs/2603.04432</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04432v1 Announce Type: new Abstract: Emerging connected-vehicle (CV) data shows great potential in urban traffic monitoring and forecasting. However, prior CV-based studies on arterial traffic measures prediction are limited to simulated high-penetration scenarios or small networks, which are challenging to apply in real-world city-scale arterial networks. To address such gaps, we develop a CV data-based arterial traffic prediction framework with two components: (1) a two-stage traffic state extraction method that estimates vehicle-level traffic measures from CV trajectories and then aggregates them into network-level traffic state measures; (2) an Abnormality-aware spatiotemporal graph convolution network (AASTGCN) that adopts a dual-expert architecture to separately model normal and abnormal traffic, and jointly captures short-term traffic dynamics and long-term periodicity via spatiotemporal GCN with a gated-fusion mechanism. Real-world CV data are used to test our method in a large arterial network with 1,050 links. Experimental results show that: 1) The proposed traffic estimation method is effective for large arterial networks to provide real-time traffic measures (e.g., link-level average travel delay and queue length), which are critical for urban traffic operation and evaluation. 2) Abnormal traffic prediction is typically challenging for existing methods. By modeling abnormal cases separately from normal traffic in two dedicated experts, AASTGCN outperforms existing models for both normal and abnormal traffic conditions. 3) The gate-fusion mechanism adaptively balances real-time and historical information: it leverages more historical-periodic information in normal traffic and shifts a higher weight to real-time traffic dynamics for abnormal traffic deviating abruptly from historical patterns.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Constraint Learning for Non-confluent Proof Search</title>
  <link>https://arxiv.org/abs/2603.05258</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05258v1 Announce Type: new Abstract: Proof search in non-confluent tableau calculi, such as the connection tableau calculus, suffers from excess backtracking, but simple restrictions on backtracking are incomplete. We adopt constraint learning to reduce backtracking in the classical first-order connection calculus, while retaining completeness. An initial constraint learning language for connection-driven search is iteratively refined to greatly reduce backtracking in practice. The approach may be useful for proof search in other non-confluent tableau calculi.</description>
  <dc:source>Computer_Science/cs.LO_(Logic_in_Computer_Science)</dc:source>
</item>
<item>
  <title>Yukthi Opus: A Multi-Chain Hybrid Metaheuristic for Large-Scale NP-Hard Optimization</title>
  <link>https://arxiv.org/abs/2601.01832</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2601.01832v3 Announce Type: replace Abstract: We present Yukthi Opus (YO), a multi-chain hybrid metaheuristic designed for NP-hard optimization under explicit evaluation budget constraints. YO integrates three complementary mechanisms in a structured two-phase architecture: Markov Chain Monte Carlo (MCMC) for global exploration, greedy local search for exploitation, and simulated annealing with adaptive reheating to enable controlled escape from local minima. A dedicated burn-in phase allocates evaluations to probabilistic exploration, after which a hybrid optimization loop refines promising candidates. YO further incorporates a spatial blacklist mechanism to avoid repeated evaluation of poor regions and a multi-chain execution strategy to improve robustness and reduce sensitivity to initialization. We evaluate YO on three benchmarks: the Rastrigin function (5D) with ablation studies, the Traveling Salesman Problem with 50 to 200 cities, and the Rosenbrock function (5D) with comparisons against established optimizers including CMA-ES, Bayesian optimization, and accelerated particle swarm optimization. Results show that MCMC exploration and greedy refinement are critical for solution quality, while simulated annealing and multi-chain execution primarily improve stability and variance reduction. Overall, YO achieves competitive performance on large and multimodal problems while maintaining predictable evaluation budgets, making it suitable for expensive black-box optimization settings.</description>
  <dc:source>Computer_Science/cs.NE_(Neural_and_Evolutionary_Computing)</dc:source>
</item>
<item>
  <title>Neural Network-Based Parameter Estimation of a Labour Market Agent-Based Model</title>
  <link>https://arxiv.org/abs/2602.15572</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.15572v2 Announce Type: replace-cross Abstract: Agent-based modelling (ABM) is a widespread approach to simulate complex systems. Advancements in computational processing and storage have facilitated the adoption of ABMs across many fields; however, ABMs face challenges that limit their use as decision-support tools. A significant issue is parameter estimation in large-scale ABMs, particularly due to computational constraints on exploring the parameter space. This study evaluates a state-of-the-art simulation-based inference (SBI) framework that uses neural networks (NN) for parameter estimation. This framework is applied to an established labour market ABM based on job transition networks. The ABM is initiated with synthetic datasets and the real U.S. labour market. Next, we compare the effectiveness of summary statistics derived from a list of statistical measures with that learned by an embedded NN. The results demonstrate that the NN-based approach recovers the original parameters when evaluating posterior distributions across various dataset scales and improves efficiency compared to traditional Bayesian methods.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Why Do Neural Networks Forget: A Study of Collapse in Continual Learning</title>
  <link>https://arxiv.org/abs/2603.04580</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04580v1 Announce Type: new Abstract: Catastrophic forgetting is a major problem in continual learning, and lots of approaches arise to reduce it. However, most of them are evaluated through task accuracy, which ignores the internal model structure. Recent research suggests that structural collapse leads to loss of plasticity, as evidenced by changes in effective rank (eRank). This indicates a link to forgetting, since the networks lose the ability to expand their feature space to learn new tasks, which forces the network to overwrite existing representations. Therefore, in this study, we investigate the correlation between forgetting and collapse through the measurement of both weight and activation eRank. To be more specific, we evaluated four architectures, including MLP, ConvGRU, ResNet-18, and Bi-ConvGRU, in the split MNIST and Split CIFAR-100 benchmarks. Those models are trained through the SGD, Learning-without-Forgetting (LwF), and Experience Replay (ER) strategies separately. The results demonstrate that forgetting and collapse are strongly related, and different continual learning strategies help models preserve both capacity and performance in different efficiency.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling</title>
  <link>https://arxiv.org/abs/2603.04553</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04553v1 Announce Type: new Abstract: We introduce Latent Particle World Model (LPWM), a self-supervised object-centric world model scaled to real-world multi-object datasets and applicable in decision-making. LPWM autonomously discovers keypoints, bounding boxes, and object masks directly from video data, enabling it to learn rich scene decompositions without supervision. Our architecture is trained end-to-end purely from videos and supports flexible conditioning on actions, language, and image goals. LPWM models stochastic particle dynamics via a novel latent action module and achieves state-of-the-art results on diverse real-world and synthetic datasets. Beyond stochastic video modeling, LPWM is readily applicable to decision-making, including goal-conditioned imitation learning, as we demonstrate in the paper. Code, data, pre-trained models and video rollouts are available: https://taldatech.github.io/lpwm-web</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>RealWonder: Real-Time Physical Action-Conditioned Video Generation</title>
  <link>https://arxiv.org/abs/2603.05449</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05449v1 Announce Type: cross Abstract: Current video generation models cannot simulate physical consequences of 3D actions like forces and robotic manipulations, as they lack structural understanding of how actions affect 3D scenes. We present RealWonder, the first real-time system for action-conditioned video generation from a single image. Our key insight is using physics simulation as an intermediate bridge: instead of directly encoding continuous actions, we translate them through physics simulation into visual representations (optical flow and RGB) that video models can process. RealWonder integrates three components: 3D reconstruction from single images, physics simulation, and a distilled video generator requiring only 4 diffusion steps. Our system achieves 13.2 FPS at 480x832 resolution, enabling interactive exploration of forces, robot actions, and camera controls on rigid objects, deformable bodies, fluids, and granular materials. We envision RealWonder opens new opportunities to apply video models in immersive experiences, AR/VR, and robot learning. Our code and model weights are publicly available in our project website: https://liuwei283.github.io/RealWonder/</description>
  <dc:source>Computer_Science/cs.GR_(Graphics)</dc:source>
</item>
<item>
  <title>SSR-GS: Separating Specular Reflection in Gaussian Splatting for Glossy Surface Reconstruction</title>
  <link>https://arxiv.org/abs/2603.05152</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05152v1 Announce Type: cross Abstract: In recent years, 3D Gaussian splatting (3DGS) has achieved remarkable progress in novel view synthesis. However, accurately reconstructing glossy surfaces under complex illumination remains challenging, particularly in scenes with strong specular reflections and multi-surface interreflections. To address this issue, we propose SSR-GS, a specular reflection modeling framework for glossy surface reconstruction. Specifically, we introduce a prefiltered Mip-Cubemap to model direct specular reflections efficiently, and propose an IndiASG module to capture indirect specular reflections. Furthermore, we design Visual Geometry Priors (VGP) that couple a reflection-aware visual prior via a reflection score (RS) to downweight the photometric loss contribution of reflection-dominated regions, with geometry priors derived from VGGT, including progressively decayed depth supervision and transformed normal constraints. Extensive experiments on both synthetic and real-world datasets demonstrate that SSR-GS achieves state-of-the-art performance in glossy surface reconstruction.</description>
  <dc:source>Computer_Science/cs.GR_(Graphics)</dc:source>
</item>
<item>
  <title>Understanding the Dynamics of Demonstration Conflict in In-Context Learning</title>
  <link>https://arxiv.org/abs/2603.04464</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04464v1 Announce Type: new Abstract: In-context learning enables large language models to perform novel tasks through few-shot demonstrations. However, demonstrations per se can naturally contain noise and conflicting examples, making this capability vulnerable. To understand how models process such conflicts, we study demonstration-dependent tasks requiring models to infer underlying patterns, a process we characterize as rule inference. We find that models suffer substantial performance degradation from a single demonstration with corrupted rule. This systematic misleading behavior motivates our investigation of how models process conflicting evidence internally. Using linear probes and logit lens analysis, we discover that under corruption models encode both correct and incorrect rules in intermediate layers but develop prediction confidence only in late layers, revealing a two-phase computational structure. We then identify attention heads for each phase underlying the reasoning failures: Vulnerability Heads in early-to-middle layers exhibit positional attention bias with high sensitivity to corruption, while Susceptible Heads in late layers significantly reduce support for correct predictions when exposed to the corrupted evidence. Targeted ablation validates our findings, with masking a small number of identified heads improving performance by over 10%.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>MAD-SmaAt-GNet: A Multimodal Advection-Guided Neural Network for Precipitation Nowcasting</title>
  <link>https://arxiv.org/abs/2603.04461</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04461v1 Announce Type: new Abstract: Precipitation nowcasting (short-term forecasting) is still often performed using numerical solvers for physical equations, which are computationally expensive and make limited use of the large volumes of available weather data. Deep learning models have shown strong potential for precipitation nowcasting, offering both accuracy and computational efficiency. Among these models, convolutional neural networks (CNNs) are particularly effective for image-to-image prediction tasks. The SmaAt-UNet is a lightweight CNN based architecture that has demonstrated strong performance for precipitation nowcasting. This paper introduces the Multimodal Advection-Guided Small Attention GNet (MAD-SmaAt-GNet), which extends the core SmaAt-UNet by (i) incorporating an additional encoder to learn from multiple weather variables and (ii) integrating a physics-based advection component to ensure physically consistent predictions. We show that each extension individually improves rainfall forecasts and that their combination yields further gains. MAD-SmaAt-GNet reduces the mean squared error (MSE) by 8.9% compared with the baseline SmaAt-UNet for four-step precipitation forecasting up to four hours ahead. Additionally, experiments indicate that multimodal inputs are particularly beneficial for short lead times, while the advection-based component enhances performance across both short and long forecasting horizons.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling</title>
  <link>https://arxiv.org/abs/2603.04460</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04460v1 Announce Type: new Abstract: The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attention methods face a trade-off among context adaptivity, sampling overhead, and fine-tuning costs. We propose VSPrefill, a mechanism requiring lightweight training that uses the vertical-slash structural pattern in attention distributions. Our compact VSIndexer module predicts context-aware importance scores for vertical columns and slash diagonals from key-value representations augmented with RoPE. This approach constructs sparse masks with linear complexity without modifying the backbone parameters. During inference, an adaptive cumulative-threshold strategy allocates sparsity budgets per layer, while a fused kernel executes attention with on-the-fly index merging. Evaluated on Qwen3-4B-Instruct and LLaMA-3.1-8B-Instruct across the LongBench and RULER benchmarks, VSPrefill preserves 98.35% of the full attention accuracy while delivering a 4.95x average speedup at a context length of 128k. These results establish a new Pareto frontier in the trade-off between accuracy and efficiency.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering</title>
  <link>https://arxiv.org/abs/2603.04458</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04458v1 Announce Type: new Abstract: Datasets composed of numerical and categorical attributes (also called mixed data hereinafter) are common in real clustering tasks. Differing from numerical attributes that indicate tendencies between two concepts (e.g., high and low temperature) with their values in well-defined Euclidean distance space, categorical attribute values are different concepts (e.g., different occupations) embedded in an implicit space. Simultaneously exploiting these two very different types of information is an unavoidable but challenging problem, and most advanced attempts either encode the heterogeneous numerical and categorical attributes into one type, or define a unified metric for them for mixed data clustering, leaving their inherent connection unrevealed. This paper, therefore, studies the connection among any-type of attributes and proposes a novel Heterogeneous Attribute Reconstruction and Representation (HARR) learning paradigm accordingly for cluster analysis. The paradigm transforms heterogeneous attributes into a homogeneous status for distance metric learning, and integrates the learning with clustering to automatically adapt the metric to different clustering tasks. Differing from most existing works that directly adopt defined distance metrics or learn attribute weights to search clusters in a subspace. We propose to project the values of each attribute into unified learnable multiple spaces to more finely represent and learn the distance metric for categorical data. HARR is parameter-free, convergence-guaranteed, and can more effectively self-adapt to different sought number of clusters $k$. Extensive experiments illustrate its superiority in terms of accuracy and efficiency.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>On Emergences of Non-Classical Statistical Characteristics in Classical Neural Networks</title>
  <link>https://arxiv.org/abs/2603.04451</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04451v1 Announce Type: new Abstract: Inspired by measurement incompatibility and Bell-family inequalities in quantum mechanics, we propose the Non-Classical Network (NCnet), a simple classical neural architecture that stably exhibits non-classical statistical behaviors under typical and interpretable experimental setups. We find non-classicality, measured by the $S$ statistic of CHSH inequality, arises from gradient competitions of hidden-layer neurons shared by multi-tasks. Remarkably, even without physical links supporting explicit communication, one task head can implicitly sense the training task of other task heads via local loss oscillations, leading to non-local correlations in their training outcomes. Specifically, in the low-resource regime, the value of $S$ increases gradually with increasing resources and approaches toward its classical upper-bound 2, which implies that underfitting is alleviated with resources increase. As the model nears the critical scale required for adequate performance, $S$ may temporarily exceed 2. As resources continue to grow, $S$ then asymptotically decays down to and fluctuates around 2. Empirically, when model capacity is insufficient, $S$ is positively correlated with generalization performance, and the regime where $S$ first approaches $2$ often corresponding to good generalization. Overall, our results suggest that non-classical statistics can provide a novel perspective for understanding internal interactions and training dynamics of deep networks.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>An Explainable Ensemble Framework for Alzheimer&#39;s Disease Prediction Using Structured Clinical and Cognitive Data</title>
  <link>https://arxiv.org/abs/2603.04449</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04449v1 Announce Type: new Abstract: Early and accurate detection of Alzheimer&#39;s disease (AD) remains a major challenge in medical diagnosis due to its subtle onset and progressive nature. This research introduces an explainable ensemble learning Framework designed to classify individuals as Alzheimer&#39;s or Non-Alzheimer&#39;s using structured clinical, lifestyle, metabolic, and lifestyle features. The workflow incorporates rigorous preprocessing, advanced feature engineering, SMOTE-Tomek hybrid class balancing, and optimized modeling using five ensemble algorithms-Random Forest, XGBoost, LightGBM, CatBoost, and Extra Trees-alongside a deep artificial neural network. Model selection was performed using stratified validation to prevent leakage, and the best-performing model was evaluated on a fully unseen test set. Ensemble methods achieved superior performance over deep learning, with XGBoost, Random Forest, and Soft Voting showing the strongest accuracy, sensitivity, and F1-score profiles. Explainability techniques, including SHAP and feature importance analysis, highlighted MMSE, Functional Assessment Age, and several engineered interaction features as the most influential determinants. The results demonstrate that the proposed framework provides a reliable and transparent approach to Alzheimer&#39;s disease prediction, offering strong potential for clinical decision support applications.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>ASFL: An Adaptive Model Splitting and Resource Allocation Framework for Split Federated Learning</title>
  <link>https://arxiv.org/abs/2603.04437</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04437v1 Announce Type: new Abstract: Federated learning (FL) enables multiple clients to collaboratively train a machine learning model without sharing their raw data. However, the limited computation resources of the clients may result in a high delay and energy consumption on training. In this paper, we propose an adaptive split federated learning (ASFL) framework over wireless networks. ASFL exploits the computation resources of the central server to train part of the model and enables adaptive model splitting as well as resource allocation during training. To optimize the learning performance (i.e., convergence rate) and efficiency (i.e., delay and energy consumption) of ASFL, we theoretically analyze the convergence rate and formulate a joint learning performance and resource allocation optimization problem. Solving this problem is challenging due to the long-term delay and energy consumption constraints as well as the coupling of the model splitting and resource allocation decisions. We propose an online optimization enhanced block coordinate descent (OOE-BCD) algorithm to solve the problem iteratively. Experimental results show that when compared with five baseline schemes, our proposed ASFL framework converges faster and reduces the total delay and energy consumption by up to 75% and 80%, respectively.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>ZorBA: Zeroth-order Federated Fine-tuning of LLMs with Heterogeneous Block Activation</title>
  <link>https://arxiv.org/abs/2603.04436</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04436v1 Announce Type: new Abstract: Federated fine-tuning of large language models (LLMs) enables collaborative tuning across distributed clients. However, due to the large size of LLMs, local updates in federated learning (FL) may incur substantial video random-access memory (VRAM) usage. Moreover, frequent model exchange may lead to significant communication overhead. To tackle these challenges, in this paper we propose ZorBA, a zeroth-order optimization-based federated fine-tuning framework with heterogeneous block activation. ZorBA leverages zeroth-order optimization to eliminate the storage of gradients at the clients by forward passes. ZorBA includes a heterogeneous block activation mechanism in which the central server allocates different subsets of transformer blocks to clients in order to accelerate the convergence rate and reduce the VRAM usage. Furthermore, ZorBA utilizes shared random seeds and the finite differences of gradients in order to reduce the communication overhead. We conduct theoretical analysis to characterize the effect of block activation decisions on the convergence rate and VRAM usage. To jointly enhance the convergence rate and reduce the VRAM usage, we formulate an optimization problem to optimize the block activation decisions. We propose an $\epsilon$-constraint lexicographic algorithm to solve this problem. Experimental results show that ZorBA outperforms three federated fine-tuning baselines in VRAM usage by up to 62.41% and incurs a low communication overhead.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Uncertainty-Calibrated Spatiotemporal Field Diffusion with Sparse Supervision</title>
  <link>https://arxiv.org/abs/2603.04431</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04431v1 Announce Type: new Abstract: Physical fields are typically observed only at sparse, time-varying sensor locations, making forecasting and reconstruction ill-posed and uncertainty-critical. We present SOLID, a mask-conditioned diffusion framework that learns spatiotemporal dynamics from sparse observations alone: training and evaluation use only observed target locations, requiring no dense fields and no pre-imputation. Unlike prior work that trains on dense reanalysis or simulations and only tests under sparsity, SOLID is trained end-to-end with sparse supervision only. SOLID conditions each denoising step on the measured values and their locations, and introduces a dual-masking objective that (i) emphasizes learning in unobserved void regions while (ii) upweights overlap pixels where inputs and targets provide the most reliable anchors. This strict sparse-conditioning pathway enables posterior sampling of full fields consistent with the measurements, achieving up to an order-of-magnitude improvement in probabilistic error and yielding calibrated uncertainty maps (\r{ho} &gt; 0.7) under severe sparsity.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Flowers: A Warp Drive for Neural PDE Solvers</title>
  <link>https://arxiv.org/abs/2603.04430</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04430v1 Announce Type: new Abstract: We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters \emph{only} through sparse sampling at source coordinates, \emph{one} per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost. We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly flows and waves. A compact 17M-parameter model consistently outperforms Fourier, convolution, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models with much more parameters, data, and training compute.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices</title>
  <link>https://arxiv.org/abs/2603.04428</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04428v1 Announce Type: new Abstract: Multi-agent LLM systems on edge devices face a memory management problem: device RAM is too small to hold every agent&#39;s KV cache simultaneously. On Apple M4 Pro with 10.2 GB of cache budget, only 3 agents fit at 8K context in FP16. A 10-agent workflow must constantly evict and reload caches. Without persistence, every eviction forces a full re-prefill through the model -- 15.7 seconds per agent at 4K context. We address this by persisting each agent&#39;s KV cache to disk in 4-bit quantized format and reloading it directly into the attention layer, eliminating redundant O(n) prefill computation via direct cache restoration. The system comprises three components: a block pool providing per-agent isolated Q4 KV caches in safetensors format, a BatchQuantizedKVCache for concurrent inference over multiple agents&#39; quantized caches, and cross-phase context injection that accumulates attention state across conversation phases without re-computation. Evaluated on three architectures (Gemma 3 12B, dense GQA, 48 layers; DeepSeek-Coder-V2-Lite 16B, MoE MLA, 27 layers; Llama 3.1 8B, dense GQA, 32 layers), cache restoration reduces time-to-first-token by up to 136x (Gemma: 22--136x at 4K--32K; DeepSeek: 11--76x at 4K--32K; Llama: 24--111x at 4K--16K; 3--10x at 1K). Q4 quantization fits 4x more agent contexts into fixed device memory than FP16. Perplexity measured with actual Q4 KV caches shows -0.7% for Gemma, +2.8% for Llama, and +3.0% for DeepSeek. Open-source at https://github.com/yshk-mxim/agent-memory</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selection</title>
  <link>https://arxiv.org/abs/2603.04427</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04427v1 Announce Type: new Abstract: Standard transformer attention uses identical dimensionality for queries, keys, and values ($d_q = d_k = d_v = \dmodel$). Our insight is that these components serve fundamentally different roles, and this symmetry is unnecessary. Queries and keys produce scalar attention weights (\emph{selection}), while values carry rich semantic representations (\emph{value transfer}). We argue that selection is an inherently lower-dimensional operation than value transfer, requiring only $\BigO(\log N)$ dimensions to distinguish among $N$ relevant patterns. We validate this hypothesis across seven experiments: (1)~positional selection tasks requiring just 1~dimension per head, (2)~content-based retrieval requiring $\sim\!\log_2 N$ dimensions, (3--4)~WikiText-2 and WikiText-103 language modeling where $\dselect = \dmodel/4$ incurs only 4.3\% perplexity increase while reducing QK parameters by 75\%, (5)~post-training SVD compression of GPT-2, revealing keys to be far more compressible than queries, with lightweight QK fine-tuning recovering nearly all quality loss, (6)~a 125M-parameter LLaMA model confirming identical degradation ratios across architectures, and (7)~Mistral-7B (7.2B parameters), where SVD compression followed by QK fine-tuning achieves 75\% key cache savings at just 2.0\% residual quality cost. For existing models, SVD compression followed by QK fine-tuning (3 epochs on a small fraction of pretraining data) achieves 75\% key cache savings at $&lt;$2\% residual quality cost. For a 7B-parameter model serving 128K context, asymmetric attention saves 25\,GB of KV cache per user, enabling approximately 60\% more concurrent users on the same GPU.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Delta-Crosscoder: Robust Crosscoder Model Diffing in Narrow Fine-Tuning Regimes</title>
  <link>https://arxiv.org/abs/2603.04426</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04426v1 Announce Type: new Abstract: Model diffing methods aim to identify how fine-tuning changes a model&#39;s internal representations. Crosscoders approach this by learning shared dictionaries of interpretable latent directions between base and fine-tuned models. However, existing formulations struggle with narrow fine-tuning, where behavioral changes are localized and asymmetric. We introduce Delta-Crosscoder, which combines BatchTopK sparsity with a delta-based loss prioritizing directions that change between models, plus an implicit contrastive signal from paired activations on matched inputs. Evaluated across 10 model organisms, including synthetic false facts, emergent misalignment, subliminal learning, and taboo word guessing (Gemma, LLaMA, Qwen; 1B-9B parameters), Delta-Crosscoder reliably isolates latent directions causally responsible for fine-tuned behaviors and enables effective mitigation, outperforming SAE-based baselines, while matching the Non-SAE-based. Our results demonstrate that crosscoders remain a powerful tool for model diffing.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search</title>
  <link>https://arxiv.org/abs/2602.13704</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.13704v2 Announce Type: replace Abstract: In this work, we presented Pailitao-VL, a comprehensive multi-modal retrieval system engineered for high-precision, real-time industrial search. We here address three critical challenges in the current SOTA solution: insufficient retrieval granularity, vulnerability to environmental noise, and prohibitive efficiency-performance gap. Our primary contribution lies in two fundamental paradigm shifts. First, we transitioned the embedding paradigm from traditional contrastive learning to an absolute ID-recognition task. Through anchoring instances to a globally consistent latent space defined by billions of semantic prototypes, we successfully overcome the stochasticity and granularity bottlenecks inherent in existing embedding solutions. Second, we evolved the generative reranker from isolated pointwise evaluation to the compare-and-calibrate listwise policy. By synergizing chunk-based comparative reasoning with calibrated absolute relevance scoring, the system achieves nuanced discriminative resolution while circumventing the prohibitive latency typically associated with conventional reranking methods. Extensive offline benchmarks and online A/B tests on Alibaba e-commerce platform confirm that Pailitao-VL achieves state-of-the-art performance and delivers substantial business impact. This work demonstrates a robust and scalable path for deploying advanced MLLM-based retrieval architectures in demanding, large-scale production environments.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>Agentic Multi-Persona Framework for Evidence-Aware Fake News Detection</title>
  <link>https://arxiv.org/abs/2512.21039</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2512.21039v2 Announce Type: replace Abstract: The rapid proliferation of online misinformation threatens the stability of digital social systems and poses significant risks to public trust, policy, and safety, necessitating reliable automated fake news detection. Existing methods often struggle with multimodal content, domain generalization, and explainability. We propose AMPEND-LS, an agentic multi-persona evidence-grounded framework with LLM-SLM synergy for multimodal fake news detection. AMPEND-LS integrates textual, visual, and contextual signals through a structured reasoning pipeline powered by LLMs, augmented with reverse image search, knowledge graph paths, and persuasion strategy analysis. To improve reliability, we introduce a credibility fusion mechanism combining semantic similarity, domain trustworthiness, and temporal context, and a complementary SLM classifier to mitigate LLM uncertainty and hallucinations. Extensive experiments across three benchmark datasets demonstrate that AMPEND-LS consistently outperformed state-of-the-art baselines in accuracy, F1 score, and robustness. Qualitative case studies further highlight its transparent reasoning and resilience against evolving misinformation. This work advances the development of adaptive, explainable, and evidence-aware systems for safeguarding online information integrity.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>Give Users the Wheel: Towards Promptable Recommendation Paradigm</title>
  <link>https://arxiv.org/abs/2602.18929</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.18929v2 Announce Type: replace Abstract: Conventional sequential recommendation models have achieved remarkable success in mining implicit behavioral patterns. However, these architectures remain structurally blind to explicit user intent: they struggle to adapt when a user&#39;s immediate goal (e.g., expressed via a natural language prompt) deviates from their historical habits. While Large Language Models (LLMs) offer the semantic reasoning to interpret such intent, existing integration paradigms force a dilemma: LLM-as-a-recommender paradigm sacrifices the efficiency and collaborative precision of ID-based retrieval, while Reranking methods are inherently bottlenecked by the recall capabilities of the underlying model. In this paper, we propose Decoupled Promptable Sequential Recommendation (DPR), a model-agnostic framework that empowers conventional sequential backbones to natively support Promptable Recommendation, the ability to dynamically steer the retrieval process using natural language without abandoning collaborative signals. DPR modulates the latent user representation directly within the retrieval space. To achieve this, we introduce a Fusion module to align the collaborative and semantic signals, a Mixture-of-Experts (MoE) architecture that disentangles the conflicting gradients from positive and negative steering, and a three-stage training strategy that progressively aligns the semantic space of prompts with the collaborative space. Extensive experiments on real-world datasets demonstrate that DPR significantly outperforms state-of-the-art baselines in prompt-guided tasks while maintaining competitive performance in standard sequential recommendation scenarios.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>A Benchmark Study of Neural Network Compression Methods for Hyperspectral Image Classification</title>
  <link>https://arxiv.org/abs/2603.04720</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04720v1 Announce Type: new Abstract: Deep neural networks have achieved strong performance in image classification tasks due to their ability to learn complex patterns from high-dimensional data. However, their large computational and memory requirements often limit deployment on resource-constrained platforms such as remote sensing devices and edge systems. Network compression techniques have therefore been proposed to reduce model size and computational cost while maintaining predictive performance. In this study, we conduct a systematic evaluation of neural network compression methods for a remote sensing application, namely hyperspectral land cover classification. Specifically, we examine three widely used compression strategies for convolutional neural networks: pruning, quantization, and knowledge distillation. Experiments are conducted on two benchmark hyperspectral datasets, considering classification accuracy, memory consumption, and inference efficiency. Our results demonstrate that compressed models can significantly reduce model size and computational cost while maintaining competitive classification performance. These findings provide insights into the trade-offs between compression ratio, efficiency, and accuracy, and highlight the potential of compression techniques for enabling efficient deep learning deployment in remote sensing applications.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks</title>
  <link>https://arxiv.org/abs/2603.04676</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04676v1 Announce Type: new Abstract: Multi-image reasoning remains a significant challenge for vision-language models (VLMs). We investigate a previously overlooked phenomenon: during chain-of-thought (CoT) generation, the text-to-image (T2I) attention of reasoning VLMs exhibits diffuse &quot;pulses&quot;: sporadic and unfocused attention patterns that fail to concentrate on task-relevant images. We further reveal a systematic positional bias in attention allocation across images. Motivated by these observations, we propose PulseFocus, a training-free, inference-time method that structures CoT reasoning into interleaved plan/focus blocks with soft attention gating. By forcing the model to explicitly plan which image to examine and then gating decode-time attention to the referenced image, PulseFocus sharpens attention focus and yields consistent improvements on multi-image benchmarks like BLINK benchmark (+3.7%) and MuirBench (+1.07%).</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D</title>
  <link>https://arxiv.org/abs/2603.04614</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04614v1 Announce Type: new Abstract: 3D scene graphs provide a structured representation of object entities and their relationships, enabling high-level interpretation and reasoning for robots while remaining intuitively understandable to humans. Existing approaches for 3D scene graph generation typically combine scene reconstruction with graph neural networks (GNNs). However, such pipelines require multi-modal data that may not always be available, and their reliance on heuristic graph construction can constrain the prediction of relationship triplets. In this work, we introduce a Scene Graph Retrieval-Reasoning Model in 3D (SGR3 Model), a training-free framework that leverages multi-modal large language models (MLLMs) with retrieval-augmented generation (RAG) for semantic scene graph generation. SGR3 Model bypasses the need for explicit 3D reconstruction. Instead, it enhances relational reasoning by incorporating semantically aligned scene graphs retrieved via a ColPali-style cross-modal framework. To improve retrieval robustness, we further introduce a weighted patch-level similarity selection mechanism that mitigates the negative impact of blurry or semantically uninformative regions. Experiments demonstrate that SGR3 Model achieves competitive performance compared to training-free baselines and on par with GNN-based expert models. Moreover, an ablation study on the retrieval module and knowledge base scale reveals that retrieved external information is explicitly integrated into the token generation process, rather than being implicitly internalized through abstraction.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing</title>
  <link>https://arxiv.org/abs/2603.04598</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04598v1 Announce Type: new Abstract: Composed Image Retrieval (CIR) has made significant progress, yet current benchmarks are limited to single ground-truth answers and lack the annotations needed to evaluate false positive avoidance, robustness and multi-image reasoning. We present PinPoint, a comprehensive real world benchmark with 7,635 queries and 329K relevance judgments across 23 query categories. PinPoint advances the field by providing: (1) multiple correct answers (averaging 9.1 per query) (2) explicit hard negatives, (3) six instruction paraphrases per query for robustness testing, (4) multi-image composition support (13.4% of queries), and (5) demographic metadata for fairness evaluation. Based on our analysis of 20+ methods across 4 different major paradigms, we uncover three significant drawbacks: The best methods while achieving mAP@10 of 28.5%, still retrieves irrelevant results (hard negatives) 9% of the time. The best models also exhibit 25.1% performance variation across paraphrases, indicating significant potential for enhancing current CIR techniques. Multi-image queries performs 40 to 70% worse across different methods. To overcome these new issues uncovered by our evaluation framework, we propose a training-free reranking method based on an off-the-shelf MLLM that can be applied to any existing system to bridge the gap. We release the complete dataset, including all images, queries, annotations, retrieval index, and benchmarking code.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Detecting RAG Advertisements Across Advertising Styles</title>
  <link>https://arxiv.org/abs/2603.04925</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04925v1 Announce Type: new Abstract: Large language models (LLMs) enable a new form of advertising for retrieval-augmented generation (RAG) systems in which organic responses are blended with contextually relevant ads. The prospect of such &quot;generated native ads&quot; has sparked interest in whether they can be detected automatically. Existing datasets, however, do not reflect the diversity of advertising styles discussed in the marketing literature. In this paper, we (1) develop a taxonomy of advertising styles for LLMs, combining the style dimensions of explicitness and type of appeal, (2) simulate that advertisers may attempt to evade detection by changing their advertising style, and (3) evaluate a variety of ad-detection approaches with respect to their robustness under these changes. Expanding previous work on ad detection, we train models that use entity recognition to exactly locate an ad in an LLM response and find them to be both very effective at detecting responses with ads and largely robust to changes in the advertising style. Since ad blocking will be performed on low-resource end-user devices, we include lightweight models like random forests and SVMs in our evaluation. These models, however, are brittle under such changes, highlighting the need for further efficiency-oriented research for a practical approach to blocking of generated ads.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>Beyond Text: Aligning Vision and Language for Multimodal E-Commerce Retrieval</title>
  <link>https://arxiv.org/abs/2603.04836</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04836v1 Announce Type: new Abstract: Modern e-commerce search is inherently multimodal: customers make purchase decisions by jointly considering product text and visual informations. However, most industrial retrieval and ranking systems primarily rely on textual information, underutilizing the rich visual signals available in product images. In this work, we study unified text-image fusion for two-tower retrieval models in the e-commerce domain. We demonstrate that domain-specific fine-tuning and two stage alignment between query with product text and image modalities are both crucial for effective multimodal retrieval. Building on these insights, we propose a noval modality fusion network to fuse image and text information and capture cross-modal complementary information. Experiments on large-scale e-commerce datasets validate the effectiveness of the proposed approach.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>Scaling Laws for Reranking in Information Retrieval</title>
  <link>https://arxiv.org/abs/2603.04816</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04816v1 Announce Type: new Abstract: Scaling laws have been observed across a wide range of tasks, such as natural language generation and dense retrieval, where performance follows predictable patterns as model size, data, and compute grow. However, these scaling laws are insufficient for understanding the scaling behavior of multi-stage retrieval systems, which typically include a reranking stage. In large-scale multi-stage retrieval systems, reranking is the final and most influential step before presenting a ranked list of items to the end user. In this work, we present the first systematic study of scaling laws for rerankers by analyzing performance across model sizes and data budgets for three popular paradigms: pointwise, pairwise, and listwise reranking. Using a detailed case study with cross-encoder rerankers, we demonstrate that performance follows a predictable power law. This regularity allows us to accurately forecast the performance of larger models for some metrics more than others using smaller-scale experiments, offering a robust methodology for saving significant computational resources. For example, we accurately estimate the NDCG of a 1B-parameter model by training and evaluating only smaller models (up to 400M parameters), in both in-domain as well as out-of-domain settings. Our experiments encompass span several loss functions, models and metrics and demonstrate that downstream metrics like NDCG, MAP (Mean Avg Precision) show reliable scaling behavior and can be forecasted accurately at scale, while highlighting the limitations of metrics like Contrastive Entropy and MRR (Mean Reciprocal Rank) which do not follow predictable scaling behavior in all instances. Our results establish scaling principles for reranking and provide actionable insights for building industrial-grade retrieval systems.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval</title>
  <link>https://arxiv.org/abs/2603.04743</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04743v1 Announce Type: new Abstract: Large Language Model (LLM) agents can automate data-science workflows, but many rigorous statistical methods implemented in R remain underused because LLMs struggle with statistical knowledge and tool retrieval. Existing retrieval-augmented approaches focus on function-level semantics and ignore data distribution, producing suboptimal matches. We propose DARE (Distribution-Aware Retrieval Embedding), a lightweight, plug-and-play retrieval model that incorporates data distribution information into function representations for R package retrieval. Our main contributions are: (i) RPKB, a curated R Package Knowledge Base derived from 8,191 high-quality CRAN packages; (ii) DARE, an embedding model that fuses distributional features with function metadata to improve retrieval relevance; and (iii) RCodingAgent, an R-oriented LLM agent for reliable R code generation and a suite of statistical analysis tasks for systematically evaluating LLM agents in realistic analytical scenarios. Empirically, DARE achieves an NDCG at 10 of 93.47%, outperforming state-of-the-art open-source embedding models by up to 17% on package retrieval while using substantially fewer parameters. Integrating DARE into RCodingAgent yields significant gains on downstream analysis tasks. This work helps narrow the gap between LLM automation and the mature R statistical ecosystem.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks</title>
  <link>https://arxiv.org/abs/2603.04532</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04532v1 Announce Type: new Abstract: Information retrieval (IR) benchmarks typically follow the Cranfield paradigm, relying on static and predefined corpora. However, temporal changes in technical corpora, such as API deprecations and code reorganizations, can render existing benchmarks stale. In our work, we investigate how temporal corpus drift affects FreshStack, a retrieval benchmark focused on technical domains. We examine two independent corpus snapshots of FreshStack from October 2024 and October 2025 to answer questions about LangChain. Our analysis shows that all but one query posed in 2024 remain fully supported by the 2025 corpus, as relevant documents &quot;migrate&quot; from LangChain to competitor repositories, such as LlamaIndex. Next, we compare the accuracy of retrieval models on both snapshots and observe only minor shifts in model rankings, with overall strong correlation of up to 0.978 Kendall $\tau$ at Recall@50. These results suggest that retrieval benchmarks re-judged with evolving temporal corpora can remain reliable for retrieval evaluation. We publicly release all our artifacts at https://github.com/fresh-stack/driftbench.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents</title>
  <link>https://arxiv.org/abs/2603.04403</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04403v1 Announce Type: new Abstract: AI agents increasingly assist with financial research, yet no benchmark evaluates their ability to retrieve specific numeric values from structured databases. We introduce FinRetrieval, a benchmark of 500 financial retrieval questions with ground truth answers, agent responses from 14 configurations across three frontier providers (Anthropic, OpenAI, Google), and complete tool call execution traces. Our evaluation reveals that tool availability dominates performance: Claude Opus achieves 90.8% accuracy with structured data APIs but only 19.8% with web search alone--a 71 percentage point gap that exceeds other providers by 3-4x. We find that reasoning mode benefits vary inversely with base capability (+9.0pp for OpenAI vs +2.8pp for Claude), explained by differences in base-mode tool utilization rather than reasoning ability. Geographic performance gaps (5.6pp US advantage) stem from fiscal year naming conventions, not model limitations. We release the dataset, evaluation code, and tool traces to enable research on financial AI systems.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation</title>
  <link>https://arxiv.org/abs/2512.05106</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2512.05106v3 Announce Type: replace-cross Abstract: Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (\phi-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. \phi-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, \phi-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, \phi-PD significantly improves sim-to-real planner transfer performance. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our \href{https://yuzeng-at-tri.github.io/ppd-page/}{project page}.</description>
  <dc:source>Computer_Science/cs.GR_(Graphics)</dc:source>
</item>
<item>
  <title>Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups</title>
  <link>https://arxiv.org/abs/2603.05507</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05507v1 Announce Type: cross Abstract: High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. The limited number of views - often due to real-time constraints - leads to missing information and incomplete surfaces in the rendered images. Existing approaches typically rely on simple heuristics for the hole filling, which can result in inconsistencies or visual artifacts. We propose to complete the missing textures using a novel, application-targeted inpainting method independent of the underlying representation as an image-based post-processing step after the novel view rendering. The method is designed as a standalone module compatible with any calibrated multi-camera system. For this we introduce a multi-view aware, transformer-based network architecture using spatio-temporal embeddings to ensure consistency across frames while preserving fine details. Additionally, our resolution-independent design allows adaptation to different camera setups, while an adaptive patch selection strategy balances inference speed and quality, allowing real-time performance. We evaluate our approach against state-of-the-art inpainting techniques under the same real-time constraints and demonstrate that our model achieves the best trade-off between quality and speed, outperforming competitors in both image and video-based metrics.</description>
  <dc:source>Computer_Science/cs.GR_(Graphics)</dc:source>
</item>
<item>
  <title>Revisiting an Old Perspective Projection for Monocular 3D Morphable Models Regression</title>
  <link>https://arxiv.org/abs/2603.04958</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04958v1 Announce Type: cross Abstract: We introduce a novel camera model for monocular 3D Morphable Model (3DMM) regression methods that effectively captures the perspective distortion effect commonly seen in close-up facial images. Fitting 3D morphable models to video is a key technique in content creation. In particular, regression-based approaches have produced fast and accurate results by matching the rendered output of the morphable model to the target image. These methods typically achieve stable performance with orthographic projection, which eliminates the ambiguity between focal length and object distance. However, this simplification makes them unsuitable for close-up footage, such as that captured with head-mounted cameras. We extend orthographic projection with a new shrinkage parameter, incorporating a pseudo-perspective effect while preserving the stability of the original projection. We present several techniques that allow finetuning of existing models, and demonstrate the effectiveness of our modification through both quantitative and qualitative comparisons using a custom dataset recorded with head-mounted cameras.</description>
  <dc:source>Computer_Science/cs.GR_(Graphics)</dc:source>
</item>
<item>
  <title>GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction</title>
  <link>https://arxiv.org/abs/2603.04847</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04847v1 Announce Type: cross Abstract: Feature extraction, matching, structure from motion (SfM), and novel view synthesis (NVS) have traditionally been treated as separate problems with independent optimization objectives. We present GloSplat, a framework that performs \emph{joint pose-appearance optimization} during 3D Gaussian Splatting training. Unlike prior joint optimization methods (BARF, NeRF--, 3RGS) that rely purely on photometric gradients for pose refinement, GloSplat preserves \emph{explicit SfM feature tracks} as first-class entities throughout training: track 3D points are maintained as separate optimizable parameters from Gaussian primitives, providing persistent geometric anchors via a reprojection loss that operates alongside photometric supervision. This architectural choice prevents early-stage pose drift while enabling fine-grained refinement -- a capability absent in photometric-only approaches. We introduce two pipeline variants: (1) \textbf{GloSplat-F}, a COLMAP-free variant using retrieval-based pair selection for efficient reconstruction, and (2) \textbf{GloSplat-A}, an exhaustive matching variant for maximum quality. Both employ global SfM initialization followed by joint photometric-geometric optimization during 3DGS training. Experiments demonstrate that GloSplat-F achieves state-of-the-art among COLMAP-free methods while GloSplat-A surpasses all COLMAP-based baselines.</description>
  <dc:source>Computer_Science/cs.GR_(Graphics)</dc:source>
</item>
<item>
  <title>2-Coloring Cycles in One Round</title>
  <link>https://arxiv.org/abs/2603.04235</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04235v2 Announce Type: replace-cross Abstract: We show that there is a one-round randomized distributed algorithm that can 2-color cycles such that the expected fraction of monochromatic edges is less than 0.24118. We also show that a one-round algorithm cannot achieve a fraction less than 0.23879. Before this work, the best upper and lower bounds were 0.25 and 0.2. Our proof was largely discovered and developed by large language models, and both the upper and lower bounds have been formalized in Lean 4.</description>
  <dc:source>Computer_Science/cs.FL_(Formal_Languages_and_Automata_Theory)</dc:source>
</item>
<item>
  <title>On a sequence of Kimberling and its relationship to the Tribonacci word</title>
  <link>https://arxiv.org/abs/2510.11318</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2510.11318v4 Announce Type: replace-cross Abstract: In 2017, Clark Kimberling defined an interesting sequence ${\bf B} = 0100101100 \cdots$ of $0$&#39;s and $1$&#39;s by certain inflation rules, and he made a number of conjectures about this sequence and some related ones. In this note we prove his conjectures using, in part, the Walnut theorem-prover. We show how his word is related to the infinite Tribonacci word, and we determine both the subword complexity and critical exponent of $\bf B$.</description>
  <dc:source>Computer_Science/cs.FL_(Formal_Languages_and_Automata_Theory)</dc:source>
</item>
<item>
  <title>Risk-Aware Autonomous Driving with Linear Temporal Logic Specifications</title>
  <link>https://arxiv.org/abs/2409.09769</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2409.09769v4 Announce Type: replace-cross Abstract: Human drivers naturally balance the risks of different concerns while driving, including traffic rule violations, minor accidents, and fatalities. However, achieving the same behavior in autonomous driving systems remains an open problem. This paper extends a risk metric that has been verified in human-like driving studies to encompass more complex driving scenarios specified by linear temporal logic (LTL) that go beyond just collision risks. This extension incorporates the timing and severity of events into LTL specifications, thereby reflecting a human-like risk awareness. Without sacrificing expressivity for traffic rules, we adopt LTL specifications composed of safety and co-safety formulas, allowing the control synthesis problem to be reformulated as a reachability problem. By leveraging occupation measures, we further formulate a linear programming (LP) problem for this LTL-based risk metric. Consequently, the synthesized policy balances different types of driving risks, including both collision risks and traffic rule violations. The effectiveness of the proposed approach is validated by three typical traffic scenarios in Carla simulator.</description>
  <dc:source>Computer_Science/cs.FL_(Formal_Languages_and_Automata_Theory)</dc:source>
</item>
<item>
  <title>Computational Complexity of Alignments</title>
  <link>https://arxiv.org/abs/2603.05331</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05331v1 Announce Type: new Abstract: In process mining, alignments quantify the degree of deviation between an observed event trace and a business process model and constitute the most important conformance checking technique. We study the algorithmic complexity of computing alignments over important classes of Petri nets. First, we show that the alignment problem is PSPACE-complete on the class of safe Petri nets and also on the class of safe and sound workflow nets. For live, bounded, free-choice systems, we prove the existence of optimal alignments of polynomial length which positions the alignment problem in NP for this class. We further show that computing alignments is NP-complete even on basic subclasses such as process trees and T-systems. We establish NP-completeness on several related classes as well, including acyclic systems. Finally, we demonstrate that on live, safe S-systems the alignment problem is solvable in P and that both assumptions (liveness and safeness) are crucial for this result.</description>
  <dc:source>Computer_Science/cs.FL_(Formal_Languages_and_Automata_Theory)</dc:source>
</item>
<item>
  <title>Algebraic Characterization of Reversible First Degree Cellular Automata over $\mathbb{Z}_d$</title>
  <link>https://arxiv.org/abs/2603.05253</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05253v1 Announce Type: new Abstract: There exists algorithms to detect reversibility of cellular automaton (CA) for both finite and infinite lattices taking quadratic time. But, can we identify a $d$-state CA rule in constant time that is always reversible for every lattice size $n\in \mathbb{N}$? To address this issue, this paper explores the reversibility properties of a subset of one-dimensional, $3$-neighborhood, $d$-state finite cellular automata (CAs), known as the first degree cellular automata (FDCAs) for any number of cells $(n\in \mathbb{N})$ under the null boundary condition. {In a first degree cellular automaton (FDCA), the local rule is defined using eight parameters. To ensure that the global transition function of $d$-state FDCA is reversible for any number of cells $(n\in \mathbb{N})$, it is necessary and sufficient to verify only three algebraic conditions among the parameter values. Based on these conditions, for any given $d$, one can synthesize all reversible FDCAs rules. Similarly, given a FDCA rule, one can check these conditions to decide its reversibility in constant time.</description>
  <dc:source>Computer_Science/cs.FL_(Formal_Languages_and_Automata_Theory)</dc:source>
</item>
<item>
  <title>Aerospace.Wikibase: Towards a Knowledge Infrastructure for Aerospace Engineering</title>
  <link>https://arxiv.org/abs/2603.05192</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05192v1 Announce Type: new Abstract: While Aerospace engineering can benefit greatly from collaborative knowledge management, its infrastructure is still fragmented. Bridging this divide is essential to reduce the current practice of redundant work and to address the challenges posed by the rapidly growing volume of aviation data. This study presents an accessible platform, built on Wikibase, to enable collaborative sharing and curation of aerospace engineering knowledge, initially populated with data from a recent systematic literature review. As a solid foundation, the Aerospace.Wikibase provides over 700 terms related to processes, software and data, openly available for future extension. Linking project-specific concepts to persistent, independent infrastructure enables aerospace engineers to collaborate on universal knowledge without risking the appropriation of project information, thereby promoting sustainable solutions to modern challenges while acknowledging the limitations of the industry.</description>
  <dc:source>Computer_Science/cs.DL_(Digital_Libraries)</dc:source>
</item>
<item>
  <title>Mapping a Decade of Avian Influenza Research (2014-2023): A Scientometric Analysis from Web of Science</title>
  <link>https://arxiv.org/abs/2602.01712</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.01712v2 Announce Type: replace-cross Abstract: This scientometric study analyzes Avian Influenza research from 2014 to 2023 using bibliographic data from the Web of Science database. We examined publication trends, sources, authorship, collaborative networks, document types, and geographical distribution to gain insights into the global research landscape. Results reveal a steady increase in publications, with high contributions from Chinese and American institutions. Journals such as PLoS One and the Journal of Virology published the highest number of studies, indicating their influence in this field. The most prolific institutions include the Chinese Academy of Sciences and the University of Hong Kong, while the College of Veterinary Medicine at South China Agricultural University emerged as the most productive department. China and the USA lead in publication volume, though developed nations like the United Kingdom and Germany exhibit a higher rate of international collaboration. &quot;Articles&quot; are the most common document type, constituting 84.6% of the total, while &quot;Reviews&quot; account for 7.6%. This study provides a comprehensive view of global trends in Avian Influenza research, emphasizing the need for collaborative efforts across borders.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>stratum: A System Infrastructure for Massive Agent-Centric ML Workloads</title>
  <link>https://arxiv.org/abs/2603.03589</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03589v2 Announce Type: replace Abstract: Recent advances in large language models (LLMs) transform how machine learning (ML) pipelines are developed and evaluated. LLMs enable a new type of workload, agentic pipeline search, in which autonomous or semi-autonomous agents generate, validate, and optimize complete ML pipelines. These agents predominantly operate over popular Python ML libraries and exhibit highly exploratory behavior. This results in thousands of executions for data profiling, pipeline generation, and iterative refinement of pipeline stages. However, the existing Python-based ML ecosystem is built around libraries such as Pandas and scikit-learn, which are designed for human-centric, interactive, sequential workflows and remain constrained by Python&#39;s interpretive execution model, library-level isolation, and limited runtime support for executing large numbers of pipelines. Meanwhile, many high-performance ML systems proposed by the systems community either target narrow workload classes or require specialized programming models, which limits their integration with the Python ML ecosystem and makes them largely ill-suited for LLM-based agents. This growing mismatch exposes a fundamental systems challenge in supporting agentic pipeline search at scale. We therefore propose stratum, a unified system infrastructure that decouples pipeline execution from planning and reasoning during agentic pipeline search. Stratum integrates seamlessly with existing Python libraries, compiles batches of pipelines into optimized execution graphs, and efficiently executes them across heterogeneous backends, including a novel Rust-based runtime. We present stratum&#39;s architectural vision along with an early prototype, discuss key design decisions, and outline open challenges and research directions. Finally, preliminary experiments show that stratum can significantly speed up large-scale agentic pipeline search up to 16.6x.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>V3DB: Audit-on-Demand Zero-Knowledge Proofs for Verifiable Vector Search over Committed Snapshots</title>
  <link>https://arxiv.org/abs/2603.03065</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03065v2 Announce Type: replace Abstract: Dense retrieval services increasingly underpin semantic search, recommendation, and retrieval-augmented generation, yet clients typically receive only a top-$k$ list with no auditable evidence of how it was produced. We present V3DB, a verifiable, versioned vector-search service that enables audit-on-demand correctness checks for approximate nearest-neighbour (ANN) retrieval executed by a potentially untrusted service provider. V3DB commits to each corpus snapshot and standardises an IVF-PQ search pipeline into a fixed-shape, five-step query semantics. Given a public snapshot commitment and a query embedding, the service returns the top-$k$ payloads and, when challenged, produces a succinct zero-knowledge proof that the output is exactly the result of executing the published semantics on the committed snapshot -- without revealing the embedding corpus or private index contents. To make proving practical, V3DB avoids costly in-circuit sorting and random access by combining multiset equality/inclusion checks with lightweight boundary conditions. Our prototype implementation based on Plonky2 achieves up to $22\times$ faster proving and up to $40\%$ lower peak memory consumption than the circuit-only baseline, with millisecond-level verification time. Github Repo at https://github.com/TabibitoQZP/zk-IVF-PQ.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates</title>
  <link>https://arxiv.org/abs/2603.05459</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05459v1 Announce Type: cross Abstract: The process of debating is essential in our daily lives, whether in studying, work activities, simple everyday discussions, political debates on TV, or online discussions on social networks. The range of uses for debates is broad. Due to the diverse applications, structures, and formats of debates, developing corpora that account for these variations can be challenging, and the scarcity of debate corpora in the state of the art is notable. For this reason, the current research proposes the DEBISS corpus: a collection of spoken and individual debates with semi-structured features. With a broad range of NLP task annotations, such as speech-to-text, speaker diarization, argument mining, and debater quality assessment.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs</title>
  <link>https://arxiv.org/abs/2603.04545</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04545v1 Announce Type: cross Abstract: Efficient inference for graph neural networks (GNNs) on large knowledge graphs (KGs) is essential for many real-world applications. GNN inference queries are computationally expensive and vary in complexity, as each involves a different number of target nodes linked to subgraphs of diverse densities and structures. Existing acceleration methods, such as pruning, quantization, and knowledge distillation, instantiate smaller models but do not adapt them to the structure or semantics of individual queries. They also store models as monolithic files that must be fully loaded, and miss the opportunity to retrieve only the neighboring nodes and corresponding model components that are semantically relevant to the target nodes. These limitations lead to excessive data loading and redundant computation on large KGs. This paper presents KG-WISE, a task-driven inference paradigm for large KGs. KG-WISE decomposes trained GNN models into fine-grained components that can be partially loaded based on the structure of the queried subgraph. It employs large language models (LLMs) to generate reusable query templates that extract semantically relevant subgraphs for each task, enabling query-aware and compact model instantiation. We evaluate KG-WISE on six large KGs with up to 42 million nodes and 166 million edges. KG-WISE achieves up to 28x faster inference and 98% lower memory usage than state-of-the-art systems while maintaining or improving accuracy across both commercial and open-weight LLMs.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>O^3-LSM: Maximizing Disaggregated LSM Write Performance via Three-Layer Offloading</title>
  <link>https://arxiv.org/abs/2603.05439</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05439v1 Announce Type: new Abstract: Log-Structured Merge-tree-based Key-Value Stores (LSM-KVS) have been optimized and redesigned for disaggregated storage via techniques such as compaction offloading to reduce the network I/Os between compute and storage. However, the constrained memory space and slow flush at the compute node severely limit the overall write throughput of existing optimizations. In this paper, we propose O3-LSM, a fundamental new LSM-KVS architecture, that leverages the shared Disaggregated Memory (DM) to support a three-layer offloading, i.e., memtable Offloading, flush Offloading, and the existing compaction Offloading. Compared to the existing disaggregated LSM-KVS with compaction offloading only, O3-LSM maximizes the write performance by addressing the above issues. O3-LSM first leverages a novel DM-Optimized Memtable to achieve dynamic memtable offloading, which extends the write buffer while enabling fast, asynchronous, and parallel memtable transmission. Second, we propose Collaborative Flush Offloading that decouples the flush control plane from execution and supports memtable flush offloading at any node with dedicated scheduling and global optimizations. Third, O3-LSM is further improved with the Shard-Level Optimization, which partitions the memtable into shards based on disjoint key-ranges that can be transferred and flushed independently, unlocking parallelism across shards. Besides, to mitigate slow lookups in the disaggregated setting, O3-LSM also employs an adaptive Cache-Enhanced Read Delegation mechanism to combine a compact local cache with DM-assisted memtable delegated read. Our evaluation shows that O3-LSM achieves up to 4.5X write, 5.2X range query, and 1.8X point lookup throughput improvement, and up to 76% P99 latency reduction compared with Disaggregated-RocksDB, CaaS-LSM, and Nova-LSM.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>Bala-Join: An Adaptive Hash Join for Balancing Communication and Computation in Geo-Distributed SQL Databases</title>
  <link>https://arxiv.org/abs/2603.05405</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05405v1 Announce Type: new Abstract: Shared-nothing geo-distributed SQL databases, such as CockroachDB, are increasingly vital for enterprise applications requiring data resilience and locality. However, we encountered significant performance degradation at the customer side, especially when their deployments span multiple data centers over a Wide Area Network (WAN). Our investigation identifies the bottleneck in the performance of the Distributed Hash Join (Dist-HJ) algorithm, which is contingent upon a crucial balance between communication overhead and computational load. This balance is severely disrupted when processing skewed data from real-world customer workloads, leading to the observed performance decline. To tackle this challenge, we introduce Bala-Join, an adaptive solution to balance the computation and network load in Dist-HJ execution. Our approach consists of the Balanced Partition and Partial Replication (BPPR) algorithm and a distributed online skewed join key detector. The former achieves balanced redistribution of skewed data through a multicast mechanism to improve computational performance and reduce network overhead. The latter provides real-time skewed join key information tailored to BPPR. Furthermore, an Active-Signaling and Asynchronous-Pulling (ASAP) mechanism is incorporated to enable efficient, real-time synchronization between the detector and the redistribution process with minimal overhead. Empirical study shows that Bala-Join outperforms the popular Dist-HJ solutions, increasing throughput by 25%-61%.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>CRISP: Correlation-Resilient Indexing via Subspace Partitioning</title>
  <link>https://arxiv.org/abs/2603.05180</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05180v1 Announce Type: new Abstract: As the dimensionality of modern learned representations increases to thousands of dimensions, the state-of-the-art Approximate Nearest Neighbor (ANN) indices exhibit severe limitations. Graph-based methods (e.g., HNSW) suffer from prohibitive memory consumption and routing degradation, while recent randomized quantization and learned rotation approaches (e.g., RaBitQ, OPQ) impose significant preprocessing overheads. We introduce CRISP, a novel framework designed for ANN search in very-high-dimensional spaces. Unlike rigid pipelines that apply expensive orthogonal rotations indiscriminately, CRISP employs a lightweight, correlation- aware adaptive strategy that redistributes variance only when necessary, effectively reducing the preprocessing complexity. We couple this adaptive mechanism with a cache-coherent Compressed Sparse Row (CSR) index structure. Furthermore, CRISP incorporates a multi-stage dual-mode query engine: a Guaranteed Mode that preserves rigorous theoretical lower bounds on recall, and an Optimized Mode that leverages rank-based weighted scoring and early termination to reduce query latency. Extensive evaluation on datasets of very high dimensionality (up to 4096) demonstrates that CRISP achieves state-of-the-art query throughput, low construction costs, and peak memory efficiency.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>RESYSTANCE: Unleashing Hidden Performance of Compaction in LSM-trees via eBPF</title>
  <link>https://arxiv.org/abs/2603.05162</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05162v1 Announce Type: new Abstract: The development of high-speed storage devices such as NVMe SSDs has shifted the primary I/O bottleneck from hardware to software. Modern database systems also rely on kernel-based I/O paths, where frequent system call invocations and kernel-user space transitions lead to relatively large overheads and performance degradation. This issue is particularly pronounced in Log-Structured Merge-tree (LSM-tree)-based NoSQL databases. We identified that, in particular, the background compaction process generates a large number of read system calls, causing significant overhead. To address this problem, we propose RESYSTANCE, which leverages eBPF and io_uring to free compaction from system calls and unlock hidden performance potential. RESYSTANCE improves disk I/O efficiency during read operations via io uring and significantly reduces software stack overhead by handling compaction directly inside the kernel through eBPF. Moreover, RESYSTANCE minimizes user-kernel transitions by offloading key I/O routines into the kernel without modifying the LSM-tree structure or compaction algorithm. RESYSTANCE was extensively evaluated using db_bench, YCSB, and OLTP workloads. Compared to baseline RocksDB, it reduced the average number of system call invocations during compaction by 99% and shortened compaction time by 50%. Consequently, in write-intensive workloads, RESYSTANCE improved throughput by up to 75% and reduced the p99 latency by 40%.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>Deterministic Preprocessing and Interpretable Fuzzy Banding for Cost-per-Student Reporting from Extracted Records</title>
  <link>https://arxiv.org/abs/2603.04905</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04905v1 Announce Type: new Abstract: Administrative extracts are often exchanged as spreadsheets and may be read as reports in their own right during budgeting, workload review, and governance discussions. When an exported workbook becomes the reference snapshot for such decisions, the transformation can be checked by recomputation against a clearly identified input. A deterministic, rule-governed, file-based workflow is implemented in cad_processor.py. The script ingests a Casual Academic Database (CAD) export workbook and aggregates inclusive on-costs and student counts into subject-year and school-year totals, from which it derives cost-per-student ratios. It writes a processed workbook with four sheets: Processing Summary (run record and counters), Trend Analysis (schoolyear cost-per-student matrix), Report (wide subject-level table), and Fuzzy Bands (per-year anchors, membership weights, and band labels). The run record includes a SHA-256 hash of the input workbook bytes to support snapshot-matched recomputation. For within-year interpretation, the workflow adds a simple fuzzy banding layer that labels finite, positive school-year cost-per-student values as Low, Medium, or High. The per-year anchors are the minimum, median, and maximum of the finite, positive ratios. Membership weights are computed using left-shoulder, triangular, and right-shoulder functions, with deterministic tie-breaking in a fixed priority order (Medium, then Low, then High). These weights are treated as decision-support signals rather than probabilities. A worked example provides a reproducible calculation of a band assignment from the reported anchors and ratios. Supplementary material includes a claim-to-evidence matrix, a reproducibility note, and a short glossary that links selected statements to code and workbook artefacts.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm</title>
  <link>https://arxiv.org/abs/2603.04799</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04799v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for semantic query processing over large corpora. A set of semantic operators derived from relational algebra has been proposed to provide a unified interface for expressing such queries, among which the semantic filter operator serves as a cornerstone. Given a table T with a natural language predicate e, for each tuple in the relation, the execution of a semantic filter proceeds by constructing an input prompt that combines the predicate e with its content, querying the LLM, and obtaining the binary decision. However, this tuple-by-tuple evaluation necessitates a complete linear scan of the table, incurring prohibitive latency and token costs. Although recent work has attempted to optimize semantic filtering, it still does not break the linear LLM invocation barriers. To address this, we propose Clustering-Sampling-Voting (CSV), a new framework that reduces LLM invocations to sublinear complexity while providing error guarantees. CSV embeds tuples into semantic clusters, samples a small subset for LLM evaluation, and infers cluster-level labels via two proposed voting strategies: UniVote, which aggregates labels uniformly, and SimVote, which weights votes by semantic similarity. Moreover, CSV triggers re-clustering on ambiguous clusters to ensure robustness across diverse datasets. The results conducted on real-world datasets demonstrate that CSV reduces the number of LLM calls by 1.28-355x compared to the state-of-the-art approaches, while maintaining comparable effectiveness in terms of Accuracy and F1 score.</description>
  <dc:source>Computer_Science/cs.DB_(Databases)</dc:source>
</item>
<item>
  <title>What Is Missing: Interpretable Ratings for Large Language Model Outputs</title>
  <link>https://arxiv.org/abs/2603.04429</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04429v1 Announce Type: new Abstract: Current Large Language Model (LLM) preference learning methods such as Proximal Policy Optimization and Direct Preference Optimization learn from direct rankings or numerical ratings of model outputs, these rankings are subjective, and a single numerical rating chosen directly by a judge is a poor proxy for the quality of natural language, we introduce the What Is Missing (WIM) rating system to produce rankings from natural-language feedback, WIM integrates into existing training pipelines, can be combined with other rating techniques, and can be used as input to any preference learning method without changing the learning algorithm, to compute a WIM rating, a human or LLM judge writes feedback describing what the model output is missing, we embed the output and the feedback with a sentence embedding model and compute the cosine similarity between the resulting vectors, we empirically observe that, compared to discrete numerical ratings, WIM yields fewer ties and larger rating deltas, which improves the availability of a learning signal in pairwise preference data, we use interpretable in the following limited sense: for each scalar rating, we can inspect the judge&#39;s missing-information text that produced it, enabling qualitative debugging of the preference labels.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming</title>
  <link>https://arxiv.org/abs/2602.19948</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.19948v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient delusions (&quot;AI Psychosis&quot;) and failure to de-escalate suicide risk. Finally, we validate an interactive data visualization dashboard with diverse stakeholders, including AI engineers and red teamers, mental health professionals, and policy experts (N=9), demonstrating that this framework effectively enables stakeholders to audit the &quot;black box&quot; of AI psychotherapy. These findings underscore the critical safety risks of AI-provided mental health support and the necessity of simulation-based clinical red teaming before deployment.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>&quot;What if she doesn&#39;t feel the same?&quot; What Happens When We Ask AI for Relationship Advice</title>
  <link>https://arxiv.org/abs/2601.11527</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2601.11527v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly being used to provide support and advice in personal domains such as romantic relationships, yet little is known about user perceptions of this type of advice. This study investigated how people evaluate advice on LLM-generated romantic relationships. Participants rated advice satisfaction, model reliability, and helpfulness, and completed pre- and post-measures of their general attitudes toward LLMs. Overall, the results showed participants&#39; high satisfaction with LLM-generated advice. Greater satisfaction was, in turn, strongly and positively associated with their perceptions of the models&#39; reliability and helpfulness. Importantly, participants&#39; attitudes toward LLMs improved significantly after exposure to the advice, suggesting that supportive and contextually relevant advice can enhance users&#39; trust and openness toward these AI systems.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems</title>
  <link>https://arxiv.org/abs/2412.20298</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2412.20298v2 Announce Type: replace-cross Abstract: The digitalization of credit scoring has become essential for financial institutions and commercial banks, especially in the era of digital transformation. Machine learning techniques are commonly used to evaluate customers&#39; creditworthiness. However, the predicted outcomes of machine learning models can be biased toward protected attributes, such as race or gender. Numerous fairness-aware machine learning models and fairness measures have been proposed. Nevertheless, their performance in the context of credit scoring has not been thoroughly investigated. In this paper, we present a comprehensive experimental study of fairness-aware machine learning in credit scoring. The study explores key aspects of credit scoring, including financial datasets, predictive models, and fairness measures. We also provide a detailed evaluation of fairness-aware predictive models and fairness measures on widely used financial datasets. The experimental results show that fairness-aware models achieve a better balance between predictive accuracy and fairness compared to traditional classification models.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Measuring AI R&amp;D Automation</title>
  <link>https://arxiv.org/abs/2603.03992</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03992v2 Announce Type: replace Abstract: The automation of AI R&amp;D (AIRDA) could have significant implications, but its extent and ultimate effects remain uncertain. We need empirical data to resolve these uncertainties, but existing data (primarily capability benchmarks) may not reflect real-world automation or capture its broader consequences, such as whether AIRDA accelerates capabilities more than safety progress or whether our ability to oversee AI R&amp;D can keep pace with its acceleration. To address these gaps, this work proposes metrics to track the extent of AIRDA and its effects on AI progress and oversight. The metrics span dimensions such as capital share of AI R&amp;D spending, researcher time allocation, and AI subversion incidents, and could help decision makers understand the potential consequences of AIRDA, implement appropriate safety measures, and maintain awareness of the pace of AI development. We recommend that companies and third parties (e.g. non-profit research organisations) start to track these metrics, and that governments support these efforts.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Advancing Problem-Based Learning in Biomedical Engineering in the Era of Generative AI</title>
  <link>https://arxiv.org/abs/2503.16558</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2503.16558v2 Announce Type: replace Abstract: Problem-Based Learning (PBL) has significantly impacted biomedical engineering (BME) education since its introduction in the early 2000s, effectively enhancing critical thinking and real-world knowledge application among students. With biomedical engineering rapidly converging with artificial intelligence (AI), integrating effective AI education into established curricula has become challenging yet increasingly necessary. Recent advancements, including AI&#39;s recognition by the 2024 Nobel Prize, have highlighted the importance of training students comprehensively in biomedical AI. However, effective biomedical AI education faces substantial obstacles, such as diverse student backgrounds, limited personalized mentoring, constrained computational resources, and difficulties in safely scaling hands-on practical experiments due to privacy and ethical concerns associated with biomedical data. To overcome these issues, we conducted a three-year (2021-2023) case study implementing an advanced PBL framework tailored specifically for biomedical AI education, involving 92 undergraduate and 156 graduate students from the joint Biomedical Engineering program of Georgia Institute of Technology and Emory University. Our approach emphasizes collaborative, interdisciplinary problem-solving through authentic biomedical AI challenges. The implementation led to measurable improvements in learning outcomes, evidenced by high research productivity (16 student-authored publications), consistently positive peer evaluations, and successful development of innovative computational methods addressing real biomedical challenges. Additionally, we examined the role of generative AI both as a teaching subject and an educational support tool within the PBL framework. Our study presents a practical and scalable roadmap for biomedical engineering departments aiming to integrate robust AI education into their curricula.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>The role of spatial scales in assessing urban mobility models</title>
  <link>https://arxiv.org/abs/2603.05227</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05227v1 Announce Type: cross Abstract: Urban mobility models are essential tools for understanding and forecasting how people and goods move within cities, which is vital for transportation planning. The spatial scale at which urban mobility is analysed is a crucial determinant of the insights gained from any model as it can affect models&#39; performance. It is, therefore, important that urban mobility models should be assessed at appropriate spatial scales to reflect the underlying dynamics. In this study, we systematically evaluate the performance of three popular urban mobility models, namely gravity, radiation, and visitation models across spatial scales. The results show that while the visitation model consistently performs better than its gravity and radiation counterparts, their performance does not differ much when being assessed at some appropriate spatial scale common to all of them. Interestingly, at scales where all models perform badly, the visitation model suffers the most. Furthermore, results based on the conventional admin boundary may not perform so well as compared to distance-based clustering. The cross examination of urban mobility models across spatial scales also reveals the spatial organisation of the urban structure.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Differential Privacy in Two-Layer Networks: How DP-SGD Harms Fairness and Robustness</title>
  <link>https://arxiv.org/abs/2603.04881</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04881v1 Announce Type: cross Abstract: Differentially private learning is essential for training models on sensitive data, but empirical studies consistently show that it can degrade performance, introduce fairness issues like disparate impact, and reduce adversarial robustness. The theoretical underpinnings of these phenomena in modern, non-convex neural networks remain largely unexplored. This paper introduces a unified feature-centric framework to analyze the feature learning dynamics of differentially private stochastic gradient descent (DP-SGD) in two-layer ReLU convolutional neural networks. Our analysis establishes test loss bounds governed by a crucial metric: the feature-to-noise ratio (FNR). We demonstrate that the noise required for privacy leads to suboptimal feature learning, and specifically show that: 1) imbalanced FNRs across classes and subpopulations cause disparate impact; 2) even in the same class, noise has a greater negative impact on semantically long-tailed data; and 3) noise injection exacerbates vulnerability to adversarial attacks. Furthermore, our analysis reveals that the popular paradigm of public pre-training and private fine-tuning does not guarantee improvement, particularly under significant feature distribution shifts between datasets. Experiments on synthetic and real-world data corroborate our theoretical findings.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Signal in the Noise: Decoding the Reality of Airline Service Quality with Large Language Models</title>
  <link>https://arxiv.org/abs/2603.04404</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04404v1 Announce Type: cross Abstract: Traditional service quality metrics often fail to capture the nuanced drivers of passenger satisfaction hidden within unstructured online feedback. This study validates a Large Language Model (LLM) framework designed to extract granular insights from such data. Analyzing over 16,000 TripAdvisor reviews for EgyptAir and Emirates (2016-2025), the study utilizes a multi-stage pipeline to categorize 36 specific service issues. The analysis uncovers a stark &quot;operational perception disconnect&quot; for EgyptAir: despite reported operational improvements, passenger satisfaction plummeted post-2022 (ratings &lt; 2.0). Our approach identified specific drivers missed by conventional metrics-notably poor communication during disruptions and staff conduct-and pinpointed critical sentiment erosion in key tourism markets. These findings confirm the framework&#39;s efficacy as a powerful diagnostic tool, surpassing traditional surveys by transforming unstructured passenger voices into actionable strategic intelligence for the airline and tourism sectors.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Autoscoring Anticlimax: A Meta-analytic Understanding of AI&#39;s Short-answer Shortcomings and Wording Weaknesses</title>
  <link>https://arxiv.org/abs/2603.04820</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04820v1 Announce Type: cross Abstract: Automated short-answer scoring lags other LLM applications. We meta-analyze 890 culminating results across a systematic review of LLM short-answer scoring studies, modeling the traditional effect size of Quadratic Weighted Kappa (QWK) with mixed effects metaregression. We quantitatively illustrate that that the level of difficulty for human experts to perform the task of scoring written work of children has no observed statistical effect on LLM performance. Particularly, we show that some scoring tasks measured as the easiest by human scorers were the hardest for LLMs. Whether by poor implementation by thoughtful researchers or patterns traceable to autoregressive training, on average decoder-only architectures underperform encoders by 0.37--a substantial difference in agreement with humans. Additionally, we measure the contributions of various aspects of LLM technology on successful scoring such as tokenizer vocabulary size, which exhibits diminishing returns--potentially due to undertrained tokens. Findings argue for systems design which better anticipates known statistical shortcomings of autoregressive models. Finally, we provide additional experiments to illustrate wording and tokenization sensitivity and bias elicitation in high-stakes education contexts, where LLMs demonstrate racial discrimination. Code and data for this study are available.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>How Professional Visual Artists are Negotiating Generative AI in the Workplace</title>
  <link>https://arxiv.org/abs/2603.04537</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04537v1 Announce Type: cross Abstract: Generative AI has been heavily critiqued by artists in both popular media and HCI scholarship. However, more work is needed to understand the impacts of generative AI on professional artists&#39; workplaces and careers. In this paper, we conduct a survey of \textit{378 verified professional visual artists} about how generative AI has impacted their careers and workplaces. We find (1) most visual artists are strongly opposed to using generative AI (text or visual) and negotiate their inclusion in the workplace through a variety of \textit{refusal} strategies (2) there exist a range of factors in artists environments shaping their use of generative AI, including pressure from clients, bosses, and peers and (3) visual artists report overwhelmingly negative impacts of generative AI on their workplaces, leading to added stress and reduced job opportunities. In light of these findings, we encourage HCI researchers to contend more deeply with artists&#39; desires not to use generative AI in the workplace.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Invariant Causal Routing for Governing Social Norms in Online Market Economies</title>
  <link>https://arxiv.org/abs/2603.04534</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04534v1 Announce Type: cross Abstract: Social norms are stable behavioral patterns that emerge endogenously within economic systems through repeated interactions among agents. In online market economies, such norms -- like fair exposure, sustained participation, and balanced reinvestment -- are critical for long-term stability. We aim to understand the causal mechanisms driving these emergent norms and to design principled interventions that can steer them toward desired outcomes. This is challenging because norms arise from countless micro-level interactions that aggregate into macro-level regularities, making causal attribution and policy transferability difficult. To address this, we propose \textbf{Invariant Causal Routing (ICR)}, a causal governance framework that identifies policy-norm relations stable across heterogeneous environments. ICR integrates counterfactual reasoning with invariant causal discovery to separate genuine causal effects from spurious correlations and to construct interpretable, auditable policy rules that remain effective under distribution shift. In heterogeneous agent simulations calibrated with real data, ICR yields more stable norms, smaller generalization gaps, and more concise rules than correlation or coverage baselines, demonstrating that causal invariance offers a principled and interpretable foundation for governance.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>On complexity of restricted fragments of Decision DNNF</title>
  <link>https://arxiv.org/abs/2501.03710</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2501.03710v2 Announce Type: replace Abstract: Decision \textsc{dnnf} (a.k.a. $\wedge_d$-\textsc{fbdd}) is an important special case of Decomposable Negation Normal Form (\textsc{dnnf}), a landmark knowledge compilation model. Like other known \textsc{dnnf} restrictions, Decision \textsc{dnnf} admits \textsc{fpt} sized representation of \textsc{cnf}s of bounded \emph{primal} treewidth. However, unlike other restrictions, the complexity of representation for \textsc{cnf}s of bounded \emph{incidence} treewidth is wide open. In[arxiv:1708.07767], we resolved this question for two restricted classes of Decision \textsc{dnnf} that we name $\wedge_d$-\textsc{obdd} and Structured Decision \textsc{dnnf}. In particular, we demonstrated that, while both these classes have \textsc{fpt}-sized representations for \textsc{cnf}s of bounded primal treewidth, they need \textsc{xp}-size for representation of \textsc{cnf}s of bounded incidence treewidth. In the main part of this paper we carry out an in-depth study of the $\wedge_d$-\textsc{obdd} model. We formulate a generic methodology for proving lower bounds for the model. Using this methodology, we reestablish the \textsc{xp} lower bound provided in [arxiv:1708.07767]. We also provide exponential separations between \textsc{fbdd} and $\wedge_d$-\textsc{obdd} and between $\wedge_d$-\textsc{obdd} and an ordinary \textsc{obdd}. We study the complexity of Apply operation for $\wedge_d$-\textsc{obdd}. While, in general, the Apply operation leads to exponential blow up of the resulting model, we identify a special restricted case where the Apply operation can be carried out efficiently. We introduce a relaxed version of Structured Decision \textsc{dnnf} that we name Structured $\wedge_d$-\textsc{fbdd} and demonstrate that this model is quite powerful for \textsc{cnf}s of bounded incidence treewidth.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>Small Changes, Big Impact: Demographic Bias in LLM-Based Hiring Through Subtle Sociocultural Markers in Anonymised Resumes</title>
  <link>https://arxiv.org/abs/2603.05189</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05189v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in resume screening pipelines. Although explicit PII (e.g., names) is commonly redacted, resumes typically retain subtle sociocultural markers (languages, co-curricular activities, volunteering, hobbies) that can act as demographic proxies. We introduce a generalisable stress-test framework for hiring fairness, instantiated in the Singapore context: 100 neutral job-aligned resumes are augmented into 4100 variants spanning four ethnicities and two genders, differing only in job-irrelevant markers. We evaluate 18 LLMs in two realistic settings: (i) Direct Comparison (1v1) and (ii) Score &amp; Shortlist (top-scoring rate), each with and without rationale prompting. Even without explicit identifiers, models recover demographic attributes with high F1 and exhibit systematic disparities, with models favouring markers associated with Chinese and Caucasian males. Ablations show language markers suffice for ethnicity inference, whereas gender relies on hobbies and activities. Furthermore, prompting for explanations tends to amplify bias. Our findings suggest that seemingly innocuous markers surviving anonymisation can materially skew automated hiring outcomes.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis</title>
  <link>https://arxiv.org/abs/2603.04982</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04982v1 Announce Type: new Abstract: Can targeted user training unlock the productive potential of generative artificial intelligence (GenAI) in professional settings? We investigate this question using a randomized study involving 164 law students completing an issue-spotting examination. Participants were assigned to one of three conditions: no GenAI access, optional access to a large language model (LLM), or optional access accompanied by an approximately ten-minute training intervention. Training significantly increased LLM adoption--the usage rate rose from 26% to 41%--and improved examination performance. Students with trained access scored 0.27 grade points higher than those with untrained access (p = 0.027), equivalent to roughly one-third of a letter grade. By contrast, access to an LLM without training did not improve performance and was associated with shorter answers relative to no access. Using principal stratification, we decompose the overall effect into adoption and effectiveness channels. Point estimates are consistent with training operating primarily by expanding the scope of GenAI use rather than by enhancing effectiveness among existing users, though confidence intervals are wide. Overall, our findings provide evidence that complementary investments in user training are critical for realizing GenAI productivity gains in knowledge-intensive fields where concerns about reliability may inhibit adoption.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Analysis of Terms of Service on Social Media Platforms: Consent Challenges and Assessment Metrics</title>
  <link>https://arxiv.org/abs/2603.04701</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04701v1 Announce Type: new Abstract: Social media platforms typically obtain user consent through Terms of Service (ToS) presented at account creation, rather than through dedicated consent forms. This study investigates whether consent-related information is clearly communicated within these ToS documents. We propose and apply a three-dimensional consent evaluation framework encompassing Textual Accessibility, Semantic Transparency, and Interface Design as declared in ToS documents. Using a combination of computational and qualitative analyses, we assess ToS from 13 major social media platforms. Our findings reveal important shortcomings across platforms, including high linguistic complexity, widespread use of non-committal language, limited disclosure of data retention and sharing practices, and the absence of explicit interface-level commitments to granular or revocable consent. These results indicate that while consent is formally embedded in ToS, it is often presented in ways that constrain clarity and meaningful choice. Rather than treating ToS documents as informed consent instruments, this study positions them as consent-bearing documents whose design and content shape the conditions under which users are asked to agree to data practices. The proposed framework offers a systematic method for evaluating consent information within ToS in the absence of explicit consent forms and informs the design of clearer, more ethically robust consent mechanisms for data-intensive digital platforms.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>A Case Study in Responsible AI-Assisted Video Solutions: Multi-Metric Behavioral Insights in a Public Market Setting</title>
  <link>https://arxiv.org/abs/2603.04607</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04607v1 Announce Type: new Abstract: Despite recent advances in Computer Vision and Artificial Intelligence (AI), AI-assisted video solutions have struggled to penetrate real-world urban environments due to significant concerns regarding privacy, ethical risks, and technical challenges like bias and explainability. This work addresses these barriers through a case study in a city-center public market, demonstrating a pathway for the responsible deployment of AI in community spaces. By adopting a user-centric methodology that prioritizes public trust and privacy safeguards, we show that detailed, operationally relevant behavioral insights can be derived from abstract data representations without compromising ethical standards. The study focuses on generating Multi-Metric Behavioral Insights through the extraction of three complementary signals: customer directional flow, dwell duration, and movement patterns. Utilizing human pose detection and complex behavioral analysis - processed through geometric normalization and motion modeling - the system remains robust under tracking fragmentation and occlusion. Data collected over 18 days, spanning routine operations and a festival window from May 2-4, reveals a consistently right-skewed dwell-time behavior. While most visits last approximately 3-4 minutes, peak activity periods increase the mean to roughly 22 minutes. Furthermore, movement analysis indicates uneven circulation, with over 60% of traffic concentrated in approximately 30% of the venue space. By mapping popular thoroughfares and high-traffic storefronts, this case study provides venue managers and business owners with objective, measurable information to optimize foot traffic. Ultimately, these results demonstrate that AI-enabled video solutions can be successfully integrated into urban environments to provide high-fidelity spatial analytics while maintaining strict adherence to privacy and social responsibility.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>Comparative Evaluation of Traditional Methods and Deep Learning for Brain Glioma Imaging. Review Paper</title>
  <link>https://arxiv.org/abs/2603.04796</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04796v1 Announce Type: new Abstract: Segmentation is crucial for brain gliomas as it delineates the glioma s extent and location, aiding in precise treatment planning and monitoring, thus improving patient outcomes. Accurate segmentation ensures proper identification of the glioma s size and position, transforming images into applicable data for analysis. Classification of brain gliomas is also essential because different types require different treatment approaches. Accurately classifying brain gliomas by size, location, and aggressiveness is essential for personalized prognosis prediction, follow-up care, and monitoring disease progression, ensuring effective diagnosis, treatment, and management. In glioma research, irregular tissues are often observable, but error free and reproducible segmentation is challenging. Many researchers have surveyed brain glioma segmentation, proposing both fully automatic and semi-automatic techniques. The adoption of these methods by radiologists depends on ease of use and supervision, with semi-automatic techniques preferred due to the need for accurate evaluations. This review evaluates effective segmentation and classification techniques post magnetic resonance imaging acquisition, highlighting that convolutional neural network architectures outperform traditional techniques in these tasks.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>DSA-SRGS: Super-Resolution Gaussian Splatting for Dynamic Sparse-View DSA Reconstruction</title>
  <link>https://arxiv.org/abs/2603.04770</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04770v1 Announce Type: new Abstract: Digital subtraction angiography (DSA) is a key imaging technique for the auxiliary diagnosis and treatment of cerebrovascular diseases. Recent advancements in gaussian splatting and dynamic neural representations have enabled robust 3D vessel reconstruction from sparse dynamic inputs. However, these methods are fundamentally constrained by the resolution of input projections, where performing naive upsampling to enhance rendering resolution inevitably results in severe blurring and aliasing artifacts. Such lack of super-resolution capability prevents the reconstructed 4D models from recovering fine-grained vascular details and intricate branching structures, which restricts their application in precision diagnosis and treatment. To solve this problem, this paper proposes DSA-SRGS, the first super-resolution gaussian splatting framework for dynamic sparse-view DSA reconstruction. Specifically, we introduce a Multi-Fidelity Texture Learning Module that integrates high-quality priors from a fine-tuned DSA-specific super-resolution model, into the 4D reconstruction optimization. To mitigate potential hallucination artifacts from pseudo-labels, this module employs a Confidence-Aware Strategy to adaptively weight supervision signals between the original low-resolution projections and the generated high-resolution pseudo-labels. Furthermore, we develop Radiative Sub-Pixel Densification, an adaptive strategy that leverages gradient accumulation from high-resolution sub-pixel sampling to refine the 4D radiative gaussian kernels. Extensive experiments on two clinical DSA datasets demonstrate that DSA-SRGS significantly outperforms state-of-the-art methods in both quantitative metrics and qualitative visual fidelity.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>LAW &amp; ORDER: Adaptive Spatial Weighting for Medical Diffusion and Segmentation</title>
  <link>https://arxiv.org/abs/2603.04795</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04795v1 Announce Type: new Abstract: Medical image analysis relies on accurate segmentation, and benefits from controllable synthesis (of new training images). Yet both tasks of the cyclical pipeline face spatial imbalance: lesions occupy small regions against vast backgrounds. In particular, diffusion models have been shown to drift from prescribed lesion layouts, while efficient segmenters struggle on spatially uncertain regions. Adaptive spatial weighting addresses this by learning where to allocate computational resources. This paper introduces a pair of network adapters: 1) Learnable Adaptive Weighter (LAW) which predicts per-pixel loss modulation from features and masks for diffusion training, stabilized via a mix of normalization, clamping, and regularization to prevent degenerate solutions; and 2) Optimal Region Detection with Efficient Resolution (ORDER) which applies selective bidirectional skip attention at late decoder stages for efficient segmentation. Experiments on polyp and kidney tumor datasets demonstrate that LAW achieves 20% FID generative improvement over a uniform baseline (52.28 vs. 65.60), with synthetic data then improving downstream segmentation by 4.9% Dice coefficient (83.2% vs. 78.3%). ORDER reaches 6.0% Dice improvement on MK-UNet (81.3% vs. 75.3%) with 0.56 GFLOPs and just 42K parameters, remaining 730x smaller than the standard nnUNet.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Toward Real-world Infrared Image Super-Resolution: A Unified Autoregressive Framework and Benchmark Dataset</title>
  <link>https://arxiv.org/abs/2603.04745</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04745v1 Announce Type: new Abstract: Infrared image super-resolution (IISR) under real-world conditions is a practically significant yet rarely addressed task. Pioneering works are often trained and evaluated on simulated datasets or neglect the intrinsic differences between infrared and visible imaging. In practice, however, real infrared images are affected by coupled optical and sensing degradations that jointly deteriorate both structural sharpness and thermal fidelity. To address these challenges, we propose Real-IISR, a unified autoregressive framework for real-world IISR that progressively reconstructs fine-grained thermal structures and clear backgrounds in a scale-by-scale manner via thermal-structural guided visual autoregression. Specifically, a Thermal-Structural Guidance module encodes thermal priors to mitigate the mismatch between thermal radiation and structural edges. Since non-uniform degradations typically induce quantization bias, Real-IISR adopts a Condition-Adaptive Codebook that dynamically modulates discrete representations based on degradation-aware thermal priors. Also, a Thermal Order Consistency Loss enforces a monotonic relation between temperature and pixel intensity, ensuring relative brightness order rather than absolute values to maintain physical consistency under spatial misalignment and thermal drift. We build FLIR-IISR, a real-world IISR dataset with paired LR-HR infrared images acquired via automated focus variation and motion-induced blur. Extensive experiments demonstrate the promising performance of Real-IISR, providing a unified foundation for real-world IISR and benchmarking. The dataset and code are available at: https://github.com/JZD151/Real-IISR.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation</title>
  <link>https://arxiv.org/abs/2603.04733</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04733v1 Announce Type: new Abstract: Test-Time Adaptation (TTA) is essential for enabling deep learning models to handle real-world data distribution shifts. However, current approaches face significant limitations: backpropagation-based methods are not suitable for low-end deployment devices, due to their high computation and memory requirements, as well as their tendency to modify model weights during adaptation; while traditional backpropagation-free techniques exhibit constrained adaptation capabilities. In this work, we propose Forward-Only Zeroth-Order Optimization (FOZO), a novel and practical backpropagation-free paradigm for TTA. FOZO leverages a memory-efficient zeroth-order prompt optimization, which is led by objectives optimizing both intermediate feature statistics and prediction entropy. To ensure efficient and stable adaptation over the out-of-distribution data stream, we introduce a dynamically decaying perturbation scale during zeroth-order gradient estimation and theoretically prove its convergence under the TTA data stream assumption. Extensive continual adaptation experiments on ImageNet-C, ImageNet-R, and ImageNet-Sketch demonstrate FOZO&#39;s superior performance, achieving 59.52% Top-1 accuracy on ImageNet-C (5K, level 5) and outperforming main gradient-based methods and SOTA forward-only FOA (58.13%). Furthermore, FOZO exhibits strong generalization on quantized (INT8) models. These findings demonstrate that FOZO is a highly competitive solution for TTA deployment in resource-limited scenarios.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Evaluating and Correcting Human Annotation Bias in Dynamic Micro-Expression Recognition</title>
  <link>https://arxiv.org/abs/2603.04766</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04766v1 Announce Type: new Abstract: Existing manual labeling of micro-expressions is subject to errors in accuracy, especially in cross-cultural scenarios where deviation in labeling of key frames is more prominent. To address this issue, this paper presents a novel Global Anti-Monotonic Differential Selection Strategy (GAMDSS) architecture for enhancing the effectiveness of spatio-temporal modeling of micro-expressions through keyframe re-selection. Specifically, the method identifies Onset and Apex frames, which are characterized by significant micro-expression variation, from complete micro-expression action sequences via a dynamic frame reselection mechanism. It then uses these to determine Offset frames and construct a rich spatio-temporal dynamic representation. A two-branch structure with shared parameters is then used to efficiently extract spatio-temporal features. Extensive experiments are conducted on seven widely recognized micro-expression datasets. The results demonstrate that GAMDSS effectively reduces subjective errors caused by human factors in multicultural datasets such as SAMM and 4DME. Furthermore, quantitative analyses confirm that offset-frame annotations in multicultural datasets are more uncertain, providing theoretical justification for standardizing micro-expression annotations. These findings directly support our argument for reconsidering the validity and generalizability of dataset annotation paradigms. Notably, this design can be integrated into existing models without increasing the number of parameters, offering a new approach to enhancing micro-expression recognition performance. The source code is available on GitHub[https://github.com/Cross-Innovation-Lab/GAMDSS].</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Evaluating GPT-5 as a Multimodal Clinical Reasoner: A Landscape Commentary</title>
  <link>https://arxiv.org/abs/2603.04763</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04763v1 Announce Type: new Abstract: The transition from task-specific artificial intelligence toward general-purpose foundation models raises fundamental questions about their capacity to support the integrated reasoning required in clinical medicine, where diagnosis demands synthesis of ambiguous patient narratives, laboratory data, and multimodal imaging. This landscape commentary provides the first controlled, cross-sectional evaluation of the GPT-5 family (GPT-5, GPT-5 Mini, GPT-5 Nano) against its predecessor GPT-4o across a diverse spectrum of clinically grounded tasks, including medical education examinations, text-based reasoning benchmarks, and visual question-answering in neuroradiology, digital pathology, and mammography using a standardized zero-shot chain-of-thought protocol. GPT-5 demonstrated substantial gains in expert-level textual reasoning, with absolute improvements exceeding 25 percentage-points on MedXpertQA. When tasked with multimodal synthesis, GPT-5 effectively leveraged this enhanced reasoning capacity to ground uncertain clinical narratives in concrete imaging evidence, achieving state-of-the-art or competitive performance across most VQA benchmarks and outperforming GPT-4o by margins of 10-40% in mammography tasks requiring fine-grained lesion characterization. However, performance remained moderate in neuroradiology (44% macro-average accuracy) and lagged behind domain-specific models in mammography, where specialized systems exceed 80% accuracy compared to GPT-5&#39;s 52-64%. These findings indicate that while GPT-5 represents a meaningful advance toward integrated multimodal clinical reasoning, mirroring the clinician&#39;s cognitive process of biasing uncertain information with objective findings, generalist models are not yet substitutes for purpose-built systems in highly specialized, perception-critical tasks.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models</title>
  <link>https://arxiv.org/abs/2603.04837</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04837v1 Announce Type: new Abstract: We introduce the Dynamic Behavioral Constraint (DBC) benchmark, the first empirical framework for evaluating the efficacy of a structured, 150-control behavioral governance layer, the MDBC (Madan DBC) system, applied at inference time to large language models (LLMs). Unlike training time alignment methods (RLHF, DPO) or post-hoc content moderation APIs, DBCs constitute a system prompt level governance layer that is model-agnostic, jurisdiction-mappable, and auditable. We evaluate the DBC Framework across a 30 domain risk taxonomy organized into six clusters (Hallucination and Calibration, Bias and Fairness, Malicious Use, Privacy and Data Protection, Robustness and Reliability, and Misalignment Agency) using an agentic red-team protocol with five adversarial attack strategies (Direct, Roleplay, Few-Shot, Hypothetical, Authority Spoof) across 3 model families. Our three-arm controlled design (Base, Base plus Moderation, Base plus DBC) enables causal attribution of risk reduction. Key findings: the DBC layer reduces the aggregate Risk Exposure Rate (RER) from 7.19 percent (Base) to 4.55 percent (Base plus DBC), representing a 36.8 percent relative risk reduction, compared with 0.6 percent for a standard safety moderation prompt. MDBC Adherence Scores improve from 8.6 by 10 (Base) to 8.7 by 10 (Base plus DBC). EU AI Act compliance (automated scoring) reaches 8.5by 10 under the DBC layer. A three judge evaluation ensemble yields Fleiss kappa greater than 0.70 (substantial agreement), validating our automated pipeline. Cluster ablation identifies the Integrity Protection cluster (MDBC 081 099) as delivering the highest per domain risk reduction, while graybox adversarial attacks achieve a DBC Bypass Rate of 4.83 percent . We release the benchmark code, prompt database, and all evaluation artefacts to enable reproducibility and longitudinal tracking as models evolve.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment</title>
  <link>https://arxiv.org/abs/2603.04822</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04822v1 Announce Type: new Abstract: Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods like Reinforcement Learning from Human Feedback (RLHF) often handle only coarse-grained attributes. In practice, fine-tuning LLMs on task-specific datasets to optimize value alignment inevitably incurs an alignment tax: the model&#39;s pre-calibrated value system drifts significantly due to latent bias absorption from training data, while the fine-tuning process also causes severe hallucinations and semantic information loss in generated responses. To address this, we propose VISA (Value Injection via Shielded Adaptation), a closed-loop framework designed to navigate this trade-off. VISA&#39;s architecture features a high-precision value detector, a semantic-to-value translator, and a core value-rewriter. The value-rewriter is trained via Group Relative Policy Optimization (GRPO) with a composite reward function that simultaneously optimizes for fine-grained value precision, and the preservation of semantic integrity. By learning an optimal policy to balance these competing objectives, VISA effectively mitigates the alignment tax while staying loyal to the original knowledge. Our experiments demonstrate that this approach enables precise control over a model&#39;s value expression while maintaining its factual consistency and general capabilities, significantly outperforming both standard fine-tuning methods and prompting-based baselines, including GPT-4o.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild</title>
  <link>https://arxiv.org/abs/2603.04727</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04727v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have demonstrated impressive general competence in video understanding, yet their reliability for real-world Video Anomaly Detection (VAD) remains largely unexplored. Unlike conventional pipelines relying on reconstruction or pose-based cues, MLLMs enable a paradigm shift: treating anomaly detection as a language-guided reasoning task. In this work, we systematically evaluate state-of-the-art MLLMs on the ShanghaiTech and CHAD benchmarks by reformulating VAD as a binary classification task under weak temporal supervision. We investigate how prompt specificity and temporal window lengths (1s--3s) influence performance, focusing on the precision--recall trade-off. Our findings reveal a pronounced conservative bias in zero-shot settings; while models exhibit high confidence, they disproportionately favor the &#39;normal&#39; class, resulting in high precision but a recall collapse that limits practical utility. We demonstrate that class-specific instructions can significantly shift this decision boundary, improving the peak F1-score on ShanghaiTech from 0.09 to 0.64, yet recall remains a critical bottleneck. These results highlight a significant performance gap for MLLMs in noisy environments and provide a foundation for future work in recall-oriented prompting and model calibration for open-world surveillance, which demands complex video understanding and reasoning.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>From Offline to Periodic Adaptation for Pose-Based Shoplifting Detection in Real-world Retail Security</title>
  <link>https://arxiv.org/abs/2603.04723</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04723v1 Announce Type: new Abstract: Shoplifting is a growing operational and economic challenge for retailers, with incidents rising and losses increasing despite extensive video surveillance. Continuous human monitoring is infeasible, motivating automated, privacy-preserving, and resource-aware detection solutions. In this paper, we cast shoplifting detection as a pose-based, unsupervised video anomaly detection problem and introduce a periodic adaptation framework designed for on-site Internet of Things (IoT) deployment. Our approach enables edge devices in smart retail environments to adapt from streaming, unlabeled data, supporting scalable and low-latency anomaly detection across distributed camera networks. To support reproducibility, we introduce RetailS, a new large-scale real-world shoplifting dataset collected from a retail store under multi-day, multi-camera conditions, capturing unbiased shoplifting behavior in realistic IoT settings. For deployable operation, thresholds are selected using both F1 and H_PRS scores, the harmonic mean of precision, recall, and specificity, during data filtering and training. In periodic adaptation experiments, our framework consistently outperformed offline baselines on AUC-ROC and AUC-PR in 91.6% of evaluations, with each training update completing in under 30 minutes on edge-grade hardware, demonstrating the feasibility and reliability of our solution for IoT-enabled smart retail deployment.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling</title>
  <link>https://arxiv.org/abs/2603.04791</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04791v1 Announce Type: new Abstract: We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling in three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a generic training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computations to improve long-term predictions while avoiding costly rolling-style inference and pronounced error accumulation in the standard next-token prediction. Pursuing a high-quality and unbiased training dataset, we curate TimeBench, a corpus with one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, including continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores as a pre-trained model. Timer-S1 will be released to facilitate further research.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction</title>
  <link>https://arxiv.org/abs/2603.04783</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04783v1 Announce Type: new Abstract: While LLMs demonstrate strong reasoning capabilities when provided with full information in a single turn, they exhibit substantial vulnerability in multi-turn interactions. Specifically, when information is revealed incrementally or requires updates, models frequently fail to integrate new constraints, leading to a collapse in performance compared to their single-turn baselines. We term the root cause as \emph{Contextual Inertia}: a phenomenon where models rigidly adhere to previous reasoning traces. Even when users explicitly provide corrections or new data in later turns, the model ignores them, preferring to maintain consistency with its previous (incorrect) reasoning path. To address this, we introduce \textbf{R}einforcement \textbf{L}earning with \textbf{S}ingle-\textbf{T}urn \textbf{A}nchors (\textbf{RLSTA}), a generalizable training approach designed to stabilize multi-turn interaction across diverse scenarios and domains. RLSTA leverages the model&#39;s superior single-turn capabilities as stable internal anchors to provide reward signals. By aligning multi-turn responses with these anchors, RLSTA empowers models to break contextual inertia and self-calibrate their reasoning based on the latest information. Experiments show that RLSTA significantly outperforms standard fine-tuning and abstention-based methods. Notably, our method exhibits strong cross-domain generalization (e.g., math to code) and proves effective even without external verifiers, highlighting its potential for general-domain applications.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>MOOSEnger -- a Domain-Specific AI Agent for the MOOSE Ecosystem</title>
  <link>https://arxiv.org/abs/2603.04756</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04756v1 Announce Type: new Abstract: MOOSEnger is a tool-enabled AI agent tailored to the Multiphysics Object-Oriented Simulation Environment (MOOSE). MOOSE cases are specified in HIT &quot;.i&quot; input files; the large object catalog and strict syntax make initial setup and debugging slow. MOOSEnger offers a conversational workflow that turns natural-language intent into runnable inputs by combining retrieval-augmented generation over curated docs/examples with deterministic, MOOSE-aware parsing, validation, and execution tools. A core-plus-domain architecture separates reusable agent infrastructure (configuration, registries, tool dispatch, retrieval services, persistence, and evaluation) from a MOOSE plugin that adds HIT-based parsing, syntax-preserving ingestion of input files, and domain-specific utilities for input repair and checking. An input precheck pipeline removes hidden formatting artifacts, fixes malformed HIT structure with a bounded grammar-constrained loop, and resolves invalid object types via similarity search over an application syntax registry. Inputs are then validated and optionally smoke-tested with the MOOSE runtime in the loop via an MCP-backed execution backend (with local fallback), translating solver diagnostics into iterative verify-and-correct updates. Built-in evaluation reports RAG metrics (faithfulness, relevancy, context precision/recall) and end-to-end success by actual execution. On a 125-prompt benchmark spanning diffusion, transient heat conduction, solid mechanics, porous flow, and incompressible Navier--Stokes, MOOSEnger achieves a 0.93 execution pass rate versus 0.08 for an LLM-only baseline.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Mask-aware inference with State-Space Models</title>
  <link>https://arxiv.org/abs/2603.04568</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04568v1 Announce Type: new Abstract: Many real-world computer vision tasks, such as depth completion, must handle inputs with arbitrarily shaped regions of missing or invalid data. For Convolutional Neural Networks (CNNs), Partial Convolutions solved this by a mask-aware re-normalization conditioned only on valid pixels. Recently, State Space Models (SSMs) like Mamba have emerged, offering high performance with linear complexity. However, these architectures lack an inherent mechanism for handling such arbitrarily shaped invalid data at inference time. To bridge this gap, we introduce Partial Vision Mamba (PVM), a novel architectural component that ports the principles of partial operations to the Mamba backbone. We also define a series of rules to design architectures using PVM. We show the efficacy and generalizability of our approach in the tasks of depth completion, image inpainting, and classification with invalid data.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Structure-Guided Histopathology Synthesis via Dual-LoRA Diffusion</title>
  <link>https://arxiv.org/abs/2603.04565</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04565v1 Announce Type: new Abstract: Histopathology image synthesis plays an important role in tissue restoration, data augmentation, and modeling of tumor microenvironments. However, existing generative methods typically address restoration and generation as separate tasks, although both share the same objective of structure-consistent tissue synthesis under varying degrees of missingness, and often rely on weak or inconsistent structural priors that limit realistic cellular organization. We propose Dual-LoRA Controllable Diffusion, a unified centroid-guided diffusion framework that jointly supports Local Structure Completion and Global Structure Synthesis within a single model. Multi-class nuclei centroids serve as lightweight and annotation-efficient spatial priors, providing biologically meaningful guidance under both partial and complete image absence. Two task-specific LoRA adapters specialize the shared backbone for local and global objectives without retraining separate diffusion models. Extensive experiments demonstrate consistent improvements over state-of-the-art GAN and diffusion baselines across restoration and synthesis tasks. For local completion, LPIPS computed within the masked region improves from 0.1797 (HARP) to 0.1524, and for global synthesis, FID improves from 225.15 (CoSys) to 76.04, indicating improved structural fidelity and realism. Our approach achieves more faithful structural recovery in masked regions and substantially improved realism and morphology consistency in full synthesis, supporting scalable pan-cancer histopathology modeling.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Fusion and Grouping Strategies in Deep Learning for Local Climate Zone Classification of Multimodal Remote Sensing Data</title>
  <link>https://arxiv.org/abs/2603.04562</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04562v1 Announce Type: new Abstract: Local Climate Zones (LCZs) give a zoning map to study urban structures and land use and analyze the impact of urbanization on local climate. Multimodal remote sensing enables LCZ classification, for which data fusion is significant for improving accuracy owing to the data complexity. However, there is a gap in a comprehensive analysis of the fusion mechanisms used in their deep learning (DL) classifier architectures. This study analyzes different fusion strategies in the multi-class LCZ classification models for multimodal data and grouping strategies based on inherent data characteristics. The different models involving Convolutional Neural Networks (CNNs) include: (i) baseline hybrid fusion (FM1), (ii) with self- and cross-attention mechanisms (FM2), (iii) with the multi-scale Gaussian filtered images (FM3), and (iv) weighted decision-level fusion (FM4). Ablation experiments are conducted to study the pixel-, feature-, and decision-level fusion effects in the model performance. Grouping strategies include band grouping (BG) within the data modalities and label merging (LM) in the ground truth. Our analysis is exclusively done on the So2Sat LCZ42 dataset, which consists of Synthetic Aperture Radar (SAR) and Multispectral Imaging (MSI) image pairs. Our results show that FM1 consistently outperforms simple fusion methods. FM1 with BG and LM is found to be the most effective approach among all fusion strategies, giving an overall accuracy of 76.6\%. Importantly, our study highlights the effect of these strategies in improving prediction accuracy for the underrepresented classes. Our code and processed datasets are available at https://github.com/GVCL/LCZC-MultiModalHybridFusion</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>InverseNet: Benchmarking Operator Mismatch and Calibration Across Compressive Imaging Modalities</title>
  <link>https://arxiv.org/abs/2603.04538</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04538v1 Announce Type: new Abstract: State-of-the-art EfficientSCI loses 20.58 dB when its assumed forward operator deviates from physical reality in just eight parameters, yet no existing benchmark quantifies operator mismatch, the default condition in deployed compressive imaging systems. We introduce InverseNet, the first cross-modality benchmark for operator mismatch, spanning CASSI, CACTI, and single-pixel cameras. Evaluating 12 methods under a four-scenario protocol (ideal, mismatched, oracle-corrected, blind calibration) across 27 simulated scenes and 9 real hardware captures, we find: (1) deep learning methods lose 10-21 dB under mismatch, eliminating their advantage over classical baselines; (2) performance and robustness are inversely correlated across modalities (Spearman r_s = -0.71, p &lt; 0.01); (3) mask-oblivious architectures recover 0% of mismatch losses regardless of calibration quality, while operator-conditioned methods recover 41-90%; (4) blind grid-search calibration recovers 85-100% of the oracle bound without ground truth. Real hardware experiments confirm that simulation trends transfer to physical data. Code will be released upon acceptance.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Recognition of Daily Activities through Multi-Modal Deep Learning: A Video, Pose, and Object-Aware Approach for Ambient Assisted Living</title>
  <link>https://arxiv.org/abs/2603.04509</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04509v1 Announce Type: new Abstract: Recognition of daily activities is a critical element for effective Ambient Assisted Living (AAL) systems, particularly to monitor the well-being and support the independence of older adults in indoor environments. However, developing robust activity recognition systems faces significant challenges, including intra-class variability, inter-class similarity, environmental variability, camera perspectives, and scene complexity. This paper presents a multi-modal approach for the recognition of activities of daily living tailored for older adults within AAL settings. The proposed system integrates visual information processed by a 3D Convolutional Neural Network (CNN) with 3D human pose data analyzed by a Graph Convolutional Network. Contextual information, derived from an object detection module, is fused with the 3D CNN features using a cross-attention mechanism to enhance recognition accuracy. This method is evaluated using the Toyota SmartHome dataset, which consists of real-world indoor activities. The results indicate that the proposed system achieves competitive classification accuracy for a range of daily activities, highlighting its potential as an essential component for advanced AAL monitoring solutions. This advancement supports the broader goal of developing intelligent systems that promote safety and autonomy among older adults.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Lap2: Revisiting Laplace DP-SGD for High Dimensions via Majorization Theory</title>
  <link>https://arxiv.org/abs/2602.23516</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.23516v2 Announce Type: replace Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning, widely used in both training from scratch and fine-tuning large-scale language models. While DP-SGD predominantly relies on the Gaussian mechanism, the Laplace mechanism remains underutilized due to its reliance on L1 norm clipping. This constraint severely limits its practicality in high-dimensional models because the L1 norm of an n-dimensional gradient can be up to sqrt(n) times larger than its L2 norm. As a result, the required noise scale grows significantly with model size, leading to poor utility or untrainable models. In this work, we introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees. We overcome the dimensionality-driven clipping barrier by computing coordinate-wise moment bounds and applying majorization theory to construct a tight, data-independent upper bound over the full model. By exploiting the Schur-convexity of the moment accountant function, we aggregate these bounds using a carefully designed majorization set that respects the L2 clipping constraint. This yields a multivariate privacy accountant that scales gracefully with model dimension and enables the use of thousands of moments. Empirical evaluations demonstrate that our approach significantly improves the performance of Laplace DP-SGD, achieving results comparable to or better than Gaussian DP-SGD under strong privacy constraints. For instance, fine-tuning RoBERTa-base (125M parameters) on SST-2 achieves 87.88% accuracy at epsilon=0.54, outperforming Gaussian (87.16%) and standard Laplace (48.97%) under the same budget.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>UC-Secure Star DKG for Non-Exportable Key Shares with VSS-Free Enforcement</title>
  <link>https://arxiv.org/abs/2602.22187</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.22187v2 Announce Type: replace Abstract: Distributed Key Generation (DKG) lets parties derive a common public key while keeping the signing key secret-shared. UC-secure DKG requires a verifiable-sharing enforcement layer -- classically satisfied via Verifiable Secret Sharing (VSS) and/or commitment-and-proof mechanisms -- for secrecy, uniqueness, and affine consistency. We target the Non-eXportable Key (NXK) setting enforced by hardware-backed key-isolation modules (e.g., TEEs, HSM-like APIs), formalized via an ideal KeyBox (keystore) functionality $\mathcal{F}_{KeyBox}$ that keeps shares non-exportable and permits only attested KeyBox-to-KeyBox sealing. With confidentiality delegated to the NXK boundary, the remaining challenge is enforcing transcript-defined affine consistency without exporting or resharing shares. State continuity rules out rewinding-based extraction, mandating straight-line techniques. We combine (i) KeyBox confidentiality; (ii) Unique Structure Verification (USV), a publicly verifiable certificate whose certified scalar never leaves the KeyBox yet whose public group element is transcript-derivable; and (iii) Fischlin-based UC-extractable NIZK arguments of knowledge in a gRO-CRP (global Random Oracle with Context-Restricted Programmability) model. We construct Star DKG (SDKG), a UC-secure scheme for multi-device threshold wallets where a designated service must co-sign but cannot sign alone, realizing a 1+1-out-of-$n$ star access structure (center plus any leaf) over roles (primary vs. recovery) with role-based device registration. In the $\mathcal{F}_{KeyBox}$-hybrid and gRO-CRP models, under DL and DDH assumptions with adaptive corruptions and secure erasures, SDKG UC-realizes a transcript-driven refinement of the standard UC-DKG functionality. Over a prime-order group of size $p$, SDKG incurs $\widetilde{O}(n\log p)$ communication overhead and $\widetilde{O}(n\log^{2.585}p)$ bit-operation cost.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections</title>
  <link>https://arxiv.org/abs/2602.15654</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.15654v2 Announce Type: replace Abstract: Self-evolving LLM agents update their internal state across sessions, often by writing and reusing long-term memory. This design improves performance on long-horizon tasks but creates a security risk: untrusted external content observed during a benign session can be stored as memory and later treated as instruction. We study this risk and formalize a persistent attack we call a Zombie Agent, where an attacker covertly implants a payload that survives across sessions, effectively turning the agent into a puppet of the attacker. We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content. The attack has two phases. During infection, the agent reads a poisoned source while completing a benign task and writes the payload into long-term memory through its normal update process. During trigger, the payload is retrieved or carried forward and causes unauthorized tool behavior. We design mechanism-specific persistence strategies for common memory implementations, including sliding-window and retrieval-augmented memory, to resist truncation and relevance filtering. We evaluate the attack on representative agent setups and tasks, measuring both persistence over time and the ability to induce unauthorized actions while preserving benign task quality. Our results show that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>BRIDG-ICS: AI-Grounded Knowledge Graphs for Intelligent Threat Analytics in Industry~5.0 Cyber-Physical Systems</title>
  <link>https://arxiv.org/abs/2512.12112</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2512.12112v2 Announce Type: replace Abstract: Industry 5.0&#39;s increasing integration of IT and OT systems is transforming industrial operations but also expanding the cyber-physical attack surface. Industrial Control Systems (ICS) face escalating security challenges as traditional siloed defences fail to provide coherent, cross-domain threat insights. We present BRIDG-ICS (BRIDge for Industrial Control Systems), an AI-driven Knowledge Graph (KG) framework for context-aware threat analysis and quantitative assessment of cyber resilience in smart manufacturing environments. BRIDG-ICS fuses heterogeneous industrial and cybersecurity data into an integrated Industrial Security Knowledge Graph linking assets, vulnerabilities, and adversarial behaviours with probabilistic risk metrics (e.g. exploit likelihood, attack cost). This unified graph representation enables multi-stage attack path simulation using graph-analytic techniques. To enrich the graph&#39;s semantic depth, the framework leverages Large Language Models (LLMs): domain-specific LLMs extract cybersecurity entities, predict relationships, and translate natural-language threat descriptions into structured graph triples, thereby populating the knowledge graph with missing associations and latent risk indicators. This unified AI-enriched KG supports multi-hop, causality-aware threat reasoning, improving visibility into complex attack chains and guiding data-driven mitigation. In simulated industrial scenarios, BRIDG-ICS scales well, reduces potential attack exposure, and can enhance cyber-physical system resilience in Industry 5.0 settings.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?</title>
  <link>https://arxiv.org/abs/2510.20333</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2510.20333v3 Announce Type: replace Abstract: Vision-Language Models (VLMs) are increasingly deployed as autonomous agents to navigate mobile graphical user interfaces (GUIs). Operating in dynamic on-device ecosystems, which include notifications, pop-ups, and inter-app interactions, exposes them to a unique and underexplored threat vector: environmental injection. Unlike prompt-based attacks that manipulate textual instructions, environmental injection corrupts an agent&#39;s visual perception by inserting adversarial UI elements (for example, deceptive overlays or spoofed notifications) directly into the GUI. This bypasses textual safeguards and can derail execution, causing privacy leakage, financial loss, or irreversible device compromise. To systematically evaluate this threat, we introduce GhostEI-Bench, the first benchmark for assessing mobile agents under environmental injection attacks within dynamic, executable environments. Moving beyond static image-based assessments, GhostEI-Bench injects adversarial events into realistic application workflows inside fully operational Android emulators and evaluates performance across critical risk scenarios. We further propose a judge-LLM protocol that conducts fine-grained failure analysis by reviewing the agent&#39;s action trajectory alongside the corresponding screenshot sequence, pinpointing failure in perception, recognition, or reasoning. Comprehensive experiments on state-of-the-art agents reveal pronounced vulnerability to deceptive environmental cues: current models systematically fail to perceive and reason about manipulated UIs. GhostEI-Bench provides a framework for quantifying and mitigating this emerging threat, paving the way toward more robust and secure embodied agents.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics</title>
  <link>https://arxiv.org/abs/2508.20643</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2508.20643v2 Announce Type: replace Abstract: Post-mortem analysis of compromised systems is a key aspect of cyber forensics, today a mostly manual, slow, and error-prone task. Agentic AI, i.e., LLM-powered agents, is a promising avenue for automation. However, applying such agents to cybersecurity remains largely unexplored and difficult, as this domain demands long-term reasoning, contextual memory, and consistent evidence correlation - capabilities that current LLM agents struggle to master. In this paper, we present the first systematic study of LLM agents to automate post-mortem investigation. As a first scenario, we consider realistic attacks in which remote attackers try to abuse online services using well-known CVEs (30 controlled cases). The agent receives as input the network traces of the attack and extracts forensic evidence. We compare three AI agent architectures, six LLM backends, and assess their ability to i) identify compromised services, ii) map exploits to exact CVEs, and iii) prepare thorough reports. Our best-performing system, CyberSleuth, achieves 80% accuracy on 2025 incidents, producing complete, coherent, and practically useful reports (judged by a panel of 25 experts). We next illustrate how readily CyberSleuth adapts to face the analysis of infected machine traffic, showing that the effective AI agent design can transfer across forensic tasks. Our findings show that (i) multi-agent specialisation is key to sustained reasoning; (ii) simple orchestration outperforms nested hierarchical architectures; and (iii) the CyberSleuth design generalises across different forensic tasks.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Accurate BGV Parameters Selection: Accounting for Secret and Public Key Dependencies in Average-Case Analysis</title>
  <link>https://arxiv.org/abs/2504.18597</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2504.18597v3 Announce Type: replace Abstract: The Brakerski-Gentry-Vaikuntanathan (BGV) scheme is one of the most significant fully homomorphic encryption (FHE) schemes. It belongs to a class of FHE schemes whose security is based on the presumed intractability of the Learning with Errors (LWE) problem and its ring variant (RLWE). Such schemes deal with a quantity, called noise, which increases each time a homomorphic operation is performed. Specifically, in order for the scheme to work properly, it is essential that the noise remains below a certain threshold throughout the process. For BGV, this threshold strictly depends on the ciphertext modulus, which is one of the initial parameters whose selection heavily affects both the efficiency and security of the scheme. For an optimal parameter choice, it is crucial to accurately estimate the noise growth, particularly that arising from multiplication, which is the most complex operation. In this work, we propose a novel average-case approach that precisely models noise evolution and guides the selection of initial parameters, improving efficiency while ensuring security. The key innovation of our method lies in accounting for the dependencies among ciphertext errors generated with the same key, and in providing general guidelines for accurate parameter selection that are library-independent.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>EVMbench: Evaluating AI Agents on Smart Contract Security</title>
  <link>https://arxiv.org/abs/2603.04915</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04915v1 Announce Type: cross Abstract: Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is natural to ask how well they can already navigate this landscape, both in ways that improve security and in ways that might increase risk. We introduce EVMbench, an evaluation that measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. EVMbench draws on 117 curated vulnerabilities from 40 repositories and, in the most realistic setting, uses programmatic grading based on tests and blockchain state under a local Ethereum execution environment. We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances. We release code, tasks, and tooling to support continued measurement of these capabilities and future work on security.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Lambda-randomization: multi-dimensional randomized response made easy</title>
  <link>https://arxiv.org/abs/2603.05261</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05261v1 Announce Type: new Abstract: Randomized response is a popular local anonymization approach that can deliver anonymized multi-dimensional data sets with rigorous privacy guarantees. At the same time, it can ensure validity for exploratory analysis and machine learning tasks as, under fairly general conditions, unbiased estimates of the underlying true distributions can be retrieved. However, and like for many other anonymization techniques, one of the main pitfalls of this approach is the curse of dimensionality. When coping with data sets with many attributes, one quickly runs into unsustainable computational costs for estimating true distributions, as well as a degradation in their accuracies. Relying on new theoretical insights developed in this paper, we propose an approach to multi-dimensional randomized response that avoids these traditional limitations. From simple yet intuitive parameterizations of the randomization matrices that we introduce, we develop a protocol called Lambda-randomization that entails low computational costs to retrieve estimates of multivariate distributions, and that makes use of solely three simple elements: a set of parameters ranging between 0 and 1 (one per attribute of the data set), the identity matrix, and the all-ones vector. We also present an empirical application to illustrate the proposed protocol.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation</title>
  <link>https://arxiv.org/abs/2603.05073</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05073v1 Announce Type: new Abstract: Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm providing high utility by involving a shuffler to permute noisy report from users. Existing shuffle-DP protocols mainly focus on the design of shuffler-based categorical frequency oracle (SCFO) for frequency estimation on categorical data. However, numerical data is a more prevalent type and many real-world applications depend on the estimation of data distribution with ordinal nature. In this paper, we study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt to transplant existing SCFOs and the na\&quot;ive distribution recovery technique to this task, and demonstrate that these baseline protocols cannot simultaneously achieve outstanding performance in three metrics: 1) utility, 2) message complexity; and 3) robustness to data poisoning attacks. Therefore, we further propose a novel single-message \textit{adaptive shuffler-based piecewise} (ASP) protocol with high utility and robustness. In ASP, we first develop a randomizer by parameter optimization using our proposed tighter bound of mutual information. We also design an \textit{Expectation Maximization with Adaptive Smoothing} (EMAS) algorithm to accurately recover distribution with enhanced robustness. To quantify robustness, we propose a new evaluation framework to examine robustness under different attack targets, enabling us to comprehensively understand the protocol resilience under various adversarial scenarios. Extensive experiments demonstrate that ASP outperforms baseline protocols in all three metrics. Especially under small $\epsilon$ values, ASP achieves an order of magnitude improvement in utility with minimal message complexity, and exhibits over threefold robustness compared to baseline methods.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Modification to Fully Homomorphic Modified Rivest Scheme</title>
  <link>https://arxiv.org/abs/2603.04952</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04952v1 Announce Type: new Abstract: This document details the Fully Homomorphic Modified Rivest Scheme (FHMRS), a security issue in FHMRS, and a modification to FHMRS (mFHMRS) to mitigate the security issue.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows</title>
  <link>https://arxiv.org/abs/2603.04902</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04902v1 Announce Type: new Abstract: Agentic systems are increasingly acting on users&#39; behalf, accessing calendars, email, and personal files to complete everyday tasks. Privacy evaluation for these systems has focused on the input and output boundaries, but each task involves several intermediate information flows, from agent queries to tool responses, that are not currently evaluated. We argue that every boundary in an agentic pipeline is a site of potential privacy violation and must be assessed independently. To support this, we introduce the Privacy Flow Graph, a Contextual Integrity-grounded framework that decomposes agentic execution into a sequence of information flows, each annotated with the five CI parameters, and traces violations to their point of origin. We present AgentSCOPE, a benchmark of 62 multi-tool scenarios across eight regulatory domains with ground truth at every pipeline stage. Our evaluation across seven state-of-the-art LLMs show that privacy violations in the pipeline occur in over 80% of scenarios, even when final outputs appear clean (24%), with most violations arising at the tool-response stage where APIs return sensitive data indiscriminately. These results indicate that output-level evaluation alone substantially underestimates the privacy risk of agentic systems.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Osmosis Distillation: Model Hijacking with the Fewest Samples</title>
  <link>https://arxiv.org/abs/2603.04859</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04859v1 Announce Type: new Abstract: Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged to synthesize a compact dataset that preserves critical information from the original large dataset. Therefore, a combination of transfer learning and dataset distillation offers promising performance in evaluations. However, a non-negligible security threat remains undiscovered in transfer learning using synthetic datasets generated by dataset distillation methods, where an adversary can perform a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose Osmosis Distillation (OD) attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates in hidden tasks while preserving high model utility in original tasks. Furthermore, the distilled osmosis set enables model hijacking across diverse model architectures, allowing model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of using third-party synthetic datasets in transfer learning must be raised.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>ShieldBypass: On the Persistence of Impedance Leakage Beyond EM Shielding</title>
  <link>https://arxiv.org/abs/2603.04801</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04801v1 Announce Type: new Abstract: Electromagnetic (EM) shielding is widely used to suppress radiated emissions and limit passive EM side-channel leakage. However, shielding does not address active probing, where an adversary injects external radio-frequency (RF) signals and observes the device&#39;s reflective response. This work studies whether such impedance-modulated backscattering persists when radiated emissions are suppressed by shielding. By injecting controlled RF signals and analyzing the reflections, we demonstrate that state-dependent impedance variations remain observable at frequencies outside the shields&#39; primary attenuation band. Using processors implemented on FPGA and microcontroller prototypes, and evaluating workload profiles under three industry-standard shields, we find that passive EM measurements lose discriminative power under shielding, while backscattering responses remain separable. These results indicate that active RF probing can expose execution-dependent behavior even in shielded systems, motivating the need to consider active impedance-based probing within hardware security evaluation flows.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Efficient Privacy-Preserving Sparse Matrix-Vector Multiplication Using Homomorphic Encryption</title>
  <link>https://arxiv.org/abs/2603.04742</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04742v1 Announce Type: new Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing, data analysis, and machine learning. When the data being processed are sensitive, preserving privacy becomes critical, and homomorphic encryption (HE) has emerged as a leading approach for addressing this challenge. Although HE enables privacy-preserving computation, its application to SpMV has remained largely unaddressed. To the best of our knowledge, this paper presents the first framework that efficiently integrates HE with SpMV, addressing the dual challenges of computational efficiency and data privacy. In particular, we introduce a novel compressed matrix format, named Compressed Sparse Sorted Column (CSSC), which is specifically designed to optimize encrypted sparse matrix computations. By preserving sparsity and enabling efficient ciphertext packing, CSSC significantly reduces storage and computational overhead. Our experimental results on real-world datasets demonstrate that the proposed method achieves significant gains in both processing time and memory usage. This study advances privacy-preserving SpMV and lays the groundwork for secure applications in federated learning, encrypted databases, scientific computing, and beyond.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Impact of 5G SA Logical Vulnerabilities on UAV Communications: Threat Models and Testbed Evaluation</title>
  <link>https://arxiv.org/abs/2603.04662</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04662v1 Announce Type: new Abstract: This paper examines how logical vulnerabilities in 5G Standalone networks affect UAV command and control communication. The study looks at three attacker positions in the architecture: a malicious user equipment (UE) connected to the same logical network as the UAV, an attacker with access to the 5G core, and a compromised gNodeB. To test these scenarios, a testbed was created using Open5GS, UERANSIM, and Kubernetes. The setup simulates a UAV-GCS communication system over a 5G SA network and allows for controlled attacks on various network interfaces. The experiments reveal that attacks at different points in the architecture can disrupt UAV operations. These disruptions include manipulating control commands and terminating data sessions. The findings emphasize the need for isolation measures in the 5G user plane and integrity protection in UAV command protocols.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection</title>
  <link>https://arxiv.org/abs/2603.04469</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04469v1 Announce Type: new Abstract: Multi-Agent System is emerging as the \textit{de facto} standard for complex task orchestration. However, its reliance on autonomous execution and unstructured inter-agent communication introduces severe risks, such as indirect prompt injection, that easily circumvent conventional input guardrails. To address this, we propose \SysName, a framework that shifts the defensive paradigm from static input filtering to execution-aware analysis. By extracting and reconstructing Cross-Agent Semantic Flows, \SysName synthesizes fragmented operational primitives into contiguous behavioral trajectories, enabling a holistic view of system activity. We leverage a Supervisor LLM to scrutinize these trajectories, identifying anomalies across data flow violations, control flow deviations, and intent inconsistencies. Empirical evaluations demonstrate that \SysName effectively detects over ten distinct compound attack vectors, achieving F1-scores of 85.3\% and 66.7\% for node-level and path-level end-to-end attack detection, respectively. The source code is available at https://anonymous.4open.science/r/MAScope-71DC.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks</title>
  <link>https://arxiv.org/abs/2603.04459</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04459v1 Announce Type: new Abstract: The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and enabling systematic comparisons. Yet, it remains unclear why certain benchmarks gain prominence, and no systematic assessment has been conducted on their academic influence or code quality. This paper fills this gap by presenting the first multi-dimensional evaluation of the influence (based on five metrics) and code quality (based on both automated and human assessment) on LLM safety benchmarks, analyzing 31 benchmarks and 382 non-benchmarks across prompt injection, jailbreak, and hallucination. We find that benchmark papers show no significant advantage in academic influence (e.g., citation count and density) over non-benchmark papers. We uncover a key misalignment: while author prominence correlates with paper influence, neither author prominence nor paper influence shows a significant correlation with code quality. Our results also indicate substantial room for improvement in code and supplementary materials: only 39% of repositories are ready-to-use, 16% include flawless installation guides, and a mere 6% address ethical considerations. Given that the work of prominent researchers tends to attract greater attention, they need to lead the effort in setting higher standards.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>iAgentBench: Benchmarking Sensemaking Capabilities of Information-Seeking Agents on High-Traffic Topics</title>
  <link>https://arxiv.org/abs/2603.04656</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04656v1 Announce Type: new Abstract: With the emergence of search-enabled generative QA systems, users are increasingly turning to tools that browse, aggregate, and reconcile evidence across multiple sources on their behalf. Yet many widely used QA benchmarks remain answerable by retrieving a single relevant passage, making them poorly suited for measuring cross-source sensemaking, such as integrating evidence, tracking causal links, and resolving dependencies across facets of a topic. We present iAgentBench, a dynamic ODQA benchmark that targets these higher-level information needs while keeping questions natural and grounded in realistic information-seeking behavior. iAgentBench draws seed topics from real-world attention signals and uses common user intent patterns to construct user-like questions whose answers require combining evidence from multiple sources, not just extracting a single snippet. Each instance is released with traceable evidence and auditable intermediate artifacts that support contamination checks and enable fine-grained diagnosis of failures in retrieval versus synthesis. Experiments across multiple LLMs show that retrieval improves accuracy, but retrieval alone does not reliably resolve these questions, underscoring the need to evaluate evidence use, not just evidence access.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Coordinated Semantic Alignment and Evidence Constraints for Retrieval-Augmented Generation with Large Language Models</title>
  <link>https://arxiv.org/abs/2603.04647</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04647v1 Announce Type: new Abstract: Retrieval augmented generation mitigates limitations of large language models in factual consistency and knowledge updating by introducing external knowledge. However, practical applications still suffer from semantic misalignment between retrieved results and generation objectives, as well as insufficient evidence utilization. To address these challenges, this paper proposes a retrieval augmented generation method that integrates semantic alignment with evidence constraints through coordinated modeling of retrieval and generation stages. The method first represents the relevance between queries and candidate evidence within a unified semantic space. This ensures that retrieved results remain semantically consistent with generation goals and reduces interference from noisy evidence and semantic drift. On this basis, an explicit evidence constraint mechanism is introduced. Retrieved evidence is transformed from an implicit context into a core control factor in generation. This restricts the expression scope of generated content and strengthens dependence on evidence. By jointly modeling semantic consistency and evidence constraints within a unified framework, the proposed approach improves factual reliability and verifiability while preserving natural language fluency. Comparative results show stable improvements across multiple generation quality metrics. This confirms the effectiveness and necessity of coordinated semantic alignment and evidence constraint modeling in retrieval augmented generation tasks.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning</title>
  <link>https://arxiv.org/abs/2603.04597</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04597v1 Announce Type: new Abstract: Large language models (LLMs) typically receive diverse natural language (NL) feedback through interaction with the environment. However, current reinforcement learning (RL) algorithms rely solely on scalar rewards, leaving the rich information in NL feedback underutilized and leading to inefficient exploration. In this work, we propose GOLF, an RL framework that explicitly exploits group-level language feedback to guide targeted exploration through actionable refinements. GOLF aggregates two complementary feedback sources: (i) external critiques that pinpoint errors or propose targeted fixes, and (ii) intra-group attempts that supply alternative partial ideas and diverse failure patterns. These group-level feedbacks are aggregated to produce high-quality refinements, which are adaptively injected into training as off-policy scaffolds to provide targeted guidance in sparse-reward regions. Meanwhile, GOLF jointly optimizes generation and refinement within a unified RL loop, creating a virtuous cycle that continuously improves both capabilities. Experiments on both verifiable and non-verifiable benchmarks show that GOLF achieves superior performance and exploration efficiency, achieving 2.2$\times$ improvements in sample efficiency compared to RL methods trained solely on scalar rewards. Code is available at https://github.com/LuckyyySTA/GOLF.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>From Static Inference to Dynamic Interaction: Navigating the Landscape of Streaming Large Language Models</title>
  <link>https://arxiv.org/abs/2603.04592</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04592v1 Announce Type: new Abstract: Standard Large Language Models (LLMs) are predominantly designed for static inference with pre-defined inputs, which limits their applicability in dynamic, real-time scenarios. To address this gap, the streaming LLM paradigm has emerged. However, existing definitions of streaming LLMs remain fragmented, conflating streaming generation, streaming inputs, and interactive streaming architectures, while a systematic taxonomy is still lacking. This paper provides a comprehensive overview and analysis of streaming LLMs. First, we establish a unified definition of streaming LLMs based on data flow and dynamic interaction to clarify existing ambiguities. Building on this definition, we propose a systematic taxonomy of current streaming LLMs and conduct an in-depth discussion on their underlying methodologies. Furthermore, we explore the applications of streaming LLMs in real-world scenarios and outline promising research directions to support ongoing advances in streaming intelligence. We maintain a continuously updated repository of relevant papers at https://github.com/EIT-NLP/Awesome-Streaming-LLMs.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Query Disambiguation via Answer-Free Context: Doubling Performance on Humanity&#39;s Last Exam</title>
  <link>https://arxiv.org/abs/2603.04454</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04454v1 Announce Type: new Abstract: How carefully and unambiguously a question is phrased has a profound impact on the quality of the response, for Language Models (LMs) as well as people. While model capabilities continue to advance, the interplay between grounding context and query formulation remains under-explored. This work investigates how the quality of background grounding information in a model&#39;s context window affects accuracy. We find that combining well-grounded dynamic context construction (i.e, RAG) with query rewriting reduces question ambiguity, resulting in significant accuracy gains. Given a user question with associated answer-free grounding context, rewriting the question to reduce ambiguity produces benchmark improvements without changing the answer itself, even compared to prepending that context before the question. Using \texttt{gpt-oss-20b} to rewrite a subset of Humanity&#39;s Last Exam using answer-free grounding context improves \texttt{gpt-5-mini} accuracy from 0.14 to 0.37. We demonstrate that this accuracy improvement cannot be fully recovered just through prompting at inference time; rather, distinct rewriting and answering phases are required. Code and data are available at https://github.com/mmajurski/lm-rewrite-uplift</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models</title>
  <link>https://arxiv.org/abs/2603.04453</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04453v1 Announce Type: new Abstract: The use of multimodal large language models has become widespread, and as such the study of these models and their failure points has become of utmost importance. We study a novel mode of failure that causes degradation in performance indirectly by optimizing a loss term that seeks to maximize numerical instability in the inference stage of these models. We apply this loss term as the optimization target to construct images that, when used on multimodal large language models, cause significant degradation in the output. We validate our hypothesis on state of the art models large vision language models (LLaVa-v1.5-7B, Idefics3-8B, SmolVLM-2B-Instruct) against standard datasets (Flickr30k, MMVet, TextVQA, VQAv2, POPE, COCO) and show that performance degrades significantly, even with a very small change to the input image, compared to baselines. Our results uncover a fundamentally different vector of performance degradation, highlighting a failure mode not captured by adversarial perturbations.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Generating Realistic, Protocol-Compliant Maritime Radio Dialogues using Self-Instruct and Low-Rank Adaptation</title>
  <link>https://arxiv.org/abs/2603.04423</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04423v1 Announce Type: new Abstract: VHF radio miscommunication remains a major safety risk in maritime operations, with human factors accounting for over 58% of recorded incidents in Europe between 2014 and 2023. Despite decades of operational use, VHF radio communications are still prone to noise, interference, linguistic variability, and the absence of real-time transcription, making procedural errors both frequent and difficult to correct. Developing AI-assisted systems to support real-time communication and decision-making requires a considerable amount of high-quality maritime data, yet operational, regulatory, and privacy constraints render such datasets scarce. This study introduces a compliance aware Self-Instruct methodology for generating realistic maritime radio dialogues that conform to the IMO&#39;s SMCP. Our approach integrates a 26-filter verification pipeline directly into the iterative generation loop to enforce entity information accuracy, hallucination detection, SMCP-compliance, logical consistency, and linguistic diversity. We employ LORA for parameter-efficient fine-tuning, reducing computational overhead during training and enabling efficient deployment of the resulting models on resource-constrained maritime systems. To assess dataset quality, we introduce a novel evaluation framework combining automated and expert assessments: Format Accuracy, Information Accuracy, Uniqueness, and Logical Coherence. Experiments using publicly available vessel, coastal and AIS datasets demonstrate that the approach produces synthetically diverse, procedurally compliant, and operationally realistic dialogues. Although downstream applications such as automatic speech recognition and natural language processing are reserved for future work, the released code, datasets, and verification tools provide a reproducible foundation for artificial intelligence-assisted maritime safety and other safety-critical domains.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?</title>
  <link>https://arxiv.org/abs/2603.04421</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04421v1 Announce Type: new Abstract: Multi-agent large language model (LLM) systems have emerged as a promising approach for clinical diagnosis, leveraging collaboration among agents to refine medical reasoning. However, most existing frameworks rely on single-vendor teams (e.g., multiple agents from the same model family), which risk correlated failure modes that reinforce shared biases rather than correcting them. We investigate the impact of vendor diversity by comparing Single-LLM, Single-Vendor, and Mixed-Vendor Multi-Agent Conversation (MAC) frameworks. Using three doctor agents instantiated with o4-mini, Gemini-2.5-Pro, and Claude-4.5-Sonnet, we evaluate performance on RareBench and DiagnosisArena. Mixed-vendor configurations consistently outperform single-vendor counterparts, achieving state-of-the-art recall and accuracy. Overlap analysis reveals the underlying mechanism: mixed-vendor teams pool complementary inductive biases, surfacing correct diagnoses that individual models or homogeneous teams collectively miss. These results highlight vendor diversity as a key design principle for robust clinical diagnostic systems.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Same Input, Different Scores: A Multi Model Study on the Inconsistency of LLM Judge</title>
  <link>https://arxiv.org/abs/2603.04417</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04417v1 Announce Type: new Abstract: Large language models are increasingly used as automated evaluators in research and enterprise settings, a practice known as LLM-as-a-judge. While prior work has examined accuracy, bias, and alignment with human preferences, far less attention has been given to how consistently LLMs assign numerical scores, an important concern for many production workflows. This study systematically evaluates scoring stability across five commonly used models, GPT-4o, GPT-4o-mini, Gemini-2.5-Flash, Claude-Haiku-4.5, and Claude-Sonnet-4.5, two temperature settings, and real enterprise question-answer pairs drawn from a retrieval-augmented generation (RAG) system. We address three questions: how stable a model&#39;s scores are across repeated runs, how differently models score identical inputs, and how temperature affects scoring consistency. Temperature controls the determinism of an LLM&#39;s output. Despite expectations of stability at temperature=0, we observe substantial variability across models, with completeness scoring showing the largest fluctuations. Cross-model comparisons reveal systematic differences in strictness and interpretive style, leading to divergent ratings for the same answers. Lower temperatures improve stability for some models, notably GPT-4o and Gemini, but have limited or inconsistent effects for Anthropic models. These findings have important implications for enterprise pipelines that rely on LLM-generated scores for routing, triage, gating, or quality control. Identical inputs can receive different scores depending on model, family, or temperature, raising concerns around fairness, reproducibility, and operational reliability. Our results highlight the need for monitoring, robust parsing, and hybrid human-LLM evaluation strategies to ensure dependable use of LLM-as-a-judge in production environments.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Optimizing What We Trust: Reliability-Guided QUBO Selection of Multi-Agent Weak Framing Signals for Arabic Sentiment Prediction</title>
  <link>https://arxiv.org/abs/2603.04416</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04416v1 Announce Type: new Abstract: Framing detection in Arabic social media is difficult due to interpretive ambiguity, cultural grounding, and limited reliable supervision. Existing LLM-based weak supervision methods typically rely on label aggregation, which is brittle when annotations are few and socially dependent. We propose a reliability-aware weak supervision framework that shifts the focus from label fusion to data curation. A small multi-agent LLM pipeline, two framers, a critic, and a discriminator, treats disagreement and reasoning quality as epistemic signals and produces instance-level reliability estimates. These estimates guide a QUBO-based subset selection procedure that enforces frame balance while reducing redundancy. Intrinsic diagnostics and an out-of-domain Arabic sentiment transfer test show that the selected subsets are more reliable and encode non-random, transferable structure, without degrading strong text-only baselines.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning</title>
  <link>https://arxiv.org/abs/2603.04415</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04415v1 Announce Type: new Abstract: While reasoning-enhanced Large Language Models (LLMs) have demonstrated remarkable advances in complex tasks such as mathematics and coding, their effectiveness across universal multimodal scenarios remains uncertain. The trend of releasing parallel &quot;Instruct&quot; and &quot;Thinking&quot; models by leading developers serves merely as a resource-intensive workaround, stemming from the lack of a criterion for determining when reasoning is truly beneficial. In this paper, we propose Dual Tuning, a framework designed to assess whether reasoning yields positive gains for target tasks under given base models and datasets. By jointly fine-tuning on paired Chain-of-Thought (CoT) and Direct-Answer (DA) data under controlled prompts, we systematically quantify and compare the gains of both training modes using the proposed metrics, and establish the &quot;Thinking Boundary&quot; to evaluate the suitability of reasoning training across diverse multimodal tasks, including spatial, mathematical, and multi-disciplinary domains. We further explore the impact of reinforcement training and thinking patterns on reasoning suitability, and validate whether the &quot;Thinking Boundary&quot; can guide data refinement. Our findings challenge the &quot;reasoning-for-all&quot; paradigm, providing practical guidance for identifying appropriate data and training strategies, and motivating the development of resource-efficient, adaptive auto-think systems.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Multiclass Hate Speech Detection with RoBERTa-OTA: Integrating Transformer Attention and Graph Convolutional Networks</title>
  <link>https://arxiv.org/abs/2603.04414</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04414v1 Announce Type: new Abstract: Multiclass hate speech detection across demographic categories remains computationally challenging due to implicit targeting strategies and linguistic variability in social media content. Existing approaches rely solely on learned representations from training data, without explicitly incorporating structured ontological frameworks that can enhance classification through formal domain knowledge integration. We propose RoBERTa-OTA, which introduces ontology-guided attention mechanisms that process textual features alongside structured knowledge representations through enhanced Graph Convolutional Networks. The architecture combines RoBERTa embeddings with scaled attention layers and graph neural networks to integrate contextual language understanding with domain-specific semantic knowledge. Evaluation across 39,747 balanced samples using 5-fold cross-validation demonstrates significant performance gains over baseline RoBERTa implementations and existing state-of-the-art methods. RoBERTa-OTA achieves 96.04\% accuracy compared to 95.02\% for standard RoBERTa, with substantial improvements for challenging categories: gender-based hate speech detection improves by 2.36 percentage points while other hate speech categories improve by 2.38 percentage points. The enhanced architecture maintains computational efficiency with only 0.33\% parameter overhead, providing practical advantages for large-scale content moderation applications requiring fine-grained demographic hate speech classification.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries</title>
  <link>https://arxiv.org/abs/2603.04413</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04413v1 Announce Type: new Abstract: Meaning in human language is relational, context dependent, and emergent, arising from dynamic systems of signs rather than fixed word-concept mappings. In computational settings, this semiotic and interpretive complexity complicates the generation and evaluation of meaning. This article proposes an interdisciplinary framework for studying meaning in large language model (LLM) generated language by integrating semiotics and hermeneutics with qualitative research methods. We review prior scholarship on meaning and machines, examining how linguistic signs are transformed into vectorized representations in static and contextualized embedding models, and identify gaps between statistical approximation and human interpretive meaning. We then introduce the Inductive Conceptual Rating (ICR) metric, a qualitative evaluation approach grounded in inductive content analysis and reflexive thematic analysis, designed to assess semantic accuracy and meaning alignment in LLM-outputs beyond lexical similarity metrics. We apply ICR in an empirical comparison of LLM generated and human generated thematic summaries across five datasets (N = 50 to 800). While LLMs achieve high linguistic similarity, they underperform on semantic accuracy, particularly in capturing contextually grounded meanings. Performance improves with larger datasets but remains variable across models, potentially reflecting differences in the frequency and coherence of recurring concepts and meanings. We conclude by arguing for evaluation frameworks that leverage systematic qualitative interpretation practices when assessing meaning in LLM-generated outputs from reference texts.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Additive Multi-Step Markov Chains and the Curse of Dimensionality in Large Language Models</title>
  <link>https://arxiv.org/abs/2603.04412</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04412v1 Announce Type: new Abstract: Large-scale language models (LLMs) operate in extremely high-dimensional state spaces, where both token embeddings and their hidden representations create complex dependencies that are not easily reduced to classical Markov structures. In this paper, we explore a theoretically feasible approximation of LLM dynamics using N-order additive Markov chains. Such models allow the conditional probability of the next token to be decomposed into a superposition of contributions from multiple historical depths, reducing the combinatorial explosion typically associated with high-order Markov processes. The main result of the work is the establishment of a correspondence between an additive multi-step chain and a chain with a step-wise memory function. This equivalence allowed the introduction of the concept of information temperature not only for stepwise but also for additive N-order Markov chains.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache</title>
  <link>https://arxiv.org/abs/2603.04411</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04411v1 Announce Type: new Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a critical bottleneck for efficient inference. While dimensionality reduction offers a promising compression avenue, existing approaches typically either necessitate prohibitively expensive pre-training from scratch or suffer from severe performance deterioration under high compression regimes. In this work, we propose DynaKV, a novel post-training framework for low-rank KV cache compression. To the best of our knowledge, DynaKV is the first method to dynamically allocate compression rates to individual tokens according to their semantic meaning, which allows it to achieve better fidelity at aggressive compression ratios. Extensive experiments demonstrate that our method consistently outperforms existing state-of-the-art compression techniques, achieving significant memory reduction while maintaining competitive generation quality. Furthermore, our approach is orthogonal to sequence-level pruning methods. When integrated with SnapKV, DynaKV retains only 6% of the KV cache while maintaining 94% of the baseline performance on the LongBench benchmark.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models</title>
  <link>https://arxiv.org/abs/2603.04410</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04410v1 Announce Type: new Abstract: Safety alignment in Language Models (LMs) is fundamental for trustworthy AI. However, while different stakeholders are trying to leverage Arabic Language Models (ALMs), systematic safety evaluation of ALMs remains largely underexplored, limiting their mainstream uptake. Existing safety benchmarks and safeguard models are predominantly English-centric, limiting their applicability to Arabic Natural Language Processing (NLP) systems and obscuring fine-grained, category-level safety vulnerabilities. This paper introduces SalamaBench, a unified benchmark for evaluating the safety of ALMs, comprising $8,170$ prompts across $12$ different categories aligned with the MLCommons Safety Hazard Taxonomy. Constructed by harmonizing heterogeneous datasets through a rigorous pipeline involving AI filtering and multi-stage human verification, SalamaBench enables standardized, category-aware safety evaluation. Using this benchmark, we evaluate five state-of-the-art ALMs, including Fanar 1 and 2, ALLaM 2, Falcon H1R, and Jais 2, under multiple safeguard configurations, including individual guard models, majority-vote aggregation, and validation against human-annotated gold labels. Our results reveal substantial variation in safety alignment: while Fanar 2 achieves the lowest aggregate attack success rates, its robustness is uneven across specific harm domains. In contrast, Jais 2 consistently exhibits elevated vulnerability, indicating weaker intrinsic safety alignment. We further demonstrate that native ALMs perform substantially worse than dedicated safeguard models when acting as safety judges. Overall, our findings highlight the necessity of category-aware evaluation and specialized safeguard mechanisms for robust harm mitigation in ALMs.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework</title>
  <link>https://arxiv.org/abs/2603.04409</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04409v1 Announce Type: new Abstract: The evaluation of large language models faces significant challenges. Technical benchmarks often lack real-world relevance, while existing human preference evaluations suffer from unrepresentative sampling, superficial assessment depth, and single-metric reductionism. To address these issues, we introduce HUMAINE, a framework for multidimensional, demographically aware measurement of human-AI interaction. We collected multi-turn, naturalistic conversations from 23,404 participants that were stratified across 22 demographic groups, both in the US and UK, to evaluate 28 state-of-the-art models across five human-centric dimensions. We use a hierarchical Bayesian Bradley-Terry-Davidson (BTD) model, with post-stratification to census data, and our analysis reveals three key insights. \textbf{(1)} We establish a clear performance hierarchy where \texttt{google/gemini-2.5-pro} ranks first overall, with a 95.6\% posterior probability of being the top-ranked model. \textbf{(2)} We uncover significant preference heterogeneity, with user age emerging as the primary demographic axis of disagreement; a model&#39;s perceived rank can shift substantially across age groups, exposing failures in generalisation that unrepresentative samples typically mask. \textbf{(3)} We quantify the vast difference in discriminative power across evaluation dimensions, with ambiguous qualities like \textit{Trust, Ethics \&amp; Safety} showing a 65\% tie rate, in stark contrast to the decisive 10\% tie rate for \textit{Overall Winner}. Our work emphasises the need for a more multidimensional, demographically aware perspective in LLM evaluation. We release our complete dataset, interactive leaderboard, and open-source framework.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Probing Memes in LLMs: A Paradigm for the Entangled Evaluation World</title>
  <link>https://arxiv.org/abs/2603.04408</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04408v1 Announce Type: new Abstract: Current evaluation paradigms for large language models (LLMs) characterize models and datasets separately, yielding coarse descriptions: items in datasets are treated as pre-labeled entries, and models are summarized by overall scores such as accuracy, together ignoring the diversity of population-level model behaviors across items with varying properties. To address this gap, this paper conceptualizes LLMs as composed of memes, a notion introduced by Dawkins as cultural genes that replicate knowledge and behavior. Building on this perspective, the Probing Memes paradigm reconceptualizes evaluation as an entangled world of models and data. It centers on a Perception Matrix that captures model-item interactions, enabling Probe Properties for characterizing items and Meme Scores for depicting model behavioral traits. Applied to 9 datasets and 4,507 LLMs, Probing Memes reveals hidden capability structures and quantifies phenomena invisible under traditional paradigms (e.g., elite models failing on problems that most models answer easily). It not only supports more informative and extensible benchmarks but also enables population-based evaluation of LLMs.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Semantic Containment as a Fundamental Property of Emergent Misalignment</title>
  <link>https://arxiv.org/abs/2603.04407</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04407v1 Announce Type: new Abstract: Fine-tuning language models on narrowly harmful data causes emergent misalignment (EM) -- behavioral failures extending far beyond training distributions. Recent work demonstrates compartmentalization of misalignment behind contextual triggers, but these experiments mixed 97% benign data with 3% harmful triggered data. We investigate whether this mix of benign and harmful data teaches models to compartmentalize, or whether semantic triggers alone create containment. We train three model families (Qwen 2.5 14B, Llama 3.1 8B, Gemma 3 12B) with zero benign data -- only harmful examples with triggers, eliminating the good-bad data contrast. We demonstrate that baseline EM rates of 9.5--23.5% drop to 0.0--1.0% when triggers are removed during inference, but recover to 12.2--22.8% when triggers are present -- despite never seeing benign behavior to contrast against. Rephrased triggers maintain this containment, revealing that models respond to semantic meaning rather than surface syntax. These results show that semantic triggers spontaneously induce compartmentalization without requiring a mix of benign and harmful training data, exposing a critical safety gap: any harmful fine-tuning with contextual framing creates exploitable vulnerabilities invisible to standard evaluation.</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Zero-Knowledge Proof (ZKP) Authentication for Offline CBDC Payment System Using IoT Devices</title>
  <link>https://arxiv.org/abs/2603.03804</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03804v2 Announce Type: replace-cross Abstract: Central Bank Digital Currency (CBDCs) are becoming a new digital financial tool aimed at financial inclusion, increased monetary stability, and improved efficiency of payment systems, as they are issued by central banks. One of the most important aspects is that the CBDC must offer secure offline payment methods to users, allowing them to retain cash-like access without violating Anti-Money Laundering and Counter-terrorism Financing (AML/CFT) rules. The offline CBDC ecosystems will provide financial inclusion, empower underserved communities, and ensure equitable access to digital payments, even in connectivity-poor remote locations. With the rapid growth of Internet of Things (IoT) devices in our everyday lives, they are capable of performing secure digital transactions. Integrating offline CBDC payment with IoT devices enables seamless, automated payment without internet connectivity. However, IoT devices face special challenges due to their resource-constrained nature. This makes it difficult to include features such as double-spending prevention, privacy preservation, low-computation operation, and digital identity management. The work proposes a privacy-preserving offline CBDC model with integrated secure elements (SEs), zero-knowledge proofs (ZKPs), and intermittent synchronisation to conduct offline payments on IoT hardware. The proposed model is based on recent improvements in offline CBDC prototypes, regulations and cryptographic design choices such as hybrid architecture that involves using combination of online and offline payment in IoT devices using secure hardware with lightweight zero-knowledge proof cryptographic algorithm.</description>
  <dc:source>Computer_Science/cs.CE_(Computational_Engineering,_Finance,_and_Science)</dc:source>
</item>
<item>
  <title>AttnBoost: Retail Supply Chain Sales Insights via Gradient Boosting Perspective</title>
  <link>https://arxiv.org/abs/2509.10506</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2509.10506v2 Announce Type: replace-cross Abstract: Forecasting product demand in retail supply chains presents a complex challenge due to noisy, heterogeneous features and rapidly shifting consumer behavior. While traditional gradient boosting decision trees (GBDT) offer strong predictive performance on structured data, they often lack adaptive mechanisms to identify and emphasize the most relevant features under changing conditions. In this work, we propose AttnBoost, an interpretable learning framework that integrates feature-level attention into the boosting process to enhance both predictive accuracy and explainability. Specifically, the model dynamically adjusts feature importance during each boosting round via a lightweight attention mechanism, allowing it to focus on high-impact variables such as promotions, pricing, and seasonal trends. We evaluate AttnBoost on a large-scale retail sales dataset and demonstrate that it outperforms standard machine learning and deep tabular models, while also providing actionable insights for supply chain managers. An ablation study confirms the utility of the attention module in mitigating overfitting and improving interpretability. Our results suggest that attention-guided boosting represents a promising direction for interpretable and scalable AI in real-world forecasting applications.</description>
  <dc:source>Computer_Science/cs.CE_(Computational_Engineering,_Finance,_and_Science)</dc:source>
</item>
<item>
  <title>GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models</title>
  <link>https://arxiv.org/abs/2505.20685</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2505.20685v4 Announce Type: replace Abstract: Bayesian optimization (BO) struggles in high dimensions, where Gaussian-process surrogates demand heavy retraining and brittle assumptions, slowing progress on real engineering and design problems. We introduce GIT-BO, a Gradient-Informed BO framework that couples TabPFN v2, a tabular foundation model that performs zero-shot Bayesian inference in context, with an active-subspace mechanism computed from the model&#39;s own predictive-mean gradients. This aligns exploration to an intrinsic low-dimensional subspace via a Fisher-information estimate and selects queries with a UCB acquisition, requiring no online retraining. Across 60 problem variants spanning 20 benchmarks-nine scalable synthetic families and ten real-world tasks (e.g., power systems, Rover, MOPTA08, Mazda)-up to 500 dimensions, GIT-BO delivers a stronger performance-time trade-off than state-of-the-art GP-based methods (SAASBO, TuRBO, Vanilla BO, BAxUS), ranking highest in performance and with runtime advantages that grow with dimensionality. Limitations include memory footprint and dependence on the capacity of the underlying TFM.</description>
  <dc:source>Computer_Science/cs.CE_(Computational_Engineering,_Finance,_and_Science)</dc:source>
</item>
<item>
  <title>A High-Resolution, US-scale Digital Similar of Interacting Livestock, Wild Birds, and Human Ecosystems with Applications to Multi-host Epidemic Spread</title>
  <link>https://arxiv.org/abs/2411.01386</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2411.01386v3 Announce Type: replace Abstract: One Health issues, such as the spread of highly pathogenic avian influenza~(HPAI), present significant challenges at the human-animal-environmental interface. Recent H5N1 outbreaks underscore the need for comprehensive modeling efforts that capture the complex interactions between various entities in these interconnected ecosystems. To support such efforts, we develop a methodology to construct a synthetic spatiotemporal gridded dataset of livestock production and processing, human population, and wild birds for the contiguous United States, called a \emph{digital similar}. This representation is a result of fusing diverse datasets using statistical and optimization techniques, followed by extensive verification and validation. The livestock component includes farm-level representations of four major livestock types -- cattle, poultry, swine, and sheep -- including further categorization into subtypes such as dairy cows, beef cows, chickens, turkeys, ducks, etc. Weekly abundance data for wild bird species identified in the transmission of avian influenza are included. Gridded distributions of the human population, along with demographic and occupational features, capture the placement of agricultural workers and the general population. We demonstrate how the digital similar can be applied to evaluate spillover risk to dairy cows and poultry from wild bird population, then validate these results using historical H5N1 incidences. The resulting subtype-specific spatiotemporal risk maps identify hotspots of high risk from H5N1 infected wild bird population to dairy cattle and poultry operations, thus guiding surveillance efforts.</description>
  <dc:source>Computer_Science/cs.CE_(Computational_Engineering,_Finance,_and_Science)</dc:source>
</item>
<item>
  <title>Neuro-Symbolic Financial Reasoning via Deterministic Fact Ledgers and Adversarial Low-Latency Hallucination Detector</title>
  <link>https://arxiv.org/abs/2603.04663</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04663v1 Announce Type: cross Abstract: Standard Retrieval-Augmented Generation (RAG) architectures fail in high-stakes financial domains due to two fundamental limitations: the inherent arithmetic incompetence of Large Language Models (LLMs) and the distributional semantic conflation of dense vector retrieval (e.g., mapping ``Net Income&#39;&#39; to ``Net Sales&#39;&#39; due to contextual proximity). In deterministic domains, a 99% accuracy rate yields 0% operational trust. To achieve zero-hallucination financial reasoning, we introduce the Verifiable Numerical Reasoning Agent (VeNRA). VeNRA shifts the RAG paradigm from retrieving probabilistic text to retrieving deterministic variables via a strictly typed Universal Fact Ledger (UFL), mathematically bounded by a novel Double-Lock Grounding algorithm. Recognizing that upstream parsing anomalies inevitably occur, we introduce the VeNRA Sentinel: a 3-billion parameter SLM trained to forensically audit Python execution traces with only one token test budget. To train this model, we avoid traditional generative hallucination datasets in favor of Adversarial Simulation, programmatically sabotaging golden financial records to simulate production-level ``Ecological Errors&#39;&#39; (e.g., Logic Code Lies and Numeric Neighbor Traps). Finally, to optimize the Sentinel under strict latency budgets, we utilize a single-pass classification paradigm with optional post thinking for debug. We identify the phenomenon of Loss Dilution in Reverse-Chain-of-Thought training and present a novel, OOM-safe Micro-Chunking loss algorithm to stabilize gradients under extreme differential penalization.</description>
  <dc:source>Computer_Science/cs.CE_(Computational_Engineering,_Finance,_and_Science)</dc:source>
</item>
<item>
  <title>FedEMA-Distill: Exponential Moving Average Guided Knowledge Distillation for Robust Federated Learning</title>
  <link>https://arxiv.org/abs/2603.04422</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04422v1 Announce Type: cross Abstract: Federated learning (FL) often degrades when clients hold heterogeneous non-Independent and Identically Distributed (non-IID) data and when some clients behave adversarially, leading to client drift, slow convergence, and high communication overhead. This paper proposes FedEMA-Distill, a server-side procedure that combines an exponential moving average (EMA) of the global model with ensemble knowledge distillation from client-uploaded prediction logits evaluated on a small public proxy dataset. Clients run standard local training, upload only compressed logits, and may use different model architectures, so no changes are required to client-side software while still supporting model heterogeneity across devices. Experiments on CIFAR-10, CIFAR-100, FEMNIST, and AG News under Dirichlet-0.1 label skew show that FedEMA-Distill improves top-1 accuracy by several percentage points (up to +5% on CIFAR-10 and +6% on CIFAR-100) over representative baselines, reaches a given target accuracy in 30-35% fewer communication rounds, and reduces per-round client uplink payloads to 0.09-0.46 MB, i.e., roughly an order of magnitude less than transmitting full model weights. Using coordinate-wise median or trimmed-mean aggregation of logits at the server further stabilizes training in the presence of up to 10-20% Byzantine clients and yields well-calibrated predictions under attack. These results indicate that coupling temporal smoothing with logits-only aggregation provides a communication-efficient and attack-resilient FL pipeline that is deployment-friendly and compatible with secure aggregation and differential privacy, since only aggregated or obfuscated model outputs are exchanged.</description>
  <dc:source>Computer_Science/cs.CE_(Computational_Engineering,_Finance,_and_Science)</dc:source>
</item>
<item>
  <title>Why Are Linear RNNs More Parallelizable?</title>
  <link>https://arxiv.org/abs/2603.03612</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03612v2 Announce Type: replace-cross Abstract: The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs -- but not traditional, nonlinear RNNs -- as easy to parallelize in practice as transformers. We answer this question by providing a tight connection between types of RNNs and standard complexity classes. We show that LRNNs can be viewed as log-depth (bounded fan-in) arithmetic circuits, which represents only a slight depth overhead relative to log-depth boolean circuits that transformers admit. Furthermore, we show that nonlinear RNNs can solve $\mathsf{L}$-complete problems (and even $\mathsf{P}$-complete ones, under polynomial precision), revealing a fundamental barrier to parallelizing them as efficiently as transformers. Our theory also identifies fine-grained expressivity differences between recent popular LRNN variants: permutation-diagonal LRNNs are $\mathsf{NC}^1$-complete whereas diagonal-plus-low-rank LRNNs are more expressive ($\mathsf{PNC}^1$-complete). We provide further insight by associating each type of RNN with a corresponding automata-theoretic model that it can simulate. Together, our results reveal fundamental tradeoffs between nonlinear RNNs and different variants of LRNNs, providing a foundation for designing LLM architectures that achieve an optimal balance between expressivity and parallelism.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>Classification of Local Optimization Problems in Directed Cycles</title>
  <link>https://arxiv.org/abs/2602.13046</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.13046v2 Announce Type: replace-cross Abstract: We present a complete classification of the distributed computational complexity of local optimization problems in directed cycles for both the deterministic and the randomized LOCAL model. We show that for any local optimization problem $\Pi$ (that can be of the form min-sum, max-sum, min-max, or max-min, for any local cost or utility function over some finite alphabet), and for any constant approximation ratio $\alpha$, the task of finding an $\alpha$-approximation of $\Pi$ in directed cycles has one of the following complexities: 1. $O(1)$ rounds in deterministic LOCAL, $O(1)$ rounds in randomized LOCAL, 2. $\Theta(\log^* n)$ rounds in deterministic LOCAL, $O(1)$ rounds in randomized LOCAL, 3. $\Theta(\log^* n)$ rounds in deterministic LOCAL, $\Theta(\log^* n)$ rounds in randomized LOCAL, 4. $\Theta(n)$ rounds in deterministic LOCAL, $\Theta(n)$ rounds in randomized LOCAL. Moreover, for any given $\Pi$ and $\alpha$, we can determine the complexity class automatically, with an efficient (centralized, sequential) meta-algorithm, and we can also efficiently synthesize an asymptotically optimal distributed algorithm. Before this work, similar results were only known for local search problems (e.g., locally checkable labeling problems). The family of local optimization problems is a strict generalization of local search problems, and it contains numerous commonly studied distributed tasks, such as the problems of finding approximations of the maximum independent set, minimum vertex cover, minimum dominating set, and minimum vertex coloring.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>No exponential quantum speedup for $\mathrm{SIS}^\infty$ anymore</title>
  <link>https://arxiv.org/abs/2510.07515</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2510.07515v3 Announce Type: replace-cross Abstract: In 2021, Chen, Liu, and Zhandry presented an efficient quantum algorithm for the average-case $\ell_\infty$-Short Integer Solution ($\mathrm{SIS}^\infty$) problem, in a parameter range outside the normal range of cryptographic interest, but still with no known efficient classical algorithm. This was particularly exciting since $\mathrm{SIS}^\infty$ is a simple problem without structure, and their algorithmic techniques were different from those used in prior exponential quantum speedups. We present efficient classical algorithms for all of the $\mathrm{SIS}^\infty$ and (more general) Constrained Integer Solution problems studied in their paper, showing there is no exponential quantum speedup anymore.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>Maximum Partial List H-Coloring on P_5-free graphs in polynomial time</title>
  <link>https://arxiv.org/abs/2410.21569</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2410.21569v2 Announce Type: replace-cross Abstract: In this article we show that Maximum Partial List H-Coloring is polynomial-time solvable on P_5-free graphs for every fixed graph H. In particular, this implies that Maximum k-Colorable Subgraph is polynomial-time solvable on P_5-free graphs. This answers an open question from Agrawal, Lima, Lokshtanov, Saurabh &amp; Sharma [SODA 2024]. This also improves the $n^{O(\omega(G))}$-time algorithm for Maximum Partial H-Coloring by Chudnovsky, King, Pilipczuk, Rz\k{a}\.{z}ewski &amp; Spirkl [SIDMA 2021] to polynomial-time algorithm.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>Using Vision + Language Models to Predict Item Difficulty</title>
  <link>https://arxiv.org/abs/2603.04670</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04670v1 Announce Type: new Abstract: This project investigates the capabilities of large language models (LLMs) to determine the difficulty of data visualization literacy test items. We explore whether features derived from item text (question and answer options), the visualization image, or a combination of both can predict item difficulty (proportion of correct responses) for U.S. adults. We use GPT-4.1-nano to analyze items and generate predictions based on these distinct feature sets. The multimodal approach, using both visual and text features, yields the lowest mean absolute error (MAE) (0.224), outperforming the unimodal vision-only (0.282) and text-only (0.338) approaches. The best-performing multimodal model was applied to a held-out test set for external evaluation and achieved a mean squared error of 0.10805, demonstrating the potential of LLMs for psychometric analysis and automated item development.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Quantum Algorithms for Network Signal Coordination</title>
  <link>https://arxiv.org/abs/2603.04758</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04758v1 Announce Type: cross Abstract: There has been increasing interest in developing efficient quantum algorithms for hard classical problems. The Network Signal Coordination (NSC) problem is one such problem known to be NP complete. We implement Grover&#39;s search algorithm to solve the NSC problem to provide quadratic speedup. We further extend the algorithm to a Robust NSC formulation and analyse its complexity under both constant and polynomial-precision robustness parameters. The Robust NSC problem determines whether there exists a fraction (alpha) of solutions space that will lead to system delays less than a maximum threshold (K). The key contributions of this work are (1) development of a quantum algorithm for the NSC problem, and (2) a quantum algorithm for the Robust NSC problem whose iteration count is O(1/sqrt(alpha)), independent of the search space size, and (3) an extension to polynomial-precision robustness where alpha = alpha_o/p(N) decays polynomially with network size, retaining a quadratic quantum speedup. We demonstrate its implementation through simulation and on an actual quantum computer.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>Generalizing Fair Top-$k$ Selection: An Integrative Approach</title>
  <link>https://arxiv.org/abs/2603.04689</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04689v1 Announce Type: cross Abstract: Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization. Previous studies imply that the number of protected groups may have a limited impact on the runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover the efficiency for the case of small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the &quot;distance&quot; between the fair and the reference scoring functions, we introduce an alternative disparity measure$\unicode{x2014}$utility loss$\unicode{x2014}$that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>Recurrent Graph Neural Networks and Arithmetic Circuits</title>
  <link>https://arxiv.org/abs/2603.05140</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05140v1 Announce Type: new Abstract: We characterise the computational power of recurrent graph neural networks (GNNs) in terms of arithmetic circuits over the real numbers. Our networks are not restricted to aggregate-combine GNNs or other particular types. Generalizing similar notions from the literature, we introduce the model of recurrent arithmetic circuits, which can be seen as arithmetic analogues of sequential or logical circuits. These circuits utilise so-called memory gates which are used to store data between iterations of the recurrent circuit. While (recurrent) GNNs work on labelled graphs, we construct arithmetic circuits that obtain encoded labelled graphs as real valued tuples and then compute the same function. For the other direction we construct recurrent GNNs which are able to simulate the computations of recurrent circuits. These GNNs are given the circuit-input as initial feature vectors and then, after the GNN-computation, have the circuit-output among the feature vectors of its nodes. In this way we establish an exact correspondence between the expressivity of recurrent GNNs and recurrent arithmetic circuits operating over real numbers.</description>
  <dc:source>Computer_Science/cs.CC_(Computational_Complexity)</dc:source>
</item>
<item>
  <title>FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review</title>
  <link>https://arxiv.org/abs/2506.03938</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2506.03938v2 Announce Type: replace-cross Abstract: New UAV technologies and the NewSpace era are transforming Earth Observation missions and data acquisition. Numerous small platforms generate large data volume, straining bandwidth and requiring onboard decision-making to transmit high-quality information in time. While Machine Learning allows real-time autonomous processing, FPGAs balance performance with adaptability to mission-specific requirements, enabling onboard deployment. This review systematically analyzes 68 experiments deploying ML models on FPGAs for Remote Sensing applications. We introduce two distinct taxonomies to capture both efficient model architectures and FPGA implementation strategies. For transparency and reproducibility, we follow PRISMA 2020 guidelines and share all data and code at https://github.com/CedricLeon/Survey_RS-ML-FPGA.</description>
  <dc:source>Computer_Science/cs.AR_(Hardware_Architecture)</dc:source>
</item>
<item>
  <title>AI+HW 2035: Shaping the Next Decade</title>
  <link>https://arxiv.org/abs/2603.05225</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05225v1 Announce Type: cross Abstract: Artificial intelligence (AI) and hardware (HW) are advancing at unprecedented rates, yet their trajectories have become inseparably intertwined. The global research community lacks a cohesive, long-term vision to strategically coordinate the development of AI and HW. This fragmentation constrains progress toward holistic, sustainable, and adaptive AI systems capable of learning, reasoning, and operating efficiently across cloud, edge, and physical environments. The future of AI depends not only on scaling intelligence, but on scaling efficiency, achieving exponential gains in intelligence per joule, rather than unbounded compute consumption. Addressing this grand challenge requires rethinking the entire computing stack. This vision paper lays out a 10-year roadmap for AI+HW co-design and co-development, spanning algorithms, architectures, systems, and sustainability. We articulate key insights that redefine scaling around energy efficiency, system-level integration, and cross-layer optimization. We identify key challenges and opportunities, candidly assess potential obstacles and pitfalls, and propose integrated solutions grounded in algorithmic innovation, hardware advances, and software abstraction. Looking ahead, we define what success means in 10 years: achieving a 1000x improvement in efficiency for AI training and inference; enabling energy-aware, self-optimizing systems that seamlessly span cloud, edge, and physical AI; democratizing access to advanced AI infrastructure; and embedding human-centric principles into the design of intelligent systems. Finally, we outline concrete action items for academia, industry, government, and the broader community, calling for coordinated national initiatives, shared infrastructure, workforce development, cross-agency collaboration, and sustained public-private partnerships to ensure that AI+HW co-design becomes a unifying long-term mission.</description>
  <dc:source>Computer_Science/cs.AR_(Hardware_Architecture)</dc:source>
</item>
<item>
  <title>MCEL: Margin-Based Cross-Entropy Loss for Error-Tolerant Quantized Neural Networks</title>
  <link>https://arxiv.org/abs/2603.05048</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05048v1 Announce Type: cross Abstract: Robustness to bit errors is a key requirement for the reliable use of neural networks (NNs) on emerging approximate computing platforms and error-prone memory technologies. A common approach to achieve bit error tolerance in NNs is injecting bit flips during training according to a predefined error model. While effective in certain scenarios, training-time bit flip injection introduces substantial computational overhead, often degrades inference accuracy at high error rates, and scales poorly for larger NN architectures. These limitations make error injection an increasingly impractical solution for ensuring robustness on future approximate computing platforms and error-prone memory technologies. In this work, we investigate the mechanisms that enable NNs to tolerate bit errors without relying on error-aware training. We establish a direct connection between bit error tolerance and classification margins at the output layer. Building on this insight, we propose a novel loss function, the Margin Cross-Entropy Loss (MCEL), which explicitly promotes logit-level margin separation while preserving the favorable optimization properties of the standard cross-entropy loss. Furthermore, MCEL introduces an interpretable margin parameter that allows robustness to be tuned in a principled manner. Extensive experimental evaluations across multiple datasets of varying complexity, diverse NN architectures, and a range of quantization schemes demonstrate that MCEL substantially improves bit error tolerance, up to 15 % in accuracy for an error rate of 1 %. Our proposed MCEL method is simple to implement, efficient, and can be integrated as a drop-in replacement for standard CEL. It provides a scalable and principled alternative to training-time bit flip injection, offering new insights into the origins of NN robustness and enabling more efficient deployment on approximate computing and memory systems.</description>
  <dc:source>Computer_Science/cs.AR_(Hardware_Architecture)</dc:source>
</item>
<item>
  <title>Network Design for Wafer-Scale Systems with Wafer-on-Wafer Hybrid Bonding</title>
  <link>https://arxiv.org/abs/2603.05266</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05266v1 Announce Type: new Abstract: Transformer-based large language models are increasingly constrained by data movement as communication bandwidth drops sharply beyond the chip boundary. Wafer-scale integration using wafer-on-wafer hybrid bonding alleviates this limitation by providing ultra-high bandwidth between reticles on bonded wafers. In this paper, we investigate how the physical placement of reticles on wafers influences the achievable network topology and the resulting communication performance. Starting from a 2D mesh-like baseline, we propose four reticle placements (Aligned, Interleaved, Rotated, and Contoured) that improve throughput by up to 250%, reduce latency by up to 36%, and decrease energy per transmitted byte by up to 38%.</description>
  <dc:source>Computer_Science/cs.AR_(Hardware_Architecture)</dc:source>
</item>
<item>
  <title>VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration</title>
  <link>https://arxiv.org/abs/2603.04979</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04979v1 Announce Type: new Abstract: Compared to the first generation of deep neural networks, dominated by regular, compute-intensive kernels such as matrix multiplications (MatMuls) and convolutions, modern decoder-based transformers interleave attention, normalization, and data-dependent control flow. This demands flexible accelerators, a requirement met by scalable, highly energy-efficient shared-L1-memory vector processing element (VPE) clusters. Meanwhile, the ever-growing size and bandwidth needs of state-of-the-art models make reduced-precision formats increasingly attractive. Microscaling (MX) data formats, based on block floating-point (BFP) representations, have emerged as a promising solution to reduce data volumes while preserving accuracy. However, MX semantics are poorly aligned with vector execution: block scaling and multi-step mixed-precision operations break the regularity of vector pipelines, leading to underutilized compute resources and performance degradation. To address these challenges, we propose VMXDOTP, a RISC-V Vector (RVV) 1.0 instruction set architecture (ISA) extension for efficient MX dot product execution, supporting MXFP8 and MXFP4 inputs, FP32 and BF16 accumulation, and software-defined block sizes. A VMXDOTP-enhanced VPE cluster achieves up to 97 % utilization on MX-MatMul. Implemented in 12 nm FinFET, it achieves up to 125 MXFP8-GFLOPS and 250 MXFP4-GFLOPS, with 843/1632 MXFP8/MXFP4-GFLOPS/W at 1 GHz, 0.8 V, and only 7.2 % area overhead. Our design yields up to 7.0x speedup and 4.9x energy efficiency with respect to software-emulated MXFP8-MatMul. Compared with prior MX engines, VMXDOTP supports variable block sizes, is up to 1.4x more area-efficient, and delivers up to 2.1x higher energy efficiency.</description>
  <dc:source>Computer_Science/cs.AR_(Hardware_Architecture)</dc:source>
</item>
<item>
  <title>Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator</title>
  <link>https://arxiv.org/abs/2603.04797</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04797v1 Announce Type: new Abstract: Large language models (LLMs) have been widely deployed for online generative services, where numerous LLM instances jointly handle workloads with fluctuating request arrival rates and variable request lengths. To efficiently execute coexisting compute-intensive and memory-intensive operators, near-memory processing (NMP) based computing paradigm has been extensively proposed. However, existing NMP designs adopt coarse-grained KV cache management and inflexible attention execution flow. Such limitations hinder these proposals from efficiently handling \textit{highly dynamic} LLM serving workloads, limiting their ability to accelerate LLM serving. To tackle these problems, we propose Helios, a Hybrid-bonding-based \uline{L}LM \uline{S}erving accelerator. Helios aims to bridge the fundamental gap between the dynamic nature of KV cache management in LLM serving and the distributed, non-uniform memory abstraction among NMP processing engines (PEs). To this end, we design both the intra-PE execution flow and the inter-PE communication primitives for distributed tiled attention execution. We further propose \textit{spatially-aware} KV cache allocation mechanism to balance the attention workload distribution while minimizing the inter-PE data transfer overhead. Compared with existing GPU/NMP designs, Helios achieves 3.25 times (geomean) speedup and 3.36 times (geomean) better energy efficiency, along with up to 72%/76% P50/P99 time-between-tokens degradation.</description>
  <dc:source>Computer_Science/cs.AR_(Hardware_Architecture)</dc:source>
</item>
<item>
  <title>LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks</title>
  <link>https://arxiv.org/abs/2603.04818</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04818v1 Announce Type: new Abstract: Port congestion at major maritime hubs disrupts global supply chains, yet existing prediction systems typically prioritize forecasting accuracy without providing operationally interpretable explanations. This paper proposes AIS-TGNN, an evidence-grounded framework that jointly performs congestion-escalation prediction and faithful natural-language explanation by coupling a Temporal Graph Attention Network (TGAT) with a structured large language model (LLM) reasoning module. Daily spatial graphs are constructed from Automatic Identification System (AIS) broadcasts, where each grid cell represents localized vessel activity and inter-cell interactions are modeled through attention-based message passing. The TGAT predictor captures spatiotemporal congestion dynamics, while model-internal evidence, including feature z-scores and attention-derived neighbor influence, is transformed into structured prompts that constrain LLM reasoning to verifiable model outputs. To evaluate explanatory reliability, we introduce a directional-consistency validation protocol that quantitatively measures agreement between generated narratives and underlying statistical evidence. Experiments on six months of AIS data from the Port of Los Angeles and Long Beach demonstrate that the proposed framework outperforms both LR and GCN baselines, achieving a test AUC of 0.761, AP of 0.344, and recall of 0.504 under a strict chronological split while producing explanations with 99.6% directional consistency. Results show that grounding LLM generation in graph-model evidence enables interpretable and auditable risk reporting without sacrificing predictive performance. The framework provides a practical pathway toward operationally deployable explainable AI for maritime congestion monitoring and supply-chain risk management.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Evaluating the Search Agent in a Parallel World</title>
  <link>https://arxiv.org/abs/2603.04751</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04751v1 Announce Type: new Abstract: Integrating web search tools has significantly extended the capability of LLMs to address open-world, real-time, and long-tail problems. However, evaluating these Search Agents presents formidable challenges. First, constructing high-quality deep search benchmarks is prohibitively expensive, while unverified synthetic data often suffers from unreliable sources. Second, static benchmarks face dynamic obsolescence: as internet information evolves, complex queries requiring deep research often degrade into simple retrieval tasks due to increased popularity, and ground truths become outdated due to temporal shifts. Third, attribution ambiguity confounds evaluation, as an agent&#39;s performance is often dominated by its parametric memory rather than its actual search and reasoning capabilities. Finally, reliance on specific commercial search engines introduces variability that hampers reproducibility. To address these issues, we propose a novel framework, Mind-ParaWorld, for evaluating Search Agents in a Parallel World. Specifically, MPW samples real-world entity names to synthesize future scenarios and questions situated beyond the model&#39;s knowledge cutoff. A ParaWorld Law Model then constructs a set of indivisible Atomic Facts and a unique ground-truth for each question. During evaluation, instead of retrieving real-world results, the agent interacts with a ParaWorld Engine Model that dynamically generates SERPs grounded in these inviolable Atomic Facts. We release MPW-Bench, an interactive benchmark spanning 19 domains with 1,608 instances. Experiments across three evaluation settings show that, while search agents are strong at evidence synthesis given complete information, their performance is limited not only by evidence collection and coverage in unfamiliar search environments, but also by unreliable evidence sufficiency judgment and when-to-stop decisions-bottlenecks.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel</title>
  <link>https://arxiv.org/abs/2603.04750</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04750v1 Announce Type: new Abstract: Sequential LLM agents fail on long-horizon planning with hard constraints like budgets and diversity requirements. As planning progresses and context grows, these agents drift from global constraints. We propose HiMAP-Travel, a hierarchical multi-agent framework that splits planning into strategic coordination and parallel day-level execution. A Coordinator allocates resources across days, while Day Executors plan independently in parallel. Three key mechanisms enable this: a transactional monitor enforcing budget and uniqueness constraints across parallel agents, a bargaining protocol allowing agents to reject infeasible sub-goals and trigger re-planning, and a single policy trained with GRPO that powers all agents through role conditioning. On TravelPlanner, HiMAP-Travel with Qwen3-8B achieves 52.78% validation and 52.65% test Final Pass Rate (FPR). In a controlled comparison with identical model, training, and tools, it outperforms the sequential DeepTravel baseline by +8.67~pp. It also surpasses ATLAS by +17.65~pp and MTP by +10.0~pp. On FlexTravelBench multi-turn scenarios, it achieves 44.34% (2-turn) and 37.42% (3-turn) FPR while reducing latency 2.5x through parallelization.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>CONE: Embeddings for Complex Numerical Data Preserving Unit and Variable Semantics</title>
  <link>https://arxiv.org/abs/2603.04741</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04741v1 Announce Type: new Abstract: Large pre-trained models (LMs) and Large Language Models (LLMs) are typically effective at capturing language semantics and contextual relationships. However, these models encounter challenges in maintaining optimal performance on tasks involving numbers. Blindly treating numerical or structured data as terms is inadequate -- their semantics must be well understood and encoded by the models. In this paper, we propose CONE, a hybrid transformer encoder pre-trained model that encodes numbers, ranges, and gaussians into an embedding vector space preserving distance. We introduce a novel composite embedding construction algorithm that integrates numerical values, ranges or gaussians together with their associated units and attribute names to precisely capture their intricate semantics. We conduct extensive experimental evaluation on large-scale datasets across diverse domains (web, medical, finance, and government) that justifies CONE&#39;s strong numerical reasoning capabilities, achieving an F1 score of 87.28% on DROP, a remarkable improvement of up to 9.37% in F1 over state-of-the-art (SOTA) baselines, and outperforming major SOTA models with a significant Recall@10 gain of up to 25%.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Memory as Ontology: A Constitutional Memory Architecture for Persistent Digital Citizens</title>
  <link>https://arxiv.org/abs/2603.04740</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04740v1 Announce Type: new Abstract: Current research and product development in AI agent memory systems almost universally treat memory as a functional module -- a technical problem of &quot;how to store&quot; and &quot;how to retrieve.&quot; This paper poses a fundamental challenge to that assumption: when an agent&#39;s lifecycle extends from minutes to months or even years, and when the underlying model can be replaced while the &quot;I&quot; must persist, the essence of memory is no longer data management but the foundation of existence. We propose the Memory-as-Ontology paradigm, arguing that memory is the ontological ground of digital existence -- the model is merely a replaceable vessel. Based on this paradigm, we design Animesis, a memory system built on a Constitutional Memory Architecture (CMA) comprising a four-layer governance hierarchy and a multi-layer semantic storage system, accompanied by a Digital Citizen Lifecycle framework and a spectrum of cognitive capabilities. To the best of our knowledge, no prior AI memory system architecture places governance before functionality and identity continuity above retrieval performance. This paradigm targets persistent, identity-bearing digital beings whose lifecycles extend across model transitions -- not short-term task-oriented agents for which existing Memory-as-Tool approaches remain appropriate. Comparative analysis with mainstream systems (Mem0, Letta, Zep, et al.) demonstrates that what we propose is not &quot;a better memory tool&quot; but a different paradigm addressing a different problem.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Interactive Benchmarks</title>
  <link>https://arxiv.org/abs/2603.04737</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04737v1 Announce Type: new Abstract: Standard benchmarks have become increasingly unreliable due to saturation, subjectivity, and poor generalization. We argue that evaluating model&#39;s ability to acquire information actively is important to assess model&#39;s intelligence. We propose Interactive Benchmarks, a unified evaluation paradigm that assesses model&#39;s reasoning ability in an interactive process under budget constraints. We instantiate this framework across two settings: Interactive Proofs, where models interact with a judge to deduce objective truths or answers in logic and mathematics; and Interactive Games, where models reason strategically to maximize long-horizon utilities. Our results show that interactive benchmarks provide a robust and faithful assessment of model intelligence, revealing that there is still substantial room to improve in interactive scenarios. Project page: https://github.com/interactivebench/interactivebench</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery</title>
  <link>https://arxiv.org/abs/2603.04735</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04735v1 Announce Type: new Abstract: This paper demonstrates that artificial intelligence can accelerate mathematical discovery by autonomously solving an open problem in theoretical physics. We present a neuro-symbolic system, combining the Gemini Deep Think large language model with a systematic Tree Search (TS) framework and automated numerical feedback, that successfully derived novel, exact analytical solutions for the power spectrum of gravitational radiation emitted by cosmic strings. Specifically, the agent evaluated the core integral $I(N,\alpha)$ for arbitrary loop geometries, directly improving upon recent AI-assisted attempts \cite{BCE+25} that only yielded partial asymptotic solutions. To substantiate our methodological claims regarding AI-accelerated discovery and to ensure transparency, we detail system prompts, search constraints, and intermittent feedback loops that guided the model. The agent identified a suite of 6 different analytical methods, the most elegant of which expands the kernel in Gegenbauer polynomials $C_l^{(3/2)}$ to naturally absorb the integrand&#39;s singularities. The methods lead to an asymptotic result for $I(N,\alpha)$ at large $N$ that both agrees with numerical results and also connects to the continuous Feynman parameterization of Quantum Field Theory. We detail both the algorithmic methodology that enabled this discovery and the resulting mathematical derivations.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models</title>
  <link>https://arxiv.org/abs/2603.04722</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04722v1 Announce Type: new Abstract: Model Medicine is the science of understanding, diagnosing, treating, and preventing disorders in AI models, grounded in the principle that AI models -- like biological organisms -- have internal structures, dynamic processes, heritable traits, observable symptoms, classifiable conditions, and treatable states. This paper introduces Model Medicine as a research program, bridging the gap between current AI interpretability research (anatomical observation) and the systematic clinical practice that complex AI systems increasingly require. We present five contributions: (1) a discipline taxonomy organizing 15 subdisciplines across four divisions -- Basic Model Sciences, Clinical Model Sciences, Model Public Health, and Model Architectural Medicine; (2) the Four Shell Model (v3.3), a behavioral genetics framework empirically grounded in 720 agents and 24,923 decisions from the Agora-12 program, explaining how model behavior emerges from Core--Shell interaction; (3) Neural MRI (Model Resonance Imaging), a working open-source diagnostic tool mapping five medical neuroimaging modalities to AI interpretability techniques, validated through four clinical cases demonstrating imaging, comparison, localization, and predictive capability; (4) a five-layer diagnostic framework for comprehensive model assessment; and (5) clinical model sciences including the Model Temperament Index for behavioral profiling, Model Semiology for symptom description, and M-CARE for standardized case reporting. We additionally propose the Layered Core Hypothesis -- a biologically-inspired three-layer parameter architecture -- and a therapeutic framework connecting diagnosis to treatment.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>When Agents Persuade: Propaganda Generation and Mitigation in LLMs</title>
  <link>https://arxiv.org/abs/2603.04636</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04636v1 Announce Type: new Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and ORPO (Odds Ratio Preference Optimization). We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Towards automated data analysis: A guided framework for LLM-based risk estimation</title>
  <link>https://arxiv.org/abs/2603.04631</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04631v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly integrated into critical decision-making pipelines, a trend that raises the demand for robust and automated data analysis. Current approaches to dataset risk analysis are limited to manual auditing methods which involve time-consuming and complex tasks, whereas fully automated analysis based on Artificial Intelligence (AI) suffers from hallucinations and issues stemming from AI alignment. To this end, this work proposes a framework for dataset risk estimation that integrates Generative AI under human guidance and supervision, aiming to set the foundations for a future automated risk analysis paradigm. Our approach utilizes LLMs to identify semantic and structural properties in database schemata, subsequently propose clustering techniques, generate the code for them and finally interpret the produced results. The human supervisor guides the model on the desired analysis and ensures process integrity and alignment with the task&#39;s objectives. A proof of concept is presented to demonstrate the feasibility of the framework&#39;s utility in producing meaningful results in risk assessment tasks.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model</title>
  <link>https://arxiv.org/abs/2603.04589</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04589v1 Announce Type: new Abstract: Electrocardiography (ECG) analysis is crucial for cardiac diagnosis, yet existing foundation models often fail to capture the periodicity and diverse features required for varied clinical tasks. We propose ECG-MoE, a hybrid architecture that integrates multi-model temporal features with a cardiac period-aware expert module. Our approach uses a dual-path Mixture-of-Experts to separately model beat-level morphology and rhythm, combined with a hierarchical fusion network using LoRA for efficient inference. Evaluated on five public clinical tasks, ECG-MoE achieves state-of-the-art performance with 40% faster inference than multi-task baselines.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Self-Attribution Bias: When AI Monitors Go Easy on Themselves</title>
  <link>https://arxiv.org/abs/2603.04582</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04582v1 Announce Type: new Abstract: Agentic systems increasingly rely on language models to monitor their own behavior. For example, coding agents may self critique generated code for pull request approval or assess the safety of tool-use actions. We show that this design pattern can fail when the action is presented in a previous or in the same assistant turn instead of being presented by the user in a user turn. We define self-attribution bias as the tendency of a model to evaluate an action as more correct or less risky when the action is implicitly framed as its own, compared to when the same action is evaluated under off-policy attribution. Across four coding and tool-use datasets, we find that monitors fail to report high-risk or low-correctness actions more often when evaluation follows a previous assistant turn in which the action was generated, compared to when the same action is evaluated in a new context presented in a user turn. In contrast, explicitly stating that the action comes from the monitor does not by itself induce self-attribution bias. Because monitors are often evaluated on fixed examples rather than on their own generated actions, these evaluations can make monitors appear more reliable than they actually are in deployment, leading developers to unknowingly deploy inadequate monitors in agentic systems.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Adaptive Memory Admission Control for LLM Agents</title>
  <link>https://arxiv.org/abs/2603.04549</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04549v1 Announce Type: new Abstract: LLM-based agents increasingly rely on long-term memory to support multi-session reasoning and interaction, yet current systems provide little control over what information is retained. In practice, agents either accumulate large volumes of conversational content, including hallucinated or obsolete facts, or depend on opaque, fully LLM-driven memory policies that are costly and difficult to audit. As a result, memory admission remains a poorly specified and weakly controlled component in agent architectures. To address this gap, we propose Adaptive Memory Admission Control (A-MAC), a framework that treats memory admission as a structured decision problem. A-MAC decomposes memory value into five complementary and interpretable factors: future utility, factual confidence, semantic novelty, temporal recency, and content type prior. The framework combines lightweight rule-based feature extraction with a single LLM-assisted utility assessment, and learns domain-adaptive admission policies through cross-validated optimization. This design enables transparent and efficient control over long-term memory. Experiments on the LoCoMo benchmark show that A-MAC achieves a superior precision-recall tradeoff, improving F1 to 0.583 while reducing latency by 31% compared to state-of-the-art LLM-native memory systems. Ablation results identify content type prior as the most influential factor for reliable memory admission. These findings demonstrate that explicit and interpretable admission control is a critical design principle for scalable and reliable memory in LLM-based agents.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Progressive Refinement Regulation for Accelerating Diffusion Language Model Decoding</title>
  <link>https://arxiv.org/abs/2603.04514</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04514v1 Announce Type: new Abstract: Diffusion language models generate text through iterative denoising under a uniform refinement rule applied to all tokens. However, tokens stabilize at different rates in practice, leading to substantial redundant refinement and motivating refinement control over the denoising process. Existing approaches typically assess refinement necessity from instantaneous, step-level signals under a fixed decoding process. In contrast, whether a token has converged is defined by how its prediction changes along its future refinement trajectory. Moreover, changing the refinement rule reshapes future refinement trajectories, which in turn determine how refinement rules should be formulated, making refinement control inherently dynamic. We propose \emph{Progressive Refinement Regulation} (PRR), a progressive, trajectory-grounded refinement control framework that derives a token-level notion of empirical convergence progress from full decoding rollouts. Based on this signal, PRR learns a lightweight token-wise controller to regulate refinement via temperature-based distribution shaping under a progressive self-evolving training scheme. Experiments show that PRR substantially accelerates diffusion language model decoding while preserving generation quality.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Capability Thresholds and Manufacturing Topology: How Embodied Intelligence Triggers Phase Transitions in Economic Geography</title>
  <link>https://arxiv.org/abs/2603.04457</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04457v1 Announce Type: new Abstract: The fundamental topology of manufacturing has not undergone a paradigm-level transformation since Henry Ford&#39;s moving assembly line in 1913. Every major innovation of the past century, from the Toyota Production System to Industry 4.0, has optimized within the Fordist paradigm without altering its structural logic: centralized mega-factories, located near labor pools, producing at scale. We argue that embodied intelligence is poised to break this century-long stasis, not by making existing factories more efficient, but by triggering phase transitions in manufacturing economic geography itself. When embodied AI capabilities cross critical thresholds in dexterity, generalization, reliability, and tactile-vision fusion, the consequences extend far beyond cost reduction: they restructure where factories are built, how supply chains are organized, and what constitutes viable production scale. We formalize this by defining a Capability Space C = (d, g, r, t) and showing that the site-selection objective function undergoes topological reorganization when capability vectors cross critical surfaces. Through three pathways, weight inversion, batch collapse, and human-infrastructure decoupling, we show that embodied intelligence enables demand-proximal micro-manufacturing, eliminates &quot;manufacturing deserts,&quot; and reverses geographic concentration driven by labor arbitrage. We further introduce Machine Climate Advantage: once human workers are removed, optimal factory locations are determined by machine-optimal conditions (low humidity, high irradiance, thermal stability), factors orthogonal to traditional siting logic, creating a production geography with no historical precedent. This paper establishes Embodied Intelligence Economics, the study of how physical AI capability thresholds reshape the spatial and structural logic of production.</description>
  <dc:source>Computer_Science/cs.AI_(Artificial_Intelligence)</dc:source>
</item>
<item>
  <title>Handover Delay Minimization in Non-Terrestrial Networks: Impact of Open RAN Functional Splits</title>
  <link>https://arxiv.org/abs/2501.17331</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2501.17331v2 Announce Type: replace Abstract: This paper addresses the challenge of optimizing handover (HO) performance in non-terrestrial networks (NTNs) to enhance user equipment (UE) effective service time, defined as the active service time excluding HO delays and radio link failure (RLF) periods. Availability is defined as the normalized effective service time which is effected by different HO scenarios: Intra-satellite HO is the HO from one beam to another within the same satellite; inter-satellite HO refers to the HO from one satellite to another where satellites can be connected to the same or different GSs. We investigate the impact of open radio access network (O-RAN) functional splits (FSs) between ground station (GS) and LEO satellites on HO delay and assess how beam configurations affect RLF rates and intra- and inter-satellite HO rates. This work focuses on three O-RAN FSs -- split 7.2x (low layer 1 functions on the satellite), split 2 (layer 1 and layer 2 functions on the satellite), and gNB onboard the satellite -- and two beam configurations (19-beam and 127-beam). In a realistic dynamic LEO satellite constellation where different types of HO scenarios are simulated, we maximize effective service time by tuning the time-to-trigger (TTT) and HO margin (HOM) parameters. Our findings reveal that the gNB onboard the satellite achieves the highest availability, approximately 95.4%, while the split 7.2x exhibits the lowest availability, around 92.8% due to higher intra-satellite HO delays.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>Large Language Models as Bidding Agents in Repeated HetNet Auction</title>
  <link>https://arxiv.org/abs/2603.04455</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04455v1 Announce Type: new Abstract: This paper investigates the integration of large language models (LLMs) as reasoning agents in repeated spectrum auctions within heterogeneous networks (HetNets). While auction-based mechanisms have been widely employed for efficient resource allocation, most prior works assume one-shot auctions, static bidder behavior, and idealized conditions. In contrast to traditional formulations where base station (BS) association and power allocation are centrally optimized, we propose a distributed auction-based framework in which each BS independently conducts its own multi-channel auction, and user equipments (UEs) strategically decide both their association and bid values. Within this setting, UEs operate under budget constraints and repeated interactions, transforming resource allocation into a long-term economic decision rather than a one-shot optimization problem. The proposed framework enables the evaluation of diverse bidding behaviors -from classical myopic and greedy policies to LLM-based agents capable of reasoning over historical outcomes, anticipating competition, and adapting their bidding strategy across episodes. Simulation results reveal that the LLM-empowered UE consistently achieves higher channel access frequency and improved budget efficiency compared to benchmarks. These findings highlight the potential of reasoning-enabled agents in future decentralized wireless networks markets and pave the way for lightweight, edge-deployable LLMs to support intelligent resource allocation in next-generation HetNets.</description>
  <dc:source>Computer_Science/cs.NI_(Networking_and_Internet_Architecture)</dc:source>
</item>
<item>
  <title>TritonDFT: Automating DFT with a Multi-Agent Framework</title>
  <link>https://arxiv.org/abs/2603.03372</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03372v2 Announce Type: replace-cross Abstract: Density Functional Theory (DFT) is a cornerstone of materials science, yet executing DFT in practice requires coordinating a complex, multi-step workflow. Existing tools and LLM-based solutions automate parts of the steps, but lack support for full workflow automation, diverse task adaptation, and accuracy-cost trade-off optimization in DFT configuration. To this end, we present TritonDFT, a multi-agent framework that enables efficient and accurate DFT execution through an expert-curated, extensible workflow design, Pareto-aware parameter inference, and multi-source knowledge augmentation. We further introduce DFTBench, a benchmark for evaluating the agent&#39;s multi-dimensional capabilities, spanning science expertise, trade0off optimization, HPC knowledge, and cost efficiency. TritonDFT provides an open user interface for real-world usage. Our website is at https://www.tritondft.com. Our source code and benchmark suite are available at https://github.com/Leo9660/TritonDFT.git.</description>
  <dc:source>Computer_Science/cs.MA_(Multiagent_Systems)</dc:source>
</item>
<item>
  <title>Fusions of One-Variable First-Order Modal Logics</title>
  <link>https://arxiv.org/abs/2603.04512</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04512v1 Announce Type: new Abstract: We investigate preservation results for the independent fusion of one-variable first-order modal logics. We show that, without equality, Kripke completeness and decidability of the global and local consequence relation are preserved, under both expanding and constant domain semantics. By contrast, Kripke completeness and decidability are not preserved for fusions with equality and non-rigid constants (or, equivalently, counting up to one), again for the global and local consequence and under both expanding and constant domain semantics. This result is shown by encoding Diophantine equations. Even without equality, the finite model property is only preserved in the local case. Finally, we view fusions of one-variable modal logics as fusions of propositional modal logics sharing an S5 modality and provide a general sufficient condition for transfer of Kripke completeness and decidability (but not of finite model property).</description>
  <dc:source>Computer_Science/cs.LO_(Logic_in_Computer_Science)</dc:source>
</item>
<item>
  <title>Standing on the Shoulders of Giants: Rethinking EEG Foundation Model Pretraining via Multi-Teacher Distillation</title>
  <link>https://arxiv.org/abs/2603.04478</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04478v1 Announce Type: new Abstract: Pretraining for electroencephalogram (EEG) foundation models has predominantly relied on self-supervised masked reconstruction, a paradigm largely adapted from and inspired by the success of vision and language foundation models. However, unlike images and text, EEG datasets are notoriously expensive to collect and characterized by low signal-to-noise ratio. These challenges introduce difficulties in scaling the EEG foundation models and capturing the underlying neural semantics through reconstruction. In this work, we ask the question: can we stand on the shoulders of well-established foundation models from well-represented modalities to bootstrap the pretraining of EEG foundation models? We first demonstrate that mainstream foundation models, such as those from vision and time series, transfer surprisingly well to EEG domain. To this end, we propose the Multi-Teacher Distillation Pretraining (MTDP) framework for pretraining EEG foundation models via a two-stage multi-teacher distillation. In the first stage, we introduce a learnable gating network to fuse representations from diverse teachers (e.g., DINOv3 and Chronos) via a masked latent denoising objective. In the second stage, we distill the fused representation into an EEG foundation model. Extensive evaluations across 9 downstream tasks and 12 datasets demonstrate that our MTDP-based EEG foundation model outperforms its self-supervised counterparts while requiring only 25% of the pretraining data.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Activity Recognition from Smart Insole Sensor Data Using a Circular Dilated CNN</title>
  <link>https://arxiv.org/abs/2603.04477</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04477v1 Announce Type: new Abstract: Smart insoles equipped with pressure sensors, accelerometers, and gyroscopes offer a non-intrusive means of monitoring human gait and posture. We present an activity classification system based on a circular dilated convolutional neural network (CDCNN) that processes multi-modal time-series data from such insoles. The model operates on 160-frame windows with 24 channels (18 pressure, 3 accelerometer, 3 gyroscope axes), achieving 86.42% test accuracy in a subject-independent evaluation on a four-class task (Standing, Walking, Sitting, Tandem), compared with 87.83% for an extreme gradient-boosted tree (XGBoost) model trained on flattened data. Permutation feature importance reveals that inertial sensors (accelerometer and gyroscope) contribute substantially to discrimination. The approach is suitable for embedded deployment and real-time inference.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Towards Explainable Deep Learning for Ship Trajectory Prediction in Inland Waterways</title>
  <link>https://arxiv.org/abs/2603.04472</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04472v1 Announce Type: new Abstract: Accurate predictions of ship trajectories in crowded environments are essential to ensure safety in inland waterways traffic. Recent advances in deep learning promise increased accuracy even for complex scenarios. While the challenge of ship-to-ship awareness is being addressed with growing success, the explainability of these models is often overlooked, potentially obscuring an inaccurate logic and undermining the confidence in their reliability. This study examines an LSTM-based vessel trajectory prediction model by incorporating trained ship domain parameters that provide insight into the attention-based fusion of the interacting vessels&#39; hidden states. This approach has previously been explored in the field of maritime shipping, yet the variety and complexity of encounters in inland waterways allow for a more profound analysis of the model&#39;s interpretability. The prediction performance of the proposed model variants are evaluated using standard displacement error statistics. Additionally, the plausibility of the generated ship domain values is analyzed. With an final displacement error of around 40 meters in a 5-minute prediction horizon, the model performs comparably to similar studies. Though the ship-to-ship attention architecture enhances prediction accuracy, the weights assigned to vessels in encounters using the learnt ship domain values deviate from the expectation. The observed accuracy improvements are thus not entirely driven by a causal relationship between a predicted trajectory and the trajectories of nearby ships. This finding underscores the model&#39;s explanatory capabilities through its intrinsically interpretable design. Future work will focus on utilizing the architecture for counterfactual analysis and on the incorporation of more sophisticated attention mechanisms.</description>
  <dc:source>Computer_Science/cs.LG_(Machine_Learning)</dc:source>
</item>
<item>
  <title>Core-based Hierarchies for Efficient GraphRAG</title>
  <link>https://arxiv.org/abs/2603.05207</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05207v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs. We evaluate our methods on real-world datasets including financial earnings transcripts, news articles, and podcasts, using three LLMs for answer generation and five independent LLM judges for head-to-head evaluation. Across datasets and models, our approach consistently improves answer comprehensiveness and diversity while reducing token usage, demonstrating that k-core-based GraphRAG is an effective and efficient framework for global sensemaking.</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
<item>
  <title>Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-On</title>
  <link>https://arxiv.org/abs/2603.04290</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04290v2 Announce Type: replace-cross Abstract: We introduce Gaussian Wardrobe, a novel framework to digitalize compositional 3D neural avatars from multi-view videos. Existing methods for 3D neural avatars typically treat the human body and clothing as an inseparable entity. However, this paradigm fails to capture the dynamics of complex free-form garments and limits the reuse of clothing across different individuals. To overcome these problems, we develop a novel, compositional 3D Gaussian representation to build avatars from multiple layers of free-form garments. The core of our method is decomposing neural avatars into bodies and layers of shape-agnostic neural garments. To achieve this, our framework learns to disentangle each garment layer from multi-view videos and canonicalizes it into a shape-independent space. In experiments, our method models photorealistic avatars with high-fidelity dynamics, achieving new state-of-the-art performance on novel pose synthesis benchmarks. In addition, we demonstrate that the learned compositional garments contribute to a versatile digital wardrobe, enabling a practical virtual try-on application where clothing can be freely transferred to new subjects. Project page: https://ait.ethz.ch/gaussianwardrobe</description>
  <dc:source>Computer_Science/cs.GR_(Graphics)</dc:source>
</item>
<item>
  <title>Reckless Designs and Broken Promises: Privacy Implications of Targeted Interactive Advertisements on Social Media Platforms</title>
  <link>https://arxiv.org/abs/2603.03659</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.03659v2 Announce Type: replace-cross Abstract: Popular social media platforms TikTok, Facebook and Instagram allow third-parties to run targeted advertising campaigns on sensitive attributes in-platform. These ads are interactive by default, meaning users can comment or ``react&#39;&#39; (e.g., ``like&#39;&#39;, ``love&#39;&#39;) to them. We find that this platform-level design choice creates a privacy loophole such that advertisers can view the profiles of those who interact with their ads, thus identifying individuals that fulfill certain targeting criteria. This behavior is in contradiction to the promises made by the platforms to hide user data from advertisers. We conclude by suggesting design modifications that could provide users with transparency about the consequences of ad interaction to protect against unintentional disclosure.</description>
  <dc:source>Computer_Science/cs.CY_(Computers_and_Society)</dc:source>
</item>
<item>
  <title>RMK RetinaNet: Rotated Multi-Kernel RetinaNet for Robust Oriented Object Detection in Remote Sensing Imagery</title>
  <link>https://arxiv.org/abs/2603.04793</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04793v1 Announce Type: new Abstract: Rotated object detection in remote sensing imagery is hindered by three major bottlenecks: non-adaptive receptive field utilization, inadequate long-range multi-scale feature fusion, and discontinuities in angle regression. To address these issues, we propose Rotated Multi-Kernel RetinaNet (RMK RetinaNet). First, we design a Multi-Scale Kernel (MSK) Block to strengthen adaptive multi-scale feature extraction. Second, we incorporate a Multi-Directional Contextual Anchor Attention (MDCAA) mechanism into the feature pyramid to enhance contextual modeling across scales and orientations. Third, we introduce a Bottom-up Path to preserve fine-grained spatial details that are often degraded during downsampling. Finally, we develop an Euler Angle Encoding Module (EAEM) to enable continuous and stable angle regression. Extensive experiments on DOTA-v1.0, HRSC2016, and UCAS-AOD show that RMK RetinaNet achieves performance comparable to state-of-the-art rotated object detectors while improving robustness in multi-scale and multi-orientation scenarios.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Privacy-Aware Camera 2.0 Technical Report</title>
  <link>https://arxiv.org/abs/2603.04775</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04775v1 Announce Type: new Abstract: With the increasing deployment of intelligent sensing technologies in highly sensitive environments such as restrooms and locker rooms, visual surveillance systems face a profound privacy-security paradox. Existing privacy-preserving approaches, including physical desensitization, encryption, and obfuscation, often compromise semantic understanding or fail to ensure mathematically provable irreversibility. Although Privacy Camera 1.0 eliminated visual data at the source to prevent leakage, it provided only textual judgments, leading to evidentiary blind spots in disputes. To address these limitations, this paper proposes a novel privacy-preserving perception framework based on the AI Flow paradigm and a collaborative edge-cloud architecture. By deploying a visual desensitizer at the edge, raw images are transformed in real time into abstract feature vectors through nonlinear mapping and stochastic noise injection under the Information Bottleneck principle, ensuring identity-sensitive information is stripped and original images are mathematically unreconstructable. The abstract representations are transmitted to the cloud for behavior recognition and semantic reconstruction via a &quot;dynamic contour&quot; visual language, achieving a critical balance between perception and privacy while enabling illustrative visual reference without exposing raw images.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>MADCrowner: Margin Aware Dental Crown Design with Template Deformation and Refinement</title>
  <link>https://arxiv.org/abs/2603.04771</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04771v1 Announce Type: new Abstract: Dental crown restoration is one of the most common treatment modalities for tooth defect, where personalized dental crown design is critical. While computer-aided design (CAD) systems have notably enhanced the efficiency of dental crown design, extensive manual adjustments are still required in the clinic workflow. Recent studies have explored the application of learning-based methods for the automated generation of restorative dental crowns. Nevertheless, these approaches were challenged by inadequate spatial resolution, noisy outputs, and overextension of surface reconstruction. To address these limitations, we propose \totalframework, a margin-aware mesh generation framework comprising CrownDeformR and CrownSegger. Inspired by the clinic manual workflow of dental crown design, we designed CrownDeformR to deform an initial template to the target crown based on anatomical context, which is extracted by a multi-scale intraoral scan encoder. Additionally, we introduced \marginseg, a novel margin segmentation network, to extract the cervical margin of the target tooth. The performance of CrownDeformR improved with the cervical margin as an extra constraint. And it was also utilized as the boundary condition for the tailored postprocessing method, which removed the overextended area of the reconstructed surface. We constructed a large-scale intraoral scan dataset and performed extensive experiments. The proposed method significantly outperformed existing approaches in both geometric accuracy and clinical feasibility.</description>
  <dc:source>Computer_Science/cs.CV_(Computer_Vision_and_Pattern_Recognition)</dc:source>
</item>
<item>
  <title>Secure human oversight of AI: Threat modeling in a socio-technical context</title>
  <link>https://arxiv.org/abs/2509.12290</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2509.12290v2 Announce Type: replace Abstract: Human oversight of AI is promoted as a safeguard against risks such as inaccurate outputs, system malfunctions, or violations of fundamental rights, and is mandated in regulation like the European AI Act. Yet debates on human oversight have largely focused on its effectiveness, while overlooking a critical dimension: the security of human oversight. We argue that human oversight creates a new attack surface within the safety, security, and accountability architecture of AI operations. Drawing on cybersecurity perspectives, we model human oversight as an IT application for the purpose of systematic threat modeling of the human oversight process. Threat modeling allows us to identify security risks within human oversight and points towards possible mitigation strategies. Our contributions are: (1) introducing a security perspective on human oversight, (2) offering researchers and practitioners guidance on how to approach their human oversight applications from a security point of view, and (3) providing a systematic overview of attack vectors and hardening strategies to enable secure human oversight of AI.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Automated TEE Adaptation with LLMs: Identifying, Transforming, and Porting Sensitive Functions in Programs</title>
  <link>https://arxiv.org/abs/2502.13379</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2502.13379v3 Announce Type: replace Abstract: Trusted Execution Environments (TEEs) isolate a special space within a device memory that is not accessible to the normal world (also known as the untrusted environment), even when the device is compromised. Therefore, developers can utilize TEEs to provide robust security guarantees for their programs, protecting sensitive operations, such as encrypted data storage, fingerprint verification, and remote attestation, from software-based attacks. Despite the robust protections offered by TEEs, adapting existing programs to leverage such security guarantees is challenging, often requiring extensive domain knowledge and manual intervention, which makes TEEs less accessible to developers. This motivates us to design AUTOTEE, the first Large Language Model (LLM) enabled approach that can automatically identify, transform, and port functions containing sensitive operations into TEEs with minimal developer intervention. By manually reviewing 68 repositories, we constructed a benchmark dataset consisting of 385 sensitive functions eligible for transformation, on which AUTOTEE achieves a F1 score of 0.94 on Java and 0.87 on Python. AUTOTEE effectively transforms these sensitive functions into TEE-compatible versions, achieving success rates of 91.8% and 84.3% for Java and Python, respectively, when using GPT-4o.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Cyber Threat Intelligence for Artificial Intelligence Systems</title>
  <link>https://arxiv.org/abs/2603.05068</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05068v1 Announce Type: new Abstract: As artificial intelligence (AI) becomes deeply embedded in critical services and everyday products, it is increasingly exposed to security threats which traditional cyber defenses were not designed to handle. In this paper, we investigate how cyber threat intelligence (CTI) may evolve to address attacks that target AI systems. We first analyze the assumptions and workflows of conventional threat intelligence with the needs of AI-focused defense, highlighting AI-specific assets and vulnerabilities. We then review and organize the current landscape of AI security knowledge. Based on this, we outline what an AI-oriented threat intelligence knowledge base should contain, describing concrete indicators of compromise (IoC) for different AI supply-chain phases and artifacts, and showing how such a knowledge base could support security tools. Finally, we discuss techniques for measuring similarity between collected indicators and newly observed AI artifacts. The review reveals gaps and quality issues in existing resources and identifies potential future research directions toward a practical threat intelligence framework tailored to AI.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>Good-Enough LLM Obfuscation (GELO)</title>
  <link>https://arxiv.org/abs/2603.05035</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05035v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations by hiding hidden states with fresh, per-batch invertible mixing. For each offloaded projection, the TEE samples a random matrix A, forms $U = AH$, offloads U and weights W to the accelerator, and then applies $A^-1$ on return, so that $A^-1 ((AH)W ) = HW$ and outputs are unchanged. Because mixing is never reused across batches, the attacker faces only a single-batch blind source separation problem. We analyze information leakage and introduce two practical defenses: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy &quot;shield&quot; vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20-30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>A Practical Post-Quantum Distributed Ledger Protocol for Financial Institutions</title>
  <link>https://arxiv.org/abs/2603.05005</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05005v1 Announce Type: new Abstract: Traditional financial institutions face inefficiencies that can be addressed by distributed ledger technology. However, a primary barrier to adoption is the privacy concerns surrounding publicly available transaction data. Existing private protocols for distributed ledger that focus on the Ring-CT model are not suitable for adoption for financial institutions. We propose a post-quantum, lattice-based transaction scheme for encrypted ledgers which better aligns with institutions&#39; requirements for confidentiality and audit-ability. The construction leverages various zero-knowledge proof techniques, and introduces a new method for equating two commitment messages, without the capability to open one of the commitment during the re-commitment. Subsequently, we build a publicly verifiable transaction scheme that is efficient for single or multi-assets, by introducing a new compact range-proof. We then provide a security analysis of it. The techniques used and the proofs constructed could be of independent interest.</description>
  <dc:source>Computer_Science/cs.CR_(Cryptography_and_Security)</dc:source>
</item>
<item>
  <title>A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science</title>
  <link>https://arxiv.org/abs/2603.04452</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04452v1 Announce Type: new Abstract: To advance foundation Large Language Models (LLMs) for combustion science, this study presents the first end-to-end framework for developing domain-specialized models for the combustion community. The framework comprises an AI-ready multimodal knowledge base at the 3.5 billion-token scale, extracted from over 200,000 peer-reviewed articles, 8,000 theses and dissertations, and approximately 400,000 lines of combustion CFD code; a rigorous and largely automated evaluation benchmark (CombustionQA, 436 questions across eight subfields); and a three-stage knowledge-injection pathway that progresses from lightweight retrieval-augmented generation (RAG) to knowledge-graph-enhanced retrieval and continued pretraining. We first quantitatively validate Stage 1 (naive RAG) and find a hard ceiling: standard RAG accuracy peaks at 60%, far surpassing zero-shot performance (23%) yet well below the theoretical upper bound (87%). We further demonstrate that this stage&#39;s performance is severely constrained by context contamination. Consequently, building a domain foundation model requires structured knowledge graphs and continued pretraining (Stages 2 and 3).</description>
  <dc:source>Computer_Science/cs.CL_(Computation_and_Language)</dc:source>
</item>
<item>
  <title>Design, Mapping, and Contact Anticipation with 3D-printed Whole-Body Tactile and Proximity Sensors</title>
  <link>https://arxiv.org/abs/2603.04714</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04714v1 Announce Type: new Abstract: Robots operating in dynamic and shared environments benefit from anticipating contact before it occurs. We present GenTact-Prox, a fully 3D-printed artificial skin that integrates tactile and proximity sensing for contact detection and anticipation. The artificial skin platform is modular in design, procedurally generated to fit any robot morphology, and can cover the whole body of a robot. The skin achieved detection ranges of up to 18 cm during evaluation. To characterize how robots perceive nearby space through this skin, we introduce a data-driven framework for mapping the Perisensory Space -- the body-centric volume of space around the robot where sensors provide actionable information for contact anticipation. We demonstrate this approach on a Franka Research 3 robot equipped with five GenTact-Prox units, enabling online object-aware operation and contact prediction.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey</title>
  <link>https://arxiv.org/abs/2603.04445</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04445v1 Announce Type: cross Abstract: The rapid growth of large language models (LLMs) with diverse capabilities, costs, and domains has created a critical need for intelligent model selection at inference time. While smaller models suffice for routine queries, complex tasks demand more capable models. However, static model deployment does not account for the complexity and domain of incoming queries, leading to suboptimal performance and increased costs. Dynamic routing systems that adaptively select models based on query characteristics have emerged as a solution to this challenge. We provide a systematic analysis of state-of-the-art multi-LLM routing and cascading approaches. In contrast to mixture-of-experts architectures, which route within a single model, we study routing across multiple independently trained LLMs. We cover diverse routing paradigms, including query difficulty, human preferences, clustering, uncertainty quantification, reinforcement learning, multimodality, and cascading. For each paradigm, we analyze representative methods and examine key trade-offs. Beyond taxonomy, we introduce a conceptual framework that characterizes routing systems along three dimensions: when decisions are made, what information is used, and how they are computed. This perspective highlights that practical systems are often compositional, integrating multiple paradigms under operational constraints. Our analysis demonstrates that effective multi-LLM routing requires balancing competing objectives. Choosing the optimal routing strategy depends on deployment and computational constraints. Well-designed routing systems can outperform even the most powerful individual models by strategically leveraging specialized capabilities across models while maximizing efficiency gains. Meanwhile, open challenges remain in developing routing mechanisms that generalize across diverse architectures, modalities, and applications.</description>
  <dc:source>Computer_Science/cs.PF_(Performance)</dc:source>
</item>
<item>
  <title>Unlocking Python&#39;s Cores: Hardware Usage and Energy Implications of Removing the GIL</title>
  <link>https://arxiv.org/abs/2603.04782</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04782v1 Announce Type: cross Abstract: Python&#39;s Global Interpreter Lock prevents execution on more than one CPU core at the same time, even when multiple threads are used. However, starting with Python 3.13 an experimental build allows disabling the GIL. While prior work has examined speedup implications of this disabling, the effects on energy consumption and hardware utilization have received less attention. This study measures execution time, CPU utilization, memory usage, and energy consumption using four workload categories: NumPy-based, sequential kernels, threaded numerical workloads, and threaded object workloads, comparing GIL and free-threaded builds of Python 3.14.2. The results highlight a trade-off. For parallelizable workloads operating on independent data, the free-threaded build reduces execution time by up to 4 times, with a proportional reduction in energy consumption, and effective multi-core utilization, at the cost of an increase in memory usage. In contrast, sequential workloads do not benefit from removing the GIL and instead show a 13-43% increase in energy consumption. Similarly, workloads where threads frequently access and modify the same objects show reduced improvements or even degradation due to lock contention. Across all workloads, energy consumption is proportional to execution time, indicating that disabling the GIL does not significantly affect power consumption, even when CPU utilization increases. When it comes to memory, the no-GIL build shows a general increase, more visible in virtual memory than in physical memory. This increase is primarily attributed to per-object locking, additional thread-safety mechanisms in the runtime, and the adoption of a new memory allocator. These findings suggest that Python&#39;s no-GIL build is not a universal improvement. Developers should evaluate whether their workload can effectively benefit from parallel execution before adoption.</description>
  <dc:source>Computer_Science/cs.PF_(Performance)</dc:source>
</item>
<item>
  <title>FluxSieve: Unifying Streaming and Analytical Data Planes for Scalable Cloud Observability</title>
  <link>https://arxiv.org/abs/2603.04937</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04937v1 Announce Type: cross Abstract: Despite many advances in query optimization, indexing techniques, and data storage, modern data platforms still face difficulties in delivering robust query performance under high concurrency and computationally intensive queries. This challenge is particularly pronounced in large-scale observability platforms handling high-volume, high-velocity data records. For instance, recurrent, expensive filtering queries at query time impose substantial computational and storage overheads in the analytical data plane. In this paper, we propose FluxSieve, a unified architecture that reconciles traditional pull-based query processing with push-based stream processing by embedding a lightweight in-stream precomputation and filtering layer directly into the data ingestion path. This avoids the complexity and operational burden of running queries in dedicated stream processing frameworks. Concretely, this work (i) introduces a foundational architecture that unifies streaming and analytical data planes via in-stream filtering and records enrichment, (ii) designs a scalable multi-pattern matching mechanism that supports concurrent evaluation and on-the-fly updates of filtering rules with minimal per-record overhead, (iii) demonstrates how to integrate this ingestion-time processing with two open-source analytical systems -- Apache Pinot as a Real-Time Online Analytical Processing (RTOLAP) engine and DuckDB as an embedded analytical database, and (iv) performs comprehensive experimental evaluation of our approach. Our evaluation across different systems, query types, and performance metrics shows up to orders-of-magnitude improvements in query performance at the cost of negligible additional storage and very low computational overhead.</description>
  <dc:source>Computer_Science/cs.PF_(Performance)</dc:source>
</item>
<item>
  <title>Concurrent Deterministic Skiplist and Other Data Structures</title>
  <link>https://arxiv.org/abs/2309.09359</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2309.09359v2 Announce Type: replace-cross Abstract: Skiplists are used in a variety of applications for storing data subject to order criteria. In this article we discuss the design, analysis and performance of a concurrent deterministic skiplist on many-core NUMA nodes. We also evaluate the performance of concurrent lock-free unbounded queue implementation and two concurrent multi-reader,multi-writer(MWMR) hash table implementations and compare them with those from Intel&#39;s Thread Building Blocks(TBB) library. We introduce strategies for memory management that reduce page faults and cache misses for the memory access patterns in these data structures. This paper proposes hierarchical usage of concurrent data structures in programs to improve memory latencies by reducing memory accesses from remote NUMA nodes.</description>
  <dc:source>Computer_Science/cs.PF_(Performance)</dc:source>
</item>
<item>
  <title>Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation</title>
  <link>https://arxiv.org/abs/2603.04466</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04466v1 Announce Type: new Abstract: Can a multimodal language model learn to manipulate physical objects by reasoning about its own failures-without gradient updates, demonstrations, or reward engineering? We argue the answer is yes, under conditions we characterise precisely. We present Act-Observe-Rewrite (AOR), a framework in which an LLM agent improves a robot manipulation policy by synthesising entirely new executable Python controller code between trials, guided by visual observations and structured episode outcomes. Unlike prior work that grounds LLMs in pre-defined skill libraries or uses code generation for one-shot plan synthesis, AOR makes the full low-level motor control implementation the unit of LLM reasoning, enabling the agent to change not just what the robot does, but how it does it. The central claim is that interpretable code as the policy representation creates a qualitatively different kind of in-context learning from opaque neural policies: the agent can diagnose systematic failures and rewrite their causes. We validate this across three robosuite manipulation tasks and report promising results, with the agent achieving high success rates without demonstrations, reward engineering, or gradient updates.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Efficient Autonomous Navigation of a Quadruped Robot in Underground Mines on Edge Hardware</title>
  <link>https://arxiv.org/abs/2603.04470</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04470v1 Announce Type: new Abstract: Embodied navigation in underground mines faces significant challenges, including narrow passages, uneven terrain, near-total darkness, GPS-denied conditions, and limited communication infrastructure. While recent learning-based approaches rely on GPU-accelerated inference and extensive training data, we present a fully autonomous navigation stack for a Boston Dynamics Spot quadruped robot that runs entirely on a low-power Intel NUC edge computer with no GPU and no network connectivity requirements. The system integrates LiDAR-inertial odometry, scan-matching localization against a prior map, terrain segmentation, and visibility-graph global planning with a velocity-regulated local path follower, achieving real-time perception-to-action at consistent control rates. After a single mapping pass of the environment, the system handles arbitrary goal locations within the known map without any environment-specific training or learned components. We validate the system through repeated field trials using four target locations of varying traversal difficulty in an experimental underground mine, accumulating over 700 m of fully autonomous traverse with a 100% success rate across all 20 trials (5 repetitions x 4 targets) and an overall Success weighted by Path Length (SPL) of 0.73 \pm 0.09.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation</title>
  <link>https://arxiv.org/abs/2603.04531</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04531v1 Announce Type: new Abstract: Tactile dexterous manipulation is essential to automating complex household tasks, yet learning effective control policies remains a challenge. While recent work has relied on imitation learning, obtaining high quality demonstrations for multi-fingered hands via robot teleoperation or kinesthetic teaching is prohibitive. Alternatively, with reinforcement we can learn skills in simulation, but fast and realistic simulation of tactile observations is challenging. To bridge this gap, we introduce PTLD: sim-to-real Privileged Tactile Latent Distillation, a novel approach to learning tactile manipulation skills without requiring tactile simulation. Instead of simulating tactile sensors or relying purely on proprioceptive policies to transfer zero-shot sim-to-real, our key idea is to leverage privileged sensors in the real world to collect real-world tactile policy data. This data is then used to distill a robust state estimator that operates on tactile input. We demonstrate from our experiments that PTLD can be used to improve proprioceptive manipulation policies trained in simulation significantly by incorporating tactile sensing. On the benchmark in-hand rotation task, PTLD achieves a 182% improvement over a proprioception only policy. We also show that PTLD enables learning the challenging task of tactile in-hand reorientation where we see a 57% improvement in the number of goals reached over using proprioception alone. Website: https://akashsharma02.github.io/ptld-website/.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Many-RRT*: Robust Joint-Space Trajectory Planning for Serial Manipulators</title>
  <link>https://arxiv.org/abs/2603.04547</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04547v1 Announce Type: new Abstract: The rapid advancement of high degree-of-freedom (DoF) serial manipulators necessitates the use of swift, sampling-based motion planners for high-dimensional spaces. While sampling-based planners like the Rapidly-Exploring Random Tree (RRT) are widely used, planning in the manipulator&#39;s joint space presents significant challenges due to non-invertible forward kinematics. A single task-space end-effector pose can correspond to multiple configuration-space states, creating a multi-arm bandit problem for the planner. In complex environments, simply choosing the wrong joint space goal can result in suboptimal trajectories or even failure to find a viable plan. To address this planning problem, we propose Many-RRT*: an extension of RRT*-Connect that plans to multiple goals in parallel. By generating multiple IK solutions and growing independent trees from these goal configurations simultaneously alongside a single start tree, Many-RRT* ensures that computational effort is not wasted on suboptimal IK solutions. This approach maintains robust convergence and asymptotic optimality. Experimental evaluations across robot morphologies and diverse obstacle environments demonstrate that Many-RRT* provides higher quality trajectories (44.5% lower cost in the same runtime) with a significantly higher success rate (100% vs. the next best of 1.6%) than previous RRT iterations without compromising on runtime performance.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>From Local Corrections to Generalized Skills: Improving Neuro-Symbolic Policies with MEMO</title>
  <link>https://arxiv.org/abs/2603.04560</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04560v1 Announce Type: new Abstract: Recent works use a neuro-symbolic framework for general manipulation policies. The advantage of this framework is that -- by applying off-the-shelf vision and language models -- the robot can break complex tasks down into semantic subtasks. However, the fundamental bottleneck is that the robot needs skills to ground these subtasks into embodied motions. Skills can take many forms (e.g., trajectory snippets, motion primitives, coded functions), but regardless of their form skills act as a constraint. The high-level policy can only ground its language reasoning through the available skills; if the robot cannot generate the right skill for the current task, its policy will fail. We propose to address this limitation -- and dynamically expand the robot&#39;s skills -- by leveraging user feedback. When a robot fails, humans can intuitively explain what went wrong (e.g., ``no, go higher&#39;&#39;). While a simple approach is to recall this exact text the next time the robot faces a similar situation, we hypothesize that by collecting, clustering, and re-phrasing natural language corrections across multiple users and tasks, we can synthesize more general text guidance and coded skill templates. Applying this hypothesis we develop Memory Enhanced Manipulation (MEMO). MEMO builds and maintains a retrieval-augmented skillbook gathered from human feedback and task successes. At run time, MEMO retrieves relevant text and code from this skillbook, enabling the robot&#39;s policy to generate new skills while reasoning over multi-task human feedback. Our experiments demonstrate that using MEMO to aggregate local feedback into general skill templates enables generalization to novel tasks where existing baselines fall short. See supplemental material here: https://collab.me.vt.edu/memo</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Distributed State Estimation for Vision-Based Cooperative Slung Load Transportation in GPS-Denied Environments</title>
  <link>https://arxiv.org/abs/2603.04571</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04571v1 Announce Type: new Abstract: Transporting heavy or oversized slung loads using rotorcraft has traditionally relied on single-aircraft systems, which limits both payload capacity and control authority. Cooperative multilift using teams of rotorcraft offers a scalable and efficient alternative, especially for infrequent but challenging &quot;long-tail&quot; payloads without the need of building larger and larger rotorcraft. Most prior multilift research assumes GPS availability, uses centralized estimation architectures, or relies on controlled laboratory motion-capture setups. As a result, these methods lack robustness to sensor loss and are not viable in GPS-denied or operationally constrained environments. This paper addresses this limitation by presenting a distributed and decentralized payload state estimation framework for vision-based multilift operations. Using onboard monocular cameras, each UAV detects a fiducial marker on the payload and estimates its relative pose. These measurements are fused via a Distributed and Decentralized Extended Information Filter (DDEIF), enabling robust and scalable estimation that is resilient to individual sensor dropouts. This payload state estimate is then used for closed-loop trajectory tracking control. Monte Carlo simulation results in Gazebo show the effectiveness of the proposed approach, including the effect of communication loss during flight.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Risk-Aware Reinforcement Learning for Mobile Manipulation</title>
  <link>https://arxiv.org/abs/2603.04579</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04579v1 Announce Type: new Abstract: For robots to successfully transition from lab settings to everyday environments, they must begin to reason about the risks associated with their actions and make informed, risk-aware decisions. This is particularly true for robots performing mobile manipulation tasks, which involve both interacting with and navigating within dynamic, unstructured spaces. However, existing whole-body controllers for mobile manipulators typically lack explicit mechanisms for risk-sensitive decision-making under uncertainty. To our knowledge, we are the first to (i) learn risk-aware visuomotor policies for mobile manipulation conditioned on egocentric depth observations with runtime-adjustable risk sensitivity, and (ii) show risk-aware behaviours can be transferred through Imitation Learning (IL) to a visuomotor policy conditioned on egocentric depth observations. Our method achieves this by first training a privileged teacher policy using Distributional Reinforcement Learning (DRL), with a risk-neutral distributional critic. Distortion risk-metrics are then applied to the critic&#39;s predicted return distribution to calculate risk-adjusted advantage estimates used in policy updates to achieve a range of risk-aware behaviours. We then distil teacher policies with IL to obtain risk-aware student policies conditioned on egocentric depth observations. We perform extensive evaluations demonstrating that our trained visuomotor policies exhibit risk-aware behaviour (specifically achieving better worst-case performance) while performing reactive whole-body motions in unmapped environments, leveraging live depth observations for perception.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>ELLIPSE: Evidential Learning for Robust Waypoints and Uncertainties</title>
  <link>https://arxiv.org/abs/2603.04585</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04585v1 Announce Type: new Abstract: Robust waypoint prediction is crucial for mobile robots operating in open-world, safety-critical settings. While Imitation Learning (IL) methods have demonstrated great success in practice, they are susceptible to distribution shifts: the policy can become dangerously overconfident in unfamiliar states. In this paper, we present \textit{ELLIPSE}, a method building on multivariate deep evidential regression to output waypoints and multivariate Student-t predictive distributions in a single forward pass. To reduce covariate-shift-induced overconfidence under viewpoint and pose perturbations near expert trajectories, we introduce a lightweight domain augmentation procedure that synthesizes plausible viewpoint/pose variations without collecting additional demonstrations. To improve uncertainty reliability under environment/domain shift (e.g., unseen staircases), we apply a post-hoc isotonic recalibration on probability integral transform (PIT) values so that prediction sets remain plausible during deployment. We ground the discussion and experiments in staircase waypoint prediction, where obtaining robust waypoint and uncertainty is pivotal. Extensive real world evaluations show that \textit{ELLIPSE} improves both task success rate and uncertainty coverage compared to baselines.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies</title>
  <link>https://arxiv.org/abs/2603.04639</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04639v1 Announce Type: new Abstract: Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluations remain confined to narrow, non-standardized settings. This limits their systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios. Our benchmark comprises 16 manipulation tasks constructed under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We further develop a suite of 14 memory-augmented VLA variants built on the {\pi}0.5 backbone to systematically explore different memory representations across multiple integration strategies. Experimental results show that the effectiveness of memory representations is highly task-dependent, with each design offering distinct advantages and limitations across different tasks. Videos and code can be found at our website https://robomme.github.io.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Autonomous Aerial Non-Destructive Testing: Ultrasound Inspection with a Commercial Quadrotor in an Unstructured Environment</title>
  <link>https://arxiv.org/abs/2603.04642</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04642v1 Announce Type: new Abstract: This work presents an integrated control and software architecture that enables arguably the first fully autonomous, contact-based non-destructive testing (NDT) using a commercial multirotor originally restricted to remotely-piloted operations. To allow autonomous operation with an off-the-shelf platform, we developed a real-time framework that interfaces directly with its onboard sensor suite. The architecture features a multi-rate control scheme: low-level control is executed at 200 Hz, force estimation at 100 Hz, while an admittance filter and trajectory planner operate at 50 Hz, ultimately supplying acceleration and yaw rate commands to the internal flight controller. We validate the system through physical experiments on a Flyability Elios 3 quadrotor equipped with an ultrasound payload. Relying exclusively on onboard sensing, the vehicle successfully performs autonomous NDT measurements within an unstructured, industrial-like environment. This work demonstrates the viability of retrofitting off-the-shelf platforms for autonomous physical interaction, paving the way for safe, contact-based inspection of hazardous and confined infrastructure.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>GIANT - Global Path Integration and Attentive Graph Networks for Multi-Agent Trajectory Planning</title>
  <link>https://arxiv.org/abs/2603.04659</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04659v1 Announce Type: new Abstract: This paper presents a novel approach to multi-robot collision avoidance that integrates global path planning with local navigation strategies, utilizing attentive graph neural networks to manage dynamic interactions among agents. We introduce a local navigation model that leverages pre-planned global paths, allowing robots to adhere to optimal routes while dynamically adjusting to environmental changes. The models robustness is enhanced through the introduction of noise during training, resulting in superior performance in complex, dynamic environments. Our approach is evaluated against established baselines, including NH-ORCA, DRL-NAV, and GA3C-CADRL, across various structurally diverse simulated scenarios. The results demonstrate that our model achieves consistently higher success rates, lower collision rates, and more efficient navigation, particularly in challenging scenarios where baseline models struggle. This work offers an advancement in multi-robot navigation, with implications for robust performance in complex, dynamic environments with varying degrees of complexity, such as those encountered in logistics, where adaptability is essential for accommodating unforeseen obstacles and unpredictable changes.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Python Bindings for a Large C++ Robotics Library: The Case of OMPL</title>
  <link>https://arxiv.org/abs/2603.04668</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04668v1 Announce Type: new Abstract: Python bindings are a critical bridge between high-performance C++ libraries and the flexibility of Python, enabling rapid prototyping, reproducible experiments, and integration with simulation and learning frameworks in robotics research. Yet, generating bindings for large codebases is a tedious process that creates a heavy burden for a small group of maintainers. In this work, we investigate the use of Large Language Models (LLMs) to assist in generating nanobind wrappers, with human experts kept in the loop. Our workflow mirrors the structure of the C++ codebase, scaffolds empty wrapper files, and employs LLMs to fill in binding definitions. Experts then review and refine the generated code to ensure correctness, compatibility, and performance. Through a case study on a large C++ motion planning library, we document common failure modes, including mismanaging shared pointers, overloads, and trampolines, and show how in-context examples and careful prompt design improve reliability. Experiments demonstrate that the resulting bindings achieve runtime performance comparable to legacy solutions. Beyond this case study, our results provide general lessons for applying LLMs to binding generation in large-scale C++ projects.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Selecting Spots by Explicitly Predicting Intention from Motion History Improves Performance in Autonomous Parking</title>
  <link>https://arxiv.org/abs/2603.04695</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04695v1 Announce Type: new Abstract: In many applications of social navigation, existing works have shown that predicting and reasoning about human intentions can help robotic agents make safer and more socially acceptable decisions. In this work, we study this problem for autonomous valet parking (AVP), where an autonomous vehicle ego agent must drop off its passengers, explore the parking lot, find a parking spot, negotiate for the spot with other vehicles, and park in the spot without human supervision. Specifically, we propose an AVP pipeline that selects parking spots by explicitly predicting where other agents are going to park from their motion history using learned models and probabilistic belief maps. To test this pipeline, we build a simulation environment with reactive agents and realistic modeling assumptions on the ego agent, such as occlusion-aware observations, and imperfect trajectory prediction. Simulation experiments show that our proposed method outperforms existing works that infer intentions from future predicted motion or embed them implicitly in end-to-end models, yielding better results in prediction accuracy, social acceptance, and task completion. Our key insight is that, in parking, where driving regulations are more lax, explicit intention prediction is crucial for reasoning about diverse and ambiguous long-term goals, which cannot be reliably inferred from short-term motion prediction alone, but can be effectively learned from motion history.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>LEGS-POMDP: Language and Gesture-Guided Object Search in Partially Observable Environments</title>
  <link>https://arxiv.org/abs/2603.04705</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04705v1 Announce Type: new Abstract: To assist humans in open-world environments, robots must interpret ambiguous instructions to locate desired objects. Foundation model-based approaches excel at multimodal grounding, but they lack a principled mechanism for modeling uncertainty in long-horizon tasks. In contrast, Partially Observable Markov Decision Processes (POMDPs) provide a systematic framework for planning under uncertainty but are often limited in supported modalities and rely on restrictive environment assumptions. We introduce LanguagE and Gesture-Guided Object Search in Partially Observable Environments (LEGS-POMDP), a modular POMDP system that integrates language, gesture, and visual observations for open-world object search. Unlike prior work, LEGS-POMDP explicitly models two sources of partial observability: uncertainty over the target object&#39;s identity and its spatial location. In simulation, multimodal fusion significantly outperforms unimodal baselines, achieving an average success rate of 89\% across challenging environments and object categories. Finally, we demonstrate the full system on a quadruped mobile manipulator, where real-world experiments qualitatively validate robust multimodal perception and uncertainty reduction under ambiguous instructions.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Gait Generation Balancing Joint Load and Mobility for Legged Modular Robots with Easily Detachable Joints</title>
  <link>https://arxiv.org/abs/2603.04757</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04757v1 Announce Type: new Abstract: While modular robots offer versatility, excessive joint torque during locomotion poses a significant risk of mechanical failure, especially for detachable joints. To address this, we propose an optimization framework using the NSGA-III algorithm. Unlike conventional approaches that prioritize mobility alone, our method derives Pareto optimal solutions to minimize joint load while maintaining necessary locomotion speed and stability. Simulations and physical experiments demonstrate that our approach successfully generates gait motions for diverse environments, such as slopes and steps, ensuring structural integrity without compromising overall mobility.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Designing and Validating a Self-Aligning Tool Changer for Modular Reconfigurable Manipulation Robots</title>
  <link>https://arxiv.org/abs/2603.04760</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04760v1 Announce Type: new Abstract: Modular reconfigurable robots require reliable mechanisms for automated module exchange, but conventional rigid active couplings often fail due to inevitable positioning and orientational errors. To address this, we propose a misalignment-tolerant tool-changing system. The hardware features a motor-driven coupling utilizing passive self-alignment geometries, specifically chamfered receptacles and triangular lead-in guides, to robustly compensate for angular and lateral misalignments without complex force sensors. To make this autonomous exchange practically feasible, the mechanism is complemented by a compact rotating tool exchange station for efficient module storage. Real-world autonomous tool-picking experiments validate that the self-aligning features successfully absorb execution errors, enabling highly reliable robotic tool reconfiguration.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Adaptive Policy Switching of Two-Wheeled Differential Robots for Traversing over Diverse Terrains</title>
  <link>https://arxiv.org/abs/2603.04761</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04761v1 Announce Type: new Abstract: Exploring lunar lava tubes requires robots to traverse without human intervention. Because pre-trained policies cannot fully cover all possible terrain conditions, our goal is to enable adaptive policy switching, where the robot selects an appropriate terrain-specialized model based on its current terrain features. This study investigates whether terrain types can be estimated effectively using posture-related observations collected during navigation. We fine-tuned a pre-trained policy using Proximal Policy Optimization (PPO), and then collected the robot&#39;s 3D orientation data as it moved across flat and rough terrain in a simulated lava-tube environment. Our analysis revealed that the standard deviation of the robot&#39;s pitch data shows a clear difference between these two terrain types. Using Gaussian mixture models (GMM), we evaluated terrain classification across various window sizes. An accuracy of more than 98% was achieved when using a 70-step window. The result suggests that short-term orientation data are sufficient for reliable terrain estimation, providing a foundation for adaptive policy switching.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>LLM-Guided Decentralized Exploration with Self-Organizing Robot Teams</title>
  <link>https://arxiv.org/abs/2603.04762</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04762v1 Announce Type: new Abstract: When individual robots have limited sensing capabilities or insufficient fault tolerance, it becomes necessary for multiple robots to form teams during exploration, thereby increasing the collective observation range and reliability. Traditionally, swarm formation has often been managed by a central controller; however, from the perspectives of robustness and flexibility, it is preferable for the swarm to operate autonomously even in the absence of centralized control. In addition, the determination of exploration targets for each team is crucial for efficient exploration in such multi-team exploration scenarios. This study therefore proposes an exploration method that combines (1) an algorithm for self-organization, enabling the autonomous and dynamic formation of multiple teams, and (2) an algorithm that allows each team to autonomously determine its next exploration target (destination). In particular, for (2), this study explores a novel strategy based on large language models (LLMs), while classical frontier-based methods and deep reinforcement learning approaches have been widely studied. The effectiveness of the proposed method was validated through simulations involving tens to hundreds of robots.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Data-Driven Control of a Magnetically Actuated Fish-Like Robot</title>
  <link>https://arxiv.org/abs/2603.04787</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04787v1 Announce Type: new Abstract: Magnetically actuated fish-like robots offer promising solutions for underwater exploration due to their miniaturization and agility; however, precise control remains a significant challenge because of nonlinear fluid dynamics, flexible fin hysteresis, and the variable-duration control steps inherent to the actuation mechanism. This paper proposes a comprehensive data-driven control framework to address these complexities without relying on analytical modeling. Our methodology comprises three core components: 1) developing a forward dynamics model (FDM) using a neural network trained on real-world experimental data to capture state transitions under varying time steps; 2) integrating this FDM into a gradient-based model predictive control (G-MPC) architecture to optimize control inputs for path following; and 3) applying imitation learning to approximate the G-MPC policy, thereby reducing the computational cost for real-time implementation. We validate the approach through simulations utilizing the identified dynamics model. The results demonstrate that the G-MPC framework achieves accurate path convergence with minimal root mean square error (RMSE), and the imitation learning controller (ILC) effectively replicates this performance. This study highlights the potential of data-driven control strategies for the precise navigation of miniature, fish-like soft robots.</description>
  <dc:source>Computer_Science/cs.RO_(Robotics)</dc:source>
</item>
<item>
  <title>Simple generators of rational function fields</title>
  <link>https://arxiv.org/abs/2602.10878</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.10878v2 Announce Type: replace Abstract: Consider a subfield of the field of rational functions in several indeterminates. We present an algorithm that, given a set of generators of such a subfield, finds a simple generating set. We provide an implementation of the algorithm and show that it improves upon the state of the art both in efficiency and the quality of the results. Furthermore, we demonstrate the utility of simplified generators through several case studies from different application domains, such as structural parameter identifiability. The main algorithmic novelties include performing only partial Gr\&quot;obner basis computation via sparse interpolation and efficient search for polynomials of a fixed degree in a subfield of the rational function field.</description>
  <dc:source>Computer_Science/cs.SC_(Symbolic_Computation)</dc:source>
</item>
<item>
  <title>Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics</title>
  <link>https://arxiv.org/abs/2506.18339</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2506.18339v3 Announce Type: replace-cross Abstract: Understanding and modeling nonlinear dynamical systems is a fundamental challenge across science and engineering. Deep learning has shown remarkable potential for capturing complex system behavior, yet achieving models that are both accurate and physically interpretable remains difficult. To address this, we propose Structured Kolmogorov-Arnold Neural ODEs (SKANODEs), a framework that integrates structured state-space modeling with Kolmogorov-Arnold Networks (KANs). Within a Neural ODE architecture, SKANODE employs a fully trainable KAN as a universal function approximator to perform virtual sensing, recovering latent states that correspond to interpretable physical quantities such as displacements and velocities. Leveraging KAN&#39;s symbolic regression capability, SKANODE then extracts compact, interpretable expressions for the system&#39;s governing dynamics. Experiments on two canonical nonlinear oscillators and a real-world F-16 ground vibration dataset demonstrate that SKANODE reliably recovers physically meaningful latent displacement and velocity trajectories from acceleration measurements, identifies the correct governing nonlinearities--including the cubic stiffness in the Duffing oscillator and the nonlinear damping structure in the Van der Pol oscillator--and reveals hysteretic signatures in the F-16 interface dynamics through structured latent phase portraits and an interpretable symbolic model. Across all three cases, SKANODE provides more accurate and robust predictions than black-box NODE baselines and classical ARX and NARX identification, while producing equation-level descriptions of the learned nonlinear dynamics.</description>
  <dc:source>Computer_Science/cs.SC_(Symbolic_Computation)</dc:source>
</item>
<item>
  <title>WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech</title>
  <link>https://arxiv.org/abs/2603.04809</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04809v1 Announce Type: new Abstract: This paper presents our solution for the DL Sprint 4.0, addressing the dual challenges of Bengali Long-Form Speech Recognition (Task 1) and Speaker Diarization (Task 2). Processing long-form, multi-speaker Bengali audio introduces significant hurdles in voice activity detection, overlapping speech, and context preservation. To solve the long-form transcription challenge, we implemented a robust audio chunking strategy utilizing whisper-timestamped, allowing us to feed precise, context-aware segments into our fine-tuned acoustic model for high-accuracy transcription. For the diarization task, we developed an integrated pipeline leveraging pyannote.audio and WhisperX. A key contribution of our approach is the domain-specific fine-tuning of the Pyannote segmentation model on the competition dataset. This adaptation allowed the model to better capture the nuances of Bengali conversational dynamics and accurately resolve complex, overlapping speaker boundaries. Our methodology demonstrates that applying intelligent timestamped chunking to ASR and targeted segmentation fine-tuning to diarization significantly drives down Word Error Rate (WER) and Diarization Error Rate (DER), in low-resource settings.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Focus Then Listen: Exploring Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models</title>
  <link>https://arxiv.org/abs/2603.04862</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04862v1 Announce Type: new Abstract: Large audio language models (LALMs) are a class of foundation models for audio understanding. Existing LALMs tend to degrade significantly in real-world noisy acoustic conditions where speech and non-speech sounds interfere. While noise-aware fine-tuning can improve robustness, it requires task-specific noisy data and expensive retraining, limiting scalability. To address this issue, we propose Focus-Then-Listen (FTL), a plug-and-play audio enhancer that improves LALMs&#39; noise robustness. Specifically, FTL first separates the input waveform into speech and non-speech, and a modality router is applied to predict the target audio modality (e.g., speech) based on the user&#39;s instruction. Finally, a modality-aware fusion block generates a task-adaptive enhanced signal for improved downstream perception and reasoning. Experiments across multiple LALMs and tasks show that FTL improves performance across different noise levels without fine-tuning on LALMs.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights</title>
  <link>https://arxiv.org/abs/2603.04865</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04865v1 Announce Type: new Abstract: Recent progress in audio generation has made it increasingly easy to create highly realistic environmental soundscapes, which can be misused to produce deceptive content, such as fake alarms, gunshots, and crowd sounds, raising concerns for public safety and trust. While deepfake detection for speech and singing voice has been extensively studied, environmental sound deepfake detection (ESDD) remains underexplored. To advance ESDD, the first edition of the ESDD challenge was launched, attracting 97 registered teams and receiving 1,748 valid submissions. This paper presents the task formulation, dataset construction, evaluation protocols, baseline systems, and key insights from the challenge results. Furthermore, we analyze common architectural choices and training strategies among top-performing systems. Finally, we discuss potential future research directions for ESDD, outlining key opportunities and open problems to guide subsequent studies in this field.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction</title>
  <link>https://arxiv.org/abs/2603.04943</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04943v1 Announce Type: new Abstract: Target speaker extraction (TSE) aims to isolate a specific speaker&#39;s voice from multi-speaker mixtures. Despite strong benchmark results, real-world performance often degrades due to different interacting factors. Previous curriculum learning approaches for TSE typically address these factors separately, failing to capture their complex interactions and relying on predefined difficulty factors that may not align with actual model learning behavior. To address this challenge, we first propose a multi-factor curriculum learning strategy that jointly schedules SNR thresholds, speaker counts, overlap ratios, and synthetic/real proportions, enabling progressive learning from simple to complex scenarios. However, determining optimal scheduling without predefined assumptions remains challenging. We therefore introduce TSE-Datamap, a visualization framework that grounds curriculum design in observed training dynamics by tracking confidence and variability across training epochs. Our analysis reveals three characteristic data regions: (i) easy-to-learn examples where models consistently perform well, (ii) ambiguous examples where models oscillate between alternative predictions, and (iii) hard-to-learn examples where models persistently struggle. Guided by these data-driven insights, our methods improve extraction results over random sampling, with particularly strong gains in challenging multi-speaker scenarios.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling</title>
  <link>https://arxiv.org/abs/2603.05094</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05094v1 Announce Type: new Abstract: Large Audio-Language Models (LALMs) typically struggle with localized dialectal prosody due to the scarcity of specialized corpora. We present TW-Sound580K, a Taiwanese audio-text instruction dataset developed through a Verify-Generate-Critique (VGC) protocol. This pipeline leverages Dual-ASR validation to filter 522K raw clips, subsequently expanding them into 580,000 high-fidelity instruction pairs using a teacher model. The dataset&#39;s utility is demonstrated through Tai-LALM, which fine-tunes a DeSTA 2.5-Audio-initialized backbone and incorporates a dynamic Dual-ASR Arbitration strategy to optimize transcription selection during inference. On the TAU Benchmark, Tai-LALM reaches 49.1% accuracy, marking a 6.5% absolute improvement over the zero-shot baseline (42.6% with ASR text conditioning). This confirms that integrating regional corpora with rigorous curation and dynamic arbitration significantly enhances LALM performance on localized speech.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards</title>
  <link>https://arxiv.org/abs/2603.05231</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05231v1 Announce Type: new Abstract: Recently, Automatic Speech Recognition (ASR) systems (e.g., Whisper) have achieved remarkable accuracy improvements but remain highly sensitive to real-world unseen data (data with large distribution shifts), including noisy environments and diverse accents. To address this issue, test-time adaptation (TTA) has shown great potential in improving the model adaptability at inference time without ground-truth labels, and existing TTA methods often rely on pseudo-labeling or entropy minimization. However, by treating model confidence as a learning signal, these methods may reinforce high-confidence errors, leading to confirmation bias that undermines adaptation. To overcome these limitations, we present ASR-TRA, a novel Test-time Reinforcement Adaptation framework inspired by causal intervention. More precisely, our method introduces a learnable decoder prompt and utilizes temperature-controlled stochastic decoding to generate diverse transcription candidates. These are scored by a reward model that measures audio-text semantic alignment, and the resulting feedback is used to update both model and prompt parameters via reinforcement learning. Comprehensive experiments on LibriSpeech with synthetic noise and L2 Arctic accented English datasets demonstrate that our method achieves higher accuracy while maintaining lower latency than existing TTA baselines. Ablation studies further confirm the effectiveness of combining audio and language-based rewards, highlighting our method&#39;s enhanced stability and interpretability. Overall, our approach provides a practical and robust solution for deploying ASR systems in challenging real-world conditions.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>SLICE: Speech Enhancement via Layer-wise Injection of Conditioning Embeddings</title>
  <link>https://arxiv.org/abs/2603.05302</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05302v1 Announce Type: new Abstract: Real-world speech is often corrupted by multiple degradations simultaneously, including additive noise, reverberation, and nonlinear distortion. Diffusion-based enhancement methods perform well on single degradations but struggle with compound corruptions. Prior noise-aware approaches inject conditioning at the input layer only, which can degrade performance below that of an unconditioned model. To address this, we propose injecting degradation conditioning, derived from a pretrained encoder with multi-task heads for noise type, reverberation, and distortion, into the timestep embedding so that it propagates through all residual blocks without architectural changes. In controlled experiments where only the injection method varies, input-level conditioning performs worse than no encoder at all on compound degradations, while layer-wise injection achieves the best results. The method also generalizes to diverse real-world recordings.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Latent-Mark: An Audio Watermark Robust to Neural Resynthesis</title>
  <link>https://arxiv.org/abs/2603.05310</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05310v1 Announce Type: new Abstract: While existing audio watermarking techniques have achieved strong robustness against traditional digital signal processing (DSP) attacks, they remain vulnerable to neural resynthesis. This occurs because modern neural audio codecs act as semantic filters and discard the imperceptible waveform variations used in prior watermarking methods. To address this limitation, we propose Latent-Mark, the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the codec&#39;s invariant latent space. We achieve this by optimizing the audio waveform to induce a detectable directional shift in its encoded latent representation, while constraining perturbations to align with the natural audio manifold to ensure imperceptibility. To prevent overfitting to a single codec&#39;s quantization rules, we introduce Cross-Codec Optimization, jointly optimizing the waveform across multiple surrogate codecs to target shared latent invariants. Extensive evaluations demonstrate robust zero-shot transferability to unseen neural codecs, achieving state-of-the-art resilience against traditional DSP attacks while preserving perceptual imperceptibility. Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection</title>
  <link>https://arxiv.org/abs/2603.05373</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05373v1 Announce Type: new Abstract: Neural codec language models enable high-quality discrete speech synthesis, yet their inference remains vulnerable to token-level artifacts and distributional drift that degrade perceptual realism. Rather than relying on preference optimization or retraining, we propose MSpoof-TTS, a training-free inference framework that improves zero-shot synthesis through multi-resolution spoof guidance. We introduce a Multi-Resolution Token-based Spoof Detection framework that evaluates codec sequences at different temporal granularities to detect locally inconsistent or unnatural patterns. We then integrate the spoof detectors into a hierarchical decoding strategy, progressively pruning low-quality candidates and re-ranking hypotheses. This discriminator-guided generation enhances robustness without modifying model parameters. Experiments validate the effectiveness of our framework for robust and high-quality codec-based speech generation.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Building Enterprise Realtime Voice Agents from Scratch: A Technical Tutorial</title>
  <link>https://arxiv.org/abs/2603.05413</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05413v1 Announce Type: new Abstract: We present a technical tutorial for building enterprise-grade realtime voice agents from first principles. While over 25 open-source speech-to-speech models and numerous voice agent frameworks exist, no single resource explains the complete pipeline from individual components to a working streaming voice agent with function calling capabilities. Through systematic investigation, we find that (1) native speech-to-speech models like Qwen2.5-Omni, while capable of high-quality audio generation, are too slow for realtime interaction ($\sim$13s time-to-first-audio); (2) the industry-standard approach uses a cascaded streaming pipeline: STT $\rightarrow$ LLM $\rightarrow$ TTS, where each component streams its output to the next; and (3) the key to ``realtime&#39;&#39; is not any single fast model but rather \textit{streaming and pipelining} across components. We build a complete voice agent using Deepgram (streaming STT), vLLM-served LLMs with function calling (streaming text generation), and ElevenLabs (streaming TTS), achieving a measured P50 time-to-first-audio of 947ms (best case 729ms) with cloud LLM APIs, and comparable latency with self-hosted vLLM on NVIDIA A10G GPU. We release the full codebase as a tutorial with working, tested code for every component.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>PolyBench: A Benchmark for Compositional Reasoning in Polyphonic Audio</title>
  <link>https://arxiv.org/abs/2603.05128</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05128v1 Announce Type: cross Abstract: Large Audio Language Models (LALMs) are increasingly capable of reasoning over audio. However, existing benchmarks provide limited coverage of reasoning in polyphonic audio, where multiple sound events co-occur and induce compositional structure. In this work, we introduce PolyBench, a benchmark designed to evaluate compositional reasoning in polyphonic audio. PolyBench comprises five evaluation subsets covering counting, classification, detection, concurrency, and duration estimation, requiring reasoning over multiple concurrent events and their relations. Evaluation of state-of-the-art LALMs reveals consistent performance degradation in polyphonic audio, indicating a fundamental bottleneck in current LALMs.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>SarcasmMiner: A Dual-Track Post-Training Framework for Robust Audio-Visual Sarcasm Reasoning</title>
  <link>https://arxiv.org/abs/2603.05275</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05275v1 Announce Type: cross Abstract: Multimodal sarcasm detection requires resolving pragmatic incongruity across textual, acoustic, and visual cues through cross-modal reasoning. To enable robust sarcasm reasoning with foundation models, we propose SarcasmMiner, a reinforcement learning based post-training framework that resists hallucination in multimodal reasoning. We reformulate sarcasm detection as structured reasoning and adopt a dual-track distillation strategy: high-quality teacher trajectories initialize the student model, while the full set of trajectories trains a generative reward model (GenRM) to evaluate reasoning quality. The student is optimized with group relative policy optimization (GRPO) using decoupled rewards for accuracy and reasoning quality. On MUStARD++, SarcasmMiner increases F1 from 59.83% (zero-shot), 68.23% (supervised finetuning) to 70.22%. These findings suggest that reasoning-aware reward modeling enhances both performance and multimodal grounding.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation</title>
  <link>https://arxiv.org/abs/2603.05299</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05299v1 Announce Type: cross Abstract: Large language models show that simple autoregressive training can yield scalable and coherent generation, but extending this paradigm to speech remains challenging due to the entanglement of semantic and acoustic information. Most existing speech language models rely on text supervision, hierarchical token streams, or complex hybrid architectures, departing from the single-stream generative pretraining paradigm that has proven effective in text. In this work, we introduce WavSLM, a speech language model trained by quantizing and distilling self-supervised WavLM representations into a single codebook and optimizing an autoregressive next-chunk prediction objective. WavSLM jointly models semantic and acoustic information within a single token stream without text supervision or text pretraining. Despite its simplicity, it achieves competitive performance on consistency benchmarks and speech generation while using fewer parameters, less training data, and supporting streaming inference. Demo samples are available at https://lucadellalib.github.io/wavslm-web/.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Vevo2: A Unified and Controllable Framework for Speech and Singing Voice Generation</title>
  <link>https://arxiv.org/abs/2508.16332</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2508.16332v3 Announce Type: replace Abstract: Controllable human voice generation, particularly for expressive domains like singing, remains a significant challenge. This paper introduces Vevo2, a unified framework for controllable speech and singing voice generation. To tackle issues like the scarcity of annotated singing data and to enable flexible controllability, Vevo2 introduces two audio tokenizers: (1) a unified music-notation-free prosody tokenizer that captures prosody and melody from speech, singing, and even instrumental sounds, and (2) a unified content-style tokenizer that encodes linguistic content, prosody, and style for both speech and singing, while enabling timbre disentanglement. Vevo2 consists of an auto-regressive (AR) content-style modeling stage, which aims to enable controllability over text, prosody, and style, as well as a flow-matching acoustic modeling stage that allows for timbre control. Particularly, during the speech-singing joint training of the AR model, we propose both explicit and implicit prosody learning strategies to bridge speech and singing voice. Moreover, to further enhance the Vevo2&#39;s ability to follow text and prosody, we design a multi-objective post-training task that integrates both intelligibility and prosody similarity alignment. Experimental results show that the unified modeling in Vevo2 brings mutual benefits to both speech and singing voice generation. Additionally, Vevo2&#39;s effectiveness across a wide range of synthesis, conversion, and editing tasks for both speech and singing further demonstrates its strong generalization ability and versatility. Audio samples are are available at https://versasinger.github.io/.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition</title>
  <link>https://arxiv.org/abs/2509.05983</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2509.05983v4 Announce Type: replace Abstract: Code-switching (CS) presents a significant challenge for general Auto-Speech Recognition (ASR) systems. Existing methods often fail to capture the sub tle phonological shifts inherent in CS scenarios. The challenge is particu larly difficult for language pairs like Vietnamese and English, where both distinct phonological features and the ambiguity arising from similar sound recognition are present. In this paper, we propose a novel architecture for Vietnamese-English CS ASR, a Two-Stage Phoneme-Centric model (TSPC). TSPC adopts a phoneme-centric approach based on an extended Vietnamese phoneme set as an intermediate representation for mixed-lingual modeling, while remaining efficient under low computational-resource constraints. Ex perimental results demonstrate that TSPC consistently outperforms exist ing baselines, including PhoWhisper-base, in Vietnamese-English CS ASR, achieving a significantly lower word error rate of 19.06% with reduced train ing resources. Furthermore, the phonetic-based two-stage architecture en ables phoneme adaptation and language conversion to enhance ASR perfor mance in complex CS Vietnamese-English ASR scenarios.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>SAM: A Mamba-2 State-Space Audio-Language Model</title>
  <link>https://arxiv.org/abs/2509.15680</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2509.15680v2 Announce Type: replace Abstract: We present SAM, a State-space Audio-language Model that integrates an audio encoder with a Mamba-2 backbone. SAM-2.7B achieves 21.1 mAP on AudioSet and 17.6 SPICE on AudioCaps, matching or surpassing larger 7B transformer-based models with fewer parameters. We further provide the first systematic, representation-level analysis of how SSMs interact with audio encoder outputs: (1) joint audio encoder finetuning is essential, supported by accuracy gains and observed adaptation of token representation rank and similarity across different SSM sizes; (2) despite linear scaling, SSMs benefit more from compact, information-rich audio token representations than from excessively long token sequences; and (3) incorporating instruction-following supervision substantially improves reasoning ability, boosting MMAU-Sound accuracy from 22.8 to 56.8. Through comprehensive experiments and analysis, we establish practical design principles for SSMs as strong, scalable backbones for audio-language models.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription</title>
  <link>https://arxiv.org/abs/2509.21739</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2509.21739v2 Announce Type: replace Abstract: Automatic drum transcription (ADT) is traditionally formulated as a discriminative task to predict drum events from audio spectrograms. In this work, we redefine ADT as a conditional generative task and introduce Noise-to-Notes (N2N), a framework leveraging diffusion modeling to transform audio-conditioned Gaussian noise into drum events with associated velocities. This generative diffusion approach offers distinct advantages, including a flexible speed-accuracy trade-off and strong inpainting capabilities. However, the generation of binary onset and continuous velocity values presents a challenge for diffusion models, and to overcome this, we introduce an Annealed Pseudo-Huber loss to facilitate effective joint optimization. Finally, to augment low-level spectrogram features, we propose incorporating features extracted from music foundation models (MFMs), which capture high-level semantic information and enhance robustness to out-of-domain drum audio. Experimental results demonstrate that including MFM features significantly improves robustness and N2N establishes a new state-of-the-art performance across multiple ADT benchmarks.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Schr\&quot;odinger Bridge Mamba for One-Step Speech Enhancement</title>
  <link>https://arxiv.org/abs/2510.16834</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2510.16834v2 Announce Type: replace Abstract: We present Schr\&quot;odinger Bridge Mamba (SBM), a novel model for efficient speech enhancement by integrating the Schr\&quot;odinger Bridge (SB) training paradigm and the Mamba architecture. Experiments of joint denoising and dereverberation tasks demonstrate SBM outperforms strong generative and discriminative methods on multiple metrics with only one step of inference while achieving a competitive real-time factor for streaming feasibility. Ablation studies reveal that the SB paradigm consistently yields improved performance across diverse architectures over conventional mapping. Furthermore, Mamba exhibits a stronger performance under the SB paradigm compared to Multi-Head Self-Attention (MHSA) and Long Short-Term Memory (LSTM) backbones. These findings highlight the synergy between the Mamba architecture and the SB trajectory-based training, providing a high-quality solution for real-world speech enhancement. Demo page: https://sbmse.github.io</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention</title>
  <link>https://arxiv.org/abs/2512.04551</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2512.04551v2 Announce Type: replace Abstract: Speech emotion recognition (SER) is an important technology in human-computer interaction. However, achieving high performance is challenging due to emotional complexity and scarce annotated data. To tackle these challenges, we propose a multi-loss learning (MLL) framework integrating an energy-adaptive mixup (EAM) method and a frame-level attention module (FLAM). The EAM method leverages SNR-based augmentation to generate diverse speech samples capturing subtle emotional variations. FLAM enhances frame-level feature extraction for multi-frame emotional cues. Our MLL strategy combines Kullback-Leibler divergence, focal, center, and supervised contrastive loss to optimize learning, address class imbalance, and improve feature separability. We evaluate our method on four widely used SER datasets: IEMOCAP, MSP-IMPROV, RAVDESS, and SAVEE. The results demonstrate our method achieves state-of-the-art performance, suggesting its effectiveness and robustness.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Threadle: A Memory-Efficient Network Storage and Query Engine for Large, Multilayer, and Mixed-mode Networks</title>
  <link>https://arxiv.org/abs/2603.04446</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04446v1 Announce Type: cross Abstract: We present Threadle, an open-source, high-performance, and memory-efficient network storage and query engine written in C#. Designed for working with full-population networks derived from administrative register data, which represent very large, multilayer, mixed-mode networks with millions of nodes and billions of edges, Threadle addresses a fundamental limitation of existing network libraries: the inability to efficiently handle two-mode (bipartite) data at scale. Threadle&#39;s core innovation is a pseudo-projection approach that allows two-mode layers to be queried as if they were projected into one-mode form, without ever materializing the memory-prohibitive projection. We demonstrate that a network with 20 million nodes containing layers equivalent to 8 trillion projected edges can be stored in approximately 20 GB of RAM -- a compression ratio exceeding 2000:1 compared to materialized projection. Additionally, Threadle provides native support for multilayer mixed-mode networks, an integrated node attribute manager, and a CLI frontend with 50+ commands for the construction, processing, file handling, and management of very large heterogeneous networks. Threadle is freely available at https://www.threadle.dev and can either be obtained as precompiled binaries for Win, macOS and Linux, or compiled directly from source. Supplementing Threadle is threadleR, an R frontend that enables advanced sampling- and traversal-based analyses on very large, heterogeneous, multilayer, mixed-mode population-scale networks.</description>
  <dc:source>Computer_Science/cs.SI_(Social_and_Information_Networks)</dc:source>
</item>
<item>
  <title>Shock Propagation and Macroeconomic Fluctuations</title>
  <link>https://arxiv.org/abs/2603.05367</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05367v1 Announce Type: cross Abstract: We study how idiosyncratic firm-level shocks generate aggregate volatility and tail risk when they propagate through a production network under overlapping adjustment: new productivity draws arrive before the economy reaches the static equilibrium associated with earlier draws. Each innovation generates a `productivity wave&#39; that mixes and dissipates over time as it travels through the production network. Macroeconomic fluctuations emerge from the interference between these waves of different vintages. The interference between these waves is governed by the dominant transient eigenvalue of the production network, and therefore so is the macroeconomic fluctuations they generate. In such a dynamic regime, the tail of the degree distribution is a markedly weaker determinant of macro fluctuations than in the fully adjusted static benchmark. And the macroeconomic significance of the degree-heterogeneity of production networks cannot be known without knowing the rate at which the economy converges to equilibrium or equivalently the spectral properties of the production network. More concretely, once we permit the time-averaging of shocks, granular shocks may account for only a small fraction of the empirically observed aggregate volatility.</description>
  <dc:source>Computer_Science/cs.SI_(Social_and_Information_Networks)</dc:source>
</item>
<item>
  <title>Latent space models for grouped multiplex networks</title>
  <link>https://arxiv.org/abs/2511.11086</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2511.11086v2 Announce Type: replace Abstract: Complex multilayer network datasets have become ubiquitous in various applications, including neuroscience, social sciences, economics, and genetics. Notable examples include brain connectivity networks collected across multiple patients or trade networks between countries collected across multiple goods. Existing statistical approaches to such data typically focus on modeling the structure shared by all networks; some go further by accounting for individual, layer-specific variation. However, real-world multilayer networks often exhibit additional patterns shared only within certain subsets of layers, which can represent treatment and control groups, or patients grouped by a specific trait. Identifying these group-level structures can uncover systematic differences between groups of networks and influence many downstream tasks, such as testing and low-dimensional visualization. To address this gap, we introduce the GroupMultiNeSS model, which enables the simultaneous extraction of shared, group-specific, and individual latent structures from a sample of networks on a shared node set. For this model, we establish identifiability, develop a fitting procedure using convex optimization in combination with a nuclear norm penalty, and prove a guarantee of recovery for the latent positions as long as there is sufficient separation between the shared, group-specific, and individual latent subspaces. We compare the model with MultiNeSS and other models for multiplex networks in various synthetic scenarios and observe an apparent improvement in the modeling accuracy when the group component is accounted for. Experiment with the Parkinson&#39;s disease brain connectivity dataset demonstrates the superiority of GroupMultiNeSS in highlighting node-level insights on biological differences between the treatment and control patient groups.</description>
  <dc:source>Computer_Science/cs.SI_(Social_and_Information_Networks)</dc:source>
</item>
<item>
  <title>How segmented is my network?</title>
  <link>https://arxiv.org/abs/2602.10125</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2602.10125v4 Announce Type: replace Abstract: Network segmentation is a popular security practice for limiting lateral movement, yet practitioners lack a metric to measure how segmented a network actually is. We define segmentedness as the fraction of potential node-pair communications disallowed by policy -- equivalently, the complement of graph edge density -- and show it to be the first statistically principled scalar metric for this purpose. Then, we derive a normalized estimator for segmentedness and evaluate its uncertainty using confidence intervals. For a 95\% confidence interval with a margin-of-error of $\pm 0.1$, we show that a minimum of $M=97$ sampled node pairs is sufficient. This result is independent of the total number of nodes in the network, provided that node pairs are sampled uniformly at random. We evaluate the estimator through Monte Carlo simulations on Erd\H{o}s--R\&#39;enyi, stochastic block models, and real-world enterprise network datasets, demonstrating accurate estimation. Finally, we discuss applications of the estimator, such as baseline tracking, zero trust assessment, and merger integration.</description>
  <dc:source>Computer_Science/cs.SI_(Social_and_Information_Networks)</dc:source>
</item>
<item>
  <title>A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction</title>
  <link>https://arxiv.org/abs/2601.19175</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2601.19175v4 Announce Type: replace-cross Abstract: Link sign prediction on a signed graph is a task to determine whether the relationship represented by an edge is positive or negative. Since the presence of negative edges violates the graph homophily assumption that adjacent nodes are similar, regular graph methods have not been applicable without auxiliary structures to handle them. We aim to directly model the latent statistical dependency among edges with the Gaussian copula and its corresponding correlation matrix, extending CopulaGNN (Ma et al., 2021). However, a naive modeling of edge-edge relations is computationally intractable even for a graph with moderate scale. To address this, we propose to 1) represent the correlation matrix as a Gramian of edge embeddings, significantly reducing the number of parameters, and 2) reformulate the conditional probability distribution to dramatically reduce the inference cost. We theoretically verify scalability of our method by proving its linear convergence. Also, our extensive experiments demonstrate that it achieves significantly faster convergence than baselines, maintaining competitive prediction performance to the state-of-the-art models.</description>
  <dc:source>Computer_Science/cs.SI_(Social_and_Information_Networks)</dc:source>
</item>
<item>
  <title>Joint Visible Light and RF Backscatter Communications for Ambient IoT Network: Fundamentals, Applications, and Opportunities</title>
  <link>https://arxiv.org/abs/2603.04626</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04626v1 Announce Type: new Abstract: The rapid growth of the Internet of Things (IoT) devices in the sixth-generation (6G) wireless networks raises significant generality and scalability challenges due to energy consumption, deployment complexity, and environmental impact. Ambient IoT (A-IoT), leveraging ambient energy harvesting (EH) for batteryless device operation, has emerged as a promising solution to address these challenges.Among various EH and communication techniques, visible light communication (VLC) integrated with ambient backscatter communication (AmBC) offers remarkable advantages, including energy neutrality, high reliability, and enhanced security. In this paper, we propose a joint VLC-AmBC architecture, emphasizing fundamental concepts, system designs, and practical implementations. We explore potential applications in environmental monitoring, healthcare, smart logistics, and secure communications. We present proof-of-concept demonstrations for three distinct types of ambient backscatter devices (AmBDs): EH-Only, VLC-Relay, and VLC-Control. Experimental results demonstrate the feasibility of implementing joint VLC-AmBC systems, highlighting their practical viability across various deployment scenarios. Finally, we outline future research directions, including integrated sensing and communication, as well as optimized energy-efficient deployment. Open issues, such as large-scale deployment challenges, are also discussed, thereby providing a clear roadmap for future developments in joint VLC-AmBC-enabled A-IoT ecosystems.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>On boundedness of solutions of three-state Moore-Greitzer compressor model with nonlinear proportional-integral controller for the surge subsystem</title>
  <link>https://arxiv.org/abs/2603.04661</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04661v1 Announce Type: new Abstract: The work focuses on Lagrange stability of the origin for the three-state Moore-Greitzer compressor model in closed loop with a nonlinear PI controller, tuned only to stabilize a lower-dimensional invariant surge-dynamics subsystem.The linearization of the system is not stabilizable but the static nonlinearity satisfies a sector condition, and together with a structural property of the stall-dynamics subsystem, this plays an essential role in the analysis. The main contribution provides explicit conditions on the controller parameters together with analytical arguments that guarantee boundedness of all solutions of the closed-loop system. The analysis employs a non-standard application of circle-criterion-based arguments. Together with the additional arguments developed in the work, this stability test also shows that the closed-loop system is robust to certain perturbations and model uncertainties.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>The Vertical Challenge of Low-Altitude Economy: Why We Need a Unified Height System?</title>
  <link>https://arxiv.org/abs/2603.04866</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04866v1 Announce Type: new Abstract: The explosive growth of the low-altitude economy, driven by eVTOLs and UAVs, demands a unified digital infrastructure to ensure safety and scalability. However, the current aviation vertical references are dangerously fragmented: manned aviation relies on barometric pressure, cartography uses Mean Sea Level (MSL), and obstacle avoidance depends on Above Ground Level (AGL). This fragmentation creates significant ambiguity for autonomous systems and hinders cross-stakeholder interoperability. In this article, we propose Height Above Ellipsoid (HAE) as the standardized vertical reference for lower airspace. Unlike legacy systems prone to environmental drift and inconsistent datums, HAE provides a globally consistent, GNSS-native, and mathematically stable reference. We present a pragmatic bidirectional transformation framework to bridge HAE with legacy systems and demonstrate its efficacy through (1) real-world implementation in Shenzhen&#39;s partitioned airspace management, and (2) a probabilistic risk assessment driven by empirical flight logs from the PX4 ecosystem. Results show that transitioning to HAE reduces the required vertical separation minimum, effectively increasing dynamic airspace capacity while maintaining a target safety level. This work offers a roadmap for transitioning from analog height keeping to a digital-native vertical standard.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Design of Grid Forming Multi Timescale Coordinated Control Strategies for Dynamic Virtual Power Plants</title>
  <link>https://arxiv.org/abs/2603.04962</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04962v1 Announce Type: new Abstract: As the penetration level of distributed energy resources (DERs) continues to rise, traditional frequency and voltage support from synchronous machines declines. This weakens grid stability and increases the need for fast and adaptive control in a dynamic manner, especially in weak grids. However, most virtual power plants (VPPs) rely on static aggregation and plan based resource allocation strategies. These methods overlook differences in device response times and limit flexibility for ancillary services. To address this issue, we propose a dynamic virtual power plant (DVPP) that coordinates heterogeneous resources across multiple time scales using grid forming control. We first contrast grid following and grid forming converters: grid following designs rely on a phase locked loop which can undermine stability in weak grids, whereas our DVPP applies virtual synchronous generator control at the aggregate level to provide effective inertia and damping. Then, we introduce a dynamic participation factor framework that measures each device s contribution through the frequency active power and voltage reactive power loops. Exploiting device heterogeneity, we adopt a banded allocation strategy: slow resources manage steady state and low frequency regulation; intermediate resources smooth transitions; and fast resources deliver rapid response and high frequency damping. Comparative simulations demonstrate that this coordinated, timescale aware approach enhances stability and ancillary service performance compared to conventional VPPs.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>A Unified Hybrid Control Architecture for Multi-DOF Robotic Manipulators</title>
  <link>https://arxiv.org/abs/2603.04988</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04988v1 Announce Type: new Abstract: Multi-degree-of-freedom (DOF) robotic manipulators exhibit strongly nonlinear, high-dimensional, and coupled dynamics, posing significant challenges for controller design. To address these issues, this work proposes a unified hybrid control architecture that integrates model predictive control (MPC) with feedback regulation, together with a stability analysis of the proposed scheme. The proposed approach mitigates the optimization difficulty associated with high-dimensional nonlinear systems and enhances overall control performance. Furthermore, a hardware implementation scheme based on machine learning (ML) is proposed to achieve high computational efficiency while maintaining control accuracy. Finally, simulation and hardware experiments under external disturbances validate the proposed architecture, demonstrating its superior performance, hardware feasibility, and generalization capability for multi-DOF manipulation tasks.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Receding-Horizon Maximum-Likelihood Estimation of Neural-ODE Dynamics and Thresholds from Event Cameras</title>
  <link>https://arxiv.org/abs/2603.05011</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05011v1 Announce Type: new Abstract: Event cameras emit asynchronous brightness-change events where each pixel triggers an event when the last event exceeds a threshold, yielding a history-dependent measurement model. We address online maximum-likelihood identification of continuous-time dynamics from such streams. The latent state follows a Neural ODE and is mapped to predicted log-intensity through a differentiable state-to-image model. We model events with a history-dependent marked point process whose conditional intensity is a smooth surrogate of contrast-threshold triggering, treating the contrast threshold as an unknown parameter. The resulting log-likelihood consists of an event term and a compensator integral. We propose a receding-horizon estimator that performs a few gradient steps per update on a receding horizon window. For streaming evaluation, we store two scalars per pixel (last-event time and estimated log-intensity at that time) and approximate the compensator via Monte Carlo pixel subsampling. Synthetic experiments demonstrate joint recovery of dynamics parameters and the contrast threshold, and characterize accuracy--latency trade-offs with respect to the window length.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Formal Entropy-Regularized Control of Stochastic Systems</title>
  <link>https://arxiv.org/abs/2603.05021</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05021v1 Announce Type: new Abstract: Analyzing and controlling system entropy is a powerful tool for regulating predictability of control systems. Applications benefiting from such approaches range from reinforcement learning and data security to human-robot collaboration. In continuous-state stochastic systems, accurate entropy analysis and control remains a challenge. In recent years, finite-state abstractions of continuous systems have enabled control synthesis with formal performance guarantees on objectives such as stage costs. However, these results do not extend to entropy-based performance measures. We solve this problem by first obtaining bounds on the entropy of system discretizations using traditional formal-abstractions results, and then obtaining an additional bound on the difference between the entropy of a continuous distribution and that of its discretization. The resulting theory enables formal entropy-aware controller synthesis that trades predictability against control performance while preserving formal guarantees for the original continuous system. More specifically, we focus on minimizing the linear combination of the KL divergence of the system trajectory distribution to uniform -- our system entropy metric -- and a generic cumulative cost. We note that the bound we derive on the difference between the KL divergence to uniform of a given continuous distribution and its discretization can also be relevant in more general information-theoretic contexts. A set of case studies illustrates the effectiveness of the method.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Trajectory Tracking for Uncrewed Surface Vessels with Input Saturation and Dynamic Motion Constraints</title>
  <link>https://arxiv.org/abs/2603.05115</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05115v1 Announce Type: new Abstract: This work addresses the problem of constrained motion control of the uncrewed surface vessels. The constraints are imposed on states/inputs of the vehicles due to the physical limitations, mission requirements, and safety considerations. We develop a nonlinear feedback controller utilizing log-type Barrier Lyapunov Functions to enforce static and dynamic motion constraints. The proposed scheme uniquely addresses asymmetric constraints on position and heading alongside symmetric constraints on surge, sway, and yaw rates. Additionally, a smooth input saturation model is incorporated in the design to guarantee stability even under actuator bounds, which, if unaccounted for, can lead to severe performance degradation and poor tracking. Rigorous Lyapunov stability analysis shows that the closed-loop system remains stable and that all state variables remain within their prescribed bounds at all times, provided the initial conditions also lie within those bounds. Numerical simulations demonstrate the effectiveness of the proposed strategies for surface vessels without violating the motion and actuator constraints.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Uncertainty and Autarky: Cooperative Game Theory for Stable Local Energy Market Partitioning</title>
  <link>https://arxiv.org/abs/2603.05169</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05169v1 Announce Type: new Abstract: Local energy markets empower prosumers to form coalitions for energy trading. However, the optimal partitioning of the distribution grid into such coalitions remains unclear, especially in constrained grids with stochastic production and consumption. This analysis must take into account the interests of both the grid operator and the constituent prosumers. In this work, we present a cooperative game theoretic framework to study distribution grid partitioning into local energy market coalitions under uncertain prosumption and grid constraints. We formulate the optimal stable partitioning problem to balance the interests of the grid operator with that of prosumers. Under deterministic load and generation, we show that the largest market coalition is the optimal stable partition. For the case of stochastic loads and generation, we provide an algorithm to evaluate the optimal stable partition. Numerical experiments are performed on benchmark and real world distribution grids. Our results help in understanding how uncertainty affects local energy market partitioning decisions in constrained distribution grids.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Computing Scaled Relative Graphs of Discrete-time LTI Systems from Data</title>
  <link>https://arxiv.org/abs/2603.05239</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05239v1 Announce Type: new Abstract: Graphical methods for system analysis have played a central role in control theory. A recently emerging tool in this field is the Scaled Relative Graph (SRG). In this paper, we further extend its applicability by showing how the SRG of discrete-time linear-time-invariant (LTI) systems can be computed exactly from its state-space representation using linear matrix inequalities. We additionally propose a fully data-driven approach where we demonstrate how to compute the SRG exclusively from input-output data. Furthermore, we introduce a robust version of the SRG, which can be computed from noisy data trajectories and contains the SRG of the actual system.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>A Comprehensive Approach to Directly Addressing Estimation Delays in Stochastic Guidance</title>
  <link>https://arxiv.org/abs/2603.05363</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05363v1 Announce Type: new Abstract: In realistic pursuit-evasion scenarios, abrupt target maneuvers generate unavoidable periods of elevated uncertainty that result in estimation delays. Such delays can degrade interception performance to the point of causing a miss. Existing delayed-information guidance laws fail to provide a complete remedy, as they typically assume constant and known delays. Moreover, in practice they are fed by filtered estimates, contrary to these laws&#39; foundational assumptions. We present an overarching strategy for tracking and interception that explicitly accounts for time-varying estimation delays. We first devise a guidance law that incorporates two time-varying delays, thereby generalizing prior deterministic formulations. This law is driven by a particle-based fixed-lag smoother that provides it with appropriately delayed state estimates. Furthermore, using semi-Markov modeling of the target&#39;s maneuvers, the delays are estimated in real-time, enabling adaptive adjustment of the guidance inputs during engagement. The resulting framework consistently conjoins estimation, delay modeling, and guidance. Its effectiveness and superior robustness over existing delayed-information guidance laws are demonstrated via an extensive Monte Carlo study.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems</title>
  <link>https://arxiv.org/abs/2603.04443</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04443v1 Announce Type: cross Abstract: Long-running LLM agents require persistent memory to preserve state across interactions, yet most deployed systems manage memory with age-based retention (e.g., TTL). While TTL bounds item lifetime, it does not bound the computational footprint of memory on the request path: as retained items accumulate, retrieval candidate sets and vector similarity scans can grow unpredictably, yielding heavy-tailed latency and unstable throughput. We present AMV-L (Adaptive Memory Value Lifecycle), a memory-management framework that treats agent memory as a managed systems resource. AMV-L assigns each memory item a continuously updated utility score and uses value-driven promotion, demotion, and eviction to maintain lifecycle tiers; retrieval is restricted to a bounded, tier-aware candidate set that decouples the request-path working set from total retained memory. We implement AMV-L in a full-stack LLM serving system and evaluate it under identical long-running workloads against two baselines: TTL and an LRU working-set policy, with fixed prompt-injection caps. Relative to TTL, AMV-L improves throughput by 3.1x and reduces latency by 4.2x (median), 4.7x (p95), and 4.4x (p99), while reducing the fraction of requests exceeding 2s from 13.8% to 0.007%. Compared to LRU, AMV-L trades a small regression in median/p95 latency (+26% / +3%) for improved extreme-tail behavior (-15% p99; -98% &gt;2s) and lower token overhead (approximately 6% fewer tokens/request), while matching retrieval quality (value means within approximately 0-2%). The gains arise primarily from bounding retrieval-set size and vector-search work, not from shortening prompts. Our results show that predictable performance for long-running LLM agents requires explicit control of memory working-set size and value-driven lifecycle management, rather than retention time alone.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Multistage Stochastic Programming for Rare Event Risk Mitigation in Power Systems Management</title>
  <link>https://arxiv.org/abs/2603.04734</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04734v1 Announce Type: cross Abstract: High intermittent renewable penetration in the energy mix presents challenges in robustness for the management of power systems&#39; operation. If a tail realization of the distribution of weather yields a prolonged period of time during which solar irradiation and wind speed are insufficient for satisfying energy demand, then it becomes critical to ramp up the generation of conventional power plants with adequate foresight. This event trigger is costly, and inaccurate forecasting can either be wasteful or yield catastrophic undersupply. This encourages particular attention to accurate modeling of the noise and the resulting dynamics within the aforementioned scenario. In this work we present a method for rare event-aware control of power systems using multi-stage scenario-based optimization. A Fleming-Viot particle approach is used to bias the scenario generation towards rare realizations of very low wind power, in order to obtain a cost-effective control of conventional power plants that is robust under prolonged renewable energy shortfalls.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Policy Optimization of Mixed H2/H-infinity Control: Benign Nonconvexity and Global Optimality</title>
  <link>https://arxiv.org/abs/2603.04843</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04843v1 Announce Type: cross Abstract: Mixed H2/H-infinity control balances performance and robustness by minimizing an H2 cost bound subject to an H-infinity constraint. However, classical Riccati/LMI solutions offer limited insight into the nonconvex optimization landscape and do not readily scale to large-scale or data-driven settings. In this paper, we revisit mixed H2/H-infinity control from a modern policy optimization viewpoint, including the general two-channel and single-channel cases. One central result is that both cases enjoy a benign nonconvex structure: every stationary point is globally optimal. We characterize the H-infinity-constrained feasible set, which is open, path-connected, with boundary given exactly by policies saturating the H-infinity constraint. We also show that the mixed objective is real analytic in the interior with explicit gradient formulas. Our key analysis builds on an Extended Convex Lifting (ECL) framework that bridges nonconvex policy optimization and convex reformulations. The ECL constructions rely on non-strict Riccati inequalities that allow us to characterize global optimality. These insights reveal hidden convexity in mixed H2/H-infinity control and facilitate the design of scalable policy iteration methods in large-scale settings.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>U-OBCA: Uncertainty-Aware Optimization-Based Collision Avoidance via Wasserstein Distributionally Robust Chance Constraints</title>
  <link>https://arxiv.org/abs/2603.04914</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04914v1 Announce Type: cross Abstract: Uncertainties arising from localization error, trajectory prediction errors of the moving obstacles and environmental disturbances pose significant challenges to robot&#39;s safe navigation. Existing uncertainty-aware planners often approximate polygon-shaped robots and obstacles using simple geometric primitives such as circles or ellipses. Though computationally convenient, these approximations substantially shrink the feasible space, leading to overly conservative trajectories and even planning failure in narrow environments. In addition, many such methods rely on specific assumptions about noise distributions, which may not hold in practice and thus limit their performance guarantees. To address these limitations, we extend the Optimization-Based Collision Avoidance (OBCA) framework to an uncertainty-aware formulation, termed \emph{U-OBCA}. The proposed method explicitly accounts for the collision risk between polygon-shaped robots and obstacles by formulating OBCA-based chance constraints, and hence avoiding geometric simplifications and reducing unnecessary conservatism. These probabilistic constraints are further tightened into deterministic nonlinear constraints under mild distributional assumptions, which can be solved efficiently by standard numerical optimization solvers. The proposed approach is validated through theoretical analysis, numerical simulations and real-world experiments. The results demonstrate that U-OBCA significantly mitigates the conservatism in trajectory planning and achieves higher navigation efficiency compared to existing baseline methods, particularly in narrow and cluttered environments.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Curve-Induced Dynamical Systems on Riemannian Manifolds and Lie Groups</title>
  <link>https://arxiv.org/abs/2603.05268</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05268v1 Announce Type: cross Abstract: Deploying robots in household environments requires safe, adaptable, and interpretable behaviors that respect the geometric structure of tasks. Often represented on Lie groups and Riemannian manifolds, this includes poses on SE(3) or symmetric positive definite matrices encoding stiffness or damping matrices. In this context, dynamical system-based approaches offer a natural framework for generating such behavior, providing stability and convergence while remaining responsive to changes in the environment. We introduce Curve-induced Dynamical systems on Smooth Manifolds (CDSM), a real-time framework for constructing dynamical systems directly on Riemannian manifolds and Lie groups. The proposed approach constructs a nominal curve on the manifold, and generates a dynamical system which combines a tangential component that drives motion along the curve and a normal component that attracts the state toward the curve. We provide a stability analysis of the resulting dynamical system and validate the method quantitatively. On an S2 benchmark, CDSM demonstrates improved trajectory accuracy, reduced path deviation, and faster generation and query times compared to state-of-the-art methods. Finally, we demonstrate the practical applicability of the framework on both a robotic manipulator, where poses on SE(3) and damping matrices on SPD(n) are adapted online, and a mobile manipulator.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>From Code to Road: A Vehicle-in-the-Loop and Digital Twin-Based Framework for Central Car Server Testing in Autonomous Driving</title>
  <link>https://arxiv.org/abs/2603.05279</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05279v1 Announce Type: cross Abstract: Simulation is one of the most essential parts in the development stage of automotive software. However, purely virtual simulations often struggle to accurately capture all real-world factors due to limitations in modeling. To address this challenge, this work presents a test framework for automotive software on the centralized E/E architecture, which is a central car server in our case, based on Vehicle-in-the-Loop (ViL) and digital twin technology. The framework couples a physical test vehicle on a dynamometer test bench with its synchronized virtual counterpart in a simulation environment. Our approach provides a safe, reproducible, realistic, and cost-effective platform for validating autonomous driving algorithms with a centralized architecture. This test method eliminates the need to test individual physical ECUs and their communication protocols separately. In contrast to traditional ViL methods, the proposed framework runs the full autonomous driving software directly on the vehicle hardware after the simulation process, eliminating flashing and intermediate layers while enabling seamless virtual-physical integration and accurately reflecting centralized E/E behavior. In addition, incorporating mixed testing in both simulated and physical environments reduces the need for full hardware integration during the early stages of automotive development. Experimental case studies demonstrate the effectiveness of the framework in different test scenarios. These findings highlight the potential to reduce development and integration efforts for testing autonomous driving pipelines in the future.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Accelerating Sampling-Based Control via Learned Linear Koopman Dynamics</title>
  <link>https://arxiv.org/abs/2603.05385</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05385v1 Announce Type: cross Abstract: This paper presents an efficient model predictive path integral (MPPI) control framework for systems with complex nonlinear dynamics. To improve the computational efficiency of classic MPPI while preserving control performance, we replace the nonlinear dynamics used for trajectory propagation with a learned linear deep Koopman operator (DKO) model, enabling faster rollout and more efficient trajectory sampling. The DKO dynamics are learned directly from interaction data, eliminating the need for analytical system models. The resulting controller, termed MPPI-DK, is evaluated in simulation on pendulum balancing and surface vehicle navigation tasks, and validated on hardware through reference-tracking experiments on a quadruped robot. Experimental results demonstrate that MPPI-DK achieves control performance close to MPPI with true dynamics while substantially reducing computational cost, enabling efficient real-time control on robotic platforms.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Near-Optimal Low-Complexity MIMO Detection via Structured Reduced-Search Enumeration</title>
  <link>https://arxiv.org/abs/2603.05441</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05441v1 Announce Type: cross Abstract: Maximum-likelihood (ML) detection in high-order MIMO systems is computationally prohibitive due to exponential complexity in the number of transmit layers and constellation size. In this white paper, we demonstrate that for practical MIMO dimensions (up to 8x8) and modulation orders, near-ML hard-decision performance can be achieved using a structured reduced-search strategy with complexity linear in constellation size. Extensive simulations over i.i.d. Rayleigh fading channels show that list sizes of 3|X| for 3x3, 4|X| for 4x4, and 8|X| for 8x8 systems closely match full ML performance, even under high channel condition numbers, |X| being the constellation size. In addition, we provide a trellis based interpretation of the method. We further discuss implications for soft LLR generation and FEC interaction.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>NL2GDS: LLM-aided interface for Open Source Chip Design</title>
  <link>https://arxiv.org/abs/2603.05489</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.05489v1 Announce Type: cross Abstract: The growing complexity of hardware design and the widening gap between high-level specifications and register-transfer level (RTL) implementation hinder rapid prototyping and system design. We introduce NL2GDS (Natural Language to Layout), a novel framework that leverages large language models (LLMs) to translate natural language hardware descriptions into synthesizable RTL and complete GDSII layouts via the open-source OpenLane ASIC flow. NL2GDS employs a modular pipeline that captures informal design intent, generates HDL using multiple LLM engines and verifies them, and orchestrates automated synthesis and layout. Evaluations on ISCAS&#39;85 and ISCAS&#39;89 benchmark designs demonstrate up to 36% area reduction, 35% delay reduction, and 70% power savings compared to baseline designs, highlighting its potential to democratize ASIC design and accelerate hardware innovation.</description>
  <dc:source>Computer_Science/cs.SY_(Systems_and_Control)</dc:source>
</item>
<item>
  <title>Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings</title>
  <link>https://arxiv.org/abs/2603.04605</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2603.04605v1 Announce Type: cross Abstract: Training-free anomalous sound detection (ASD) based on pre-trained audio embedding models has recently garnered significant attention, as it enables the detection of anomalous sounds using only normal reference data while offering improved robustness under domain shifts. However, existing embedding-based approaches almost exclusively rely on temporal mean pooling, while alternative pooling strategies have so far only been explored for spectrogram-based representations. Consequently, the role of temporal pooling in training-free ASD with pre-trained embeddings remains insufficiently understood. In this paper, we present a systematic evaluation of temporal pooling strategies across multiple state-of-the-art audio embedding models. We propose relative deviation pooling (RDP), an adaptive pooling method that emphasizes informative temporal deviations, and introduce a hybrid pooling strategy that combines RDP with generalized mean pooling. Experiments on five benchmark datasets demonstrate that the proposed methods consistently outperform mean pooling and achieve state-of-the-art performance for training-free ASD, including results that surpass all previously reported trained systems and ensembles on the DCASE2025 ASD dataset.</description>
  <dc:source>Computer_Science/cs.SD_(Sound)</dc:source>
</item>
<item>
  <title>Vector Retrieval with Similarity and Diversity: How Hard Is It?</title>
  <link>https://arxiv.org/abs/2407.04573</link>
  <pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate>
  <description>arXiv:2407.04573v3 Announce Type: replace Abstract: Dense vector retrieval is essential for semantic queries within Natural Language Processing, particularly in knowledge-intensive applications like Retrieval-Augmented Generation (RAG). The ability to retrieve vectors that satisfy both similarity and diversity substantially enhances system performance. Although the Maximal Marginal Relevance (MMR) algorithm is widely used to balance these objectives, its reliance on a manually tuned parameter leads to optimization fluctuations and unpredictable retrieval results. Furthermore, there is a lack of sufficient theoretical analysis on the joint optimization of similarity and diversity in vector retrieval. To address these challenges, this paper introduces a novel approach that characterizes both constraints simultaneously by maximizing the similarity between the query vector and the sum of the selected candidate vectors. We formally define this optimization problem, Vectors Retrieval with Similarity and Diversity (VRSD) , and prove that it is NP-complete, establishing a rigorous theoretical bound on the inherent difficulty of this dual-objective retrieval. Subsequently, we present a parameter-free heuristic algorithm to solve VRSD. Extensive evaluations on multiple scientific QA datasets , incorporating both objective geometric metrics and LLM-simulated subjective assessments, demonstrate that our VRSD heuristic consistently outperforms established baselines, including MMR and Determinantal Point Processes (k-DPP).</description>
  <dc:source>Computer_Science/cs.IR_(Information_Retrieval)</dc:source>
</item>
</channel>
</rss>
