arXiv: 2604.22085
Authors: Seyed Moein Abtahi, Rasa Rahnema, Hetkumar Patel, Neel Patel, Majid Fekri, Tara Khani
Affiliations: Moorcheh AI, EdgeAI Innovations
Primary category: cs.AI · all: cs.AI
Matched keywords: large language model, agent, agentic, retrieval, inference, latency
TL;DR
Memanto is a universal memory layer for long-horizon agents that replaces hybrid semantic-graph architectures with a typed semantic schema plus Moorcheh’s information-theoretic search engine, reaching 89.8% on LongMemEval and 87.1% on LoCoMo with single-query retrieval and sub-90ms latency.
Key Ideas
- Knowledge-graph complexity is not necessary for high-fidelity agent memory.
- A fixed schema of 13 typed memory categories suffices for production agents.
- Information-theoretic retrieval beats hybrid graph + vector pipelines.
- Automated conflict resolution and temporal versioning handle multi-session state.
- Zero ingestion cost and no indexing yield lower operational complexity.
Approach
Memanto defines thirteen predefined memory categories as a typed semantic schema, layered with automated conflict resolution and temporal versioning for multi-session persistence. Storage and retrieval run on Moorcheh’s Information Theoretic Search engine — a no-indexing semantic database offering deterministic retrieval and eliminating ingestion delay. Retrieval uses a single query rather than multi-query pipelines, and avoids LLM-mediated entity extraction or explicit graph schema maintenance.
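The combination of a typed category schema, last-write-wins conflict resolution, and temporal versioning described above can be sketched in a few lines. This is a toy illustration, not Memanto's implementation: the paper's actual 13 category names are not given in this summary, so the `MemoryCategory` members and the `TypedMemoryStore` class here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical category names for illustration only -- the paper's actual
# 13-category schema is not spelled out in this summary.
class MemoryCategory(Enum):
    USER_PREFERENCE = "user_preference"
    FACT = "fact"
    TASK_STATE = "task_state"
    # ... the remaining categories would follow the paper's schema

@dataclass
class MemoryEntry:
    category: MemoryCategory
    key: str
    value: str
    version: int = 1

class TypedMemoryStore:
    """Toy store: typed writes, automated conflict resolution, versioning."""

    def __init__(self):
        self._entries = {}   # (category, key) -> current MemoryEntry
        self._history = []   # superseded versions, kept for temporal queries

    def write(self, category, key, value):
        slot = (category, key)
        current = self._entries.get(slot)
        if current is not None and current.value != value:
            # Conflict: archive the old version and bump the version counter,
            # so earlier multi-session state remains queryable.
            self._history.append(current)
            entry = MemoryEntry(category, key, value, current.version + 1)
        elif current is not None:
            return current  # idempotent re-write, nothing changes
        else:
            entry = MemoryEntry(category, key, value)
        self._entries[slot] = entry
        return entry

    def read(self, category, key):
        return self._entries.get((category, key))
```

Even in this simplified form, the design choice is visible: because every write lands in a fixed typed slot, conflicts are detected structurally rather than via LLM-mediated entity extraction or graph schema maintenance.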
Experiments
Memanto is benchmarked systematically on the LongMemEval and LoCoMo evaluation suites against hybrid graph- and vector-based memory systems as baselines. Metrics center on retrieval accuracy, retrieval latency, and ingestion cost. A five-stage progressive ablation study quantifies the contribution of each architectural component (typed schema, conflict resolution, temporal versioning, retrieval engine, etc.).
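A "progressive" ablation typically means enabling components cumulatively and re-running the evaluation at each stage. The harness below is a hypothetical sketch of that loop: the component names beyond those listed in the paper summary and the `evaluate` stub are assumptions, and a real run would invoke LongMemEval/LoCoMo rather than the placeholder score.

```python
# Hypothetical five-stage progressive ablation harness. Component names are
# illustrative; the paper's exact staging is not given in this summary.
COMPONENTS = ["typed_schema", "conflict_resolution",
              "temporal_versioning", "retrieval_engine", "single_query"]

def evaluate(enabled):
    """Stub: a real harness would run the benchmark suites with this
    configuration and return measured accuracy. Placeholder score here
    just keeps the sketch runnable end to end."""
    return len(enabled) / len(COMPONENTS)

def progressive_ablation():
    """Enable components one stage at a time, recording each configuration
    and its score, so per-component contributions can be read off as deltas."""
    results = []
    for stage in range(1, len(COMPONENTS) + 1):
        enabled = COMPONENTS[:stage]
        results.append((tuple(enabled), evaluate(enabled)))
    return results
```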
Results
- LongMemEval: 89.8% accuracy (state-of-the-art).
- LoCoMo: 87.1% accuracy (state-of-the-art).
- Sub-90ms retrieval latency, single query, no ingestion cost.
- Surpasses all evaluated hybrid graph and vector baselines while reducing operational complexity.
Why It Matters
For agent/LLM infra practitioners, Memanto suggests persistent memory layers can drop the graph-extraction tax entirely: simpler schema, cheaper ingestion, faster retrieval, and better accuracy. It reframes memory design around typed categories + information-theoretic search rather than LLM-driven knowledge-graph construction, which is appealing for scalable multi-session agent deployments.
Connections to Prior Work
- Hybrid semantic-graph memory systems (MemGPT, Zep, Graphiti-style pipelines).
- Vector-database retrieval (dense retrievers, RAG memory stores).
- LongMemEval and LoCoMo long-horizon memory benchmarks.
- Information-theoretic retrieval and entropy-based similarity search.
- Temporal and versioned knowledge bases for agent state.
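To make the "information-theoretic retrieval" connection concrete: one common entropy-based approach scores a document by the negative KL divergence between the query's term distribution and the document's. The sketch below illustrates that generic idea only; Moorcheh's actual engine is unpublished (see the open questions below), so none of this should be read as its algorithm.

```python
import math
from collections import Counter

def term_dist(text):
    """Normalized term-frequency distribution over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl_score(query_dist, doc_dist, eps=1e-9):
    """Negative KL(query || doc): higher means the document's term
    distribution better covers the query's. Missing terms are smoothed
    with eps so the divergence stays finite."""
    return -sum(p * math.log(p / doc_dist.get(t, eps))
                for t, p in query_dist.items())

def retrieve(query, docs):
    """Single-query retrieval: rank all documents in one pass by the
    information-theoretic score and return the best match."""
    q = term_dist(query)
    scored = [(kl_score(q, term_dist(d)), d) for d in docs]
    return max(scored)[1]
```

Note that ranking happens in a single scoring pass over the corpus, which is the property the summary contrasts with multi-query graph-plus-vector pipelines.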
Open Questions
- How is the 13-category schema chosen, and does it generalize beyond the evaluated domains?
- How does Memanto perform under adversarial, noisy, or schema-mismatched inputs?
- How does retrieval behave as memory grows to millions of entries?
- Can Moorcheh's information-theoretic engine be reproduced from the published details?
- How does Memanto compare against newer graph-free baselines, and what are the token-cost trade-offs?
Original abstract
The transition from stateless language model inference to persistent, multi session autonomous agents has revealed memory to be a primary architectural bottleneck in the deployment of production grade agentic systems. Existing methodologies largely depend on hybrid semantic graph architectures, which impose substantial computational overhead during both ingestion and retrieval. These systems typically require large language model mediated entity extraction, explicit graph schema maintenance, and multi query retrieval pipelines. This paper introduces Memanto, a universal memory layer for agentic artificial intelligence that challenges the prevailing assumption that knowledge graph complexity is necessary to achieve high fidelity agent memory. Memanto integrates a typed semantic memory schema comprising thirteen predefined memory categories, an automated conflict resolution mechanism, and temporal versioning. These components are enabled by Moorcheh’s Information Theoretic Search engine, a no indexing semantic database that provides deterministic retrieval within sub ninety millisecond latency while eliminating ingestion delay. Through systematic benchmarking on the LongMemEval and LoCoMo evaluation suites, Memanto achieves state of the art accuracy scores of 89.8 percent and 87.1 percent respectively. These results surpass all evaluated hybrid graph and vector based systems while requiring only a single retrieval query, incurring no ingestion cost, and maintaining substantially lower operational complexity. A five stage progressive ablation study is presented to quantify the contribution of each architectural component, followed by a discussion of the implications for scalable deployment of agentic memory systems.