arXiv: 2604.22085
Authors: Seyed Moein Abtahi, Rasa Rahnema, Hetkumar Patel, Neel Patel, Majid Fekri, Tara Khani
Primary category: cs.AI · all: cs.AI
Matched keywords: large language model, agent, agentic, retrieval, inference, latency
TL;DR
Memanto is a memory layer for long-horizon LLM agents that replaces knowledge-graph pipelines with a typed semantic schema plus an information-theoretic retrieval engine, hitting 89.8% on LongMemEval and 87.1% on LoCoMo with single-query retrieval and no ingestion cost.
Key Ideas
- Knowledge-graph complexity is not required for high-fidelity agent memory.
- A fixed schema of 13 typed memory categories suffices for broad agent use.
- Automated conflict resolution + temporal versioning handle evolving facts.
- Moorcheh’s no-index information-theoretic search gives sub-90ms deterministic retrieval.
- One retrieval query per turn beats multi-query hybrid pipelines.
Approach
Memanto defines a typed semantic memory with 13 predefined categories. Writes pass through automated conflict resolution and temporal versioning, avoiding LLM-mediated entity extraction and graph-schema upkeep. Retrieval uses Moorcheh's information-theoretic search engine (described as a no-indexing semantic database), issuing a single query per turn with sub-90ms latency; no ingestion-time indexing is performed.
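The retrieval policy itself is easy to illustrate, even though Moorcheh's information-theoretic scoring is not public. The sketch below captures only the policy: one query per agent turn, scored directly over raw stored memories with no ingestion-time index. The token-count cosine similarity used here is a placeholder stand-in, not the paper's scoring function, and `retrieve` is a hypothetical name.

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    # Placeholder featurization: bag-of-words token counts.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memories: list[str], query: str, k: int = 3) -> list[str]:
    """Single retrieval pass per turn: score every stored memory against
    one query and return the top-k. No index is built at ingestion time;
    new memories are searchable as soon as they are appended."""
    q = _vec(query)
    ranked = sorted(memories, key=lambda m: _cosine(_vec(m), q), reverse=True)
    return ranked[:k]
```

The contrast with hybrid graph pipelines is the point: there is no entity extraction or multi-hop query plan, just one scoring pass per turn, which is what makes zero ingestion cost possible.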
Experiments
Benchmarks: LongMemEval and LoCoMo. Baselines: hybrid semantic-graph systems and vector-based memory systems (unnamed in abstract). Metric: accuracy. A five-stage progressive ablation quantifies the contribution of each component (typed schema, conflict resolution, versioning, retrieval engine, single-query policy).
Results
- LongMemEval: 89.8% accuracy (SOTA over evaluated baselines).
- LoCoMo: 87.1% accuracy (SOTA over evaluated baselines).
- Single retrieval query, zero ingestion cost, sub-90ms retrieval latency.
- Ablation reportedly isolates per-component gains, though specific deltas are not in the abstract.
Why It Matters
For agent infra teams, this suggests graph-heavy memory stacks (entity extraction, schema curation, multi-hop retrieval) can be replaced by a typed schema + fast semantic search without accuracy loss. That lowers operational complexity and unlocks cheaper multi-session agents.
Connections to Prior Work
- Hybrid graph memory systems (e.g., GraphRAG-style, Zep, MemGPT variants).
- Long-term conversational memory benchmarks (LongMemEval, LoCoMo).
- Vector-database RAG and dense retrieval literature.
- Information-theoretic retrieval / coding-based search as an alternative to ANN indexes.
Open Questions
- Which exact baselines are compared, and by how much does Memanto beat each?
- How were the 13 categories chosen, and do they generalize beyond the two benchmarks?
- Does the no-index engine scale to millions of memories per user without latency regression?
- How robust is automated conflict resolution under adversarial or contradictory updates?
- Reproducibility: is Moorcheh open or proprietary, and are results independently verifiable?
- How do cost and latency compare against strong vector baselines on identical hardware?
Figures
Figures 1–3 are rendered images on pages 2–4 of the PDF; captions were not extracted.
Original abstract
The transition from stateless language model inference to persistent, multi-session autonomous agents has revealed memory to be a primary architectural bottleneck in the deployment of production-grade agentic systems. Existing methodologies largely depend on hybrid semantic-graph architectures, which impose substantial computational overhead during both ingestion and retrieval. These systems typically require large-language-model-mediated entity extraction, explicit graph-schema maintenance, and multi-query retrieval pipelines. This paper introduces Memanto, a universal memory layer for agentic artificial intelligence that challenges the prevailing assumption that knowledge-graph complexity is necessary to achieve high-fidelity agent memory. Memanto integrates a typed semantic memory schema comprising thirteen predefined memory categories, an automated conflict resolution mechanism, and temporal versioning. These components are enabled by Moorcheh’s Information-Theoretic Search engine, a no-indexing semantic database that provides deterministic retrieval within sub-ninety-millisecond latency while eliminating ingestion delay. Through systematic benchmarking on the LongMemEval and LoCoMo evaluation suites, Memanto achieves state-of-the-art accuracy scores of 89.8 percent and 87.1 percent respectively. These results surpass all evaluated hybrid-graph and vector-based systems while requiring only a single retrieval query, incurring no ingestion cost, and maintaining substantially lower operational complexity. A five-stage progressive ablation study is presented to quantify the contribution of each architectural component, followed by a discussion of the implications for scalable deployment of agentic memory systems.