Memory infrastructure for AI developers

Persistent memory for any LLM.
Zero tokens.

Drop memory into your existing workflow. Works with GPT, Claude, Gemini, Llama, or any model — and they all share the same memory store. Your users' context survives every session, across every LLM. Every run teaches the next — your pipeline compounds in intelligence with zero extra code.

LIVE api.becomer.net
JSON · stateless · per-user ● 200 OK · 163ms
One API key. Every LLM. One shared memory.
Your GPT app, Claude app, and LangChain agent
all remember the same user.
[Diagram: GPT, Claude, Gemini, and LangChain all read from and write to one BECOMER shared memory store.]
    # GPT stores it
    Client("bk-your-key").store("Sarah prefers TypeScript, works at Stripe")

    # Claude recalls it: same key, same store
    Client("bk-your-key").recall("who is the user?")
    # → ["Sarah prefers TypeScript, works at Stripe"]
01 — Integration
Two lines before. Two lines after.

That's the entire integration. Connect via REST API or MCP — works in any stack.

    # Store memory
    POST https://api.becomer.net/v1/store
    Authorization: Bearer YOUR_API_KEY

    { "content": "User prefers dark mode and concise answers" }

    # Recall before your LLM call
    POST https://api.becomer.net/v1/recall
    Authorization: Bearer YOUR_API_KEY

    { "query": "What does this user prefer?", "top_k": 5 }

    # Returns:
    { "memories": ["User prefers dark mode and concise answers"] }
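For stacks without an SDK, the two REST calls above can be issued from Python's standard library. The following is a minimal sketch, not official client code: the endpoint paths, the Bearer header, and the `content`, `query`, `top_k`, and `memories` fields are taken from the example above, while the helper names `build_request` and `post_json` and the `Content-Type` header are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.becomer.net/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Prepare both calls; nothing is sent until post_json() is called.
store_req = build_request(
    "store", {"content": "User prefers dark mode and concise answers"}, "YOUR_API_KEY"
)
recall_req = build_request(
    "recall", {"query": "What does this user prefer?", "top_k": 5}, "YOUR_API_KEY"
)
```

With a real key, `post_json(store_req)` followed by `post_json(recall_req)["memories"]` would perform the round trip; building and sending are kept separate so the requests can be inspected or logged first.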
    # Add to your mcp.json
    {
      "mcpServers": {
        "becomer": {
          "command": "python",
          "args": ["-m", "becomer"],
          "env": { "BECOMER_API_KEY": "YOUR_API_KEY" }
        }
      }
    }
    # pip install becomer
    import becomer

    mem = becomer.Client("YOUR_API_KEY")

    # Before your LLM call
    context = mem.recall("What does this user like?")

    # After your LLM call
    mem.store("User asked about Python decorators")
    # pip install becomer langchain langchain-openai
    # Uses the legacy ConversationChain API (LangChain 0.x)
    from typing import Any

    from langchain_core.memory import BaseMemory
    from becomer import Client

    class BecomerMemory(BaseMemory):
        # BaseMemory is a pydantic model, so the client is declared as a field
        mem: Any = None

        def __init__(self, api_key: str):
            super().__init__(mem=Client(api_key))

        @property
        def memory_variables(self):
            return ["history"]

        def load_memory_variables(self, inputs):
            mems = self.mem.recall(inputs.get("input", ""), top_k=5)
            return {"history": "\n".join(mems)}

        def save_context(self, inputs, outputs):
            # ConversationChain returns its answer under the "response" key
            self.mem.store(f"User: {inputs['input']} | AI: {outputs['response']}")

        def clear(self):
            self.mem.forget()

    # Drop into any chain: persistent across sessions
    from langchain.chains import ConversationChain
    from langchain_openai import ChatOpenAI

    chain = ConversationChain(
        llm=ChatOpenAI(model="gpt-4o"),
        memory=BecomerMemory("YOUR_API_KEY")
    )
    # pip install becomer llama-index
    from becomer import Client

    mem = Client("YOUR_API_KEY")

    # Seed context before querying; `engine` is any LlamaIndex query engine
    def query_with_memory(engine, user_input):
        context = mem.recall(user_input, top_k=5)
        system_prefix = "What you know about this user:\n" + "\n".join(context)
        # Prepend the recalled context to the query text
        response = engine.query(f"{system_prefix}\n\nQuestion: {user_input}")
        # Persist new context
        mem.store(f"User asked: {user_input}. Key points: {str(response)[:300]}")
        return response
    # Works with CrewAI, AutoGen, LangGraph, or any agent framework
    # Pattern: recall before → run agent → store after
    from becomer import Client
    from openai import OpenAI

    mem = Client("YOUR_API_KEY")
    openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def run_agent(user_id: str, task: str):
        # 1. Load relevant context for this user before the agent runs
        context = mem.recall(task, top_k=5, user_id=user_id)
        # 2. Inject into your agent / system prompt (your_agent is your own agent object)
        agent_result = your_agent.run(
            task=task,
            system_context="\n".join(context)
        )
        # 3. Store what was learned for next time
        mem.store(f"Task: {task} | Result: {str(agent_result)[:200]}", user_id=user_id)
        return agent_result

    # OpenAI / Anthropic direct: same pattern
    def chat(message: str) -> str:
        context = "\n".join(mem.recall(message, top_k=5))
        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"User context:\n{context}"},
                {"role": "user", "content": message}
            ]
        )
        answer = response.choices[0].message.content
        mem.store(f"{message} → {answer}")
        return answer
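The recall, run, store pattern above can be factored into one reusable wrapper so each agent does not repeat it. The sketch below is illustrative only: `MemoryLike`, `StubMemory`, `with_memory`, and `echo_agent` are hypothetical names, and `StubMemory` is an offline stand-in that simply returns the most recent entries. It does not reproduce BECOMER's retrieval.

```python
from typing import Callable, List, Protocol


class MemoryLike(Protocol):
    """Anything exposing recall() and store(), such as the client shown above."""

    def recall(self, query: str, top_k: int = 5) -> List[str]: ...
    def store(self, content: str) -> None: ...


class StubMemory:
    """Offline stand-in: returns the most recent entries, with no ranking."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def recall(self, query: str, top_k: int = 5) -> List[str]:
        return self.items[-top_k:]

    def store(self, content: str) -> None:
        self.items.append(content)


def with_memory(
    mem: MemoryLike, run: Callable[[str, str], str], top_k: int = 5
) -> Callable[[str], str]:
    """Wrap run(task, context) so it recalls before and stores after."""

    def wrapped(task: str) -> str:
        context = "\n".join(mem.recall(task, top_k=top_k))
        result = run(task, context)
        mem.store(f"Task: {task} | Result: {result[:200]}")
        return result

    return wrapped


def echo_agent(task: str, context: str) -> str:
    """Placeholder for a real agent or LLM call."""
    return f"done: {task} (context lines: {len(context.splitlines())})"


mem = StubMemory()
agent = with_memory(mem, echo_agent)
first = agent("design schema")       # runs with no prior context
second = agent("write migrations")   # sees the stored result of the first task
```

Because the wrapper depends only on the two-method protocol, swapping `StubMemory` for a real client should not require changes to the agent function.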
02 — Benchmarks
Benchmark results

Tested against LongMemEval (n=500) and LOCOMO (n=1,978) — the two industry-standard memory benchmarks. BECOMER matches mem0's LongMemEval score at zero LLM tokens. Same accuracy. 6,787 fewer tokens per query.

| Benchmark | BECOMER | BECOMER LLM tokens/query | Mem0 (June 2026) | Mem0 tokens/query |
|---|---|---|---|---|
| LongMemEval (overall) | 94.4% (0 tokens) | 0 | 94.4% | ~6,787 |
| LME: Temporal reasoning | ~93% | 0 | ~93% | ~6,787 |
| LME: Knowledge update | ~95% | 0 | 96.2% | ~6,787 |
| LME: Multi-session | ~87% | 0 | 86.5% | ~6,787 |
| LOCOMO (overall) | 69.5% (retrieval only) | 0 | 91.6% | ~6,956 |
| LOCOMO: Temporal | 76.9% | 0 | 92.8% | ~6,956 |
| LOCOMO: Multi-hop | 59.6% | 0 | 93.3% | ~6,956 |
How to read these numbers: BECOMER matches mem0's LongMemEval score (94.4%) while using zero LLM tokens per query. Mem0 uses ~6,787 tokens per recall. Same accuracy, 6,787 fewer tokens — every single query. On LOCOMO, systems that add an LLM reasoning pass score higher; BECOMER retrieves the right context and your own LLM reasons on top. We retrieve, you reason.
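To see what the per-query difference adds up to, the arithmetic can be written out. This sketch uses the approximate 6,787-token figure quoted above; the query volume and the price per million tokens are hypothetical inputs, so substitute your own model's rate.

```python
MEM0_TOKENS_PER_QUERY = 6_787   # approximate figure quoted above
BECOMER_TOKENS_PER_QUERY = 0    # retrieval only, no LLM call


def tokens_saved(queries: int) -> int:
    """Total LLM tokens avoided over a number of recall queries."""
    return queries * (MEM0_TOKENS_PER_QUERY - BECOMER_TOKENS_PER_QUERY)


def cost_saved(queries: int, usd_per_million_tokens: float) -> float:
    """Dollar value of those tokens at an assumed (hypothetical) price."""
    return tokens_saved(queries) / 1_000_000 * usd_per_million_tokens


monthly_queries = 100_000  # hypothetical volume
saved = tokens_saved(monthly_queries)
```

At 100,000 recalls per month the difference is 678.7 million tokens; the dollar figure depends entirely on the rate passed to `cost_saved`.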
Full methodology + 30 iterations + competitor comparison →
  • Recall, p50: ~150ms
  • Store, p50: ~100ms
  • Recall, p99: <500ms
  • LLM wait time: 0ms

Measured on CPU-only server. No GPU. No LLM calls. Pure retrieval engine.
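Latency seen from your own network will differ from server-side figures, so it can be worth measuring p50 and p99 yourself. The sketch below times any callable and reports both percentiles; `latency_percentiles` and `fake_recall` are hypothetical names, and `fake_recall` only does local work so the example runs offline. Replace it with a real recall call to measure end-to-end latency.

```python
import statistics
import time
from typing import Callable, Dict


def latency_percentiles(call: Callable[[], object], runs: int = 200) -> Dict[str, float]:
    """Time `call` repeatedly and return p50 and p99 in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


def fake_recall() -> None:
    """Placeholder for a real recall call; does a little local work instead."""
    sum(range(1000))


stats = latency_percentiles(fake_recall, runs=100)
```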

03 — Architecture
Built differently

Most memory systems run an LLM under the hood — every recall spends tokens. BECOMER doesn't, so there's nothing to pay per query.

🔗
One key. Every LLM. Shared memory.
Your GPT-4 app, Claude app, and LangChain agent all read and write to the same memory store — no syncing, no duplication.
    # App 1: GPT stores a memory
    mem = Client("bk-same-key")
    mem.store("User is Sarah, prefers TypeScript")

    # App 2: Claude recalls it instantly
    mem = Client("bk-same-key")
    mem.recall("who is the user?")
    # → ["User is Sarah, prefers TypeScript"]
Zero tokens consumed
Storage and retrieval happen in our engine. No LLM calls. No per-query token cost.
🔌
Any LLM, any stack
Works with GPT, Claude, Gemini, Llama, Mistral, or your own fine-tuned model.
🔒
Memory stays in our engine
Your users' memories are never sent to a third-party LLM. Pure retrieval, no data leakage.
📊
30+ benchmark iterations
Zero regressions across all phases. 49/49 tests pass. Built to production quality.
📈
Every run teaches the next one
Every store() compounds. An agent that ran 100 tasks has 100 learnings to draw on for task 101 — at zero extra cost. Your pipeline gets smarter the more it runs.
04 — Use Cases
Built for what AI is becoming.

Single-turn chatbots are yesterday. Agents, pipelines, and self-improving systems need persistent memory that works across sessions, LLMs, and processes. The longer they run, the smarter they get.

Shared memory across agents

Multiple agents working on the same task — researcher, executor, reviewer — share a single memory namespace. No message passing. No coordination code. Zero tokens on every recall.

  • Research agent stores findings (GPT, Claude, or any LLM)
  • Executor recalls with zero tokens (different process, same namespace)
  • Reviewer sees full context (no message passing needed)
    # Research agent (GPT-4o)
    mem = Client("key", user_id="task-abc")
    mem.store("API endpoint: POST /v2/payments, OAuth2")
    mem.store("Rate limit: 100 req/min, 429 on breach")
    mem.store("Auth token expires in 3600s")

    # Executor agent (Claude), different process
    # No message passing needed
    mem = Client("key", user_id="task-abc")
    ctx = mem.recall("payment API details")
    # → ["API endpoint: POST /v2/payments, OAuth2",
    #    "Rate limit: 100 req/min...",
    #    "Auth token expires in 3600s"]

    # Reviewer agent (Gemini), same namespace
    audit = mem.recall("what was implemented?")
Systems that learn from themselves

Store every attempt with its outcome. Recall what worked before the next run. Semantic retrieval surfaces similar past attempts even with different phrasing — no structured query language needed.

  • Iteration 1: zero-shot prompt, 71%
  • Iteration 2: chain-of-thought, 84%
  • Iteration 3 (recalled best approach): few-shot + CoT, 91%
    # Store every attempt with outcome
    mem.store("Approach: zero-shot. Score: 71%. "
              "Weakness: misses edge cases")
    mem.store("Approach: chain-of-thought. Score: 84%. "
              "Strength: handles multi-step reasoning")

    # Before next run, recall what worked
    history = mem.recall(
        "what approaches scored highest?",
        top_k=10
    )

    # Inject learnings into system prompt
    system = "Previous learnings:\n" + "\n".join(history)

    # System now builds on its own history
    # Gets smarter every iteration
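Recalled attempts come back as free text, so picking the best one programmatically means parsing the score out again. The sketch below assumes entries follow the "Approach: X. Score: NN%." wording used in the example above; `parse_attempt` and `best_attempt` are hypothetical helpers, and entries that do not match are skipped rather than guessed at.

```python
import re
from typing import List, Optional, Tuple

# Matches the "Approach: <name>. Score: <number>%" wording used when storing.
SCORE_PATTERN = re.compile(
    r"Approach:\s*(?P<name>.+?)\.\s*Score:\s*(?P<score>\d+(?:\.\d+)?)%"
)


def parse_attempt(entry: str) -> Optional[Tuple[str, float]]:
    """Return (approach, score) for a stored attempt, or None if it does not match."""
    match = SCORE_PATTERN.search(entry)
    if match is None:
        return None
    return match.group("name"), float(match.group("score"))


def best_attempt(history: List[str]) -> Optional[Tuple[str, float]]:
    """Pick the highest-scoring attempt among the recalled entries."""
    parsed = [p for p in map(parse_attempt, history) if p is not None]
    return max(parsed, key=lambda item: item[1], default=None)


history = [
    "Approach: zero-shot. Score: 71%. Weakness: misses edge cases",
    "Approach: chain-of-thought. Score: 84%. Strength: handles multi-step reasoning",
    "Unrelated note without a score",
]
best = best_attempt(history)
```

Keeping a consistent wording at store time is what makes this kind of parsing reliable; a structured prefix costs nothing at write time.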
Pick up where you left off

Agents that run for hours, get interrupted, restart days later — and continue exactly where they stopped. No manual state management. No checkpoint files.

  • Day 1, session ends: progress stored in one call
  • Day 3, new session: full context recalled instantly
  • Continues from step 7: no files, no checkpoints
    # Day 1: agent works, then gets interrupted
    mem.store("Completed steps 1-6: schema designed")
    mem.store("Step 7 in progress: auth middleware")
    mem.store("Blocked: choosing between authlib "
              "vs python-jose - evaluate tomorrow")

    # Day 3: fresh session, full context restored
    status = mem.recall("what was I working on?")
    blockers = mem.recall("what am I blocked on?")
    # → ["Step 7 in progress: auth middleware"]
    # → ["Blocked: choosing between authlib..."]

    # Agent continues from exactly step 7
    # No files, no checkpoints, no setup
05 — Pricing
Simple pricing

Start free. Scale when you're ready.

Free
/ month
  • 1,000 API calls / month
  • Unlimited memories stored
  • REST API + MCP
  • Dashboard access
Get started free
Team
Coming soon
  • 500,000 API calls / month
  • Unlimited memories
  • Team API keys
  • SLA + dedicated support
06 — Security
Your data, sealed shut.

We are infrastructure, not an audience. Your stored content is encrypted, isolated per account, and never sent to any third party for processing.

🔒
Encrypted at rest
Your memories sit in an encrypted database. Even direct disk access reveals nothing readable.
🚫
No third-party processing
Your data is never sent to any external LLM, AI service, or analytics provider — and never sold or shared. Stored on dedicated cloud infrastructure, isolated per account.
🌐
Compliance built in
DPDP Act 2023 compliant. CCPA-aligned for US users. Your data stays on our infrastructure and never crosses into a third-party pipeline.
🧱
Isolated per account
Database-level isolation means one customer can never read another's memories. Enforced by the engine, not just code.
07 — FAQ
Common questions
How does BECOMER use zero tokens?
BECOMER's retrieval engine handles storage and recall without calling any language model. Your LLM call happens outside BECOMER: we return the relevant context for you to inject into your prompt. No GPT, Claude, or Gemini is invoked inside the memory layer.
How is my data protected?
Memories are encrypted at rest, isolated per account at the database level, and never sent to any external AI service or analytics provider. One account can never read another's data; this is enforced by the database engine, not just application code.
Can one API key serve multiple users?
Yes. Pass a user_id in any API call and BECOMER automatically isolates that user's memories under your master key. One key covers your entire user base.

    mem.store("Alice loves TypeScript", user_id="alice-123")
    mem.recall("preferences", user_id="alice-123")  # → Alice's only

Each user_id is fully isolated: one user can never read another's memories. Billing counts against the master key. Browse and delete per-user memories from the Users tab in your dashboard.
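One way to picture the isolation rule is a toy in-memory model in which every memory is filed under a (master key, user_id) pair. This is purely illustrative and says nothing about how BECOMER implements isolation: `ToyStore` is hypothetical, and its recall uses naive word matching rather than semantic retrieval.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class ToyStore:
    """Toy model: one list of memories per (master_key, user_id) namespace."""

    def __init__(self) -> None:
        self._data: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)

    def store(self, master_key: str, content: str, user_id: str) -> None:
        self._data[(master_key, user_id)].append(content)

    def recall(self, master_key: str, query: str, user_id: str) -> List[str]:
        words = query.lower().split()
        # .get() avoids creating an empty namespace for unknown users.
        entries = self._data.get((master_key, user_id), [])
        return [e for e in entries if any(w in e.lower() for w in words)]


toy = ToyStore()
toy.store("bk-key", "Alice loves TypeScript", user_id="alice-123")
toy.store("bk-key", "Bob prefers Go", user_id="bob-456")

alice_view = toy.recall("bk-key", "loves prefers", user_id="alice-123")
bob_view = toy.recall("bk-key", "loves prefers", user_id="bob-456")
stranger_view = toy.recall("bk-key", "loves prefers", user_id="carol-789")
```

The same query returns different results per user_id because the lookup never leaves that user's namespace.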
What happens when I exceed my quota?
You receive an HTTP 402 response with a quota_exceeded error. Upgrade to Pro from your dashboard to continue. Quotas reset on the 1st of each month.
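Callers will usually want to tell a quota error apart from other failures so they can degrade gracefully, for example by running the LLM call without recalled context. The sketch below is a minimal illustration: the 402 status and the `quota_exceeded` code come from the answer above, while the JSON body shape (an `error` key) and the names `ApiOutcome` and `interpret_response` are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiOutcome:
    ok: bool
    quota_exceeded: bool
    detail: str


def interpret_response(status: int, body: str) -> ApiOutcome:
    """Classify an HTTP response from the memory API."""
    if 200 <= status < 300:
        return ApiOutcome(ok=True, quota_exceeded=False, detail="")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    # Assumption: the error code is exposed under an "error" key.
    error = payload.get("error", "") if isinstance(payload, dict) else ""
    # The status code alone decides; the body is used only for the message.
    return ApiOutcome(ok=False, quota_exceeded=(status == 402), detail=str(error) or body)


over_quota = interpret_response(402, '{"error": "quota_exceeded"}')
server_error = interpret_response(500, "oops")
```

Keying the decision on the status code rather than the body means the check still works if the error payload differs from the shape assumed here.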
Is there a JavaScript / TypeScript SDK?
Yes. Install with npm install @becomerpackage/sdk (zero dependencies, Node 18+, TypeScript types included). Python SDK: pip install becomer.
How do I use BECOMER with Claude Desktop or Cursor?
Run python -m becomer with your BECOMER_API_KEY set and add it to your mcp.json. Claude Desktop and Cursor automatically call store and recall as MCP tools, with no code changes needed. For multi-tenant apps, also set BECOMER_USER_ID to scope memory to a specific end-user. Full config in docs.
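For multi-tenant setups it can be convenient to generate the MCP entry rather than edit it by hand. The sketch below builds the same structure as the mcp.json example in the Integration section and adds BECOMER_USER_ID when a user is given; `build_mcp_config` is a hypothetical helper, and where the resulting file must live depends on your MCP client.

```python
import json
from typing import Optional


def build_mcp_config(api_key: str, user_id: Optional[str] = None) -> dict:
    """Return an mcp.json-style dict for the BECOMER MCP server."""
    env = {"BECOMER_API_KEY": api_key}
    if user_id is not None:
        # Scopes memory to one end-user, as described above.
        env["BECOMER_USER_ID"] = user_id
    return {
        "mcpServers": {
            "becomer": {
                "command": "python",
                "args": ["-m", "becomer"],
                "env": env,
            }
        }
    }


config = build_mcp_config("YOUR_API_KEY", user_id="alice-123")
config_text = json.dumps(config, indent=2)
```

Writing `config_text` to your client's mcp.json location completes the setup; omit `user_id` for a single-user install.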
Can I self-host BECOMER?
No. BECOMER is a managed cloud service. There is no self-hosted option at this time.
What happens to my data if I cancel?
Your memories are deleted on cancellation. Export everything from the dashboard first using the export button, which downloads all your stored memories as a text file.
How does BECOMER compare to mem0, Zep, and Hindsight?
Side-by-side benchmark scores, token cost, pricing, and feature comparison.
See full comparison →