
Every time you start a new session with your AI coding agent, it starts from scratch.
It doesn't remember that you use pnpm, not npm. It doesn't remember that the auth middleware is in src/middleware/auth.rs, not src/auth/. It doesn't remember that you spent 45 minutes last Tuesday correcting it on exactly that. Every session, the same mistakes, the same detours, the same corrections.
Frustrated with that, I pointed out to my agent that it was stupid and couldn't remember sh**. After telling me I was absolutely right, it joined me on a journey to build a solution that has been working tremendously well for me: memelord.
Memelord is a fully local memory system for agents built with Turso and a local model. In this post, we will investigate how it works, what worked well, and how you too can use some of the insights here to build a similar system.
There's an obvious fix for the agentic memory issue that most people reach for: just put everything in .md files that will end up in the system prompt. But that doesn't scale. Your context window is a finite resource, and dumping every lesson the agent has ever learned into it is a terrible use of it. You end up either choking the context with irrelevant noise or trimming it so aggressively that you lose the things that matter.
What you actually need is retrieval. You want relevant memories injected at the right moment, when the agent starts a task. You want memories that improve based on whether they were actually useful. And you want them to persist across sessions without any ceremony.
Memelord is an in-process memory system for coding agents. Every project gets its own SQLite database (stored in .memelord/memory.db), powered by Turso. Memories are embedded locally using all-MiniLM-L6-v2 (no API key required) and retrieved via vector similarity search at task start.
The memory lifecycle looks like this:
Session starts
|
+-- Top memories injected into context (SessionStart hook)
|
+-- Agent calls memory_start_task("fix the auth bug")
| \-- Vector search retrieves relevant memories:
| "Auth middleware is in src/middleware/auth.rs, not src/auth/"
| "Always run 'make check' before committing"
|
+-- Agent works on the task...
|
+-- Agent self-corrects and adapts --> memory_report(type: "correction", "insight", etc)
| "Tried src/config.json but config is actually in .env.local"
|
+-- Agent finishes --> memory_end_task(ratings)
| Rates each retrieved memory 0-3 (ignored -> directly applied)
|
\-- SessionEnd: embed new memories, run weight decay
The agent stores several kinds of memories, such as corrections and insights.
Memories are not equal. Each one carries a weight that changes over time based on feedback. Memories that consistently help get promoted. Memories that turn out to be wrong or irrelevant get demoted and eventually garbage collected.
Each memory has a weight, updated via exponential moving average based on how useful the agent found it. A memory the agent consistently rates as directly applied (score 3) survives and gets promoted. A memory that consistently gets rated as irrelevant (score 0) decays. A memory the agent flags as actively wrong gets deleted immediately via memory_contradict.
There's also a time decay component: memories that go unused across sessions gradually lose weight, even if they were once useful. Projects evolve. What was true six months ago may not be true today.
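The post doesn't give exact constants, but the update rule can be sketched as an exponential moving average over normalized ratings, plus a per-session decay. In this TypeScript sketch, ALPHA, TIME_DECAY, and the garbage-collection threshold are illustrative assumptions, not memelord's actual values:

```typescript
// Hedged sketch of the weight lifecycle; constants are assumptions.
const ALPHA = 0.3;        // EMA smoothing factor (assumption)
const TIME_DECAY = 0.98;  // per-session decay for unused memories (assumption)
const GC_THRESHOLD = 0.1; // below this, a memory is garbage collected (assumption)

// Ratings from memory_end_task: 0 (ignored) .. 3 (directly applied).
function updateWeight(weight: number, score: number): number {
  const usefulness = score / 3; // normalize rating to [0, 1]
  return (1 - ALPHA) * weight + ALPHA * usefulness;
}

// Applied at SessionEnd to memories that were never retrieved.
function decayUnused(weight: number): number {
  return weight * TIME_DECAY;
}

function shouldCollect(weight: number): boolean {
  return weight < GC_THRESHOLD;
}

// A memory rated "directly applied" across five sessions climbs toward 1.
let w = 0.5;
for (let i = 0; i < 5; i++) w = updateWeight(w, 3);
```

With these constants, a memory rated 0 every time it's retrieved drops below the collection threshold after about five updates; ALPHA trades responsiveness against stability.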
Turso is a ground-up rewrite of the beloved SQLite database. Among the many things it adds is native support for vector search, with no external dependency required.
Memelord uses Turso's native vector32 (or optionally vector8, for a more compact, 8-bit representation) type and cosine similarity to do semantic retrieval at task start. This is what makes memory retrieval actually useful. You're asking "which past experiences are most relevant to what I'm trying to do right now?" and getting a ranked answer, not just keyword matches.
The query looks like this:
SELECT id, content, weight, category
FROM memories
ORDER BY vector_distance_cos(embedding, vector32(?))
LIMIT 10;
That's it. No external service, no extensions, no embeddings API. The vector index lives right next to the memories in a single .db file, per project, on disk.
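For context, here's a minimal sketch of the table shape that query assumes. The column names beyond those in the query, and the F32_BLOB vector-column declaration, are my assumptions based on Turso's vector conventions; memelord's actual schema may differ. The 384 dimension matches all-MiniLM-L6-v2's output size:

```sql
-- Sketch only: column names and the vector declaration are assumptions;
-- check memelord's source for the real schema.
CREATE TABLE memories (
  id        INTEGER PRIMARY KEY,
  content   TEXT NOT NULL,
  category  TEXT NOT NULL,
  weight    REAL NOT NULL DEFAULT 0.5,
  embedding F32_BLOB(384)  -- 384 floats: all-MiniLM-L6-v2's output size
);

-- Inserts pass the embedding as JSON-style text through vector32(),
-- which packs it into the compact binary representation, e.g.:
-- INSERT INTO memories (content, category, embedding)
-- VALUES ('...', 'insight', vector32('[0.01, -0.03, ...]'));
```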
SQLite has always had the right shape for this kind of thing: a file you can carry around, copy, snapshot, and reason about. Turso extends that with the vector primitives that modern agentic workflows require.
Getting memelord running with Claude Code takes about 30 seconds:
npm install -g memelord
cd your-project
memelord init
Then restart Claude Code. That's it.
memelord init sets up an MCP server (.mcp.json) so the agent can call memory tools, hooks in ~/.claude/settings.json to instrument the agent lifecycle, and a .memelord/ directory for the database.
The hooks do the heavy lifting automatically. At session start, relevant memories get injected into context. At session end, new memories are embedded and weights decay. The agent just needs to call two MCP tools:
memory_start_task("fix the auth bug") // at the beginning of a task
memory_end_task(taskId, ratings) // when done, rating each retrieved memory
Tracking self-corrections, detecting expensive explorations, flagging tool failures: all of that is handled by the hooks in the background.
If you're building your own agent rather than using Claude Code, the SDK gives you full control. The only thing you need to bring is an embedding function. This keeps the SDK slim, avoids a mandatory dependency on all-MiniLM-L6-v2, and lets you plug in stronger models:
import { createMemoryStore } from "memelord";

const store = createMemoryStore({
  dbPath: ".memelord/memory.db",
  sessionId: crypto.randomUUID(),
  embed: async (text) => {
    // bring your own embedding function
    return yourEmbedFunction(text);
  },
});

await store.init();

// At the start of a task
const { taskId, memories } = await store.startTask("refactor the payment module");

// memories is already ranked by relevance
for (const m of memories) {
  console.log(`[${m.category}] ${m.content}`);
}

// Store a correction mid-task
await store.reportCorrection({
  lesson: "Payment config is in src/payments/config.ts, not at the root",
  whatFailed: "Looked in config.json",
  whatWorked: "Found it in src/payments/config.ts",
});

// End the task with ratings
await store.endTask(taskId, {
  tokensUsed: 18000,
  toolCalls: 42,
  completed: true,
  selfReport: memories.map((m) => ({ memoryId: m.id, score: 2 })),
});
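The embed callback just has to return a fixed-length numeric vector. To exercise the store without downloading a model, a deterministic toy embedding is enough. This is purely a stand-in of my own: it hashes character trigrams into buckets and carries no real semantics, so swap in a real model for actual use:

```typescript
// Toy stand-in embedding: hash character trigrams into a fixed number of
// buckets and L2-normalize. Deterministic and dependency-free, but NOT a
// semantic embedding -- use a real model (e.g. all-MiniLM-L6-v2) in practice.
function toyEmbed(text: string, dims = 384): number[] {
  const v = new Array(dims).fill(0);
  for (let i = 0; i + 3 <= text.length; i++) {
    const gram = text.slice(i, i + 3);
    let h = 0;
    for (const c of gram) h = (h * 31 + c.charCodeAt(0)) >>> 0;
    v[h % dims] += 1; // bump the bucket this trigram hashes to
  }
  // L2-normalize so cosine distance behaves sensibly downstream
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}
```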
A lot of the conversation around making coding agents better focuses on the model. That matters, but for day-to-day coding work, a lot of the friction comes from the agent having no persistent context about your specific codebase or how you like to work.
A SQLite file per project, with vector search and a feedback loop, is about as simple a solution to this problem as you can get.
If you are interested in memelord, go check it out: github.com/glommer/memelord. Or if you want to build your own version of this, check out Turso today!