Ollama is a powerful open-source tool for running large language models locally, supporting tasks such as text generation, summarization, and question answering.
In this article, we'll look at how you can use Ollama with Turso to build a Retrieval-Augmented Generation (RAG) system that works locally and offline.
Whether you're building a document search system, a technical support chatbot, or a content recommendation engine, this local-first approach provides the perfect foundation for secure, efficient, and cost-effective RAG implementations.
RAG has become a crucial technique for enhancing LLM responses with relevant context. Traditional cloud-based RAG solutions come with several challenges that local RAG can address:
Sending sensitive documents to cloud services exposes your data to potential security risks and compliance issues. Local RAG keeps your data entirely within your control, whether you use Turso's Vector on your server or on device.
Cloud vector databases can become expensive as data grows. With Turso's libSQL there is no separate managed vector store to pay for; your vectors live in the same inexpensive SQLite file as the rest of your data, so costs stay predictable as your dataset grows.
By embedding your database locally, whether on-device or on your own server, you can query your data directly, without a network round trip.
With Turso's Embedded Databases and Ollama running locally, your RAG architecture remains functional without an internet connection. This makes it ideal for offline-first document search, support chatbots, and recommendation features that need to keep working without connectivity.
Turso's libSQL provides a unified solution for data storage: documents, metadata, and vector embeddings all live in the same SQLite database and are queryable with plain SQL.
For local embedding generation, Ollama provides efficient processing with popular models like Mistral, making it an excellent choice for local-first architectures.
Let's build a local RAG system that stores and retrieves movie data. You can adapt this to your use case, such as a local search engine, a chatbot, or a PDF chatbot.
We'll first build the database and cover the retrieval with vector similarity search to show how easy that is, then adapt it to follow a RAG pattern.
First, install the following dependencies:
npm install @libsql/client ollama
Install Ollama locally, and then run:
ollama run mistral
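Before wiring up the database, you can optionally confirm that the local Ollama server is reachable and the model has been pulled. A minimal sketch using the ollama client library (model tags such as mistral:latest may differ on your machine):

import ollama from 'ollama';

// List the models Ollama has pulled locally and check for Mistral.
const { models } = await ollama.list();
const hasMistral = models.some((m) => m.name.startsWith('mistral'));

if (!hasMistral) {
  console.log('Mistral not found; run `ollama run mistral` first.');
}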
You'll notice below there is very little setup for the database — simply pass the libSQL client a file path, and that's it! There are no extensions to install.
We'll use libSQL Vector, which stores its data inside a SQLite file — the vector embeddings are stored as BLOBs, and the libSQL client handles the serialization and deserialization for you.
import { createClient } from '@libsql/client';
const client = createClient({
url: 'file:local.db',
});
// Initialize the database schema
await client.batch(
[
// Create table with a vector embedding column (F32_BLOB)
'CREATE TABLE IF NOT EXISTS movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL, description TEXT NOT NULL, embedding F32_BLOB(4096))',
// Create a vector index for similarity search
'CREATE INDEX IF NOT EXISTS movies_embedding_idx ON movies(libsql_vector_idx(embedding))',
],
'write',
);
Let's break down the important parts:

F32_BLOB(4096) — This column type stores 32-bit floating-point vectors with 4096 dimensions (matching Mistral's embedding size).
libsql_vector_idx — Creates an index optimized for vector similarity search.
file:local.db — Stores everything in a local SQLite file.

We'll use Ollama's local API to generate embeddings:
import ollama from 'ollama';
async function getEmbedding(prompt: string) {
const response = await ollama.embeddings({
model: 'mistral',
prompt,
});
return response.embedding;
}
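As a quick sanity check, you can confirm that the embedding length matches the 4096 dimensions declared in the schema (assuming you're embedding with Mistral, as above):

const testEmbedding = await getEmbedding('A quick test sentence.');
// Mistral produces 4096-dimensional embeddings, matching F32_BLOB(4096).
console.log(testEmbedding.length); // 4096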
Here's how you can insert documents (in our case, movies) with their embeddings:
async function insertMovie(title: string, description: string) {
const embedding = await getEmbedding(description);
await client.execute({
sql: `
INSERT INTO movies (title, description, embedding)
VALUES (?, ?, vector(?))
`,
args: [title, description, JSON.stringify(embedding)],
});
}
Note the vector() SQL function — this converts the JSON array of embeddings into libSQL's vector format.
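Once you've inserted a row or two, you can also inspect what's actually stored. A small sketch, assuming the vector_extract() function is available in your libSQL version; it renders the BLOB back as a JSON-style text array:

// Read one stored embedding back as text to verify the round trip
const sample = await client.execute(
  'SELECT title, vector_extract(embedding) AS embedding_text FROM movies LIMIT 1',
);
console.log(sample.rows[0]);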
The real power comes from finding similar documents using vector similarity search:
async function findSimilarMovies(description: string, limit = 3) {
const queryEmbedding = await getEmbedding(description);
const results = await client.execute({
sql: `
WITH vector_scores AS (
SELECT
rowid as id,
title,
description,
embedding,
1 - vector_distance_cos(embedding, vector32(?)) AS similarity
FROM movies
ORDER BY similarity DESC
LIMIT ?
)
SELECT id, title, description, similarity
FROM vector_scores
`,
args: [JSON.stringify(queryEmbedding), limit],
});
return results.rows;
}
Let's break down the important parts about the search:

vector_distance_cos calculates the cosine distance between vectors.
1 - distance converts that distance to a similarity (higher is better).
vector32(?) converts the JSON embedding array to a vector.

// Insert some sample movies
const sampleMovies = [
{
title: 'Inception',
description:
'A thief who enters the dreams of others to steal secrets from their subconscious.',
},
{
title: 'The Matrix',
description:
'A computer programmer discovers that reality as he knows it is a simulation created by machines.',
},
{
title: 'Interstellar',
description:
'Astronauts travel through a wormhole in search of a new habitable planet for humanity.',
},
];
for (const movie of sampleMovies) {
await insertMovie(movie.title, movie.description);
console.log(`Inserted: ${movie.title}`);
}
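For a handful of rows, the loop above is fine. If you're loading a larger corpus, you may prefer to generate the embeddings first and write them in a single batch. A sketch using the same schema (insertMovies is an illustrative helper, not part of the example above):

async function insertMovies(movies: { title: string; description: string }[]) {
  // Generate all embeddings up front, then write them in one batch.
  const statements = await Promise.all(
    movies.map(async (movie) => ({
      sql: 'INSERT INTO movies (title, description, embedding) VALUES (?, ?, vector(?))',
      args: [
        movie.title,
        movie.description,
        JSON.stringify(await getEmbedding(movie.description)),
      ],
    })),
  );
  await client.batch(statements, 'write');
}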
If you wanted to stop here, and just focus on semantic search, you could use the above code to build a local search engine like this:
const query =
'A sci-fi movie about virtual reality and artificial intelligence';
const similarMovies = await findSimilarMovies(query);
console.log('\nSimilar movies found:');
similarMovies.forEach((movie) => {
console.log(`\nTitle: ${movie.title}`);
console.log(`Description: ${movie.description}`);
console.log(`Similarity: ${movie.similarity.toFixed(4)}`);
});
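One note before moving on: the query in findSimilarMovies sorts every row by similarity, which is fine for a small table like this one. For larger datasets you can query the vector index we created earlier instead. A minimal sketch, assuming your libSQL version ships the vector_top_k() table-valued function (findSimilarMoviesIndexed is just an illustrative name):

async function findSimilarMoviesIndexed(description: string, k = 3) {
  const queryEmbedding = await getEmbedding(description);
  // vector_top_k() walks the ANN index and returns the rowids of the
  // k nearest neighbours, which we join back to movies for the full rows.
  const results = await client.execute({
    sql: `
      SELECT movies.title, movies.description
      FROM vector_top_k('movies_embedding_idx', vector32(?), ?)
      JOIN movies ON movies.rowid = id
    `,
    args: [JSON.stringify(queryEmbedding), k],
  });
  return results.rows;
}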
But let's take it a step further and make it RAG.
Now use the findSimilarMovies function and pass the retrieved context to Ollama to generate a response to the user's prompt.
async function generateResponse(query: string) {
const similarMovies = await findSimilarMovies(query);
const context = similarMovies
.map(
(movie) =>
`Title: ${movie.title}\nDescription: ${
movie.description
}\nSimilarity Score: ${movie.similarity.toFixed(4)}`,
)
.join('\n\n');
const prompt = `
You are a knowledgeable movie expert.
Use the following movie information to answer the user's question.
Only use information from the provided context.
If the context doesn't contain enough information to answer fully, acknowledge this limitation.
Context:
${context}
User Question: ${query}
Instructions:
1. Base your response only on the movies provided in the context
2. Consider the similarity scores when weighing the relevance of each movie but don't reference it in your response
3. If a movie is only tangentially related, mention this
4. If the context doesn't provide enough information, acknowledge this limitation
Response:`;
const response = await ollama.chat({
model: 'mistral',
messages: [
{
role: 'user',
content: prompt,
},
],
});
return {
response: response.message.content,
sourceDocuments: similarMovies,
};
}
The key difference is that RAG doesn't just find similar documents — it uses those documents as context for the LLM to generate a more informed response.
const result = await generateResponse(
'What movies involve artificial intelligence?',
);
console.log('\nGenerated Response:', result.response);
console.log('\nSource Documents:', result.sourceDocuments);
You should see that the generated response looks something like this:
In the movies provided within the given context, "The Matrix" might be the one that hints at the concept of artificial intelligence. While not explicitly stating AI, it presents a reality that is a simulation created by machines, which could be interpreted as a form of advanced artificial intelligence.
This local RAG implementation offers a powerful combination of features that make it an excellent choice for many applications:
Privacy and Security
Performance Benefits
Cost and Resource Efficiency
Developer Experience
Flexibility and Extensibility
By combining libSQL's vector capabilities with Ollama's local embeddings, you can build powerful, privacy-preserving applications that run entirely on your own infrastructure — without the need for external cloud providers.
Think of Turso like iCloud for your vector database. Just as iCloud seamlessly syncs your Notes, Reminders, and Documents across all your Apple devices, Turso handles the synchronization of your vector embeddings and associated data across multiple instances of your application.
Everything in this post is built with the libSQL client locally, using a single SQLite file. Extending this guide to use Turso is straightforward, and comes with many benefits:
Multi-Device Synchronization
Tenant Isolation
Hybrid Architecture
Updating our RAG code to work with Turso is simple — keep the url pointing at your local SQLite file, and provide your Turso database URL to the syncUrl property to turn the local file into an embedded replica:

import { createClient } from '@libsql/client';

const client = createClient({
  // Local SQLite file the app reads from
  url: 'file:local.db',
  // Remote Turso database to sync with
  syncUrl: 'libsql://[your-database].turso.io',
  authToken: '...',
});
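Reads are served from the local file, and you control when the replica pulls changes from Turso: set a syncInterval (in seconds) when creating the client, or call sync() yourself, for example after inserting new documents:

// Pull the latest changes from the remote Turso database into local.db.
// Call this on a timer, after writes, or whenever suits your app.
await client.sync();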
Your application now gets the best of both worlds: fast, offline-capable queries against the local replica, with your data synchronized to Turso in the cloud.
This architecture opens up exciting possibilities for AI applications, from on-device semantic search to multi-device, multi-tenant RAG products.
Sign up for free to start building your offline-capable AI applications today.