
feat(knowledge): add Ollama embedding provider support#3714

Open
teedonk wants to merge 59 commits into simstudioai:staging from teedonk:feat/ollama-embedding-support

Conversation


@teedonk teedonk commented Mar 22, 2026

Summary

  • Adds Ollama as an embedding provider for knowledge bases, enabling fully offline/private RAG workflows
  • Implements per-KB dynamic pgvector tables to handle variable embedding dimensions across Ollama models
  • Auto-detects embedding dimension and context length from Ollama's /api/show endpoint
  • Adds provider selection UI with preset models (nomic-embed-text, mxbai-embed-large, all-minilm, etc.)
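The dimension/context auto-detection above can be sketched as a small parser over an /api/show response. The exact response shape is an assumption here (Ollama nests model metadata under `model_info` with architecture-prefixed keys such as `nomic-bert.embedding_length`), and the function name is illustrative rather than the PR's actual helper:

```typescript
interface DetectedModelInfo {
  embeddingDimension: number
  contextLength: number
}

// Scan model_info for the architecture-prefixed metadata keys and pull out
// the embedding dimension and context length.
function parseModelInfo(showResponse: {
  model_info?: Record<string, unknown>
}): DetectedModelInfo {
  const info = showResponse.model_info ?? {}
  let embeddingDimension = 0
  let contextLength = 0
  for (const [key, value] of Object.entries(info)) {
    if (typeof value !== 'number') continue
    if (key.endsWith('.embedding_length')) embeddingDimension = value
    if (key.endsWith('.context_length')) contextLength = value
  }
  if (embeddingDimension <= 0 || contextLength <= 0) {
    throw new Error('Model does not report embedding_length/context_length')
  }
  return { embeddingDimension, contextLength }
}
```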

Fixes #3037

  • Per-KB embedding tables solve the variable dimension problem
  • Context-aware chunking and smart batching prevent context length overflow
  • Cross-provider search normalizes distance scores across OpenAI and Ollama KBs
  • All existing OpenAI/Azure paths remain untouched

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

Tested with Ollama (nomic-embed-text) on local setup:

  • Created KB with Ollama provider — dimension auto-detected
  • Uploaded small and large documents (40+ pages) — all chunks processed successfully
  • Search returns relevant results across both OpenAI and Ollama KBs
  • All 4267 existing tests pass, zero TypeScript errors, Biome lint clean

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the [Contributor License Agreement (CLA)](./CONTRIBUTING.md#contributor-license-agreement-cla)

Video.Project.1.mp4


vercel bot commented Mar 22, 2026

@teedonk is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.


cursor bot commented Mar 22, 2026

PR Summary

High Risk
High risk because it adds a new embedding provider path that changes how embeddings are generated, stored (dynamic per-KB tables), and searched, plus introduces new network calls and validation for Ollama URLs/models that could impact reliability and security.

Overview
Adds Ollama (local) embedding provider support for knowledge bases, including UI selection, API validation, and end-to-end document ingestion/search for non-OpenAI embeddings.

Knowledge base creation now accepts ollama/* models and optional ollamaBaseUrl, validates the Ollama model via /api/show, auto-corrects embedding dimensions, and creates a dedicated per-KB kb_embeddings_{id} pgvector table; failures roll back via a new hardDeleteKnowledgeBase. Document processing and embedding generation are updated to route to OpenAI vs Ollama, cap chunk sizes based on Ollama context length, and write embeddings to either the shared embedding table (OpenAI) or the per-KB table (Ollama).

Search is updated to route queries per KB provider, generating embeddings per provider (and per unique Ollama model+URL), querying per-KB tables for Ollama, normalizing distances only when mixing providers, and tracking cost only for OpenAI calls. Tests and types are updated accordingly, and deletion clarifies that per-KB tables are not dropped on soft delete.
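The "one embedding per unique Ollama model+URL" behavior described above can be sketched as a grouping step before any embedding calls are made. Type and function names here are illustrative, not the PR's actual helpers:

```typescript
interface KBRef {
  id: string
  embeddingModel: string // e.g. 'ollama/nomic-embed-text'
  ollamaBaseUrl?: string
}

// Group knowledge bases by (model, baseUrl) so the search path generates one
// query embedding per unique pair instead of once per KB.
function groupByEmbeddingSource(kbs: KBRef[]): Map<string, KBRef[]> {
  const groups = new Map<string, KBRef[]>()
  for (const kb of kbs) {
    const key = `${kb.embeddingModel}@${kb.ollamaBaseUrl ?? 'http://localhost:11434'}`
    const bucket = groups.get(key)
    if (bucket) bucket.push(kb)
    else groups.set(key, [kb])
  }
  return groups
}
```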

Written by Cursor Bugbot for commit dbabedd. This will update automatically on new commits.


greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR adds Ollama as a fully offline/private embedding provider for knowledge bases. It introduces per-KB dynamic pgvector tables (kb_embeddings_{uuid}) to handle variable embedding dimensions across Ollama models, auto-detects model dimension and context length via Ollama's /api/show endpoint, adds provider selection UI, and wires cross-provider search with score normalization when mixing OpenAI and Ollama KBs.

Key design decisions:

  • Per-KB tables: Each Ollama KB gets its own kb_embeddings_{uuid} table with the exact vector dimension, avoiding dimension conflicts in a shared table.
  • ollamaBaseUrl is stored inside the chunkingConfig JSONB field to avoid a schema migration — pragmatic but couples embedding config to chunking config.
  • Chunk size is capped at 30% of the Ollama model's context length (using a conservative 3 chars/token estimate) to stay safely within context limits.
  • Score normalization (min-max to [0, 1]) is applied only when mixing OpenAI and Ollama results, leaving single-provider similarity scores unchanged.
  • Soft-delete preserves the per-KB table for restore compatibility; hardDeleteKnowledgeBase is scoped to creation-rollback only.
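The normalization decision above can be sketched as a per-provider min-max rescale that only kicks in when results from more than one provider are mixed. Field and function names are illustrative; whether the PR normalizes per provider group or across the full result set is not reproduced here:

```typescript
interface ScoredChunk {
  provider: 'openai' | 'ollama'
  distance: number
}

// Rescale each provider's distances to [0, 1] only when providers are mixed;
// single-provider results pass through untouched.
function normalizeMixedScores(results: ScoredChunk[]): number[] {
  const providers = new Set(results.map((r) => r.provider))
  if (providers.size <= 1) return results.map((r) => r.distance)
  const out = results.map((r) => r.distance)
  for (const p of providers) {
    const indexed = results
      .map((r, i) => [r, i] as const)
      .filter(([r]) => r.provider === p)
    const vals = indexed.map(([r]) => r.distance)
    const min = Math.min(...vals)
    const range = Math.max(...vals) - min || 1 // guard: all-equal scores
    for (const [r, i] of indexed) out[i] = (r.distance - min) / range
  }
  return out
}
```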

Issues from previous reviews that are now addressed:

  • ✅ Non-atomic delete + insert for Ollama embeddings — now wrapped in db.transaction
  • ✅ Orphaned KB row on table creation failure — hard-deleted in catch block
  • ✅ sql.raw for queryVector — replaced with parameterized `sql`${queryVector}::vector``
  • ✅ Document status update moved back inside the transaction
  • ✅ Native <select> replaced with the Select component from the UI library

Remaining findings:

  • The ollamaBaseUrl SSRF allowlist accepts any *.internal hostname, broader than needed for Docker use cases.
  • Tag definitions are fetched twice per KB per search request — a minor redundancy worth caching across the two passes.
  • The sql.raw usage in createKBEmbeddingTable is safe (dimension is validated) but lacks an explicit comment documenting why.
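The SSRF-allowlist finding above suggests an exact-match check rather than a blanket `*.internal` suffix. This sketch shows the shape of such a tighter check; it does not reproduce the PR's actual validation rules, and the function name is illustrative:

```typescript
// Validate a user-supplied Ollama base URL against an exact-match allowlist
// instead of accepting any *.internal hostname.
function isAllowedOllamaHost(rawUrl: string): boolean {
  let url: URL
  try {
    url = new URL(rawUrl)
  } catch {
    return false // not a parseable URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false
  // Exact hostnames only: covers localhost and the Docker-for-desktop alias
  // without opening up every *.internal name.
  const allowed = new Set(['localhost', '127.0.0.1', 'host.docker.internal'])
  return allowed.has(url.hostname)
}
```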

Confidence Score: 5/5

PR is safe to merge; all previously raised P0/P1 issues have been addressed and the remaining findings are minor style/hardening suggestions.

All critical issues from prior review rounds (non-atomic delete+insert, orphaned KB row, sql.raw injection, status-update regression, native select element) have been resolved in this iteration. The remaining findings are P2: an overly broad SSRF allowlist suffix, a double tag-definition fetch, and a missing safety comment on sql.raw DDL. None of these affect correctness or data integrity in the current implementation.

Flagged files: apps/sim/app/api/knowledge/route.ts (SSRF allowlist), apps/sim/app/api/knowledge/search/route.ts (duplicate tag definition fetch)

Important Files Changed

Filename Overview
apps/sim/lib/knowledge/dynamic-tables.ts New file: per-KB pgvector table lifecycle (create/drop/insert/search) with safe UUID-to-table-name conversion and validated dimension; sql.raw usage is safe given upfront validation but could benefit from an explicit safety comment.
apps/sim/app/api/knowledge/route.ts POST creates per-KB table after inserting the KB row, with a cleanup block that hard-deletes the orphaned row on table-creation failure; ollamaBaseUrl SSRF validation is solid but *.internal allowance is broader than needed.
apps/sim/app/api/knowledge/search/route.ts Routes search to per-KB tables for Ollama and shared embedding table for OpenAI; generates one query embedding per unique (model, URL) pair; normalizes distances only when mixing providers; tag definitions fetched twice per KB.
apps/sim/lib/knowledge/embeddings.ts Adds Ollama embedding generation via /api/embed with token-aware batching, context-length caching, and chunk truncation; module-level cache is a no-op in serverless (acknowledged in previous review).
apps/sim/lib/knowledge/documents/service.ts Embedding generation and insertion wrapped in a single db.transaction for both Ollama (per-KB table) and OpenAI paths; document status update moved inside the transaction; chunk size capped at 30% of Ollama context length.
apps/sim/lib/knowledge/service.ts Adds hardDeleteKnowledgeBase (creation-rollback only) and preserves ollamaBaseUrl in chunkingConfig JSONB during updates by reading the existing value before writing.
apps/sim/app/workspace/[workspaceId]/knowledge/components/create-base-modal/create-base-modal.tsx Adds provider toggle and Ollama-specific fields using the existing component library (Select, Input, Label); previous native select concern was addressed.
apps/sim/lib/chunkers/text-chunker.ts Adds Ollama-aware token estimation ratio (3 chars/token vs 4 for OpenAI) to produce smaller chunks that fit within model context limits.
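The context-aware chunk cap described in the table above (30% of context length at a conservative 3 chars/token) can be sketched as a small sizing function. The constants mirror the PR's description; the function name is an assumption:

```typescript
// Constants from the PR description: a conservative 3 characters per token
// for Ollama models, and a 30% budget of the model's context window.
const OLLAMA_CHARS_PER_TOKEN = 3
const CONTEXT_BUDGET_FRACTION = 0.3

// Cap the requested chunk size (in characters) so a chunk stays safely
// within the model's context length.
function maxChunkChars(contextLength: number, requestedChunkChars: number): number {
  const tokenBudget = Math.floor(contextLength * CONTEXT_BUDGET_FRACTION)
  const charBudget = tokenBudget * OLLAMA_CHARS_PER_TOKEN
  return Math.min(requestedChunkChars, charBudget)
}
```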

Sequence Diagram

sequenceDiagram
    participant UI as Create KB Modal
    participant API as POST /api/knowledge
    participant Ollama as Ollama /api/show
    participant DB as Database
    participant PG as PostgreSQL

    UI->>API: POST embeddingModel ollama/nomic-embed-text
    API->>Ollama: POST /api/show validate model and detect dimension
    Ollama-->>API: contextLength and embeddingLength
    API->>DB: INSERT knowledgeBase row
    API->>PG: CREATE TABLE kb_embeddings_uuid vector dimension
    alt Table creation fails
        API->>PG: DROP TABLE kb_embeddings_uuid
        API->>DB: DELETE knowledgeBase row
        API-->>UI: 500 error
    else Success
        API-->>UI: 201 with id and embeddingDimension
    end

    participant DocSvc as Document Service
    participant Embed as Ollama /api/embed

    DocSvc->>Ollama: GET context length cached 5min
    Ollama-->>DocSvc: contextLength
    DocSvc->>DocSvc: Cap chunkSize to 30pct of contextLength
    DocSvc->>Embed: POST model and chunks batch of 100
    Embed-->>DocSvc: embeddings array
    DocSvc->>DB: transaction DELETE old then INSERT into kb_embeddings_uuid then UPDATE status

Reviews (2): Last reviewed commit: "fix: batch Ollama embeddings by item cou..."

Comment on lines +611 to +614
if (embeddingProvider === 'ollama') {
  // Per-KB table: delete old chunks then bulk-insert new ones
  await deleteKBDocumentEmbeddings(knowledgeBaseId, documentId)
  await insertKBEmbeddings(knowledgeBaseId, embeddingRecords, kb[0].embeddingDimension)

P1 Non-atomic delete + insert risks data loss

The Ollama path deletes all existing embeddings for the document and then inserts new ones without a wrapping transaction. If insertKBEmbeddings fails mid-way (e.g., after a partial batch insert), the old embeddings have already been deleted but the new ones are only partially written — leaving the document with fewer embeddings or none at all, with no way to recover automatically.

The OpenAI path below correctly wraps both operations in a db.transaction. Consider wrapping the Ollama path similarly. Since deleteKBDocumentEmbeddings and insertKBEmbeddings both use db.execute / db.insert, they should participate in a transaction too.
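The rollback semantics the reviewer is asking for can be illustrated with an in-memory stand-in: a delete followed by a failed insert leaves the original rows intact. In the PR itself the same shape would use Drizzle's `db.transaction` with the transaction handle passed to both helpers; the class and method names here are purely illustrative:

```typescript
// Toy store mimicking transactional delete + insert: a failure anywhere in
// the callback restores the pre-transaction state.
class EmbeddingStore {
  rows: Map<string, number[]> = new Map()

  transaction(fn: (tx: Map<string, number[]>) => void): void {
    const snapshot = new Map(this.rows) // snapshot before any mutation
    try {
      fn(this.rows)
    } catch (err) {
      this.rows = snapshot // roll back: old embeddings survive the failure
      throw err
    }
  }
}
```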

Comment on lines 136 to +147
const newKnowledgeBase = await createKnowledgeBase(createData, requestId)

if (provider === 'ollama') {
  try {
    await createKBEmbeddingTable(newKnowledgeBase.id, effectiveDimension)
  } catch (tableError) {
    logger.error(
      `[${requestId}] Failed to create embedding table for KB ${newKnowledgeBase.id}`,
      tableError
    )
    throw tableError
  }

P1 Orphaned KB row when embedding table creation fails

createKnowledgeBase persists the KB row to the database at line 136 before createKBEmbeddingTable is called. If createKBEmbeddingTable throws (e.g. the pgvector extension isn't enabled or a naming collision occurs), the error is re-thrown and the request returns 500 — but the KB record is left behind in the database without a corresponding embedding table. Any subsequent document upload to this KB will fail with a table-not-found error and the user has no way to fix it from the UI.

A safe fix is to delete the orphaned KB row in the catch block before re-throwing:

} catch (tableError) {
  logger.error(...)
  // Clean up orphaned KB row
  await deleteKnowledgeBase(newKnowledgeBase.id, requestId).catch(() => {})
  throw tableError
}

Comment on lines +238 to +246
  // Use drizzle's insert API with dynamic table schema
  await db.insert(dynamicTable).values(batch)
} catch (err: unknown) {
  const pg = err as { code?: string; detail?: string; message?: string; cause?: unknown }
  logger.error(`insertKBEmbeddings failed for table ${table}`, {
    code: pg.code,
    detail: pg.detail,
    message: pg.message,
    cause: pg.cause,

P1 Vector literal injected via sql.raw — fragile and risky pattern

queryVector is constructed by the caller as JSON.stringify(emb) where emb is a number[], so in practice it contains no single quotes. However, interpolating it directly into a sql.raw string template is fragile: a future refactor that passes a different value could silently introduce a SQL injection or at minimum a syntax error.

Drizzle's parameterized approach is safer and just as performant:

// Instead of:
const vecLiteral = sql.raw(`'${queryVector}'::vector`)

// Prefer:
const vecLiteral = sql`${queryVector}::vector`

Using a Drizzle placeholder ensures the value is always parameterized correctly, regardless of its content.

Comment on lines +628 to +638
await db
.update(document)
.set({
chunkCount: processed.metadata.chunkCount,
tokenCount: processed.metadata.tokenCount,
characterCount: processed.metadata.characterCount,
processingStatus: 'completed',
processingCompletedAt: now,
processingError: null,
})
.where(eq(document.id, documentId))

P1 Document status update moved outside the transaction (regression)

In the original code, both the embedding inserts and the processingStatus: 'completed' update were inside a single db.transaction, so they were atomic. This PR extracts the status update to a separate await db.update(document)... call that runs after the transaction.

For the OpenAI path, if the transaction (embedding inserts) succeeds but the subsequent status update fails, the document stays permanently in 'processing' state — even though its embeddings are fully in place. There's no retry that would recover this (the catch block at line 656 sets status to 'failed', which is also incorrect since the embeddings are already there).

This is a regression for the OpenAI path that was introduced by the refactor to support the Ollama code path. The processingStatus update should remain inside the db.transaction for the OpenAI path.

Comment on lines +477 to +495
onChange={(e) => {
  const preset = OLLAMA_PRESET_MODELS.find(
    (m) => m.value === e.target.value
  )
  setOllamaPreset(e.target.value)
  if (preset && preset.dimension > 0) {
    setOllamaDimension(preset.dimension)
  }
}}
className='rounded-[4px] border border-[var(--border-1)] bg-[var(--surface-1)] px-[10px] py-[6px] text-[12px] text-[var(--text-primary)]'
>
  {OLLAMA_PRESET_MODELS.map((m) => (
    <option key={m.value} value={m.value}>
      {m.label}
    </option>
  ))}
</select>
</div>


P2 Native <select> instead of the component library's Select

The rest of the form uses components from the existing @/components/ui library (e.g. Input, Label, Button). This <select> is a raw HTML element with hand-written Tailwind classes for the border/background/text, which won't automatically respect dark mode tokens or any future design-system updates.

Consider replacing it with the project's Select / SelectContent / SelectItem / SelectTrigger pattern used elsewhere in the codebase.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  baseUrl = 'http://localhost:11434'
): Promise<OllamaModelInfo> {
  const cacheKey = `${modelName}@${baseUrl}`
  const cached = ollamaModelInfoCache.get(cacheKey)

P2 Module-level cache ineffective in serverless/edge deployments

ollamaModelInfoCache is a module-level Map. In serverless environments (Next.js API routes deployed to Vercel, AWS Lambda, etc.) each invocation may spawn a fresh process where the module is re-initialized, so the cache is effectively a no-op. Additionally, multiple concurrent invocations will bypass the cache and issue redundant /api/show calls.

If the project already uses a shared cache layer (e.g. Redis via lib/core/storage), consider using it here. Otherwise, this is worth documenting as a known limitation so future maintainers aren't surprised.
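A minimal TTL cache of the kind used for the model info can be sketched as below. As the reviewer notes, any in-process cache only helps within a warm process; this sketch just makes the expiry behavior concrete. Names are illustrative, not the PR's code:

```typescript
// In-process cache with per-entry time-to-live; expired entries are evicted
// lazily on read. `now` is injectable for testability.
class TTLCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>()
  constructor(private ttlMs: number) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key)
    if (!entry) return undefined
    if (now >= entry.expiresAt) {
      this.store.delete(key) // expired: drop and miss
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs })
  }
}
```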


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


@waleedlatif1 waleedlatif1 deleted the branch simstudioai:staging April 3, 2026 23:01
@waleedlatif1 waleedlatif1 reopened this Apr 3, 2026