feat(knowledge): add Ollama embedding provider support #3714
teedonk wants to merge 59 commits into simstudioai:staging from …
Conversation
@teedonk is attempting to deploy a commit to the Sim Team on Vercel. A member of the Team first needs to authorize it.
PR Summary (High Risk)

Overview: Knowledge base creation now accepts …. Search is updated to route queries per KB provider: generating embeddings per provider (and per unique Ollama model+URL), querying per-KB tables for Ollama, normalizing distances only when mixing providers, and tracking cost only for OpenAI calls. Tests and types are updated accordingly, and deletion clarifies that per-KB tables are not dropped on soft delete.

Written by Cursor Bugbot for commit dbabedd.
Greptile Summary

This PR adds Ollama as a fully offline/private embedding provider for knowledge bases. It introduces per-KB dynamic pgvector tables.

Key design decisions:
Issues from previous reviews that are now addressed:
Remaining findings:
Confidence Score: 5/5

PR is safe to merge; all previously raised P0/P1 issues have been addressed, and the remaining findings are minor style/hardening suggestions. All critical issues from prior review rounds (non-atomic delete+insert, orphaned KB row, sql.raw injection, status-update regression, native select element) have been resolved in this iteration. The remaining findings are P2: an overly broad SSRF allowlist suffix, a double tag-definition fetch, and a missing safety comment on sql.raw DDL. None of these affect correctness or data integrity in the current implementation.

apps/sim/app/api/knowledge/route.ts (SSRF allowlist), apps/sim/app/api/knowledge/search/route.ts (duplicate tag-definition fetch)

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant UI as Create KB Modal
    participant API as POST /api/knowledge
    participant Ollama as Ollama /api/show
    participant DB as Database
    participant PG as PostgreSQL
    UI->>API: POST embeddingModel ollama/nomic-embed-text
    API->>Ollama: POST /api/show validate model and detect dimension
    Ollama-->>API: contextLength and embeddingLength
    API->>DB: INSERT knowledgeBase row
    API->>PG: CREATE TABLE kb_embeddings_uuid vector dimension
    alt Table creation fails
        API->>PG: DROP TABLE kb_embeddings_uuid
        API->>DB: DELETE knowledgeBase row
        API-->>UI: 500 error
    else Success
        API-->>UI: 201 with id and embeddingDimension
    end
    participant DocSvc as Document Service
    participant Embed as Ollama /api/embed
    DocSvc->>Ollama: GET context length cached 5min
    Ollama-->>DocSvc: contextLength
    DocSvc->>DocSvc: Cap chunkSize to 30pct of contextLength
    DocSvc->>Embed: POST model and chunks batch of 100
    Embed-->>DocSvc: embeddings array
    DocSvc->>DB: transaction DELETE old then INSERT into kb_embeddings_uuid then UPDATE status
```
Reviews (2): Last reviewed commit: "fix: batch Ollama embeddings by item cou..."
```ts
if (embeddingProvider === 'ollama') {
  // Per-KB table: delete old chunks then bulk-insert new ones
  await deleteKBDocumentEmbeddings(knowledgeBaseId, documentId)
  await insertKBEmbeddings(knowledgeBaseId, embeddingRecords, kb[0].embeddingDimension)
```
Non-atomic delete + insert risks data loss
The Ollama path deletes all existing embeddings for the document and then inserts new ones without a wrapping transaction. If insertKBEmbeddings fails mid-way (e.g., after a partial batch insert), the old embeddings have already been deleted but the new ones are only partially written — leaving the document with fewer embeddings or none at all, with no way to recover automatically.
The OpenAI path below correctly wraps both operations in a db.transaction. Consider wrapping the Ollama path similarly. Since deleteKBDocumentEmbeddings and insertKBEmbeddings both use db.execute / db.insert, they should participate in a transaction too.
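A toy model of the failure mode and the fix the review suggests. This is illustrative only, not the project's code: `FakeDb` mimics the commit-or-rollback semantics of Drizzle's `db.transaction`, and `reembedDocument` shows the atomic delete-then-insert shape.

```typescript
// Toy model: delete + insert without a transaction can strand a document
// with partial embeddings; wrapping both steps commits them together.
type Row = { documentId: string; chunk: number }

class FakeDb {
  rows: Row[] = []

  // Mimics db.transaction: mutations commit only if the callback succeeds
  transaction(fn: (tx: FakeDb) => void): void {
    const tx = new FakeDb()
    tx.rows = [...this.rows]
    fn(tx)
    this.rows = tx.rows
  }

  deleteDoc(documentId: string): void {
    this.rows = this.rows.filter((r) => r.documentId !== documentId)
  }

  insert(batch: Row[], failAtIndex = -1): void {
    batch.forEach((row, i) => {
      if (i === failAtIndex) throw new Error('batch insert failed mid-way')
      this.rows.push(row)
    })
  }
}

// The atomic shape the review suggests for the Ollama path
function reembedDocument(db: FakeDb, docId: string, chunks: number[], failAtIndex = -1): void {
  db.transaction((tx) => {
    tx.deleteDoc(docId)
    tx.insert(chunks.map((chunk) => ({ documentId: docId, chunk })), failAtIndex)
  })
}
```

With this shape, a failed re-embed leaves the old embeddings in place instead of deleting them first and losing them.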
```ts
const newKnowledgeBase = await createKnowledgeBase(createData, requestId)

if (provider === 'ollama') {
  try {
    await createKBEmbeddingTable(newKnowledgeBase.id, effectiveDimension)
  } catch (tableError) {
    logger.error(
      `[${requestId}] Failed to create embedding table for KB ${newKnowledgeBase.id}`,
      tableError
    )
    throw tableError
  }
```
Orphaned KB row when embedding table creation fails
createKnowledgeBase persists the KB row to the database at line 136 before createKBEmbeddingTable is called. If createKBEmbeddingTable throws (e.g. the pgvector extension isn't enabled or a naming collision occurs), the error is re-thrown and the request returns 500 — but the KB record is left behind in the database without a corresponding embedding table. Any subsequent document upload to this KB will fail with a table-not-found error and the user has no way to fix it from the UI.
A safe fix is to delete the orphaned KB row in the catch block before re-throwing:

```ts
} catch (tableError) {
  logger.error(...)
  // Clean up orphaned KB row
  await deleteKnowledgeBase(newKnowledgeBase.id, requestId).catch(() => {})
  throw tableError
}
```

```ts
  // Use drizzle's insert API with dynamic table schema
  await db.insert(dynamicTable).values(batch)
} catch (err: unknown) {
  const pg = err as { code?: string; detail?: string; message?: string; cause?: unknown }
  logger.error(`insertKBEmbeddings failed for table ${table}`, {
    code: pg.code,
    detail: pg.detail,
    message: pg.message,
    cause: pg.cause,
```
Vector literal injected via sql.raw — fragile and risky pattern

queryVector is constructed by the caller as JSON.stringify(emb) where emb is a number[], so in practice it contains no single quotes. However, interpolating it directly into a sql.raw string template is fragile: a future refactor that passes a different value could silently introduce a SQL injection or, at minimum, a syntax error.

Drizzle's parameterized approach is safer and just as performant:

```ts
// Instead of:
const vecLiteral = sql.raw(`'${queryVector}'::vector`)
// Prefer:
const vecLiteral = sql`${queryVector}::vector`
```

Using a Drizzle placeholder ensures the value is always parameterized correctly, regardless of its content.
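A small self-contained illustration of the reviewer's point (the function name is hypothetical, not from the repo): JSON.stringify of a number[] can never contain a quote, which is why the current code happens to be safe, but nothing enforces that the input stays a number[]. A defensive guard narrows the gap; Drizzle's parameterized form remains the better fix.

```typescript
// Builds a pgvector literal like '[0.1,-2,3]'::vector, but refuses any
// input that is not purely finite numbers, so a future refactor cannot
// smuggle an arbitrary string into the raw SQL.
function toVectorLiteral(embedding: number[]): string {
  if (!embedding.every((x) => typeof x === 'number' && Number.isFinite(x))) {
    throw new Error('embedding must be an array of finite numbers')
  }
  return `'${JSON.stringify(embedding)}'::vector`
}
```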
```ts
await db
  .update(document)
  .set({
    chunkCount: processed.metadata.chunkCount,
    tokenCount: processed.metadata.tokenCount,
    characterCount: processed.metadata.characterCount,
    processingStatus: 'completed',
    processingCompletedAt: now,
    processingError: null,
  })
  .where(eq(document.id, documentId))
```
Document status update moved outside the transaction (regression)
In the original code, both the embedding inserts and the processingStatus: 'completed' update were inside a single db.transaction, so they were atomic. This PR extracts the status update to a separate await db.update(document)... call that runs after the transaction.
For the OpenAI path, if the transaction (embedding inserts) succeeds but the subsequent status update fails, the document stays permanently in 'processing' state — even though its embeddings are fully in place. There's no retry that would recover this (the catch block at line 656 sets status to 'failed', which is also incorrect since the embeddings are already there).
This is a regression for the OpenAI path that was introduced by the refactor to support the Ollama code path. The processingStatus update should remain inside the db.transaction for the OpenAI path.
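A toy model of why the status update belongs inside the transaction (illustrative only, not the project's code): when the inserts and the status flip commit together, a failure at any step leaves the document cleanly in its prior state instead of stranded in 'processing' with embeddings already written.

```typescript
// Commit embedding inserts and the processingStatus flip atomically.
type DocState = { status: string; embeddings: number[] }

class FakeDocStore {
  state: DocState = { status: 'processing', embeddings: [] }

  // Mimics db.transaction: the new state replaces the old only on success
  transaction(fn: (tx: DocState) => void): void {
    const tx: DocState = { status: this.state.status, embeddings: [...this.state.embeddings] }
    fn(tx)
    this.state = tx
  }
}

function finalizeDocument(store: FakeDocStore, embeddings: number[], failStatusUpdate = false): void {
  store.transaction((tx) => {
    tx.embeddings.push(...embeddings)
    if (failStatusUpdate) throw new Error('status update failed')
    tx.status = 'completed' // same commit as the inserts
  })
}
```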
```tsx
onChange={(e) => {
  const preset = OLLAMA_PRESET_MODELS.find(
    (m) => m.value === e.target.value
  )
  setOllamaPreset(e.target.value)
  if (preset && preset.dimension > 0) {
    setOllamaDimension(preset.dimension)
  }
}}
className='rounded-[4px] border border-[var(--border-1)] bg-[var(--surface-1)] px-[10px] py-[6px] text-[12px] text-[var(--text-primary)]'
>
  {OLLAMA_PRESET_MODELS.map((m) => (
    <option key={m.value} value={m.value}>
      {m.label}
    </option>
  ))}
</select>
</div>
```
Native <select> instead of the component library's Select
The rest of the form uses components from the existing @/components/ui library (e.g. Input, Label, Button). This <select> is a raw HTML element with hand-written Tailwind classes for the border/background/text, which won't automatically respect dark mode tokens or any future design-system updates.
Consider replacing it with the project's Select / SelectContent / SelectItem / SelectTrigger pattern used elsewhere in the codebase.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
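A hedged sketch of the suggested replacement, assuming @/components/ui exports a shadcn/Radix-style Select; the import path and prop names below are assumptions about this repo, not verified:

```tsx
import {
  Select,
  SelectContent,
  SelectItem,
  SelectTrigger,
  SelectValue,
} from '@/components/ui/select' // assumed path

<Select
  value={ollamaPreset}
  onValueChange={(value) => {
    const preset = OLLAMA_PRESET_MODELS.find((m) => m.value === value)
    setOllamaPreset(value)
    if (preset && preset.dimension > 0) {
      setOllamaDimension(preset.dimension)
    }
  }}
>
  <SelectTrigger>
    <SelectValue placeholder='Select a model' />
  </SelectTrigger>
  <SelectContent>
    {OLLAMA_PRESET_MODELS.map((m) => (
      <SelectItem key={m.value} value={m.value}>
        {m.label}
      </SelectItem>
    ))}
  </SelectContent>
</Select>
```

This drops the hand-written Tailwind classes entirely, so the control inherits dark-mode tokens and future design-system updates for free.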
```ts
  baseUrl = 'http://localhost:11434'
): Promise<OllamaModelInfo> {
  const cacheKey = `${modelName}@${baseUrl}`
  const cached = ollamaModelInfoCache.get(cacheKey)
```
Module-level cache ineffective in serverless/edge deployments
ollamaModelInfoCache is a module-level Map. In serverless environments (Next.js API routes deployed to Vercel, AWS Lambda, etc.) each invocation may spawn a fresh process where the module is re-initialized, so the cache is effectively a no-op. Additionally, multiple concurrent invocations will bypass the cache and issue redundant /api/show calls.
If the project already uses a shared cache layer (e.g. Redis via lib/core/storage), consider using it here. Otherwise, this is worth documenting as a known limitation so future maintainers aren't surprised.
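A sketch of a per-process cache with TTL plus in-flight deduplication, which at least collapses concurrent /api/show lookups within one warm instance. It still does not survive cold starts; cross-instance reuse would need a shared store such as Redis. All names here are illustrative, not the repo's.

```typescript
// TTL cache that also shares a single in-flight promise per key, so
// concurrent misses in the same process issue only one upstream request.
type Entry<T> = { value: T; expiresAt: number }

class TtlCache<T> {
  private entries = new Map<string, Entry<T>>()
  private inFlight = new Map<string, Promise<T>>()

  constructor(private ttlMs: number) {}

  async getOrFetch(key: string, fetch: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key)
    if (hit && hit.expiresAt > Date.now()) return hit.value

    const pending = this.inFlight.get(key)
    if (pending) return pending // dedupe concurrent misses

    const promise = fetch()
      .then((value) => {
        this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs })
        return value
      })
      .finally(() => this.inFlight.delete(key))
    this.inFlight.set(key, promise)
    return promise
  }
}
```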
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).

Summary
/api/show endpoint

Fixes #3037
Type of Change
Testing
Tested with Ollama (nomic-embed-text) on local setup:
Checklist
Video.Project.1.mp4