perf: eliminate N+1 queries and add bounded caches in hsg_query #1
Draft
Co-authored-by: nilhemdot <262599666+nilhemdot@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Analyze and improve slow code performance" to "perf: eliminate N+1 queries and add bounded caches in hsg_query" on Mar 8, 2026
nilhemdot approved these changes on Mar 9, 2026
📋 Description
The `hsg_query` hot path issued up to 4N sequential DB round-trips per query (N = candidate memory count) due to per-record fetches in the scoring loop, tag matching, vector retrieval, and feedback updates. In addition, `cache` and `sal_cache` were unbounded Maps that grew indefinitely on long-running servers.

Core fixes
Batch DB fetching (`db.ts`, `vector_store.ts`, `postgres.ts`, `valkey.ts`)
- Added `get_mems_by_ids` to both SQLite (`WHERE id IN (?,...)`) and PostgreSQL (`WHERE id = ANY($1)`) backends
- Added `getVectorsByIds` to the `VectorStore` interface and all implementations:
  - SQL-backed store: a single `IN`/`ANY` query
  - Valkey: builds `vec:{sector}:{id}` keys upfront, fetches with a single pipeline round-trip

`hsg_query` loop refactor (`hsg.ts`)
- Before: 4N+ sequential DB calls inside the scoring loop
- After: 2 batch queries before the loop, pure in-memory scoring
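To illustrate the batch-then-score pattern described above, here is a minimal sketch. It is not the PR's actual code: the `Db` interface, the `MemRow` shape, the `memories` table name, and the helper names `buildInQuery`, `getMemsByIds`, and `scoreCandidates` are all hypothetical, and the tag score is a stand-in for the real scoring logic.

```typescript
// Hypothetical row shape; the real schema in db.ts may differ.
interface MemRow {
  id: string;
  tags_json: string;
}

// Hypothetical minimal DB handle, injected so the sketch stays self-contained.
interface Db {
  all(sql: string, params: unknown[]): Promise<MemRow[]>;
}

// Build a SQLite-style `WHERE id IN (?,?,...)` with one placeholder per id.
// The `memories` table name is an assumption.
function buildInQuery(ids: string[]): { sql: string; params: string[] } {
  const placeholders = ids.map(() => "?").join(",");
  return {
    sql: `SELECT id, tags_json FROM memories WHERE id IN (${placeholders})`,
    params: ids,
  };
}

// One round-trip for all candidates, returned as a Map for O(1) lookup.
async function getMemsByIds(db: Db, ids: string[]): Promise<Map<string, MemRow>> {
  if (ids.length === 0) return new Map(); // `IN ()` is not valid SQL
  const { sql, params } = buildInQuery(ids);
  const rows = await db.all(sql, params);
  return new Map(rows.map((r) => [r.id, r] as [string, MemRow]));
}

// Scoring loop with no awaits inside: every record was fetched up front.
async function scoreCandidates(
  db: Db,
  ids: string[],
  queryTags: string[],
): Promise<Map<string, number>> {
  const mems = await getMemsByIds(db, ids);
  const scores = new Map<string, number>();
  for (const id of ids) {
    const mem = mems.get(id);
    if (!mem) continue; // id not found in the batch result
    const tags: string[] = JSON.parse(mem.tags_json);
    const hits = queryTags.filter((t) => tags.includes(t)).length;
    // Stand-in score: fraction of query tags present on the memory.
    scores.set(id, queryTags.length > 0 ? hits / queryTags.length : 0);
  }
  return scores;
}
```

One design point worth checking in a real implementation: SQLite caps the number of bound parameters per statement, so very large candidate lists may need to be split into chunks.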
Other improvements
- `calc_multi_vec_fusion_score`: accepts pre-fetched `vecs[]` instead of fetching by ID; now synchronous
- `compute_tag_match_score`: accepts `tags_json` directly instead of re-fetching by `memory_id`; now synchronous
- `Promise.all` for concurrent writes (was N sequential fetches + N sequential writes)
- `get_mems_by_ids` batch + `Promise.all` concurrent updates
- `CACHE_MAX = 500` and `SAL_CACHE_MAX = 2000` with FIFO eviction, matching the existing pattern used by `seg_cache`

🔄 Type of Change
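For readers unfamiliar with the FIFO eviction mentioned for `cache` and `sal_cache`, the idea can be sketched as below. This is an assumption-laden illustration, not the PR's implementation: `setBounded` is a hypothetical helper, and only the `CACHE_MAX = 500` value comes from the description.

```typescript
// Value taken from the PR description; the helper below is hypothetical.
const CACHE_MAX = 500;

// Insert into a Map while keeping its size at or below `max`.
function setBounded<K, V>(cache: Map<K, V>, key: K, value: V, max: number): void {
  // Overwriting an existing key does not grow the map, so evict only for new keys.
  if (!cache.has(key) && cache.size >= max) {
    // A Map iterates in insertion order, so its first key is the oldest entry.
    const oldest = cache.keys().next();
    if (!oldest.done) cache.delete(oldest.value);
  }
  cache.set(key, value);
}

// Usage: the cache never holds more than CACHE_MAX entries.
const demoCache = new Map<string, number>();
for (let i = 0; i < CACHE_MAX + 10; i++) {
  setBounded(demoCache, `k${i}`, i, CACHE_MAX);
}
```

Because `Map.prototype.set` on an existing key keeps that key's original position, this evicts by first insertion (FIFO) rather than by recency of use; an LRU policy would need to delete and re-insert on each hit.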
🧪 Testing
📱 Screenshots (if applicable)
N/A
🔍 Code Review Checklist
🚀 Deployment Notes
No schema changes. No config changes required. Drop-in replacement — existing behavior is preserved.
📋 Additional Context
All changes are backward-compatible. The `VectorStore` interface gains a new `getVectorsByIds` method; any custom implementations outside this repo will need to add it.