feat(inference): multi-route proxy with alias-based model routing #618
cosmicnet wants to merge 4 commits into NVIDIA:main
Conversation
Pull request overview
Adds multi-route inference proxying so sandboxes can route inference.local requests to multiple LLM backends by using a model alias in the request body.
Changes:
- Extends the inference proto + gateway storage to support multiple (alias, provider_name, model_id) entries per route.
- Adds alias-first route selection in the router and passes a model_hint extracted from sandbox request bodies.
- Expands sandbox L7 inference patterns and adds an Ollama provider profile + endpoint validation probe.
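As a rough illustration of the model-hint extraction described above: the real proxy parses the request body with a JSON library, but the idea can be sketched with a simplified, std-only extractor (the function name and flat-JSON handling here are hypothetical, not the PR's actual code):

```rust
/// Simplified stand-in for the proxy's model-hint extraction.
/// Real code would parse the body as JSON (e.g. with serde_json);
/// this sketch only handles the flat `"model": "..."` case.
fn extract_model_hint(body: &str) -> Option<String> {
    // Locate the "model" key, then the quoted value after the colon.
    let idx = body.find("\"model\"")?;
    let rest = &body[idx + "\"model\"".len()..];
    let colon = rest.find(':')?;
    let rest = rest[colon + 1..].trim_start();
    let rest = rest.strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}
```

A body like `{"model":"claude","messages":[]}` yields `Some("claude")`, which the router can then match against configured aliases; a body with no model field yields `None` and falls back to protocol-based selection.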
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| proto/inference.proto | Adds InferenceModelEntry and models fields for multi-model inference config. |
| crates/openshell-server/src/inference.rs | Implements multi-model upsert + resolves each alias into separate ResolvedRoute entries. |
| crates/openshell-sandbox/src/proxy.rs | Extracts model from JSON body and forwards it as model_hint to the router. |
| crates/openshell-sandbox/src/l7/inference.rs | Adds Codex + Ollama native API patterns and tests. |
| crates/openshell-router/src/lib.rs | Adds select_route() and extends proxy APIs to accept model_hint. |
| crates/openshell-router/src/backend.rs | Adds Ollama validation probe and changes backend URL construction behavior. |
| crates/openshell-router/tests/backend_integration.rs | Updates tests for new proxy function signatures and /v1 endpoint expectations. |
| crates/openshell-core/src/inference.rs | Adds OLLAMA_PROFILE (protocols/base URL/config keys). |
| crates/openshell-cli/src/run.rs | Adds gateway_inference_set_multi() to send multi-model configs. |
| crates/openshell-cli/src/main.rs | Adds --model-alias ALIAS=PROVIDER/MODEL CLI flag and dispatch. |
| architecture/inference-routing.md | Documents alias-based route selection, new patterns, and multi-model route behavior. |
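The `build_backend_url` change in `backend.rs` (stripping the `/v1` prefix so versioned and non-versioned paths can share one base URL) can be sketched roughly as follows; the function name comes from the PR, but this body is a guess at the behavior, not the actual implementation:

```rust
/// Hypothetical sketch of /v1-prefix stripping: the backend base URL
/// already carries any version suffix it needs, so a `/v1` prefix on
/// the incoming path is removed before the two are joined. This lets
/// versioned (OpenAI-style) and non-versioned (Codex, Ollama) paths
/// share a single base URL.
fn build_backend_url(base: &str, path: &str) -> String {
    let base = base.trim_end_matches('/');
    let path = path.strip_prefix("/v1").unwrap_or(path);
    format!("{base}{path}")
}
```

Under this sketch, `("https://api.openai.com/v1", "/v1/chat/completions")` joins to `https://api.openai.com/v1/chat/completions`, while an Ollama-native path such as `/api/chat` passes through untouched.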
Add pattern detection, provider profile, and validation probe for Ollama's native /api/chat, /api/tags, and /api/show endpoints.

Proxy changes (l7/inference.rs):
- POST /api/chat -> ollama_chat protocol
- GET /api/tags -> ollama_model_discovery protocol
- POST /api/show -> ollama_model_discovery protocol

Provider profile (openshell-core/inference.rs):
- New 'ollama' provider type with default endpoint http://host.openshell.internal:11434
- Supports ollama_chat, ollama_model_discovery, and OpenAI-compatible protocols (openai_chat_completions, openai_completions, model_discovery)
- Credential lookup via OLLAMA_API_KEY, base URL via OLLAMA_BASE_URL

Validation (backend.rs):
- Ollama validation probe sends a minimal /api/chat request with stream:false

Tests: 4 new tests for pattern detection (ollama chat, tags, show, and GET /api/chat rejection).

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
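The method/path-to-protocol mapping in that commit can be sketched as a simple match (the function name here is hypothetical; the protocol strings are the ones the commit lists):

```rust
/// Sketch of the Ollama L7 pattern detection: maps an HTTP method and
/// path to an inference protocol, returning None for non-matching
/// combinations (e.g. GET /api/chat is rejected).
fn classify_ollama(method: &str, path: &str) -> Option<&'static str> {
    match (method, path) {
        ("POST", "/api/chat") => Some("ollama_chat"),
        ("GET", "/api/tags") => Some("ollama_model_discovery"),
        ("POST", "/api/show") => Some("ollama_model_discovery"),
        _ => None,
    }
}
```

This mirrors the four tests the commit describes: chat, tags, and show all classify, while GET /api/chat falls through to `None`.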
- Proto: add InferenceModelEntry message with alias/provider/model fields; add repeated models field to ClusterInferenceConfig and the set/get request/response messages
- Server: add upsert_multi_model_route() for storing multiple model entries under a single route slot; update resolve_route_by_name() to expand multi-model configs into per-alias ResolvedRoute entries
- Router: add select_route() with an alias-first, protocol-fallback strategy; add a model_hint parameter to the proxy_with_candidates() variants
- Sandbox proxy: extract the model field from the JSON body as a routing hint
- Tests: 7 new tests covering select_route, multi-model resolution, and bundle expansion; all 291 existing tests continue to pass

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
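The alias-first, protocol-fallback strategy can be sketched as below; the `Route` struct is a stand-in for the PR's `ResolvedRoute`, and the body is an illustrative guess at the selection logic, not the actual implementation:

```rust
/// Minimal stand-in for ResolvedRoute: just the fields selection needs.
#[derive(Clone)]
struct Route {
    alias: Option<String>,
    protocol: String,
}

/// Sketch of select_route(): prefer a route whose alias matches the
/// model hint extracted from the request body; otherwise fall back to
/// the first route that speaks the requested protocol.
fn select_route<'a>(
    routes: &'a [Route],
    model_hint: Option<&str>,
    protocol: &str,
) -> Option<&'a Route> {
    if let Some(hint) = model_hint {
        if let Some(r) = routes.iter().find(|r| r.alias.as_deref() == Some(hint)) {
            return Some(r);
        }
    }
    routes.iter().find(|r| r.protocol == protocol)
}
```

Under this sketch, a request with `"model": "gpt"` picks the route aliased `gpt` regardless of protocol, while a request with no hint (or an unknown alias) falls back to protocol matching.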
- Add --model-alias flag to 'inference set' for multi-model config (e.g. --model-alias gpt=openai/gpt-4 --model-alias claude=anthropic/claude-sonnet-4-20250514)
- Add gateway_inference_set_multi() handler in run.rs
- Update inference get/print to display multi-model entries
- Import the InferenceModelEntry proto type in the CLI
- Fix build_backend_url to always strip the /v1 prefix for codex paths
- Add /v1/codex/* inference pattern for the openai_responses protocol
- Fix backend tests to use the /v1 endpoint suffix

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
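Parsing the `--model-alias ALIAS=PROVIDER/MODEL` value can be sketched as follows; the function name and error type here are hypothetical simplifications, not the CLI's actual code:

```rust
/// Sketch of --model-alias parsing: splits "gpt=openai/gpt-4" into
/// (alias, provider_name, model_id). Only the first '/' separates the
/// provider, so model IDs containing '/' would need a different rule.
fn parse_model_alias(arg: &str) -> Result<(String, String, String), String> {
    let (alias, rest) = arg
        .split_once('=')
        .ok_or("expected ALIAS=PROVIDER/MODEL")?;
    let (provider, model) = rest
        .split_once('/')
        .ok_or("expected PROVIDER/MODEL after '='")?;
    if alias.is_empty() || provider.is_empty() || model.is_empty() {
        return Err("empty component in ALIAS=PROVIDER/MODEL".to_string());
    }
    Ok((alias.to_string(), provider.to_string(), model.to_string()))
}
```

Each parsed triple would then populate one InferenceModelEntry in the multi-model config sent by gateway_inference_set_multi().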
…te guard

- Add timeout_secs parameter to gateway_inference_set_multi and pass it through to SetClusterInferenceRequest
- Add print_timeout to the multi-model output display
- Add a timeout field to the router test helper make_route (upstream added timeout to ResolvedRoute)
- Add a system route guard: upsert_multi_model_route rejects route_name == sandbox-system with InvalidArgument
- Add timeout_secs: 0 to multi-model test ClusterInferenceConfig structs
- Add an upsert_multi_model_route_rejects_system_route test

Signed-off-by: Lyle Hopkins <lyle@cosmicnetworks.com>
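The system-route guard amounts to a reserved-name check before the upsert proceeds; a minimal sketch (error type simplified to a string, whereas the real code returns a gRPC InvalidArgument status):

```rust
/// The reserved route name that multi-model upserts must not overwrite.
const SYSTEM_ROUTE: &str = "sandbox-system";

/// Sketch of the guard in upsert_multi_model_route: reject attempts to
/// store a multi-model config under the reserved system route name.
fn check_route_name(route_name: &str) -> Result<(), String> {
    if route_name == SYSTEM_ROUTE {
        return Err(format!(
            "InvalidArgument: route '{SYSTEM_ROUTE}' is reserved"
        ));
    }
    Ok(())
}
```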
@pimlock Happy to address any feedback or questions. Let me know if you'd like anything restructured or split differently.
I am curious: if you need this level of routing support, have you considered setting up a dedicated proxy/router that is accessible outside of the sandbox and just configuring access to it with network policies? This is a typical pattern that several of our users follow.
Summary
Adds multi-route inference proxy support, allowing sandboxed agents to reach multiple LLM providers (OpenAI, Anthropic, NVIDIA, Ollama) through a single inference.local endpoint. Agents select a backend by setting the model field to an alias name. Also adds Ollama native API support and Codex URL pattern matching.

Related Issue
Closes #203
Changes
- Proto: InferenceModelEntry message (alias, provider_name, model_id); add a models repeated field to the set/get request/response messages
- Server: upsert_multi_model_route() validates and stores multiple alias-to-provider mappings; resolves each entry into a separate ResolvedRoute at bundle time
- Router: select_route() implements alias-first, protocol-fallback selection; proxy_with_candidates / proxy_with_candidates_streaming accept an optional model_hint
- Sandbox proxy: extracts the model field from the request body as model_hint for route selection
- Patterns: adds /v1/codex/*, /api/chat, /api/tags, /api/show inference patterns
- Backend: build_backend_url() always strips the /v1 prefix to support both versioned and non-versioned endpoints (e.g. Codex)
- Core: adds the OLLAMA_PROFILE provider profile with native + OpenAI-compatible protocols
- CLI: adds a --model-alias ALIAS=PROVIDER/MODEL flag (repeatable, conflicts with --provider / --model)
- Docs: updates inference-routing.md with all new sections

Testing
- mise run pre-commit passes

Checklist