Commit Graph

194 Commits

Author SHA1 Message Date
Marcello Fitton
c4f19cec0e
Refactor LLMPerformanceMonitor.measureStream() to Use Options Object Pattern (#4786)
* Refactor LLMPerformanceMonitor to use options object for measureStream parameters

* Refactor invocations of `measureStream` to use options arguments

* Change invocation of `measureStream` in anthropic provider to use options argument

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-12-16 13:10:09 -08:00
Timothy Carambat
664f466e3f
4601 log model on response (#4781)
* add model tag to chatCompletion

* add modelTag `model` to async streaming
keeps default arguments for prompt token calculation where applied via explicit arg

* fix HF default arg

* render all performance metrics as available for backward compatibility
add `timestamp` to both sync/async chat methods

* extract metrics string to function
2025-12-14 14:46:55 -08:00
Marcello Fitton
a7da757c84
Migrate Azure OpenAI Integration To v1 API | Enable Streaming for Reasoning Models in Azure OpenAI Basic Inference Provider (#4744)
* Refactor Azure OpenAI integration to use OpenAI SDK and the v1 API | Enable streaming for Azure OpenAI basic inference provider

* Add info tooltip to inform user about 'Model Type' form field

* Add 'model_type_tooltip' key to multiple language translations

* Validate AZURE_OPENAI_ENDPOINT in provider construction

* remove unused import, update error handler, rescope URL utils

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-12-10 18:56:55 -08:00
方程
90e474abcb
Support Gitee AI(LLM Provider) (#3361)
* Support Gitee AI(LLM Provider)

* refactor(server): rework the GiteeAI model context window limits; hard-code the window limits for now, with plans to use external API data and caching

* updates for Gitee AI

* use legacy lookup since gitee does not enable getting token context windows

* add more missing records

* reorder imports

---------

Co-authored-by: 方程 <fangcheng@oschina.cn>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-11-25 14:19:32 -08:00
Sean Hatfield
c913a2d68c
Prompt caching for Anthropic LLM and Agent providers (#4488)
* prompt caching for anthropic llm and agent providers

* add UI for control of ENV
simplify implementation

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-11-20 17:17:03 -08:00
Timothy Carambat
f0b3dab4c1
Simplify cache condition for LMStudio and Ollama to prevent race condition (#4669)
closes #4597
resolves #4572
closes #4600
resolves #4599
2025-11-20 16:32:02 -08:00
Sean Hatfield
49c29fb968
Z.ai LLM & agent provider (#4573)
* wip zai llm provider

* cleanup + add zai agent provider

* lint

* change how caching works for failed models

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-11-20 15:57:03 -08:00
Marcello Fitton
7a7ec969d7
Update Ollama AI Provider to Support Parsing "Thinking" Content From New Message Schema (#4587)
* add className prop to OllamaAILLM

* Enhance `OllamaAILLM.handleStream` to support parsing thinking content from the `message.thinking` property.

* refactor thinking property handler
patched ollama `@agent` flow calls

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-11-20 15:39:17 -08:00
Chetan Sarva
c169193fc4
feature: Support for AWS Bedrock API Keys (#4651)
* feat: add AWS Bedrock API Key option to settings panel

* feat: Bedrock API key auth method

* fix: hide IAM note when using bedrock api key

* move to camelCase identifier for bedrock api key use
linting

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-11-20 15:38:45 -08:00
jonathanortega2023
7a0c149d2e
fix: Use eval_duration for output TPS calculations in Ollama LLM provider (#4568)
* fix: Use eval_duration for output TPS calculations and add as a metric field

* refactor usage of eval_duration from ollama metrics

* move eval_duration to usage

* overwrite duration in ollama provider wip measureAsyncFunction optional param

* allow for overloaded duration in measureAsyncFunction

* simplify flow for duration tracking

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-11-20 13:02:47 -08:00
Timothy Carambat
cf76bad452
Implement full chat and @agent chat user identification for OpenRouter (#4668)
Implement chat and agentic chat user-id for OpenRouter
resolves #4553
closes #4482
2025-11-20 12:38:43 -08:00
timothycarambat
0a1a5a216a
patch ollama context window error when unreachable
2025-10-06 16:25:06 -07:00
Timothy Carambat
c2e7ccc00f
Reimplement Cohere models for basic chat (#4489)
* Reimplement Cohere models
- Redo LLM implementation to grab models from endpoint and pre-filter
- Migrate embedding models to also grab from remote
- Add records for easy context window lookup

* fix comment
2025-10-03 18:28:20 -07:00
Timothy Carambat
8cdadd8cb3
Sync models from remote for FireworksAI (#4475)
resolves #4474
2025-10-02 12:34:05 -07:00
Sean Hatfield
0b18ac6577
Model context limit auto-detection for LM Studio and Ollama LLM Providers (#4468)
* auto model context limit detection for ollama llm provider

* auto model context limit detection for lmstudio llm provider

* Patch Ollama to function and sync context windows like Foundry

* normalize how model context windows are cached from endpoint service
todo: move this into global utility class with MODEL_MAP
eager load models on boot to pre-cache them
add performance model improvements into ollama agent as well as apply n_ctx

* remove debug log

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-10-02 11:54:19 -07:00
Sean Hatfield
599a3fd8b8
Microsoft Foundry Local LLM provider & agent provider (#4435)
* add microsoft foundry local llm and agent providers

* minor change to fix early stop token + overloading of context window
always use user defined window _unless_ it is larger than the model's real context window
cache the context windows when we can from the API (0.7.*)+
Unload model forcefully on model change to prevent resource hogging

* add back token preference since some models have very large windows and can crash a machine
normalize cases

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-10-01 20:04:13 -07:00
Marcello Fitton
004327264a
Add stream options to Gemini LLM for usage tracking (#4466)
* Add stream options to Gemini LLM for usage tracking

* Update Gemini LLM to disable prompt token calculation

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-10-01 14:00:26 -07:00
Timothy Carambat
cd34063111
Patch OpenAI metrics (#4458)
resolves #4457
2025-09-30 15:19:34 -07:00
Timothy Carambat
c8f13d5f27
Enable custom HTTP response timeout for ollama (#4448)
2025-09-29 12:32:55 -07:00
Marcello Fitton
6855bbf695
Refactor Class Name Logging (#4426)
* Add className property to various LLM and embedder classes to fix logging bug after minification

* Fix bug with this.log method by applying the missing private field symbol
2025-09-25 15:34:19 -10:00
Timothy Carambat
9466f67162
Update the timeout value on all stream-timeout providers: (#4412)
- OpenRouter
- Novita
- CometAPI
updated to 3,000ms default with 500ms min
2025-09-19 08:52:20 -07:00
Sean Hatfield
1209606d9a
Migrate OpenAI LLM provider to use Responses API (#4404)
* migrate openai llm provider to use responses api

* add back image support

* don't recalc tokens from OpenAI since we get metrics back

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-09-18 21:15:19 -07:00
Marcello Fitton
50d4a198a4
Add User-Agent header on the requests sent by Generic OpenAI providers. (#4393)
* Add User-Agent header on the requests sent by Generic OpenAI providers.

* Moved getAnythingLLMUserAgent helper fn to server/endpoints/utils.js and changed fallback version string to "unknown"

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-09-17 13:08:18 -07:00
TensorNull
5922349bb7
feat: Implement CometAPI integration for chat completions and model m… (#4379)
* feat: Implement CometAPI integration for chat completions and model management

- Added CometApiLLM class for handling chat completions using CometAPI.
- Implemented model synchronization and caching mechanisms.
- Introduced streaming support for chat responses with timeout handling.
- Created CometApiProvider class for agent interactions with CometAPI.
- Enhanced error handling and logging throughout the integration.
- Established a structure for managing function calls and completions.

* linting

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-09-16 14:38:49 -07:00
Sean Hatfield
31a8ead823
Fix multimodal chats via openai compat api (#4135)
* fix multimodal chats via openai compat api

* lint

* add tests for multi-modal content in openai compat endpoint

* refactor to normalize how openai attachments are handled

* uncheck file

* rewrite tests, autodetect mime from dataurl, and spread attachments from prompt

* lint

* revert and fix tests

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-07-22 09:57:32 -07:00
Sean Hatfield
6d6bd14622
Moonshot AI LLM & agent provider (#4178)
* add moonshot ai LLM & agent provider

* fix moonshot agent calling

* handle attachments/fix moonshot llm provider

* update docs/example env

* add moonshot to onboarding privacy

* add moonshot to onboarding llm preference

* update privacy for moonshot ai

* update logo higher res

* remove caching and use modelmap
2025-07-22 09:56:51 -07:00
Fabio Nonato
0d7a7551b8
fix to support: feat2864 - using local credentials file with Amazon Bedrock (#3986)
* fix: feat2864

* patch default case

---------

Co-authored-by: nonatofabio <fnp@amazon.com>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-07-02 09:15:23 -07:00
Sean Hatfield
07129e81f8
Add option to disable streaming via env for generic openai provider (#4079)
* add option to disable streaming via env for generic openai provider

* move env check to streamingEnabled
2025-07-01 12:47:46 -07:00
Timothy Carambat
4eb951d40e
Fix model map staleness behavior or fallback (#3971)
* Fix model map staleness behavior or fallback

* patch url

* fix log

* dev build
2025-06-06 17:39:48 -07:00
Timothy Carambat
a57536b715
Handle invalid response bodies for ContextWindowFinder (#3896)
Handle invalid response bodies for contextwindowfinder
2025-05-27 15:40:06 -07:00
timothycarambat
2450e49ac3
hoisting cleanup for format var
2025-05-14 16:25:17 -07:00
timothycarambat
605910b76d
forgot files for DPAIS
2025-05-14 15:26:14 -07:00
Timothy Carambat
e80492606a
Automatic Context window detection (#3817)
* Add context window finder from litellm maintained list
apply to all cloud providers, have client cache for 3 days

* linting
2025-05-14 11:03:19 -07:00
timothycarambat
492570dfed
patch Azure image reading regressions
resolves #3811
2025-05-12 11:10:35 -07:00
Danny Steenman
5500fa2bc5
feat: support for iam roles for bedrock client (#2632)
* feat: implement iam role auth for bedrock

* fix: make client refreshes properly when switching between iam_user and iam_role

* checkout agent flow

* fix aiprovider for bedrock in agent use

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-05-06 13:48:15 -07:00
Tristan Stahnke
b64a77f29f
Refactor AWS Bedrock Provider for Multi-modal Support & Correct Token Limits (#3714)
* Fixed two primary issues discovered while using AWS Bedrock with Anthropic Claude Sonnet models:
- Context Window defaults to 8192 maximum, which isn't correct
- Multimodal stopped working when removing langchain, which was transparently converting image_url to a format Sonnet expects.

* Ran `yarn lint`

* Updated .env.example to have aws bedrock examples too

* Refactor for readability
move utils for AWS specific functionality to subfile
add token output max to ENV so setting persists

---------

Co-authored-by: Tristan Stahnke <tristan.stahnke+gpsec@guidepointsecurity.com>
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-05-06 12:55:24 -07:00
Sean Hatfield
8912d0f0fc
Add option to control KoboldCPP max response tokens (#3746)
add option to control koboldcpp max response tokens
2025-05-02 14:12:06 -07:00
Shinya Suzuki
cd900f9e4c
Replace @azure/openai with openai, and update openai to version 4.95.1 (#3691)
* Replace @azure/openai with OpenAI lib

* Remove @azure/openai dependency and update openai to version 4.95.1

* linting

* update logging
fix translation dictionary error

* remove bad ENV key that DNE
linting
Patch Azure OpenAI
Migrate Azure Agent provider to use OpenAI Schema for tool calling performance

* unset

* migrate azure to use default OAI stream handler

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-04-29 11:21:39 -07:00
Shinya Suzuki
98c46c04e4
Update Azure AI options and model map with new model configurations (#3660)
* Update Azure AI options and model map with new model configurations

* linting

---------

Co-authored-by: Shinya Suzuki <shinya.s.825@gmail.com>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-04-16 09:08:40 -07:00
timothycarambat
1d1fb817b0
linting
2025-04-15 12:51:08 -07:00
Michał Rudziński
be27299897
handling of citations in openRouter provider #3581 (#3620)
* handling of citations in openRouter provider #3581

* Update pplx enrichToken function comment
Modify OR enrichToken to be generic handler function with optional params
handle _just_ Perplexity in-line citations since no other models support this functionality

* remove console log

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-04-15 10:57:09 -07:00
Timothy Carambat
1b59295f89
Refactor Gemini to use OpenAI interface API (#3616)
* Refactor Gemini to use OpenAI interface API

* add TODO

* handle errors better (gemini)

* remove unused code
2025-04-07 17:18:31 -07:00
Timothy Carambat
4ac900f645
Gemini model list sync (#3609)
* Update defaultModels.js

add gemma-3-27b-it to v1BetaModels

* Update defaultModels.js

20250330 model update

* Update defaultModels.js

remove text embedding

* Update name and inputTokenLimit modelMap.js

* Update gemini to load models from both endpoints
dedupe models
decide endpoint based on experimental status from fetch
add util script for maintainers
reduce cache time on gemini models to 1 day

* remove comment

---------

Co-authored-by: DreamerC <dreamerwolf.tw@gmail.com>
2025-04-07 13:45:16 -07:00
Timothy Carambat
78c83383d8
Overhaul AWS Bedrock provider (#3537)
* Patch AWS Bedrock provider for newer models and performance

* patch prompt constructor
2025-03-25 15:58:16 -07:00
Timothy Carambat
66b4bf2679
Add support for Anthropic's /model endpoint (finally) (#3376)
* Add support for Anthropic's /model endpoint (finally)

* dev
2025-02-28 13:29:43 -08:00
cnJasonZ
2aeb4c2961
Add new model provider PPIO (#3211)
* feat: add new model provider PPIO

* fix: fix ppio model fetching

* fix: code lint

* reorder LLM
update interface for streaming and chats to use valid keys
linting

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-02-27 10:53:00 -08:00
Skanda Kaashyap
d1354caccb
[FEAT] Add claude-3-7 (#3337)
* add claude 3-7 sonnet

* made all the changes everywhere

* add 3-7-sonnet-latest model

* lint

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2025-02-25 12:52:17 -08:00
timothycarambat
12b43256a0
lint
2025-02-18 20:49:40 -08:00
Sushanth Srivatsa
3fd0fe8fc5
2749 ollama client auth token (#3005)
* ollama auth token provision

* auth token provision

* ollama auth provision

* ollama auth token

* ollama auth provision

* token input field css fix

* Fix provider handler not using key
sensible fallback to not break existing installs
re-order of input fields
null-check for API key and header optional insert on request
linting

* apply header and auth to agent invocations

* upgrading to ollama 5.10 for passing headers to constructor

* rename Auth systemSetting key to be more descriptive
linting and copy

* remove untracked files + update gitignore

* remove debug

* patch lockfile

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-02-18 16:00:17 -08:00
Timothy Carambat
cc3d619061
Add handling to reasoning models for Generic OpenAI connector (#3183)
* Add handling to reasoning models for Generic OpenAI connector
resolves #3177

* linting
2025-02-12 10:28:44 -08:00