77 Commits

Author SHA1 Message Date
Sean Hatfield
4c16e7f7ab
Fix score reporting for Milvus, Zilliz, and Pinecone vector databases (#4106)
* normalize milvus score reporting

* normalize pinecone score reporting

* normalize score reporting for zilliz
2025-07-08 13:47:11 -07:00
timothycarambat
d1d68af0f8 patch skipCache logic among all vector db providers
resolves #3958
2025-06-30 13:16:46 -07:00
FT
e37f20f547
Fix Typo in Milvus Provider and Update Comment in Prisma Schema (#4003)
* Update schema.prisma

* Update index.js
2025-06-15 12:58:56 -07:00
leopardracer
e779dcfeff
Fix Typo in Documentation and Metadata Field Name (#3979)
* Update index.js

* Update index.js
2025-06-10 09:27:44 -07:00
Shixian Sheng
aef455d977
Fixed grammar and typos (#3802)
* Update common.js

* Update README.md

* Update common.js

* Update README.md

* linting

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-05-12 09:42:50 -07:00
timothycarambat
d99ab52165 add error fix for pgvector 2025-05-09 14:20:52 -07:00
Timothy Carambat
e1b7f5820c
PGvector vector database support (#3788)
* PGVector support for vector db storage

* forgot files

* comments

* dev build

* Add ENV connection and table schema validations for vector table
add .reset call to drop embedding table when changing the AnythingLLM embedder
update instructions
Add preCheck error reporting in UpdateENV
add timeout to pg connection

* update setup

* update README

* update doc
2025-05-09 12:27:11 -07:00
Shixian Sheng
dd701f9aa4
Update QDRANT_SETUP.md (#3530) 2025-03-25 12:45:18 -07:00
Sean Hatfield
f6239a39f8
fix chroma db + add similarity offset (#3458)
* fix chroma db + add similarity offset

* patch chroma scoring

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-03-17 17:48:23 -07:00
Hakeem Abbas
ca60ba827b
fix: sanitizeNamespace (#3246)
bug fixes for sanitizing Namespaces and handling chunk size limit of astradb collections in each doc

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-02-17 13:54:32 -08:00
Adam Setch
d63438fa61
chore: rename Github to GitHub (#3199)
* chore: rename Github to GitHub

Signed-off-by: Adam Setch <adam.setch@outlook.com>

* chore: rename Github to GitHub

Signed-off-by: Adam Setch <adam.setch@outlook.com>

* Undo some code changes for references

---------

Signed-off-by: Adam Setch <adam.setch@outlook.com>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-02-13 10:45:43 -08:00
Sean Hatfield
f8c72786df
Fix similarity score bug in lance/chroma dbs (#2986)
* fix similarity score bug in lance/chroma dbs

* batch lower bound case

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-01-17 18:27:54 -08:00
Timothy Carambat
ad01df8790
Reranker option for RAG (#2929)
* Reranker WIP

* add caching and singleton loading

* Add field to workspaces for vectorSearchMode
Add UI for lancedb to change mode
update all search endpoints to pass in reranker prop if provider can use it

* update hint text

* When reranking, swap score to rerank score

* update optchain
2025-01-02 14:27:52 -08:00
Timothy Carambat
bb5c3b7e0d
make similarityResponse object arguments and not positional (#2930)
* make `similarityResponse` object arguments and not positional

* reuse client for qdrant
2025-01-02 12:03:26 -08:00
wolfganghuse
af703427c7
fix wrong metadata assignment in MilvusProvider (#2870)
fixed wrong metadata assignment
2024-12-18 10:33:18 -08:00
Timothy Carambat
04e29203a5
Add header static class for metadata assembly (#2567)
* Add header static class for metadata assembly

* update comments

* patch header parsing for links
2024-11-04 11:47:46 -08:00
Sean Hatfield
a58f271149
Milvus bug fix (#2183)
* patch no text results for milvus chunks

* wrap addDocumentToNamespace in try catch for handling milvus errors

* lint

* revert milvus db changes

* add try catch to handle grpc error from milvus
2024-09-09 15:32:08 -07:00
Timothy Carambat
9bd65f1567
[CHORE] Migration from vectordb to @lancedb/lancedb NodeJS SDK (#1766)
WIP on migration to @lancedb/lancedb NodeJS SDK
2024-06-26 21:57:16 -07:00
Timothy Carambat
dc4ad6b5a9
[BETA] Live document sync (#1719)
* wip bg workers for live document sync

* Add ability to re-embed specific documents across many workspaces via background queue
bgworker is gated behind experimental system setting flag that needs to be explicitly enabled
UI for watching/unwatching documents that are embedded.
TODO: UI to easily manage all bg tasks and see run results
TODO: UI to enable this feature and background endpoints to manage it

* create frontend views and paths
Move elements to correct experimental scope

* update migration to delete runs on removal of watched document

* Add watch support to YouTube transcripts (#1716)

* Add watch support to YouTube transcripts
refactor how sync is done for supported types

* Watch specific files in Confluence space (#1718)

Add failure-prune check for runs

* create tmp workflow modifications for beta image

* create tmp workflow modifications for beta image

* create tmp workflow modifications for beta image

* dual build
update copy of alert modals

* update job interval

* Add support for live-sync of Github files

* update copy for document sync feature

* hide Experimental features from UI

* update docs links

* [FEAT] Implement new settings menu for experimental features (#1735)

* implement new settings menu for experimental features

* remove unused context save bar

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* dont run job on boot

* unset workflow changes

* Add persistent encryption service
Relay key to collector so persistent encryption can be used
Encrypt any private data in chunkSources used for replay during resync jobs

* update jsDOC

* Linting and organization

* update modal copy for feature

---------

Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
2024-06-21 13:38:50 -07:00
Sean Hatfield
1b8386b079
[FIX] ChromaDB namespace normalization (#1625)
* chromadb namespace normalization

* update normalization function with more clarity

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-06-06 15:38:05 -07:00
Anush
771889ad7f
[FIX] Incorrect vectors count with Qdrant (#1561)
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2024-06-06 13:18:01 -07:00
Shixian Sheng
a256db132d
Fixed links (#1485)
* Update CHROMA_SETUP.md

* Update ASTRA_SETUP.md
2024-05-22 10:06:39 -05:00
Timothy Carambat
b23cb1a90f
Improve RAG results via chunkHeader append (#1473) 2024-05-21 14:43:39 -05:00
Timothy Carambat
cae6cee1b5
Do not go through LLM to embed when embedding documents (#1428) 2024-05-16 17:51:04 -07:00
Timothy Carambat
94017e2b51
bump langchain deps (#1231)
* bump langchain deps

* patch native and ollama providers remove deprecated deps

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-04-30 12:04:24 -07:00
Timothy Carambat
ca63012c0f
bump lancedb dep (#1229) 2024-04-29 09:52:22 -07:00
Timothy Carambat
9655880cf0
Update all vector dbs to filter duplicate source documents that may be pinned (#1122)
* Update all vector dbs to filter duplicate parents

* cleanup
2024-04-17 18:04:39 -07:00
Timothy Carambat
24b523d5eb
append missing import for some vectordb providers (#1066) 2024-04-07 14:40:23 -07:00
Timothy Carambat
ce98ff4653
Enable customization of chunk length and overlap (#1059)
* Enable customization of chunk length and overlap

* fix onboarding link
show max limit in UI and prevent overlap >= chunk size
2024-04-06 16:38:07 -07:00
timothycarambat
718062d033 patch milvus/zilliz auto-generated collection name
resolves #1027
2024-04-03 12:34:23 -07:00
Gabriel Koo
4731ec8be8
[FIX] : missing import for parseAuthHeader in server/utils/vectorDbProviders/chroma/index.js (#869)
fix: import parseAuthHeader in chroma/index.js
2024-03-06 09:14:36 -08:00
Timothy Carambat
44c71013c8
Enforce name requirements for Zilliz/Milvus (#723) 2024-02-14 13:01:05 -08:00
Timothy Carambat
dfab14a5d2
Patch lanceDB not deleting vectors from workspace (#655)
patch lanceDB not deleting vectors from workspace
documentVectors self-sanitize on delete of parent document
2024-01-29 09:49:22 -08:00
Hakeem Abbas
5614e2ed30
feature: Integrate Astra as vectorDBProvider (#648)
* feature: Integrate Astra as vectorDBProvider

feature: Integrate Astra as vectorDBProvider

* Update .env.example

* Add env.example to docker example file
Update spellcheck for Astra
Update Astra key for vector selection
Update order of AstraDB options
Resize Astra logo image to 330x330
Update methods of Astra to take in latest vectorDB params like TopN and more
Update Astra interface to support default methods and avoid crash errors from 404 collections
Update Astra interface to comply to max chunk insertion limitations
Update Astra interface to dynamically set dimensionality from chunk 0 size on creation

* reset workspaces

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-01-26 13:07:53 -08:00
Sean Hatfield
2f3db0e63a
[FEAT] support pinecone serverless (#639)
* migrate pinecone package to latest version and migrate pinecone vectordb provider class

* remove pinecone environment name env variable and update docs to reflect removal & serverless support complete

* migrate query for pinecone db

* typo in log

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-01-22 16:41:20 -08:00
Sean Hatfield
56fa17caf2
create configurable topN per workspace (#616)
* create configurable topN per workspace

* Update TopN UI text
Fix fallbacks for all providers
Add SQLite CHECK to TOPN value

* merge with master
Update zilliz provider for variable TopN

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-01-18 12:34:20 -08:00
Timothy Carambat
658e7fa390
chore: Better VectorDb and Embedder error messages (#620)
* chore: propagate embedder and vectordb errors during document mutations

* add default value for errors on addDocuments
2024-01-18 11:40:48 -08:00
Timothy Carambat
0df86699e7
feat: Add support for Zilliz Cloud by Milvus (#615)
* feat: Add support for Zilliz Cloud by Milvus

* update placeholder text
update data handling stmt

* update zilliz descriptor
2024-01-17 18:00:54 -08:00
Timothy Carambat
d0a3f1e3e1
Fix present dimensions on vectorDBs to be inferred for providers who require it (#605) 2024-01-16 13:41:01 -08:00
Shuyoou
6faa0efaa8
Issue #543 support milvus vector db (#579)
* issue #543 support milvus vector db

* migrate Milvus to use MilvusClient instead of ORM
normalize env setup for docs/implementation
feat: embedder model dimension added

* update comments

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-01-12 13:23:57 -08:00
Sayan Gupta
b7d2756754
Issue #204 Added a check to ensure that 'chunk.payload' exists and contains the 'id' property (#526)
* Issue #204 Added a check to ensure that 'chunk.payload' exists and contains the 'id' property before attempting to destructure it

* run linter

* simplify condition and comment

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-01-04 16:39:43 -08:00
Timothy Carambat
8cc1455b72
feat: add support for variable chunk length (#415)
fix: cleanup code for embedding length clarify
resolves #388
2023-12-07 16:27:36 -08:00
Timothy Carambat
6fa8b0ce93
Add API key option to LocalAI (#407)
* Add API key option to LocalAI

* add api key for model dropdown selector
2023-12-04 08:38:15 -08:00
Timothy Carambat
88d4808c52
315 show citations based on relevancy score (#316)
* settings for similarity score threshold and prisma schema updated

* prisma schema migration for adding similarityScore setting

* WIP

* Min score default change

* added similarityThreshold checking for all vectordb providers

* linting

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2023-11-06 16:49:29 -08:00
Timothy Carambat
be9d8b0397
Infinite prompt input and compression implementation (#332)
* WIP on continuous prompt window summary

* wip

* Move chat out of VDB
simplify chat interface
normalize LLM model interface
have compression abstraction
Cleanup compressor
TODO: Anthropic stuff

* Implement compression for Anthropic
Fix lancedb sources

* cleanup vectorDBs and check that lance, chroma, and pinecone are returning valid metadata sources

* Resolve Weaviate citation sources not working with schema

* comment cleanup
2023-11-06 13:13:53 -08:00
Timothy Carambat
5d56ab623b
Anthropic claude 2 support (#305)
* WIP Anthropic support for chat, chat and query w/context

* Add onboarding support for Anthropic

* cleanup

* fix Anthropic answer parsing
move embedding selector to general util
2023-10-30 15:44:03 -07:00
Sean Hatfield
669d7a396d
282 return relevancy score with similarityresponse (#304)
* include score value in similarityResponse for weaviate

* include score value in similarityResponse for qdrant

* include score value in similarityResponse for pinecone

* include score value in similarityResponse for chroma

* include score value in similarityResponse for lancedb

* distance to similarity

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-10-30 12:46:38 -07:00
Timothy Carambat
a8ec0d9584
Compensate for upper OpenAI embedding limit chunk size (#292)
Limit is due to POST body max size. Sufficiently large requests will abort automatically
We should report that error back on the frontend during embedding
Update vectordb providers to return on failed
2023-10-26 10:57:37 -07:00
Timothy Carambat
62d39eb4fb
resolves #259 (#260)
Support API client for chroma
2023-09-29 13:20:06 -07:00
Sean Hatfield
a126b5f5aa
Replace custom sqlite dbms with prisma (#239)
* WIP converted all sqlite models into prisma calls

* modify db setup and fix ApiKey model calls in admin.js

* renaming function params to be consistent

* converted adminEndpoints to utilize prisma orm

* converted chatEndpoints to utilize prisma orm

* converted inviteEndpoints to utilize prisma orm

* converted systemEndpoints to utilize prisma orm

* converted workspaceEndpoints to utilize prisma orm

* converting sql queries to prisma calls

* fixed default param bug for orderBy and limit

* fixed typo for workspace chats

* fixed order of deletion to account for sql relations

* fix invite CRUD and workspace management CRUD

* fixed CRUD for api keys

* created prisma setup scripts/docs for understanding how to use prisma

* prisma dependency change

* removing unneeded console.logs

* removing unneeded sql escape function

* linting and creating migration script

* migration from deprecated sqlite script update

* removing unneeded migrations in prisma folder

* create backup of old sqlite db and use transactions to ensure all operations complete successfully

* adding migrations to gitignore

* updated PRISMA.md docs for info on how to use sqlite migration script

* comment changes

* adding back migrations folder to repo

* Reviewing SQL and prisma integration on fresh repo

* update inline key replacement

* ensure migration script executes and maps foreign_keys regardless of db ordering

* run migration endpoint

* support new prisma backend

* bump version

* change migration call

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2023-09-28 14:00:03 -07:00