merlyn

Author	SHA1	Message	Date
Jonas Stawski	b8d4cc3454	Added metadata parameter to document/upload, document/upload/{folderName}, and document/upload-link (#4342 ) * Added the ability to pass in metadata to the /document/upload/{folderName} endpoint * Added the ability to pass in metadata to the /document/upload-link endpoint * feat: added metadata to document/upload api endpoint * simplify optional metadata in document dev api endpoints * lint * patch handling of metadata in dev api * Linting, small comments --------- Co-authored-by: jstawskigmi <jstawski@getmyinterns.org> Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-17 11:17:29 -07:00
timothycarambat	bb7d65f0eb	patch missing options resolves #4316	2025-08-20 10:51:14 -07:00
timothycarambat	a4a84f9bdd	forgot 1.8.5 tag :)	2025-08-14 17:43:55 -07:00
timothycarambat	0200e647b8	add back normalization + docs link	2025-08-14 11:43:04 -07:00
Timothy Carambat	0fb33736da	Workspace Chat with documents overhaul (#4261 ) * Create parse endpoint in collector (#4212) * create parse endpoint in collector * revert cleanup temp util call * lint * remove unused cleanupTempDocuments function * revert slug change minor change for destinations --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> * Add parsed files table and parse server endpoints (#4222) * add workspace_parsed_files table + parse endpoints/models * remove dev api parse endpoint * remove unneeded imports * iterate over all files + remove unneeded update function + update telemetry debounce * Upload UI/UX context window check + frontend alert (#4230) * prompt user to embed if exceeds prompt window + handle embed + handle cancel * add tokenCountEstimate to workspace_parsed_files + optimizations * use util for path locations + use safeJsonParse * add modal for user decision on overflow of context window * lint * dynamic fetching of provider/model combo + inject parsed documents * remove unneeded comments * popup ui for attaching/removing files + warning to embed + wip fetching states on update * remove prop drilling, fetch files/limits directly in attach files popup * rework ux of FE + BE optimizations * fix ux of FE + BE optimizations * Implement bidirectional sync for parsed file states linting small changes and comments * move parse support to another endpoint file simplify calls and loading of records * button borders * enable default users to upload parsed files but NOT embed * delete cascade on user/workspace/thread deletion to remove parsedFileRecord * enable bgworker with "always" jobs and optional document sync jobs orphan document job: Will find any broken reference files to prevent overpollution of the storage folder. This will run 10s after boot and every 12hr after * change run timeout for orphan job to 1m to allow settling before spawning a worker * linting and cleanup pr --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com> * dev build * fix tooltip hiding during embedding overflow files * prevent crash log from ERRNO on parse files * unused import * update docs link * Migrate parsed-files to GET endpoint patch logic for grabbing models names from utils better handling for undetermined context windows (null instead of Pos_INIFI) UI placeholder for null context windows * patch URL --------- Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2025-08-11 09:26:19 -07:00
Timothy Carambat	70a07b743b	Update `writeToServerDocuments` to take config object (#4213 )	2025-07-29 17:53:05 -07:00
timothycarambat	7692775942	minor change to XLSX parse and upload output folder	2025-07-29 17:44:47 -07:00
timothycarambat	ff34c8cefc	use documentsFolder path for simplification	2025-07-16 11:14:18 -07:00
timothycarambat	c535c69345	1.8.4 tag update	2025-07-16 10:40:39 -07:00
Sean Hatfield	5485c58b44	Sanitize youtube transcription file paths (#4148 ) sanitize youtube transcription file paths	2025-07-14 13:53:34 -07:00
Sean Hatfield	5d60047dc7	Handle BigInt in message response (#4110 ) * wip handle bigints in message response * extend bigint protoype to handle bigint stringification + add test * unset unrelated file * update tests, simplify implementation; --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-07-10 12:33:34 -07:00
Timothy Carambat	8001d9ddeb	update 1.8.3 tags for release (#4109 ) * update 1.8.3 tags for release * whoops, botched news	2025-07-09 12:17:56 -07:00
rexjohannes	14fa079953	Fix/drupal wiki (improve table & url handling) (#4097 ) * feat: add support for custom table formatting in htmlToText conversion * fix tables * feat: improve plain text table formatting for AI readability * fix options * improve drupal wiki connector * final fix * adjust leading slash to match code * linting --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:39:38 -07:00
bobbercheng	d0978fa363	Fix broken YT scraping with YT API (#4005 ) * Fix broken YT scraping with YT API * refactor youtube transcript class/add jsdoc comments * fix test --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:06:18 -07:00
Timothy Carambat	64d9fbc8f0	Show app version in system settings sidebar (#4044 ) * Add version tagging resolves #4038 closes #4034 closes #4028 * add hook * add build * patch	2025-06-24 13:56:12 -07:00
timothycarambat	3d5e8602a8	lint	2025-05-27 13:54:13 -07:00
rexjohannes	dc80d3e535	fixed drupal connector (#3893 ) https://github.com/Mintplex-Labs/anything-llm/issues/3875#issuecomment-2913211343	2025-05-27 13:15:43 -07:00
Timothy Carambat	245a5969b8	normalize path on drupal to use documentsFolder constant normalize path on drupal to use documentsFolder constant	2025-05-27 09:25:48 -07:00
Sean Hatfield	2b274c62b7	Obsidian data connector (#3798 ) * add obsidian vault data connector * lint * add english translations * normalize translations * improve file parser and reader --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-05-12 13:45:27 -07:00
Timothy Carambat	6fc0a6a644	Enable workflow rule for package verification (#3778 ) enable workflow rule	2025-05-07 12:51:14 -07:00
timothycarambat	3f4fda86bf	match openai versions across collector/backend	2025-05-07 12:30:09 -07:00
timothycarambat	9d661bb96e	linting	2025-05-07 09:40:31 -07:00
mr-chenguang	eff9d24cb9	feat: support fetch wikis for gitlab data connectors (#3271 ) * feat: support fetch wikis for gitlab data connectors * gitlab connector button spacing * add docAuthor and description metadata for GitLab wiki pages --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-05-06 14:09:53 -07:00
Sean Hatfield	610bdd4673	Allow custom headers in upload-link endpoint (#3695 ) * allow custom headers in upload-link endpoint * override loader.scrape to allow for passing of headers in langchain puppeteer * lint * Rename some variables move positional args to named args update documentation to reflect arg changes and funciton sigs validate header object before attempting to end to forward to request * update header validation for custom headers --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-04-22 12:47:12 -07:00
Timothy Carambat	1601eb986c	Enable bypass of ip limitations via ENV in collector processing (#3652 ) * Enable bypass of ip limitations via ENV in collector startup resolves #3625 connect #3626 * dev build * bump dockerx build action * enable runtime setting config of collector requests * comments and linting for option passing * unset * unset * update docs link * linting and docs	2025-04-21 11:10:41 -07:00
Timothy Carambat	fd4929b4d2	Feature/drupalwiki collector (#3693 ) * Implement DrupalWiki collector * Add attachment downloading and processing functionality (#3) * linting * Linting Add citation image small refactors add URL for citation identifier --------- Co-authored-by: em <eugen.mayer@kontextwork.de> Co-authored-by: rexjohannes <53578137+rexjohannes@users.noreply.github.com> Co-authored-by: Eugen Mayer <136934+EugenMayer@users.noreply.github.com>	2025-04-21 09:17:24 -07:00
Timothy Carambat	fd174cab86	Apply `.git` logic handler for repo URLs (#3655 ) * Apply `.git` logic handler for repo URLs * remove comment	2025-04-15 18:01:14 -07:00
Timothy Carambat	fab74037fa	Prevent collector crash when blocked by CDN (#3373 ) resolves #3365	2025-02-28 10:27:05 -08:00
AbelDuan	df166eb64e	feat: Add multilingual support for ocr module (#3325 ) * Add multilingual support for ocr mudule * Add OCR langauge as server var that is passed into Collector Support all valid tesseract language codes Filter and parse only valid codes with fallbacks' * persist TARGET_OCR_LANG * update docker example env --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-27 12:31:17 -08:00
Kristofer Bourro	b07240deee	Windows development environment variables support (#3354 ) * Windows development environment variables support * moved cross-env to dev dependencies --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-27 10:43:31 -08:00
t2	0eb86e2c12	for projects in gitlab subgroup (#3075 ) (#3247 ) * for projects in gitlab subgroup (#3075) * fix: false condition * refactor pattern matching logic --------- Co-authored-by: t2 <> Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-17 12:25:11 -08:00
Timothy Carambat	4545ce24cd	Drop Node `canvas` for manual `sharp` conversion (#3221 ) * Drop Node `canvas` for manual `sharp` conversion * bump dev	2025-02-14 17:38:13 -08:00
mr-chenguang	6ffdbf074d	feat(dataconnectors): support confluence personal access token (#3206 ) * feat(dataconnectors): support confluence personal access token * fix: change select option * linting change name on accesstype field --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-02-14 12:12:01 -08:00
Timothy Carambat	89bba68219	Add OCR of image support (#3219 ) * OCR PDFs as fallback in spawn thread * wip * build our own worker fanout and wrapper * norm pkgs * Add image OCR support	2025-02-14 12:07:33 -08:00
Timothy Carambat	2a9066e83a	OCR PDFs as fallback during upload (#3204 ) * OCR PDFs as fallback in spawn thread * wip * build our own worker fanout and wrapper * norm pkgs * bump dev	2025-02-14 11:57:31 -08:00
Timothy Carambat	b6d3a411b1	Add `querySelectorAll` capability to web-scraping block (#3186 ) * Add `querySelectorAll` capability to web-scraping block * patches and fallbacks * fix styles of text in web scraping block --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2025-02-13 16:11:15 -08:00
Adam Setch	d63438fa61	chore: rename Github to GitHub (#3199 ) * chore: rename Github to GitHub Signed-off-by: Adam Setch <adam.setch@outlook.com> * chore: rename Github to GitHub Signed-off-by: Adam Setch <adam.setch@outlook.com> * Undo some code changes for references --------- Signed-off-by: Adam Setch <adam.setch@outlook.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-02-13 10:45:43 -08:00
Timothy Carambat	9a4df22c70	autodetect parseable text file contents (#3079 )	2025-01-31 13:31:26 -08:00
Timothy Carambat	d1ca16f7f8	Add tokenizer improvments via Singleton class and estimation (#3072 ) * Add tokenizer improvments via Singleton class linting * dev build * Estimation fallback when string exceeds a fixed byte size * Add notice to tiktoken on backend	2025-01-30 17:55:03 -08:00
Sean Hatfield	dd017c6cbb	Audio file validations (#2902 ) * add audio file validations * patch sharp to support wavfile parsing --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-12-30 14:48:28 -08:00
Sean Hatfield	9bc01afa7d	Fix scraping failed bug in link/bulk link scrapers (#2807 ) * fix scraping failed bug in link/bulk link scrapers * reset submodule * swap to networkidle2 as a safe mix for SPA and API-loaded pages, but also not hang on request heavy pages * lint --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-12-11 14:01:52 -08:00
Timothy Carambat	5e698534fe	Add plaintext file extensions (#2664 )	2024-11-20 09:56:03 -08:00
Sean Hatfield	cf3b085a3a	Handle OpenAI whisper transcription edge case (#2621 ) remove openai whisper transcription provider response_format option	2024-11-11 17:32:03 -08:00
Sean Hatfield	0bb47619dc	Allow 127.0.0.1 as valid URL for scraping (#2560 ) * allow 127.0.0.1 as valid url for scraping * update comments and lint --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-10-31 09:57:28 -07:00
timothycarambat	c870e31aaa	add `ino` filetype to text/plain support	2024-10-28 11:44:15 -07:00
Sean Hatfield	0074ededdd	Github data connector improvements (#2439 ) * fix tree/blob github urls from branches not being loaded * improve ux of github data connector * lint * patch Github URL parser to just validate with `URL` native parser * uncheck LocalStorage of PAT for security reasons --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-10-21 15:25:35 -07:00
timothycarambat	ab6f03ce1c	linting	2024-10-18 11:44:14 -07:00
Sean Hatfield	41522cdfb4	Handle non-ascii characters in single and bulk link scraper URLs (#2495 ) handle non-ascii characters in urls	2024-10-17 17:04:00 -07:00
Sean Hatfield	b658f5012d	Support XLSX files (#2403 ) * support xlsx files * lint * create seperate docs for each xlsx sheet * lint * use node-xlsx pkg for parsing xslx files * lint * update error handling --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-10-03 13:45:23 -07:00
Timothy Carambat	93d64642f3	Add exception handling for special case files like `Dockerfile` and `Jenkinsfile` (#2410 )	2024-10-02 15:13:31 -07:00
Blazej Owczarczyk	348d9c8285	Add 3GB file size limit to body parser middlewares (#2390 )	2024-09-30 11:19:41 -07:00
Timothy Carambat	30645831a1	1959 filetype filters (#2378 ) * Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly. * add @langchain/community to collector package.json * fix: Improve handling of complex ignore patterns in GitLabRepoLoader * refactor: use ignore package for simplified ignore logic * run yarn lint * add @langchain/community@^0.2.23 * remove unused dep lint --------- Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>	2024-09-26 12:50:35 -07:00
Blazej Owczarczyk	b2123b13b0	Added an option to fetch issues from gitlab. Made the file fetching a… (#2335 ) * Added an option to fetch issues from gitlab. Made the file fetching asynchornous to improve performance. #2334 * Fixed a typo in loadGitlabRepo. * Convert issues to markdown. * Fixed an issue with time estimate field names in issueToMarkdown. * handle rate limits more gracefully + update checkbox to toggle switch * lint --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com> Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-09-26 11:45:18 -07:00
Timothy Carambat	961b567541	Add dropdown for confluence connector deployment (#2376 )	2024-09-26 08:49:05 -07:00
Sean Hatfield	4488744850	Support more Confluence URL formats (#2118 ) * support more confluence url formats * use pattern matching for confluence urls and manual splitting as fallback * rework entire Confluence flow to prevent issues with custom, local, and cloud spaces * remove dep --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-09-25 16:12:17 -07:00
Sean Hatfield	5a3d55db67	Fix custom domain in confluence (#2328 ) confluence custom domain fix	2024-09-19 15:36:07 -05:00
Timothy Carambat	4fa3d6d333	Load all branches in gitlab data connector (#2325 ) * Fix gitlab data connector for self-hosted instances (#2315) * Linting fix. * Load all branches in the GitLab data connector #2319 * #2319 lint fixes. * update fetch on fail --------- Co-authored-by: Błażej Owczarczyk <blazeyy@gmail.com>	2024-09-19 13:34:38 -05:00
Blazej Owczarczyk	b25298c04a	Fix gitlab data connector for self-hosted instances (#2315 ) (#2316 ) * Fix gitlab data connector for self-hosted instances (#2315) * Linting fix.	2024-09-18 16:12:15 -05:00
timothycarambat	9aa77dfb8d	Add verbose logging to GH loader connect #2243	2024-09-09 14:36:37 -07:00
timothycarambat	5f477e0dbd	remove log	2024-09-06 11:37:46 -07:00
timothycarambat	619f6b3884	Ignore SSL errors for web scraper resolves #2114	2024-08-14 09:11:22 -07:00
timothycarambat	b541623c9e	add SSRF notice	2024-08-13 17:46:07 -07:00
Sean Hatfield	2797298507	Fix depth handling in bulk link scraper (#2096 ) fix depth handling in bulk link scraper	2024-08-12 11:44:35 -07:00
Lea Anthony	3b6a2fd2fa	#2084 Support Go filetype (#2085 ) Support Go filetype	2024-08-09 19:29:29 -07:00
Mehmet Ünlü	0d4560b9e4	2049 remove break that prevents fetching files from gitlab repo (#2050 ) fix: remove unnecessary break Remove unnecessary break that prevents checking next pages for blob objects.	2024-08-06 10:17:55 -07:00
Sean Hatfield	be3b0b4916	Youtube loader whitespace fix (#2051 ) youtube loader whitespace fix	2024-08-06 10:16:17 -07:00
Timothy Carambat	04a0fc4ec9	Remove unused deps (#1938 ) * Remove unused deps * improve dependency	2024-07-25 10:21:03 -07:00
Timothy Carambat	42235fcd8a	GitLab Hosted and Local Connector (#1932 ) * Add support for GitLab repo collection as well as Github Repo collection * Refactor for repo collectors to be more compact --------- Co-authored-by: Emil Rofors <emirof@gmail.com>	2024-07-23 12:23:51 -07:00
timothycarambat	f15529653f	patch logger for full logs	2024-07-19 18:35:41 -07:00
timothycarambat	cec1a3d585	append stacktraces to winston	2024-07-19 18:13:54 -07:00
Sean Hatfield	9b86bbd2b8	[FIX] PDFLoader module bug fix (#1879 ) use pdf.js by importing it from pdf-parse and fix custom PDFLoader module	2024-07-16 13:09:43 -07:00
Sean Hatfield	79656718b2	[FEAT] Create custom pdfloader (#1852 ) * implement custom PDFLoader to remove LC dep * remove unneeded comment * remove pdfjs as dep and fix page splitting using pdf-parse * linting + export rename for desktop compat --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-07-11 12:26:11 -07:00
timothycarambat	8658b1e7c7	linting	2024-07-03 18:25:44 -07:00
Timothy Carambat	29c9eeaa5c	Add `winston` logging for production (#1811 )	2024-07-03 16:39:33 -07:00
Sean Hatfield	a87014822a	[REFACTOR] Improve asPDF collector processor with pdfjs (#1791 ) * WIP replace langchain pdfloader with pdfjs and add more context to each page * remove extras from pdfjs and just replace langchain library * remove unneeded dep * fix console log in docs --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-07-03 14:26:48 -07:00
Sean Hatfield	f205d51fe9	[FIX] Confluence code snippet blocks not being extracted (#1804 ) implement custom confluence loader to extract code blocks properly from documents Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-07-03 14:00:44 -07:00
Sean Hatfield	fc375f4036	[FIX] Bulk link scraper bug fix (#1800 ) patch website depth data connector to work for other links that are not root url	2024-07-01 16:59:28 -07:00
Jason Zhang	fa4ab0f65f	fix: sanitize filename before writing (#1743 ) * fix: sanitize filename before writing Fixes: https://github.com/Mintplex-Labs/anything-llm/issues/1737 * fixup * fixup	2024-06-25 15:45:09 -07:00
Timothy Carambat	dc4ad6b5a9	[BETA] Live document sync (#1719 ) * wip bg workers for live document sync * Add ability to re-embed specific documents across many workspaces via background queue bgworkser is gated behind expieremental system setting flag that needs to be explictly enabled UI for watching/unwatching docments that are embedded. TODO: UI to easily manage all bg tasks and see run results TODO: UI to enable this feature and background endpoints to manage it * create frontend views and paths Move elements to correct experimental scope * update migration to delete runs on removal of watched document * Add watch support to YouTube transcripts (#1716) * Add watch support to YouTube transcripts refactor how sync is done for supported types * Watch specific files in Confluence space (#1718) Add failure-prune check for runs * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * dual build update copy of alert modals * update job interval * Add support for live-sync of Github files * update copy for document sync feature * hide Experimental features from UI * update docs links * [FEAT] Implement new settings menu for experimental features (#1735) * implement new settings menu for experimental features * remove unused context save bar --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> * dont run job on boot * unset workflow changes * Add persistent encryption service Relay key to collector so persistent encryption can be used Encrypt any private data in chunkSources used for replay during resync jobs * update jsDOC * Linting and organization * update modal copy for feature --------- Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2024-06-21 13:38:50 -07:00
Timothy Carambat	a598c8e04c	1347 human readable confluence url (#1706 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * refactor implementation of various types of Confluence URL patterns --------- Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net> Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com> Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>	2024-06-17 16:04:20 -07:00
Timothy Carambat	98cef508a6	Feature/devcontv2 (#1622 ) * Updated apt-packages source for devcontainer Switched the devcontainer's package source to a different repository to align with updated dependencies and package availability. The previous source from 'rocker-org' is replaced with 'devcontainers-contrib', which may offer more recent or relevant development tools. * Subject: Centralize prettier ignores and refine config Body: Centralized all prettier ignore rules by removing individual `.prettierignore` files in subprojects and updating the root `.prettierignore` to include previously ignored patterns, ensuring consistency across the workspace. Additionally, the prettier configuration was refined by making the file pattern for `.config.js` files consistent and adjusting quote styles for better readability. All lint scripts across the project were updated to respect the centralized ignore path, enhancing maintainability. The consolidation simplifies the process of managing ignore rules as the project scales, ensuring developers can focus on writing code without worrying about divergent formatting standards. These changes also align with introducing comprehensive linting across multiple environments to keep the codebase clean and consistent. This adjustment is a foundational step towards a more streamlined and unified code base, making it easier for new contributors to adhere to established coding standards and reducing the cognitive load associated with managing multiple configuration files across the project. * unset package json changes --------- Co-authored-by: Francisco Bischoff <franzbischoff@gmail.com> Co-authored-by: Francisco Bischoff <984592+franzbischoff@users.noreply.github.com>	2024-06-06 12:50:42 -07:00
Chris Daniel	8a4dd2bdf5	[FEAT] add support for TSX files to be parsed as text (#1597 ) add support for TSX files to be parsed as text	2024-06-03 17:01:41 +08:00
Sean Hatfield	9a38b32c74	[FEAT] Add support for R files to be parsed as text (#1577 ) add support for R files to be parsed as text	2024-05-31 13:52:00 +08:00
Sean Hatfield	4324a8bb4f	[FEAT] Github repo loader bug fix (#1558 ) * fix project names with special characters for github repo data connector * linting	2024-05-29 17:01:29 +08:00
Timothy Carambat	a89812703b	repatch path normalization (#1516 )	2024-05-23 12:52:04 -07:00
timothycarambat	05488c81e0	undo path norm whitespace fix	2024-05-23 12:04:00 -07:00
timothycarambat	e208074ef4	patch path normalization	2024-05-22 11:50:01 -05:00
Timothy Carambat	1a5aacb001	Support multi-model whispers (#1444 )	2024-05-17 21:31:29 -07:00
Timothy Carambat	7e0b638a2c	Patch confluence URL patterns(#1426 ) * patch confluence patterns --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-05-16 14:15:59 -07:00
timothycarambat	87b41a60e9	refactor spaceKey url pattern for custom domains	2024-05-16 11:01:34 -07:00
Predrag Stojadinović	cf969adf37	1362 custom display confluence url (#1423 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: adding /display/ url matching to confluence data connector	2024-05-16 10:46:18 -07:00
timothycarambat	b5ac944475	patch: bulk-scraper, update when folder is made and path creation params	2024-05-14 12:57:23 -07:00
Sean Hatfield	612a7e1662	[FEAT] Website depth scraping data connector (#1191 ) * WIP website depth scraping, (sort of works) * website depth data connector stable + add maxLinks option * linting + loading small ui tweak * refactor website depth data connector for stability, speed, & readability * patch: remove console log Guard clause on URL validitiy check reasonable overrides --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 12:49:14 -07:00
jazelly	d71db22799	fix: skip undefined confluence pageContent (#1383 ) Refs: https://github.com/Mintplex-Labs/anything-llm/issues/1381 Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-05-14 10:22:13 -07:00
Predrag Stojadinović	78e3e35d27	[FEAT] Confluence Data Connector handles custom Confluence urls (#1362 ) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint	2024-05-14 10:21:04 -07:00
timothycarambat	2d215acb75	patch storage dirs for extensions	2024-05-02 14:03:10 -07:00
timothycarambat	1aa8e5766f	duplicate key (no impact)	2024-05-02 13:05:20 -07:00
Timothy Carambat	547d4859ef	Bump `openai` package to latest (#1234 ) * Bump `openai` package to latest Tested all except localai * bump LocalAI support with latest image * add deprecation notice * linting	2024-04-30 12:33:42 -07:00
Timothy Carambat	94017e2b51	bump langchain deps (#1231 ) * bump langchain deps * patch native and ollama providers remove deprecated deps --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-04-30 12:04:24 -07:00
Sean Hatfield	348b36bf85	[FEAT] Confluence data connector (#1181 ) * WIP Confluence data connector backend * confluence data connector complete * confluence citations * fix citation for confluence * Patch confulence integration * fix Citation Icon for confluence --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-04-25 17:53:38 -07:00

213 Commits