merlyn

Author	SHA1	Message	Date
Timothy Carambat	092b1b45f8	Upgrade YT Scraper (#4820)	2026-01-02 15:41:22 -08:00
Sean Hatfield	c76b0708c3	Fix pagination bug in paperless-ngx data connector (#4757) * iterate over all pages in paperless-ngx data connector * add error handling and data validation * refactor to handle edge cases and null values * catch edge case to prevent infinite loop --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-12-12 10:23:32 -08:00
timothycarambat	758db6b677	fix lint	2025-11-25 14:42:10 -08:00
Neha Prasad	3ecf218eea	feat: Add SSL certificate bypass support for self-hosted Confluence instances (#4219) * Added bypassSSL parameter to constructor and implemented SSL bypass logic in fetchConfluenceData method * Updated generateChunkSource function to include bypassSSL in the encrypted payload * Updated the request body to include bypassSSL in the JSON payload sent to the backend * Updated form submission to include bypassSSL parameter from the checkbox * Added bypass_ssl: "Bypass SSL Certificate Validation" translation * passed these parameters to fetchconfluencepage function for proper resync functionality * allow ignore of SSL cert for Confluence * add translations --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-25 14:32:10 -08:00
Sean Hatfield	05df4ac72b	Paperless ngx data connector (#4121) * paperless ngx data connector * wip resync paperless ngx * fix generateChunkSource for resyncing paperless ngx * lint * Refactor Paperless-NGX connector Fix issue with date rendering in tooltip + extended width Move tooltip details to be column for more space --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-20 11:27:38 -08:00
Timothy Carambat	b3b261e15d	Fix loop logic for `fetchNextPage` use in GitLabLoader (#4662) resolves #4626 closes #4627	2025-11-19 13:53:26 -08:00
Marcello Fitton	d3619689db	Refactor `loadYouTubeTranscript()` to include YouTube Video Metadata in Content When `parseOnly` is `true` (#4552) * Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true * extract to function --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-15 15:42:00 -07:00
Timothy Carambat	5edc1bea42	Add ability to auto-handle YT video URLs in uploader & chat (#4547) * Add ability to auto-handle YT video URLs in uploader & chat * move YT validator to URL utils * update comment	2025-10-15 12:18:57 -07:00
AoiYamada	8fc1f24d1b	fix: youtube transcript collector not work well with non en or non asr caption (#4442) * fix: youtube transcript collector not work well with non en or non asr caption * stub YT test in Github actions --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-29 13:22:50 -07:00
Timothy Carambat	70a07b743b	Update `writeToServerDocuments` to take config object (#4213)	2025-07-29 17:53:05 -07:00
timothycarambat	ff34c8cefc	use documentsFolder path for simplification	2025-07-16 11:14:18 -07:00
Sean Hatfield	5485c58b44	Sanitize youtube transcription file paths (#4148) sanitize youtube transcription file paths	2025-07-14 13:53:34 -07:00
rexjohannes	14fa079953	Fix/drupal wiki (improve table & url handling) (#4097) * feat: add support for custom table formatting in htmlToText conversion * fix tables * feat: improve plain text table formatting for AI readability * fix options * improve drupal wiki connector * final fix * adjust leading slash to match code * linting --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:39:38 -07:00
bobbercheng	d0978fa363	Fix broken YT scraping with YT API (#4005) * Fix broken YT scraping with YT API * refactor youtube transcript class/add jsdoc comments * fix test --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:06:18 -07:00
timothycarambat	3d5e8602a8	lint	2025-05-27 13:54:13 -07:00
rexjohannes	dc80d3e535	fixed drupal connector (#3893) https://github.com/Mintplex-Labs/anything-llm/issues/3875#issuecomment-2913211343	2025-05-27 13:15:43 -07:00
Timothy Carambat	245a5969b8	normalize path on drupal to use documentsFolder constant	2025-05-27 09:25:48 -07:00
Sean Hatfield	2b274c62b7	Obsidian data connector (#3798) * add obsidian vault data connector * lint * add english translations * normalize translations * improve file parser and reader --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-05-12 13:45:27 -07:00
timothycarambat	9d661bb96e	linting	2025-05-07 09:40:31 -07:00
mr-chenguang	eff9d24cb9	feat: support fetch wikis for gitlab data connectors (#3271) * feat: support fetch wikis for gitlab data connectors * gitlab connector button spacing * add docAuthor and description metadata for GitLab wiki pages --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-05-06 14:09:53 -07:00
Timothy Carambat	fd4929b4d2	Feature/drupalwiki collector (#3693) * Implement DrupalWiki collector * Add attachment downloading and processing functionality (#3) * linting * Linting Add citation image small refactors add URL for citation identifier --------- Co-authored-by: em <eugen.mayer@kontextwork.de> Co-authored-by: rexjohannes <53578137+rexjohannes@users.noreply.github.com> Co-authored-by: Eugen Mayer <136934+EugenMayer@users.noreply.github.com>	2025-04-21 09:17:24 -07:00
Timothy Carambat	fd174cab86	Apply `.git` logic handler for repo URLs (#3655) * Apply `.git` logic handler for repo URLs * remove comment	2025-04-15 18:01:14 -07:00
t2	0eb86e2c12	for projects in gitlab subgroup (#3075) (#3247) * for projects in gitlab subgroup (#3075) * fix: false condition * refactor pattern matching logic --------- Co-authored-by: t2 <> Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-17 12:25:11 -08:00
mr-chenguang	6ffdbf074d	feat(dataconnectors): support confluence personal access token (#3206) * feat(dataconnectors): support confluence personal access token * fix: change select option * linting change name on accesstype field --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-02-14 12:12:01 -08:00
Adam Setch	d63438fa61	chore: rename Github to GitHub (#3199) * chore: rename Github to GitHub Signed-off-by: Adam Setch <adam.setch@outlook.com> * chore: rename Github to GitHub Signed-off-by: Adam Setch <adam.setch@outlook.com> * Undo some code changes for references --------- Signed-off-by: Adam Setch <adam.setch@outlook.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-02-13 10:45:43 -08:00
Timothy Carambat	d1ca16f7f8	Add tokenizer improvments via Singleton class and estimation (#3072) * Add tokenizer improvments via Singleton class linting * dev build * Estimation fallback when string exceeds a fixed byte size * Add notice to tiktoken on backend	2025-01-30 17:55:03 -08:00
Sean Hatfield	9bc01afa7d	Fix scraping failed bug in link/bulk link scrapers (#2807) * fix scraping failed bug in link/bulk link scrapers * reset submodule * swap to networkidle2 as a safe mix for SPA and API-loaded pages, but also not hang on request heavy pages * lint --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-12-11 14:01:52 -08:00
Sean Hatfield	0074ededdd	Github data connector improvements (#2439) * fix tree/blob github urls from branches not being loaded * improve ux of github data connector * lint * patch Github URL parser to just validate with `URL` native parser * uncheck LocalStorage of PAT for security reasons --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-10-21 15:25:35 -07:00
timothycarambat	ab6f03ce1c	linting	2024-10-18 11:44:14 -07:00
Sean Hatfield	41522cdfb4	Handle non-ascii characters in single and bulk link scraper URLs (#2495) handle non-ascii characters in urls	2024-10-17 17:04:00 -07:00
Timothy Carambat	30645831a1	1959 filetype filters (#2378) * Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly. * add @langchain/community to collector package.json * fix: Improve handling of complex ignore patterns in GitLabRepoLoader * refactor: use ignore package for simplified ignore logic * run yarn lint * add @langchain/community@^0.2.23 * remove unused dep lint --------- Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>	2024-09-26 12:50:35 -07:00
Blazej Owczarczyk	b2123b13b0	Added an option to fetch issues from gitlab. Made the file fetching a… (#2335) * Added an option to fetch issues from gitlab. Made the file fetching asynchornous to improve performance. #2334 * Fixed a typo in loadGitlabRepo. * Convert issues to markdown. * Fixed an issue with time estimate field names in issueToMarkdown. * handle rate limits more gracefully + update checkbox to toggle switch * lint --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com> Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-09-26 11:45:18 -07:00
Timothy Carambat	961b567541	Add dropdown for confluence connector deployment (#2376)	2024-09-26 08:49:05 -07:00
Sean Hatfield	4488744850	Support more Confluence URL formats (#2118) * support more confluence url formats * use pattern matching for confluence urls and manual splitting as fallback * rework entire Confluence flow to prevent issues with custom, local, and cloud spaces * remove dep --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-09-25 16:12:17 -07:00
Sean Hatfield	5a3d55db67	Fix custom domain in confluence (#2328) confluence custom domain fix	2024-09-19 15:36:07 -05:00
Timothy Carambat	4fa3d6d333	Load all branches in gitlab data connector (#2325) * Fix gitlab data connector for self-hosted instances (#2315) * Linting fix. * Load all branches in the GitLab data connector #2319 * #2319 lint fixes. * update fetch on fail --------- Co-authored-by: Błażej Owczarczyk <blazeyy@gmail.com>	2024-09-19 13:34:38 -05:00
Blazej Owczarczyk	b25298c04a	Fix gitlab data connector for self-hosted instances (#2315) (#2316) * Fix gitlab data connector for self-hosted instances (#2315) * Linting fix.	2024-09-18 16:12:15 -05:00
timothycarambat	9aa77dfb8d	Add verbose logging to GH loader connect #2243	2024-09-09 14:36:37 -07:00
Sean Hatfield	2797298507	Fix depth handling in bulk link scraper (#2096) fix depth handling in bulk link scraper	2024-08-12 11:44:35 -07:00
Mehmet Ünlü	0d4560b9e4	2049 remove break that prevents fetching files from gitlab repo (#2050) fix: remove unnecessary break Remove unnecessary break that prevents checking next pages for blob objects.	2024-08-06 10:17:55 -07:00
Sean Hatfield	be3b0b4916	Youtube loader whitespace fix (#2051) youtube loader whitespace fix	2024-08-06 10:16:17 -07:00
Timothy Carambat	42235fcd8a	GitLab Hosted and Local Connector (#1932) * Add support for GitLab repo collection as well as Github Repo collection * Refactor for repo collectors to be more compact --------- Co-authored-by: Emil Rofors <emirof@gmail.com>	2024-07-23 12:23:51 -07:00
Sean Hatfield	f205d51fe9	[FIX] Confluence code snippet blocks not being extracted (#1804) implement custom confluence loader to extract code blocks properly from documents Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-07-03 14:00:44 -07:00
Sean Hatfield	fc375f4036	[FIX] Bulk link scraper bug fix (#1800) patch website depth data connector to work for other links that are not root url	2024-07-01 16:59:28 -07:00
Jason Zhang	fa4ab0f65f	fix: sanitize filename before writing (#1743) * fix: sanitize filename before writing Fixes: https://github.com/Mintplex-Labs/anything-llm/issues/1737 * fixup * fixup	2024-06-25 15:45:09 -07:00
Timothy Carambat	dc4ad6b5a9	[BETA] Live document sync (#1719) * wip bg workers for live document sync * Add ability to re-embed specific documents across many workspaces via background queue bgworkser is gated behind expieremental system setting flag that needs to be explictly enabled UI for watching/unwatching docments that are embedded. TODO: UI to easily manage all bg tasks and see run results TODO: UI to enable this feature and background endpoints to manage it * create frontend views and paths Move elements to correct experimental scope * update migration to delete runs on removal of watched document * Add watch support to YouTube transcripts (#1716) * Add watch support to YouTube transcripts refactor how sync is done for supported types * Watch specific files in Confluence space (#1718) Add failure-prune check for runs * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * create tmp workflow modifications for beta image * dual build update copy of alert modals * update job interval * Add support for live-sync of Github files * update copy for document sync feature * hide Experimental features from UI * update docs links * [FEAT] Implement new settings menu for experimental features (#1735) * implement new settings menu for experimental features * remove unused context save bar --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> * dont run job on boot * unset workflow changes * Add persistent encryption service Relay key to collector so persistent encryption can be used Encrypt any private data in chunkSources used for replay during resync jobs * update jsDOC * Linting and organization * update modal copy for feature --------- Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2024-06-21 13:38:50 -07:00
Timothy Carambat	a598c8e04c	1347 human readable confluence url (#1706) * chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones * chore: formatting as per yarn lint * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * chore: fixing the human readable confluence url fetch baseUrl * refactor implementation of various types of Confluence URL patterns --------- Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net> Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com> Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>	2024-06-17 16:04:20 -07:00
Sean Hatfield	4324a8bb4f	[FEAT] Github repo loader bug fix (#1558) * fix project names with special characters for github repo data connector * linting	2024-05-29 17:01:29 +08:00
Timothy Carambat	7e0b638a2c	Patch confluence URL patterns (#1426) * patch confluence patterns --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-05-16 14:15:59 -07:00
timothycarambat	87b41a60e9	refactor spaceKey url pattern for custom domains	2024-05-16 11:01:34 -07:00

64 Commits