merlyn

Author	SHA1	Message	Date
Asish Kumar	91e75c27c2	fix: preserve Confluence context paths (#5415 ) * fix: preserve confluence context paths * lint and minor changes --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-04-13 13:10:40 -07:00
Timothy Carambat	dc0bdf112b	linting & show descriptive error for bad `addtoWorkspace` request body resolves #5172	2026-03-09 11:30:53 -07:00
Maxwell Calkin	563f95167d	fix: add missing /wiki to Confluence cloud citation URLs (#5167 ) fix: add /wiki to Confluence cloud page URLs in citations	2026-03-09 10:24:56 -07:00
Marcello Fitton	8f33203ade	chore: add ESLint to `/collector` (#5128 ) * add eslint config to /collector * prettier formatting * fix unused * fix undefined * disable lines * lockfile --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-03-05 16:25:23 -08:00
Timothy Carambat	d58ff0ea3e	Normalize scraper runtimeargs for bulk-scraper (#5083 ) resolves #5078 closes #5079	2026-02-27 09:15:17 -08:00
Marcello Fitton	c927eda18f	fix: GitLab connector infinite loop and rate limit crash for large repos (#5021 ) * Fix infinite loop and rate limit crashes * simplify logic \| add max-retries to fetchNextPage and fetchSingleFileContents --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-02-19 12:42:21 -08:00
Timothy Carambat	2dc625193e	4825 patch yt file collector api (#4904 ) Patch YT links in API document collector closes #4825	2026-01-26 14:36:21 -08:00
j0rDy	f52e2866ac	Update common.js (#4894 ) * Update common.js Added missing translations in Dutch. * linting --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-01-23 17:12:17 -08:00
Timothy Carambat	4de5e30ac6	Merge commit from fork	2026-01-23 17:06:44 -08:00
Timothy Carambat	092b1b45f8	Upgrade YT Scraper (#4820 )	2026-01-02 15:41:22 -08:00
Sean Hatfield	c76b0708c3	Fix pagination bug in paperless-ngx data connector (#4757 ) * iterate over all pages in paperless-ngx data connector * add error handling and data validation * refactor to handle edge cases and null values * catch edge case to prevent infinite loop --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-12-12 10:23:32 -08:00
timothycarambat	758db6b677	fix lint	2025-11-25 14:42:10 -08:00
Neha Prasad	3ecf218eea	feat: Add SSL certificate bypass support for self-hosted Confluence instances (#4219 ) * Added bypassSSL parameter to constructor and implemented SSL bypass logic in fetchConfluenceData method * Updated generateChunkSource function to include bypassSSL in the encrypted payload * Updated the request body to include bypassSSL in the JSON payload sent to the backend * Updated form submission to include bypassSSL parameter from the checkbox * Added bypass_ssl: "Bypass SSL Certificate Validation" translation * passed these parameters to fetchconfluencepage function for proper resync functionality * allow ignore of SSL cert for Confluence * add translations --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-25 14:32:10 -08:00
Sean Hatfield	05df4ac72b	Paperless ngx data connector (#4121 ) * paperless ngx data connector * wip resync paperless ngx * fix generateChunkSource for resyncing paperless ngx * lint * Refactor Paperless-NGX connector Fix issue with date rendering in tooltip + extended width Move tooltip details to be column for more space --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-20 11:27:38 -08:00
Timothy Carambat	b3b261e15d	Fix loop logic for `fetchNextPage` use in GitLabLoader (#4662 ) resolves #4626 closes #4627	2025-11-19 13:53:26 -08:00
Marcello Fitton	d3619689db	Refactor `loadYouTubeTranscript()` to include YouTube Video Metadata in Content When `parseOnly` is `true` (#4552 ) * Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true * extract to function --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-15 15:42:00 -07:00
Timothy Carambat	5edc1bea42	Add ability to auto-handle YT video URLs in uploader & chat (#4547 ) * Add ability to auto-handle YT video URLs in uploader & chat * move YT validator to URL utils * update comment	2025-10-15 12:18:57 -07:00
AoiYamada	8fc1f24d1b	fix: youtube transcript collector not work well with non en or non asr caption (#4442 ) * fix: youtube transcript collector not work well with non en or non asr caption * stub YT test in Github actions --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-29 13:22:50 -07:00
Timothy Carambat	70a07b743b	Update `writeToServerDocuments` to take config object (#4213 )	2025-07-29 17:53:05 -07:00
timothycarambat	ff34c8cefc	use documentsFolder path for simplification	2025-07-16 11:14:18 -07:00
Sean Hatfield	5485c58b44	Sanitize youtube transcription file paths (#4148 ) sanitize youtube transcription file paths	2025-07-14 13:53:34 -07:00
rexjohannes	14fa079953	Fix/drupal wiki (improve table & url handling) (#4097 ) * feat: add support for custom table formatting in htmlToText conversion * fix tables * feat: improve plain text table formatting for AI readability * fix options * improve drupal wiki connector * final fix * adjust leading slash to match code * linting --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:39:38 -07:00
bobbercheng	d0978fa363	Fix broken YT scraping with YT API (#4005 ) * Fix broken YT scraping with YT API * refactor youtube transcript class/add jsdoc comments * fix test --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:06:18 -07:00
timothycarambat	3d5e8602a8	lint	2025-05-27 13:54:13 -07:00
rexjohannes	dc80d3e535	fixed drupal connector (#3893 ) https://github.com/Mintplex-Labs/anything-llm/issues/3875#issuecomment-2913211343	2025-05-27 13:15:43 -07:00
Timothy Carambat	245a5969b8	normalize path on drupal to use documentsFolder constant	2025-05-27 09:25:48 -07:00
Sean Hatfield	2b274c62b7	Obsidian data connector (#3798 ) * add obsidian vault data connector * lint * add english translations * normalize translations * improve file parser and reader --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-05-12 13:45:27 -07:00
timothycarambat	9d661bb96e	linting	2025-05-07 09:40:31 -07:00
mr-chenguang	eff9d24cb9	feat: support fetch wikis for gitlab data connectors (#3271 ) * feat: support fetch wikis for gitlab data connectors * gitlab connector button spacing * add docAuthor and description metadata for GitLab wiki pages --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-05-06 14:09:53 -07:00
Timothy Carambat	fd4929b4d2	Feature/drupalwiki collector (#3693 ) * Implement DrupalWiki collector * Add attachment downloading and processing functionality (#3) * linting * Linting Add citation image small refactors add URL for citation identifier --------- Co-authored-by: em <eugen.mayer@kontextwork.de> Co-authored-by: rexjohannes <53578137+rexjohannes@users.noreply.github.com> Co-authored-by: Eugen Mayer <136934+EugenMayer@users.noreply.github.com>	2025-04-21 09:17:24 -07:00
Timothy Carambat	fd174cab86	Apply `.git` logic handler for repo URLs (#3655 ) * Apply `.git` logic handler for repo URLs * remove comment	2025-04-15 18:01:14 -07:00
t2	0eb86e2c12	for projects in gitlab subgroup (#3075 ) (#3247 ) * for projects in gitlab subgroup (#3075) * fix: false condition * refactor pattern matching logic --------- Co-authored-by: t2 <> Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-17 12:25:11 -08:00
mr-chenguang	6ffdbf074d	feat(dataconnectors): support confluence personal access token (#3206 ) * feat(dataconnectors): support confluence personal access token * fix: change select option * linting change name on accesstype field --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-02-14 12:12:01 -08:00
Adam Setch	d63438fa61	chore: rename Github to GitHub (#3199 ) * chore: rename Github to GitHub Signed-off-by: Adam Setch <adam.setch@outlook.com> * chore: rename Github to GitHub Signed-off-by: Adam Setch <adam.setch@outlook.com> * Undo some code changes for references --------- Signed-off-by: Adam Setch <adam.setch@outlook.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-02-13 10:45:43 -08:00
Timothy Carambat	d1ca16f7f8	Add tokenizer improvments via Singleton class and estimation (#3072 ) * Add tokenizer improvments via Singleton class linting * dev build * Estimation fallback when string exceeds a fixed byte size * Add notice to tiktoken on backend	2025-01-30 17:55:03 -08:00
Sean Hatfield	9bc01afa7d	Fix scraping failed bug in link/bulk link scrapers (#2807 ) * fix scraping failed bug in link/bulk link scrapers * reset submodule * swap to networkidle2 as a safe mix for SPA and API-loaded pages, but also not hang on request heavy pages * lint --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2024-12-11 14:01:52 -08:00
Sean Hatfield	0074ededdd	Github data connector improvements (#2439 ) * fix tree/blob github urls from branches not being loaded * improve ux of github data connector * lint * patch Github URL parser to just validate with `URL` native parser * uncheck LocalStorage of PAT for security reasons --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-10-21 15:25:35 -07:00
timothycarambat	ab6f03ce1c	linting	2024-10-18 11:44:14 -07:00
Sean Hatfield	41522cdfb4	Handle non-ascii characters in single and bulk link scraper URLs (#2495 ) handle non-ascii characters in urls	2024-10-17 17:04:00 -07:00
Timothy Carambat	30645831a1	1959 filetype filters (#2378 ) * Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly. * add @langchain/community to collector package.json * fix: Improve handling of complex ignore patterns in GitLabRepoLoader * refactor: use ignore package for simplified ignore logic * run yarn lint * add @langchain/community@^0.2.23 * remove unused dep lint --------- Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>	2024-09-26 12:50:35 -07:00
Blazej Owczarczyk	b2123b13b0	Added an option to fetch issues from gitlab. Made the file fetching a… (#2335 ) * Added an option to fetch issues from gitlab. Made the file fetching asynchornous to improve performance. #2334 * Fixed a typo in loadGitlabRepo. * Convert issues to markdown. * Fixed an issue with time estimate field names in issueToMarkdown. * handle rate limits more gracefully + update checkbox to toggle switch * lint --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com> Co-authored-by: shatfield4 <seanhatfield5@gmail.com>	2024-09-26 11:45:18 -07:00
Timothy Carambat	961b567541	Add dropdown for confluence connector deployment (#2376 )	2024-09-26 08:49:05 -07:00
Sean Hatfield	4488744850	Support more Confluence URL formats (#2118 ) * support more confluence url formats * use pattern matching for confluence urls and manual splitting as fallback * rework entire Confluence flow to prevent issues with custom, local, and cloud spaces * remove dep --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2024-09-25 16:12:17 -07:00
Sean Hatfield	5a3d55db67	Fix custom domain in confluence (#2328 ) confluence custom domain fix	2024-09-19 15:36:07 -05:00
Timothy Carambat	4fa3d6d333	Load all branches in gitlab data connector (#2325 ) * Fix gitlab data connector for self-hosted instances (#2315) * Linting fix. * Load all branches in the GitLab data connector #2319 * #2319 lint fixes. * update fetch on fail --------- Co-authored-by: Błażej Owczarczyk <blazeyy@gmail.com>	2024-09-19 13:34:38 -05:00
Blazej Owczarczyk	b25298c04a	Fix gitlab data connector for self-hosted instances (#2315 ) (#2316 ) * Fix gitlab data connector for self-hosted instances (#2315) * Linting fix.	2024-09-18 16:12:15 -05:00
timothycarambat	9aa77dfb8d	Add verbose logging to GH loader connect #2243	2024-09-09 14:36:37 -07:00
Sean Hatfield	2797298507	Fix depth handling in bulk link scraper (#2096 ) fix depth handling in bulk link scraper	2024-08-12 11:44:35 -07:00
Mehmet Ünlü	0d4560b9e4	2049 remove break that prevents fetching files from gitlab repo (#2050 ) fix: remove unnecessary break Remove unnecessary break that prevents checking next pages for blob objects.	2024-08-06 10:17:55 -07:00
Sean Hatfield	be3b0b4916	Youtube loader whitespace fix (#2051 ) youtube loader whitespace fix	2024-08-06 10:16:17 -07:00

73 Commits