merlyn

Author	SHA1	Message	Date
Neha Prasad	3ecf218eea	feat: Add SSL certificate bypass support for self-hosted Confluence instances (#4219 ) * Added bypassSSL parameter to constructor and implemented SSL bypass logic in fetchConfluenceData method * Updated generateChunkSource function to include bypassSSL in the encrypted payload * Updated the request body to include bypassSSL in the JSON payload sent to the backend * Updated form submission to include bypassSSL parameter from the checkbox * Added bypass_ssl: "Bypass SSL Certificate Validation" translation * passed these parameters to fetchconfluencepage function for proper resync functionality * allow ignore of SSL cert for Confluence * add translations --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-25 14:32:10 -08:00
Sean Hatfield	05df4ac72b	Paperless ngx data connector (#4121 ) * paperless ngx data connector * wip resync paperless ngx * fix generateChunkSource for resyncing paperless ngx * lint * Refactor Paperless-NGX connector Fix issue with date rendering in tooltip + extended width Move tooltip details to be column for more space --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-20 11:27:38 -08:00
Timothy Carambat	b3b261e15d	Fix loop logic for `fetchNextPage` use in GitLabLoader (#4662 ) resolves #4626 closes #4627	2025-11-19 13:53:26 -08:00
Marcello Fitton	376c9f7f3f	Install `patch-package` in `/collector` and Apply Patch to Fix EPub Upload Bug (#4630 ) * Install patch-package and postinstall-postinstall * Implement patch to ensure title is always a string in EPub class	2025-11-19 13:17:58 -08:00
Marcello Fitton	d3619689db	Refactor `loadYouTubeTranscript()` to include YouTube Video Metadata in Content When `parseOnly` is `true` (#4552 ) * Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true * extract to function --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-15 15:42:00 -07:00
Timothy Carambat	5edc1bea42	Add ability to auto-handle YT video URLs in uploader & chat (#4547 ) * Add ability to auto-handle YT video URLs in uploader & chat * move YT validator to URL utils * update comment	2025-10-15 12:18:57 -07:00
timothycarambat	71cd46ce1b	1.9.0 tag	2025-10-09 15:11:59 -07:00
Marcello Fitton	d48c76919c	Fix: File pulling fails with uppercase URL characters (#4516 ) * fix: remove unnecessary toLowerCase in URL validation * test: enhance URL validation tests to preserve case sensitivity and format * test: update URL validation tests to ensure domain normalization to lowercase while preserving path case * small formatting * fix filenames when downloading live URI --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-08 14:00:02 -07:00
timothycarambat	8bc6aa7126	missed lint	2025-10-08 12:57:31 -07:00
timothycarambat	5173c75113	rescope validatedLink to local var	2025-10-07 12:08:53 -07:00
Timothy Carambat	cf3fbcbf0f	Improve URL handler for collector processes (#4504 ) * Improve URL handler for collector processes * dev build	2025-10-07 11:03:27 -07:00
timothycarambat	bdfa0328db	update comment about parseOnly	2025-10-01 20:45:52 -07:00
Marcello Fitton	f7b90571be	Fetch, Parse, and Create Documents for Statically Hosted Files (#4398 ) * Add capability to web scraping feature for document creation to download and parse statically hosted files * lint * Remove unneeded comment * Simplified process by using key of ACCEPTED_MIMES to validate the response content type, as a result unlocked all supported files * Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js * Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent \| Return explicit argument of captureAs into scrapeGenericUrl in processLink fn * Return debug log for scrapeGenericUrl * Change conditional to a guard clause. * Add error handling, validation, and JSDOC to getContentType helper fn * remove unneeded comments * Simplify URL validation by reusing module * Rename downloadFileToHotDir to downloadURIToFile and moved up to a global module \| Add URL valuidation to downloadURIToFile * refactor * add support for webp remove unused imports --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-01 15:49:05 -07:00
Marcello Fitton	eb77876127	Add HTTP request/response logging middleware for development mode (#4425 ) * Add HTTP request logging middleware for development mode - Introduced httpLogger middleware to log HTTP requests and responses. - Enabled logging only in development mode to assist with debugging. * Update httpLogger middleware to disable time logging by default * Add httpLogger middleware for development mode in collector service * Refactor httpLogger middleware to rename timeLogs parameter to enableTimestamps for clarity * Make HTTP Logger only mount in development and environment flag is enabled. * Update .env.example to clarify HTTP Logger configuration comments --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-29 13:33:15 -07:00
AoiYamada	8fc1f24d1b	fix: youtube transcript collector not work well with non en or non asr caption (#4442 ) * fix: youtube transcript collector not work well with non en or non asr caption * stub YT test in Github actions --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-29 13:22:50 -07:00
Timothy Carambat	95557ee16f	Allow user to specify args for chromium process so they dont need SYS_ADMIN on container. (#4397 ) * allow user to specify args for chromium process so they dont need SYS_ADMIN perms * use arg flag content * update console outputs	2025-09-17 16:31:08 -07:00
Jonas Stawski	b8d4cc3454	Added metadata parameter to document/upload, document/upload/{folderName}, and document/upload-link (#4342 ) * Added the ability to pass in metadata to the /document/upload/{folderName} endpoint * Added the ability to pass in metadata to the /document/upload-link endpoint * feat: added metadata to document/upload api endpoint * simplify optional metadata in document dev api endpoints * lint * patch handling of metadata in dev api * Linting, small comments --------- Co-authored-by: jstawskigmi <jstawski@getmyinterns.org> Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-17 11:17:29 -07:00
timothycarambat	bb7d65f0eb	patch missing options resolves #4316	2025-08-20 10:51:14 -07:00
timothycarambat	a4a84f9bdd	forgot 1.8.5 tag :)	2025-08-14 17:43:55 -07:00
timothycarambat	0200e647b8	add back normalization + docs link	2025-08-14 11:43:04 -07:00
Timothy Carambat	0fb33736da	Workspace Chat with documents overhaul (#4261 ) * Create parse endpoint in collector (#4212) * create parse endpoint in collector * revert cleanup temp util call * lint * remove unused cleanupTempDocuments function * revert slug change minor change for destinations --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> * Add parsed files table and parse server endpoints (#4222) * add workspace_parsed_files table + parse endpoints/models * remove dev api parse endpoint * remove unneeded imports * iterate over all files + remove unneeded update function + update telemetry debounce * Upload UI/UX context window check + frontend alert (#4230) * prompt user to embed if exceeds prompt window + handle embed + handle cancel * add tokenCountEstimate to workspace_parsed_files + optimizations * use util for path locations + use safeJsonParse * add modal for user decision on overflow of context window * lint * dynamic fetching of provider/model combo + inject parsed documents * remove unneeded comments * popup ui for attaching/removing files + warning to embed + wip fetching states on update * remove prop drilling, fetch files/limits directly in attach files popup * rework ux of FE + BE optimizations * fix ux of FE + BE optimizations * Implement bidirectional sync for parsed file states linting small changes and comments * move parse support to another endpoint file simplify calls and loading of records * button borders * enable default users to upload parsed files but NOT embed * delete cascade on user/workspace/thread deletion to remove parsedFileRecord * enable bgworker with "always" jobs and optional document sync jobs orphan document job: Will find any broken reference files to prevent overpollution of the storage folder. This will run 10s after boot and every 12hr after * change run timeout for orphan job to 1m to allow settling before spawning a worker * linting and cleanup pr --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com> * dev build * fix tooltip hiding during embedding overflow files * prevent crash log from ERRNO on parse files * unused import * update docs link * Migrate parsed-files to GET endpoint patch logic for grabbing models names from utils better handling for undetermined context windows (null instead of Pos_INIFI) UI placeholder for null context windows * patch URL --------- Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2025-08-11 09:26:19 -07:00
Timothy Carambat	70a07b743b	Update `writeToServerDocuments` to take config object (#4213 )	2025-07-29 17:53:05 -07:00
timothycarambat	7692775942	minor change to XLSX parse and upload output folder	2025-07-29 17:44:47 -07:00
timothycarambat	ff34c8cefc	use documentsFolder path for simplification	2025-07-16 11:14:18 -07:00
timothycarambat	c535c69345	1.8.4 tag update	2025-07-16 10:40:39 -07:00
Sean Hatfield	5485c58b44	Sanitize youtube transcription file paths (#4148 ) sanitize youtube transcription file paths	2025-07-14 13:53:34 -07:00
Sean Hatfield	5d60047dc7	Handle BigInt in message response (#4110 ) * wip handle bigints in message response * extend bigint protoype to handle bigint stringification + add test * unset unrelated file * update tests, simplify implementation; --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-07-10 12:33:34 -07:00
Timothy Carambat	8001d9ddeb	update 1.8.3 tags for release (#4109 ) * update 1.8.3 tags for release * whoops, botched news	2025-07-09 12:17:56 -07:00
rexjohannes	14fa079953	Fix/drupal wiki (improve table & url handling) (#4097 ) * feat: add support for custom table formatting in htmlToText conversion * fix tables * feat: improve plain text table formatting for AI readability * fix options * improve drupal wiki connector * final fix * adjust leading slash to match code * linting --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:39:38 -07:00
bobbercheng	d0978fa363	Fix broken YT scraping with YT API (#4005 ) * Fix broken YT scraping with YT API * refactor youtube transcript class/add jsdoc comments * fix test --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:06:18 -07:00
Timothy Carambat	64d9fbc8f0	Show app version in system settings sidebar (#4044 ) * Add version tagging resolves #4038 closes #4034 closes #4028 * add hook * add build * patch	2025-06-24 13:56:12 -07:00
timothycarambat	3d5e8602a8	lint	2025-05-27 13:54:13 -07:00
rexjohannes	dc80d3e535	fixed drupal connector (#3893 ) https://github.com/Mintplex-Labs/anything-llm/issues/3875#issuecomment-2913211343	2025-05-27 13:15:43 -07:00
Timothy Carambat	245a5969b8	normalize path on drupal to use documentsFolder constant	2025-05-27 09:25:48 -07:00
Sean Hatfield	2b274c62b7	Obsidian data connector (#3798 ) * add obsidian vault data connector * lint * add english translations * normalize translations * improve file parser and reader --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-05-12 13:45:27 -07:00
Timothy Carambat	6fc0a6a644	Enable workflow rule for package verification (#3778 ) enable workflow rule	2025-05-07 12:51:14 -07:00
timothycarambat	3f4fda86bf	match openai versions across collector/backend	2025-05-07 12:30:09 -07:00
timothycarambat	9d661bb96e	linting	2025-05-07 09:40:31 -07:00
mr-chenguang	eff9d24cb9	feat: support fetch wikis for gitlab data connectors (#3271 ) * feat: support fetch wikis for gitlab data connectors * gitlab connector button spacing * add docAuthor and description metadata for GitLab wiki pages --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-05-06 14:09:53 -07:00
Sean Hatfield	610bdd4673	Allow custom headers in upload-link endpoint (#3695 ) * allow custom headers in upload-link endpoint * override loader.scrape to allow for passing of headers in langchain puppeteer * lint * Rename some variables move positional args to named args update documentation to reflect arg changes and funciton sigs validate header object before attempting to end to forward to request * update header validation for custom headers --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-04-22 12:47:12 -07:00
Timothy Carambat	1601eb986c	Enable bypass of ip limitations via ENV in collector processing (#3652 ) * Enable bypass of ip limitations via ENV in collector startup resolves #3625 connect #3626 * dev build * bump dockerx build action * enable runtime setting config of collector requests * comments and linting for option passing * unset * unset * update docs link * linting and docs	2025-04-21 11:10:41 -07:00
Timothy Carambat	fd4929b4d2	Feature/drupalwiki collector (#3693 ) * Implement DrupalWiki collector * Add attachment downloading and processing functionality (#3) * linting * Linting Add citation image small refactors add URL for citation identifier --------- Co-authored-by: em <eugen.mayer@kontextwork.de> Co-authored-by: rexjohannes <53578137+rexjohannes@users.noreply.github.com> Co-authored-by: Eugen Mayer <136934+EugenMayer@users.noreply.github.com>	2025-04-21 09:17:24 -07:00
Timothy Carambat	fd174cab86	Apply `.git` logic handler for repo URLs (#3655 ) * Apply `.git` logic handler for repo URLs * remove comment	2025-04-15 18:01:14 -07:00
Timothy Carambat	fab74037fa	Prevent collector crash when blocked by CDN (#3373 ) resolves #3365	2025-02-28 10:27:05 -08:00
AbelDuan	df166eb64e	feat: Add multilingual support for ocr module (#3325 ) * Add multilingual support for ocr mudule * Add OCR langauge as server var that is passed into Collector Support all valid tesseract language codes Filter and parse only valid codes with fallbacks' * persist TARGET_OCR_LANG * update docker example env --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-27 12:31:17 -08:00
Kristofer Bourro	b07240deee	Windows development environment variables support (#3354 ) * Windows development environment variables support * moved cross-env to dev dependencies --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-27 10:43:31 -08:00
t2	0eb86e2c12	for projects in gitlab subgroup (#3075 ) (#3247 ) * for projects in gitlab subgroup (#3075) * fix: false condition * refactor pattern matching logic --------- Co-authored-by: t2 <> Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-17 12:25:11 -08:00
Timothy Carambat	4545ce24cd	Drop Node `canvas` for manual `sharp` conversion (#3221 ) * Drop Node `canvas` for manual `sharp` conversion * bump dev	2025-02-14 17:38:13 -08:00
mr-chenguang	6ffdbf074d	feat(dataconnectors): support confluence personal access token (#3206 ) * feat(dataconnectors): support confluence personal access token * fix: change select option * linting change name on accesstype field --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-02-14 12:12:01 -08:00
Timothy Carambat	89bba68219	Add OCR of image support (#3219 ) * OCR PDFs as fallback in spawn thread * wip * build our own worker fanout and wrapper * norm pkgs * Add image OCR support	2025-02-14 12:07:33 -08:00

179 Commits