merlyn

Author	SHA1	Message	Date
Asish Kumar	91e75c27c2	fix: preserve Confluence context paths (#5415) * fix: preserve confluence context paths * lint and minor changes --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-04-13 13:10:40 -07:00
Yitong Li	2f7a818744	fix(collector): infer file extension from Content-Type for URLs without explicit extensions (#5252) * fix(collector): infer file extension from Content-Type for URLs without explicit extensions When downloading files from URLs like https://arxiv.org/pdf/2307.10265, the path has no recognizable file extension. The downloaded file gets saved without an extension (or with a nonsensical one like .10265), causing processSingleFile to reject it with 'File extension .10265 not supported for parsing'. Fix: after downloading, check if the filename has a supported file extension. If not, inspect the response Content-Type header and map it to the correct extension using the existing ACCEPTED_MIMES table. For example, a response with Content-Type: application/pdf will cause the file to be saved with a .pdf extension, allowing it to be processed correctly. Fixes #4513 * small refactor --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-03-23 09:40:22 -07:00
Timothy Carambat	feb039ea70	Adjust fix path to use ESM import (#4867) * Adjust fix path to use ESM import * normalize fix-path imports and usage across the app * extract path fix logic to utils for server and collector * add helpers * repin strip-ansi in collector * fix log for localWhisper lint	2026-01-15 16:13:21 -08:00
Timothy Carambat	092b1b45f8	Upgrade YT Scraper (#4820)	2026-01-02 15:41:22 -08:00
Sean Hatfield	6c1f8a38ce	Refactor localWhisper to use custom FFMPEGWrapper class (#4775) * refactor localWhisper to use new custom FFMPEGWrapper class * stub tests in github actions * add back wavefile conversion to 16khz 32f to fix docker builds * use afterEach for cleanup in ffmpeg tests * remove unused FFMPEG_PATH env check * use spawnSync for ffmpeg to capture and log output * lint * revert removal of try/catch around validateAudioFile for more helpful error msgs * use readFileSync instead of createReadStream for less overhead * change import to require for fix-path and stub import in tests * refactor to singleton to preserve ffmpeg path dev build --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-12-18 11:41:45 -08:00
Timothy Carambat	5edc1bea42	Add ability to auto-handle YT video URLs in uploader & chat (#4547) * Add ability to auto-handle YT video URLs in uploader & chat * move YT validator to URL utils * update comment	2025-10-15 12:18:57 -07:00
Marcello Fitton	d48c76919c	Fix: File pulling fails with uppercase URL characters (#4516) * fix: remove unnecessary toLowerCase in URL validation * test: enhance URL validation tests to preserve case sensitivity and format * test: update URL validation tests to ensure domain normalization to lowercase while preserving path case * small formatting * fix filenames when downloading live URI --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-08 14:00:02 -07:00
Timothy Carambat	cf3fbcbf0f	Improve URL handler for collector processes (#4504) * Improve URL handler for collector processes * dev build	2025-10-07 11:03:27 -07:00
AoiYamada	8fc1f24d1b	fix: youtube transcript collector not work well with non en or non asr caption (#4442) * fix: youtube transcript collector not work well with non en or non asr caption * stub YT test in Github actions --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-29 13:22:50 -07:00
Sean Hatfield	5d60047dc7	Handle BigInt in message response (#4110) * wip handle bigints in message response * extend bigint protoype to handle bigint stringification + add test * unset unrelated file * update tests, simplify implementation; --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-07-10 12:33:34 -07:00
bobbercheng	d0978fa363	Fix broken YT scraping with YT API (#4005) * Fix broken YT scraping with YT API * refactor youtube transcript class/add jsdoc comments * fix test --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:06:18 -07:00

11 Commits
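
The fix described in commit 2f7a818744 (fall back to the response Content-Type when a downloaded file has no supported extension) can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code: the `EXAMPLE_ACCEPTED_MIMES` table and its shape are hypothetical stand-ins for the real `ACCEPTED_MIMES` table, the `extensionOf` and `inferFilename` helpers are invented for this example, and appending the inferred extension (rather than replacing the trailing segment) is an assumption.

```javascript
// Hypothetical stand-in for the collector's ACCEPTED_MIMES table.
// The real table's keys, values, and shape may differ.
const EXAMPLE_ACCEPTED_MIMES = {
  "application/pdf": [".pdf"],
  "text/plain": [".txt", ".md"],
  "text/html": [".html"],
};

// Every extension the example table knows how to parse.
const SUPPORTED_EXTENSIONS = new Set(
  Object.values(EXAMPLE_ACCEPTED_MIMES).flat()
);

// Return the lowercased extension (including the dot), or "" if there is none.
// A leading dot (as in ".bashrc") is not treated as an extension.
function extensionOf(filename) {
  const dot = filename.lastIndexOf(".");
  return dot > 0 ? filename.slice(dot).toLowerCase() : "";
}

// If the filename already has a supported extension, keep it as-is.
// Otherwise map the Content-Type header to an extension and append it.
function inferFilename(filename, contentType) {
  if (SUPPORTED_EXTENSIONS.has(extensionOf(filename))) return filename;

  // Drop parameters such as "; charset=utf-8" before the lookup.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  const candidates = EXAMPLE_ACCEPTED_MIMES[mime];

  // Unknown MIME type: leave the name untouched and let the caller decide.
  if (!candidates || candidates.length === 0) return filename;
  return `${filename}${candidates[0]}`;
}
```

Under these assumptions, a file downloaded from https://arxiv.org/pdf/2307.10265 and saved as `2307.10265` with `Content-Type: application/pdf` would be renamed by appending `.pdf`, while a name that already ends in a supported extension is left unchanged regardless of the header.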