merlyn

Marcello Fitton f7b90571be Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)		2025-10-01 15:49:05 -07:00
* Add capability to web scraping feature for document creation to download and parse statically hosted files
* lint
* Remove unneeded comment
* Simplified process by using key of ACCEPTED_MIMES to validate the response content type, as a result unlocked all supported files
* Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js
* Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent | Return explicit argument of captureAs into scrapeGenericUrl in processLink fn
* Return debug log for scrapeGenericUrl
* Change conditional to a guard clause.
* Add error handling, validation, and JSDOC to getContentType helper fn
* remove unneeded comments
* Simplify URL validation by reusing module
* Rename downloadFileToHotDir to downloadURIToFile and moved up to a global module | Add URL validation to downloadURIToFile
* refactor
* add support for webp remove unused imports
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
comKey	[BETA] Live document sync (#1719)	2024-06-21 13:38:50 -07:00
downloadURIToFile	Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)	2025-10-01 15:49:05 -07:00
EncryptionWorker	[BETA] Live document sync (#1719)	2024-06-21 13:38:50 -07:00
extensions	fix: youtube transcript collector not work well with non en or non asr caption (#4442)	2025-09-29 13:22:50 -07:00
files	add back normalization + docs link	2025-08-14 11:43:04 -07:00
http	Feature/drupalwiki collector (#3693)	2025-04-21 09:17:24 -07:00
logger	patch logger for full logs	2024-07-19 18:35:41 -07:00
OCRLoader	feat: Add multilingual support for ocr module (#3325)	2025-02-27 12:31:17 -08:00
runtimeSettings	Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)	2025-10-01 15:49:05 -07:00
tokenizer	Add tokenizer improvements via Singleton class and estimation (#3072)	2025-01-30 17:55:03 -08:00
url	Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)	2025-10-01 15:49:05 -07:00
WhisperProviders	Prevent collector crash when blocked by CDN (#3373)	2025-02-28 10:27:05 -08:00
constants.js	Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)	2025-10-01 15:49:05 -07:00