merlyn/collector/processSingleFile
Marcello Fitton f7b90571be
Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)
* Add capability to web scraping feature for document creation to download and parse statically hosted files

* lint

* Remove unneeded comment

* Simplify the process by using the keys of ACCEPTED_MIMES to validate the response content type, which unlocks all supported file types

* Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js

* Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent | Return explicit argument of captureAs into scrapeGenericUrl in processLink fn

* Return debug log for scrapeGenericUrl

* Change conditional to a guard clause.

* Add error handling, validation, and JSDOC to getContentType helper fn

* remove unneeded comments

* Simplify URL validation by reusing module

* Rename downloadFileToHotDir to downloadURIToFile and move it up to a global module | Add URL validation to downloadURIToFile

* refactor

* add support for webp
remove unused imports
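
The content-type check described above (validating the response against the keys of ACCEPTED_MIMES, with a guard clause) might look roughly like the sketch below. This is not the code from this commit: the shape of `ACCEPTED_MIMES` (MIME type mapped to file extensions) and the helper names `normalizeContentType` and `isSupportedContentType` are assumptions made for illustration.

```javascript
// Hypothetical subset of an ACCEPTED_MIMES map (assumed shape: MIME type -> extensions).
const ACCEPTED_MIMES = {
  "text/plain": [".txt", ".md"],
  "application/pdf": [".pdf"],
  "image/webp": [".webp"],
};

/**
 * Reduce a raw Content-Type header value to a bare, lowercased MIME type.
 * @param {unknown} headerValue - e.g. "application/pdf; charset=binary"
 * @returns {string|null} the MIME type, or null if the header is missing or empty
 */
function normalizeContentType(headerValue) {
  if (typeof headerValue !== "string" || headerValue.trim() === "") return null;
  // Drop parameters such as "; charset=utf-8" and normalize case.
  return headerValue.split(";")[0].trim().toLowerCase();
}

/**
 * Check whether a response's content type is a key of ACCEPTED_MIMES.
 * @param {unknown} headerValue - raw Content-Type header value
 * @returns {boolean}
 */
function isSupportedContentType(headerValue) {
  const mime = normalizeContentType(headerValue);
  if (!mime) return false; // guard clause: nothing to validate
  return Object.prototype.hasOwnProperty.call(ACCEPTED_MIMES, mime);
}
```

Because the lookup uses the map's keys directly, adding a new entry to the map (for example webp) is enough to make that type pass validation, with no separate allowlist to keep in sync.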

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-10-01 15:49:05 -07:00
..
convert Added metadata parameter to document/upload, document/upload/{folderName}, and document/upload-link (#4342) 2025-09-17 11:17:29 -07:00
index.js Fetch, Parse, and Create Documents for Statically Hosted Files (#4398) 2025-10-01 15:49:05 -07:00