merlyn

michael/merlyn

Commit Graph

Author

SHA1

Message

Date

Marcello Fitton

f7b90571be

Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)

* Add capability to web scraping feature for document creation to download and parse statically hosted files

* lint

* Remove unneeded comment

* Simplified the process by using the keys of ACCEPTED_MIMES to validate the response content type; as a result, all supported file types are unlocked

* Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js

* Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent | Return explicit argument of captureAs into scrapeGenericUrl in processLink fn

* Return debug log for scrapeGenericUrl

* Change conditional to a guard clause.

* Add error handling, validation, and JSDoc to getContentType helper fn

* remove unneeded comments

* Simplify URL validation by reusing module

* Rename downloadFileToHotDir to downloadURIToFile and move it up to a global module | Add URL validation to downloadURIToFile

* refactor

* add support for webp
remove unused imports

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

2025-10-01 15:49:05 -07:00

Timothy Carambat

95557ee16f

Allow user to specify args for chromium process so they don't need SYS_ADMIN on container. (#4397)

* allow user to specify args for chromium process so they don't need SYS_ADMIN perms

* use arg flag content

* update console outputs

2025-09-17 16:31:08 -07:00

Timothy Carambat

1601eb986c

Enable bypass of IP limitations via ENV in collector processing (#3652)

* Enable bypass of IP limitations via ENV in collector startup
resolves #3625
connect #3626

* dev build

* bump dockerx build action

* enable runtime setting config of collector requests

* comments and linting for option passing

* unset

* unset

* update docs link

* linting and docs

2025-04-21 11:10:41 -07:00

3 Commits