* Add capability to web scraping feature for document creation to download and parse statically hosted files
* lint
* Remove unneeded comment
* Simplified process by using key of ACCEPTED_MIMES to validate the response content type, as a result unlocked all supported files
* Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js
* Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent | Return explicit argument of captureAs into scrapeGenericUrl in processLink fn
* Return debug log for scrapeGenericUrl
* Change conditional to a guard clause.
* Add error handling, validation, and JSDOC to getContentType helper fn
* remove unneeded comments
* Simplify URL validation by reusing module
* Rename downloadFileToHotDir to downloadURIToFile and moved up to a global module | Add URL valuidation to downloadURIToFile
* refactor
* add support for webp
remove unused imports
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* Added the ability to pass in metadata to the /document/upload/{folderName} endpoint
* Added the ability to pass in metadata to the /document/upload-link endpoint
* feat: added metadata to document/upload api endpoint
* simplify optional metadata in document dev api endpoints
* lint
* patch handling of metadata in dev api
* Linting, small comments
---------
Co-authored-by: jstawskigmi <jstawski@getmyinterns.org>
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Add support for fetching single document in documents folder
* Add document object to upload + support link scraping via API
* hotfixes for documentation
* update api docs
* wip: init refactor of document processor to JS
* add NodeJs PDF support
* wip: partity with python processor
feat: add pptx support
* fix: forgot files
* Remove python scripts totally
* wip:update docker to boot new collector
* add package.json support
* update dockerfile for new build
* update gitignore and linting
* add more protections on file lookup
* update package.json
* test build
* update docker commands to use cap-add=SYS_ADMIN so web scraper can run
update all scripts to reflect this
remove docker build for branch