merlyn/collector
Sean Hatfield 9bc01afa7d
Fix scraping failed bug in link/bulk link scrapers (#2807)
* fix scraping failed bug in link/bulk link scrapers

* reset submodule

* swap to networkidle2 as a safe middle ground: it waits long enough for SPA and API-loaded pages but does not hang on request-heavy pages

* lint

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-12-11 14:01:52 -08:00
extensions Allow 127.0.0.1 as valid URL for scraping (#2560) 2024-10-31 09:57:28 -07:00
hotdir Document Processor v2 (#442) 2023-12-14 15:14:56 -08:00
middleware [BETA] Live document sync (#1719) 2024-06-21 13:38:50 -07:00
processLink Fix scraping failed bug in link/bulk link scrapers (#2807) 2024-12-11 14:01:52 -08:00
processRawText Add support to upload rawText document via api (#692) 2024-02-07 15:17:32 -08:00
processSingleFile Support XLSX files (#2403) 2024-10-03 13:45:23 -07:00
storage feat: Embed on-instance Whisper model for audio/mp4 transcribing (#449) 2023-12-15 11:20:13 -08:00
utils Fix scraping failed bug in link/bulk link scrapers (#2807) 2024-12-11 14:01:52 -08:00
.env.example devcontainer v1 (#297) 2024-01-08 15:31:06 -08:00
.gitignore Document Processor v2 (#442) 2023-12-14 15:14:56 -08:00
.nvmrc Document Processor v2 (#442) 2023-12-14 15:14:56 -08:00
index.js Add 3GB file size limit to body parser middlewares (#2390) 2024-09-30 11:19:41 -07:00
nodemon.json Document Processor v2 (#442) 2023-12-14 15:14:56 -08:00
package.json Support XLSX files (#2403) 2024-10-03 13:45:23 -07:00
yarn.lock Support XLSX files (#2403) 2024-10-03 13:45:23 -07:00