
11 Commits

Author SHA1 Message Date
Asish Kumar
91e75c27c2
fix: preserve Confluence context paths (#5415)
* fix: preserve confluence context paths

* lint and minor changes

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-04-13 13:10:40 -07:00
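A Confluence base URL can carry a context path (for example `/wiki` on Confluence Cloud). If an absolute API path is resolved against that base with `new URL("/rest/api/content", base)`, or the URL is rebuilt from `origin` alone, the prefix is dropped. Below is a minimal TypeScript sketch that joins the two without losing it. The helper name `buildConfluenceApiUrl` is hypothetical; this is not the collector's actual code.

```typescript
// Hypothetical helper: join a Confluence base URL and a REST API path while
// keeping any context path ("/wiki", "/confluence", ...) from the base URL.
function buildConfluenceApiUrl(baseUrl: string, apiPath: string): string {
  const base = new URL(baseUrl);
  // Strip trailing slashes from the context path ("/" becomes "")
  const contextPath = base.pathname.replace(/\/+$/, "");
  // Strip leading slashes so the join below yields exactly one "/"
  const cleanApiPath = apiPath.replace(/^\/+/, "");
  // Using base.origin alone here would silently discard contextPath
  return `${base.origin}${contextPath}/${cleanApiPath}`;
}
```

For example, a base of `https://example.atlassian.net/wiki` and a path of `/rest/api/content` yield `https://example.atlassian.net/wiki/rest/api/content`.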
Yitong Li
2f7a818744
fix(collector): infer file extension from Content-Type for URLs without explicit extensions (#5252)
* fix(collector): infer file extension from Content-Type for URLs without explicit extensions

When downloading files from URLs like https://arxiv.org/pdf/2307.10265,
the path has no recognizable file extension. The downloaded file gets
saved without an extension (or with a nonsensical one like .10265),
causing processSingleFile to reject it with 'File extension .10265
not supported for parsing'.

Fix: after downloading, check if the filename has a supported file
extension. If not, inspect the response Content-Type header and map
it to the correct extension using the existing ACCEPTED_MIMES table.

For example, a response with Content-Type: application/pdf will cause
the file to be saved with a .pdf extension, allowing it to be processed
correctly.

Fixes #4513

* small refactor

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-03-23 09:40:22 -07:00
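The extension-inference fix described above can be sketched as follows. The sketch assumes a MIME table that maps each MIME type to a list of extensions. The `ACCEPTED_MIMES` entries below are a hypothetical subset, and `ensureSupportedExtension` is an illustrative name, not the collector's function.

```typescript
// Hypothetical subset of a MIME -> extensions table; the real ACCEPTED_MIMES
// table in the collector may differ in shape and contents.
const ACCEPTED_MIMES: Record<string, string[]> = {
  "application/pdf": [".pdf"],
  "text/plain": [".txt", ".md"],
  "text/html": [".html"],
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document": [".docx"],
};

// Every extension that the table considers parseable
const SUPPORTED_EXTENSIONS = new Set(Object.values(ACCEPTED_MIMES).flat());

// Last ".xyz" suffix of a filename, lowercased, or "" if there is none
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot).toLowerCase();
}

// Keep the name if its extension is supported; otherwise append the first
// extension mapped to the response's Content-Type, if any.
function ensureSupportedExtension(filename: string, contentType: string | null): string {
  if (SUPPORTED_EXTENSIONS.has(extensionOf(filename))) return filename;
  if (!contentType) return filename;
  // Drop parameters such as "; charset=utf-8" and normalize case
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const ext = ACCEPTED_MIMES[mime]?.[0];
  return ext ? `${filename}${ext}` : filename;
}
```

This sketch appends the inferred extension (`2307.10265` becomes `2307.10265.pdf`) rather than replacing the unrecognized suffix, so the original name stays visible. The commit does not say whether the real fix appends or replaces.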
Timothy Carambat
feb039ea70
Adjust fix path to use ESM import (#4867)
* Adjust fix path to use ESM import

* normalize fix-path imports and usage across the app

* extract path fix logic to utils for server and collector

* add helpers

* repin strip-ansi in collector

* fix log for localWhisper
lint
2026-01-15 16:13:21 -08:00
Timothy Carambat
092b1b45f8
Upgrade YT Scraper (#4820)
2026-01-02 15:41:22 -08:00
Sean Hatfield
6c1f8a38ce
Refactor localWhisper to use custom FFMPEGWrapper class (#4775)
* refactor localWhisper to use new custom FFMPEGWrapper class

* stub tests in GitHub Actions

* add back wavefile conversion to 16khz 32f to fix docker builds

* use afterEach for cleanup in ffmpeg tests

* remove unused FFMPEG_PATH env check

* use spawnSync for ffmpeg to capture and log output

* lint

* revert removal of try/catch around validateAudioFile for more helpful error msgs

* use readFileSync instead of createReadStream for less overhead

* change import to require for fix-path and stub import in tests

* refactor to singleton to preserve ffmpeg path
dev build

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-12-18 11:41:45 -08:00
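The singleton and `spawnSync` points in this commit can be illustrated with a small TypeScript sketch. The class shape, `getInstance`, `buildArgs`, `convertToWav`, and the ffmpeg flags (16 kHz, mono, 32-bit float PCM, chosen to echo the "16khz 32f" note above) are all assumptions. They are not the actual `FFMPEGWrapper` API.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical sketch: a singleton that remembers the resolved ffmpeg path.
class FFMPEGWrapper {
  private static instance: FFMPEGWrapper | null = null;
  private readonly ffmpegPath: string;

  private constructor(ffmpegPath: string) {
    this.ffmpegPath = ffmpegPath;
  }

  // The first caller fixes the path; later callers get the same instance.
  static getInstance(ffmpegPath = "ffmpeg"): FFMPEGWrapper {
    if (!FFMPEGWrapper.instance) FFMPEGWrapper.instance = new FFMPEGWrapper(ffmpegPath);
    return FFMPEGWrapper.instance;
  }

  get path(): string {
    return this.ffmpegPath;
  }

  // Assumed target format: 16 kHz, mono, 32-bit float little-endian PCM WAV.
  static buildArgs(input: string, output: string): string[] {
    return ["-y", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_f32le", output];
  }

  // Runs ffmpeg synchronously; stderr is captured so failures can be logged.
  convertToWav(input: string, output: string): void {
    const result = spawnSync(this.ffmpegPath, FFMPEGWrapper.buildArgs(input, output), {
      encoding: "utf8",
    });
    if (result.error) throw result.error; // e.g. binary not found
    if (result.status !== 0) {
      console.error(result.stderr);
      throw new Error(`ffmpeg exited with status ${result.status}`);
    }
  }
}
```

Because `spawnSync` buffers stdout and stderr, a failed conversion can log ffmpeg's own error output, which matches the commit's stated reason for using it.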
Timothy Carambat
5edc1bea42
Add ability to auto-handle YT video URLs in uploader & chat (#4547)
* Add ability to auto-handle YT video URLs in uploader & chat

* move YT validator to URL utils

* update comment
2025-10-15 12:18:57 -07:00
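A YouTube URL check of the kind moved into the URL utils might look like the TypeScript sketch below. The function name `extractYouTubeVideoId` is an assumption. The same applies to the accepted URL shapes (`/watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`) and the 11-character ID pattern. None of these is taken from the project's validator.

```typescript
// Hypothetical validator: returns the video ID for common YouTube video URL
// shapes, or null when the input is not a recognizable video URL.
function extractYouTubeVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not an absolute URL
  }
  // URL lowercases the hostname; strip "www." or "m." prefixes
  const host = url.hostname.replace(/^www\.|^m\./, "");
  const idPattern = /^[A-Za-z0-9_-]{11}$/;
  let candidate: string | null = null;

  if (host === "youtu.be") {
    candidate = url.pathname.slice(1).split("/")[0] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v");
    } else if (url.pathname.startsWith("/shorts/") || url.pathname.startsWith("/embed/")) {
      candidate = url.pathname.split("/")[2] ?? null;
    }
  }
  return candidate && idPattern.test(candidate) ? candidate : null;
}
```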
Marcello Fitton
d48c76919c
Fix: File pulling fails with uppercase URL characters (#4516)
* fix: remove unnecessary toLowerCase in URL validation

* test: enhance URL validation tests to preserve case sensitivity and format

* test: update URL validation tests to ensure domain normalization to lowercase while preserving path case

* small formatting

* fix filenames when downloading live URI

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-10-08 14:00:02 -07:00
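The uppercase-URL bug comes from lowercasing the whole URL. Hostnames are case-insensitive, but paths and query strings may be case-sensitive depending on the server. The WHATWG `URL` parser already lowercases the scheme and host of `http(s)` URLs and leaves the path untouched, so a validator can rely on it instead of calling `toLowerCase()`. A minimal TypeScript sketch, using the hypothetical name `validateUrl`:

```typescript
// Hypothetical validator: accept only http(s) URLs and return the parser's
// normalized form; scheme and host are lowercased, path and query are not.
function validateUrl(input: string): string | null {
  try {
    const url = new URL(input.trim());
    if (url.protocol !== "http:" && url.protocol !== "https:") return null;
    return url.toString();
  } catch {
    return null; // not a parseable absolute URL
  }
}
```

For example, `HTTPS://Example.COM/Docs/Report.PDF` normalizes to `https://example.com/Docs/Report.PDF`, whereas `toLowerCase()` would have requested `/docs/report.pdf`.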
Timothy Carambat
cf3fbcbf0f
Improve URL handler for collector processes (#4504)
* Improve URL handler for collector processes

* dev build
2025-10-07 11:03:27 -07:00
AoiYamada
8fc1f24d1b
fix: YouTube transcript collector does not work well with non-English or non-ASR captions (#4442)
* fix: YouTube transcript collector does not work well with non-English or non-ASR captions

* stub YT test in GitHub Actions

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-09-29 13:22:50 -07:00
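Caption selection that assumes neither English nor auto-generated (ASR) tracks can be sketched as a preference chain. The `CaptionTrack` shape below, including `kind: "asr"` as the marker for auto-generated captions, is an assumption loosely based on YouTube's caption track metadata. The ordering is illustrative, not the collector's actual logic.

```typescript
// Assumed caption track shape; field names are illustrative.
interface CaptionTrack {
  languageCode: string;
  kind?: string; // "asr" marks auto-generated captions
  baseUrl: string;
}

// Hypothetical preference order: manual track in the preferred language,
// then any track in that language, then any manual track, then anything.
function pickCaptionTrack(tracks: CaptionTrack[], preferredLang = "en"): CaptionTrack | null {
  const isManual = (t: CaptionTrack) => t.kind !== "asr";
  // Match "en" as well as regional variants such as "en-GB"
  const inLang = (t: CaptionTrack) =>
    t.languageCode === preferredLang || t.languageCode.startsWith(`${preferredLang}-`);
  return (
    tracks.find((t) => inLang(t) && isManual(t)) ??
    tracks.find(inLang) ??
    tracks.find(isManual) ??
    tracks[0] ??
    null
  );
}
```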
Sean Hatfield
5d60047dc7
Handle BigInt in message response (#4110)
* wip handle bigints in message response

* extend BigInt prototype to handle BigInt stringification + add test

* unset unrelated file

* update tests, simplify implementation

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-07-10 12:33:34 -07:00
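`JSON.stringify` throws a `TypeError` when it encounters a `BigInt`. The commit handles this by extending the `BigInt` prototype. The TypeScript sketch below swaps in a `JSON.stringify` replacer instead, which serializes `BigInt` values as decimal strings without patching a global. `safeStringify` and `bigintReplacer` are hypothetical names.

```typescript
// Replacer that turns BigInt values into decimal strings; other values pass through.
function bigintReplacer(_key: string, value: unknown): unknown {
  return typeof value === "bigint" ? value.toString() : value;
}

// JSON.stringify that tolerates BigInt anywhere in the value tree.
function safeStringify(value: unknown): string {
  return JSON.stringify(value, bigintReplacer);
}
```

Emitting strings rather than numbers avoids silent precision loss for values above `Number.MAX_SAFE_INTEGER`. Consumers must parse these strings back explicitly if they need to do arithmetic on them.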
bobbercheng
d0978fa363
Fix broken YT scraping with YT API (#4005)
* Fix broken YT scraping with YT API

* refactor youtube transcript class/add jsdoc comments

* fix test

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-07-07 13:06:18 -07:00