4 Commits

Author SHA1 Message Date
Timothy Carambat
2a9066e83a
OCR PDFs as fallback during upload (#3204)
* OCR PDFs as fallback in spawn thread

* wip

* build our own worker fanout and wrapper

* norm pkgs

* bump dev
2025-02-14 11:57:31 -08:00
Timothy Carambat
d1ca16f7f8
Add tokenizer improvements via Singleton class and estimation (#3072)
* Add tokenizer improvements via Singleton class
linting

* dev build

* Estimation fallback when string exceeds a fixed byte size

* Add notice to tiktoken on backend
2025-01-30 17:55:03 -08:00
Sean Hatfield
9b86bbd2b8
[FIX] PDFLoader module bug fix (#1879)
use pdf.js by importing it from pdf-parse and fix custom PDFLoader module
2024-07-16 13:09:43 -07:00
Sean Hatfield
79656718b2
[FEAT] Create custom pdfloader (#1852)
* implement custom PDFLoader to remove LC dep

* remove unneeded comment

* remove pdfjs as dep and fix page splitting using pdf-parse

* linting + export rename for desktop compat

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-07-11 12:26:11 -07:00