merlyn/collector/processSingleFile/convert
AbelDuan df166eb64e
feat: Add multilingual support for ocr module (#3325)
* Add multilingual support for ocr module

* Add OCR language as server var that is passed into Collector
Support all valid tesseract language codes
Filter and parse only valid codes with fallbacks

* persist TARGET_OCR_LANG

* update docker example env
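
The "filter and parse only valid codes with fallbacks" step can be pictured roughly as below. This is a minimal sketch, not the code from this commit: it assumes `TARGET_OCR_LANG` holds a comma-separated string, that `eng` is the fallback, and it uses a small hypothetical `VALID_LANGS` subset in place of the full Tesseract language list. The names `parseOcrLangs`, `VALID_LANGS`, and `DEFAULT_LANG` are illustrative only.

```javascript
// Hypothetical subset of Tesseract language codes; the real list is far longer.
const VALID_LANGS = new Set(["eng", "deu", "fra", "spa", "chi_sim", "jpn"]);

// Assumed fallback when nothing valid is supplied.
const DEFAULT_LANG = "eng";

/**
 * Turn a raw, comma-separated language setting into a de-duplicated
 * list of known codes, falling back to DEFAULT_LANG when none survive.
 */
function parseOcrLangs(raw) {
  if (typeof raw !== "string" || raw.trim() === "") return [DEFAULT_LANG];

  const langs = raw
    .split(",")
    .map((code) => code.trim().toLowerCase()) // normalise whitespace and case
    .filter((code) => VALID_LANGS.has(code)); // drop unknown codes

  const unique = [...new Set(langs)]; // keep first occurrence of each code
  return unique.length > 0 ? unique : [DEFAULT_LANG];
}

// Read the setting from the environment (assumed variable name from the commit).
console.log(parseOcrLangs(process.env.TARGET_OCR_LANG));
```

With this sketch, `parseOcrLangs("eng, deu,xx")` yields `["eng", "deu"]`, while an empty or fully invalid value yields `["eng"]`, so the OCR step always has at least one usable language.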

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-02-27 12:31:17 -08:00
asPDF feat: Add multilingual support for ocr module (#3325) 2025-02-27 12:31:17 -08:00
asAudio.js Add tokenizer improvements via Singleton class and estimation (#3072) 2025-01-30 17:55:03 -08:00
asDocx.js Add tokenizer improvements via Singleton class and estimation (#3072) 2025-01-30 17:55:03 -08:00
asEPub.js Add tokenizer improvements via Singleton class and estimation (#3072) 2025-01-30 17:55:03 -08:00
asImage.js feat: Add multilingual support for ocr module (#3325) 2025-02-27 12:31:17 -08:00
asMbox.js Add tokenizer improvements via Singleton class and estimation (#3072) 2025-01-30 17:55:03 -08:00
asOfficeMime.js Add tokenizer improvements via Singleton class and estimation (#3072) 2025-01-30 17:55:03 -08:00
asTxt.js Add tokenizer improvements via Singleton class and estimation (#3072) 2025-01-30 17:55:03 -08:00
asXlsx.js Add tokenizer improvements via Singleton class and estimation (#3072) 2025-01-30 17:55:03 -08:00