10 Commits

Author SHA1 Message Date
Sean Hatfield
192ca411f2
Telegram bot connector (#5190)
* wip telegram bot connector

* encrypt bot token, reorg telegram bot modules, secure pairing codes

* offload telegram chat to background worker, add @agent support with chart png rendering, reconnect ui

* refactor telegram bot settings page into subcomponents

* response.locals for MuM, telemetry for connecting to telegram

* simplify telegram command registration

* improve telegram bot ux: rework switch/history/resume commands

* add voice, photo, and TTS support to telegram bot with long message handling

* lint

* rename external_connectors to external_communication_connectors, add voice response mode, persist chat workspace/thread selection

* lint

* fix telegram bot connect/disconnect bugs, kill telegram bot on multiuser mode enable

* add english translations

* fix qr code in light mode

* repatch migration

* WIP checkpoint

* pipeline overhaul for using response obj

* format functions

* fix comment block

* remove conditional dumpENV + lint

* remove .end() from sendStatus calls

* patch broken streaming where only the first chunk was streamed

* refactor

* use Ephemeral handler now

* show metrics and citations in real GUI

* bugfixes

* prevent MuM persistence, UI cleanup, styling for status

* add new workspace flow in UI
Add thread chat count
fix 69 byte payload callback limit bug

* handle pagination for workspaces, threads, and models

* modularize commands and navigation

* add /proof support for citation recall

* handle backlog message spam

* support abort of response streams

* code cleanup

* spam prevention

* fix translations, update voice typing indicator, fix token bug

* frontend refactor, update tips on /status and voice response improvements

* collapse agent thought blocks

* support images

* Fix mime issues with audio from other devices

* fix config issue post server stop

* persist image on agentic chats

* 5189 i18n (#5245)

* i18n translations
connect #5189

* prune translations

* fix errors

* fix translation gaps

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-03-23 15:10:21 -07:00
Marcello Fitton
f7b90571be
Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)
* Add capability to web scraping feature for document creation to download and parse statically hosted files

* lint

* Remove unneeded comment

* Simplified process by using key of ACCEPTED_MIMES to validate the response content type, as a result unlocked all supported files

* Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js

* Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent | Return explicit argument of captureAs into scrapeGenericUrl in processLink fn

* Return debug log for scrapeGenericUrl

* Change conditional to a guard clause.

* Add error handling, validation, and JSDOC to getContentType helper fn

* remove unneeded comments

* Simplify URL validation by reusing module

* Rename downloadFileToHotDir to downloadURIToFile and move it up to a global module | Add URL validation to downloadURIToFile

* refactor

* add support for webp
remove unused imports

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-10-01 15:49:05 -07:00
Timothy Carambat
89bba68219
Add OCR of image support (#3219)
* OCR PDFs as fallback in spawn thread

* wip

* build our own worker fanout and wrapper

* norm pkgs

* Add image OCR support
2025-02-14 12:07:33 -08:00
Sean Hatfield
b658f5012d
Support XLSX files (#2403)
* support xlsx files

* lint

* create separate docs for each xlsx sheet

* lint

* use node-xlsx pkg for parsing xlsx files

* lint

* update error handling

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-10-03 13:45:23 -07:00
Sean Hatfield
79656718b2
[FEAT] Create custom pdfloader (#1852)
* implement custom PDFLoader to remove LC dep

* remove unneeded comment

* remove pdfjs as dep and fix page splitting using pdf-parse

* linting + export rename for desktop compat

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-07-11 12:26:11 -07:00
Timothy Carambat
4fb4aa2041
Add epub support for parsing (#1017)
2024-04-02 14:25:52 -07:00
Timothy Carambat
49fbd09af4
Support more plaintext filetypes (#757)
* Add more plaintext document types

org-mode, asciidoc, and reStructuredText are all text formats

Signed-off-by: Christian Romney <christian.a.romney@gmail.com>

* lint

---------

Signed-off-by: Christian Romney <christian.a.romney@gmail.com>
Co-authored-by: Christian Romney <christian.a.romney@gmail.com>
2024-02-19 10:44:01 -08:00
timothycarambat
d2e3506bb9
fix: transition on LLM and embedding screen
linting
2023-12-15 12:40:11 -08:00
Timothy Carambat
61db981017
feat: Embed on-instance Whisper model for audio/mp4 transcribing (#449)
* feat: Embed on-instance Whisper model for audio/mp4 transcribing
resolves #329

* additional logging

* add placeholder for tmp folder in collector storage
Add cleanup of hotdir and tmp on collector boot to prevent hanging files
split loading of model and file conversion into concurrency

* update README

* update model size

* update supported filetypes
2023-12-15 11:20:13 -08:00
Timothy Carambat
719521c307
Document Processor v2 (#442)
* wip: init refactor of document processor to JS

* add NodeJs PDF support

* wip: parity with python processor
feat: add pptx support

* fix: forgot files

* Remove python scripts totally

* wip:update docker to boot new collector

* add package.json support

* update dockerfile for new build

* update gitignore and linting

* add more protections on file lookup

* update package.json

* test build

* update docker commands to use cap-add=SYS_ADMIN so web scraper can run
update all scripts to reflect this
remove docker build for branch
2023-12-14 15:14:56 -08:00