Commit Graph

14 Commits

Author SHA1 Message Date
Timothy Carambat
2dc625193e
4825 patch yt file collector api (#4904)
Patch YT links in API document collector
closes #4825
2026-01-26 14:36:21 -08:00
Marcello Fitton
d3619689db
Refactor loadYouTubeTranscript() to include YouTube Video Metadata in Content When parseOnly is true (#4552)
* Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true

* extract to function

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-10-15 15:42:00 -07:00
Timothy Carambat
5edc1bea42
Add ability to auto-handle YT video URLs in uploader & chat (#4547)
* Add ability to auto-handle YT video URLs in uploader & chat

* move YT validator to URL utils

* update comment
2025-10-15 12:18:57 -07:00
Timothy Carambat
70a07b743b
Update writeToServerDocuments to take config object (#4213) 2025-07-29 17:53:05 -07:00
timothycarambat
ff34c8cefc use documentsFolder path for simplification 2025-07-16 11:14:18 -07:00
Sean Hatfield
5485c58b44
Sanitize youtube transcription file paths (#4148)
sanitize youtube transcription file paths
2025-07-14 13:53:34 -07:00
Timothy Carambat
d1ca16f7f8
Add tokenizer improvments via Singleton class and estimation (#3072)
* Add tokenizer improvments via Singleton class
linting

* dev build

* Estimation fallback when string exceeds a fixed byte size

* Add notice to tiktoken on backend
2025-01-30 17:55:03 -08:00
Timothy Carambat
dc4ad6b5a9
[BETA] Live document sync (#1719)
* wip bg workers for live document sync

* Add ability to re-embed specific documents across many workspaces via background queue
bgworkser is gated behind expieremental system setting flag that needs to be explictly enabled
UI for watching/unwatching docments that are embedded.
TODO: UI to easily manage all bg tasks and see run results
TODO: UI to enable this feature and background endpoints to manage it

* create frontend views and paths
Move elements to correct experimental scope

* update migration to delete runs on removal of watched document

* Add watch support to YouTube transcripts (#1716)

* Add watch support to YouTube transcripts
refactor how sync is done for supported types

* Watch specific files in Confluence space (#1718)

Add failure-prune check for runs

* create tmp workflow modifications for beta image

* create tmp workflow modifications for beta image

* create tmp workflow modifications for beta image

* dual build
update copy of alert modals

* update job interval

* Add support for live-sync of Github files

* update copy for document sync feature

* hide Experimental features from UI

* update docs links

* [FEAT] Implement new settings menu for experimental features (#1735)

* implement new settings menu for experimental features

* remove unused context save bar

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* dont run job on boot

* unset workflow changes

* Add persistent encryption service
Relay key to collector so persistent encryption can be used
Encrypt any private data in chunkSources used for replay during resync jobs

* update jsDOC

* Linting and organization

* update modal copy for feature

---------

Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
2024-06-21 13:38:50 -07:00
timothycarambat
2d215acb75 patch storage dirs for extensions 2024-05-02 14:03:10 -07:00
Timothy Carambat
1f8ab0d245
Remove YoutubeLoader dependency (#1050)
* WIP data connector redesign

* new UI for data connectors complete

* remove old data connector page/cleanup imports

* cleanup of UI and imports

* Remove Youtube Transcript dep and move in-house

* lang pref default to en

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2024-04-05 16:33:01 -07:00
Timothy Carambat
d89610586a
improve error messages from YT scraping (#768)
parse & enforce URL to allow multiple URL schemas
2024-02-21 10:47:10 -08:00
Timothy Carambat
d52f8aafd4
689 links in citation (#715)
* Include links in citations
force ChunkSource key to retain this information
old links will be unsupported

* show special icons depending on source

* remove console log

* reset server documents writeTo
2024-02-13 14:11:57 -08:00
Sean Hatfield
288ff0d18c
fix vector cache not deleting cache after unembedding items with folders (#630) 2024-01-22 13:03:05 -08:00
Timothy Carambat
ecf4295537
Add ability to grab youtube transcripts via doc processor (#470)
* Add ability to grab youtube transcripts via doc processor

* dynamic imports
swap out Github for Youtube in placeholder text
2023-12-18 17:17:26 -08:00