* iterate over all pages in paperless-ngx data connector
* add error handling and data validation
* refactor to handle edge cases and null values
* catch edge case to prevent infinite loop
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Added bypassSSL parameter to constructor and implemented SSL bypass logic in fetchConfluenceData method
* Updated generateChunkSource function to include bypassSSL in the encrypted payload
* Updated the request body to include bypassSSL in the JSON payload sent to the backend
* Updated form submission to include bypassSSL parameter from the checkbox
* Added bypass_ssl: "Bypass SSL Certificate Validation" translation
* passed these parameters to fetchconfluencepage function for proper resync functionality
* allow ignore of SSL cert for Confluence
* add translations
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* paperless ngx data connector
* wip resync paperless ngx
* fix generateChunkSource for resyncing paperless ngx
* lint
* Refactor Paperless-NGX connector
Fix issue with date rendering in tooltip + extended width
Move tooltip details to be column for more space
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true
* extract to function
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* fix: YouTube transcript collector not working well with non-en or non-ASR captions
* stub YT test in GitHub Actions
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* feat: add support for custom table formatting in htmlToText conversion
* fix tables
* feat: improve plain text table formatting for AI readability
* fix options
* improve drupal wiki connector
* final fix
* adjust leading slash to match code
* linting
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* feat(dataconnectors): support confluence personal access token
* fix: change select option
* linting
change name on accesstype field
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* chore: rename Github to GitHub
Signed-off-by: Adam Setch <adam.setch@outlook.com>
* chore: rename Github to GitHub
Signed-off-by: Adam Setch <adam.setch@outlook.com>
* Undo some code changes for references
---------
Signed-off-by: Adam Setch <adam.setch@outlook.com>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* Add tokenizer improvements via Singleton class
linting
* dev build
* Estimation fallback when string exceeds a fixed byte size
* Add notice to tiktoken on backend
* fix scraping failed bug in link/bulk link scrapers
* reset submodule
* swap to networkidle2 as a safe mix for SPA and API-loaded pages that also does not hang on request-heavy pages
* lint
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* fix tree/blob GitHub URLs from branches not being loaded
* improve UX of GitHub data connector
* lint
* patch GitHub URL parser to just validate with `URL` native parser
* uncheck LocalStorage of PAT for security reasons
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly.
* add @langchain/community to collector package.json
* fix: Improve handling of complex ignore patterns in GitLabRepoLoader
* refactor: use ignore package for simplified ignore logic
* run yarn lint
* add @langchain/community@^0.2.23
* remove unused dep
lint
---------
Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>
* Added an option to fetch issues from GitLab. Made the file fetching asynchronous to improve performance. #2334
* Fixed a typo in loadGitlabRepo.
* Convert issues to markdown.
* Fixed an issue with time estimate field names in issueToMarkdown.
* handle rate limits more gracefully + update checkbox to toggle switch
* lint
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
* support more confluence url formats
* use pattern matching for confluence urls and manual splitting as fallback
* rework entire Confluence flow to prevent issues with custom, local, and cloud spaces
* remove dep
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Add support for GitLab repo collection as well as GitHub repo collection
* Refactor for repo collectors to be more compact
---------
Co-authored-by: Emil Rofors <emirof@gmail.com>
* wip bg workers for live document sync
* Add ability to re-embed specific documents across many workspaces via background queue
bgworker is gated behind an experimental system setting flag that needs to be explicitly enabled
UI for watching/unwatching documents that are embedded.
TODO: UI to easily manage all bg tasks and see run results
TODO: UI to enable this feature and background endpoints to manage it
* create frontend views and paths
Move elements to correct experimental scope
* update migration to delete runs on removal of watched document
* Add watch support to YouTube transcripts (#1716)
* Add watch support to YouTube transcripts
refactor how sync is done for supported types
* Watch specific files in Confluence space (#1718)
Add failure-prune check for runs
* create tmp workflow modifications for beta image
* create tmp workflow modifications for beta image
* create tmp workflow modifications for beta image
* dual build
update copy of alert modals
* update job interval
* Add support for live-sync of GitHub files
* update copy for document sync feature
* hide Experimental features from UI
* update docs links
* [FEAT] Implement new settings menu for experimental features (#1735)
* implement new settings menu for experimental features
* remove unused context save bar
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* dont run job on boot
* unset workflow changes
* Add persistent encryption service
Relay key to collector so persistent encryption can be used
Encrypt any private data in chunkSources used for replay during resync jobs
* update JSDoc
* Linting and organization
* update modal copy for feature
---------
Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
* chore: confluence data connector can now handle custom urls, in addition to default {subdomain}.atlassian.net ones
* chore: formatting as per yarn lint
* chore: fixing the human readable confluence url fetch baseUrl
* chore: fixing the human readable confluence url fetch baseUrl
* chore: fixing the human readable confluence url fetch baseUrl
* chore: fixing the human readable confluence url fetch baseUrl
* chore: fixing the human readable confluence url fetch baseUrl
* refactor implementation of various types of Confluence URL patterns
---------
Co-authored-by: Predrag Stojadinovic <predrag@stojadinovic.net>
Co-authored-by: Predrag Stojadinović <cope@users.noreply.github.com>
Co-authored-by: Predrag Stojadinovic <predrags@nvidia.com>