* iterate over all pages in paperless-ngx data connector
* add error handling and data validation
* refactor to handle edge cases and null values
* catch edge case to prevent infinite loop
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Added bypassSSL parameter to constructor and implemented SSL bypass logic in fetchConfluenceData method
* Updated generateChunkSource function to include bypassSSL in the encrypted payload
* Updated the request body to include bypassSSL in the JSON payload sent to the backend
* Updated form submission to include bypassSSL parameter from the checkbox
* Added bypass_ssl: "Bypass SSL Certificate Validation" translation
* passed these parameters to fetchconfluencepage function for proper resync functionality
* allow ignore of SSL cert for Confluence
* add translations
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* paperless ngx data connector
* wip resync paperless ngx
* fix generateChunkSource for resyncing paperless ngx
* lint
* Refactor Paperless-NGX connector
Fix issue with date rendering in tooltip + extended width
Move tooltip details to be column for more space
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true
* extract to function
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* fix: YouTube transcript collector not working well with non-English or non-ASR captions
* stub YT test in GitHub Actions
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* feat: add support for custom table formatting in htmlToText conversion
* fix tables
* feat: improve plain text table formatting for AI readability
* fix options
* improve drupal wiki connector
* final fix
* adjust leading slash to match code
* linting
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* feat(dataconnectors): support Confluence personal access token
* fix: change select option
* linting
change name on accesstype field
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* chore: rename Github to GitHub
Signed-off-by: Adam Setch <adam.setch@outlook.com>
* chore: rename Github to GitHub
Signed-off-by: Adam Setch <adam.setch@outlook.com>
* Undo some code changes for references
---------
Signed-off-by: Adam Setch <adam.setch@outlook.com>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* Add tokenizer improvements via Singleton class
linting
* dev build
* Estimation fallback when string exceeds a fixed byte size
* Add notice to tiktoken on backend
* fix scraping failed bug in link/bulk link scrapers
* reset submodule
* swap to networkidle2 as a safe middle ground for SPA and API-loaded pages that also does not hang on request-heavy pages
* lint
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* fix tree/blob GitHub URLs from branches not being loaded
* improve UX of GitHub data connector
* lint
* patch GitHub URL parser to just validate with `URL` native parser
* uncheck LocalStorage of PAT for security reasons
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly.
* add @langchain/community to collector package.json
* fix: Improve handling of complex ignore patterns in GitLabRepoLoader
* refactor: use ignore package for simplified ignore logic
* run yarn lint
* add @langchain/community@^0.2.23
* remove unused dep
lint
---------
Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>
* Added an option to fetch issues from GitLab. Made the file fetching asynchronous to improve performance. #2334
* Fixed a typo in loadGitlabRepo.
* Convert issues to markdown.
* Fixed an issue with time estimate field names in issueToMarkdown.
* handle rate limits more gracefully + update checkbox to toggle switch
* lint
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
* support more Confluence URL formats
* use pattern matching for Confluence URLs and manual splitting as fallback
* rework entire Confluence flow to prevent issues with custom, local, and cloud spaces
* remove dep
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>