213 Commits

Author SHA1 Message Date
Timothy Carambat
f144692903 1.12.1 release tags (#5483) 2026-04-22 15:15:59 -07:00
Asish Kumar
91e75c27c2 fix: preserve Confluence context paths (#5415)
* fix: preserve confluence context paths

* lint and minor changes

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-04-13 13:10:40 -07:00
Timothy Carambat
0645f3c4bf Reapply "Remove illegal chars for Windows on files (#5364)"
This reverts commit 869be87ef6.
2026-04-06 14:05:25 -07:00
Timothy Carambat
869be87ef6 Revert "Remove illegal chars for Windows on files (#5364)"
This reverts commit 8ed1d35ab3.
2026-04-06 14:03:53 -07:00
Timothy Carambat
8ed1d35ab3 Remove illegal chars for Windows on files (#5364) 2026-04-06 11:12:13 -07:00
Timothy Carambat
3dedcede34 Filesystem Agent Skill overhaul (#5260)
* wip

* collector parse fixes

* refactor for class and also operation for reading

* add skill management panel

* management panel + lint

* management panel + lint

* Hide skill in non-docker context

* add ask-prompt for edit tool calls

* fix dep

* fix execa pkg (unused in codebase)

* simplify search with ripgrep only and build deps

* Fs skill i18n (#5264)

i18n

* add copy file support

* fix translations
2026-03-26 14:07:46 -07:00
Sean Hatfield
192ca411f2 Telegram bot connector (#5190)
* wip telegram bot connector

* encrypt bot token, reorg telegram bot modules, secure pairing codes

* offload telegram chat to background worker, add @agent support with chart png rendering, reconnect ui

* refactor telegram bot settings page into subcomponents

* response.locals for mum, telemetry for connecting to telegram

* simplify telegram command registration

* improve telegram bot ux: rework switch/history/resume commands

* add voice, photo, and TTS support to telegram bot with long message handling

* lint

* rename external_connectors to external_communication_connectors, add voice response mode, persist chat workspace/thread selection

* lint

* fix telegram bot connect/disconnect bugs, kill telegram bot on multiuser mode enable

* add english translations

* fix qr code in light mode

* repatch migration

* WIP checkpoint

* pipeline overhaul for using response obj

* format functions

* fix comment block

* remove conditional dumpENV + lint

* remove .end() from sendStatus calls

* patch broken streaming where streaming only first chunk

* refactor

* use Ephemeral handler now

* show metrics and citations in real GUI

* bugfixes

* prevent MuM persistence, UI cleanup, styling for status

* add new workspace flow in UI
Add thread chat count
fix 69 byte payload callback limit bug

* handle pagination for workspaces, threads, and models

* modularize commands and navigation

* add /proof support for citation recall

* handle backlog message spam

* support abort of response streams

* code cleanup

* spam prevention

* fix translations, update voice typing indicator, fix token bug

* frontend refactor, update tips on /status and voice response improvements

* collapse agent thought blocks

* support images

* Fix mime issues with audio from other devices

* fix config issue post server stop

* persist image on agentic chats

* 5189 i18n (#5245)

* i18n translations
connect #5189

* prune translations

* fix errors

* fix translation gaps

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-03-23 15:10:21 -07:00
Yitong Li
2f7a818744 fix(collector): infer file extension from Content-Type for URLs without explicit extensions (#5252)
* fix(collector): infer file extension from Content-Type for URLs without explicit extensions

When downloading files from URLs like https://arxiv.org/pdf/2307.10265,
the path has no recognizable file extension. The downloaded file gets
saved without an extension (or with a nonsensical one like .10265),
causing processSingleFile to reject it with 'File extension .10265
not supported for parsing'.

Fix: after downloading, check if the filename has a supported file
extension. If not, inspect the response Content-Type header and map
it to the correct extension using the existing ACCEPTED_MIMES table.

For example, a response with Content-Type: application/pdf will cause
the file to be saved with a .pdf extension, allowing it to be processed
correctly.

Fixes #4513

* small refactor

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-03-23 09:40:22 -07:00
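The fix described in 2f7a818744 can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code: the `ACCEPTED_MIMES` table here is a small hypothetical stand-in for the real one, `inferFilename` is an invented helper name, and appending the extension to the existing name is an assumption about how the file ends up with a `.pdf` suffix.

```javascript
const path = require("node:path");

// Hypothetical stand-in for the collector's ACCEPTED_MIMES table
// (MIME type -> list of file extensions). The real table is larger.
const ACCEPTED_MIMES = {
  "application/pdf": [".pdf"],
  "text/plain": [".txt", ".md"],
  "text/html": [".html"],
};

// Hypothetical helper: keep the filename if its extension is already
// supported, otherwise derive one from the Content-Type header.
function inferFilename(filename, contentType) {
  const supported = new Set(Object.values(ACCEPTED_MIMES).flat());
  const ext = path.extname(filename).toLowerCase();
  if (supported.has(ext)) return filename;

  // Drop parameters such as "; charset=utf-8" before the lookup.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  const [mappedExt] = ACCEPTED_MIMES[mime] || [];

  // Assumption: the mapped extension is appended to the original name.
  return mappedExt ? `${filename}${mappedExt}` : filename;
}

console.log(inferFilename("2307.10265", "application/pdf"));
```

With the arXiv-style name from the commit message, `path.extname` reports `.10265`, which is not in the table, so the helper falls back to the header and the name gains a `.pdf` suffix. Names with an unknown Content-Type are returned unchanged so the caller can still reject them.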
Marcello Fitton
456738bbda chore: add ESLint CI workflow (#5160)
add lint CI GitHub Action
2026-03-09 14:27:08 -07:00
Timothy Carambat
dc0bdf112b linting & show descriptive error for bad addtoWorkspace request body
resolves #5172
2026-03-09 11:30:53 -07:00
Maxwell Calkin
563f95167d fix: add missing /wiki to Confluence cloud citation URLs (#5167)
fix: add /wiki to Confluence cloud page URLs in citations
2026-03-09 10:24:56 -07:00
Marcello Fitton
8f33203ade chore: add ESLint to /collector (#5128)
* add eslint config to /collector

* prettier formatting

* fix unused

* fix undefined

* disable lines

* lockfile

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-03-05 16:25:23 -08:00
Timothy Carambat
e145c391e9 v1.11.1 Release tags (#5107)
bump tag
2026-03-02 09:25:06 -08:00
Timothy Carambat
d58ff0ea3e Normalize scraper runtimeargs for bulk-scraper (#5083)
resolves #5078
closes #5079
2026-02-27 09:15:17 -08:00
Marcello Fitton
c927eda18f fix: GitLab connector infinite loop and rate limit crash for large repos (#5021)
* Fix infinite loop and rate limit crashes

* simplify logic | add max-retries to fetchNextPage and fetchSingleFileContents

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-02-19 12:42:21 -08:00
Timothy Carambat
40853e4e43 1.11.0 release tag (#5014) 2026-02-18 08:47:47 -08:00
Timothy Carambat
2dc625193e 4825 patch yt file collector api (#4904)
Patch YT links in API document collector
closes #4825
2026-01-26 14:36:21 -08:00
j0rDy
f52e2866ac Update common.js (#4894)
* Update common.js

Added missing translations in Dutch.

* linting

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-01-23 17:12:17 -08:00
Timothy Carambat
4de5e30ac6 Merge commit from fork 2026-01-23 17:06:44 -08:00
Timothy Carambat
ac8248e08d bump versions to 1.10.0 2026-01-21 16:10:07 -08:00
Timothy Carambat
feb039ea70 Adjust fix path to use ESM import (#4867)
* Adjust fix path to use ESM import

* normalize fix-path imports and usage across the app

* extract path fix logic to utils for server and collector

* add helpers

* repin strip-ansi in collector

* fix log for localWhisper
lint
2026-01-15 16:13:21 -08:00
Sean Hatfield
e4ee9f2731 Make XLSX spreadsheets visible in chat by combining sheets (#4847)
* fix bug with xlsx files not being added as context

* lint

* fix console logs/warn/error

* abstract sheet processing to function + normalize error handling

* fix jsdoc

* patch xlsx filename to prevent orphaned doc

* reduce tokens

* correct pluralization

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2026-01-13 15:46:16 -08:00
Timothy Carambat
092b1b45f8 Upgrade YT Scraper (#4820) 2026-01-02 15:41:22 -08:00
Timothy Carambat
b2f49b6036 patch ESM import issue (#4819) 2026-01-02 14:11:13 -08:00
Sean Hatfield
6c1f8a38ce Refactor localWhisper to use custom FFMPEGWrapper class (#4775)
* refactor localWhisper to use new custom FFMPEGWrapper class

* stub tests in github actions

* add back wavefile conversion to 16khz 32f to fix docker builds

* use afterEach for cleanup in ffmpeg tests

* remove unused FFMPEG_PATH env check

* use spawnSync for ffmpeg to capture and log output

* lint

* revert removal of try/catch around validateAudioFile for more helpful error msgs

* use readFileSync instead of createReadStream for less overhead

* change import to require for fix-path and stub import in tests

* refactor to singleton to preserve ffmpeg path
dev build

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-12-18 11:41:45 -08:00
Sean Hatfield
c76b0708c3 Fix pagination bug in paperless-ngx data connector (#4757)
* iterate over all pages in paperless-ngx data connector

* add error handling and data validation

* refactor to handle edge cases and null values

* catch edge case to prevent infinite loop

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-12-12 10:23:32 -08:00
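The pagination fix in c76b0708c3 (iterate over every page, validate the data, and guard against an infinite loop) might look something like the sketch below. The response shape (`results` array plus a `next` URL), the `fetchPage` callback, and the `maxPages` cap are all assumptions made for illustration; the connector's real code may differ.

```javascript
// Hypothetical helper: walk a paginated API until there is no next page.
// `fetchPage(url)` is assumed to resolve to { results: [...], next: url | null }.
async function fetchAllPages(firstUrl, fetchPage, maxPages = 1000) {
  const results = [];
  const seen = new Set(); // URLs already requested, to detect cycles
  let url = firstUrl;

  while (url && !seen.has(url) && seen.size < maxPages) {
    seen.add(url);
    const page = await fetchPage(url);

    // Data validation: stop on null or malformed pages instead of throwing.
    if (!page || !Array.isArray(page.results)) break;

    results.push(...page.results);
    url = page.next ?? null;
  }
  return results;
}
```

Tracking visited URLs means a server that keeps returning the same `next` link ends the loop rather than spinning forever, and the page cap bounds the work even if every URL is unique.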
Timothy Carambat
692fa755ee Bump expressJS from 4.18.2 -> 4.21.2 (#4760)
Bump expressJS from 4.18.2 -> 4.21.2 to patch body-parser CVE-2024-45590 as a general maintenance task
2025-12-10 18:54:18 -08:00
Timothy Carambat
d22b7fc4e2 Remove bcrypt from collector - not used (#4747) 2025-12-09 15:23:42 -08:00
Timothy Carambat
cc7c876efc bump body-parser patch version (#4746) 2025-12-09 15:21:22 -08:00
Timothy Carambat
cd263337f8 fix: bump version tag 2025-12-09 13:18:51 -08:00
Timothy Carambat
5efaeab839 remove no longer needed patches folder 2025-12-09 13:01:55 -08:00
Timothy Carambat
e8257941f7 Patch dev Puppeteer crash for macOS 15 (#4713)
* Patch dev Puppeteer crash for macOS 15

* simplify fix

* update comment

* reenable failover
2025-12-05 12:11:32 -08:00
Timothy Carambat
155900eae7 dev build with new epub2 build target and remove patch work (#4694) 2025-11-26 17:36:34 -08:00
timothycarambat
758db6b677 fix lint 2025-11-25 14:42:10 -08:00
Neha Prasad
3ecf218eea feat: Add SSL certificate bypass support for self-hosted Confluence instances (#4219)
* Added bypassSSL parameter to constructor and implemented SSL bypass logic in fetchConfluenceData method

* Updated generateChunkSource function to include bypassSSL in the encrypted payload

* Updated the request body to include bypassSSL in the JSON payload sent to the backend

* Updated form submission to include bypassSSL parameter from the checkbox

* Added bypass_ssl: "Bypass SSL Certificate Validation" translation

* passed these parameters to fetchconfluencepage function for proper resync functionality

* allow ignore of SSL cert for Confluence

* add translations

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-11-25 14:32:10 -08:00
Sean Hatfield
05df4ac72b Paperless ngx data connector (#4121)
* paperless ngx data connector

* wip resync paperless ngx

* fix generateChunkSource for resyncing paperless ngx

* lint

* Refactor Paperless-NGX connector
Fix issue with date rendering in tooltip + extended width
Move tooltip details to be column for more space

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-11-20 11:27:38 -08:00
Timothy Carambat
b3b261e15d Fix loop logic for fetchNextPage use in GitLabLoader (#4662)
resolves #4626
closes #4627
2025-11-19 13:53:26 -08:00
Marcello Fitton
376c9f7f3f Install patch-package in /collector and Apply Patch to Fix EPub Upload Bug (#4630)
* Install patch-package and postinstall-postinstall

* Implement patch to ensure title is always a string in EPub class
2025-11-19 13:17:58 -08:00
Marcello Fitton
d3619689db Refactor loadYouTubeTranscript() to include YouTube Video Metadata in Content When parseOnly is true (#4552)
* Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true

* extract to function

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-10-15 15:42:00 -07:00
Timothy Carambat
5edc1bea42 Add ability to auto-handle YT video URLs in uploader & chat (#4547)
* Add ability to auto-handle YT video URLs in uploader & chat

* move YT validator to URL utils

* update comment
2025-10-15 12:18:57 -07:00
timothycarambat
71cd46ce1b 1.9.0 tag 2025-10-09 15:11:59 -07:00
Marcello Fitton
d48c76919c Fix: File pulling fails with uppercase URL characters (#4516)
* fix: remove unnecessary toLowerCase in URL validation

* test: enhance URL validation tests to preserve case sensitivity and format

* test: update URL validation tests to ensure domain normalization to lowercase while preserving path case

* small formatting

* fix filenames when downloading live URI

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-10-08 14:00:02 -07:00
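The behaviour the tests in d48c76919c describe (domain normalized to lowercase, path case preserved) falls out of the standard WHATWG `URL` parser, so a blanket `toLowerCase()` on the whole link is unnecessary. A minimal sketch, using a hypothetical `normalizeLink` helper rather than the collector's actual validator:

```javascript
// Hypothetical helper: validate an http(s) link without lowercasing the path.
function normalizeLink(input) {
  try {
    // The URL parser lowercases the scheme and host but leaves the path,
    // query, and fragment exactly as written.
    const url = new URL(String(input).trim());
    if (!["http:", "https:"].includes(url.protocol)) return null;
    return url.href;
  } catch {
    return null; // not a parseable absolute URL
  }
}

console.log(normalizeLink("HTTPS://Example.COM/Files/Report.PDF"));
// prints "https://example.com/Files/Report.PDF"
```

Servers with case-sensitive paths would return a 404 for `/files/report.pdf`, which is why lowercasing the full string breaks file pulling.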
timothycarambat
8bc6aa7126 missed lint 2025-10-08 12:57:31 -07:00
timothycarambat
5173c75113 rescope validatedLink to local var 2025-10-07 12:08:53 -07:00
Timothy Carambat
cf3fbcbf0f Improve URL handler for collector processes (#4504)
* Improve URL handler for collector processes

* dev build
2025-10-07 11:03:27 -07:00
timothycarambat
bdfa0328db update comment about parseOnly 2025-10-01 20:45:52 -07:00
Marcello Fitton
f7b90571be Fetch, Parse, and Create Documents for Statically Hosted Files (#4398)
* Add capability to web scraping feature for document creation to download and parse statically hosted files

* lint

* Remove unneeded comment

* Simplified process by using key of ACCEPTED_MIMES to validate the response content type, as a result unlocked all supported files

* Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js

* Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent | Return explicit argument of captureAs into scrapeGenericUrl in processLink fn

* Return debug log for scrapeGenericUrl

* Change conditional to a guard clause.

* Add error handling, validation, and JSDOC to getContentType helper fn

* remove unneeded comments

* Simplify URL validation by reusing module

* Rename downloadFileToHotDir to downloadURIToFile and moved up to a global module | Add URL validation to downloadURIToFile

* refactor

* add support for webp
remove unused imports

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2025-10-01 15:49:05 -07:00
Marcello Fitton
eb77876127 Add HTTP request/response logging middleware for development mode (#4425)
* Add HTTP request logging middleware for development mode

- Introduced httpLogger middleware to log HTTP requests and responses.
- Enabled logging only in development mode to assist with debugging.

* Update httpLogger middleware to disable time logging by default

* Add httpLogger middleware for development mode in collector service

* Refactor httpLogger middleware to rename timeLogs parameter to enableTimestamps for clarity

* Make HTTP Logger only mount in development and environment flag is enabled.

* Update .env.example to clarify HTTP Logger configuration comments

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-09-29 13:33:15 -07:00
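As a rough illustration of the middleware described in eb77876127, the sketch below logs one line per completed request and is mounted only when both conditions hold. The `httpLogger` and `enableTimestamps` names come from the commit message; the `ENABLE_HTTP_LOGGER` flag name, the `shouldMountLogger` helper, the injectable `log` option, and the log format are assumptions.

```javascript
// Express-style middleware factory. Timestamps are off by default,
// matching the commit's "disable time logging by default".
function httpLogger({ enableTimestamps = false, log = console.log } = {}) {
  return function (req, res, next) {
    // "finish" fires once the response has been handed off to the OS.
    res.on("finish", () => {
      const stamp = enableTimestamps ? `[${new Date().toISOString()}] ` : "";
      log(`${stamp}${req.method} ${req.originalUrl || req.url} ${res.statusCode}`);
    });
    next();
  };
}

// Hypothetical gate: development mode AND an explicit opt-in flag.
// The environment variable name here is an assumption.
function shouldMountLogger(env = process.env) {
  return env.NODE_ENV === "development" && env.ENABLE_HTTP_LOGGER === "true";
}
```

In an Express app this would be wired up as `if (shouldMountLogger()) app.use(httpLogger());`, so production builds never pay for the logging.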
AoiYamada
8fc1f24d1b fix: YouTube transcript collector does not work well with non-English or non-ASR captions (#4442)
* fix: YouTube transcript collector does not work well with non-English or non-ASR captions

* stub YT test in Github actions

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
2025-09-29 13:22:50 -07:00
Timothy Carambat
95557ee16f Allow user to specify args for chromium process so they don't need SYS_ADMIN on container. (#4397)
* allow user to specify args for chromium process so they don't need SYS_ADMIN perms

* use arg flag content

* update console outputs
2025-09-17 16:31:08 -07:00