anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm synced 2026-04-25 17:15:37 +02:00

Author	SHA1	Message	Date
Asish Kumar	91e75c27c2	fix: preserve Confluence context paths (#5415 ) * fix: preserve confluence context paths * lint and minor changes --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-04-13 13:10:40 -07:00
Timothy Carambat	0645f3c4bf	Reapply "Remove illegal chars for Windows on files (#5364 )" This reverts commit `869be87ef6`.	2026-04-06 14:05:25 -07:00
Timothy Carambat	869be87ef6	Revert "Remove illegal chars for Windows on files (#5364 )" This reverts commit `8ed1d35ab3`.	2026-04-06 14:03:53 -07:00
Timothy Carambat	8ed1d35ab3	Remove illegal chars for Windows on files (#5364 )	2026-04-06 11:12:13 -07:00
Sean Hatfield	192ca411f2	Telegram bot connector (#5190 ) * wip telegram bot connector * encrypt bot token, reorg telegram bot modules, secure pairing codes * offload telegram chat to background worker, add @agent support with chart png rendering, reconnect ui * refactor telegram bot settings page into subcomponents * response.locals for mum, telemetry for connecting to telegram * simplify telegram command registration * improve telegram bot ux: rework switch/history/resume commands * add voice, photo, and TTS support to telegram bot with long message handling * lint * rename external_connectors to external_communication_connectors, add voice response mode, persist chat workspace/thread selection * lint * fix telegram bot connect/disconnect bugs, kill telegram bot on multiuser mode enable * add english translations * fix qr code in light mode * repatch migration * WIP checkpoint * pipeline overhaul for using response obj * format functions * fix comment block * remove conditional dumpENV + lint * remove .end() from sendStatus calls * patch broken streaming where streaming only first chunk * refactor * use Ephemeral handler now * show metrics and citations in real GUI * bugfixes * prevent MuM persistence, UI cleanup, styling for status * add new workspace flow in UI Add thread chat count fix 69 byte payload callback limit bug * handle pagination for workspaces, threads, and models * modularize commands and navigation * add /proof support for citation recall * handle backlog message spam * support abort of response streams * code cleanup * spam prevention * fix translations, update voice typing indicator, fix token bug * frontend refactor, update tips on /status and voice response improvements * collapse agent though blocks * support images * Fix mime issues with audio from other devices * fix config issue post server stop * persist image on agentic chats * 5189 i18n (#5245) * i18n translations connect #5189 * prune translations * fix errors * fix translation gaps --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-03-23 15:10:21 -07:00
Yitong Li	2f7a818744	fix(collector): infer file extension from Content-Type for URLs without explicit extensions (#5252 ) * fix(collector): infer file extension from Content-Type for URLs without explicit extensions When downloading files from URLs like https://arxiv.org/pdf/2307.10265, the path has no recognizable file extension. The downloaded file gets saved without an extension (or with a nonsensical one like .10265), causing processSingleFile to reject it with 'File extension .10265 not supported for parsing'. Fix: after downloading, check if the filename has a supported file extension. If not, inspect the response Content-Type header and map it to the correct extension using the existing ACCEPTED_MIMES table. For example, a response with Content-Type: application/pdf will cause the file to be saved with a .pdf extension, allowing it to be processed correctly. Fixes #4513 * small refactor --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-03-23 09:40:22 -07:00
Timothy Carambat	dc0bdf112b	linting & show descriptive error for bad `addtoWorkspace` request body resolves #5172	2026-03-09 11:30:53 -07:00
Maxwell Calkin	563f95167d	fix: add missing /wiki to Confluence cloud citation URLs (#5167 ) fix: add /wiki to Confluence cloud page URLs in citations	2026-03-09 10:24:56 -07:00
Marcello Fitton	8f33203ade	chore: add ESLint to `/collector` (#5128 ) * add eslint config to /collector * prettier formatting * fix unused * fix undefined * disable lines * lockfile --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-03-05 16:25:23 -08:00
Timothy Carambat	d58ff0ea3e	Normalize scraper runtimeargs for bulk-scraper (#5083 ) resolves #5078 closes #5079	2026-02-27 09:15:17 -08:00
Marcello Fitton	c927eda18f	fix: GitLab connector infinite loop and rate limit crash for large repos (#5021 ) * Fix infinite loop and rate limit crashes * simplify logic \| add max-retries to fetchNextPage and fetchSingleFileContents --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-02-19 12:42:21 -08:00
Timothy Carambat	2dc625193e	4825 patch yt file collector api (#4904 ) Patch YT links in API document collector closes #4825	2026-01-26 14:36:21 -08:00
j0rDy	f52e2866ac	Update common.js (#4894 ) * Update common.js Added missing translations in Dutch. * linting --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2026-01-23 17:12:17 -08:00
Timothy Carambat	4de5e30ac6	Merge commit from fork	2026-01-23 17:06:44 -08:00
Timothy Carambat	feb039ea70	Adjust fix path to use ESM import (#4867 ) * Adjust fix path to use ESM import * normalize fix-path imports and usage across the app * extract path fix logic to utils for server and collector * add helpers * repin strip-ansi in collector * fix log for localWhisper lint	2026-01-15 16:13:21 -08:00
Timothy Carambat	092b1b45f8	Upgrade YT Scraper (#4820 )	2026-01-02 15:41:22 -08:00
Sean Hatfield	6c1f8a38ce	Refactor localWhisper to use custom FFMPEGWrapper class (#4775 ) * refactor localWhisper to use new custom FFMPEGWrapper class * stub tests in github actions * add back wavefile conversion to 16khz 32f to fix docker builds * use afterEach for cleanup in ffmpeg tests * remove unused FFMPEG_PATH env check * use spawnSync for ffmpeg to capture and log output * lint * revert removal of try/catch around validateAudioFile for more helpful error msgs * use readFileSync instead of createReadStream for less overhead * change import to require for fix-path and stub import in tests * refactor to singleton to preserve ffmpeg path dev build --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-12-18 11:41:45 -08:00
Sean Hatfield	c76b0708c3	Fix pagination bug in paperless-ngx data connector (#4757 ) * iterate over all pages in paperless-ngx data connector * add error handling and data validation * refactor to handle edge cases and null values * catch edge case to prevent infinite loop --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-12-12 10:23:32 -08:00
timothycarambat	758db6b677	fix lint	2025-11-25 14:42:10 -08:00
Neha Prasad	3ecf218eea	feat: Add SSL certificate bypass support for self-hosted Confluence instances (#4219 ) * Added bypassSSL parameter to constructor and implemented SSL bypass logic in fetchConfluenceData method * Updated generateChunkSource function to include bypassSSL in the encrypted payload * Updated the request body to include bypassSSL in the JSON payload sent to the backend * Updated form submission to include bypassSSL parameter from the checkbox * Added bypass_ssl: "Bypass SSL Certificate Validation" translation * passed these parameters to fetchconfluencepage function for proper resync functionality * allow ignore of SSL cert for Confluence * add translations --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-25 14:32:10 -08:00
Sean Hatfield	05df4ac72b	Paperless ngx data connector (#4121 ) * paperless ngx data connector * wip resync paperless ngx * fix generateChunkSource for resyncing paperless ngx * lint * Refactor Paperless-NGX connector Fix issue with date rendering in tooltip + extended width Move tooltip details to be column for more space --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-11-20 11:27:38 -08:00
Timothy Carambat	b3b261e15d	Fix loop logic for `fetchNextPage` use in GitLabLoader (#4662 ) resolves #4626 closes #4627	2025-11-19 13:53:26 -08:00
Marcello Fitton	d3619689db	Refactor `loadYouTubeTranscript()` to include YouTube Video Metadata in Content When `parseOnly` is `true` (#4552 ) * Enhance YouTube transcript loading to include video metadata in parsed content when parseOnly is true * extract to function --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-15 15:42:00 -07:00
Timothy Carambat	5edc1bea42	Add ability to auto-handle YT video URLs in uploader & chat (#4547 ) * Add ability to auto-handle YT video URLs in uploader & chat * move YT validator to URL utils * update comment	2025-10-15 12:18:57 -07:00
Marcello Fitton	d48c76919c	Fix: File pulling fails with uppercase URL characters (#4516 ) * fix: remove unnecessary toLowerCase in URL validation * test: enhance URL validation tests to preserve case sensitivity and format * test: update URL validation tests to ensure domain normalization to lowercase while preserving path case * small formatting * fix filenames when downloading live URI --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-08 14:00:02 -07:00
Timothy Carambat	cf3fbcbf0f	Improve URL handler for collector processes (#4504 ) * Improve URL handler for collector processes * dev build	2025-10-07 11:03:27 -07:00
Marcello Fitton	f7b90571be	Fetch, Parse, and Create Documents for Statically Hosted Files (#4398 ) * Add capability to web scraping feature for document creation to download and parse statically hosted files * lint * Remove unneeded comment * Simplified process by using key of ACCEPTED_MIMES to validate the response content type, as a result unlocked all supported files * Add TODO comments for future implementation of asDoc.js to handle standard MS Word files in constants.js * Return captureAs argument to be exposed by scrapeGenericUrl and passed into getPageContent \| Return explicit argument of captureAs into scrapeGenericUrl in processLink fn * Return debug log for scrapeGenericUrl * Change conditional to a guard clause. * Add error handling, validation, and JSDOC to getContentType helper fn * remove unneeded comments * Simplify URL validation by reusing module * Rename downloadFileToHotDir to downloadURIToFile and moved up to a global module \| Add URL valuidation to downloadURIToFile * refactor * add support for webp remove unused imports --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-10-01 15:49:05 -07:00
AoiYamada	8fc1f24d1b	fix: youtube transcript collector not work well with non en or non asr caption (#4442 ) * fix: youtube transcript collector not work well with non en or non asr caption * stub YT test in Github actions --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-09-29 13:22:50 -07:00
Timothy Carambat	95557ee16f	Allow user to specify args for chromium process so they dont need SYS_ADMIN on container. (#4397 ) * allow user to specify args for chromium process so they dont need SYS_ADMIN perms * use arg flag content * update console outputs	2025-09-17 16:31:08 -07:00
timothycarambat	0200e647b8	add back normalization + docs link	2025-08-14 11:43:04 -07:00
Timothy Carambat	0fb33736da	Workspace Chat with documents overhaul (#4261 ) * Create parse endpoint in collector (#4212) * create parse endpoint in collector * revert cleanup temp util call * lint * remove unused cleanupTempDocuments function * revert slug change minor change for destinations --------- Co-authored-by: timothycarambat <rambat1010@gmail.com> * Add parsed files table and parse server endpoints (#4222) * add workspace_parsed_files table + parse endpoints/models * remove dev api parse endpoint * remove unneeded imports * iterate over all files + remove unneeded update function + update telemetry debounce * Upload UI/UX context window check + frontend alert (#4230) * prompt user to embed if exceeds prompt window + handle embed + handle cancel * add tokenCountEstimate to workspace_parsed_files + optimizations * use util for path locations + use safeJsonParse * add modal for user decision on overflow of context window * lint * dynamic fetching of provider/model combo + inject parsed documents * remove unneeded comments * popup ui for attaching/removing files + warning to embed + wip fetching states on update * remove prop drilling, fetch files/limits directly in attach files popup * rework ux of FE + BE optimizations * fix ux of FE + BE optimizations * Implement bidirectional sync for parsed file states linting small changes and comments * move parse support to another endpoint file simplify calls and loading of records * button borders * enable default users to upload parsed files but NOT embed * delete cascade on user/workspace/thread deletion to remove parsedFileRecord * enable bgworker with "always" jobs and optional document sync jobs orphan document job: Will find any broken reference files to prevent overpollution of the storage folder. This will run 10s after boot and every 12hr after * change run timeout for orphan job to 1m to allow settling before spawning a worker * linting and cleanup pr --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com> * dev build * fix tooltip hiding during embedding overflow files * prevent crash log from ERRNO on parse files * unused import * update docs link * Migrate parsed-files to GET endpoint patch logic for grabbing models names from utils better handling for undetermined context windows (null instead of Pos_INIFI) UI placeholder for null context windows * patch URL --------- Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>	2025-08-11 09:26:19 -07:00
Timothy Carambat	70a07b743b	Update `writeToServerDocuments` to take config object (#4213 )	2025-07-29 17:53:05 -07:00
timothycarambat	7692775942	minor change to XLSX parse and upload output folder	2025-07-29 17:44:47 -07:00
timothycarambat	ff34c8cefc	use documentsFolder path for simplification	2025-07-16 11:14:18 -07:00
Sean Hatfield	5485c58b44	Sanitize youtube transcription file paths (#4148 ) sanitize youtube transcription file paths	2025-07-14 13:53:34 -07:00
rexjohannes	14fa079953	Fix/drupal wiki (improve table & url handling) (#4097 ) * feat: add support for custom table formatting in htmlToText conversion * fix tables * feat: improve plain text table formatting for AI readability * fix options * improve drupal wiki connector * final fix * adjust leading slash to match code * linting --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:39:38 -07:00
bobbercheng	d0978fa363	Fix broken YT scraping with YT API (#4005 ) * Fix broken YT scraping with YT API * refactor youtube transcript class/add jsdoc comments * fix test --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-07-07 13:06:18 -07:00
timothycarambat	3d5e8602a8	lint	2025-05-27 13:54:13 -07:00
rexjohannes	dc80d3e535	fixed drupal connector (#3893 ) https://github.com/Mintplex-Labs/anything-llm/issues/3875#issuecomment-2913211343	2025-05-27 13:15:43 -07:00
Timothy Carambat	245a5969b8	normalize path on drupal to use documentsFolder constant	2025-05-27 09:25:48 -07:00
Sean Hatfield	2b274c62b7	Obsidian data connector (#3798 ) * add obsidian vault data connector * lint * add english translations * normalize translations * improve file parser and reader --------- Co-authored-by: timothycarambat <rambat1010@gmail.com>	2025-05-12 13:45:27 -07:00
timothycarambat	9d661bb96e	linting	2025-05-07 09:40:31 -07:00
mr-chenguang	eff9d24cb9	feat: support fetch wikis for gitlab data connectors (#3271 ) * feat: support fetch wikis for gitlab data connectors * gitlab connector button spacing * add docAuthor and description metadata for GitLab wiki pages --------- Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-05-06 14:09:53 -07:00
Timothy Carambat	1601eb986c	Enable bypass of ip limitations via ENV in collector processing (#3652 ) * Enable bypass of ip limitations via ENV in collector startup resolves #3625 connect #3626 * dev build * bump dockerx build action * enable runtime setting config of collector requests * comments and linting for option passing * unset * unset * update docs link * linting and docs	2025-04-21 11:10:41 -07:00
Timothy Carambat	fd4929b4d2	Feature/drupalwiki collector (#3693 ) * Implement DrupalWiki collector * Add attachment downloading and processing functionality (#3) * linting * Linting Add citation image small refactors add URL for citation identifier --------- Co-authored-by: em <eugen.mayer@kontextwork.de> Co-authored-by: rexjohannes <53578137+rexjohannes@users.noreply.github.com> Co-authored-by: Eugen Mayer <136934+EugenMayer@users.noreply.github.com>	2025-04-21 09:17:24 -07:00
Timothy Carambat	fd174cab86	Apply `.git` logic handler for repo URLs (#3655 ) * Apply `.git` logic handler for repo URLs * remove comment	2025-04-15 18:01:14 -07:00
Timothy Carambat	fab74037fa	Prevent collector crash when blocked by CDN (#3373 ) resolves #3365	2025-02-28 10:27:05 -08:00
AbelDuan	df166eb64e	feat: Add multilingual support for ocr module (#3325 ) * Add multilingual support for ocr mudule * Add OCR langauge as server var that is passed into Collector Support all valid tesseract language codes Filter and parse only valid codes with fallbacks' * persist TARGET_OCR_LANG * update docker example env --------- Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-27 12:31:17 -08:00
t2	0eb86e2c12	for projects in gitlab subgroup (#3075 ) (#3247 ) * for projects in gitlab subgroup (#3075) * fix: false condition * refactor pattern matching logic --------- Co-authored-by: t2 <> Co-authored-by: shatfield4 <seanhatfield5@gmail.com> Co-authored-by: Timothy Carambat <rambat1010@gmail.com>	2025-02-17 12:25:11 -08:00
Timothy Carambat	4545ce24cd	Drop Node `canvas` for manual `sharp` conversion (#3221 ) * Drop Node `canvas` for manual `sharp` conversion * bump dev	2025-02-14 17:38:13 -08:00

132 Commits
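
Note on commit 2f7a818744: its message describes falling back to the response Content-Type header when a downloaded file has no supported extension. The logic might look roughly like the sketch below. This is an illustration written from the commit message alone, not the code from the repository: the contents and shape of ACCEPTED_MIMES here are assumed, and extensionOf and resolveFilename are hypothetical helper names.

```javascript
// Assumed shape: MIME type -> list of file extensions. The real ACCEPTED_MIMES
// table in the collector is not shown in this log and may differ.
const ACCEPTED_MIMES = {
  "application/pdf": [".pdf"],
  "text/plain": [".txt", ".md"],
  "text/html": [".html"],
};

// Every extension the (assumed) table knows how to parse.
const SUPPORTED_EXTENSIONS = new Set(Object.values(ACCEPTED_MIMES).flat());

// Hypothetical helper: lowercase extension including the dot, or "" if none.
function extensionOf(filename) {
  const dot = filename.lastIndexOf(".");
  return dot > 0 ? filename.slice(dot).toLowerCase() : "";
}

// Hypothetical helper: keep the filename if its extension is supported,
// otherwise append the first extension mapped from the Content-Type header.
function resolveFilename(filename, contentType) {
  if (SUPPORTED_EXTENSIONS.has(extensionOf(filename))) return filename;

  // Drop parameters such as "; charset=utf-8" before the table lookup.
  const mime = String(contentType || "").split(";")[0].trim().toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(ACCEPTED_MIMES, mime);
  const ext = known ? ACCEPTED_MIMES[mime][0] : undefined;

  // Unknown or missing Content-Type: leave the name unchanged.
  return ext ? `${filename}${ext}` : filename;
}
```

With this sketch, the arXiv case from the commit message (a file saved as 2307.10265 with Content-Type: application/pdf) gets a .pdf suffix appended, while a name that already ends in a supported extension is returned unchanged whatever the header says.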