* add chroma cloud as new vector db provider
* update docker example env
* extend chroma class to chroma cloud
* update readme
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* Added exa-search case to the search provider switch in web-browsing.js
* Added ExaSearchOptions component for API key input
* update
* Patch missing image crashing UI
Fix issue where ENV key did not exist or was saved on click
Update copy for provider
Add Docs for ENV keys for manual placements
update systemssettings for returning key saved to UI
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* WIP on embedder selection
TODO: apply splitting and query prefixes (if applicable)
* wip on upsert
* Support base model
support nomic-text-embed-v1
support multilingual-e5-small
Add prefixing for both embedding and query for RAG tasks
Add chunking prefix to all vector dbs to apply prefix when possible
Show dropdown and auto-pull on new selection
* norm translations
* move supported models to constants
handle null seelction or invalid selection on dropdown
update comments
* dev
* patch text splitter maximums for now
* normalize translations
* add tests for splitter functionality
* normalize
---------
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
* Add context window finder from litellm maintained list
apply to all cloud providers, have client cache for 3 days
* docker container bootup warning
* update invalid ENV warning
* rebased with current HEAD
* update newline printing
* PGVector support for vector db storage
* forgot files
* comments
* dev build
* Add ENV connection and table schema validations for vector table
add .reset call to drop embedding table when changing the AnythingLLM embedder
update instrutions
Add preCheck error reporting in UpdateENV
add timeout to pg connection
* update setup
* update README
* update doc
* Fixed two primary issues discovered while using AWS Bedrock with Anthropic Claude Sonnet models:
- Context Window defaults to 8192 maximum, which isn't correct
- Multimodal stopped working when removing langchain, which was transparently handling image_url to a format sonnet expects.
* Ran `yarn lint`
* Updated .env.example to have aws bedrock examples too
* Refactor for readability
move utils for AWS specific functionality to subfile
add token output max to ENV so setting persits
---------
Co-authored-by: Tristan Stahnke <tristan.stahnke+gpsec@guidepointsecurity.com>
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* Enable bypass of ip limitations via ENV in collector startup
resolves#3625
connect #3626
* dev build
* bump dockerx build action
* enable runtime setting config of collector requests
* comments and linting for option passing
* unset
* unset
* update docs link
* linting and docs
* WIP MCP full compatibility layer
* implement MCP agent function wrapping and invocation methods
* Add `uvx` to docker bin for MCP executions
* dev build
* prune removed data
* Wrap MCP servers to lazy load items to not block the UI
Mobile bug fixes
* arm64 test build
* reset dev builder
* remove unused prop
* Add multilingual support for ocr mudule
* Add OCR langauge as server var that is passed into Collector
Support all valid tesseract language codes
Filter and parse only valid codes with fallbacks'
* persist TARGET_OCR_LANG
* update docker example env
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
* feat: add new model provider PPIO
* fix: fix ppio model fetching
* fix: code lint
* reorder LLM
update interface for streaming and chats to use valid keys
linting
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* Add support for Google Generative AI (Gemini) embedder
* Add missing example in docker
Fix UI key elements in options
Add Gemini to data handling section
Patch issues with chunk handling during embedding
* remove dupe in env
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* exposes `maxConcurrentChunks` parameter for the generic openai embedder through configuration. This allows setting a batch size for endpoints which don't support the default of 500
* Update new field to new UI
make getting to ensure proper type and format
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* feat: add new model provider: Novita AI
* feat: finished novita AI
* fix: code lint
* remove unneeded logging
* add back log for novita stream not self closing
* Clarify ENV vars for LLM/embedder seperation for future
Patch ENV check for workspace/agent provider
---------
Co-authored-by: Jason <ggbbddjm@gmail.com>
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
* Update OpenAI TTS config to allow a custom BaseURL
* uncheck config file
* break openai generic TTS into its own provider
* add space
* hide TTS on user msg
---------
Co-authored-by: Adam <phazei@gmail.com>
* Issue #1943: Add support for LLM provider - Fireworks AI
* Update UI selection boxes
Update base AI keys for future embedder support if needed
Add agent capabilites for FireworksAI
* class only return
---------
Co-authored-by: Aaron Van Doren <vandoren96+1@gmail.com>
* Revert "Patch ODBC support from missing binary/headers for node-odbc"
This reverts commit 9de6b1cc26.
* Revert "OBDC Support (#1933)"
This reverts commit cd597a361e.
* bump base image
* Bump mysql
* attestations
* attestations perms
* attestations perms
* fix permissions for attestetions for master push
* test cleanup
* repin base image
* revert base
* patch frontend-clean
* force resolve braces to fixed version
* rebump
* wip bg workers for live document sync
* Add ability to re-embed specific documents across many workspaces via background queue
bgworkser is gated behind expieremental system setting flag that needs to be explictly enabled
UI for watching/unwatching docments that are embedded.
TODO: UI to easily manage all bg tasks and see run results
TODO: UI to enable this feature and background endpoints to manage it
* create frontend views and paths
Move elements to correct experimental scope
* update migration to delete runs on removal of watched document
* Add watch support to YouTube transcripts (#1716)
* Add watch support to YouTube transcripts
refactor how sync is done for supported types
* Watch specific files in Confluence space (#1718)
Add failure-prune check for runs
* create tmp workflow modifications for beta image
* create tmp workflow modifications for beta image
* create tmp workflow modifications for beta image
* dual build
update copy of alert modals
* update job interval
* Add support for live-sync of Github files
* update copy for document sync feature
* hide Experimental features from UI
* update docs links
* [FEAT] Implement new settings menu for experimental features (#1735)
* implement new settings menu for experimental features
* remove unused context save bar
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
* dont run job on boot
* unset workflow changes
* Add persistent encryption service
Relay key to collector so persistent encryption can be used
Encrypt any private data in chunkSources used for replay during resync jobs
* update jsDOC
* Linting and organization
* update modal copy for feature
---------
Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>