Native Embedder model selection (incl: Multilingual support) (#3835)

* WIP on embedder selection
TODO: apply splitting and query prefixes (if applicable)

* wip on upsert

* Support base model
support nomic-text-embed-v1
support multilingual-e5-small
Add prefixing for both embedding and query for RAG tasks
Add chunking prefix to all vector dbs to apply prefix when possible
Show dropdown and auto-pull on new selection

* norm translations

* move supported models to constants
handle null selection or invalid selection on dropdown
update comments

* dev

* patch text splitter maximums for now

* normalize translations

* add tests for splitter functionality

* normalize

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
2026-04-25 17:15:37 +02:00 · 2025-07-22 10:07:20 -07:00
parent 31a8ead823
commit 2c19dd09ed
44 changed files with 463 additions and 80 deletions
--- a/server/.env.example
+++ b/server/.env.example
@@ -138,6 +138,10 @@ SIG_SALT='salt' # Please generate random string at least 32 chars long.
 ###########################################
 ######## Embedding API SElECTION ##########
 ###########################################
+# This will be the assumed default embedding selection and model
+# EMBEDDING_ENGINE='native'
+# EMBEDDING_MODEL_PREF='Xenova/all-MiniLM-L6-v2'
+
 # Only used if you are using an LLM that does not natively support embedding (openai or Azure)
 # EMBEDDING_ENGINE='openai'
 # OPEN_AI_KEY=sk-xxxx