The UploadReady event's FileRef.ResourceId.OpaqueId is set to the space
root ID (required for CS3 gateway path resolution via WalkPath). This
means consumers that need the file's actual node ID for Graph API URLs
get the space root instead.
Add a separate ResourceID field (following the BytesReceived pattern)
that carries the file's actual resource identifier with the correct
OpaqueId set to session.NodeID().
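A minimal sketch of the resulting event shape, assuming simplified stand-in types. The real CS3 and reva types carry more fields; the struct layouts, `newUploadReady`, and its parameter names below are illustrative only and not taken from the upstream code.

```go
package main

import "fmt"

// ResourceID is a simplified stand-in for the CS3 resource identifier.
type ResourceID struct {
	StorageID string
	SpaceID   string
	OpaqueID  string
}

// Reference is a simplified stand-in for a CS3 reference: a root ID plus a relative path.
type Reference struct {
	ResourceID ResourceID
	Path       string
}

// UploadReady is a hypothetical, trimmed-down version of the event.
type UploadReady struct {
	FileRef    Reference  // OpaqueID is the space root, so path resolution keeps working
	ResourceID ResourceID // separate field: OpaqueID is the file's own node ID
}

// newUploadReady fills both identifiers; all parameter names are illustrative.
func newUploadReady(storageID, spaceID, rootID, nodeID, relPath string) UploadReady {
	return UploadReady{
		FileRef: Reference{
			ResourceID: ResourceID{StorageID: storageID, SpaceID: spaceID, OpaqueID: rootID},
			Path:       relPath,
		},
		ResourceID: ResourceID{StorageID: storageID, SpaceID: spaceID, OpaqueID: nodeID},
	}
}

func main() {
	ev := newUploadReady("storage-1", "space-1", "root-1", "node-42", "./docs/report.pdf")
	fmt.Println("path resolution uses:", ev.FileRef.ResourceID.OpaqueID)
	fmt.Println("Graph API URLs use:  ", ev.ResourceID.OpaqueID)
}
```

Keeping FileRef unchanged means existing consumers that resolve paths are unaffected, while consumers that need the node ID read the new field.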
Upstream: https://github.com/owncloud/reva/pull/XXXXX
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the per-file Search() call in IndexSpace with a direct Lookup()
using Bleve's DocIDQuery. The old approach parsed a KQL query string,
compiled it, and ran a full-text search for each file, taking 600-950ms
per file on large indexes. The new approach fetches the document directly
by ID and compares the mtime and Extracted fields in memory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an optional Optimizer interface and a Bleve.Optimize() method that
triggers ForceMerge to compact all index segments into one. Optimize() is
called automatically after IndexSpace completes its walk. Over time, writes
create multiple segments that degrade query performance; compaction
consolidates them for faster searches.
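A minimal sketch of the optional-interface pattern described above, using a type assertion so that engines without compaction support are left alone. The `Engine` stand-in, `mergingEngine`, `plainEngine`, and `optimizeIfSupported` are hypothetical names for illustration, not the actual search service types.

```go
package main

import "fmt"

// Engine is a stand-in for the required search engine interface.
type Engine interface {
	Upsert(id string) error
}

// Optimizer is the optional capability; an engine may or may not implement it.
type Optimizer interface {
	Optimize() error
}

// mergingEngine is a hypothetical engine that accumulates segments on write.
type mergingEngine struct{ segments int }

func (e *mergingEngine) Upsert(id string) error {
	e.segments++
	return nil
}

// Optimize simulates compaction by collapsing all segments into one.
func (e *mergingEngine) Optimize() error {
	if e.segments > 1 {
		e.segments = 1
	}
	return nil
}

// plainEngine has no Optimize method.
type plainEngine struct{}

func (plainEngine) Upsert(string) error { return nil }

// optimizeIfSupported runs compaction only when the engine offers it and
// reports whether it ran.
func optimizeIfSupported(e Engine) (bool, error) {
	o, ok := e.(Optimizer)
	if !ok {
		return false, nil
	}
	return true, o.Optimize()
}

func main() {
	for _, e := range []Engine{&mergingEngine{segments: 5}, plainEngine{}} {
		ran, err := optimizeIfSupported(e)
		fmt.Printf("%T: optimized=%v err=%v\n", e, ran, err)
	}
}
```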
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update acceptance tests and all deployment example CSP configs to include
'data:' in font-src, consistent with the default csp.yaml change.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The bundled Web UI CSS (from owncloud/web) inlines the KaTeX_Size3 font
as a base64 data:font/woff2 URI. The default CSP sets font-src to 'self'
only, which blocks these data URIs and produces a console error on every
page load:
    Loading the font 'data:font/woff2;base64,...' violates the following
    Content Security Policy directive: "font-src 'self'".
Add 'data:' to font-src, matching the existing pattern where img-src
already permits data: URIs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract DateTimeNode handling into dateTimeQuery(), following the same
pattern as numericQuery(). This brings walk() cyclomatic complexity from
36 to ~30, below the SonarCloud threshold of 35.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes to fix permanently stuck index entries:
1. (Option D) Validate Tika responses: if MetaRecursive() returns an
empty metadata list, treat it as an error. This prevents Tika HTTP 200
responses with no actual content from being accepted as successful
extractions.
2. (Option B) Add Extracted field to Resource: a boolean that is set to
true only when UpsertItem completes successfully. The IndexSpace skip
check now requires Extracted:true in addition to matching id+mtime.
Documents that were written with incomplete extraction data will be
automatically re-processed on the next reindex.
Background: When Tika returns HTTP 200 but its child processes (OCR,
ImageMagick) fail due to resource limits (EAGAIN, OOM), the Go Tika
client receives metadata headers but no X-TIKA:content. The doc gets
written to Bleve with the correct mtime, and subsequent reindexes skip
it forever because the id+mtime check passes.
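The two changes can be sketched roughly as follows. This is a simplified illustration: `indexedDoc`, `errEmptyExtraction`, `validateMetadata`, and `canSkip` are hypothetical names, and the metadata type is an assumption about what the Tika client returns.

```go
package main

import (
	"errors"
	"fmt"
)

// indexedDoc is a hypothetical view of what the index stores per resource.
type indexedDoc struct {
	ID        string
	Mtime     string
	Extracted bool // true only after a fully successful upsert
}

// errEmptyExtraction is an illustrative error value, not one from the source.
var errEmptyExtraction = errors.New("tika returned no metadata")

// validateMetadata rejects an empty metadata list even though the HTTP call
// itself succeeded. The slice-of-maps type is an assumption.
func validateMetadata(metas []map[string][]string) error {
	if len(metas) == 0 {
		return errEmptyExtraction
	}
	return nil
}

// canSkip reports whether a reindex may skip the file: the document must
// exist, match id and mtime, and have been fully extracted.
func canSkip(doc *indexedDoc, id, mtime string) bool {
	return doc != nil && doc.ID == id && doc.Mtime == mtime && doc.Extracted
}

func main() {
	fmt.Println("empty response:", validateMetadata(nil))

	stuck := &indexedDoc{ID: "f1", Mtime: "2024-01-01T00:00:00Z", Extracted: false}
	fmt.Println("skip incomplete doc:", canSkip(stuck, "f1", "2024-01-01T00:00:00Z"))

	stuck.Extracted = true
	fmt.Println("skip complete doc:  ", canSkip(stuck, "f1", "2024-01-01T00:00:00Z"))
}
```

With the flag in place, a document written with the correct mtime but incomplete extraction no longer satisfies the skip check.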
Fixes: #12093
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix the SonarCloud gocyclo finding: walk() cyclomatic complexity
was 40 (threshold 35). Extract numeric range query building
into a separate numericQuery() function.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add NumericRestrictionNode to the KQL PEG grammar so that range
operators work with numeric values, not just DateTimes. This enables
queries like `size>=1048576`, `photo.iso>=100`, `photo.fNumber>=2.8`,
and `photo.focalLength<50`.
Changes:
- ast: add NumericNode with Key, Operator, Value (float64)
- kql/dictionary.peg: add NumericRestrictionNode and Number rules
- kql/factory.go: add buildNumericNode()
- kql/cast.go: add toFloat64()
- bleve/compiler.go: compile NumericNode to NumericRangeQuery
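To illustrate the node shape and how an operator maps to range bounds, here is a rough sketch. It uses a hand-written parser as a stand-in for the generated PEG parser, so `parseNumericRestriction` and `rangeBounds` are hypothetical helpers; only the NumericNode fields (Key, Operator, Value) come from the change list above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// NumericNode mirrors the fields named in the change list.
type NumericNode struct {
	Key      string
	Operator string
	Value    float64
}

// parseNumericRestriction is a hand-written stand-in for the PEG rule.
func parseNumericRestriction(s string) (NumericNode, error) {
	// Two-character operators are tried first so ">=" is not read as ">".
	for _, op := range []string{">=", "<=", ">", "<"} {
		i := strings.Index(s, op)
		if i <= 0 {
			continue
		}
		v, err := strconv.ParseFloat(s[i+len(op):], 64)
		if err != nil {
			return NumericNode{}, fmt.Errorf("invalid number in %q: %w", s, err)
		}
		return NumericNode{Key: s[:i], Operator: op, Value: v}, nil
	}
	return NumericNode{}, fmt.Errorf("no range operator in %q", s)
}

// rangeBounds translates the operator into one-sided range bounds, the shape
// a numeric range query needs. A nil bound means unbounded on that side.
func rangeBounds(n NumericNode) (lo, hi *float64, inclusive bool) {
	v := n.Value
	switch n.Operator {
	case ">", ">=":
		lo = &v
	case "<", "<=":
		hi = &v
	}
	return lo, hi, strings.HasSuffix(n.Operator, "=")
}

func main() {
	for _, q := range []string{"size>=1048576", "photo.fNumber>=2.8", "photo.focalLength<50"} {
		n, err := parseNumericRestriction(q)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		lo, hi, incl := rangeBounds(n)
		fmt.Printf("%+v lowerSet=%v upperSet=%v inclusive=%v\n", n, lo != nil, hi != nil, incl)
	}
}
```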
Fixes: #12093
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- UpdateTags now updates only the tags field instead of all basic
metadata (name, size, mimetype, path, parentID), per reviewer
feedback that those are structural fields, not metadata.
- Removed refreshMetadata helper that was over-reaching.
- Introduced ErrResourceNotFound sentinel in engine package so
UpdateTags can distinguish "not indexed" (fallback to UpsertItem)
from connectivity or other errors (log and stop).
- Added GoDoc example for the Update mutateFn callback.
- Added test for non-retriable error path.
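The error handling above can be sketched with errors.Is against the sentinel. Only the name ErrResourceNotFound comes from the change list; the `action` type, `classify`, `wrapNotFound`, and `errConnection` are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrResourceNotFound is the sentinel named above; its message is illustrative.
var ErrResourceNotFound = errors.New("resource not found")

// errConnection is a hypothetical non-retriable error used for the demo.
var errConnection = errors.New("index unreachable")

type action int

const (
	actionDone           action = iota // tags were updated
	actionFallbackUpsert               // not indexed yet: run a full UpsertItem
	actionLogAndStop                   // any other failure: log and stop
)

// wrapNotFound shows that wrapping with %w keeps the sentinel detectable.
func wrapNotFound(id string) error {
	return fmt.Errorf("update %s: %w", id, ErrResourceNotFound)
}

// classify decides what the caller should do with the error from an update.
func classify(err error) action {
	switch {
	case err == nil:
		return actionDone
	case errors.Is(err, ErrResourceNotFound):
		return actionFallbackUpsert
	default:
		return actionLogAndStop
	}
}

func main() {
	fmt.Println(classify(nil) == actionDone)
	fmt.Println(classify(wrapNotFound("f1")) == actionFallbackUpsert)
	fmt.Println(classify(errConnection) == actionLogAndStop)
}
```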
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address reviewer feedback:
- Remove checksum field from index (bloats index, not searched)
- Remove skipUnchangedContent guard (not needed with UpdateTags path)
- Replace engine.Retrieve() with engine.Update(id, mutateFn) that does
get+mutate+set internally, avoiding public exposure of raw retrieval
- Simplify UpdateTags to use engine.Update() with mutation function
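A rough sketch of the get+mutate+set shape, using an in-memory map in place of the real index. `memEngine`, `Resource`, and `errNotIndexed` are hypothetical; the actual engine.Update() signature may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Resource is a trimmed-down, hypothetical indexed document.
type Resource struct {
	ID      string
	Tags    []string
	Content string // extracted text that tag updates must preserve
}

// errNotIndexed is an illustrative error for a missing document.
var errNotIndexed = errors.New("not indexed")

// memEngine is an in-memory stand-in for the search engine.
type memEngine struct {
	mu   sync.Mutex
	docs map[string]Resource
}

// Update loads the document, applies mutateFn, and writes it back while
// holding the lock, so callers never handle the raw stored document.
func (e *memEngine) Update(id string, mutateFn func(*Resource)) error {
	e.mu.Lock()
	defer e.mu.Unlock()
	r, ok := e.docs[id]
	if !ok {
		return errNotIndexed
	}
	mutateFn(&r)
	e.docs[id] = r
	return nil
}

func main() {
	e := &memEngine{docs: map[string]Resource{
		"f1": {ID: "f1", Content: "extracted text"},
	}}
	err := e.Update("f1", func(r *Resource) { r.Tags = []string{"invoice"} })
	fmt.Println(err, e.docs["f1"].Tags, e.docs["f1"].Content)
}
```

Because the mutation function only touches the fields it cares about, the previously extracted content stays in place without a second extraction run.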
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Moves the metadata-storage block from UpsertItem into a dedicated helper
method to bring cognitive complexity below the SonarCloud threshold of 15.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the checksum fast-path logic into a dedicated helper method to
bring UpsertItem below the SonarCloud cognitive complexity threshold
of 15.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests for:
- UpsertItem skips Tika when checksum unchanged
- UpsertItem runs Tika when checksum differs
- UpsertItem runs Tika when resource not yet indexed
- UpdateTags preserves Tika content when resource is indexed
- UpdateTags falls back to full UpsertItem when not indexed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tag events (TagsAdded/TagsRemoved) previously called UpsertItem which
triggered a full Tika content extraction for metadata-only changes.
Additionally, UpsertItem's SetArbitraryMetadata writeback bumped the
mtime, causing IndexSpace to call UpsertItem again, so Tika ran
twice per file. On servers with large libraries this caused a
denial-of-service from runaway convert/tesseract processes.
Fix by:
- Adding UpdateTags() to the Searcher interface for tag-only updates
that preserve existing Tika-extracted content
- Adding Retrieve() to the Engine interface to load indexed resources
- Comparing the content checksum (already computed by decomposedfs on
upload) in UpsertItem to skip Tika and the metadata writeback when
the blob has not been written to
- Extracting refreshMetadata() helper shared by both code paths
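The checksum comparison can be sketched as a small predicate. `indexedResource` and `needsExtraction` are hypothetical names, and treating an empty checksum as "extract anyway" is an assumption of this sketch rather than documented behavior.

```go
package main

import "fmt"

// indexedResource is a hypothetical view of the previously indexed document.
type indexedResource struct {
	Checksum string
	Content  string
}

// needsExtraction reports whether content extraction should run: the resource
// is not indexed yet, no checksum is available (assumption: extract to be
// safe), or the blob's checksum changed.
func needsExtraction(prev *indexedResource, checksum string) bool {
	return prev == nil || checksum == "" || prev.Checksum != checksum
}

func main() {
	prev := &indexedResource{Checksum: "abc123", Content: "extracted text"}
	fmt.Println("unchanged blob:", needsExtraction(prev, "abc123"))
	fmt.Println("changed blob:  ", needsExtraction(prev, "def456"))
	fmt.Println("not indexed:   ", needsExtraction(nil, "abc123"))
}
```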
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>