Let it die, let it die! You shall die
Instead of creating an environment that lives throughout the duration of the process, we now hold the environment options we commit, and create the environment from those options (or grab the current env) whenever we need it. When all holders of the environment are dropped, the environment is dropped as well.
Previously, we held `Environment` as a static variable. Statics are never dropped, but ONNX Runtime's own destructors assume that the environment is long gone by the time the process exits, which was not the case in `ort`! This led to issues like #441 and the dumb `0003-leak-logger-mutex.patch` from `ort-artifacts`.
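A minimal, std-only sketch of the lifetime scheme described above, not the actual `ort` code. `EnvironmentOptions`, `Environment`, `commit` and `get_environment` here are hypothetical stand-ins: the static holds only the committed options and a `Weak` pointer, so the environment itself dies with its last holder.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for the committed environment options.
#[derive(Clone, Debug, Default)]
pub struct EnvironmentOptions {
    pub name: String,
}

/// Hypothetical stand-in for the native environment handle.
#[derive(Debug)]
pub struct Environment {
    pub name: String,
}

/// Everything the static keeps: plain options plus a weak pointer.
/// Neither of them keeps an `Environment` alive.
struct State {
    options: Option<EnvironmentOptions>,
    current: Option<Weak<Environment>>,
}

static STATE: Mutex<State> = Mutex::new(State { options: None, current: None });

/// Store the options; no environment is created yet.
pub fn commit(options: EnvironmentOptions) {
    STATE.lock().unwrap().options = Some(options);
}

/// Grab the current environment if one is alive, else build one from the
/// committed options (or defaults if nothing was committed).
pub fn get_environment() -> Arc<Environment> {
    let mut state = STATE.lock().unwrap();
    if let Some(env) = state.current.as_ref().and_then(Weak::upgrade) {
        return env;
    }
    let options = state.options.clone().unwrap_or_default();
    let env = Arc::new(Environment { name: options.name });
    state.current = Some(Arc::downgrade(&env));
    env
}
```

Holders (sessions, in the real crate) would keep the returned `Arc`; once the last one is dropped, the `Weak` in the static can no longer be upgraded and the next call builds a fresh environment.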
I'm pretty sure the modnet photo was taken by Charlotte May, not "Tyler Nix". The link left over from cudarc's COPYRIGHT.md (btw, why did I delete that file from modnet?) is now a 404. Reverse image search led me to Charlotte May's photo on Pexels, and their profile features more photos of the same person. Does this mean 'Tyler Nix' is a THIEF...? Nixgate!!1!
Also, I once again apologize for saying I was gonna do this however long ago and then promptly forgetting to do it.
This has all sorts of fun breaking changes:
- `ort::inputs!` no longer yields an `ort::Result<...>` (thank God)
- `Tensor::from_array` now only accepts owned data.
- Introduce `TensorRef::from_array_view` and `TensorRefMut::from_array_view_mut`.
- `TryFrom<A>` is no longer implemented for `Tensor<T>` for any variants.
This opens the door to new optimizations on top of fixing a few unsoundness issues.
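To illustrate the owned/borrowed split in the list above, here is a std-only sketch. `OwnedTensor` and `TensorView` are hypothetical stand-ins, not the `ort` types, and their signatures are assumptions: the owned type carries no lifetime, while the view is tied to the buffer it borrows, so the compiler rejects a view that outlives its data.

```rust
/// Hypothetical owned tensor: it owns its buffer, so it needs no lifetime.
pub struct OwnedTensor {
    shape: Vec<i64>,
    data: Vec<f32>,
}

/// Hypothetical borrowed view: valid only while the borrowed slice is.
pub struct TensorView<'a> {
    shape: Vec<i64>,
    data: &'a [f32],
}

/// Number of elements implied by `shape`, or `None` for negative
/// dimensions or overflow.
fn element_count(shape: &[i64]) -> Option<usize> {
    let mut count = 1usize;
    for &dim in shape {
        if dim < 0 {
            return None;
        }
        count = count.checked_mul(dim as usize)?;
    }
    Some(count)
}

impl OwnedTensor {
    /// Accepts owned data only; returns `None` if the shape does not match.
    pub fn from_array(shape: Vec<i64>, data: Vec<f32>) -> Option<Self> {
        if element_count(&shape)? != data.len() {
            return None;
        }
        Some(OwnedTensor { shape, data })
    }

    /// Borrow the owned tensor as a view.
    pub fn view(&self) -> TensorView<'_> {
        TensorView { shape: self.shape.clone(), data: &self.data }
    }
}

impl<'a> TensorView<'a> {
    /// Borrows `data`; the returned view cannot outlive it.
    pub fn from_array_view(shape: Vec<i64>, data: &'a [f32]) -> Option<Self> {
        if element_count(&shape)? != data.len() {
            return None;
        }
        Some(TensorView { shape, data })
    }

    pub fn shape(&self) -> &[i64] {
        &self.shape
    }

    pub fn data(&self) -> &'a [f32] {
        self.data
    }
}
```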
TODO: update docs
I did that thing again!
Features in this commit:
- `ThreadManager` allows you to define custom thread creation functions for environments & sessions.
- Sessions can now opt-out of using the environment's global thread pool.
- Implemented the safe `ShapeInferenceContext` wrapper for custom operators.
- Prepacked weights allow the CPU execution provider to share one allocation for identical weights between sessions.
- Customize workload type to prioritize efficiency; useful for background tasks.
- Configurable per-session log identifiers
- Dynamic dimension overrides
Breaking changes:
- `EnvironmentGlobalThreadPoolOptions` is now `GlobalThreadPoolOptions` and uses the builder pattern instead of exposed struct fields.
Breaking because `extract_tensor_*` now returns `&[i64]` for dimensions, and `dtype()` and `memory_info()` also return references.
Each tensor extract call not only had multiple FFI calls to determine the `ValueType`, but also had to determine `MemoryInfo` to ensure the data was CPU-accessible. Since neither the data type nor the memory location can *change* for a given value, it doesn't make sense to compute this on each extract call; it's better to compute it once, when we create the `Value` (and we often already have the types created by this time, so little FFI is actually required).
This should make `extract_tensor_raw` zero-alloc, which benefits usages of `IoBinding`/`OutputSelector` the most. It does mean that a `Value` which is created but never extracted (like HF Transformers hidden state outputs which go ignored) incurs slightly more overhead, but the tradeoff of having less overhead at extraction time seems worth it.
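A rough sketch of the compute-once idea, using only std. Everything here is hypothetical (`Value`, `ElementType`, `MemoryInfo`, `extract_raw`, and the counter that stands in for FFI round trips); it only shows metadata being gathered at construction and handed out by reference afterwards.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts simulated FFI round trips so the saving is observable.
static FFI_CALLS: AtomicUsize = AtomicUsize::new(0);

fn simulated_ffi_query() {
    FFI_CALLS.fetch_add(1, Ordering::SeqCst);
}

pub fn ffi_calls() -> usize {
    FFI_CALLS.load(Ordering::SeqCst)
}

#[derive(Debug, PartialEq)]
pub enum ElementType {
    F32,
}

#[derive(Debug)]
pub struct MemoryInfo {
    pub is_cpu_accessible: bool,
}

/// Hypothetical value that caches its metadata when it is created.
pub struct Value {
    data: Vec<f32>,
    dtype: ElementType,
    shape: Vec<i64>,
    memory_info: MemoryInfo,
}

impl Value {
    pub fn new(shape: Vec<i64>, data: Vec<f32>) -> Self {
        // Pay for the metadata queries once, up front.
        simulated_ffi_query(); // element type
        simulated_ffi_query(); // shape
        simulated_ffi_query(); // memory info
        Value {
            data,
            dtype: ElementType::F32,
            shape,
            memory_info: MemoryInfo { is_cpu_accessible: true },
        }
    }

    pub fn dtype(&self) -> &ElementType {
        &self.dtype
    }

    pub fn memory_info(&self) -> &MemoryInfo {
        &self.memory_info
    }

    /// No allocation and no simulated FFI: only borrows of cached fields.
    pub fn extract_raw(&self) -> Option<(&[i64], &[f32])> {
        if !self.memory_info.is_cpu_accessible {
            return None;
        }
        Some((self.shape.as_slice(), self.data.as_slice()))
    }
}
```

The cost moves to `Value::new`, which is exactly the tradeoff described above: values that are never extracted still pay it.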
Shaves off the `thiserror` dependency and should improve compile times slightly.
Unfortunately this does mean we can't match on `Error` anymore, though I'm not sure if that was ever useful to begin with.
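For reference, a hand-written error type without `thiserror` can look roughly like this. It is a sketch under assumptions, not the crate's actual definition: the `ErrorCode` variants, the accessors and `load_model` are all made up for illustration. The error is an opaque struct, so callers inspect it through methods instead of matching on variants.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical error categories.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ErrorCode {
    InvalidArgument,
    Other,
}

/// Opaque error: fields are private, so there is nothing to match on.
#[derive(Debug)]
pub struct Error {
    code: ErrorCode,
    message: String,
}

impl Error {
    pub fn new(code: ErrorCode, message: impl Into<String>) -> Self {
        Error { code, message: message.into() }
    }

    pub fn code(&self) -> ErrorCode {
        self.code
    }

    pub fn message(&self) -> &str {
        &self.message
    }
}

// What `#[derive(thiserror::Error)]` would otherwise generate for us.
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.code, self.message)
    }
}

impl StdError for Error {}

/// Hypothetical fallible function, used only to produce an error.
pub fn load_model(path: &str) -> Result<(), Error> {
    if path.is_empty() {
        return Err(Error::new(ErrorCode::InvalidArgument, "model path is empty"));
    }
    Ok(())
}
```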
aka The Cleanening, part 2
- Add clearer documentation and examples for more things.
- Rework string tensors by introducing `PrimitiveTensorElementType` for primitive (e.g. `f32`) types, and re-implementing `IntoTensorElementType` for `String`. This allows string tensors to be used via `Tensor<String>` instead of exclusively via `DynTensor`. Additionally, string tensors no longer require an `Allocator` to be created (which didn't make sense, since string data in Rust can only ever be stored on the CPU anyway). This also now applies to `Map`s, since their data also needed to be on the CPU anyway. (`Sequence`s are currently unaffected because I think a custom allocator could be useful for them?)
- Rework the `IoBinding` interface, and add an example clarifying the intended usage of it (ref #209). Thanks to AAce from the pyke Discord for pointing out the mutability issue in the old interface, which should be addressed now.
- Refactor `OperatorDomain::add` from the slightly-nicer-looking-but-more-confusing `fn<T>(t: T)` to just `fn<T>()` to further enforce the fact that `Operator`s are zero-sized.
- Maps can now have `String` keys.
- Remove some unused errors.
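The `OperatorDomain::add` change from `fn<T>(t: T)` to `fn<T>()` can be pictured with a std-only sketch like the one below. The trait, the domain type and the two operators are hypothetical stand-ins with simplified signatures: the operator is named purely by type, and the size check is an assumption added here to make the zero-sized requirement explicit.

```rust
/// Hypothetical operator trait: everything lives on the type, not a value.
pub trait Operator: 'static {
    const NAME: &'static str;
}

/// Hypothetical operator domain that records registered operator names.
pub struct OperatorDomain {
    name: String,
    operators: Vec<&'static str>,
}

impl OperatorDomain {
    pub fn new(name: impl Into<String>) -> Self {
        OperatorDomain { name: name.into(), operators: Vec::new() }
    }

    /// Registers `O` by type alone; there is no value to pass in.
    pub fn add<O: Operator>(mut self) -> Self {
        // Assumption for this sketch: reject operators that carry state.
        assert_eq!(std::mem::size_of::<O>(), 0, "operators must be zero-sized");
        self.operators.push(O::NAME);
        self
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn operators(&self) -> &[&'static str] {
        &self.operators
    }
}

pub struct CustomAdd;

impl Operator for CustomAdd {
    const NAME: &'static str = "CustomAdd";
}

pub struct CustomMul;

impl Operator for CustomMul {
    const NAME: &'static str = "CustomMul";
}
```

With this shape, registration reads `OperatorDomain::new("test.customop").add::<CustomAdd>().add::<CustomMul>()`, which makes it obvious that no operator instance is ever constructed.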