Files
ort/docs/pages/troubleshooting/performance.mdx
2024-06-08 13:20:07 -05:00

63 lines
2.9 KiB
Plaintext

---
title: 'Troubleshoot: Performance'
---
import { Callout, Tabs, Steps } from 'nextra/components';
## Execution providers don't seem to register
`ort` is designed to fail gracefully when an execution provider is not available. It logs failure events through [`tracing`](https://crates.io/crates/tracing), thus you'll need a library that subscribes to `tracing` events to see the logs. The simplest way to do this is to use [`tracing-subscriber`](https://crates.io/crates/tracing-subscriber).
<Steps>
### Add tracing-subscriber to your dependencies
```toml Cargo.toml
[dependencies]
tracing-subscriber = { version = "0.3", features = [ "env-filter", "fmt" ] }
```
### Initialize the subscriber in the main function
```rust main.rs
fn main() {
tracing_subscriber::fmt::init();
}
```
### Show debug messages from ort
Set the environment variable `RUST_LOG` to `ort=debug` to see all debug messages from `ort`.
<Tabs items={['Windows (PowerShell)', 'Windows (Command Prompt)', 'Linux', 'macOS']}>
<Tabs.Tab title="Windows (PowerShell)">
```powershell
$env:RUST_LOG = 'ort=debug';
cargo run
```
</Tabs.Tab>
<Tabs.Tab title="Windows (Command Prompt)">
```cmd
set RUST_LOG=ort=debug
cargo run
```
</Tabs.Tab>
<Tabs.Tab title="Linux">
```shell
RUST_LOG="ort=debug" cargo run
```
</Tabs.Tab>
<Tabs.Tab title="macOS">
```shell
RUST_LOG="ort=debug" cargo run
```
</Tabs.Tab>
</Tabs>
</Steps>
<Callout type='info'>You can also detect EP regsitration failures programmatically. See [Execution providers: Fallback behavior](/perf/execution-providers#fallback-behavior) for more info.</Callout>
## Inference is slower than expected
There are a few things you could try to improve performance:
- **Run `onnxsim` on the model.** Direct graph exports from some frameworks can leave a lot of junk nodes in the graph, which could hinder performance. [`onnxsim`](https://github.com/daquexian/onnx-simplifier) is a neat tool that can be used to simplify the ONNX graph and potentially improve performance.
- **Try different [execution providers](/perf/execution-providers)** for your hardware.
- **Use the [transformer optimization tool](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers).** This is another neat tool that converts certain transformer-based models to far more optimized graphs.
- **Use [I/O binding](/perf/io-binding).** This can reduce latency caused by copying the session inputs/outputs to/from devices.
- **[Quantize your model.](https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html)** You could try quantizing your model to 8-bit precision. This comes with a small accuracy loss, but can sometimes provide a large performance boost. If the accuracy loss is too high, you could also use [float16/mixed precision](https://onnxruntime.ai/docs/performance/model-optimizations/float16.html).