Benchmark Settings
Shared evaluation configuration across all datasets.
- Hardware: NVIDIA H200
- Draft TP: 1
- Target TP: 2
- Num questions / dataset: 128
- Temperature: 0
- Max tokens: 200
- Dtype: bfloat16
- Random seed: 0
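For anyone reproducing these runs in a script, the shared settings above can be captured in one immutable object. This is a minimal sketch only: the class name `BenchmarkSettings` and its field names are hypothetical and do not come from any benchmark tool; only the values are taken from the list above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BenchmarkSettings:
    """Shared evaluation configuration (field names are illustrative assumptions)."""

    hardware: str = "NVIDIA H200"
    draft_tp: int = 1                      # tensor-parallel size of the draft model
    target_tp: int = 2                     # tensor-parallel size of the target model
    num_questions_per_dataset: int = 128
    temperature: float = 0.0               # temperature 0, as listed above
    max_tokens: int = 200
    dtype: str = "bfloat16"
    random_seed: int = 0


SETTINGS = BenchmarkSettings()

if __name__ == "__main__":
    # Print the settings one per line so they can be logged next to results.
    for key, value in asdict(SETTINGS).items():
        print(f"{key}: {value}")
```

Freezing the dataclass keeps the configuration from being changed by accident between datasets, which fits settings that are meant to be shared across all of them.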
All Qwen2.5 and Llama3 series entries use their respective Instruct checkpoints.
| Target model | Draft model | Throughput | MAT | Speedup |
|---|---|---|---|---|