Results

Benchmark

Interactively explore throughput, MAT, and speedup across model pairs.
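As a rough illustration of how the three reported quantities relate, the sketch below computes them from raw measurements. The definitions are assumptions, not taken from this page: throughput is taken as generated tokens per second, MAT as the mean number of tokens accepted per verification step, and speedup as the ratio of speculative throughput to the throughput of the target model decoding alone. The function names are hypothetical.

```python
from typing import Sequence


def throughput(num_generated_tokens: int, wall_time_s: float) -> float:
    """Generated tokens per second (assumed definition)."""
    if wall_time_s <= 0:
        raise ValueError("wall_time_s must be positive")
    return num_generated_tokens / wall_time_s


def mean_accepted_tokens(tokens_per_step: Sequence[int]) -> float:
    """Average tokens produced per verification step (assumed meaning of MAT)."""
    if not tokens_per_step:
        raise ValueError("need at least one verification step")
    return sum(tokens_per_step) / len(tokens_per_step)


def speedup(spec_throughput: float, baseline_throughput: float) -> float:
    """Ratio of speculative throughput to target-only throughput (assumed)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline_throughput must be positive")
    return spec_throughput / baseline_throughput


if __name__ == "__main__":
    # Made-up numbers for illustration only; these are not benchmark results.
    spec = throughput(num_generated_tokens=400, wall_time_s=2.0)
    base = throughput(num_generated_tokens=400, wall_time_s=4.0)
    print(spec, mean_accepted_tokens([3, 2, 4, 3]), speedup(spec, base))
```

Under these assumed definitions, a higher MAT generally means fewer target-model forward passes per generated token, which is where the speedup would come from.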


Benchmark Settings

Shared evaluation configuration across all datasets.

Hardware: NVIDIA H200
Draft TP: 1
Target TP: 2
Num questions / dataset: 128
Temperature: 0
Max tokens: 200
Dtype: bfloat16
Random seed: 0

All Qwen2.5 and Llama3 series entries use their respective Instruct checkpoints.
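For anyone reproducing the setup, the shared settings can be captured in a single immutable object. This is a minimal sketch, assuming a Python harness; the class and field names are hypothetical and not part of any benchmark tooling, while the values are the ones listed above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkSettings:
    """Shared evaluation configuration (field names are illustrative)."""

    hardware: str = "NVIDIA H200"
    draft_tp: int = 1                     # "Draft TP" setting
    target_tp: int = 2                    # "Target TP" setting
    num_questions_per_dataset: int = 128
    temperature: float = 0.0
    max_tokens: int = 200
    dtype: str = "bfloat16"
    random_seed: int = 0


SETTINGS = BenchmarkSettings()

if __name__ == "__main__":
    # Dump the configuration so it can be logged next to the results.
    for key, value in asdict(SETTINGS).items():
        print(f"{key}: {value}")
```

Freezing the dataclass keeps the configuration from being changed mid-run, which helps ensure every dataset is evaluated under identical settings.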

Target model | Draft model | Throughput | MAT | Speedup