Following the testing methodology described in the previous blog post, here's an example result:
CPU: R7 7840U
GPU: RTX 3080 @ PCIe Gen4x4 (this is a HUGE limitation, see below for why)
Model: Qwen3.6-35B-A3B-MXFP4_MOE.gguf, target ctx = 69632
| ub | ncm min | pp4k@d8k (t/s) | tg128@d8k (t/s) |
|---|---|---|---|
| 512 | 32 | 294.34 | 49.2 |
| 1024 | 33 | 498.29 | 48.41 |
| 1536 | 34 | 609.46 | 46.83 |
| 2048 | 35 | 809.02 | 46.52 |
| 2560 | 36 | 793.59 | 45.86 |
| 3072 | 37 | 785.74 | (inconclusive) |
| 3584 | 38 | (not tested) | (not tested) |
| 4096 | 39 | (not tested) | (not tested) |
As you can see, ub=2048 only loses about 5% of TG performance (49.2 to 46.52 t/s) compared to ub=512, in exchange for roughly 2.7x the PP performance (294.34 to 809.02 t/s).
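To make the trade-off easy to recheck, here is a minimal sketch that recomputes the PP speedup and TG loss for each row relative to ub=512. The numbers are copied from the table; the `results` dict and the `tradeoff` helper are names made up for this illustration, and the ub=3072 row is left out because its TG result was inconclusive.

```python
# Throughput in tokens/s, copied from the table: ub -> (pp4k@d8k, tg128@d8k).
# ub=3072 and above are omitted (TG inconclusive or not tested).
results = {
    512: (294.34, 49.20),
    1024: (498.29, 48.41),
    1536: (609.46, 46.83),
    2048: (809.02, 46.52),
    2560: (793.59, 45.86),
}


def tradeoff(ub, baseline=512):
    """Return (PP speedup factor, TG loss in percent) of `ub` vs `baseline`."""
    pp_base, tg_base = results[baseline]
    pp, tg = results[ub]
    return pp / pp_base, (1 - tg / tg_base) * 100


for ub in results:
    speedup, loss = tradeoff(ub)
    print(f"ub={ub:<5} PP x{speedup:.2f}   TG -{loss:.1f}%")
```

Comparing against the smallest ub is just one choice of baseline; pass a different `baseline` to `tradeoff` to compare neighbouring rows instead.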
Peak power draw is only 210 W and prompt processing reaches only about 800 t/s, slower than even 2x T4, although the card on its own should do far better. As you can see, PP stops improving past ub=2048, which means that setting already saturates the (pretty slow) PCIe link. The results should look a lot better on a proper link (4x the speed), since even a GTX 1660 Ti can already saturate this one.
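For a sense of scale, here is a rough sketch of the theoretical link bandwidth. It assumes that the "proper link" means PCIe Gen4 x16 (what the RTX 3080 supports), uses the standard per-lane transfer rates with 128b/130b encoding, and ignores protocol overhead, so real throughput will be lower. `pcie_gb_per_s` is a hypothetical helper name.

```python
# Theoretical one-direction PCIe bandwidth. This is an upper bound only:
# packet and protocol overhead are ignored.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # raw transfer rate per lane, by generation
ENCODING = 128 / 130                   # 128b/130b line coding (Gen3 and later)


def pcie_gb_per_s(gen, lanes):
    """Approximate usable bandwidth in GB/s for a PCIe link."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


x4 = pcie_gb_per_s(4, 4)    # the link used in this test
x16 = pcie_gb_per_s(4, 16)  # assumed "proper" link for this card
print(f"Gen4 x4 : {x4:.2f} GB/s")
print(f"Gen4 x16: {x16:.2f} GB/s ({x16 / x4:.0f}x)")
```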
Comments
i need a cpu upgrade so bad