
tuning test

Posted on May 15, 2026, 22:46

Following the testing methodology described in the previous blog post, here's an example result:

CPU: R7 7840U

GPU: RTX 3080 @ PCIe Gen4x4 (this is a HUGE limitation, see below for why)

Model: Qwen3.6-35B-A3B-MXFP4_MOE.gguf, target ctx = 69632

ub      ncm min   pp4k@d8k (t/s)   tg128@d8k (t/s)
512     32        294.34           49.2
1024    33        498.29           48.41
1536    34        609.46           46.83
2048    35        809.02           46.52
2560    36        793.59           45.86
3072    37        785.74           (inconclusive)
3584    38        (not tested)     (not tested)
4096    39        (not tested)     (not tested)

As you can see, compared with ub=512, ub=2048 loses only about 5% of TG performance (49.2 to 46.52 t/s) in exchange for about 2.7x the PP performance (294.34 to 809.02 t/s).
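The trade-off can be recomputed directly from the table. This is a minimal sketch: the numbers are copied from the measurements above, while the `tradeoff` helper and the choice of ub=512 as the baseline are my own assumptions for illustration.

```python
# Measurements copied from the table above: ub -> (pp4k@d8k, tg128@d8k), in tokens/s.
# The ub=3072 row is left out because its TG result was inconclusive.
results = {
    512: (294.34, 49.2),
    1024: (498.29, 48.41),
    1536: (609.46, 46.83),
    2048: (809.02, 46.52),
    2560: (793.59, 45.86),
}


def tradeoff(results, baseline_ub=512):
    """Return ub -> (PP speedup factor, TG loss in percent) relative to the baseline.

    Hypothetical helper, not part of any benchmarking tool.
    """
    base_pp, base_tg = results[baseline_ub]
    out = {}
    for ub, (pp, tg) in sorted(results.items()):
        pp_speedup = pp / base_pp
        tg_loss_pct = (1 - tg / base_tg) * 100
        out[ub] = (pp_speedup, tg_loss_pct)
    return out


for ub, (speedup, loss) in tradeoff(results).items():
    print(f"ub={ub:5d}  PP x{speedup:.2f}  TG loss {loss:.1f}%")
```

Printing every row rather than just ub=2048 makes it easy to see where PP stops scaling while TG keeps dropping.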

Peak power draw is only 210 W and prompt processing tops out at about 800 t/s, which is even slower than 2x T4, while the card on its own should do far better. As the table shows, ub=2048 already saturates the (pretty slow) PCIe bus: PP stops improving beyond that point. The result should look a lot better with a proper x16 link (4x the speed), since even a GTX 1660 Ti can already saturate this link.
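For a rough sense of the "4x the speed" figure, here is a sketch of the theoretical per-direction PCIe bandwidth. It assumes the standard per-lane transfer rates and 128b/130b line encoding. Real throughput is lower once packet overhead is included, and `pcie_gb_per_s` is a hypothetical helper name.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation (Gen3 through Gen5).
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen3 through Gen5 use 128b/130b line encoding: 128 payload bits per 130 bits on the wire.
ENCODING = 128 / 130


def pcie_gb_per_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before packet-level overhead."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


x4 = pcie_gb_per_s(4, 4)    # the link used in this test
x16 = pcie_gb_per_s(4, 16)  # a full-width Gen4 slot

print(f"Gen4 x4:  {x4:.2f} GB/s")
print(f"Gen4 x16: {x16:.2f} GB/s ({x16 / x4:.0f}x)")
```

Under these assumptions the x4 link tops out just under 8 GB/s per direction, against roughly 31.5 GB/s for x16.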


Comments




  • yoshi_fp36 commented on May 15, 2026, 15:53

    i need a cpu upgrade so bad