Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the “car wash test”
Source: r/LocalLLaMA (2 mentions)
Authors: _w0n (1 mention), liviuberechet (1 mention)
Extracted Claims
Running Qwen 3.5 (122B) with ~72GB of VRAM - Setup and results so far.
Supported by 1 story
Hi everyone, I've been closely following the latest releases and wanted to share my hardware configuration for running the new Qwen3.5 122B model.
**The Model**
* **Model:** `Qwen3.5-122B-A10B-UD-Q4_K_XL` (Unsloth)
* **Source:** [https://huggingface.co/unsloth/Qwen3.5-122B-A10B-GGUF](https://huggingface.co/unsloth/Qwen3.5-122B-A10B-GGUF)

**Hardware Setup**
* **GPU 1:** NVIDIA RTX A6000 (48GB VRAM)
* **GPU 2:** NVIDIA RTX 3090 Ti (24GB VRAM)
* **CPU:** AMD Ryzen Threadripper 3960X (24-core @ 3.80 GHz)
* **RAM:** 64 GiB DDR4

**Software Stack**
* **Backend:** llama.cpp
* **Version:** b8148 (compiled Feb 25th)
* **Environment:** Docker (`ghcr.io/ggml-org/llama.cpp:server-cuda`)

**llama.cpp Server Flags**

```
-m /models/Qwen3.5-122B-UD-Q4_K_XL-00001-of-00003.gguf \
-ngl 999 \
--alias "Qwen3.5-122B" \
--split-mode layer \
--tensor-split 2,1 \
--seed 3407 \
--jinja \
--reasoning-format deepseek \
--temp...
```
Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the “car wash test”.
I am absolutely loving Qwen3.5 122B!
It’s the best model I can run on my 72GB VRAM setup, fully loaded on GPU including context.
Very good speed at 25 tok/s.
If you are experiencing endless “but wait” loops, this is what worked for me:
* Thinking mode on
* Temperature 0.6
* Top-K sampling 20
* Top-P sampling 0.8
* Min-P sampling 0
* Repeat penalty 1.3

Running it in Q3_K, it’s a bit slower than GLM Air (30 t/s in IQ4_NL) and GPT-OSS-120B (30-38 t/s in MXFP4), but because it has a smaller footprint in Q3 I am able to push the context to 120k, which is great!
Related Events
Qwen3.5-35B-A3B Q4 Quantization Comparison
Uncategorized • 2/26/2026
Sora 2 System Card
Computer Vision • 2/26/2026
Gemini 2.5 Pro Preview: even better coding performance
LLMs • 2/26/2026
OpenAI o3 and o4-mini System Card
LLMs • 2/27/2026
Gemini 2.5: Updates to our family of thinking models
LLMs • 2/26/2026