Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the “car wash test”
Source: r/LocalLLaMA (2 mentions)
Authors: _w0n (1 mention), liviuberechet (1 mention)
Extracted Claims
Running Qwen 3.5 (122B) with ~72GB of VRAM - Setup and results so far.
Supported by 1 story
Hi everyone, I've been closely following the latest releases and wanted to share my hardware configuration for running the new Qwen3.5 122B model.
**The Model**
* **Model:** `Qwen3.5-122B-A10B-UD-Q4_K_XL` (Unsloth)
* **Source:** [https://huggingface.co/unsloth/Qwen3.5-122B-A10B-GGUF](https://huggingface.co/unsloth/Qwen3.5-122B-A10B-GGUF)

**Hardware Setup**
* **GPU 1:** NVIDIA RTX A6000 (48GB VRAM)
* **GPU 2:** NVIDIA RTX 3090 Ti (24GB VRAM)
* **CPU:** AMD Ryzen Threadripper 3960X (24-core @ 3.80 GHz)
* **RAM:** 64 GiB DDR4

**Software Stack**
* **Backend:** llama.cpp
* **Version:** b8148 (compiled Feb 25th)
* **Environment:** Docker (`ghcr.io/ggml-org/llama.cpp:server-cuda`)

**llama.cpp Server Flags**

```
-m /models/Qwen3.5-122B-UD-Q4_K_XL-00001-of-00003.gguf \
-ngl 999 \
--alias "Qwen3.5-122B" \
--split-mode layer \
--tensor-split 2,1 \
--seed 3407 \
--jinja \
--reasoning-format deepseek \
--temp...
```
Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the “car wash test”.
I am absolutely loving Qwen3.5 122B!
It’s the best model I can run on my 72GB VRAM setup, fully loaded on GPU including context.
Very good speed at 25 tok/s.
If you are experiencing endless “but wait” loops, this is what worked for me:
* Thinking mode on
* Temperature 0.6
* Top-K sampling 20
* Top-P sampling 0.8
* Min-P sampling 0
* Repeat penalty 1.3

Running it in Q3_K, it’s a bit slower than GLM Air (30 t/s in IQ4_NL) and GPT-OSS-120B (30-38 t/s in MXFP4), but because it has a smaller footprint in Q3 I am able to push the context to 120k, which is great!
Related Events
Qwen3.5-35B-A3B Q4 Quantization Comparison
Uncategorized • 2/26/2026
Sora 2 System Card
Computer Vision • 2/26/2026
Gemini 2.5 Pro Preview: even better coding performance
LLMs • 2/26/2026
OpenAI o3 and o4-mini System Card
LLMs • 2/27/2026
Gemini 2.5: Updates to our family of thinking models
LLMs • 2/26/2026