AMD Explains How to Run DeepSeek R1 Distilled Reasoning Models on AMD Ryzen AI and Radeon GPUs



DeepSeek R1 Distilled Reasoning models use chain-of-thought reasoning to analyze complex prompts in detail. Instead of producing immediate replies, they spend time generating a “thinking” sequence, which often involves processing hundreds or thousands of tokens internally. This approach helps the model to evaluate various perspectives before generating a final response. Although this increases the wait time, it typically delivers more thorough results, which can be valuable for tasks in scientific research, mathematics, and other technical fields.
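
The R1 distills typically mark this internal reasoning with <think>…</think> tags in their output, followed by the final answer. As a rough illustration of what that looks like in practice (a minimal sketch, not part of AMD's guide; the tag format is an assumption based on how the R1 distills usually behave), the reasoning trace can be separated from the answer like this:

```python
# Minimal sketch: split a DeepSeek R1 style response into its reasoning
# trace and final answer. Assumes the model wraps its chain of thought
# in <think>...</think> tags, as the R1 distills typically do.
import re

def split_reasoning(response: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if not match:
        return "", response.strip()          # no visible reasoning block
    thinking = match.group(1).strip()        # the internal "thinking" tokens
    answer = response[match.end():].strip()  # everything after the trace
    return thinking, answer

example = "<think>Compare 9.11 and 9.9 digit by digit...</think>9.9 is larger."
thinking, answer = split_reasoning(example)
print("Reasoning:", thinking)
print("Answer:", answer)
```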

AMD supports different sizes of DeepSeek R1 distillations across its processor and graphics card lineup. Higher-end processors, such as the Ryzen AI Max+ 395 series, can run larger distills like Qwen-32B, while mid-range parts like the Ryzen AI HX 370 or the Ryzen 7040/8040 series handle Qwen-14B or Llama-14B. Among graphics cards, the Radeon RX 7900 XTX can accommodate Qwen-32B, but lower-tier cards generally work best with smaller versions. AMD recommends running these models in Q4_K_M quantization to reduce memory usage and make the most of the available GPU resources.
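
To see why the quantization recommendation matters, here is a rough back-of-the-envelope estimate of weight memory at FP16 versus roughly 4-5 bits per weight for Q4_K_M (the bits-per-weight figure below is an approximation for illustration, not an AMD specification, and it ignores context/KV-cache overhead):

```python
# Rough estimate of model weight size at different quantization levels.
# Assumes ~16 bits/weight for FP16 and ~4.8 bits/weight for Q4_K_M
# (approximate figures used only for illustration).
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (8, 14, 32, 70):
    fp16 = weight_size_gb(params, 16)
    q4km = weight_size_gb(params, 4.8)
    print(f"{params:>3}B model: ~{fp16:.0f} GB at FP16 vs ~{q4km:.0f} GB at Q4_K_M")
```

This is why a 32B distill is realistic on a 24 GB card like the RX 7900 XTX in Q4_K_M, while it would not fit at full FP16 precision.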

To deploy a DeepSeek R1 distill, install the Adrenalin 25.1.1 driver or newer and download LM Studio 0.3.8 or above. Use the “Discover” tab in LM Studio to select your preferred model, confirm Q4_K_M quantization, and adjust GPU offload layers to suit your system’s capacity. Once everything is configured, load the model in the “Chat” tab to start interacting with its chain-of-thought process. This local deployment approach can enhance data security and reduce latency, since all reasoning is performed directly on AMD hardware. For reliable performance, consult the official documentation to confirm your system meets the driver and memory requirements.

Step 1: Make sure you are on the Adrenalin 25.1.1 Optional driver or higher.

Step 2: Download LM Studio 0.3.8 or above from lmstudio.ai/ryzenai

Step 3: Install LM Studio and skip the onboarding screen.

Step 4: Click on the “Discover” tab.

Step 5: Choose your DeepSeek R1 Distill. Smaller distills like the Qwen 1.5B offer blazing fast performance (and are the recommended starting point) while bigger distills will offer superior reasoning capability. All of them are extremely capable. The table below details the maximum recommended DeepSeek R1 Distill size:

 

Processor | DeepSeek R1 Distill* (Max Supported)
AMD Ryzen™ AI Max+ 395 (32 GB¹, 64 GB and 128 GB) | DeepSeek-R1-Distill-Llama-70B (64 GB and 128 GB only), DeepSeek-R1-Distill-Qwen-32B
AMD Ryzen™ AI HX 370 and 365 (24 GB and 32 GB) | DeepSeek-R1-Distill-Qwen-14B
AMD Ryzen™ 8040 and Ryzen™ 7040 (32 GB) | DeepSeek-R1-Distill-Llama-14B

* = AMD recommends running all distills in Q4_K_M quantization.
¹ = Requires Variable Graphics Memory set to Custom: 24GB.
² = Requires Variable Graphics Memory set to High.

 

Graphics Card | DeepSeek R1 Distill* (Max Supported¹)
AMD Radeon™ RX 7900 XTX | DeepSeek-R1-Distill-Qwen-32B
AMD Radeon™ RX 7900 XT | DeepSeek-R1-Distill-Qwen-14B
AMD Radeon™ RX 7900 GRE | DeepSeek-R1-Distill-Qwen-14B
AMD Radeon™ RX 7800 XT | DeepSeek-R1-Distill-Qwen-14B
AMD Radeon™ RX 7700 XT | DeepSeek-R1-Distill-Qwen-14B
AMD Radeon™ RX 7600 XT | DeepSeek-R1-Distill-Qwen-14B
AMD Radeon™ RX 7600 | DeepSeek-R1-Distill-Llama-8B

* = AMD recommends running all distills in Q4_K_M quantization.
¹ = Lists the maximum supported distill without partial GPU offload.

Step 6: On the right-hand side, make sure the “Q4_K_M” quantization is selected and click “Download”.

Step 7: Once downloaded, head back to the “Chat” tab, select the DeepSeek R1 distill from the drop-down menu, and make sure “manually select parameters” is checked.

Step 8: In the GPU offload layers setting, move the slider all the way to the maximum.

Step 9: Click to load the model.

Step 10: Interact with a reasoning model running completely on your local AMD hardware.
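
The steps above use LM Studio's built-in chat window. LM Studio can also expose the loaded model through a local OpenAI-compatible server (started from within LM Studio, by default on port 1234). The snippet below is a minimal sketch of querying that server from Python; the server address and the model identifier are assumptions and should be adjusted to match what LM Studio actually shows on your system.

```python
# Minimal sketch: query a locally loaded DeepSeek R1 distill through
# LM Studio's OpenAI-compatible local server. Assumes the server has been
# started in LM Studio (default address http://localhost:1234/v1) and that
# the model identifier below matches the one LM Studio displays.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-14b",  # hypothetical identifier; use the name shown in LM Studio
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9? Explain briefly."}],
    temperature=0.6,
)

print(response.choices[0].message.content)
```

The printed response will typically contain the model's <think> reasoning trace followed by its final answer, all generated locally on the AMD hardware.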

You can read all about it in the blog here.


