How To Run Deepseek R1 671b Fully Locally On a $2000 EPYC Server
Deepseek Ai Rig Build for Local Inference
Let’s start with the good news. I got very solid performance off the same baseline AMD EPYC Rome system that has been at the core of our entire journey 😁 That initial parts selection has remained fantastic! Owners of that system are getting some great news today too, as they can hit between 3.5 and 4.25 TPS (tokens per second) on the Q4 671b full model. This is important, as the distilled versions are simply not the same at all. They are vastly inferior, and other models outperform them handily. Running the full model with a 16K or greater context window is the pathway to the real experience, and it is worthwhile. Since this runs decently on CPU alone, it can also hang out while you run smaller models, like vision models, at the same time. To reiterate, this will not run fully in GPU VRAM alone unless you have a massive rig. I will show you all the tips and tricks to get this working. It is not “easy,” but if you like tech tinkering it is a lot of fun.
Local AI CPU Compute Hardware
If you followed the original build guide for quad 3090s I put out, you are in luck. That 7702 still packs a punch. I will recommend a better CPU, as it is now in the same price bracket and performance should be a good deal better, but the results you see here are from the 7702 in my machine. The MZ32-AR0 was also a very good board recommendation to start with, as it dramatically lowers the price of hitting 512GB to 1TB of system RAM with 16 DIMM slots that can run at the full 3200 speed. The RAM I am running is actually DDR4-2400, but you would likely get an additional performance improvement by going with DDR4-3200 ECC DIMMs. 16x 32GB DIMMs gets you to 512GB. 16x 64GB DIMMs gets you to 1TB of RAM. You cannot mix LRDIMMs and RDIMMs!
Local Ai Rig Components
(Prices as of 1/29/2025)
Total Cost: Around $2000 if you use 512GB of 2400 RAM and the EPYC 7702. I would get the 7C13 or 7V13 before upgrading RAM speed. Going to 768GB of RAM would be my second upgrade choice, and moving to 3200 RAM would be the last option I would do. If you price it at the top with the 7C13 or 7V13 and 1TB of DDR4-2400, it’s more like $2500.
Rig Rack Assembly
Assembly is the same as in the prior video, just minus the GPUs and risers. If you are going to add GPUs later, I would recommend getting a 1500w or 1600w PSU up front. Everything else stays the same if you add GPUs and risers later. You can watch that video from here; just ignore the GPU parts, the rest is exactly the same.
Additionally, you want to make a little fan wall, zip tied together, that blows directly over the RAM sticks to keep them cooler. They will not melt, but they will thermally throttle and impact performance negatively as you churn data through them nonstop. I used 4x little 80mm fans.
Motherboard Upgrades Notes
If you are going with the AMD EPYC 7V13, you are better off buying a V3 version of the MZ32-AR0 motherboard rather than getting a V1 and upgrading it. The V1 may not support a Milan CPU out of the box until it is at a V3 BIOS, so you would possibly need a V2-era CPU to perform the update. I cannot confirm that, but I suspect it is likely. In my experience you can upgrade a V1 to a V3 by using the BIOS updates to jump a V1 board all the way up to its latest version, then grabbing an early V3 BIOS update and running that. Then you can update to later V3 BIOS versions from that page. Current as of the time of writing is the M23_R40 BIOS revision.
Local AI Self Hosted Software Setup
This is where things get a bit tricky compared to the prior guides I have done. Yes, you can deploy Ollama on a bare metal Proxmox installation. Should you? Not ideally, no. You have two options at this point and I will show you one of them now. I need to test the performance impacts before I recommend the other, but running Ollama inside a standalone LXC or VM is the other option. If you have followed my prior LXC and Docker guide, you can follow along with this, but installing in a VM is my advice for now. I will be working on a more unified approach to get this all working in our happy little AI server self-contained environment, but that will take time.
Install Our Ubuntu 24 on Bare Metal or Proxmox VM?
Basically, you should install this on a bare metal Ubuntu 24.04 server base if you want to eliminate extra layers and are setting this up new and fresh, or follow the prior Proxmox guide. You have to make this call yourself and live with the results. You can install a desktop if you want, but it’s not needed, nor will I demonstrate that. You are running services on top of a server; the CLI is not to be feared at all.
Setting up your BMC MZ32-AR0
Connect your MZ32-AR0 ethernet and BMC ports to your local network. If you have a firewall router, like OPNsense or pfSense, you can check your ARP table for the port to show up. Grab that IP address. In my instance it was https://192.168.1.XX and when I log in it asks for a username and password right off the bat. The default username is admin. The password should be on a sticker on YOUR motherboard under the MZ32-AR0 stamp. Here is mine pictured. It is the barcode-bearing label. I forget exactly, but it’s something like removing the first 3/C/ part, and the next 6 or possibly 11 characters are the initial password. When you finally log in, go to
Home > Settings > Network > Network IP Settings
and set a static IP for your board. Also set a local DNS server if you use one, and an NTP server. You will log into this interface often, so bookmark it or something.
Next go to the remote control option in the sidebar. It will land you on a page that has an HTML5 viewer as an option. I would recommend doing this from a wired connection, since we will be sending a large ISO over the network to install Ubuntu 24.04 in a bit. Grab the server ISO of Ubuntu 24.04 from their download page. It’s around 2.5GB. Connect it to your HTML5 viewer in the upper right side.
Once you click start it will spool up a bit. You should turn on the server now if you have not yet. Once it is up and running, the little KB counter will start counting up. Click into the “screen” section and wait for the Gigabyte logo to show up. Hit DEL and go into the BIOS. Set everything to defaults, then save and close. When it reboots, enter the BIOS again. This time we are going to change some settings. First, specify your boot drive. You can leave this as UEFI if you want; it doesn’t really matter, but Legacy mode is potentially fewer headaches if you have a drive issue.
Here are the settings you are going to hunt down and change.
- NPS to 1
- CCD to Auto
- SMT off
- SVM off (or on if you are running in Proxmox/virtualizing, minor performance hit)
- IOMMU off (or on if you are running in Proxmox/virtualizing, minor performance hit)
- cTDP adjusted to 200 (for the 7702)
- Determinism Control to Manual, then the Determinism Slider to Performance
- Power policy quick setting to Performance
- BoostFmax to Manual
- BoostFmax value 3400 (for the 7702)
Finally, after you have made these changes, save and reboot once again from the BIOS. This time hit F12/11/10 (I forget which; it shows at the bottom of the boot logo page) to enter the BIOS boot selection. Select the AMI virtual CD option. If you are on wired, it should get you to the Ubuntu boot screen fast. Go ahead and install. Set a username and password you will not forget. Ensure you check the box that says “setup ssh server” so you can remote in once it is installed. It will install. It will reboot, and you have to hit enter to proceed at the end. When it finishes rebooting, it should be back at a terminal asking for your user: enter your username and password, then type
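Assuming the standard iproute2 tooling that ships with Ubuntu 24.04, a command along these lines will print your interface addresses:

ip a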
and remember your IP address. You can now go back to your terminal software in Windows/OSX/Linux on your desktop and close the HTML5 viewer. In a window in your terminal, replacing with your info, type:
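As a sketch with placeholder values, swap in your own username and the address you just noted:

ssh youruser@192.168.1.XX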
Install additional software packages
Here is a blob of commands to run. I am omitting the GPU stuff, but if you have NVIDIA GPUs you can go install those drivers at the end of this.
sudo apt update && sudo apt upgrade -y && sudo apt install -y htop git glances nano lsof unzip
Set a static IP address
From the CLI type:
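The listing command here is my assumption; it simply confirms which netplan file your install created:

ls /etc/netplan/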
Then you will have a default config you can edit by using
sudo nano /etc/netplan/50-cloud-init.yaml
It will look like this originally. We will be editing eno1; that is the physical RJ-45 network plug on the motherboard. Ignore my enp65s0np0, which is an add-in network card.
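As a rough sketch (your interface names will differ), a stock Ubuntu 24.04 server cloud-init netplan is just DHCP on the onboard NIC:

network:
  version: 2
  ethernets:
    eno1:
      dhcp4: true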
Use the arrow keys and type it up to look something like this. Just use whatever IP address you are already on, to make it easy for now and to not complicate things. I’m using 200 as my static IP in this instance, and my router is on 192.168.1.1, which is common.
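Here is a minimal static sketch using the .200 address and the 192.168.1.1 router from this example; adjust the interface name, address, gateway, and DNS to match your own network:

network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
      addresses:
        - 192.168.1.200/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses: [192.168.1.1]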
Now we have to save this netplan: type CTRL-X and then Y.
Now you will exit back to the terminal. Type
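The usual way to apply the new netplan is:

sudo netplan apply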
and you now have your network set to static. You can reboot and ssh in again to make sure now.
Install Ollama
Next we will install Ollama.
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)
This also creates an ollama user whose home directory is /usr/share/ollama, and extracts the Ollama install under /usr. FYI, by default models will live in /usr/share/ollama/.ollama/models/
Setup Environment Variables and Service
Now we need to set up some environment variables that will be applied when Ollama starts up. This is critical to getting the parallel issue resolved.
sudo nano /etc/systemd/system/ollama.service
and in it we will be adding additional lines with environment variables. Here is a list of all the variables. We will NOT be using them all; this is just a quick reference to what they are.
Environment Variables:
- OLLAMA_DEBUG Show additional debug information (e.g. OLLAMA_DEBUG=1)
- OLLAMA_HOST IP address for the Ollama server (default 127.0.0.1:11434)
- OLLAMA_KEEP_ALIVE The duration that models stay loaded in memory (default “5m”)
- OLLAMA_MAX_LOADED_MODELS Maximum number of loaded models per GPU
- OLLAMA_MAX_QUEUE Maximum number of queued requests
- OLLAMA_MODELS The path to the models directory
- OLLAMA_NUM_PARALLEL Maximum number of parallel requests
- OLLAMA_NOPRUNE Do not prune model blobs on startup
- OLLAMA_ORIGINS A comma separated list of allowed origins
- OLLAMA_SCHED_SPREAD Always schedule model across all GPUs
- OLLAMA_FLASH_ATTENTION Enable flash attention
- OLLAMA_KV_CACHE_TYPE Quantization type for the K/V cache (default: f16)
- OLLAMA_LLM_LIBRARY Set LLM library to bypass autodetection
- OLLAMA_GPU_OVERHEAD Reserve a portion of VRAM per GPU (bytes)
- OLLAMA_LOAD_TIMEOUT How long to allow model loads to stall before giving up (default “5m”)
Here is what mine looks like after setting this up. You DO NOT need (nor want) to enter the GPU variables unless you have GPUs.
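As a sketch, this is the stock service file from the Ollama Linux install docs with Environment lines added under [Service]; the specific values here (host binding, keep alive, parallel count, flash attention) are my assumptions for a CPU-only box, so tune them to your setup:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"
# Bind on all interfaces so the OpenWEBUI container can reach Ollama over the LAN IP
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Example values only: keep the model loaded between prompts and avoid splitting the context across parallel slots
Environment="OLLAMA_KEEP_ALIVE=3h"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_FLASH_ATTENTION=1"

[Install]
WantedBy=multi-user.target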
CTRL-X and Y to save. You then need to type:
sudo systemctl daemon-reload
sudo systemctl start ollama
nproc
and you are now looking good variable-wise. nproc should have output the number 64. If it output 128, you need to disable SMT. If it output 32 or 96, you need to check your NPS and CCD settings. If it is 64, LFG.
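If you also want Ollama to come back up on its own after a reboot, enabling the unit is the usual extra systemd step:

sudo systemctl enable ollama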
Pull down Deepseek 671b model
Now let’s pull the Ollama GGUF of Deepseek 671b down. This will eat up around 400GB of disk space. I hope your NVMe is decent.
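For a quick sanity check on free space before kicking off the pull (assuming models land in the default /usr/share/ollama path):

df -h /usr/share/ollama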
ollama pull deepseek-r1:671b
That will take a while. Take a moment to think of that egress bill…
Install OpenWEBUI
We need to run this in either Docker or in Python. I will deploy this for you in Docker here. First, ensure we don’t have garbage installed that will conflict.
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
Install Docker Repo
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
Finally, Install Docker Itself
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
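Optionally, if you would rather not prefix every docker command with sudo (the compose commands below are written without it), the usual post-install step is adding your user to the docker group and then logging out and back in:

sudo usermod -aG docker $(whoami)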
Install Dockge Docker Container Manager
In this instance your data directories will be /opt/stacks for your Docker stacks and /opt/dockge for Dockge, an easy-to-use management interface.
sudo mkdir -p /opt/stacks /opt/dockge
cd /opt/dockge
sudo curl https://raw.githubusercontent.com/louislam/dockge/master/compose.yaml --output compose.yaml
docker compose up -d
Now you can go to your Docker manager to complete the rest of the steps. If you do not know the IP address of the host (or of the VM/LXC if you went that route), check your network settings. Then append :5001
http://192.168.1.200:5001 for instance is my address. You will need to set a username and password on the first visit. Please write this down. Now you are ready to start creating your Docker container for OpenWEBUI.
Paste this in as your compose for your OpenWEBUI
version: "3.3"
services:
  open-webui:
    ports:
      - 7000:8080
    volumes:
      - open-webui:/app/backend/data
    container_name: open-webui
    restart: always
    image: ghcr.io/open-webui/open-webui:latest
volumes:
  open-webui:
networks:
  dockge_default:
    external: true
Hit save and run. It will pull the image down the first time, and in the future you can click update to update it easily. After this shows as running, browse to IP:7000 for your machine. Mine is http://192.168.1.200:7000 in this instance. It will have you set up credentials. Again, make good note of these. We are ALMOST done now! Whew!
Connect OpenWEBUI to Ollama
Use the + in the /admin/settings view for connections to add the local server. In this instance it is 192.168.1.200:11434, but use your values. It should give you a green pop-in saying “connection success”.
You can click the manage icon and it will look like this in the delete dropdown once you have finished downloading the LLM model for Deepseek.
Congrats. Home stretch! But don’t leave the settings view yet!
Set Advanced Parameters
Click this pen icon.
Now you can edit advanced params and DO NOT FORGET TO SAVE!
Change the GPU to 0 if you have none attached.
Reasoning Effort low (medium and high are also options; medium is the default)
Context Length 16384 (16K fits, but higher needs either more RAM or GPUs)
num_thread 62 (I leave a few free)
use_mlock: you might want to enable this to prevent RAM paging to disk
and the rest you can play with or pull from the model card. You CAN’T use the full ~160K context size unless you have like 2 TB RAM…and it would be slower, like, a lot slower.
DON’T FORGET TO HIT SAVE!
Set User Settings
You can update your user preferences however you want, but set the keep alive to something like 3h. Hit save.
It is a bit weird, but this is user settings; the prior section was admin settings.
Run a Test
Holy cow you got here! Nice job, I am impressed! Click new chat in the upper left of the window. Deepseek-r1:671b should be there already. Give it a hello. Nice job!
Benchmarking Deepseek R1 671b
In conclusion, we installed a fully functional bare metal Ollama + OpenWEBUI setup. I am SURE there are a lot of other great runners out there like llama.cpp, exo, and vLLM, but those will be separate guides when I get a decent handle on working with them. Llama.cpp is likely first, as I have done a compile, bench, and run recently and it all worked out very nicely! Lots of knobs on that one. vLLM is a mess of a thing, and exo is simple but keeps crashing after I start it. No time to debug those things yet, but as they say… SOON!