GitHub – deepseek-ai/DualPipe


DualPipe is a bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It fully overlaps the computation and communication phases of the forward and backward passes, and it also reduces pipeline bubbles. For details on computation-communication overlap, refer to the profile data.

[Figure: schedules]

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions.
The micro-batches in the reverse direction are symmetric to those in the forward direction, so
we omit their batch IDs for simplicity of illustration. Two cells enclosed by a shared black border
have mutually overlapped computation and communication.

Pipeline Bubbles and Memory Usage Comparison

| Method   | Bubble              | Parameter | Activation |
|----------|---------------------|-----------|------------|
| 1F1B     | (PP-1)(𝐹+𝐵)         | 1×        | PP         |
| ZB1P     | (PP-1)(𝐹+𝐵-2𝑊)      | 1×        | PP         |
| DualPipe | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊)  | 2×        | PP+1       |

𝐹 denotes the execution time of a forward chunk, 𝐡 denotes the execution time of a
full backward chunk, π‘Š denotes the execution time of a “backward for weights” chunk, and 𝐹&𝐡
denotes the execution time of two mutually overlapped forward and backward chunks.
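
To make the comparison concrete, the bubble formulas from the table can be evaluated for a given number of PP ranks and a set of chunk times. The following is a minimal sketch, not part of the DualPipe package: the `ChunkTimes` class, the `bubble_*` function names, and the sample timings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkTimes:
    """Per-chunk execution times in any consistent unit (hypothetical helper)."""
    f: float   # F: forward chunk
    b: float   # B: full backward chunk
    w: float   # W: "backward for weights" chunk
    fb: float  # F&B: two mutually overlapped forward and backward chunks


def bubble_1f1b(pp: int, t: ChunkTimes) -> float:
    # 1F1B bubble: (PP-1)(F+B)
    return (pp - 1) * (t.f + t.b)


def bubble_zb1p(pp: int, t: ChunkTimes) -> float:
    # ZB1P bubble: (PP-1)(F+B-2W)
    return (pp - 1) * (t.f + t.b - 2 * t.w)


def bubble_dualpipe(pp: int, t: ChunkTimes) -> float:
    # DualPipe bubble: (PP/2-1)(F&B+B-3W)
    return (pp / 2 - 1) * (t.fb + t.b - 3 * t.w)


# Illustrative numbers only, not measurements. Here F&B is set to F + B,
# i.e. the overlapped pair is assumed to take as long as running both chunks.
times = ChunkTimes(f=1.0, b=2.0, w=1.0, fb=3.0)
pp = 8

for name, fn in (("1F1B", bubble_1f1b),
                 ("ZB1P", bubble_zb1p),
                 ("DualPipe", bubble_dualpipe)):
    print(f"{name:8s} bubble = {fn(pp, times):.1f}")
```

With these made-up timings and 8 PP ranks, the formulas give bubbles of 21, 7, and 6 time units for 1F1B, ZB1P, and DualPipe respectively. The actual ratio depends on the measured values of 𝐹, 𝐵, 𝑊, and 𝐹&𝐵 for a given model, and on the memory trade-off shown in the Parameter and Activation columns.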

The usage is shown in the following example:

Note: For real-world applications, you will need to implement a custom overlapped_forward_backward method tailored to your specific module.

DualPipe was created and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang.

@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report}, 
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437}, 
}


