ScaleGS: Scalable Distributed Framework for Large-Scale 3D Gaussian Splatting with Edge Communication

Yong Kou¹, Jinlong He¹, Xia Yuan¹,², Dening Luo², Yanci Zhang¹,*

¹Sichuan University, ²Chengdu University of Information Technology


The code will be released soon.

ScaleGS (Ours) renders the large-scale MatrixCity dataset (5621 images, 1920×1080 resolution) with high fidelity after distributed training on 8 Tesla P40 GPUs with a batch size of 4.


On the large-scale MatrixCity scene and the high-resolution Rubble scene, ScaleGS (Ours) outperforms the state-of-the-art method. We theoretically prove that our framework achieves \(O(1)\) cross-GPU communication complexity for nearly all GPUs in typical scenes.

Abstract

3D Gaussian Splatting (3DGS) has recently demonstrated outstanding performance in 3D reconstruction. However, its scalability to large scenes remains limited by single-GPU memory constraints. We propose ScaleGS, a scalable distributed training framework for large-scale 3DGS with constant-degree cross-GPU communication. (1) We first present a median-guided binary partitioning algorithm and pixel-tile parallelism to reduce memory pressure on a single GPU. To address the boundary artifacts caused by partitioning, we introduce an autonomous partition growth mechanism that maintains global Gaussian uniqueness and cross-GPU parameter synchronization. (2) To resolve the scalability challenges, we design a greedy GPU-Tile remapping strategy based on pixel-tile parallelism to achieve \(O(1)\) cross-GPU communication complexity for nearly all GPUs in representative scenes. (3) Our framework finally introduces adaptive load balancing that periodically monitors workloads and efficiently migrates Gaussians between neighboring GPUs with negligible overhead. Evaluations show that ScaleGS outperforms state-of-the-art methods, achieving up to 20% faster training and approximately 20% model size reduction on 8 Tesla P40 GPUs without compromising reconstruction quality.
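To make the partitioning step in (1) concrete, here is a minimal sketch of a median-guided binary split over a point cloud. It is an illustration under stated assumptions, not the released ScaleGS code: the function name `median_binary_partition`, the choice to split along the axis of largest extent, and the use of initial point positions as input are all hypothetical, since this page does not specify them.

```python
import numpy as np


def median_binary_partition(points, depth):
    """Split an (N, 3) point array into 2**depth groups of point indices.

    Assumption (not stated on this page): every split cuts the current group
    at the median coordinate along its axis of largest extent.
    """
    if len(points) < 2 ** depth:
        raise ValueError("need at least 2**depth points")

    def split(indices, level):
        if level == 0:
            return [indices]
        coords = points[indices]
        # Pick the axis along which this group is most spread out.
        axis = int(np.argmax(coords.max(axis=0) - coords.min(axis=0)))
        # Sort the group along that axis and cut it at the median position.
        order = indices[np.argsort(coords[:, axis], kind="stable")]
        half = len(order) // 2
        return split(order[:half], level - 1) + split(order[half:], level - 1)

    return split(np.arange(len(points)), depth)


# Synthetic example: 1000 random points split into 8 groups (one per GPU).
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(1000, 3))
parts = median_binary_partition(points, depth=3)
sizes = [len(p) for p in parts]
```

Because each cut is placed at the median, the groups differ in size by at most one point per split level, which is what keeps the initial per-GPU memory footprint roughly even.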

Performance Evaluation


On the high-resolution Rubble scene, our method achieves a 20% reduction in training time compared to state-of-the-art approaches while preserving high rendering fidelity. Because we are constrained by current hardware resources, we cannot empirically validate the full extent of this scalability. Nonetheless, our analysis shows that the training efficiency of the proposed framework is theoretically independent of the number of GPUs, so its advantages are expected to grow as more GPUs are added.
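The abstract attributes the constant communication cost to a greedy GPU-Tile remapping. This page does not describe that algorithm, so the following is only a hypothetical heuristic in the same spirit: each image tile is assigned to the GPU that already owns most of the Gaussians projecting onto it, subject to a per-GPU tile limit. The names `greedy_tile_remap`, `counts`, and `capacity` are assumptions introduced for illustration.

```python
import numpy as np


def greedy_tile_remap(counts, capacity):
    """Assign each tile to a GPU, preferring the GPU with the most local data.

    counts[g, t] is a hypothetical input: the number of Gaussians owned by
    GPU g that project onto image tile t. capacity is the maximum number of
    tiles a single GPU may render. Returns owner[t], the GPU chosen for tile t.
    """
    num_gpus, num_tiles = counts.shape
    if capacity * num_gpus < num_tiles:
        raise ValueError("capacity too small to place every tile")
    owner = np.full(num_tiles, -1, dtype=int)
    load = np.zeros(num_gpus, dtype=int)
    # Visit tiles with the strongest single-GPU affinity first.
    for t in np.argsort(-counts.max(axis=0), kind="stable"):
        # Try GPUs from most to fewest local Gaussians for this tile.
        for g in np.argsort(-counts[:, t], kind="stable"):
            if load[g] < capacity:
                owner[t] = g
                load[g] += 1
                break
    return owner


# Toy example: 2 GPUs, 4 tiles, at most 2 tiles per GPU.
counts = np.array([[9, 8, 1, 0],
                   [1, 2, 7, 6]])
owner = greedy_tile_remap(counts, capacity=2)
# Gaussians that must cross GPUs: everything not owned by the tile's renderer.
remote = int(counts.sum() - counts[owner, np.arange(counts.shape[1])].sum())
```

In this toy case each tile stays with the GPU that holds most of its Gaussians, so only a small fraction of the data has to move. Whether a real scene behaves this way depends on how well the spatial partitions align with the image tiles.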

Boundary-Artifact-Free


(a) The entire scene is divided into eight subregions, each processed by a different GPU. We render each GPU's trained model as a separate image, which shows that each subregion is contiguous and that no two subregions overlap. (b) Merging and rendering the eight models without any post-processing produces a complete scene with no boundary artifacts.
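If every Gaussian belongs to exactly one partition, merging the per-GPU models reduces to concatenating their parameter arrays. The sketch below illustrates that idea together with a uniqueness check. The dictionary layout (`ids`, `means`, `opacities`) and the function name `merge_partitions` are hypothetical and do not reflect the actual ScaleGS model format.

```python
import numpy as np


def merge_partitions(parts):
    """Concatenate per-GPU Gaussian parameters into a single model.

    Each element of parts is a dict of arrays sharing the same keys, with a
    global integer id per Gaussian under "ids" (hypothetical layout).
    """
    ids = np.concatenate([p["ids"] for p in parts])
    # Global uniqueness: no Gaussian may appear in two partitions.
    if np.unique(ids).size != ids.size:
        raise ValueError("a Gaussian is owned by more than one partition")
    return {key: np.concatenate([p[key] for p in parts], axis=0)
            for key in parts[0]}


# Toy example with two partitions holding 3 and 2 Gaussians.
part_a = {"ids": np.array([0, 1, 2]),
          "means": np.zeros((3, 3)),
          "opacities": np.full(3, 0.5)}
part_b = {"ids": np.array([3, 4]),
          "means": np.ones((2, 3)),
          "opacities": np.full(2, 0.8)}
merged = merge_partitions([part_a, part_b])
```

No blending or deduplication step is needed in this sketch, which mirrors the caption's point that the merged model is rendered without post-processing.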

More Experiments

1. ScaleGS rendering demonstration on the MatrixCity dataset.

2. ScaleGS rendering demonstration on the MatrixCity dataset.

3. ScaleGS rendering demonstration on the 4K Rubble dataset.

4. ScaleGS rendering demonstration on the 4K Rubble dataset.
