CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians

Weihang Liu1,4,*, Yuhui Zhong2,*, Yuke Li1,*, Xi Chen1, Jiadi Cui1,5, Honglong Zhang3, Lan Xu1, Xin Lou1,4, Yujiao Shi1, Jingyi Yu1, Yingliang Zhang2
1ShanghaiTech University, 2DGene, 3Migu Cultural Technology Co., Ltd, 4GGU Technology Co., Ltd, 5Stereye
*Indicates equal contribution
Teaser

We present CityGo, an explicit and efficient framework for high-fidelity rendering of large-scale urban scenes. By combining proxy buildings, residual Gaussians, and surrounding Gaussians, CityGo enables high-quality urban scene rendering on lightweight devices for applications such as in-vehicle navigation and aerial perception.

Abstract

Accurate and efficient modeling of large-scale urban scenes is critical for applications such as AR navigation, UAV-based inspection, and smart city digital twins. While aerial imagery offers broad coverage and complements the limitations of ground-based data, reconstructing city-scale environments from such views remains challenging due to occlusions, incomplete geometry, and high memory demands. Recent advances such as 3D Gaussian Splatting (3DGS) improve scalability and visual quality, but they remain limited by dense primitive usage, long training times, and poor suitability for edge devices. We propose CityGo, a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for lightweight, photorealistic rendering of urban scenes from aerial perspectives. Our approach first extracts compact building proxy meshes from MVS point clouds, then uses zero-order SH Gaussians to generate occlusion-free textures via image-based rendering and back-projection. To capture high-frequency details, we introduce residual Gaussians whose placement is driven by proxy-photo discrepancies and guided by depth priors. Broader urban context is represented by surrounding Gaussians, with importance-aware downsampling applied to non-critical regions to reduce redundancy. A tailored optimization strategy jointly refines proxy textures and Gaussian parameters, enabling real-time rendering of complex urban scenes on mobile GPUs with significantly reduced training time and memory requirements. Extensive experiments on real-world aerial datasets demonstrate that our hybrid representation significantly reduces training time, achieving an average $1.4\times$ speedup, while delivering visual fidelity comparable to pure 3D Gaussian Splatting approaches. Furthermore, CityGo enables real-time rendering of large-scale urban scenes on mobile consumer GPUs, with substantially reduced memory usage and energy consumption.
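The abstract states that residual Gaussians are placed according to proxy-photo discrepancies and guided by depth priors. The minimal Python sketch below illustrates one way such seeding could work; it is not the authors' implementation. The function name `seed_residual_gaussians`, the threshold `tau`, the grayscale absolute-error test, the pinhole `Intrinsics` model, and the use of integer pixel indices as image coordinates are all assumptions made for illustration. Seeds are returned in camera coordinates, with no camera-to-world transform applied.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (assumed camera model)."""
    fx: float
    fy: float
    cx: float
    cy: float


def seed_residual_gaussians(
    proxy_render: Sequence[Sequence[float]],
    photo: Sequence[Sequence[float]],
    depth: Sequence[Sequence[float]],
    K: Intrinsics,
    tau: float = 0.1,
) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: back-project pixels where the textured proxy
    fails to explain the photo into camera-space seed positions.

    proxy_render, photo: H x W grayscale images in [0, 1] (assumption).
    depth: H x W depth prior along the optical axis; values <= 0 are invalid.
    """
    seeds: List[Tuple[float, float, float]] = []
    for v, (row_r, row_p, row_d) in enumerate(zip(proxy_render, photo, depth)):
        for u, (r, p, z) in enumerate(zip(row_r, row_p, row_d)):
            # Skip pixels the proxy already reproduces, or with no depth prior.
            if abs(r - p) <= tau or z <= 0:
                continue
            # Pinhole back-projection of pixel (u, v) at depth z.
            x = (u - K.cx) * z / K.fx
            y = (v - K.cy) * z / K.fy
            seeds.append((x, y, z))
    return seeds


if __name__ == "__main__":
    proxy = [[0.5, 0.5], [0.5, 0.5]]
    photo = [[0.5, 0.9], [0.5, 0.5]]  # one pixel the proxy gets wrong
    depth = [[10.0, 10.0], [10.0, 10.0]]
    K = Intrinsics(fx=100.0, fy=100.0, cx=1.0, cy=1.0)
    print(seed_residual_gaussians(proxy, photo, depth, K))  # [(0.0, -0.1, 10.0)]
```

Only pixels that the proxy texture cannot reproduce and that have a valid depth prior produce seeds, so the number of residual primitives scales with the proxy's error rather than with the image area.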

Method

Pipeline

Overview of our hybrid representation for large-scale urban scenes. We begin by generating dense point clouds from aerial images and initializing zero-order spherical harmonic (SH) Gaussians to capture the entire scene. Buildings and surrounding areas are then segmented and processed separately. For buildings, we adopt a hybrid representation of textured proxy meshes and residual Gaussians, while simplified Gaussians are used for the surroundings. The final model enables photorealistic rendering with significant speedups, supporting both cinematic rendering and real-time performance on lightweight devices.
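The surroundings are represented by simplified Gaussians, obtained through importance-aware downsampling of non-critical regions. The sketch below shows one plausible variant and is not the paper's stated criterion. It assumes each Gaussian carries a hypothetical scalar `importance` score and a `critical` flag; critical Gaussians are kept untouched, while non-critical ones are bucketed into a voxel grid and only the most important Gaussian per voxel survives. The names `Gaussian`, `downsample_surroundings`, and `voxel_size` are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Gaussian:
    position: Tuple[float, float, float]
    importance: float  # hypothetical score, e.g. accumulated blending weight
    critical: bool     # True if the Gaussian lies in a region kept at full detail


def downsample_surroundings(gaussians: List[Gaussian], voxel_size: float) -> List[Gaussian]:
    """Keep all critical Gaussians; for non-critical ones, keep only the most
    important Gaussian in each voxel of edge length voxel_size."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    kept = [g for g in gaussians if g.critical]
    best: Dict[Tuple[int, ...], Gaussian] = {}
    for g in gaussians:
        if g.critical:
            continue
        # Integer voxel index of the Gaussian center.
        key = tuple(math.floor(c / voxel_size) for c in g.position)
        current = best.get(key)
        if current is None or g.importance > current.importance:
            best[key] = g
    return kept + list(best.values())


if __name__ == "__main__":
    gs = [
        Gaussian((0.1, 0.1, 0.1), 0.2, False),
        Gaussian((0.4, 0.3, 0.2), 0.9, False),   # same voxel, more important
        Gaussian((2.5, 0.0, 0.0), 0.1, False),   # different voxel
        Gaussian((0.2, 0.2, 0.2), 0.05, True),   # critical, always kept
    ]
    out = downsample_surroundings(gs, voxel_size=1.0)
    print(sorted(g.importance for g in out))  # [0.05, 0.1, 0.9]
```

Keeping at most one Gaussian per occupied voxel bounds the primitive count of the non-critical surroundings, so `voxel_size` acts as a single knob trading detail for memory.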

Gallery


Examples rendered with our CityGo models in the Area-H scene, comparing the proxy buildings with the final rendered results.


Examples rendered with our CityGo models in the Area-L scene, comparing the proxy buildings with the final rendered results.

BibTeX

@misc{liu2025citygolightweighturbanmodeling,
      title={{CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians}},
      author={Weihang Liu and Yuhui Zhong and Yuke Li and Xi Chen and Jiadi Cui and Honglong Zhang and Lan Xu and Xin Lou and Yujiao Shi and Jingyi Yu and Yingliang Zhang},
      year={2025},
      eprint={2505.21041},
      archivePrefix={arXiv},
      primaryClass={cs.GR},
      url={https://arxiv.org/abs/2505.21041},
}