NDSplat
ECCV 2026

Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering

United Imaging Intelligence, Boston, MA, USA
500× faster prep
2.8 s CT → render
328+ FPS real-time

Pipeline comparison: (a) 6DGS requires per-scan optimization (≈1 hour) and shows artifacts under sparse views; (b) our Render-FM produces high-quality renderings via a single 2.8 s feedforward pass.

Abstract

Photorealistic volumetric rendering of CT scans greatly benefits clinical workflows, yet neural approaches such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) require prohibitive per-scan optimization (hours for NeRF, about 30 minutes for 3DGS), making them impractical in clinical settings. We propose Render-FM, a feedforward model that eliminates this bottleneck by directly regressing 6D Gaussian Splatting (6DGS) parameters from a CT volume in a single 2.8-second forward pass, a 500× speedup over per-scan optimization.

To bridge the domain gap between natural scene reconstruction and medical volumetric rendering, we introduce Anatomy-Guided Priming (AGP), which incorporates segmentation masks and transfer functions as structural and appearance priors, information that existing Gaussian splatting methods overlook. Built on an nnU-Net-inspired 3D U-Net trained on diverse CT scans, Render-FM predicts per-voxel 6DGS parameters and supports immediate real-time rendering.
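For readers unfamiliar with transfer functions, the following is a minimal sketch of the general idea: a transfer function maps each CT intensity to a color and an opacity, here by piecewise-linear interpolation between control points. The control-point values and the name `apply_transfer_function` are illustrative assumptions, not taken from the paper, and the sketch does not show how AGP feeds this prior into the network.

```python
from bisect import bisect_right

# Hypothetical control points: (intensity in Hounsfield units, (r, g, b, opacity)).
# These values are illustrative only and are not taken from the paper.
CONTROL_POINTS = [
    (-1000.0, (0.0, 0.0, 0.0, 0.0)),  # air: fully transparent
    (0.0,     (0.8, 0.4, 0.3, 0.1)),  # soft tissue: faint red
    (1000.0,  (1.0, 1.0, 0.9, 0.9)),  # dense bone: near-opaque white
]


def apply_transfer_function(hu, points=CONTROL_POINTS):
    """Map one CT intensity to RGBA by piecewise-linear interpolation."""
    xs = [x for x, _ in points]
    # Clamp intensities outside the control-point range to the end colors.
    if hu <= xs[0]:
        return points[0][1]
    if hu >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, hu)  # index of the first control point above hu
    (x0, c0), (x1, c1) = points[i - 1], points[i]
    t = (hu - x0) / (x1 - x0)  # fractional position between the two points
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))


# Example: an intensity halfway between soft tissue and dense bone.
rgba = apply_transfer_function(500.0)
```

Changing the control points changes the appearance of the same scan without touching the CT data, which is why generalizing to novel transfer functions matters for interactive use.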

Unlike per-scan methods, Render-FM generalizes to unseen anatomies and novel transfer functions, and it enables compositional organ visualization with zero additional preparation time. An optional 89-second fine-tuning step further improves quality, surpassing per-scan optimized baselines.
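One way to picture compositional visualization is as a filter over the predicted Gaussians. The sketch below assumes that each Gaussian inherits the segmentation label of the voxel it was predicted from; this page does not describe the actual mechanism, so the `Gaussian` fields, the label ids, and `select_organs` are all hypothetical. A real 6DGS primitive also carries more parameters than shown here.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    position: tuple  # voxel-centre coordinates (x, y, z)
    opacity: float
    label: int       # hypothetical: segmentation label of the source voxel


def select_organs(gaussians, wanted_labels):
    """Keep only the Gaussians whose label is in the requested organ set."""
    wanted = set(wanted_labels)
    return [g for g in gaussians if g.label in wanted]


# Hypothetical label ids: 1 = liver, 2 = rib, 3 = vertebra.
scene = [
    Gaussian((10, 12, 5), 0.6, 1),
    Gaussian((40, 8, 20), 0.9, 2),
    Gaussian((32, 30, 22), 0.9, 3),
]

# Compose a "skeleton" view by keeping the two bone labels only.
skeleton = select_organs(scene, {2, 3})
```

Under this assumption, switching the organ group is only a selection over already-predicted parameters, which would be consistent with the claim of zero additional preparation time.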

Interactive Demos on a Laptop

Render-FM runs interactively on a single laptop RTX 2000 Ada GPU.

Quantitative Results

Comparison on TotalSegmentator (in-domain) and CT-ORG (out-of-domain). Render-FM matches or beats per-scan optimized 6DGS while reducing preparation time by two to three orders of magnitude.

Setting                  Method                   SSIM↑    PSNR↑    LPIPS↓   Time↓
TotalSeg (ID, Seen TF)   6DGS                     0.912    26.63    0.096    1463.9 s
                         6DGS + AGP (Ours)        0.925    28.92    0.093    1786.5 s
                         Render-FM (Ours)         0.919    27.30    0.097    2.8 s*
                         Render-FM + FT (Ours)    0.937*   31.67*   0.088*   89.4 s
CT-ORG (OOD, Seen TF)    6DGS                     0.903    25.97    0.105    1528.7 s
                         6DGS + AGP (Ours)        0.926    29.36    0.091    2261.9 s
                         Render-FM (Ours)         0.918    26.21    0.092    2.6 s*
                         Render-FM + FT (Ours)    0.940*   32.48*   0.082*   136.2 s

* = best in block. Full results (Unseen TF, compositional Skeleton group, CTPelvic1K) are in the paper.
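The speedup figures can be checked directly from the Time column. The short sketch below divides the 6DGS preparation time by each Render-FM time; the dictionary layout and the name `speedup_over_6dgs` are ours, and the numbers are copied from the table above. By this arithmetic the feedforward pass is roughly 520× (TotalSeg) to 590× (CT-ORG) faster than 6DGS, while the fine-tuned variant is roughly 11× to 16× faster.

```python
# Preparation times in seconds, copied from the results table above.
PREP_TIME_S = {
    "TotalSeg": {"6DGS": 1463.9, "Render-FM": 2.8, "Render-FM + FT": 89.4},
    "CT-ORG": {"6DGS": 1528.7, "Render-FM": 2.6, "Render-FM + FT": 136.2},
}


def speedup_over_6dgs(times):
    """Return how many times faster each method is than the 6DGS baseline."""
    baseline = times["6DGS"]
    return {m: baseline / t for m, t in times.items() if m != "6DGS"}


for dataset, times in PREP_TIME_S.items():
    for method, factor in speedup_over_6dgs(times).items():
        print(f"{dataset:9s} {method:15s} {factor:6.1f}x faster than 6DGS")
```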

Qualitative Comparison

Qualitative comparison of 6DGS, 6DGS + AGP (Ours), Render-FM (Ours), Render-FM + FT (Ours), and the ground truth. Results use a sparse-view setting of 20 views for training 6DGS or fine-tuning Render-FM.

BibTeX

@inproceedings{gao2026renderfm,
  title     = {Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering},
  author    = {Gao, Zhongpai and Planche, Benjamin and Zheng, Meng and
               Choudhuri, Anwesa and Nguyen, Van Nguyen and Chen, Terrence and Wu, Ziyan},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2026}
}