Volt: Interactive Volume Rendering with CUDA

March 21st, 2013

Abstract

Real-time, high-quality direct volume rendering is a computationally intensive task. In the past, the main approaches to reaching interactive frame rates were clusters for parallel rendering, specialized volume rendering hardware, or the reconstruction of isosurfaces. Earlier GPGPU techniques such as slicing cannot deliver the same quality. The CUDA architecture enables fine-grained access to massively parallel graphics hardware. Volt is an interactive direct volume renderer that exploits the properties of these devices for high-quality interactive ray casting.

Volt application window and the VisMale dataset

Visualization algorithm

The Bonsai-H1 volume

Emitted radiation is accumulated front to back by approximate integration along eye rays, and the accumulated opacity is used for early ray termination. Samples are reconstructed by trilinear interpolation and then post-classified for shading. The transfer function applies opacity correction of the colors based on the integration step width. Colors are associated with (premultiplied by) their opacities to avoid color bleeding. Approximated surface normals are used for Blinn-Phong shading and for damping homogeneous regions. The start and end points on each eye ray are computed with the Kay & Kajiya slab-method bounding box test.
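The steps in this paragraph can be illustrated with a minimal CPU sketch in plain C++. It is not Volt's kernel code: the names `intersectSlabs`, `compositeRay`, `classify` and `referenceStep` are hypothetical, shading is left out, and the opacity correction shown is the common form 1 - (1 - alpha)^(step / referenceStep), which is an assumption about the exact formula used. The callback `classify` stands in for trilinear reconstruction plus the transfer function lookup.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };
struct Rgba { float r, g, b, a; };  // plain (non-associated) color plus opacity

// Slab test after Kay & Kajiya: clip the ray origin + t * dir against the
// three pairs of axis-aligned planes of a box. Returns false on a miss.
bool intersectSlabs(const Vec3& origin, const Vec3& dir,
                    const Vec3& boxMin, const Vec3& boxMax,
                    float& tNear, float& tFar) {
    const float o[3] = {origin.x, origin.y, origin.z};
    const float d[3] = {dir.x, dir.y, dir.z};
    const float lo[3] = {boxMin.x, boxMin.y, boxMin.z};
    const float hi[3] = {boxMax.x, boxMax.y, boxMax.z};
    tNear = 0.0f;  // never start behind the eye
    tFar = std::numeric_limits<float>::infinity();
    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0f) {  // ray parallel to this slab
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
            continue;
        }
        float t0 = (lo[i] - o[i]) / d[i];
        float t1 = (hi[i] - o[i]) / d[i];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar = std::min(tFar, t1);
        if (tNear > tFar) return false;
    }
    return true;
}

// Front-to-back compositing between tNear and tFar with opacity correction
// and early ray termination. Returns an associated (premultiplied) color.
Rgba compositeRay(const Vec3& origin, const Vec3& dir, float tNear, float tFar,
                  float step, float referenceStep,
                  const std::function<Rgba(const Vec3&)>& classify,
                  float terminationOpacity = 0.95f) {
    Rgba acc = {0.0f, 0.0f, 0.0f, 0.0f};
    for (float t = tNear; t < tFar; t += step) {
        const Vec3 p = {origin.x + t * dir.x, origin.y + t * dir.y,
                        origin.z + t * dir.z};
        const Rgba s = classify(p);
        // Rescale the stored opacity from the reference step to the actual step.
        const float alpha = 1.0f - std::pow(1.0f - s.a, step / referenceStep);
        // Weighting the color by alpha yields the opacity-associated color.
        const float weight = (1.0f - acc.a) * alpha;
        acc.r += weight * s.r;
        acc.g += weight * s.g;
        acc.b += weight * s.b;
        acc.a += weight;
        if (acc.a > terminationOpacity) break;  // early ray termination
    }
    return acc;
}
```

A caller would first run `intersectSlabs` for the ray and, on a hit, pass the resulting interval to `compositeRay`. Because the accumulated color is already weighted by opacity, it can be blended over a background without further multiplication.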

Architecture

The Engine volume

Each CUDA thread corresponds to one ray. High frame rates are achieved by fine-grained partitioning of data and computation. The graphical user interface, work package generation, and rendering process run in parallel. Configuration data is partitioned by how often it needs to be updated, which reduces the transmission of unchanged data. The host supports each kernel call by precomputing all expressions that yield the same result for every ray, which spares the device various expensive divisions and exponentiations. The asynchronous kernel executes only computations that depend on the individual ray and contribute to a pixel’s color.
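The idea of host-side precomputation can be sketched as follows in plain C++. This is an illustration under assumptions, not Volt's actual parameter set: the struct `FrameConstants`, the functions `precomputeOnHost` and `primaryRayDirection`, and the pinhole camera looking along the negative z axis are all hypothetical. The point is that the division by the image size and the tangent of the field of view are evaluated once per frame, so the per-ray code is left with multiply-adds and one normalization.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Values that are identical for every ray of a frame (hypothetical set).
struct FrameConstants {
    float scaleX, offsetX;  // map pixel column to view-plane x
    float scaleY, offsetY;  // map pixel row to view-plane y
};

// Runs once per frame on the host: all divisions and the tangent happen here.
FrameConstants precomputeOnHost(int width, int height, float fovYRadians) {
    const float tanHalfFov = std::tan(0.5f * fovYRadians);
    const float aspect = static_cast<float>(width) / static_cast<float>(height);
    FrameConstants c;
    c.scaleX = 2.0f * aspect * tanHalfFov / static_cast<float>(width);
    c.offsetX = -aspect * tanHalfFov;
    c.scaleY = 2.0f * tanHalfFov / static_cast<float>(height);
    c.offsetY = tanHalfFov;
    return c;
}

// Runs once per ray (one thread per pixel in a kernel): only work that
// depends on the pixel coordinates remains.
Vec3 primaryRayDirection(int px, int py, const FrameConstants& c) {
    const float u = (px + 0.5f) * c.scaleX + c.offsetX;
    const float v = c.offsetY - (py + 0.5f) * c.scaleY;  // image y grows downward
    const float invLen = 1.0f / std::sqrt(u * u + v * v + 1.0f);
    return {u * invLen, v * invLen, -invLen};  // camera looks along -z
}
```

In a CUDA setting, a struct like this would be a natural candidate for constant memory, since every thread reads the same values.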

Optimizations

The Porsche volume

The special function units are used to interpolate the volume data and the transfer function lookup table. Configuration parameters are stored in constant memory because it is cached and broadcast access can be guaranteed. Arranging the threads of a warp in a square tile increases performance because recent hardware has relaxed requirements for coalesced operations on global memory. The improved locality results in more cache hits and fewer divergent execution paths. Rarely used thread-private variables of a warp are stored in successive banks of shared memory. Register usage is further reduced by prohibiting inlining for some functions and by avoiding CUDA vector structures and dynamic array indexing where possible. Loops are unrolled manually where the compiler cannot do so because of conditional statements. Float types and intrinsic functions are preferred in kernel code.
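One way to place thread-private variables in successive shared memory banks is to give each variable its own row of words, indexed by the thread ID. The sketch below models only the index arithmetic on the CPU in plain C++; the function names are hypothetical, the bank count is a parameter (32 is assumed in the example, older devices use 16), and whether Volt uses exactly this layout is an assumption.

```cpp
#include <set>

// Word index of private variable `var` of thread `tid` when each variable
// occupies one contiguous row of `threadsPerBlock` 32-bit words.
int interleavedSlot(int var, int tid, int threadsPerBlock) {
    return var * threadsPerBlock + tid;
}

// For comparison: each thread keeps all of its variables side by side.
int packedSlot(int var, int tid, int varsPerThread) {
    return tid * varsPerThread + var;
}

// Successive 32-bit words of shared memory map to successive banks.
int bankOf(int wordIndex, int numBanks) {
    return wordIndex % numBanks;
}

// Counts how many different banks `threads` consecutive threads touch when
// they all access the slot returned by `slot(tid)` at the same time.
template <typename SlotFn>
int distinctBanks(SlotFn slot, int firstTid, int threads, int numBanks) {
    std::set<int> banks;
    for (int tid = firstTid; tid < firstTid + threads; ++tid) {
        banks.insert(bankOf(slot(tid), numBanks));
    }
    return static_cast<int>(banks.size());
}
```

With 256 threads per block and 32 banks, the interleaved layout lets the 32 threads of a warp hit 32 different banks for any variable, whereas a packed layout with four variables per thread touches only 8 banks and therefore serializes the access.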

Results

Each CUDA thread uses 32 registers and 44 bytes of local memory. A block width of 4 and a height of 64 yield 256 threads per block; the kernel allocates 5120 bytes of shared memory and 352 bytes of constant memory. With this configuration, pipeline hazards and operations on off-chip memory are both sufficiently hidden. If compiled without shading, only 14 registers are used. The results are based on an integration step width of 0.49 times the shortest voxel edge. For all measurements, function values in [0, 40] are assigned an alpha of 0. Values in [41, 255] are assigned an alpha of 15 for low, 63 for medium, and 255 for high opacity. Integration along a ray stops through early ray termination once its accumulated opacity (saturation) exceeds 95%, and it is always bounded by the bounding box. All datasets used were kindly made available by [Röt06].
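The block shape of 4 by 64 threads determines which pixels each warp covers. CUDA numbers the threads of a two-dimensional block as x + y * blockDim.x and forms warps from 32 consecutive IDs, so a block width of 4 gives every warp a compact 4 by 8 pixel tile rather than a 32 pixel wide row. The following plain C++ sketch reproduces that arithmetic on the CPU; `Tile` and `warpFootprint` are hypothetical names, and the assumption that one thread maps to the pixel at its (x, y) position within the block is not stated in the text.

```cpp
#include <algorithm>

// Inclusive pixel bounds (relative to the block origin) covered by one warp.
struct Tile { int xMin, xMax, yMin, yMax; };

// Computes the footprint of warp number `warp` inside a block of
// blockWidth x blockHeight threads, using CUDA's linear thread numbering.
Tile warpFootprint(int warp, int blockWidth, int blockHeight, int warpSize = 32) {
    const int total = blockWidth * blockHeight;
    const int first = warp * warpSize;
    const int last = std::min(first + warpSize, total) - 1;
    Tile t = {blockWidth, -1, blockHeight, -1};
    for (int id = first; id <= last; ++id) {
        const int x = id % blockWidth;  // corresponds to threadIdx.x
        const int y = id / blockWidth;  // corresponds to threadIdx.y
        t.xMin = std::min(t.xMin, x);
        t.xMax = std::max(t.xMax, x);
        t.yMin = std::min(t.yMin, y);
        t.yMax = std::max(t.yMax, y);
    }
    return t;
}
```

For the 4 by 64 block, warp 0 spans columns 0 to 3 and rows 0 to 7, and the block holds 8 such warps stacked vertically. A 256 by 1 block with the same thread count would instead give each warp a single row of 32 pixels, whose rays tend to diverge more and touch less overlapping volume data.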

Volt benchmark image with low opacity

Volt benchmark image with medium opacity

Volt benchmark image with high opacity

[Röt06] Röttger S.: The Volume Library. Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany. Online, 2006.