Real-time interactive ocean ship-wave simulation

Together with FORCE Technology, we are developing a ship-hydrodynamic model for real-time ship-wave interaction to be used in a full-mission marine simulator. The hydrodynamic model is based on unified potential flow theory, linear or non-linear free surface boundary conditions, and higher-order accurate numerical approximations. Multiple many-core graphics processing units (GPUs) are used for parallel execution, and the model is implemented using a combination of C/C++, CUDA and MPI.

Read more about the massively parallel OceanWave3D modeling strategy which is used as the basis here.

Auto-tuning of Dense Linear Algebra on GPUs

We have implemented an auto-tuning framework that automates the performance tuning process by running a large set of empirical evaluations to configure applications and libraries on the targeted GPU platform. Preliminary work is focused on dense vector and matrix-vector operations, which form the backbone of the level 1 and level 2 routines in the Basic Linear Algebra Subprograms (BLAS) library and are therefore of great importance in many scientific applications. As an example, we develop a single-precision CUDA kernel for matrix-vector multiplication (SGEMV). The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture). Our tuned kernels display significantly better performance than the current CUBLAS v.3.2 library.
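The empirical tuning loop described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the framework itself: the blocked matrix-vector kernel `gemv_blocked`, the tuning parameter `block`, and the helper `autotune` are all hypothetical names chosen for this example, and a real tuner would time CUDA kernels over a much larger parameter space.

```python
import time

def gemv_blocked(A, x, block):
    """Compute y = A x, walking the columns in chunks of `block` entries."""
    n_cols = len(x)
    y = [0.0] * len(A)
    for start in range(0, n_cols, block):
        stop = min(start + block, n_cols)
        for i, row in enumerate(A):
            acc = 0.0
            for j in range(start, stop):
                acc += row[j] * x[j]
            y[i] += acc
    return y

def autotune(kernel, args, candidates, repeats=3):
    """Time kernel(*args, c) for every candidate c and return the fastest one.

    The best of `repeats` runs is kept per candidate to reduce timing noise.
    """
    timings = {}
    for c in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(*args, c)
            best = min(best, time.perf_counter() - t0)
        timings[c] = best
    return min(timings, key=timings.get), timings

# Small test problem (sizes chosen only to keep the example fast).
n = 64
A = [[float((i + j) % 7) for j in range(n)] for i in range(n)]
x = [float(j % 5) for j in range(n)]
best_block, timings = autotune(gemv_blocked, (A, x), [4, 16, 64])
```

Which candidate wins depends on the machine, which is exactly why the selection is made by measurement rather than by a fixed rule.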

This teaser was part of a poster for the 2011 Model Based Control conference, at the Technical University of Denmark.

GPUlab Library – a High-performance GPU-based Library for the Development of Scientific Applications

We are developing a GPU-based generic C++ library for scientific computing. The two main goals are to create a common playground for the developers at the section and interested network contacts, and to maintain an up-to-date platform containing the latest results from the developers. We now have several components for solving large-scale partial differential equations. The library is not limited to such problems, however, and we expect to have showcases with dynamic optimization and model control problems soon. In the future we seek to expand the library into a fully distributed tool, in order to achieve maximum performance on cluster-based hardware systems.

This teaser was part of a poster for the 2011 Model Based Control conference, at the Technical University of Denmark.

Accelerating Economic Model Predictive Control using GPUs

As stochastic energy production such as wind becomes more common, it is necessary either to store the energy for later consumption or to control the energy consumption so that it coincides with the energy production. One method to address this problem is the Smart Grid, where Model Predictive Control can be used to optimize energy consumption to match the predicted stochastic energy production and minimize the cost of energy production from conventional power plants. This can be formulated as a convex optimization problem and solved using primal-dual interior-point methods. The main computational tasks in such a method are matrix-matrix multiplications and Cholesky factorization, both of which are very suitable for GPU acceleration. Initial results of a test case controlling two power plants to match energy consumption show a speed-up of up to ~25 using an Nvidia Tesla C2050 compared to a sequential CPU version running on an Intel i7-920.
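To make the Cholesky step concrete, here is a minimal sketch in plain Python of factorizing a symmetric positive definite matrix and using the factor to solve a linear system, which is the kind of solve an interior-point iteration typically performs. The function names `cholesky` and `solve_spd` and the 3x3 test matrix are assumptions made for this example; the GPU version in the project is not shown here.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b via Cholesky: forward-solve L y = b, then back-solve L^T x = y."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Illustrative system whose exact solution is [1, 2, 3].
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
b = [14.0, 21.0, 26.0]
x = solve_spd(A, b)
```

The inner sums over `k` are independent for different rows `i`, which hints at why the factorization maps well to parallel hardware.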

This teaser was part of a poster for the 2011 Model Based Control conference, at the Technical University of Denmark.

Glare from 100s of Navigation Lights in a Real-Time Ship Simulator

Particle diffraction in the human eye causes us to see a glare pattern when observing bright light sources in dark surroundings. The appearance of navigation lights observed from the bridge of a ship at night is highly influenced by the glare phenomenon. Thus it is important to simulate glare in a ship simulator used for training ship navigators. Based on our previous work in glare simulation (see below), a method was developed that simulates glare from navigation lights in different lighting environments (see image) and at different distances from the observer. Using an NVIDIA Quadro 600 GPU, we are able to simulate glare from 1000 visible navigation lights in only 11.3 milliseconds.

A Master's thesis on this topic, which we carried out in collaboration with FORCE Technology who develop such ship simulators, is available here.

Computing the Appearance of Teeth in Real-Time on the GPU

New scanners have been developed that enable fast in-clinic dental impression scanning. Such scanners capture the surface geometry of teeth and eliminate the need for difficult and time-consuming plaster casts. We have developed a method that computes the appearance of teeth and dentures even if only surface scans are available. With this rendering tool, the dentist can talk to customers about options with respect to crowns or bridges shortly after scanning the oral cavity. Using an NVIDIA GTX 580 GPU, our method runs at 20-30 frames per second when rendering a full set of teeth (see image, top row). With respect to quality, we can fairly closely match the appearance of high-quality crowns produced by denturists (see image, bottom row).

A paper published on this topic in collaboration with 3Shape who develop scanners for fast in-clinic dental impression scanning is available here [PDF].

CGLS Solver for ODF Reconstruction

Performance of a linear, iterative Conjugate Gradient Least Squares (CGLS) solver is here plotted against different grid sizes with 10 million rows. Solid lines are single precision and dashed lines are double precision. The test with a Tesla C2050 GPU was done on a 1x Core i7 920 system, while the remaining tests were done on a 2x Xeon E5540 system. PCGLS uses the CPUs for both solving and ray tracing, while PCGLS + CUDA uses the CPUs for solving and the GPUs for ray tracing. CUDA CGLS uses the GPUs for both ray tracing and most of the solving.

This solver was developed as part of a toolbox containing different ray tracing algorithms and linear solvers for high-performance ODF (Orientation Distribution Function) reconstructions on both multi-core and many-core systems. The ODF is used for X-ray analysis of material properties. The toolbox is capable of off-loading most of the reconstruction to one or more GPUs while still using the CPUs. This ensures that all resources are used efficiently, thus minimizing the reconstruction time.
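For readers unfamiliar with CGLS, the iteration can be sketched as follows. This is a textbook form of the algorithm in plain Python with a small dense matrix, written under the assumption that the toolbox follows the standard scheme; the helper names `matvec`, `matvec_t`, `dot` and `cgls` are hypothetical. In a reconstruction setting the two matrix products would presumably be supplied by the ray tracer rather than by a stored matrix.

```python
def matvec(A, v):
    """Return A v for a dense row-major matrix A."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matvec_t(A, v):
    """Return A^T v without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, vi in zip(A, v):
        for j, a in enumerate(row):
            out[j] += a * vi
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgls(A, b, max_iter=50, tol=1e-12):
    """Minimize ||A x - b||_2 with conjugate gradients on the normal equations."""
    x = [0.0] * len(A[0])
    r = list(b)                 # residual b - A x, with x = 0
    s = matvec_t(A, r)          # gradient direction A^T r
    p = list(s)
    gamma = dot(s, s)
    for _ in range(max_iter):
        if gamma ** 0.5 < tol:  # stop when ||A^T r|| is negligible
            break
        q = matvec(A, p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = matvec_t(A, r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x

# Small overdetermined example; the least-squares solution is [4/3, 7/3].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x = cgls(A, b)
```

Each iteration needs only one product with A and one with its transpose, so the solver never requires the matrix to be stored explicitly.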

A Master's thesis describing this work which was done in GPUlab and at Risø DTU is available here.

Smoke Simulation

In this project we used the power of the GPU to solve the Navier-Stokes equations. These PDEs were used in the incompressible form to describe the flow of smoke inside a closed domain. Dividing the domain into individual cells using a collocated grid fits the parallel programming model very well. At each simulation step, the velocity of every cell is updated according to the N-S equations. Solving these equations requires solving large sparse linear systems, which we accomplished using an iterative multigrid approach. Considerable time improvements were achieved while accuracy was still acceptable compared to reference materials.
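The multigrid idea can be illustrated on a much simpler model problem than the smoke solver. The sketch below, in plain Python, applies V-cycles to the 1D Poisson equation -u'' = f with zero boundary values; the choice of model problem, the weighted Jacobi smoother, full-weighting restriction and linear interpolation are all assumptions made for this example and are not taken from the project code.

```python
def residual(u, f, h):
    """Return f - A u for the 1D Poisson operator with zero boundary values."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi: every cell is updated independently from old values."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u

def v_cycle(u, f, h):
    """One V-cycle on a grid with n = 2^k - 1 interior points."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]          # exact solve on the coarsest grid
    u = smooth(u, f, h, 2)
    r = residual(u, f, h)
    nc = (n - 1) // 2                         # coarse point i sits at fine index 2i+1
    rc = [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2]
          for i in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)     # coarse-grid error correction
    e = [0.0] * n
    for i in range(nc):                       # linear interpolation to the fine grid
        e[2 * i + 1] += ec[i]
        e[2 * i] += 0.5 * ec[i]
        e[2 * i + 2] += 0.5 * ec[i]
    u = [ui + ei for ui, ei in zip(u, e)]
    return smooth(u, f, h, 2)

n = 31
h = 1.0 / (n + 1)
f = [1.0] * n
u = [0.0] * n
r_start = max(abs(v) for v in residual(u, f, h))
for _ in range(10):
    u = v_cycle(u, f, h)
r_end = max(abs(v) for v in residual(u, f, h))
```

The Jacobi update touches every cell independently, which is the property that makes this kind of smoother a natural fit for a GPU.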

Work on this subject in which one of us was primary investigator is available here.

Fast Simulation of Unsteady Nonlinear Ocean Water Waves in Three Space Dimensions

In this ongoing project, we are exploring the potential for fast, large-scale simulation of nonlinear and dispersive ocean water waves in three space dimensions using Graphics Processing Units (GPUs) dedicated to scientific computations. The numerical algorithm for water wave simulations is based on a state-of-the-art flexible-order finite difference method employed for the OceanWave3D model. The algorithm requires the solution of a large, unsymmetric and very sparse linear system of equations at every time step in order to advance the solution in time. By using an explicit discretization strategy combined with an iterative solution method for this linear system, the algorithm should be very suitable for parallel processing. The experience gained in this project will be broadly applicable, since many other models in science and engineering are likewise based on finite difference methods. The focus is on designing algorithms that are massively parallel and can achieve a high effective arithmetic throughput.
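One way to read "flexible-order" is that the finite difference stencil width, and hence the order of accuracy, is a free parameter. The sketch below, in plain Python, derives stencil weights for an arbitrary set of grid offsets by solving the small Taylor-expansion system exactly. The function names `fd_weights` and `derivative` are hypothetical, and this is only an illustration of the general technique, not the OceanWave3D implementation.

```python
import math
from fractions import Fraction

def fd_weights(offsets, deriv):
    """Weights w_j such that sum_j w_j f(x + s_j h) / h^deriv approximates f^(deriv)(x).

    Solves sum_j w_j s_j^k = deriv! * [k == deriv] for k = 0..m-1 by exact
    Gauss-Jordan elimination (requires distinct offsets and deriv < m).
    """
    m = len(offsets)
    M = [[Fraction(s) ** k for s in offsets]
         + [Fraction(math.factorial(deriv) if k == deriv else 0)]
         for k in range(m)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

def derivative(f, x, h, offsets, deriv=1):
    """Apply the stencil defined by `offsets` to f at x with grid spacing h."""
    w = fd_weights(offsets, deriv)
    return sum(float(wj) * f(x + s * h) for wj, s in zip(w, offsets)) / h ** deriv

# Widening the stencil raises the order of accuracy without changing the code.
err_3pt = abs(derivative(math.sin, 0.3, 0.1, [-1, 0, 1]) - math.cos(0.3))
err_5pt = abs(derivative(math.sin, 0.3, 0.1, [-2, -1, 0, 1, 2]) - math.cos(0.3))
```

Because each grid point applies the same small stencil to its neighbours, the evaluation is local and regular, which suits massively parallel hardware.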

Work on this subject in which one of us was primary investigator is available here.


The GPU is extremely efficient at computing the Discrete Fourier Transform. Because of its roots in graphics, it is particularly well-suited for working with 4-vectors. This means that computing two DFTs in parallel is a very good idea (two complex numbers in each 4-vector). If we implement two parallel FFTs in GLSL using the classic Cooley-Tukey algorithm, we can compute the 2D FFT of a 512x512 RGB image in 2 milliseconds using only a laptop GPU (NVIDIA Quadro FX 1600M). The FFT is also implemented in the CUDA library called CUFFT.
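The idea of carrying two transforms through one Cooley-Tukey pass can be sketched in plain Python. Here each element is a pair of complex numbers, standing in for the four floats of a 4-vector, and a single twiddle factor is shared by both signals. The function name `fft_pair` is hypothetical and this recursive radix-2 version is only an illustration, not the GLSL implementation.

```python
import cmath

def fft_pair(x):
    """Radix-2 Cooley-Tukey FFT of two signals at once.

    x is a list of (a, b) complex pairs whose length is a power of two.
    Returns a list of pairs (A[k], B[k]) holding the DFTs of a and b.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_pair(x[0::2])
    odd = fft_pair(x[1::2])
    out = [None] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # one twiddle serves both signals
        ta, tb = w * odd[k][0], w * odd[k][1]
        out[k] = (even[k][0] + ta, even[k][1] + tb)
        out[k + n // 2] = (even[k][0] - ta, even[k][1] - tb)
    return out

# Two short test signals packed element by element.
a = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
b = [0.5, 0.0, -0.5, 1.0, 2.0, 0.0, 1.5, -1.0]
spectrum = fft_pair(list(zip(a, b)))
```

A 2D transform can then be built by applying the 1D transform along the rows and afterwards along the columns.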

Source code is available here.

GPU Convolution

Using the GPU FFT implementation, we can easily implement convolution on the GPU. Here the two parallel FFTs become a great advantage since we can compute the FFT of corresponding colour bands in parallel and multiply them immediately after. We need nine 2D FFTs to convolve two RGB images. The image to the right illustrates how we can use convolution to reduce the high-frequency noise which often appears in images computed using classic Monte Carlo path tracing. The GPU is also used, in this example, to compute a filter image procedurally. Convolution of two 512x512 RGB images takes only 5.5 milliseconds using a laptop GPU (NVIDIA Quadro FX 1600M).
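The convolution theorem behind this approach can be sketched in one dimension in plain Python: transform both inputs, multiply the spectra element by element, and transform back. The names `fft` and `circular_convolve` are hypothetical, and the example performs circular convolution on short lists rather than on images. The count of nine transforms is presumably three colour bands times two forward transforms and one inverse transform.

```python
import cmath

def fft(x, inverse=False):
    """Radix-2 Cooley-Tukey FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def circular_convolve(a, b):
    """Circular convolution of two real sequences of equal power-of-two length."""
    n = len(a)
    fa, fb = fft(a), fft(b)                    # two forward transforms
    product = [u * v for u, v in zip(fa, fb)]  # pointwise multiply in frequency space
    back = fft(product, inverse=True)          # one inverse transform
    return [v.real / n for v in back]          # normalize the inverse

signal = [1.0, 2.0, 3.0, 4.0]
shift_by_one = [0.0, 1.0, 0.0, 0.0]
result = circular_convolve(signal, shift_by_one)
```

Convolving with a unit impulse at index 1 rotates the signal by one position, which makes the result easy to check by hand.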

Source code is available here.

Real-Time Glare on the GPU

Particle diffraction in the human eye causes us to see a glare pattern when observing bright light sources in dark surroundings. For people with a pathological eye condition, glare often increases and may cause trouble when driving at night, for example. Glare simulation can help estimate the problems that such patients endure and help us develop tools (such as carefully engineered shades) that reduce the problems. Results in Fourier optics enable us to compute the glare pattern around a bright source using the FFT and convolution. With the GPU implementations, we can compute glare in real-time (several frames per second).

Work on this subject which one of us has co-authored is available here.
