# RESEARCH

# Real-time interactive ocean ship-wave simulation

Together with FORCE Technology we are developing a ship-hydrodynamic model for
real-time ship-wave interaction to be used in a full-mission marine simulator. The
hydrodynamic model is based on unified potential flow theory, linear or non-linear
free surface boundary conditions and higher-order accurate numerical approximations.
Multiple many-core graphics processing units (GPUs) are used for parallel
execution and the model is implemented using a combination of C/C++, CUDA and MPI.

Read more about the massively parallel OceanWave3D modeling strategy which is used
as the basis here.

#
Auto-tuning of Dense Linear Algebra on GPUs

We have implemented an auto-tuning framework that can automate the performance tuning
process by running a large set of empirical evaluations to configure applications
and libraries on the targeted GPU platform. Preliminary work is focused on dense
vector and matrix-vector operations, which form the backbone of level 1 and level
2 routines in the Basic Linear Algebra Subprograms (BLAS) library and are therefore
of great importance in many scientific applications. As an example, we develop a
single-precision CUDA kernel for the matrix-vector multiplication (SGEMV). The target
hardware is the most recent Nvidia Tesla 20-series (Fermi architecture). Our tuned
kernels display significantly better performance than the current CUBLAS v.3.2 library.
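The empirical tuning loop described above can be sketched in a few lines. This is only an illustration of the idea, not the framework itself: it is plain Python on the CPU rather than CUDA, and the names `gemv_blocked` and `autotune`, the tunable `block` parameter and the candidate values are all assumptions made for the example.

```python
import time

def gemv_blocked(A, x, block):
    """Compute y = A x, sweeping the columns in chunks of `block`.

    `block` stands in for a tunable kernel parameter (hypothetical here;
    on a GPU it could be a thread-block size or a tile width).
    """
    y = [0.0] * len(A)
    for start in range(0, len(x), block):
        stop = min(start + block, len(x))
        xs = x[start:stop]
        for i, row in enumerate(A):
            acc = 0.0
            for a, b in zip(row[start:stop], xs):
                acc += a * b
            y[i] += acc
    return y

def autotune(A, x, candidates, repeats=3):
    """Time every candidate configuration and return the fastest one."""
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            gemv_blocked(A, x, block)
            best = min(best, time.perf_counter() - t0)
        timings[block] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings
```

The chosen configuration depends on the machine the search runs on, which is the point of tuning empirically on the target platform rather than fixing the parameters in advance.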

This teaser was part of a poster for the 2011
Model Based Control conference, at the Technical University of Denmark.

# GPUlab Library – a High-performance GPU-based Library for the Development of Scientific Applications

We are developing a generic GPU-based C++ library for scientific
computing. The two main goals are to create a common playground for the developers
at the section and interested network contacts, and to keep an up-to-date platform
containing the latest results from the developers. We now have several components
for solving large-scale partial differential equations. However, this should not
be a limitation and we soon expect to have show-cases with dynamic optimization
and model control problems as well. In the future we seek to expand the library
into a fully distributed tool, in order to achieve maximum performance on cluster-based
hardware systems.

This teaser was part of a poster for the 2011 Model Based Control conference, at the Technical University of Denmark.

# Accelerating Economic Model Predictive Control using GPUs

As stochastic energy production such as wind becomes more common, it is necessary
to either store the energy for later consumption or control the energy consumption
to coincide with the energy production. One method to address this problem is the
Smart Grid, where Model Predictive Control can be used to optimize energy consumption
to match with the predicted stochastic energy production and minimize the cost of
energy production from conventional power plants. This can be formulated as a convex
optimization problem and solved using primal-dual interior-point methods. The main
computational tasks in such a method are matrix-matrix multiplications and Cholesky
factorization, both of which are very suitable for GPU acceleration. Initial results
of a test case controlling two power plants to match energy consumption show a
speed-up of up to ~25 times using an Nvidia Tesla C2050 compared to a sequential CPU
version running on an Intel i7-920.
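To make the Cholesky step concrete, the following minimal sketch factorizes a symmetric positive definite matrix as A = L Lᵀ and uses the factor to solve a linear system, which is the kind of operation an interior-point iteration performs repeatedly. It is a textbook, sequential Python version, not the GPU implementation; the function names `cholesky` and `solve_spd` are chosen for the example.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b by forward and backward substitution with the Cholesky factor."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

For large matrices, most of the work in a blocked factorization is matrix-matrix multiplication, which is one reason the method maps well to GPUs.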

This teaser was part of a poster for the 2011 Model Based Control conference, at the Technical University of Denmark.

# Glare from 100s of Navigation Lights in a Real-Time Ship Simulator

Particle diffraction in the human eye causes us to see a glare pattern when observing
bright light sources in dark surroundings. The appearance of navigation lights
observed from the bridge of a ship at night is highly influenced by the glare phenomenon.
Thus it is important to simulate glare in a ship simulator used for training ship navigators.
Based on our previous work in glare simulation (see below), a method was developed that simulates
glare from navigation lights in different lighting environments (see image) and at different distances
from the observer. Using an NVIDIA Quadro 600 GPU, we are able to simulate glare from 1000 visible navigation
lights in only 11.3 milliseconds.

A Master's thesis on this topic, which we carried out in collaboration with
FORCE Technology, who develop such ship simulators,
is available here.

# Computing the Appearance of Teeth in Real-Time on the GPU

New scanners have been developed that enable fast in-clinic dental impression scanning.
Such scanners capture the surface geometry of teeth and eliminate the need for difficult
and time-consuming plaster casts. We have developed a method that computes the appearance
of teeth and dentures even if only surface scans are available. With this rendering tool,
the dentist can talk to customers about options with respect to crowns or bridges
shortly after scanning the oral cavity. Using an NVIDIA GTX 580 GPU,
our method runs at 20-30 frames per second when rendering a full set of teeth (see image, top row).
With respect to quality, we can fairly closely match the appearance of high-quality crowns
produced by denturists (see image, bottom row).

A paper published on this topic in collaboration with 3Shape, who develop scanners
for fast in-clinic dental impression scanning, is available here [PDF].

# CGLS Solver for ODF Reconstruction

Performance of a linear, iterative Conjugate Gradient Least Squares (CGLS) solver
is here plotted against different grid sizes with 10 million rows. Solid lines are
single precision and dashed lines are double precision. The test with a Tesla C2050
GPU was done on a 1x Core i7 920 system, while the remaining tests were done on
a 2x Xeon E5540 system. PCGLS uses the CPUs for both solving and ray tracing, while
PCGLS + CUDA uses the CPUs for solving and the GPUs for ray tracing. CUDA CGLS uses
the GPUs for both ray tracing and most of the solving.

This solver was developed as part of a toolbox containing different ray tracing
algorithms and linear solvers for high-performance ODF (Orientation Distribution
Function) reconstructions on both multi-core and many-core systems. The ODF is used
for X-ray analysis of material properties. The toolbox is capable of off-loading
most of the reconstruction to a GPU, as well as multiple GPUs, while still using
the CPUs. This ensures that all resources are used efficiently, thus minimizing the
reconstruction time.
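For readers unfamiliar with CGLS, the iteration can be sketched as follows. This is the standard textbook form of the algorithm for minimizing ||A x - b||, written in plain Python with a small dense matrix. In the toolbox the products with A and its transpose come from ray tracing, so the explicit matrix, the helper names and the stopping rule here are assumptions made for illustration.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Product with the transpose: A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgls(A, b, iterations=50, tol=1e-12):
    """Conjugate Gradient Least Squares, starting from x = 0."""
    x = [0.0] * len(A[0])
    r = list(b)              # residual b - A x
    s = rmatvec(A, r)        # gradient direction A^T r
    p = list(s)
    gamma = gamma0 = dot(s, s)
    for _ in range(iterations):
        if gamma <= tol * tol * gamma0:
            break            # normal-equation residual is small enough
        q = matvec(A, p)
        qq = dot(q, q)
        if qq == 0.0:
            break
        alpha = gamma / qq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

Each iteration needs one product with A and one with its transpose, so the cost of the solver is dominated by those two operations.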

A Master's thesis describing this work, which was done in GPUlab and at Risø DTU,
is available here.

# Smoke Simulation

In this project we used the power of the GPU to solve the Navier-Stokes equations.
These PDEs were used in the incompressible form to describe the flow of smoke inside
a closed domain. Dividing the domain into individual parts using a collocated grid
fits the parallel programming model very well. At each simulation step, the velocity
of every cell is updated according to the N-S equations. Solving these equations
requires solving large sparse linear systems, which we accomplished using an iterative
multigrid approach. Considerable time improvements were achieved while accuracy
was still acceptable compared to reference materials.
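The multigrid idea can be illustrated on a much smaller problem. The sketch below applies V-cycles to a 1D Poisson equation with zero boundary values, which is an assumed stand-in for the sparse systems of the smoke solver. The weighted Jacobi smoother, the number of sweeps and all function names are choices made for this example, not details taken from the project.

```python
import math

def residual(u, f, h):
    """r = f - A u for A = -d2/dx2 with zero Dirichlet boundary values."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; every grid point updates independently."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u

def v_cycle(u, f, h):
    """One V-cycle, updating u in place; len(u) must be 2**k + 1."""
    n = len(u)
    if n <= 3:
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])  # one unknown: solve exactly
        return u
    smooth(u, f, h)
    r = residual(u, f, h)
    nc = (n - 1) // 2 + 1
    rc = [0.0] * nc
    for j in range(1, nc - 1):  # full-weighting restriction to the coarse grid
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)  # coarse-grid error correction
    for j in range(nc - 1):     # linear interpolation back to the fine grid
        u[2 * j] += ec[j]
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1])
    u[n - 1] += ec[nc - 1]
    return smooth(u, f, h)

def poisson_demo(n=65, cycles=12):
    """Solve -u'' = pi^2 sin(pi x) on [0, 1]; return (max error, max residual)."""
    h = 1.0 / (n - 1)
    f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n)]
    u = [0.0] * n
    for _ in range(cycles):
        v_cycle(u, f, h)
    exact = [math.sin(math.pi * i * h) for i in range(n)]
    error = max(abs(a - b) for a, b in zip(u, exact))
    return error, max(abs(v) for v in residual(u, f, h))
```

The smoother and the grid transfer operators are all local, per-point operations, which is what makes this family of methods attractive on a GPU.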

Work on this subject in which one of us was primary investigator is available
here.

# Fast Simulation of Unsteady Nonlinear Ocean Water Waves in Three Space Dimensions

In this ongoing project, we are exploring the potential for fast simulation of
nonlinear and dispersive ocean water waves in three space dimensions at large scales
using Graphics Processing Units (GPUs) dedicated to scientific computations. The
numerical algorithm for water wave simulations is based on a state-of-the-art
flexible-order finite difference method employed for the OceanWave3D model. The
algorithm requires the solution of a large, unsymmetric and very sparse linear system
of equations at every time step in order to advance the solution in time. By the use
of an explicit discretization strategy combined with an iterative solution method
for the involved linear system, the algorithm should be very suitable for parallel
processing. The experiences gained
in this project will be broadly applicable, since many other models in Science and
Engineering are likewise based on the use of finite difference methods. Focus is
on designing algorithms that are massively parallel and can achieve a high effective
arithmetic throughput.
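One way to picture a flexible-order finite difference method is that the stencil weights are computed for whatever stencil width is requested, instead of being hard-coded. The sketch below derives weights on unit-spaced points by Taylor matching in exact rational arithmetic. It is a generic illustration and not the OceanWave3D implementation; the function name `fd_weights` and its interface are assumptions.

```python
from fractions import Fraction
from math import factorial

def fd_weights(offsets, deriv):
    """Weights w such that sum(w[j] * f(x + offsets[j])) approximates f^(deriv)(x).

    Assumes unit grid spacing; divide the result by h**deriv for spacing h.
    """
    n = len(offsets)
    # Row k makes the formula exact for the monomial x**k.
    M = [[Fraction(o) ** k for o in offsets] for k in range(n)]
    rhs = [Fraction(factorial(k)) if k == deriv else Fraction(0) for k in range(n)]
    # Gauss-Jordan elimination; the system is nonsingular for distinct offsets.
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
                rhs[r] -= factor * rhs[col]
    return [rhs[i] / M[i][i] for i in range(n)]
```

For example, `fd_weights([-1, 0, 1], 2)` gives the familiar second-derivative stencil 1, -2, 1, and widening the list of offsets raises the order of accuracy. One-sided stencils near boundaries follow from passing asymmetric offsets.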

Work on this subject in which one of us was primary investigator is available here.

# GPU FFT

The GPU is extremely efficient at computing the Discrete Fourier Transform. Because
of its roots in graphics, it is particularly well-suited for working with 4-vectors.
This means that computing two DFTs in parallel is a very good idea (two complex
numbers in each 4-vector). If we implement two parallel FFTs in GLSL using the classic
Cooley-Tukey algorithm, we can compute the 2D FFT of a 512x512 RGB image in 2
milliseconds using only a laptop GPU (NVIDIA Quadro FX 1600M). The FFT is also
implemented in the CUDA library called CUFFT.
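As a reference for the algorithm itself, here is a minimal recursive radix-2 Cooley-Tukey FFT, with a 2D transform built from row and column passes. It is a scalar Python sketch for power-of-two sizes, not the GLSL code, and it does not pack two transforms into one 4-vector. The function names are chosen for the example, and `dft_naive` is included only for checking.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(image):
    """2D FFT of a list of rows: transform every row, then every column."""
    rows = [fft(row) for row in image]
    cols = [fft(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def dft_naive(x):
    """Direct O(n^2) DFT, used only to check the fast version."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

The butterflies within one stage are independent of each other, which is why the algorithm parallelizes well.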

Source code is available here.

# GPU Convolution

Using the GPU FFT implementation, we can easily implement convolution on the GPU.
Here the two parallel FFTs become a great advantage since we can compute the FFT
of corresponding colour bands in parallel and multiply them immediately after. We
need nine 2D FFTs to convolve two RGB images. The image to the right illustrates
how we can use convolution to reduce the high-frequency noise which often appears
in images computed using classic Monte Carlo path tracing. The GPU is also used,
in this example, to compute a filter image procedurally. Convolution of two 512x512
RGB images takes only 5.5 milliseconds using a laptop GPU (NVIDIA Quadro FX 1600M).
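The underlying principle is the convolution theorem: transform both inputs, multiply them pointwise, and transform back. The count of nine FFTs is consistent with two forward transforms and one inverse transform for each of the three colour bands, although that breakdown is an inference rather than something stated above. The 1D sketch below shows the principle in plain Python and compares it with a direct circular convolution; the function names are assumptions.

```python
import cmath

def fft(x, inverse=False):
    """Unnormalised radix-2 Cooley-Tukey transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def circular_convolve(a, b):
    """Circular convolution via the convolution theorem."""
    n = len(a)
    if len(b) != n:
        raise ValueError("inputs must have the same length")
    product = [p * q for p, q in zip(fft(a), fft(b))]  # pointwise multiply
    return [v / n for v in fft(product, inverse=True)]  # normalise the inverse

def circular_convolve_direct(a, b):
    """O(n^2) reference: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]
```

The FFT route costs O(n log n) instead of O(n²), which is what makes large filter kernels affordable.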

Source code is available here.

# Real-Time Glare on the GPU

Particle diffraction in the human eye causes us to see a glare pattern when observing
bright light sources in dark surroundings. For people with a pathological eye condition,
glare often increases and may cause trouble when driving at night, for example.
Glare simulation can help estimate the problems that such patients endure and
help us develop tools (such as carefully engineered shades) that reduce the problems.
Results in Fourier optics enable us to compute the glare pattern around a bright
source using the FFT and convolution. With the GPU implementations, we can compute
glare in real-time (several frames per second).

Work on this subject which one of us has co-authored is available
here.