CPU and GPU Parallelization of Sparsity Driven Despeckling in SAR Images

Özcan C., Şen B., Nar F.

IRSC’16 International Remote Sensing Conference, Ar Riyad, Saudi Arabia, 17 - 20 January 2016

  • Publication Type: Conference Paper / Summary Text
  • City: Ar Riyad
  • Country: Saudi Arabia


SAR remote sensing systems provide high-resolution images of the earth's surface as seen from airborne or satellite platforms. SAR imaging has attracted growing interest in remote sensing applications in many fields owing to its ability to generate high-resolution radar images regardless of weather conditions and sunlight illumination. However, due to the coherent processing of received signals, SAR images are degraded by speckle, which is a specific kind of multiplicative noise. Speckle noise is signal-dependent in nature and causes difficulties for various image interpretation tasks such as target detection, change detection, segmentation, edge detection, and classification. Speckle noise in SAR should be reduced while preserving important details such as edges, textures, and point scatterers in order to obtain the discriminative features needed especially in remote sensing applications. Despeckling is generally used as a first step in SAR image analysis, and many different algorithms have been proposed for that purpose. However, in the literature, there is generally a trade-off between computational load and speckle-reduction quality among the proposed methods.


In this paper, we present the implementation details and CPU/GPU parallelization of the Sparsity Driven Despeckling (SDD) method, which we previously proposed for speckle reduction in SAR images. SDD takes a sparsity-driven total variation (TV) approach employing the l0-norm, a fractional norm, or the l1-norm in order to smooth homogeneous regions with minimal degradation of edges and point scatterers. Given an observation image corrupted by speckle noise, the SAR image despeckling problem is defined as an optimization problem, namely the minimization of the proposed cost function. To enable the use of convex optimization methods, the cost function is approximated and written in matrix-vector form. This matrix-vector form admits an iterative optimization method in which a linear system is solved at each step. The linear system, whose matrix is symmetric positive definite, is solved efficiently using a preconditioned conjugate gradient (PCG) iterative solver. PCG with an incomplete Cholesky (IC) preconditioner is implemented single-threaded, since construction of the IC preconditioner and the lower and upper triangular solves are not efficiently parallelizable. PCG with a Jacobi preconditioner is parallelized using OpenMP on the CPU and CUDA on the GPU, since all of its operations are efficiently parallelizable; every step of the PCG algorithm is parallelized with OpenMP and CUDA. For the implementation of the PCG algorithm, new strategies were developed to minimize data transfer between the CPU and the GPU. Since PCG consumes a significant amount of memory and does not fit into GPU memory for large images, we used a tiling approach. Accordingly, the GPU parallelization employs CUDA streams, which allow multiple CUDA operations to execute concurrently. A variant using dynamic parallelism, which enables a CUDA kernel to launch and synchronize new nested work, is also designed.
We carried out our GPU studies with two versions of the developed method, with and without dynamic parallelism. Tests were performed on Nvidia GeForce and Nvidia Tesla GPUs, and the Nvidia Jetson TK1 development kit was used for mobile platforms.


The results show that our method provides extremely fast (near-real-time), high-quality SAR despeckling. Despeckling performance, execution time, and memory consumption of the proposed method are reported in detailed tables and figures for synthetic images and real-world SAR images. Various CPU and GPU implementations of SDD are tested, and information and suggestions are given so that the reader can pick the right implementation strategy for a specific need.