Regularized Deep Networks

Strengthening AI With Domain Knowledge

Research Highlights of
Information Processing & Algorithms Laboratory


The Information Processing and Algorithms Laboratory (iPAL) is directed by Prof. Vishal Monga. Graduate research in iPAL broadly encompasses signal and image processing theory and applications with a particular focus on capturing practical real-world constraints via convex optimization theory and algorithms.


GlideNet Architecture
GlideNet for Multi-Category Attributes Prediction (CVPR 2022)

Attaching attributes (such as color, shape, state, action) to object categories is an important computer vision problem. Attribute prediction has seen exciting recent progress and is often formulated as a multi-label classification problem. Yet significant challenges remain in: 1) predicting a large number of attributes over multiple object categories, 2) modeling category-dependence of attributes, 3) methodically capturing both global and local scene context, and 4) robustly predicting attributes of objects with low pixel-count. To address these issues, we propose a novel multi-category attribute prediction deep architecture named GlideNet, which contains three distinct feature extractors. A global feature extractor recognizes what objects are present in a scene, whereas a local one focuses on the area surrounding the object of interest. Meanwhile, an intrinsic feature extractor uses an extension of standard convolution dubbed Informed Convolution to retrieve features of objects with low pixel-count utilizing its binary mask. GlideNet then uses gating mechanisms with binary masks and its self-learned category embedding to combine the dense embeddings. Collectively, the Global-Local-Intrinsic blocks comprehend the scene's global context while attending to the characteristics of the local object of interest. The architecture adapts the feature composition based on the category via category embedding. Finally, using the combined features, an interpreter predicts the attributes, and the length of the output is determined by the category, thereby removing unnecessary attributes. GlideNet can achieve compelling results on two recent and challenging datasets -- VAW and CAR -- for large-scale attribute prediction. For instance, it obtains more than 5\% gain over state of the art in the mean recall (mR) metric. GlideNet's advantages are especially apparent when predicting attributes of objects with low pixel counts as well as attributes that demand global context understanding. Finally, we show that GlideNet excels in training starved real-world scenarios.

Robust Deep 3D Blood Vessel Segmentation Using Structural Priors

Deep learning has enabled significant improvements in the accuracy of 3D blood vessel segmentation. Open challenges remain in scenarios where labeled 3D segmentation maps for training are severely limited, as is often the case in practice, and in ensuring robustness to noise. Inspired by the observation that 3D vessel structures project onto 2D image slices with informative and unique edge profiles, we propose a novel deep 3D vessel segmentation network guided by edge profiles. Our network architecture comprises a shared encoder and two decoders that learn segmentation maps and edge profiles jointly. 3D context is mined in both the segmentation and edge prediction branches by employing bidirectional convolutional long-short term memory (BCLSTM) modules. 3D features from the two branches are concatenated to facilitate learning of the segmentation map. As a key contribution, we introduce new regularization terms that: a) capture the local homogeneity of 3D blood vessel volumes in the presence of biomarkers; and b) ensure performance robustness to domain-specific noise by suppressing false positive responses. Experiments on benchmark datasets with ground truth labels reveal that the proposed approach outperforms state-of-the-art techniques on standard measures such as DICE overlap and mean Intersection-over-Union. The performance gains of our method are even more pronounced when training is limited. Furthermore, the computational cost of our network inference is among the lowest compared with state-of-the-art.

2021 NTIRE-CVPR Runner Up
Physically Inspired Dense Fusion Networks for Relighting

The project proposes a model which enriches neural networks with physical insight. More precisely, the proposed method generates the relighted image with new illumination settings via two different strategies and subsequently fuses them using a weight map. It outperforms many state-of-the-art algorithms.

Structural Prior Driven Regularized Deep Learning For Sonar Image Classification

Deep learning has been recently shown to improve performance in the domain of synthetic aperture sonar (SAS) image classification. Given the constant resolution with range of a SAS, it is no surprise that deep learning techniques perform so well.Despite deep learning's recent success, there are still compelling open challenges in reducing the high false alarm rate and enabling success when training imagery is limited, which is a practical challenge that distinguishes the SAS classification problem from standard image classification set-ups where training imagery may be abundant. We address these challenges by exploiting prior knowledge that humans use to grasp the scene. These include unconscious elimination of the image speckle and localization of objects in the scene. We introduce a new deep learning architecture which incorporates these priors with the goal of improving automatic target recognition (ATR) from SAS imagery. Our proposal -- called SPDRDL, Structural Prior Driven Regularized Deep Learning -- incorporates the previously mentioned priors in a multi-task convolutional neural network (CNN) and requires no additional training data when compared to traditional SAS ATR methods. Two structural priors are enforced via regularization terms in the learning of the network: (1) structural similarity prior -- enhanced imagery (often through despeckling) aids human interpretation and is semantically similar to the original imagery and (2) structural scene context priors -- learned features ideally encapsulate target centering information; hence learning may be enhanced via a regularization that encourages fidelity against known ground truth target shifts (relative target position from scene center). Experiments on a challenging real-world dataset reveal that SPDRDL outperforms state-of-the-art deep learning and other competing methods for SAS image classification.

Iterative, Deep Synthetic Aperture Sonar Image Segmentation

Synthetic aperture sonar (SAS) systems produce high-resolution images of the seabed environment. Moreover, deep learning has demonstrated superior ability in finding robust features for automating imagery analysis. However, the success of deep learning is conditioned on having lots of labeled training data, but obtaining generous pixel-level annotations of SAS imagery is often practically infeasible. This challenge has thus far limited the adoption of deep learning methods for SAS segmentation. Algorithms exist to segment SAS imagery in an unsupervised manner, but they lack the benefit of state-of-the-art learning methods and the results present significant room for improvement. In view of the above, we propose a new iterative algorithm for unsupervised SAS image segmentation combining superpixel formation, deep learning, and traditional clustering methods. We call our method Iterative Deep Unsupervised Segmentation (IDUS). IDUS is an unsupervised learning framework that can be divided into four main steps: 1) A deep network estimates class assignments. 2) Low-level image features from the deep network are clustered into superpixels. 3) Superpixels are clustered into class assignments (which we call pseudo-labels) using k-means. 4) Resulting pseudo-labels are used for loss backpropagation of the deep network prediction. These four steps are performed iteratively until convergence. A comparison of IDUS to current state-of-the-art methods on a realistic benchmark dataset for SAS image segmentation demonstrates the benefits of our proposal even as the IDUS incurs a much lower computational burden during inference (actual labeling of a test image). Because our design combines merits of classical superpixel methods with deep learning, practically we demonstrate a very significant benefit in terms of reduced selection bias, i.e. IDUS shows markedly improved robustness against the choice of training images. Finally, we also develop a semi-supervised (SS) extension of IDUS called IDSS and demonstrate experimentally that it can further enhance performance while outperforming supervised alternatives that exploit the same labeled training imagery.

2020 NTIRE-CVPR Challenge Selection
NonLocal Channel Attention For NonHomogeneous Image Dehazing

This project proposed a a novel network for dehazing for challenging benchmark image datasets of NTIRE'20 and NTIRE'18. The proposed networks `AtJwD' can outperform state-of-the-art alternatives, especially when recovering images corrupted by non-homogeneous haze.

2020 NTIRE-CVPR Challenge Selection
Ensemble Dehazing Networks For Non-Homogeneous Haze

A DenseNet based dehazing network focusing on the recovery of images corrupted with non-homogeneous haze by utilizating a weighted combination of outputs from different decoders.

Group Based Deep Shared Feature Learning for Fine-grained Image Classification

Fine-grained image classification has emerged as a significant challenge because objects in such images have small inter-class visual differences but with large variations in pose, lighting, and viewpoints, etc. We present a new deep network architecture that explicitly models shared features and removes their effect to achieve enhanced classification results. Experiments on benchmark datasets show that GSFL-Net can enhance classification accuracy over the state of the art with a more interpretable architecture.

Deep Retinal Image Segmentation Under Geometrical Priors

Vessel segmentation of retinal images is a key diagnostic capability in ophthalmology. This problem faces several challenges including low contrast, variable vessel size and thickness, and presence of interfering pathology such as micro-aneurysms and hemorrhages. Early approaches addressing this problem employed hand-crafted filters to capture vessel structures, accompanied by morphological post-processing. More recently, deep learning techniques have been employed with significantly enhanced segmentation accuracy. We propose a novel domain enriched deep network that consists of two components: 1) a representation network that learns geometric features specific to retinal images, and 2) a custom designed computationally efficient residual task network that utilizes the features obtained from the representation layer to perform pixel-level segmentation. The representation and task networks are {\em jointly learned} for any given training set. To obtain physically meaningful and practically effective representation filters, we propose two new constraints that are inspired by expected prior structure on these filters: 1) orientation constraint that promotes geometric diversity of curvilinear features, and 2) a data adaptive noise regularizer that penalizes false positives. Multi-scale extensions are developed to enable accurate detection of thin vessels.

Dense Scene Information Estimation Network For Dehazing

This project proposed a scene information estimation network for dehazing for challenging benchmark image datasets of NTIRE'19 and NTIRE'18. The proposed networks `At-DH' and `AtJ-DH' can outperform state-of-the-art alternatives, especially when recovering images corrupted by dense hazes.

Dense '123' Color Enhancement Dehazing Network

A DenseNet based dehazing network focusing on the recovery of the color information that comprises of: a common DenseNet based feature encoder whose output branches into three distinct DensetNet based decoders to yield estimates of the R, G and B color channels of the image.

Deep Wavelet Coefficients Prediction for Super-resolution

Recognizing that a wavelet transform provides a “coarse” as well as “detail” separation of image content, we design a deep CNN to predict the “missing details” of wavelet coefficients of the low-resolution images to obtain the Super-Resolution (SR) results, which we name Deep Wavelet Super-Resolution (DWSR). Out network is trained in the wavelet domain with four input and output channels respectively. The input comprises of 4 sub-bands of the low resolution wavelet coefficients and outputs are residuals (missing details) of 4 sub-bands of high resolution wavelet coefficients. Wavelet coefficients and wavelet residuals are used as input and outputs of our network to further enhance the sparsity of activation maps. A key benefit of such a design is that it greatly reduces the training burden of learning the network that reconstructs low frequency details. The output prediction is added to the input to form the final SR wavelet coefficients. Then the inverse 2d discrete wavelet transformation is applied to transform the predicted details and generate the SR results. We show that DWSR is computationally simpler and yet produces competitive and often better results than state-of-the-art alternatives.

Orthogonally Regularized Deep Networks

We propose a novel network structure for learning the SR mapping function in an image transformation domain, specifically discrete cosine transformation (DCT). The DCT is integrated into the network structure as a convolutional DCT (CDCT) layer which is trainable while maintaining its orthogonality properties with the orthogonality constraints. This orthogonally regularized deep SR network (ORDSR) simplifies the manifold of the SR task by taking advantage of DCT domain. Moreover, the CDCT layer generates the DCT frequency maps allowing the ORDSR to focus on reconstructing the fine details from the LR input.

Deep Image Super-resolution Via Natural Image Priors

We explore the use of image structures and physically meaningful priors in deep structures in order to achieve bet- ter performance. We address the problem of super- resolution from a deep learning standpoint when abundant training is not available. We propose to regularize deep struc- tures with prior knowledge about the images so that they can capture more structural information from the same limited data.

Deep MR Image Super-Resolution Using Structural Priors

Unlike regular optical imagery, for MR image super-resolution generous training is often unavailable. We therefore propose the use of image priors, namely a low-rank structure and a sharpness prior to enhance deep MR image super-resolution. Our contributions are then incorporating these priors in an analytically tractable fashion in the learning of a convolutional neural network (CNN) that accomplishes the super-resolution task. Experiments performed on two publicly available magnetic resonance (MR) brain image databases exhibit promising results particularly when training imagery is limited.

Simultaneous Decomposition and Classification Network

We propose a Simultaneous Decomposition and Classification Network (SDCN) to eliminate noise interference, enhancing the classification accuracy. The network contains of two sub- networks: the decomposition sub-network handles denoising, while the classification sub-network discriminates targets from confusers. Importantly, both sub-networks are jointly trained as an end-to-end model. Experimental results show that the proposed network significantly outperforms a network without decomposition and SRC-related methods.

Deep Networks With Shape Priors For Nucleus Detection

Nuclei detection has been a topic of enduring interest with promising recent success shown by deep learning methods. These methods train for example convolutional neural networks (CNNs) with a training set of input images and known, labeled nuclei locations. Many of these methods are supplemented by spatial or morphological processing. We develop a new approach that we call Shape Priors with Convolutional Neural Networks (SP-CNN) to perform significantly enhanced nuclei detection.


104 Electrical Engineering East,
University Park, PA 16802, USA

Lab Phone: