Imaging Science Thesis Defense: Multimodal Data Fusion and Model Compression Methods for Computer Vision

Imaging Science Ph.D. Defense

Manish Sharma

Imaging Science Ph.D. Candidate
Rochester Institute of Technology

Register for Zoom Link Here


This dissertation investigates the optimization of convolutional neural networks (CNNs) for remote sensing (RS) and computer vision (CV), with the goals of improving object detection performance and computational efficiency. The first part addresses the limitations of conventional object detection methods in RS, which are hindered by small target sizes, limited training data, and diverse sensing modalities. We introduce YOLOrs, a novel CNN tailored for real-time, multimodal RS object detection. YOLOrs detects objects across a range of scales, predicts their orientations, and incorporates an efficient mid-level fusion architecture for handling multimodal data. Building on this concept of multimodal data fusion, we further propose a two-phase, multi-stream fusion technique for multimodal data with limited paired samples. It improves on traditional methods by first training each unimodal stream independently and then jointly training a multimodal decision layer, and it showed superior performance in empirical tests.

The second part focuses on reducing over-parameterization in CNNs, which often leads to excessive computational demands and overfitting. We present YOLOrs-lite, an adaptation of YOLOrs that represents its convolutional kernels in the Tensor-Train (TT) format. This significantly reduces the number of network parameters while maintaining robust detection performance, making the network well suited to edge deployment. We also extend TT compression to convolutional auto-encoders (CAEs), yielding CAE-TT, whose number of parameters can be adjusted without altering the network architecture and which proves effective in both batch and online learning settings. Lastly, we introduce a dynamic parameter rank pruning method that combines low-rank matrix approximations with novel regularization strategies to adjust ranks during training, reducing model size while preserving or improving performance on benchmark datasets.

Collectively, this research advances deep learning (DL) applications in RS and CV by developing methods that deliver high performance and efficiency across diverse platforms.
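As a rough illustration of why the Tensor-Train (TT) format reduces parameter counts, the Python sketch below counts the entries a TT decomposition stores for one convolutional kernel. It is not the YOLOrs-lite or CAE-TT implementation: the 64-input, 64-output, 3x3 kernel, its hypothetical reshaping into eight modes of sizes 4, 4, 4, 3, 3, 4, 4, 4, and the helper names tt_ranks and tt_param_count are assumptions chosen for the example. A TT decomposition stores one core of shape (r_{k-1}, n_k, r_k) per mode, with boundary ranks r_0 = r_d = 1, so its size is governed by the chosen ranks rather than by the layer's input and output shape.

```python
from math import prod


def tt_ranks(mode_sizes, max_rank):
    """Return TT ranks r_0..r_d, with r_0 = r_d = 1.

    Each internal rank r_k is capped at min(n_1 * ... * n_k, n_{k+1} * ... * n_d),
    the largest possible rank of the k-th unfolding of the full tensor, so an
    exact TT decomposition never needs more.
    """
    d = len(mode_sizes)
    ranks = [1]
    for k in range(1, d):
        bound = min(prod(mode_sizes[:k]), prod(mode_sizes[k:]))
        ranks.append(min(max_rank, bound))
    ranks.append(1)
    return ranks


def tt_param_count(mode_sizes, max_rank):
    """Count the entries stored by TT cores G_k of shape (r_{k-1}, n_k, r_k)."""
    r = tt_ranks(mode_sizes, max_rank)
    # With 0-based k, core k has shape (r[k], n, r[k + 1]).
    return sum(r[k] * n * r[k + 1] for k, n in enumerate(mode_sizes))


if __name__ == "__main__":
    # Hypothetical reshape of a 64x64x3x3 kernel: each 64-channel axis is
    # split into 4 x 4 x 4, and the two 3-wide spatial axes sit in the middle.
    kernel_modes = [4, 4, 4, 3, 3, 4, 4, 4]
    print("dense:", prod(kernel_modes))
    for rank in (2, 4, 8):
        print(f"TT, max rank {rank}:", tt_param_count(kernel_modes, rank))
```

Run as a script, this prints 36864 entries for the dense kernel and 104, 384, and 1184 entries for maximum TT ranks of 2, 4, and 8. The rank is the only knob being turned: the dense kernel shape seen by the rest of the network stays the same, which is one way to read the point that CAE-TT adjusts its parameter count without altering the architecture. At full rank the cores would hold more entries than the dense kernel, so the savings come entirely from choosing small ranks.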

Intended Audience:
All are Welcome!

To request an interpreter, please visit

Lori Hyde
Event Snapshot
When and Where
June 11, 2024
9:00 am - 10:00 am
Room/Location: Zoom

This is an RIT Only Event
