Registration of full motion video

Principal Investigator(s)

Emmett Ientilucci

Research Team Members

Liam Smith, Nina Raqueno

Project Description

In many cases, it is important to have location information associated with imagery. When the imaging platform provides position and pointing telemetry, it is trivial to compute this information. However, when this telemetry, which includes GPS, is deficient, the missing location information must be estimated, which creates a need to extract it from the motion imagery itself. Various techniques exist to compensate for a lack of GPS data in an aerial image matching context, including the use of additional information from other sensors. Our work focuses on how various machine learning techniques can be implemented and leveraged to perform image matching, or alignment, in real time.

Our investigations of feed-forward convolutional neural networks (CNNs) for image alignment have resulted in three approaches, called HNET, VWNET and MOFLNET. HNET is based on the literature: a feed-forward CNN that takes a pair of stacked grayscale images as input and produces a homography. VWNET is a modification of HNET that instead produces a coarse vector field describing the warping between the two images. MOFLNET is another modification of HNET that appends a set of transposed convolutions to the CNN; these produce a stacked grayscale image pair representing the optical flow between the two input images, an approach similar to models in the literature.
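
To make the first two output types more concrete, here is a minimal sketch. It is not the project's code. It applies a 3x3 homography (the kind of output HNET produces) to pixel coordinates, then samples the resulting displacements on a coarse grid, which is one way to picture a coarse vector field of the kind VWNET outputs. The function names, the grid spacing and the example matrix are all assumptions chosen for illustration. A learned vector field need not correspond to any single homography.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return u / w, v / w


def coarse_flow_field(H, width, height, step):
    """Sample the displacement (dx, dy) implied by H on a coarse pixel grid."""
    field = []
    for y in range(0, height, step):
        row = []
        for x in range(0, width, step):
            u, v = apply_homography(H, x, y)
            row.append((u - x, v - y))
        field.append(row)
    return field


# Hypothetical example: a pure translation of 5 pixels right and 3 pixels up.
H_shift = [
    [1.0, 0.0, 5.0],
    [0.0, 1.0, -3.0],
    [0.0, 0.0, 1.0],
]

# A 64x48 image sampled every 16 pixels gives a 3-row by 4-column field.
field = coarse_flow_field(H_shift, width=64, height=48, step=16)
```

For a pure translation, every grid cell carries the same displacement. A homography with rotation or perspective terms would give displacements that vary across the grid.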

Figures and Images