Imaging Science MS Thesis Defense: Yuval Levental
MS Thesis Defense
LIDAR Voxel Segmentation Using 3D Convolutional Neural Networks
Yuval Levental
Imaging Science MS Candidate
Chester F. Carlson Center for Imaging Science, RIT
Abstract: Light detection and ranging (lidar) forest models are important for studying forest composition in detail and for tracking objects in the forest understory. Here we used DIRSIG, a first-principles, physics-based simulation tool, to turn lidar data into voxels. A voxel is a 3D cube whose edge length represents a fixed distance. Each voxel was assigned to one of five classes: background, leaf, bark, ground, or (man-made) object. Voxel content was then predicted from the provided simulated and real lidar data. The inputs were 3D neighborhood cubes centered on each voxel, which capture the surrounding context. The data came from two sources: an unmanned aerial system (UAS)-based VLP-16 lidar, flown close to the canopy, and the National Ecological Observatory Network (NEON) airborne lidar system, flown 1,000 m above ground level. Several machine learning algorithms were implemented, with 3D CNNs shown to be the most effective. The Keras library was used because creating the layers with its Sequential model was deemed simplest. Predictions from the VLP-16 data were significantly more accurate than those from the NEON data because of the closer sampling distance to the canopy. After tuning, the VLP-16 data gave comparatively low precision values of 61% and 36% for leaves and bark, respectively, owing to their relatively random shapes. Ground and man-made objects, however, had precision values of 97% and 80%, respectively, due to their high signal intensities (1,064 nm) and rigid shapes. A sample of real NEON data was also used, though it primarily covered the forest canopy; most of the voxels from this data set were correctly predicted as leaves.
To improve accuracy, additional channels were added to the input voxels. One input that proved very useful was the set of local z-values of each input array. The Keras Tuner framework was then used to obtain improved hyperparameters. The learning rate was reduced by a factor of 10, which gave slower but steadier convergence towards accurate predictions. The resulting accuracies are promising, but there is room for improvement. Other ML algorithms that use 3D lidar point clouds should also be considered, and further segmentation of the forest classes is another possibility. Because there are different types of trees and bushes, tree and bush voxels (objects) could be given their own sub-classes, which would make predicting their shapes much easier. Overall, discovering a method for accurate object prediction has been the strongest result.
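As a rough illustration of the voxel grid and 3D neighborhood inputs described in the abstract, the NumPy sketch below bins lidar returns into voxels and cuts out the cube surrounding one voxel, with a second channel holding a local z-index. It is not the thesis code and does not use DIRSIG: the function names `voxelize` and `neighborhood`, the parameters `voxel_size` and `radius`, the use of per-voxel return counts as the first channel, and the reading of "local z-values" as the z-index within each cube are all assumptions made for this example.

```python
import numpy as np

def voxelize(points, voxel_size, grid_shape):
    """Count lidar returns per voxel. `points` is an (N, 3) array of x, y, z."""
    grid = np.zeros(grid_shape, dtype=np.float32)
    idx = np.floor(points / voxel_size).astype(int)
    # Drop returns that fall outside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx = idx[inside]
    # np.add.at accumulates correctly when several returns share a voxel.
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid

def neighborhood(grid, center, radius):
    """Return a (2r+1, 2r+1, 2r+1, 2) array: return counts plus local z-index."""
    # Zero-pad so cubes around edge voxels keep a fixed size.
    padded = np.pad(grid, radius, mode="constant", constant_values=0)
    x, y, z = (c + radius for c in center)
    counts = padded[x - radius:x + radius + 1,
                    y - radius:y + radius + 1,
                    z - radius:z + radius + 1]
    size = 2 * radius + 1
    # Assumed meaning of "local z-values": height index within the cube.
    local_z = np.broadcast_to(np.arange(size, dtype=np.float32), (size, size, size))
    return np.stack([counts, local_z], axis=-1)

# Hypothetical toy data: four returns, one of which lies outside the grid.
points = np.array([[0.1, 0.1, 0.1],
                   [0.2, 0.2, 0.2],
                   [1.5, 0.5, 2.5],
                   [9.0, 9.0, 9.0]])
grid = voxelize(points, voxel_size=1.0, grid_shape=(3, 3, 3))
cube = neighborhood(grid, center=(0, 0, 0), radius=1)
```

Each such cube would be one input sample for a 3D CNN, with the class of the center voxel as the target; the cube size and channel layout here are illustrative only.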
Intended Audience: Undergraduates, graduate students, experts, and anyone with an interest in the topic.
To request an interpreter, please visit https://myaccess.rit.edu
Event Snapshot
Who: Open to the Public
Interpreter Requested? No