Imaging Science Ph.D. Defense - Mahshad Mahdavi

Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

Mahshad Mahdavi
Imaging Science Ph.D. Candidate
Chester F. Carlson Center for Imaging Science, RIT

Abstract:
We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas in three steps. First, it constructs a Line-Of-Sight (LOS) graph over the input primitives (e.g., handwritten strokes or connected components in images). Second, it obtains class distributions for LOS nodes and edges using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, it extracts a Maximum Spanning Tree (MST) from the weighted graph using Edmonds' arborescence algorithm. QD-GGA does not require additional grammar or language models, and may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. We benchmark our system against state-of-the-art recognition systems for both handwritten and typeset math. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and InftyMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QD-GGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
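
As an illustration of the final step described in the abstract, the sketch below extracts a maximum spanning arborescence from a small weighted directed graph using the Chu-Liu/Edmonds procedure. It is a minimal sketch and not the QD-GGA implementation: the function name max_arborescence, the edge-score dictionary, the choice of node 0 as root, and the example weights are all assumptions made for illustration, and the code assumes every node is reachable from the root.

```python
def max_arborescence(edges, root):
    """Chu-Liu/Edmonds sketch (hypothetical helper, not from QD-GGA).

    edges: dict mapping (src, dst) -> weight for a directed graph.
    Returns a dict mapping each non-root node to its chosen parent.
    Assumes every node is reachable from `root`.
    """
    # Step 1: greedily pick the heaviest incoming edge of each non-root node.
    parent = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in parent or w > edges[(parent[v], v)]:
            parent[v] = u

    # Step 2: follow parent pointers to look for a cycle among the picks.
    cycle = None
    for start in parent:
        seen = []
        node = start
        while node in parent and node not in seen:
            seen.append(node)
            node = parent[node]
        if node in seen:
            cycle = seen[seen.index(node):]
            break
    if cycle is None:
        return parent  # the greedy picks already form a tree

    # Step 3: contract the cycle into a single supernode.
    in_cycle = set(cycle)
    supernode = ("cycle", tuple(cycle))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            # Entering the cycle at v replaces the cycle edge into v.
            key, adj = (u, supernode), w - edges[(parent[v], v)]
        elif u in in_cycle:
            key, adj = (supernode, v), w
        else:
            key, adj = (u, v), w
        if key not in new_edges or adj > new_edges[key]:
            new_edges[key] = adj
            origin[key] = (u, v)

    # Step 4: solve the smaller problem, then expand the supernode.
    contracted = max_arborescence(new_edges, root)
    result = {}
    for v, u in contracted.items():
        orig_u, orig_v = origin[(u, v)]
        result[orig_v] = orig_u
    entered = origin[(contracted[supernode], supernode)][1]
    for v in cycle:
        if v != entered:
            result[v] = parent[v]  # keep every cycle edge but one
    return result


# Hypothetical edge scores over three nodes; node 0 is the assumed root.
scores = {(0, 1): 5.0, (1, 2): 4.0, (2, 1): 6.0, (0, 2): 1.0}
tree = max_arborescence(scores, root=0)
```

Here the greedy picks 2→1 and 1→2 form a cycle, so the procedure contracts it and keeps 0→1 and 1→2, for a total weight of 9. A library routine such as NetworkX's maximum_spanning_arborescence could be used instead of a hand-written version.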

Intended Audience:
Undergraduates, graduate students, and experts, as well as anyone with an interest in the topic.

Contact

Beth Lockwood

ealpci@rit.edu

Event Snapshot

When and Where

August 07, 2020

11:00 am - 12:00 pm

Virtual

Room/Location: Zoom

Who

Open to the Public

Topics

research