RIT cyber fighters go deep on Tor security

Gabrielle Plucknette-DeVito

Every week, the cybersecurity research team meets for a “scrum” to discuss the latest updates on their Tor security projects and bounce ideas for new attacks and defenses off each other.

Recognizing that the internet is not always secure, millions of people are turning to the Tor anonymity system as a way to browse the World Wide Web more privately.

However, Tor has been found to have its own vulnerabilities, including an attack known as website fingerprinting. This has a team of faculty and students from RIT’s Center for Cybersecurity researching the extent of the problem and ways to address it.

Led by Matthew Wright, director of the center, and supported by a series of projects funded by the National Science Foundation, the team aims to think like future attackers so it can develop defenses that will last. The result is a set of new attacks and defenses that use the latest advances in deep learning.

“Deep learning has proven to be effective in so many applications,” said Wright, who is also a professor of computing security. “From self-driving cars to voice recognition in smart home speakers—it’s just a matter of time before attackers take advantage of those same techniques.”

Privacy for all

With more than 8 million daily users, Tor has become a popular free tool for activists, law enforcement, businesses, the military, people living in countries with censorship, and even regular privacy-conscious individuals.

“When journalists need to communicate more safely with whistleblowers and dissidents, they often use Tor,” said Wright. “We need this more secure way to access the internet because it’s essential to our freedom of speech and privacy.”

Wright explained that Tor creates a secure browsing experience by encrypting all its connections and sending traffic on a path through several random servers, rather than making a direct connection to the user’s desired website. It protects against snooping on which sites a user visits, such as sites on sensitive issues like religion, health, or politics.

With the website fingerprinting attack, local eavesdroppers or internet service providers can collect the encrypted traffic and identify which website the user is visiting based on specific patterns in the traffic. While these eavesdroppers can’t see what a user did on the website, they have already learned something that the user is trying to protect.
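To make the idea concrete, here is a toy sketch in Python of how traffic patterns alone can point to a website. It is not the team's code. The traces, the site names, and the three hand-picked features are all assumptions made for illustration, and a real attack would use far more data and a far stronger classifier.

```python
import math

# A trace is a list of packet directions: +1 = sent by the user, -1 = received.
# The contents are encrypted, so direction and count are all the eavesdropper sees.

def features(trace):
    """Hand-picked summary features of one traffic trace (illustrative only)."""
    total = len(trace)
    if total == 0:
        return (0, 0.0, 0)
    incoming = sum(1 for d in trace if d < 0)
    # A burst is a run of consecutive packets going the same direction.
    bursts = 1 + sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return (total, incoming / total, bursts)

def guess_site(trace, labeled):
    """Return the label of the known trace whose features are closest."""
    f = features(trace)
    return min(labeled, key=lambda item: math.dist(f, features(item[1])))[0]

# Made-up traces the eavesdropper collected earlier by visiting each site.
labeled = [
    ("news-site", [1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1]),
    ("search-site", [1, -1, 1, -1, 1, -1]),
]

# A new encrypted trace observed from the victim.
observed = [1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1]
print(guess_site(observed, labeled))
```

The eavesdropper never decrypts anything. The guess comes entirely from how many packets flow in each direction and how they are grouped.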

Deep fingerprinting

Tor developers were considering two defenses against website fingerprinting that could cut the attack’s accuracy in half.

Payap Sirinam, a computing and information sciences Ph.D. student, was tasked with exploring the potential for deep learning in the website fingerprinting attack.

Adversaries are going to develop this technology themselves anyway, so the RIT team wanted to figure out how future attacks might work.

While the first website fingerprinting attack used machine-learning classifiers with manually developed features to analyze traffic, the team’s new attack would use deep learning, which extracts features automatically.

“You manually train a machine-learning computer to recognize patterns in web traffic that humans can’t see—that’s why it’s so good at this attack,” said Sirinam, who is from Thailand. “By using deep learning, attackers are essentially able to spend less time training, while finding even more patterns that they can use to identify a website.”

The RIT team’s new attack, called Deep Fingerprinting, was based on a Convolutional Neural Network (CNN) that was designed using cutting-edge deep-learning methods. The attack automatically extracts features from packet traces and does not require handcrafting features for classification.

After thousands of hours running trace experiments in a closed-world setting, the new attack outperformed all previous state-of-the-art website fingerprinting attacks. It identified websites with 98 percent accuracy against undefended Tor traffic. Even against existing defenses, Deep Fingerprinting had more than 90 percent accuracy.

The Deep Fingerprinting project included work from Sirinam; Professor Wright; Marc Juarez, a Ph.D. student at the Belgian research university KU Leuven; and Mohsen Imani, a former Ph.D. student of Wright’s at University of Texas at Arlington. A paper on the NSF-sponsored work was a finalist for an Outstanding Paper Award, placing it in the top 1 percent of all submitted papers, at the 2018 ACM Conference on Computer and Communications Security in Toronto.

“Now that we know which defenses aren’t going to work against the new top-level attacks, it’s up to us to create defenses that do,” said Sirinam.

Upping our defense

Nate Mathews, a fourth-year computing security major, finds it fun to work with really difficult and ambiguous problems. However, the dilemma he’s currently trying to solve is one that his mentor created.

Working together with Sirinam, Mathews is trying to better understand why the Deep Fingerprinting attack is so effective, in order to develop a defense that can stop it.

Mathews describes deep learning as a black box. Researchers put data in and output arrives at the other end. But it’s difficult to see the inner workings of the box.

“If we could figure out which data features the deep learning thinks is important, we can identify the particular regions to defend,” said Mathews, who is from Ross, Ohio.

To help visualize which parts of a trace are most important to the classification decision made by deep learning, the team is applying the GradCAM technique. Traditionally used in image classification, GradCAM generates heatmaps that show which parts of the trace the deep-learning algorithm is focusing on.

Using their findings, Mathews and Sirinam are proposing ways to add fake packets to these important parts of the trace, which can confuse the deep-learning algorithms.

“It’s like adding noise to a picture of a cat, so you can hide what kind of animal it is,” said Wright. “You can add noise to the entire picture, but that’s expensive in our setting. But if we can obscure the ears and the face, it might be enough.”
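The targeted approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's defense: the importance scores below are invented stand-ins for a GradCAM heatmap, and the function name and its parameters are hypothetical.

```python
def pad_important_regions(trace, importance, top_k=2, dummies_per_region=2):
    """Insert dummy incoming packets after the top_k most important positions.

    trace: list of +1 (outgoing) / -1 (incoming) packet directions.
    importance: one score per packet, standing in for a heatmap value.
    """
    if len(trace) != len(importance):
        raise ValueError("need one importance score per packet")
    # Indices of the highest-scoring positions in the trace.
    ranked = sorted(range(len(trace)), key=lambda i: importance[i], reverse=True)
    hot = set(ranked[:top_k])
    padded = []
    for i, direction in enumerate(trace):
        padded.append(direction)
        if i in hot:
            padded.extend([-1] * dummies_per_region)  # fake packets
    return padded

trace = [1, -1, -1, 1, -1, -1, -1, 1]
heat = [0.9, 0.1, 0.2, 0.8, 0.1, 0.1, 0.0, 0.3]  # hypothetical scores
padded = pad_important_regions(trace, heat)

# Extra packets cost bandwidth, so the sketch also reports the overhead.
overhead = (len(padded) - len(trace)) / len(trace)
print(padded, f"bandwidth overhead: {overhead:.0%}")
```

Padding only the two highest-scoring positions adds four fake packets to an eight-packet trace. Padding every position at the same rate would add 16, which is the kind of cost the targeted approach tries to avoid.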

Saidur Rahman, a computing and information sciences Ph.D. student, and Aneesh Yogesh Joshi, a computer science master’s degree student from India, are also developing a new defense strategy that is meant to trick the deep-learning classifier.

Known as the adversarial examples defense, it uses deep learning to add packets and modify website traces in a way that causes the classifier to misclassify.

“We borrowed the idea from the domain of computer vision, where you can distort patterns in the model,” said Rahman, who is from Bangladesh. “This defense can make Facebook traffic look like Google traffic.”

Before implementing any new defense, the team needs to complete thousands of experiments in closed-world and more realistic open-world settings. They also need to take bandwidth and latency overhead into account. If a defense slows the system to a crawl, users may find that the benefits no longer outweigh the cost.

Bolstering the attacks

Taking it one step further, the experts at RIT are trying to find other attacks they could use to test the robustness of their defenses.

They are developing Tik-Tok, an attack that uses packet timing information. Prior attacks discounted timing information because the characteristics change on each visit to a site, making it hard to extract patterns.

“We saw this as a largely untapped resource and something that might benefit from adding deep-learning classifiers,” said Rahman. “We selected and extracted eight new timing features that provide a lot of value.”

Preliminary results indicate that Tik-Tok could be a successful attack in the future.
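As a rough illustration of what a timing feature can look like, the Python sketch below groups packets into bursts and summarizes how long the bursts last and how far apart they are. These two features are assumptions chosen for simplicity. They are not the eight features the team selected, and the timestamps are made up.

```python
from statistics import median

def bursts(packets):
    """Group (timestamp, direction) packets into runs with the same direction."""
    groups = []
    for ts, direction in packets:
        if groups and groups[-1][0] == direction:
            groups[-1][1].append(ts)
        else:
            groups.append((direction, [ts]))
    return groups

def timing_features(packets):
    """Two illustrative burst-level timing features."""
    groups = bursts(packets)
    # How long each burst lasts, from its first packet to its last.
    durations = [times[-1] - times[0] for _, times in groups]
    # Time between the end of one burst and the start of the next.
    gaps = [b[1][0] - a[1][-1] for a, b in zip(groups, groups[1:])]
    return {
        "median_burst_duration": median(durations) if durations else 0.0,
        "median_gap_between_bursts": median(gaps) if gaps else 0.0,
    }

# (seconds since start, direction): +1 = sent by the user, -1 = received.
packets = [(0.0, 1), (0.25, -1), (0.5, -1), (0.75, -1),
           (1.25, 1), (1.5, -1), (2.0, -1)]
print(timing_features(packets))
```

Summaries like the median are one way to cope with the visit-to-visit variation that made raw timing hard to use, since they change less than individual packet times do.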

Sirinam is also developing a new attack and subsequent defense as the last part of his dissertation. Using a branch of deep learning that he borrowed from facial recognition, he plans to create an attack that is more realistic than Deep Fingerprinting.

While the Deep Fingerprinting model may require 1,000 examples from each website to classify correctly, the new approach, n-shot learning with triplet networks, allows a classifier to learn from only five examples.

“N-shot learning is like an eco-car that requires fewer resources and has reasonably good performance, while the sports car—like Deep Fingerprinting—requires rich resources in order to perform at its best,” said Sirinam. “This shows the danger of website fingerprinting attacks, even with less powerful adversaries, so we need to figure out a way to stop them.”
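The Python sketch below shows the general shape of the idea, under several assumptions. It uses the standard textbook form of the triplet loss, treats the trained network's output as ready-made two-dimensional vectors, and classifies by nearest average, which may differ from the classifier the team uses. All numbers and site names are invented.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull same-site traces together and push
    different-site traces at least `margin` further away."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

def centroid(vectors):
    """Average a list of equal-length vectors, coordinate by coordinate."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def n_shot_classify(query, support):
    """Pick the site whose few example embeddings average closest to the query.

    support maps a site name to a handful of embeddings (the "n shots").
    """
    centers = {site: centroid(vecs) for site, vecs in support.items()}
    return min(centers, key=lambda site: math.dist(query, centers[site]))

# Five made-up embeddings per site, as if produced by a trained triplet network.
support = {
    "site-a": [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0], [1.5, 1.5]],
    "site-b": [[8.0, 8.0], [9.0, 8.0], [8.0, 9.0], [9.0, 9.0], [8.5, 8.5]],
}
print(n_shot_classify([2.0, 3.0], support))

# The loss is zero once the different-site trace is far enough away.
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [6.0, 8.0]))
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 3.0]))
```

The expensive part, training the network with the triplet loss, happens once. After that, adding a new website to the attacker's list takes only a few example visits, which is why the approach suits adversaries with limited resources.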

Wright said that throughout these research projects, the Tor community has been an amazing partner and appreciative of RIT’s efforts. Many of these defenses could be implemented on Tor in the next two to three years.

“We know that our defenses will likely be broken in the future—that’s the nature of cybersecurity,” said Wright. “But we are coming up with solutions that will help people around the world stay safe for the time being, and I think that’s what really matters now.”

Global Cybersecurity Institute

Construction is underway for RIT’s Global Cybersecurity Institute, which will help the university become a nexus of cybersecurity education and research.

The three-story facility will include a cyber learning experience center, a simulated security operations center, labs, and offices. The institute will address the critical workforce needs in cybersecurity through education and professional development programs.

It is expected to open in July 2020 and will be the first facility of its kind in upstate New York.