Mei Nagappan's findings have big ramifications for developers

By Fran Broderick

When RIT professor Mei Nagappan was searching for a place to work as a postdoc, he knew he wanted to use the skills he had gained as a researcher mining the data in software repositories. However, he was surprised by the question his new boss, Ahmed Hassan, greeted him with upon his arrival at Queen’s University in Kingston. “He asked: ‘Do you want to start something completely new?’” says Nagappan, reflecting on the moment that would spark the discoveries he is making with his current research. “This was 2011, four years after the iPhone first launched. Hassan wanted to begin researching mobile development.”

Along with Hassan and students, Nagappan spent a lot of time analyzing the oceans of data created in software repositories, where researchers can find what went right and what went wrong in the development of a piece of software. The process is called ‘data mining,’ and it offers insights into how software development can be improved. The iOS App Store now presented Nagappan with another vast collection of data from which he could glean insights.

“[For mobile research] we looked at the iOS App Store as a data repository. It has apps, metadata – i.e., who the developer is, what permissions the app requests, how many times the app has been downloaded – and most importantly, the App Store is one central location for all the data. We don’t need to collect app reviews from different websites or solicit feedback. Reviews are unsolicited and unbiased. I personally think the smartphone leap made by Apple was not the iPhone – it was the iOS App Store.”

Nagappan supports his argument by noting that smartphones had already been in production for a few years when the iPhone launched, yet no company had introduced a single point-of-entry store. “App stores are important because they’re democratic – a student-developed app can get more hits than an app from a big studio,” says Nagappan. “Plus, it’s a great resource to mine,” he adds, smiling.

 

Nagappan’s initial foray into mobile data mining involved the study of 5,000 apps. Soon thereafter, he and his colleagues were able to simulate an Android device on a server, allowing them to download more than 100,000 apps from the Google Play store. One of the first things that caught his eye was the way apps were using ad networks, the middlemen that take banner advertisements from companies and serve them to audiences using an app. Nagappan noted that while there are a number of noteworthy ad networks, many apps seemed to be connecting to networks with no rhyme or reason – one app was connected to 28 ad networks. But this was only part of the problem.

“We asked ourselves whether the number of ad networks an app connected to was related to an app’s rating. We weren’t looking for causation, just trends. When we spotted certain libraries corresponding with poor ratings, we looked deeper and found these networks were the ones serving up malware.”
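The kind of trend check Nagappan describes can be illustrated with a rank correlation between the number of ad networks an app connects to and its rating. The following is only a minimal sketch and not the researchers' actual method: the `apps` data is invented for illustration, and the helper names (`ranks`, `pearson`, `spearman`) are hypothetical. Like the analysis in the quote, it shows association only, not causation.

```python
from statistics import mean

def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical (ad_network_count, average_rating) pairs, invented for illustration.
apps = [(1, 4.5), (2, 4.4), (3, 4.0), (5, 3.8), (8, 3.1), (28, 2.0)]
counts = [count for count, _ in apps]
ratings = [rating for _, rating in apps]

rho = spearman(counts, ratings)
print(f"Spearman rho: {rho:.2f}")
```

A value of rho near -1 would suggest that apps with more ad networks tend to have lower ratings in this sample. A rank-based measure is used here because a single outlier, such as an app with 28 ad networks, would dominate a plain linear correlation.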

In addition to creating a generally negative user experience, these troublesome ad networks have negative impacts on advertisers and developers. Nagappan uses a hypothetical advertiser to demonstrate his point: “Let’s say I’m Chevy and I want to run an ad. I go to a middleman and give them my ad. However, when you click on that ad, that network decides to go ahead and install Chevy’s app or some other piece of software on the user’s phone. Now, this is a gross invasion of privacy, but of course nobody reads the terms and conditions. So, the user is frustrated and he gives the app a poor rating. The developer now has a poor rating and may think it’s related to the app’s functionality – having no idea it was related to the ad experience.”

If you’re an app developer confident this won’t happen to you, Nagappan has some very sobering statistics. By his estimation, 30% of the apps that he analyzed are connected to ad networks that no longer exist. Moreover, many developers don’t realize that negative reviews can be the result of the device, not the app. “Fragmentation is a big issue right now,” says Nagappan, referring to the fact that an app developed for Android may not necessarily function across all Android devices. “Some developers won’t develop for Android because there are more than 19,000 Android devices.”

But again, data mined from app stores offers potential solutions: “We found one app where most of the bad reviews stemmed from Motorola Droid X users. The app wasn’t designed to work on Droid X in the first place. I think Google really helps developers with this. Developers can mine the information about which devices gave the app ratings. They can then prioritize the devices where the app gets most of its reviews and then test their app on those devices.”
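The prioritization Nagappan describes can be sketched as a simple tally of reviews per device. This is an illustrative sketch only: the review records, the `summarize_by_device` helper, and the two-star cutoff for a "low" rating are all assumptions made for this example, and the code does not use any real app store API.

```python
from collections import defaultdict

# Hypothetical review records, invented for illustration: (device_model, star_rating).
reviews = [
    ("Motorola Droid X", 1),
    ("Motorola Droid X", 2),
    ("Motorola Droid X", 1),
    ("Device A", 5),
    ("Device A", 4),
    ("Device B", 2),
    ("Device B", 5),
]

def summarize_by_device(reviews, low_threshold=2):
    """Return (device, review_count, low_rating_share) rows, most-reviewed first.

    low_threshold is an assumed cutoff: ratings at or below it count as "low".
    """
    totals = defaultdict(int)
    lows = defaultdict(int)
    for device, stars in reviews:
        totals[device] += 1
        if stars <= low_threshold:
            lows[device] += 1
    summary = [(device, totals[device], lows[device] / totals[device])
               for device in totals]
    # Sort by review count (descending), then by share of low ratings (descending).
    summary.sort(key=lambda row: (-row[1], -row[2]))
    return summary

summary = summarize_by_device(reviews)
for device, count, low_share in summary:
    print(f"{device}: {count} reviews, {low_share:.0%} low ratings")
```

Devices at the top of the list account for the most reviews and would be the first candidates for testing. A high share of low ratings on one of them, as with the Droid X in Nagappan's example, hints that the complaints may be about the device rather than the app itself.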

Nagappan’s findings are starting to command the attention of industry and academia. He recently presented at the Foundations of Software Engineering conference in Hong Kong, where the Q&A involved a lot less Q and a lot more imploring Nagappan to monetize his research. But Nagappan is humble, and his research is not finished. He has continued to find exciting trends in mobile app data and will soon be releasing another paper that will highlight the issues in the 5-star rating system used to rate apps.

He also continues to find humorous outlier apps that showcase the madness stemming from excess ad networks. “I found a calculator app that had eight releases. It’s a calculator – its core function isn’t changing – so what purpose can these updates have? It’s simple – they were just adding ad networks with each release.”

 

Students interested in working with Nagappan on analyzing data from app stores are urged to contact him at mei@se.rit.edu