DataFest 2022



A data hackathon for undergraduate students sponsored by the American Statistical Association.

Analyze. Work in teams to tackle what's probably the richest, most complex dataset you've seen so far, provided by a real-life organization. Students at any stage of their data science education and from any major are welcome.

Network. Meet data science professionals as well as students from other colleges and universities. Make connections that can enrich your education and help launch your career.

Experience. Take away a story to tell at future job interviews about how you met the challenges posed at DataFest, how you functioned under pressure, and how you would approach similar problems in the workplace.


Each team captain is responsible for registering their team here.

Once the team is registered, the captain should encourage all members to register INDIVIDUALLY here; otherwise, the team will be removed from the contest.

If you are a graduate student wishing to help with the mentoring of contestants, register here.

If you are a local faculty member or a data scientist in our community willing to help, register here.

Student Guidelines

This year's competition will be held remotely via Zoom.

We recommend that every team member have a desktop or laptop available for use during the competition. You might find it helpful to have a mix of PCs and Macs since they have different strengths. We recommend that you make sure the software you will be using throughout the weekend is installed correctly and running on your computer before the competition. You will be working with a large dataset so make sure that you have the space for it on your drive.

You might want to have some of your favorite statistical or computational reference books ready to be used if you have them, and bookmark the pages that you regularly use.

The dataset you will be working with is quite large. If you type a variable name to view it, it will take a while to display.

Therefore, remember these R commands: head(), tail(), str(). 
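As a quick illustration of those three commands, here is a minimal sketch. It uses R's built-in mtcars data frame as a stand-in, since the actual DataFest data is not available here; the name dat is just a placeholder.

```r
# Stand-in for the DataFest data: R's built-in mtcars data frame
# (an assumption for illustration only).
dat <- mtcars

head(dat)      # first 6 rows by default
tail(dat, 3)   # last 3 rows
str(dat)       # compact summary: one line per variable with its type
```

Each of these prints only a small slice or summary, so they stay fast even when the full data frame would take a long time to display.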

We strongly recommend you create a small data set that you can use to test things on.  Then, if it works out, you can apply your procedure to the large dataset.  Some procedures can take a frustratingly long time to run on large data sets, and so it will be comforting to know that your procedure works (because you tested it on a smaller data set) while you wait.  We recommend taking a random sample of rows from the original data set, but there might be other approaches you find useful. 
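One way to take such a random sample of rows might look like the sketch below. The data frame big, its columns, the sample size of 1000, and the lm() call are all made up for illustration; substitute your own data and procedure.

```r
# Hypothetical large data frame standing in for the real dataset.
set.seed(2022)   # fix the seed so the same rows are drawn each time
big <- data.frame(id = 1:100000, value = rnorm(100000))

n_small <- 1000                      # size of the test set (arbitrary choice)
rows <- sample(nrow(big), n_small)   # row indices, drawn without replacement
small <- big[rows, ]

# Develop and debug the procedure on the small set first...
fit_small <- lm(value ~ id, data = small)

# ...then run the same call on the full data once it works.
fit_big <- lm(value ~ id, data = big)
```

Setting the seed is optional, but it lets every team member reproduce the same test set.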



  • Before downloading the dataset you must sign the Non-Disclosure Agreement by agreeing to the terms of use and entering your name and email address. At the end of DataFest, delete all data from thumb drives, hard drives, etc. The data are sensitive. 
  • Should members of your team drop out at the last minute, you might be merged with another team that is also missing members. 
  • Between 9 am and 12 midnight there will always be a friendly Consultant present. Consultants are faculty, grad students, or other professionals with field-specific knowledge of the dataset. They all have different areas of expertise, so if you get stuck on something, ask someone else later. Feel free to ask anything. This is not an exam, but a collaborative competition. Do not expect the Consultants to write code for you, do data management, etc. They are there to help point you in the right direction, but you're responsible for getting there on your own. The Schedule of Consultants will be made available at the beginning of the event. 


  • Each team will have five minutes to present their findings to the judges. 

  • Each team will be allowed at most three slides. Three. So at some point on Sunday, you might want to set aside time to think about what you want the judges to know. The five-minute time limit will be strictly enforced. All team members must be present for the presentation, but not all team members need to actually speak (given the time limitation). 

  • Your presentation must be emailed to Ernest Fokoué. Allowed formats: PDF, PowerPoint, Keynote. If using a web-based tool like Google Docs or Prezi, please export to PDF and send the PDF as your submission. You will not have time to log in to or out of your account during the presentations. 

  • Along with your presentation, you will also turn in a one-page write-up of your project. You can think about this as the text of your presentation. The judges will refer to these during deliberation. 

  • Awards will be given in three categories.

  • Best in Show: This is the main prize. We will give out two prizes for this category, one for graduate student teams and one for undergraduates. 

  • Best Visualization 

  • Best Use of Outside Data


  • Ahmed Toslim
  • Jason LaRuez
  • Karan Thacker