DataFest 2019

Overview

A data hackathon for undergraduate students sponsored by the American Statistical Association.

Analyze. Work in teams to tackle what's probably the richest, most complex dataset you've seen so far, provided by a real-life organization. Students at any stage of their data science education and from any major are welcome.

Network. Meet data science professionals as well as students from other colleges and universities. Make connections that can enrich your education and help launch your career.

Experience. Take away a story to tell at future job interviews about how you met the challenges posed at DataFest, how you functioned under pressure, and how you would approach similar problems in the workplace.

Registration

Each team captain is responsible for registering their team here.

Once the team is registered, the captain should encourage all members to register INDIVIDUALLY here; otherwise, the team will be removed from the contest.

If you are a graduate student wishing to help with the mentoring of contestants, register here.

If you are a local faculty member or a data scientist in our community willing to help, register here.

Student Guidelines

Rochester Institute of Technology
Xerox Auditorium in the Kate Gleason College of Engineering (Building 09)

We recommend that every member of the team bring a laptop, if possible. You might find it helpful to have a mix of PCs and Macs, since they have different strengths. Make sure beforehand that the software you will be using throughout the weekend is properly installed and running on your computer. You will be working with a large dataset, so make sure that you have enough space for it on your drive.

We will have snacks and munchies. Feel free to bring anything additional you might want. You are of course free to come and go as you please, but the first night in particular (up until midnight) will be fairly structured.

You might want to bring some favorite statistical or computational reference books, if you have them, or bookmark some pages that you routinely refer to. 

The dataset you will be working with is quite large. If you type a variable name to view it, it will take a while to display.

Therefore, remember these R commands: head(), tail(), str(). 
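As a minimal sketch of how these commands let you inspect a large object without printing all of it: the data frame `big` below is a made-up stand-in, since the actual DataFest dataset and its column names are not described here.

```r
# Hypothetical stand-in for the DataFest data (names and sizes are assumptions).
set.seed(1)
big <- data.frame(
  id    = seq_len(100000),
  group = sample(c("a", "b", "c"), 100000, replace = TRUE),
  value = rnorm(100000)
)

head(big)      # first 6 rows only
tail(big, 3)   # last 3 rows only
str(big)       # compact summary: one line per column with its type
```

Each of these returns almost immediately, whereas typing `big` on its own would try to print the rows one after another.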

We strongly recommend that you create a small dataset to test things on. Then, if it works out, you can apply your procedure to the large dataset. Some procedures can take a frustratingly long time to run on large datasets, so while you wait it will be comforting to know that your procedure works (because you tested it on a smaller dataset). We recommend taking a random sample of rows from the original dataset, but there might be other approaches you find useful.
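One way to take such a random sample of rows in base R is sketched below. The data frame `big`, the sample size, and the `lm()` call are all illustrative assumptions; substitute your own data and procedure.

```r
set.seed(2019)  # fix the seed so the same sample can be drawn again

# Hypothetical stand-in for the full dataset.
big <- data.frame(id = seq_len(100000), value = rnorm(100000))

n_small <- 1000                       # assumed test-sample size
rows    <- sample(nrow(big), n_small) # row indices, drawn without replacement
small   <- big[rows, ]

# Test the procedure on the small sample first...
fit_small <- lm(value ~ id, data = small)

# ...then run the identical call on the full data once it works.
fit_big <- lm(value ~ id, data = big)
```

Keeping the call identical apart from the `data` argument makes it less likely that something breaks only on the full run.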

Sponsors

American Statistical Association (ASA)
WITR 89.7 
Google
RIT Graduate Statistics
RIT Data and Predictive Analytics Center
RIT College of Science

Cauchy Sponsor - $5,000

  1. Access to “Meet the Sponsors” Career Fair
  2. Access to the Resume Book
  3. Invitation to join the Email Listserv
  4. Full-page ad in the DataFest conference main booklet/conference program
  5. Large logo prominently placed on all DataFest banners
  6. Logo prominently placed on the DataFest poster
  7. Large logo and company link on the DataFest website
  8. Short company profile and link on DataFest social media (FB, Twitter, etc.)
  9. Company name displayed at the DataFest main conference desk during the event

Pareto Sponsor - $2,500

  1. Access to “Meet the Sponsors” Career Fair
  2. Access to the Resume Book
  3. Invitation to join the Email Listserv
  4. Medium logo placed on all DataFest banners
  5. Logo placed on the DataFest poster
  6. Medium logo and company link on the DataFest website
  7. Short company profile and link on DataFest social media (FB, Twitter, etc.)
  8. Company name displayed at the DataFest main conference desk during the event

Lognormal Sponsor - $1,000

  1. Access to the Resume Book
  2. Invitation to join the Email Listserv
  3. Small logo placed on all DataFest banners
  4. Logo placed on the DataFest poster
  5. Small logo and company link on the DataFest website
  6. Short company profile and link on DataFest social media (FB, Twitter, etc.)
  7. Company name displayed at the DataFest main conference desk during the event

Weibull Sponsor - $500

  1. Short company profile and link on DataFest social media (FB, Twitter, etc.)
  2. Acknowledgment of company on the DataFest website
  3. Company name displayed at the DataFest main conference desk during the event

Gauss Sponsor - $100

  1. Acknowledgment of company on the DataFest website
  2. Company name displayed at the DataFest main conference desk during the event

Uniform (Individual) Sponsor - $50

  1. DataFest 2017 Memorabilia

Rules

  • You can come and go as you please, but all work must be completed on premises. 
  • Before downloading the dataset you must sign the Non-Disclosure Agreement by agreeing to the terms of use and entering your name and email address. At the end of DataFest, delete all data from thumb drives, hard drives, etc. The data are sensitive. 
  • Should members of your team drop out at the last minute, you might be merged with another team that is also missing members. 
  • At all times between 9 a.m. and midnight, a friendly Consultant will be present. Consultants are faculty, grad students, or other professionals with field-specific knowledge of the dataset. They all have different areas of expertise, so if you get stuck on something, ask someone else later. Feel free to ask anything. This is not an exam, but a collaborative competition. Do not expect the Consultants to write code for you, or do data management, etc. They are there to help point you in the right direction, but you're responsible for getting there on your own. A schedule of Consultants will be made available at the beginning of the event. 

Judging

  • Each team will have five minutes to present their findings to the judges. 

  • Each team will be allowed at most three slides. Three. So at some point on Sunday, you might want to set aside time to think about what you want the judges to know. The five-minute time limit will be strictly enforced. All team members must be present for the presentation, but not all team members need to actually speak (given the time limitation). 

  • Your presentation must be emailed to Ernest Fokoué. Allowed formats: PDF, PowerPoint, Keynote. If using a web-based tool like Google Docs or Prezi, please export to PDF and send the PDF as your submission; you will not have time to log in and out of your account during the presentations. 

  • Along with your presentation you will also turn in a one-page write-up of your project. You can think about this as the text of your presentation. The judges will refer to these during deliberation. 

  • Awards will be given in three categories.

  • Best in Show: This is the main prize. We will give out two prizes in this category, one for graduate student teams and one for undergraduate teams. 

  • Best Visualization 

  • Best Use of Outside Data

Ambassadors

Daniel Jacob Behnke
Sergio Zygmunt
Alexandra Felker
Eddie Pei
Josie Shi
Xiang Xie
Italo Sayan
Nuthan Munaiah
Benjamin Meyers
Sailee Mathew Rumao
Jianfang Ma
Dennis Tao
James Spann