Introduction

Statathon will be held jointly by the UConn Statistics Department and the New England Statistical Society (NESS) across the 34th NESS Symposium (September 30–October 2, 2021) and NextGen Data Science Day (November 6, 2021). Statathon is a statistical data science invention marathon. Anyone with an interest in data science can attend Statathon to approach real-world data science problems, some of which are local, in new and innovative ways. It emphasizes the statistical aspects of data science problems (insight, interpretation, significance, etc.) that are often overlooked in hackathons. NextGen is a newly formed committee within NESS that supports the next generation of statisticians and data scientists contributing to the betterment of the New England Statistical Society.

Timeline

15 September, 2021,
Wednesday

Registration Open

Online registration opens, and the data sets are released online with instructions.

4 October, 2021,
Monday

Registration Deadline for Individuals Looking for a Team

Individuals looking to join an assigned team should register by this date, and we will provide your team information no later than October 7.

11 October, 2021,
Monday

Team Registration Deadline

Teams or individual participants should register by this deadline; online registration will be closed at the end of the day.

24 October, 2021,
Sunday

Submission Deadline

Deadline for teams to submit their work for the panel to review. Submissions close at 11:59 EDT.

27 October, 2021,
Wednesday

Notification

Finalist teams are selected and notified.

5 November, 2021,
Friday

Presentation

Finalist teams present to the review panel the day before NextGen Data Science Day.

TBD

Award

Awards to winning teams at the closing ceremony.

Theme and Data

Statathon 2021 focuses on the customer retention theme, with continued support from Travelers. The related data sets can be downloaded from this website or from Kaggle. You are encouraged to use related auxiliary data from other sources if necessary.

Theme: Customer Retention

For this theme, there are true answers, and a team should focus on proposing the best predictive model. A team's performance will be judged mainly on the predictive performance of the proposed method, measured by accuracy, and on the quality of the code. You can use Python's sklearn.metrics.accuracy_score to calculate the accuracy score for your model.
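As a minimal sketch of what the accuracy metric measures, the example below computes the fraction of policies whose predicted label matches the true label. The function name `accuracy` and the labels "renewed" and "cancelled" are hypothetical and chosen only for illustration; the actual label encoding is defined by the competition data. With its default settings, sklearn.metrics.accuracy_score(y_true, y_pred) returns the same fraction.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")
    # Count positions where the predicted label matches the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for four policies (not taken from the competition data).
y_true = ["renewed", "cancelled", "renewed", "renewed"]
y_pred = ["renewed", "renewed", "renewed", "cancelled"]

score = accuracy(y_true, y_pred)
print(f"accuracy = {score:.2f}")
```

Two of the four hypothetical predictions match, so the score here is 0.5. Note that when one class is much more common than the other, a model that always predicts the majority class can still score well on accuracy, so it is worth checking per-class results as well.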

Challenge: Using historical policy data, create a multiclass predictive model that predicts which policies are most likely to be canceled and which are most likely to be renewed, and identify which variables are most influential in causing a policy cancellation.

Training dataset: 4 years of property insurance policies from 2013 to 2017.

Test dataset: Test data for property insurance policies.

For more details about this theme, please register as a team or register to join a team for the Statathon, and we will send you a link to work on this challenge through Kaggle.

(Data sets are synthetic, provided by Travelers)

Logistics

Registration

All teams should register online. If you already have a team or want to participate as an individual, please register using the following link.

Registration form for teams or individual participants.

Each team may have up to four members, and each team should submit only one registration form listing the names of all team members.

If you do not have a team but want to be a part of one, please use the following form to register. The organizers will try to match you up with similar participants.

Registration form for individuals looking for a team.

Report submission

All teams should submit their work by the deadline (Oct/24/2021 11:59 EDT). Teams are encouraged to create a Git repository (e.g., on Bitbucket, GitHub, or GitLab) to host their source code and data information. However, this is not a review factor in the competition.

Connecticut Housing Theme: Each team should submit a report along with other products such as the program code used in the analysis, software products, and links to other data sources if used. The report should be in the format of presentation slides of up to ten pages. Finalist teams may extend their slides beyond ten pages for the presentation during the conference, but for the first-round submission, the number of slides should not exceed ten. A report should be submitted through TBA.

Customer Retention Theme: Teams working on this theme should submit their work through Kaggle InClass using the link provided to them. No presentation slides are required in the first-round submission. Finalist teams are expected to create slides based on their work and give presentations in a session of the NESS conference.

Team presentations

Ten teams (five from each theme) will be selected as finalists and invited to give a team presentation to the review panels on Nov/05/2021. Each team will have 20 minutes to present its findings and products.

FAQ

Who can participate?

Students from universities and high schools can participate. We will not distinguish among high school, undergraduate, and graduate students.

Do I have to pay to participate?

No. Participation in Statathon is free. We will select five finalist teams from each theme to present the day before NextGen Data Science Day. NextGen Data Science Day will be held virtually.

How big can a team be?

Each team can have up to 4 participants.

How can I form a team?

Participants can form teams with peer students who have common interests and/or complementary expertise. If you are not able to find a team yourself, you may either work individually or request to be assigned to a team with other participants who do not have one. This is an opportunity for you to meet and work with new people. A participant can be a member of only one team.

When can I start working on the problem?

You can start your work on the problem now.

What programming language can I use?

You can use any programming language or software package.

Will there be prizes?

Yes! There will be cash prizes for the 1st, 2nd, and 3rd place teams for both themes, ranging from $100 to $300.

Where can I find the data?

The customer retention theme uses Kaggle InClass for the Statathon. You can download the data directly from the link provided in the Theme and Data section of the Statathon website.

When do I need to finalize my team?

Teams must be finalized no later than October 11. If you are an individual looking to join an assigned team, you need to register by October 4, and we will provide your team information no later than October 7.

Can a professor or another professional act as a team mentor?

Yes, a professor or another professional can act as a team mentor. However, this person is not a member of the team and cannot implement any work for the team.

What are the judging criteria for finalists?

Customer Retention Theme: Using the private Kaggle leaderboard, we will rank teams by the accuracy of their models, compared against a gradient boosting machine benchmark model. The code of the top teams on the leaderboard will be reviewed, and based on the model score and code review, we will select 5 finalist teams. We are looking for each team to provide a business recommendation based on the results of its model.

The 5 finalists (or more) will be invited to present their work at the symposium, and the winners will be selected from among them.

Contact Info

Committee

Aolan Li, University of Connecticut

Daeyoung Lim (Chair), University of Connecticut

Tuhin Sheikh, University of Connecticut

Meiruo Xiang, University of Connecticut

Haiwei Zhou, University of Connecticut

Kathy Ziff, Travelers

Email

Please send any further questions to

statathon@gmail.com

Statathon 2019

Go to Statathon 2019

Sponsors

NESS
Travelers