Introduction

The first Stat-a-thon will be held jointly by the UConn Statistics Department and NESS NextGen during the 33rd New England Statistics Symposium on May 15–17, 2019. Stat-a-thon is a statistical data science invention marathon. The trademark is owned by the University of Connecticut. Anyone with an interest in data science can attend a Stat-a-thon to approach real-world data science problems, some of which are local, in new and innovative ways. It emphasizes the statistical aspects of data science problems (insight, interpretation, significance, etc.) that are often overlooked in hackathons. NextGen is a newly formed committee within NESS that supports the next generation of statisticians and data scientists so that they can contribute to the betterment of the New England Statistical Society.



We would like to congratulate the winners of the Travelers Stat-a-thon 2019:

Theme 1: Connecticut Housing

Judge panel: Christopher Glynn, Tyler Kleykamp, Jeremy Teitelbaum, Yu Yue

  • 1st Place: Team Bentley
    • Madhurya Baruah, Bentley University
    • Pooja Sudheendra, Bentley University
    • Wenqi Wang, Bentley University
    • Yueyi Wang, Bentley University
  • 2nd Place: LLJW
    • Peng Jin, New York University
    • Myeonggyun Lee, New York University
    • Chihua Li, Columbia University
    • Bin Wang, New York University
  • 3rd Place: KAE-Group
    • Anhar Aloufi, University of Connecticut
    • Daniel Kpormegbey, University of Connecticut
    • Katherine Zavez, University of Connecticut
  • Honorable mention
    • Richard Dong, Easton Country Day School
    • Christa Malan, Easton Country Day School
Theme 2: Customer Retention

Judge panel: Eugene Evans, Jiafeng Sun, Kathleen Ziff

  • 1st Place: Rule of Three
    • Ariel Chernofsky, Boston University
    • Nina Orwitz, New York University
    • Serena Zhan, Columbia University
  • 2nd Place: Q.E.D.
    • Yizhou Mi, Columbia University
    • Shengjun Wang, Columbia University
    • Jiaxi Yang, Columbia University
    • Weibo Zhang, Columbia University
Timeline

    04 March, 2019,
    Monday

    Registration Open

    Online registration opens, data sets released online with instructions.

    25 March, 2019,
    Monday

    Registration Deadline for Individuals Looking for a Team

    Individuals looking to join an assigned team should register by this date, and we will provide your team information no later than April 1st.

    15 April, 2019,
    Monday

    Team Registration Deadline

    Teams or individual participants should register by this deadline; online registration will be closed at the end of the day.

    26 April, 2019,
    Friday

    Submission Deadline

    Deadline for teams to submit their work for the panelists to review.

    03 May, 2019,
    Friday

    Notification

    Finalist teams are selected and notified.

    16 May, 2019,
    Thursday

    Presentation

    Finalist teams present to the review panel in sessions of the NESS conference.

    17 May, 2019,
    Friday

    Award

    Awards are presented to winning teams at the closing ceremony.

    Themes and Data

    There are two themes for this Stat-a-thon. You may choose one that is interesting to you. Related data sets are provided for each theme. You are encouraged to use related data from other sources.

    Theme 1: Connecticut Housing

    For this theme, there are no true answers, and each team needs to select its own goal. A team does not have to answer all the questions below. You may choose some questions to focus on, and you are welcome to propose new questions.

    Housing has a significant impact on Connecticut’s economy and its ability to attract and retain people. In addition, as a result of the Great Recession, many homeowners have not seen the appreciation in equity that typically comes from owning a home. How can we better understand the impacts of the accessibility and affordability of housing? What is the impact on Connecticut’s economy? How can we help identify affordable and accessible places to live? What factors might affect housing prices?

    Challenge: Using the primary dataset below, combine it with additional data sources to find interesting insights, trends, correlations, relationships, or patterns in housing in Connecticut.

    Primary dataset: Real estate sales data provided by the State of Connecticut.

    Other datasets to consider: Mill Rates (tax rates); Affordable Housing

    (Data sets are provided by Tyler Kleykamp, Chief Data Officer, Office of Policy and Management, State of Connecticut)
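
    One way to start on the challenge above is to join the sales records to a second data source such as mill rates. The sketch below is only an illustration under stated assumptions: the field names (`town`, `year`, `assessed_value`, `sale_amount`), the town names, and all numbers are made up, and the real files may be organized differently. It relies on the fact that a mill rate is the tax owed per $1,000 of assessed value.

```python
from statistics import median

# Made-up illustrative records; the actual column names and values
# in the state's real estate sales file may differ.
sales = [
    {"town": "TownA", "year": 2017, "assessed_value": 100000, "sale_amount": 150000},
    {"town": "TownA", "year": 2017, "assessed_value": 200000, "sale_amount": 250000},
    {"town": "TownB", "year": 2017, "assessed_value": 150000, "sale_amount": 300000},
    {"town": "TownC", "year": 2017, "assessed_value": 90000, "sale_amount": 120000},
]

# Made-up mill rates keyed by (town, year). TownC is deliberately missing.
mill_rates = {("TownA", 2017): 40.0, ("TownB", 2017): 30.0}


def join_sales_with_mill_rates(sales, mill_rates):
    """Attach the mill rate and an estimated annual tax to each sale.

    Sales without a matching (town, year) mill rate are skipped.
    """
    joined = []
    for sale in sales:
        rate = mill_rates.get((sale["town"], sale["year"]))
        if rate is None:
            continue
        row = dict(sale)
        row["mill_rate"] = rate
        # A mill is one dollar of tax per $1,000 of assessed value.
        row["est_annual_tax"] = sale["assessed_value"] * rate / 1000
        joined.append(row)
    return joined


def median_price_by_town(rows):
    """Return the median sale amount for each town."""
    prices = {}
    for row in rows:
        prices.setdefault(row["town"], []).append(row["sale_amount"])
    return {town: median(values) for town, values in prices.items()}


joined = join_sales_with_mill_rates(sales, mill_rates)
medians = median_price_by_town(joined)
```

    Skipping unmatched sales keeps the sketch short; in a real analysis a team would probably want to count and inspect the unmatched rows rather than drop them silently.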

    Theme 2: Customer Retention

    For this theme, there are true answers, and a team should focus on proposing the best predictive model. A team's performance will be judged mainly on the predictive performance of the proposed method, measured by AUC, and on the quality of the code.
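
    For readers new to the metric, AUC (area under the ROC curve) equals the probability that a randomly chosen positive case, here a cancelled policy, receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. It is a minimal illustration, not the competition's evaluation code: the function name `auc` and the toy labels and scores are made up.

```python
def auc(labels, scores):
    """Compute AUC by comparing every positive score with every negative score.

    labels: iterable of 0/1 values (1 = policy cancelled, by assumption).
    scores: predicted scores, where higher means more likely to cancel.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example with made-up values.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

    The pairwise loop is quadratic in the number of cases, so it suits small checks only; on a full data set a rank-based implementation from a statistics library would be the practical choice.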

    Challenge: Using historical policy data, create a retention model to predict which policies are most likely to cancel, and to understand which variables are most influential in causing a policy cancellation.

    Training dataset: 4 years of property insurance policies from 2013 to 2017.

    Test dataset: Test data for property insurance policies.

    Data description: Variable descriptions.

    For more details about this theme, please register as a team or register to join a team for the Stat-a-thon, and we will send you a link to work on this challenge through Kaggle.

    (Data sets are synthetic, provided by Travelers)

    Logistics

    Registration

    All teams should register online. If you already have a team or want to participate as an individual, please register using the following link.

    Registration form for teams or individual participants.

    Each team may have up to four members, and each team should submit only one registration form listing the names of all team members.

    If you do not have a team but want to be a part of one, please use the following form to register. The organizers will try to match you up with similar participants.

    Registration form for individuals looking for a team.

    Report submission

    All teams should submit their work by the deadline (04/26/2019). Teams are encouraged to create a Git repository (e.g., on Bitbucket or GitHub) to host their source code and data information. However, this is not a review factor in the competition.

    Connecticut Housing Theme: Each team should submit a report along with other products such as program code used in the analysis, software products, links to other data sources if used, etc. The report should be in the format of presentation slides of up to ten pages. Finalist teams may extend their slides beyond ten pages for the presentation during the conference, but for the first-round submission, the number of slides should not exceed ten. A report should be submitted through TBA.

    Customer Retention Theme: Teams working on this theme should submit their work through Kaggle InClass using the link provided to them. No presentation slides are required in the first-round submission. Finalist teams are expected to create slides based on their work and give presentations in a session of the NESS conference.

    Team presentations

    Ten teams (five from each theme) will be selected as finalists and invited to give team presentations to the review panels in sessions on 05/16/2019. Travelers Insurance will cover the NESS conference registration fees for all members of the finalist teams. Each team will have twenty minutes to present its findings and products.

    FAQ

    Who can participate?

    Students from universities and high schools can participate. We will not distinguish among high school, undergraduate, and graduate students.

    Do I have to pay to participate?

    No. Participation in the Stat-a-thon is free. We will select five finalist teams from each theme to come and present at the NESS conference on May 16. Travelers Insurance will cover the conference registration fees for members of the finalist teams. However, team members may have to cover their own travel and/or lodging expenses.

    How big can a team be?

    Each team can have up to 4 participants.

    How can I form a team?

    Participants can form teams among peer students with common interests and/or complementary expertise. If you are not able to find a team yourself, you may either work individually or request to be assigned to a team with other participants who do not have one. This is an opportunity for you to meet and work with new people. A participant can be a member of only one team.

    When can I start working on the problem?

    You can start your work on the problem now.

    Can I work on both themes?

    Although we allow a team to work on both themes, we encourage you to focus on one theme in order to produce the best result. Note that each participant can be on only one team, so to work on both themes, the whole team must agree. If you don't have a team, we will assign you to one according to your preferred theme, which means you may work on only one theme.

    What programming language can I use?

    You can use any programming language or software package.

    Will there be prizes?

    Yes! There will be cash prizes for the 1st, 2nd, and 3rd place teams in both themes, ranging from $100 to $500.

    Where can I find the data?

    Direct links to the data sets for the Connecticut housing theme are available on the Stat-a-thon website under Theme 1. The customer retention theme uses Kaggle InClass for the Stat-a-thon. You can download the data directly from the link provided on the Stat-a-thon website under Theme 2.

    When do I need to finalize my team?

    Teams must be finalized no later than April 15th. If you are an individual looking to join an assigned team, you need to register by March 25th, and we will provide your team information no later than April 1st.

    Can a professor or another professional act as a team mentor?

    Yes, a professor or another professional can act as a team mentor. However, this person is not a member of the team, and cannot implement any work for the team.

    What are the judging criteria for finalists?

    Customer Retention Theme: Using the private Kaggle leaderboard, we will identify the teams with the most accurate models, compared against a gradient boosting machine benchmark. The code of the top teams on the leaderboard will be reviewed, and based on the model score and code review, we will select 5 finalist teams. We are looking for each team to provide a business recommendation based on the results of its model.

    Connecticut Housing Theme: The review panel will read team reports and code to evaluate the completeness, validity, and innovation of each analysis. Since there is no true answer in this theme, we will focus on whether teams appropriately apply their statistical knowledge and skills to identify real problems, tell a complete story, provide interesting insights, develop novel solutions, and/or create intuitive and informative displays. We will select 5 finalist teams in this theme.

    The 5 finalists from each theme will be invited to present their work at the symposium, and the winners will be selected from among them.

    Contact Info

    Committees

    HaiYing Wang (Chair), University of Connecticut

    Tyler Kleykamp, Office of Policy and Management, CT

    Dooti Roy, Boehringer Ingelheim

    Gregory Vaughan, Bentley University

    Kathy Ziff, Travelers

    Email

    For any further questions, please send them to

    statathon@gmail.com

    Sponsors

    CTdata
    NESS
    statds
    Travelers