Data Dialogue: How Rare is Rare? Why Model Validation is Important
In the growing world of data science and analytics, organizations in every field are collecting and relying on more data than ever. However, we must be careful that the pendulum doesn't swing too far the other way, to the point where we torture the data into saying whatever we want. Much as clinical trials use placebos, we need to validate our results and models properly so we can tell whether our model results reflect a "placebo effect" and we are just getting lucky. This talk highlights target shuffling, a technique that tries to answer exactly that question: what is the probability that my results occurred due to random chance? First used to validate hedge fund strategies, this simulation-based technique answers the question for any cross-sectional data problem in an easily interpretable way.
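The core loop of target shuffling can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact procedure presented in the talk. The "model" here is a hypothetical search that keeps the single feature with the highest R² (squared correlation), standing in for whatever model-building process you actually use. The names `r_squared`, `best_score`, and `target_shuffle_pvalue` and the synthetic data are illustrative only. The idea is to score the model on the real target, repeatedly shuffle the target to destroy any genuine relationship, rerun the same search, and count how often a shuffled score matches or beats the real one.

```python
import random
import statistics


def r_squared(x, y):
    """R^2 of a one-variable least-squares fit (the squared Pearson correlation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_score(features, target):
    """Hypothetical 'model search': keep the best-fitting single feature."""
    return max(r_squared(col, target) for col in features)


def target_shuffle_pvalue(features, target, n_shuffles=1000, seed=0):
    """Return (observed score, fraction of shuffles scoring at least as well)."""
    rng = random.Random(seed)
    observed = best_score(features, target)
    shuffled = list(target)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between features and target
        if best_score(features, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Synthetic, illustrative data: 20 random candidate features, 50 rows.
data_rng = random.Random(42)
n_rows, n_features = 50, 20
features = [[data_rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]
# A target that genuinely depends on feature 0.
signal_target = [2 * v + data_rng.gauss(0, 1) for v in features[0]]
# A target that is pure noise, unrelated to every feature.
noise_target = [data_rng.gauss(0, 1) for _ in range(n_rows)]

obs_signal, p_signal = target_shuffle_pvalue(features, signal_target, n_shuffles=500)
obs_noise, p_noise = target_shuffle_pvalue(features, noise_target, n_shuffles=500)
print(f"signal: best R^2={obs_signal:.3f}, shuffle p~{p_signal:.3f}")
print(f"noise:  best R^2={obs_noise:.3f}, shuffle p~{p_noise:.3f}")
```

Because the same feature search is rerun on every shuffled target, the resulting fraction reflects how good a score chance alone tends to produce after that much searching, which is the "torturing the data" risk the technique guards against. Strictly, the fraction estimates how often chance produces a result at least this good, rather than the probability that this particular result is due to chance.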