Data Dialogue: Using machine learning to detect malicious webpages
Cybersecurity is quickly proving to be an area of huge potential for the application of machine learning (ML) and data science. Proofpoint is a leading cybersecurity and information security company dedicated to success and innovation in keeping our customers and their employees safe. Today, I'll present some of our current applied research using machine learning to detect malicious webpages, such as credential phishing attempts. This is an interesting area of ML application for three chief reasons. First, the adversarial relationship between cybersecurity companies and malicious actors makes protection a non-stationary problem: attackers adapt to defenses, so the data a model sees keeps shifting. Second, malicious webpages are rare, which leads to extreme class imbalance when creating datasets and training models. Lastly, the high rate of data production (billions of URLs per day) enables the training of complex ML models but also imposes strict throughput and latency requirements on inference. In addition, as a Duke PhD alumnus, I'll share some unsolicited career advice with individuals interested in pursuing data science careers in industry.