Data science and business analytics are exciting fields with many new applications that could revolutionize our lives. One of the most important areas is forecasting business and economic activity. For example, banks use customer data to predict credit card default rates, retailers use online browsing behavior to predict advertisement click-through rates, and governments use analytics to reduce crime rates and traffic congestion.
This workshop begins with important classification algorithms, including decision trees, random forests, and gradient boosting machines such as XGBoost and LightGBM. We will discuss how to conduct exploratory visualization for data transformation, handle missing values, perform cross-validation, engineer features with domain knowledge, use grid search for hyperparameter tuning, and stack results from multiple prediction models. We will learn and practice R and Python by applying these algorithms to Kaggle’s well-known beginner tutorials, such as predicting real-estate prices, predicting future sales, or predicting the survivors of the Titanic disaster.
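To give a flavor of two of the techniques named above, the sketch below combines k-fold cross-validation with a grid search over one hyperparameter. It is only an illustration under stated assumptions, not the workshop's material: the model is a decision stump (a decision tree with a single split) written in plain Python, the hyperparameter `min_leaf` and all function names are hypothetical, and the data are a made-up toy set. In practice one would more likely rely on a library such as scikit-learn, XGBoost, or LightGBM.

```python
import random
from statistics import mean


def majority(labels):
    # Most common label; ties go to the smaller label so results are deterministic.
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(xs, ys, min_leaf):
    """Fit a one-split decision tree on a single numeric feature.

    Returns (threshold, left_label, right_label). `min_leaf` (a hypothetical
    hyperparameter for this sketch) is the minimum number of training
    samples required on each side of the split.
    """
    best = None  # (errors, threshold, left_label, right_label)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        l, r = majority(left), majority(right)
        errors = sum(y != l for y in left) + sum(y != r for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l, r)
    if best is None:
        # No admissible split: predict the overall majority class everywhere.
        m = majority(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(model, x):
    threshold, left_label, right_label = model
    return left_label if x <= threshold else right_label


def cross_val_accuracy(xs, ys, min_leaf, k=5, seed=0):
    # Shuffle the indices once, then deal them into k disjoint folds.
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_stump([xs[i] for i in train], [ys[i] for i in train], min_leaf)
        correct = sum(predict_stump(model, xs[i]) == ys[i] for i in fold)
        scores.append(correct / len(fold))
    return mean(scores)


def grid_search(xs, ys, grid, k=5):
    # Score every candidate value by cross-validation and keep the best one.
    results = {m: cross_val_accuracy(xs, ys, m, k=k) for m in grid}
    return max(results, key=results.get), results


# Toy data (made up for illustration): label is 1 when the feature is >= 10.
xs = list(range(20))
ys = [1 if x >= 10 else 0 for x in xs]
best_min_leaf, results = grid_search(xs, ys, grid=[1, 3, 20])
print(best_min_leaf, results)
```

Each candidate value is scored only on folds that were held out during fitting, which is why a setting that forbids any split (here `min_leaf=20`) scores worse than one that lets the stump find the class boundary.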
During the second phase, students will work in groups to participate in an active data competition on Kaggle.com. We will work on topics relevant to business analytics, based mostly on structured datasets. For example, the competitions open in October 2018 include “Google Analytics Customer Revenue Prediction” and “Using News to Predict Stock Movements”.
Students interested in this workshop should have basic knowledge of object-oriented programming and basic statistics.