Success Is No Accident
Kickstarter Campaign Success Prediction with ML Classification Algorithms
The goal of this project is to predict whether a Kickstarter campaign will succeed. The datasets come from the Web Robots website, from January through April 2021. I tried several classification algorithms: KNN, Logistic Regression, Decision Tree, Random Forest, Naive Bayes, and XGBoost. The final model was XGBoost, with an F1 score of 0.80 and an AUC of 0.82. I interpreted the model with SHAP values to understand which features matter most for success. Lastly, I built a Flask app around the final model after retraining it on the entire dataset.
Tools
- SQL
- Python (SQLAlchemy, NumPy, pandas)
- Matplotlib, Seaborn
- Tableau
- Scikit-learn
- Flask
Techniques/Algorithms
Classification Algorithms:
- K-Nearest Neighbors
- Logistic Regression
- Decision Tree, Random Forest
- Naive Bayes - Gaussian, Bernoulli
- XGBoost
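As a rough illustration of how such a comparison could be run, here is a minimal sketch that scores several of the listed classifiers by cross-validated ROC-AUC. It is not the project's actual pipeline: the data is synthetic (`make_classification`), the hyperparameters are defaults chosen for brevity, and XGBoost is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the campaign features (assumption: the real
# Kickstarter data is not used in this sketch).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Candidate models with illustrative, untuned settings.
models = {
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Mean ROC-AUC over 5 cross-validation folds for each model.
auc_by_model = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Print the models from best to worst mean ROC-AUC.
for name, auc in sorted(auc_by_model.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean ROC-AUC = {auc:.3f}")
```

Scoring every model on the same folds with the same metric keeps the comparison like-for-like before any tuning.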
Metrics Selection:
- ROC curve and AUC - for model comparison
- F1 score - balance between precision and recall
  - Minimize false positives: creators wouldn’t want the model to predict many successes that turn out to be failures
  - Minimize false negatives: backers would want the model to capture as many successful campaigns as possible
- Confusion matrix - shows predicted versus actual outcomes
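To make the link between these metrics concrete, the sketch below derives precision, recall, and F1 from the four confusion-matrix counts. It is written in plain Python for illustration only; the labels are made up (1 = successful campaign, 0 = failed) and the function names are hypothetical, not taken from the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means success."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted success, campaign succeeded
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted success, campaign failed
        elif predicted == 0 and actual == 1:
            fn += 1  # predicted failure, campaign succeeded
        else:
            tn += 1  # predicted failure, campaign failed
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1, returning 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for eight campaigns (illustrative only).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]

print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

Fewer false positives raise precision and fewer false negatives raise recall. F1 is the harmonic mean of the two, so it only stays high when both error types are kept low.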
Application Usage
Use Case 1
Flask demo from Crystal Huang on Vimeo.
Use Case 2
If the predicted outcome for a campaign is bleak, the app provides suggestions based on the SHAP feature importance.
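One way such suggestions could be derived is to take the per-feature SHAP contributions for a single prediction and surface advice for the features that push it most strongly toward failure. The sketch below assumes exactly that. The feature names, contribution values, tip texts, and the `suggest_improvements` helper are all hypothetical, and the source does not describe how the app actually selects its suggestions.

```python
def suggest_improvements(shap_contributions, tips, top_n=2):
    """Return tips for the features that push the prediction most toward failure.

    shap_contributions maps feature name -> SHAP value for one campaign;
    negative values are assumed to lower the predicted chance of success.
    """
    negative = [
        (name, value) for name, value in shap_contributions.items() if value < 0
    ]
    negative.sort(key=lambda item: item[1])  # most negative contribution first
    return [
        tips.get(name, f"Review the '{name}' setting.")
        for name, _ in negative[:top_n]
    ]


# Hypothetical per-feature contributions for one campaign (not real output).
example_contributions = {
    "goal_usd": -0.42,
    "campaign_days": -0.10,
    "category": 0.25,
    "blurb_length": -0.03,
}

# Hypothetical advice keyed by feature name.
example_tips = {
    "goal_usd": "Consider lowering the funding goal.",
    "campaign_days": "Consider adjusting the campaign duration.",
    "blurb_length": "Consider rewriting the blurb.",
}

suggestions = suggest_improvements(example_contributions, example_tips)
for tip in suggestions:
    print(tip)
```

Limiting the output to the top few negative contributors keeps the advice focused on the changes most likely to matter.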