The Roger-Ebert Bot
Predicting the Late Renowned Movie Critic's Rating With Regression Algorithms
The goal of this project is to use a linear regression model to predict the rating that the late, renowned movie critic Roger Ebert would give a film if he were alive today. I obtained my primary and secondary datasets through web scraping. Using numerical and categorical features, along with some feature engineering, I built several linear regression models. After multiple rounds of 5-fold cross-validation, dataset modifications, and further feature engineering, my final linear regression model with lasso regularization has an R-squared value of 0.396 and a mean absolute error of 0.55; in layman's terms, the prediction is off by about half a star on average.
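To make the two reported metrics concrete, here is a minimal sketch of how mean absolute error and R-squared are computed. The ratings below are hypothetical values invented for illustration, not data from this project, and the helper functions are plain-Python stand-ins rather than the code used to produce the results above.

```python
def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction error, in the same units as the rating."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in the true ratings that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical star ratings, invented for illustration only.
actual = [3.0, 2.5, 4.0, 1.5, 3.5]
predicted = [2.5, 3.0, 3.5, 2.0, 3.0]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MAE: {mae:.2f} stars, R-squared: {r2:.3f}")
```

In this toy example every prediction misses by exactly half a star, so the mean absolute error is 0.5, which is the same kind of "off by about half a star" reading given for the final model.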
Tools
- BeautifulSoup
- Selenium
- Python (NumPy, pandas)
- Matplotlib
- Seaborn
- scikit-learn
- statsmodels
Techniques/Algorithms
- Web scraping
- Linear Regression (Ridge, LASSO, Polynomial)
- Cross Validation for Model Selection
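The techniques listed above can be combined roughly as follows. This is a sketch under stated assumptions, not the project's actual pipeline: the data is synthetic, and the feature count, `alpha` values, polynomial degree, and random seeds are all placeholders chosen for illustration. It compares ridge, lasso, and polynomial regression by their mean R-squared over 5-fold cross-validation using scikit-learn.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the scraped film features and ratings (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.5 + 0.6 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Candidate models; the hyperparameters are illustrative, not tuned values.
candidates = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.05)),
    "polynomial": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    ),
}

# 5-fold cross-validation with shuffling, scored by R-squared.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mean_r2 = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in candidates.items()
}

# Pick the candidate with the highest mean cross-validated R-squared.
best = max(mean_r2, key=mean_r2.get)
for name, score in mean_r2.items():
    print(f"{name}: mean R-squared = {score:.3f}")
print("selected:", best)
```

Scaling sits inside each pipeline so that it is refit on the training folds only, which keeps information from the held-out fold out of the model during cross-validation.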