Python project for identifying overvalued and undervalued outliers in professional soccer. Players are stratified by position and league, and a scikit-learn regression model is then trained on 35 advanced performance metrics scraped from FBref.com and stored in a SQL table. The model is trained to predict each player's market value, using the projected market values from Transfermarkt.com as the target. Each player's predicted market value is then compared to their projected market value from Transfermarkt, and players with a large gap between the two are identified as overvalued or undervalued.
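
A minimal sketch of the workflow described above, not the project's actual code. It assumes a plain `LinearRegression` as a stand-in for the regression model, hypothetical column names (`position`, `league`, `market_value`, and two made-up metrics instead of the full 35), synthetic data, and an arbitrary 25% gap as the cutoff for calling a player an outlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def flag_outliers(players, feature_cols, threshold=0.25):
    """Fit one regression per (position, league) group and label each player."""
    results = []
    for _, group in players.groupby(["position", "league"]):
        # Assumption: a linear model; the project may use a different regressor.
        model = LinearRegression()
        model.fit(group[feature_cols], group["market_value"])

        scored = group.copy()
        # Note: predictions here are in-sample, which keeps the sketch short.
        scored["predicted_value"] = model.predict(group[feature_cols])

        # Relative gap: positive means the model values the player
        # above the listed market value.
        gap = (scored["predicted_value"] - scored["market_value"]) / scored["market_value"]
        scored["label"] = np.select(
            [gap > threshold, gap < -threshold],
            ["undervalued", "overvalued"],
            default="fair",
        )
        results.append(scored)
    return pd.concat(results).sort_index()


# Synthetic stand-in data (values in millions); not real player data.
rng = np.random.default_rng(0)
players = pd.DataFrame({
    "position": ["FW"] * 20 + ["MF"] * 20,
    "league": ["Premier League"] * 40,
    "xg_per_90": rng.uniform(0, 1, 40),
    "key_passes_per_90": rng.uniform(0, 3, 40),
})
players["market_value"] = (
    10 + 20 * players["xg_per_90"] + 3 * players["key_passes_per_90"]
    + rng.normal(0, 2, 40)
)

result = flag_outliers(players, ["xg_per_90", "key_passes_per_90"])
print(result[["position", "market_value", "predicted_value", "label"]].head())
```

Fitting a separate model per group is one way to respect the position and league stratification, since the same metric can mean very different things for a forward and a defender.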

Valuable learning experiences from this project:
- Gained a deeper understanding of Python’s sklearn library
- Got more comfortable with SQL and Python’s sqlite3 library
- Learned more about Python’s pandas library
- Learned how to clean and reformat data
- Learned how to impute missing data using sklearn
- Learned ways to make web scraping more efficient
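
As an illustration of how the sqlite3, pandas, and sklearn pieces listed above can fit together, here is a small sketch. The table name `player_stats`, its columns, and the median strategy are assumptions made for the example, not details taken from the project.

```python
import sqlite3

import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical in-memory table standing in for the scraped FBref data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_stats (name TEXT, xg_per_90 REAL, tackles_per_90 REAL)")
conn.executemany(
    "INSERT INTO player_stats VALUES (?, ?, ?)",
    [
        ("Player A", 0.2, 1.0),
        ("Player B", None, 3.0),  # missing xg_per_90
        ("Player C", 0.6, None),  # missing tackles_per_90
    ],
)
conn.commit()

# Load the table into a DataFrame and make sure metric columns are floats,
# so SQL NULLs become NaN.
stats = pd.read_sql_query("SELECT * FROM player_stats", conn)
conn.close()
metric_cols = ["xg_per_90", "tackles_per_90"]
stats[metric_cols] = stats[metric_cols].astype(float)

# Replace each missing value with the median of its column (assumed strategy).
imputer = SimpleImputer(strategy="median")
stats[metric_cols] = imputer.fit_transform(stats[metric_cols])

print(stats)
```

Median imputation is only one option; `SimpleImputer` also supports `"mean"`, `"most_frequent"`, and `"constant"` strategies.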