Decision Tree

Decision Trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

For our case, think of the transfer season. Our decision tree will try to decide whether a player should be hired as a striker: if he has a good finishing score, he can be hired. As we did in hypothesis testing, the "finishing" score is divided into three categories: good, average, and bad. The "finishing" column was then replaced with a "forvet" (Turkish for striker) column. The tree will classify the forvet quality of a player according to different attributes.

Version #1

First, we tried to construct a decision tree from the whole dataset, but since it contains 17,995 players (rows), the resulting tree was very large and complicated. So we started with the Version #1 dataset instead.

Shuffling of the rows
data.PNG
Dataset used for Decision Tree algorithm

Our dataset looks like the one above: 12 feature columns and one "forvet" column as the label. We created the "forvet" column using the function below.

1.PNG
“Forvet” column creation

Then, we assigned "bad", "average", or "good" to each player according to their forvet value.

2.PNG
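
Since the function itself is only visible in the screenshot, here is a minimal sketch of the two steps it performs, assuming a pandas DataFrame loaded from a players file; the file name, the "finishing" column name, and the cutoffs 50 and 70 are hypothetical placeholders:

```python
import pandas as pd

def make_forvet_label(forvet_value):
    """Map a numeric forvet score to one of three quality labels."""
    if forvet_value < 50:        # hypothetical cutoff
        return "bad"
    if forvet_value < 70:        # hypothetical cutoff
        return "average"
    return "good"

data = pd.read_csv("players.csv")           # hypothetical file name
data["forvet"] = data["finishing"]          # "finishing" becomes "forvet"
data = data.drop(columns=["finishing"])
data["forvet"] = data["forvet"].apply(make_forvet_label)
```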

We did a 70/30 train-test split on the data and, after applying the decision tree algorithm, ended up with the tree below:

dt_common.png
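
In scikit-learn terms, the steps above look roughly like the sketch below; the random_state and the default classifier settings are our assumptions, since the original parameters are not shown:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# 70/30 split on the 12 features and the "forvet" label.
X = data.drop(columns=["forvet"])
y = data["forvet"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)   # random_state is an assumption

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Visualize the fitted tree (how a figure like dt_common.png is produced).
plot_tree(clf, feature_names=X.columns, class_names=clf.classes_, filled=True)
plt.show()
```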

First, we manually checked the accuracy of our decision tree on a couple of rows. For example, the player in the 289th row of our dataset has the following feature scores.

3.PNG

By following the rules in the decision tree, we can see that it classified him correctly.
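
The same spot-check can be done in code; a short sketch, reusing clf, X, and y from the snippet above, and assuming the 289th row refers to the positional index:

```python
# Compare the tree's prediction for the player in row 289 with the true label.
row = X.iloc[[289]]              # double brackets keep the row as a DataFrame
print("predicted:", clf.predict(row)[0])
print("actual:   ", y.iloc[289])
```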

Accuracy Score: 89%

Version #2

Having too many features can cause overfitting, so we wanted to see the results when we decrease the number of features used in our decision tree. We therefore eliminated most of the features from Version #1 and ended up with the dataframe below. The correlations between these features are very low; almost all of them are below 0.5.
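
Such a correlation check is easy to do with pandas; in the sketch below, the four feature names are hypothetical placeholders, since the actual columns are only visible in the screenshot:

```python
import numpy as np

# Hypothetical stand-ins for the four features kept in Version #2.
features = ["ball_control", "acceleration", "strength", "vision"]

corr = data[features].corr().abs()
np.fill_diagonal(corr.values, 0)   # ignore self-correlations on the diagonal
print(corr.round(2))
print("max pairwise correlation:", corr.values.max())
```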

5.PNG
Dataframe of Version #2
dt_common2.png
Decision tree with 4 features
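
Retraining on the reduced set follows the same recipe as Version #1; a minimal sketch, reusing data, y, and the hypothetical features list from the snippets above:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Same 70/30 split protocol as Version #1, now with only 4 features.
X2 = data[features]
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y, test_size=0.30, random_state=42)

clf2 = DecisionTreeClassifier()
clf2.fit(X2_train, y2_train)
print("Accuracy:", accuracy_score(y2_test, clf2.predict(X2_test)))
```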

Again, we manually checked the accuracy of our decision tree; below are the scores of the players at indices 289 and 56. If we follow their scores through the decision tree, we can see that both players are classified correctly.

6.PNG

Accuracy Score: 80%

Version #3

Since the decision tree is too complicated to analyze, we are not sharing it here, but its accuracy score is the best of the three versions.

Accuracy Score: 91%

7.PNG