Linear Regression

Linear regression is a basic and commonly used type of predictive analysis. The simplest form of the regression equation, with one dependent and one independent variable, is y = c + b*x, where y is the estimated dependent-variable score, c is the constant (intercept), b is the regression coefficient (slope), and x is the score on the independent variable.
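As a sketch of what fitting this equation means, the constant c and coefficient b can be computed in closed form by ordinary least squares. The data below is an illustrative toy series, not our player dataset:

```python
def fit_simple_linear(xs, ys):
    """Closed-form least-squares fit of y = c + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the line passes through the point of means
    c = mean_y - b * mean_x
    return c, b

# Toy data generated from y = 1 + 2*x, so the fit recovers c=1, b=2
c, b = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
```

In practice a library (e.g. scikit-learn's `LinearRegression`) does this fit for us; the closed form above is just the one-variable special case.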

Since simple linear regression takes only one input variable, we chose “free kick accuracy” as our input variable. As always, our dependent variable is “finishing”. We also checked whether there is a linear relationship between these two attributes.

9.PNG
Linear relationship between finishing and free kick accuracy
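Besides eyeballing the scatter plot, a quick numeric check for a linear relationship is the Pearson correlation coefficient (close to +1 or -1 means a strong linear relation). A minimal sketch in plain Python, with made-up values standing in for the “free kick accuracy” and “finishing” columns:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Perfectly linear toy data gives r = 1.0
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

With a dataframe, `df["free kick accuracy"].corr(df["finishing"])` (pandas) computes the same quantity.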

One thing to note here: since we did not use linear regression for classification, we did not create the “forvet” column in the dataframe. Linear regression makes predictions for continuous variables.

10.PNG
pairwise relations

After we fitted our model and made predictions, we got the following results.

11.PNG
Results of Linear Regression

Here, the variance score is measured by its closeness to 1: a score of 1 means a perfect prediction, so our 0.63 is reasonably good. In neural networks and decision trees we used the accuracy score to evaluate predictions, but accuracy applies only to classification. For regression we used the Mean Squared Error (MSE), which was 236 in our case.
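Both regression metrics can be reproduced in a few lines. A sketch with illustrative numbers, not our actual predictions:

```python
def r2_and_mse(y_true, y_pred):
    """Variance (R-squared) score and Mean Squared Error for predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Sum of squared residuals (prediction errors)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total variation of the targets around their mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, ss_res / n

# Nearly perfect toy predictions give R-squared close to 1 and a tiny MSE
r2, mse = r2_and_mse([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```

These match scikit-learn's `r2_score` and `mean_squared_error` on the same inputs.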

11.PNG

R-squared = Explained variation / Total variation

We have an R-squared value of 0.62, which is good: the model explains 62% of the variation of the response data around its mean. In general, the higher the R-squared, the better the model fits your data. Since the p-value of the F-statistic is below the significance level, we can also say that the linear regression model fits the data.
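The identity above can be checked numerically: for a least-squares line, the explained-variation ratio SS_reg/SS_tot equals 1 - SS_res/SS_tot, which is R-squared. A small sketch with toy data:

```python
def variation_split(xs, ys):
    """Return (explained/total variation, 1 - residual/total) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    c = mean_y - b * mean_x
    preds = [c + b * x for x in xs]
    ss_tot = sum((y - mean_y) ** 2 for y in ys)          # total variation
    ss_reg = sum((p - mean_y) ** 2 for p in preds)       # explained variation
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual variation
    return ss_reg / ss_tot, 1 - ss_res / ss_tot

# For any least-squares fit the two ratios coincide: both are R-squared
explained_ratio, r_squared = variation_split([0, 1, 2, 3], [1, 3, 4, 7])
```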

In order to see the effect of a non-linear relationship between two variables, we made another attempt with “jumping” instead of “free kick accuracy”.

12.PNG
jumping vs finishing

The results were as expected: the variance score was about 0 and the MSE was very high.
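This behaviour is easy to reproduce on toy data: for a perfectly symmetric non-linear relationship, the best-fitting line is flat, so the variance (R-squared) score drops to 0. A sketch, not our actual attribute data:

```python
def r2_of_linear_fit(xs, ys):
    """Fit y = c + b*x by least squares and return the R-squared score."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    c = mean_y - b * mean_x
    ss_res = sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# y = x**2 on symmetric inputs: zero linear correlation, so the line
# explains none of the variation and R-squared is 0
r2 = r2_of_linear_fit([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```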

13.PNG