Polynomial Regression

Polynomial regression models a non-linear relationship between variables by fitting a polynomial curve instead of a straight line. This guide covers how it works, how to evaluate a model with R-Squared, and how to predict new values, with Python examples built on scikit-learn.

How Does It Work?

Polynomial regression extends the concept of linear regression by introducing polynomial terms into the equation. Instead of fitting a straight line to the data, it fits a polynomial curve. This allows us to capture more intricate patterns in the data, making it a versatile tool for various real-world scenarios.
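
To make "polynomial terms" concrete, here is a minimal sketch of what a degree-2 expansion does to a single feature. For one input x the model becomes y = b0 + b1*x + b2*x^2, which is still linear in the coefficients b0, b1 and b2, so an ordinary linear regression can fit it once the extra columns exist. The three input values are arbitrary and chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Arbitrary illustrative inputs (not taken from any real dataset)
x = np.array([[1.0], [2.0], [3.0]])

# degree=2 expands each value x into the columns [1, x, x^2]
expanded = PolynomialFeatures(degree=2).fit_transform(x)
print(expanded)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```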


Imagine you have a dataset that represents the relationship between the number of years of experience and the salary of employees. While a linear regression model might be suitable for a simple analysis, a polynomial regression can better capture the nuances in salary growth. Let’s look at some Python code to illustrate this:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data: years of experience (X) and salary (y)
X = np.array([1, 2, 3, 4, 5, 6])
y = np.array([25000, 35000, 45000, 55000, 65000, 75000])

# Transforming data for polynomial regression
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X.reshape(-1, 1))

# Fit the polynomial regression model
model = LinearRegression()
model.fit(X_poly, y)

# Predicting a new value
new_experience = 7
predicted_salary = model.predict(poly.transform(np.array([[new_experience]])))

print(f"Predicted Salary for {new_experience} years of experience: ${predicted_salary[0]:,.2f}")

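This code fits a degree-2 polynomial regression in Python and uses it to predict an employee’s salary from years of experience. Note that the sample data here is exactly linear (salary rises by 10,000 for each extra year), so the fitted curve coincides with a straight line; with genuinely curved data, the squared term would carry real weight.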
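
One way to check how much curvature a fit has picked up is to look at the fitted coefficients. The sketch below refits the same sample data with the same degree-2 setup and prints them; the variable names simply mirror the example and carry no special meaning.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same sample data as in the example above
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([25000, 35000, 45000, 55000, 65000, 75000])

poly = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly.fit_transform(X), y)

# Feature columns are [1, x, x^2]; the constant term is reported in intercept_
print("intercept:", model.intercept_)
print("coefficients for [1, x, x^2]:", model.coef_)
# For this data the x^2 coefficient is approximately 0: the fit is a straight line
```

A squared-term coefficient close to zero suggests that a plain linear model would describe the data just as well.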

R-Squared: Evaluating Model Performance

R-Squared (R²) is a common metric for assessing goodness of fit in polynomial regression models. It measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting the mean.
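
The definition can be written as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean. As a small sketch, the snippet below computes it by hand and compares the result with scikit-learn's `r2_score`. The observed and predicted values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical observations and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares: 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 20.0
r2_manual = 1 - ss_res / ss_tot                 # 1 - 0.5 / 20 = 0.975

print(r2_manual, r2_score(y_true, y_hat))
```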


Continuing with our salary prediction example, let’s calculate the R² value for our polynomial regression model:

from sklearn.metrics import r2_score

# Predicted values
y_pred = model.predict(X_poly)

# Calculate R-squared
r_squared = r2_score(y, y_pred)
print(f"R-Squared: {r_squared}")
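
The score above is computed on the same data the model was fitted to, and on that data R² can only stay level or rise as the polynomial degree grows. A rough sketch of a more informative check is to hold out part of the data and compare scores for several degrees. Everything here is an assumption made for illustration: the synthetic quadratic data, the noise level, the 70/30 split and the degrees tried.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data, made up for illustration: a quadratic trend plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 2.0 + 1.5 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 3.0, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

train_scores, test_scores = {}, {}
for degree in (1, 2, 4):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_scores[degree] = model.score(X_train, y_train)  # R² on fitted data
    test_scores[degree] = model.score(X_test, y_test)     # R² on held-out data
    print(degree, round(train_scores[degree], 4), round(test_scores[degree], 4))
```

If the held-out score stops improving or drops while the training score keeps rising, the extra degrees are probably fitting noise.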

Predict Future Values

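A fitted polynomial regression model can also produce predictions for inputs outside the range of the existing dataset, which makes it usable for forecasting future trends from historical data. Polynomial curves can bend sharply beyond the observed range, however, so extrapolated values are best treated as rough estimates.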


Let’s say we want to predict the salary of an employee with 8 years of experience:

new_experience = 8
predicted_salary = model.predict(poly.transform(np.array([[new_experience]])))

print(f"Predicted Salary for {new_experience} years of experience: ${predicted_salary[0]:,.2f}")

By applying our trained polynomial regression model, we can estimate the salary for an employee with 8 years of experience.
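
When several future values are needed, it can be convenient to wrap the feature expansion and the regression in a single pipeline, so new inputs are transformed automatically. The sketch below is one way to do that with scikit-learn's `make_pipeline`, reusing the sample data from this article; the names `pipeline`, `future_years` and `forecast` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same sample data as in the earlier example
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([25000, 35000, 45000, 55000, 65000, 75000])

# The pipeline applies the degree-2 expansion before every fit and predict call
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline.fit(X, y)

# Years beyond the observed range of 1 to 6: these are extrapolations
future_years = np.array([[7], [8], [9]])
forecast = pipeline.predict(future_years)

for years, salary in zip(future_years[:, 0], forecast):
    print(f"{years} years: ${salary:,.2f}")
# Approximately 85,000, 95,000 and 105,000, since the sample data is linear
```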

In conclusion, polynomial regression in Python is a practical technique for modeling non-linear relationships between variables. By expanding the features with PolynomialFeatures, fitting a LinearRegression model, checking the fit with R-Squared, and predicting new values, you have the core workflow needed to apply it in data analysis and machine learning.