To apply machine learning, we need to train a model on the dataset we are using. For that we can use a Python library called scikit-learn, which is imported under the name sklearn. The process is simple and straightforward, and follows the steps below:
- First of all, install scikit-learn by running pip install scikit-learn in your terminal. The package is installed as scikit-learn but imported as sklearn.
- Next, import the necessary modules from sklearn. You will need the model_selection module for splitting the data into training and test sets, and the model you want to train from the linear_model, tree, or svm modules. For example, if you want to use a linear regression model, you will need to import LinearRegression from sklearn.linear_model.
- Load your dataset using pandas or any other method you prefer. If you are using pandas, you can use the read_csv() function to read the dataset into a pandas DataFrame.
- Split the dataset into training and test sets using the train_test_split() function from the model_selection module. This function takes your features and the target variable as input and returns the training and test portions of each.
- Now, it's time to train your model. Initialize an instance of the model you want to use, and call the fit() method on it. The fit() method takes the training features and the training target as input and fits the model to them.
- Once the model is trained, you can use it to make predictions on the test set. To do this, call the predict() method on the model, and pass in the test features as input. The predict() method returns an array of predictions.
- Finally, you can evaluate the performance of your model by comparing the predictions with the actual values in the test set. For a regression model, you can use metrics such as mean squared error (MSE) or mean absolute error (MAE), both available in sklearn.metrics; lower values mean the predictions are closer to the actual values.
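To make the two metrics in the last step concrete, here is a minimal sketch that computes them by hand in plain Python. The numbers are made up purely for illustration and are not taken from any real dataset; in practice you would normally call the ready-made functions in sklearn.metrics instead.

```python
# Hypothetical actual values and model predictions (made up for illustration)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 8.0]

# Per-sample errors: actual minus predicted
errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]

# MSE: average of the squared errors (penalizes large errors more heavily)
mse = sum(e ** 2 for e in errors) / len(errors)

# MAE: average of the absolute errors (same units as the target)
mae = sum(abs(e) for e in errors) / len(errors)

print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)
```

Because MSE squares each error, a single large miss raises it much more than it raises MAE, which is one reason to look at both.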
Here is an example of how you can use sklearn to train a linear regression model on a dataset:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
import pandas as pd
# Load the dataset
df = pd.read_csv("dataset.csv")
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df[["feature1", "feature2"]], df["target"], test_size=0.2)
# Initialize the model
model = LinearRegression()
# Train the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test set
predictions = model.predict(X_test)
# Evaluate the model's performance
mse = mean_squared_error(y_test, predictions)
mae = mean_absolute_error(y_test, predictions)
print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)
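The example above assumes a file called dataset.csv with columns named feature1, feature2, and target. If you want to try the same workflow without a file, the sketch below may help. It is only an illustration under stated assumptions: the data is synthetic (generated with make_regression), the sizes and noise level are arbitrary choices, and DecisionTreeRegressor is swapped in alongside LinearRegression simply to show that a model from the tree module follows the same fit() and predict() pattern.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a real dataset: 200 rows, 2 features (assumed values)
X, y = make_regression(n_samples=200, n_features=2, noise=10.0, random_state=0)

# random_state fixes the shuffle so the split is the same on every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two models that share the same fit()/predict() interface
models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)           # train on the training portion
    predictions = model.predict(X_test)   # predict on the held-out portion
    results[name] = {
        "mse": mean_squared_error(y_test, predictions),
        "mae": mean_absolute_error(y_test, predictions),
    }
    print(name, results[name])
```

Passing random_state to train_test_split is optional, but without it the split changes on every run, so the reported errors will vary slightly each time.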