Machine Learning with Python Part.2 ~Logistic Regression~

Implementing the logistic regression classifier in Python

What is logistic regression?

Logistic regression is a classification algorithm used to assign data to a discrete set of classes. There are multiple types of logistic regression: binary (yes/no, pass/fail), multinomial (cats/dogs/rats), and ordinal (small, medium, large). The purpose of binary logistic regression is similar to that of the Perceptron, but there is a key difference: the activation function.

Activation function for Perceptron: Binary step function

$$\phi (z)=\begin{cases} 1 & \text{ if } z>0\\ -1 & \text{ if } z\leq 0 \end{cases}$$

Activation function for logistic regression: Sigmoid function

$$\phi(z)=\frac{1}{1+e^{-z}}$$

Let's graph this function using NumPy and Matplotlib.
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# plot the sigmoid curve over the range [-7, 7)
x = np.arange(-7, 7, 0.1)
plt.plot(x, sigmoid(x))
plt.show()
sigmoid_plot.py

As you can see from the graph, the output of the sigmoid function lies between 0 and 1. Since a probability also lies in the range 0 to 1, the sigmoid function is used to interpret the model's output as a probability.

For the binary classification problem, we set 0.5 as a threshold and define the output as:

$$\hat{y}=\begin{cases}1 & \text{ if } \phi(z)\geq 0.5 \\ 0 & \text{ if } \phi(z)< 0.5 \end{cases}$$
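In NumPy, this thresholding can be written in one line; here is a minimal sketch with made-up probabilities (the complete classifier below applies the same rule at the end of the fit method):
import numpy as np

phi_z = np.array([0.1, 0.45, 0.5, 0.92])  # example sigmoid outputs
y_hat = np.where(phi_z >= 0.5, 1, 0)
print(y_hat)  # [0 0 1 1]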

How does logistic regression work?

Some of the steps are very similar to those of the Perceptron algorithm, so please refer to the link below if you are interested in the Perceptron algorithm or would like to review some basic concepts of machine learning.

1. Create and initialize the parameters of the network

2. Multiply weights by inputs and sum them up
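A minimal sketch of this step with toy values (the same computation appears as the net_input method in the complete class below):
import numpy as np

weight = np.array([[0.2, -0.5]])            # shape (1, n_x)
bias = np.zeros((1, 1))                     # shape (1, 1)
x = np.array([[1.0, 2.0], [3.0, 4.0]])      # input data, shape (n_x, n_samples)
z = np.dot(weight, x) + bias                # weighted sum, shape (1, n_samples)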

3. Apply activation function

As explained above, logistic regression uses the sigmoid function instead of the binary step function.
def sigmoid(self, z: np.array) -> np.array:
    """Sigmoid function.

    :param z: input of the activation function
    :return: output of the sigmoid function
    """
    return 1/(1+np.exp(-z))
sigmoid_function.py

4. Calculate the cost

We still use a cost function to measure the model's performance, but instead of the mean squared error we use the cross-entropy loss. The loss increases as the predicted probability diverges from the actual label.

$$-\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}log(\phi(z))+(1-y^{(i)})log(1-\phi(z)))$$

where $m$: number of examples, $y$: true label vector, $\phi(z)$: output prediction vector.
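The class below does not store the cost at each iteration, but as a reference the cross-entropy cost could be computed like this (a sketch; the small epsilon avoids taking the log of zero):
import numpy as np

def cross_entropy_cost(y, y_pred, epsilon=1e-8):
    # y: true label vector, shape (1, m); y_pred: predicted probabilities, shape (1, m)
    m = y.shape[1]
    return -np.sum(y * np.log(y_pred + epsilon)
                   + (1 - y) * np.log(1 - y_pred + epsilon)) / m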

5. Update the parameters

As with the Perceptron algorithm, logistic regression uses the gradient descent optimization algorithm to update the weights and bias. In order to do this, we need to calculate the derivatives of the cost function.

Derivative of the sigmoid function:
$$\phi'(z)=\frac{\mathrm{d} }{\mathrm{d} z}(\frac{1}{1+e^{-z}})=\frac{e^{-z}}{(1+e^{-z})^2}=\frac{1+e^{-z}-1}{(1+e^{-z})^2}=\frac{1}{1+e^{-z}}-\frac{1}{(1+e^{-z})^2}=\phi(z)(1-\phi(z))$$
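A quick numerical check of this identity (a self-contained sketch using a central-difference approximation):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # finite-difference estimate of the derivative
analytic = sigmoid(z) * (1 - sigmoid(z))
print(np.allclose(numeric, analytic))  # True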

Derivative of the cost function $J$ with respect to the activation $\phi(z)$ (denoted $a$):
$$\frac{\partial J}{\partial a}=\frac{\partial }{\partial a}\left(-\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}log(a)+(1-y^{(i)})log(1-a))\right)=\frac{1-y}{1-a}-\frac{y}{a}$$

Derivative of the cost function $J$ with respect to the weights:
$$dw=\frac{\partial J}{\partial w}=\frac{\partial J}{\partial a}\frac{\partial a}{\partial z}\frac{\partial z}{\partial w}=\frac{1}{m}\left(\frac{1-y}{1-a}-\frac{y}{a}\right)a(1-a)\,X^{T}$$

Derivative of the cost function $J$ with respect to the bias:
$$db=\frac{\partial J}{\partial b}=\frac{\partial J}{\partial a}\frac{\partial a}{\partial z}\frac{\partial z}{\partial b}=\frac{1}{m}\sum\left(\frac{1-y}{1-a}-\frac{y}{a}\right)a(1-a)$$

Gradient descent algorithm:
$$weights=weights-learning\_rate*dw$$
$$bias=bias-learning\_rate*db$$
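Multiplying the three factors above shows that the per-example term $\left(\frac{1-y}{1-a}-\frac{y}{a}\right)a(1-a)$ simplifies to $a-y$, so a single gradient-descent update can be sketched with toy values (the array shapes match the class below; the values themselves are made up for illustration):
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.5, 1.5, 2.5, 3.5]])        # input data, shape (n_x, m)
y = np.array([[0, 0, 1, 1]])                # true labels, shape (1, m)
a = np.array([[0.2, 0.4, 0.7, 0.9]])        # predicted probabilities phi(z), shape (1, m)
m = X.shape[1]
learning_rate = 0.1
weights = np.zeros((1, 2))
bias = np.zeros((1, 1))

dz = a - y                                  # ((1-y)/(1-a) - y/a) * a * (1-a) simplifies to a - y
dw = np.dot(dz, X.T) / m                    # shape (1, n_x)
db = np.sum(dz, axis=1, keepdims=True) / m  # shape (1, 1)
weights = weights - learning_rate * dw
bias = bias - learning_rate * db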

6. Repeat steps 2-5 until the cost function converges

7. The complete logistic regression code

import numpy as np

class Logistic_regression:
    """Logistic regression classifier

    === Public Attributes ===
    learning_rate:
        A hyper-parameter that controls how much we are adjusting the weights
        of the network with respect to the loss gradient.
    num_iterations:
        Number of iterations of the optimization loop.
    weight:
        This determines the strength of the connection of the neurons.
    bias:
        Bias neurons allow the output of an activation function to be shifted.
    """
    learning_rate: float
    num_iterations: int
    weight: np.array
    bias: np.array

    def __init__(self, learning_rate: float, num_iterations: int) -> None:
        """Initialize a new Logistic_regression with the provided
        <learning_rate> and <num_iterations>
        """
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weight = np.array([0])
        self.bias = np.array([0])

    def net_input(self, x: np.array) -> np.array:
        """Calculate the input of the activation function.

        :param x: input data, of shape (n_x, n_samples)
        :return: the input of the activation function
        """
        return np.dot(self.weight, x) + self.bias

    def predict(self, x: np.array) -> np.array:
        """Linear part -> Activation function (sigmoid).  Calculate the output prediction.

        :param x: input data, of shape (n_x, n_samples)
        :return: the output prediction
        """
        return self.sigmoid(self.net_input(x))
    
    def sigmoid(self, z: np.array) -> np.array:
        """Sigmoid function.
        
        :param z: input of the activation function
        :return: output of the sigmoid function
        """
        return 1/(1+np.exp(-z))
    
    def gradient(self, x: np.array, y: np.array, y_pred: np.array) -> tuple:
        """Calculate the gradients: dw and db
        
        :param x: input data, of shape (n_x, n_samples)
        :param y: true label vector, of shape (1, n_samples)
        :param y_pred: vector of predicted probabilities, of shape (1, n_samples)
        :return: derivatives of cost function: J with respect to weight: w and bias: b
        """
        m = x.shape[1]
        epsilon = 10**-8
        da = np.divide(1-y, 1-y_pred+epsilon) - np.divide(y, y_pred+epsilon)
        dz = da*y_pred*(1-y_pred)  # y_pred is already sigmoid(net_input(x))
        
        dw = np.dot(dz, x.T)/m
        db = np.sum(dz, axis=1, keepdims=True)/m
        return dw, db

    def fit(self, x: np.array, y: np.array) -> np.array:
        """Fit training data

        :param x: input data, of shape (n_x, n_samples)
        :param y: true label vector, of shape (1, n_samples)
        :return: the predicted class labels after training
        """
        y_pred = np.zeros((1, x.shape[1]))  # placeholder; overwritten in the training loop
        self.weight = np.random.randn(1, x.shape[0])*0.01
        self.bias = np.zeros((1, 1))

        for i in range(self.num_iterations):
            y_pred = self.predict(x)

            dw, db = self.gradient(x, y, y_pred)

            self.weight -= self.learning_rate*dw
            self.bias -= self.learning_rate*db

        return np.where(y_pred >= 0.5, 1, 0)
logistic_regression.py

Training the logistic regression model on the iris dataset

We are going to apply this logistic regression algorithm to the iris dataset, which is very similar to what we did with the Perceptron algorithm (please refer to the previous article).

The iris dataset consists of:
- 150 samples
- 3 labels (species of iris): $setosa, virginica, versicolor$
- 4 features: $sepal\:length, sepal\:width, petal\:length, petal\:width (in\:cm)$

For this example, we will use only $setosa$ and $versicolor$ as labels, and $sepal\:length$ and $petal\:length$ as features.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets

# acquire Data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# select only "setosa" and "versicolor"
# extract only "sepal length" and "petal length"
X = np.delete(X, [1, 3], axis=1)
delete_target = np.where(y == 2)
y = np.delete(y, delete_target)
X = np.delete(X, delete_target, axis=0)
y = np.where(y == 0, 0, 1)  # encode setosa as 0 and versicolor as 1 (the cross-entropy loss expects 0/1 labels)

# visualize the data using Pandas DataFrame
pd.set_option('display.max_rows', 9)
pd.DataFrame({'sepal length (cm)': X[:, 0], 'petal length (cm)': X[:, 1], 
              'target (1: versicolor, 0: setosa)': y}, 
              index=np.arange(1, len(X)+1))
iris_data.py

# plot data
setosa = np.where(y == 0)
versicolor = np.where(y == 1)
plt.scatter(X[setosa, 0], X[setosa, 1],
            color='red', marker='o', label='setosa')
plt.scatter(X[versicolor, 0], X[versicolor, 1],
            color='blue', marker='x', label='versicolor')
plt.title('Training a logistic regression model on the Iris dataset')
plt.xlabel('sepal length (cm)')
plt.ylabel('petal length (cm)')
plt.legend(loc='upper left')
plt.show()
iris_data.py

Now, we will create a $Logistic\_regression$ object by setting the learning rate and the number of iterations. Then, we train the logistic regression model by calling the $fit$ method with two arguments: the input data and the true label vector.
# training the logistic regression model
logit_model = Logistic_regression(learning_rate=0.1, num_iterations=100)
y_pred = logit_model.fit(X.T, y.reshape(1, -1))  #change y shape from (100, ) to (1, 100)
train_data.py
The result can be visualized by plotting the decision boundary together with the data. Again, we will use Matplotlib to show the data along with the boundary line learned by the model.
# plot the decision boundary and data
# the boundary is the line where w1*x1 + w2*x2 + b = 0, i.e. x2 = -(w1*x1 + b)/w2
x_ax = np.arange(X[:, 0].min()-1, X[:, 0].max()+1, 0.01)
w1 = logit_model.weight[:, 0]
w2 = logit_model.weight[:, 1]
b = logit_model.bias[0]
plt.plot(x_ax, -w1*x_ax/w2 - b/w2, color='black')

setosa = np.where(y == 0)
versicolor = np.where(y == 1)
plt.scatter(X[setosa, 0], X[setosa, 1],
            color='red', marker='o', label='setosa')
plt.scatter(X[versicolor, 0], X[versicolor, 1],
            color='blue', marker='x', label='versicolor')
plt.title('Training a logistic regression model on the Iris dataset')
plt.xlabel('sepal length (cm)')
plt.ylabel('petal length (cm)')
plt.legend(loc='upper left')
plt.show()

print('Accuracy: ' + str(np.mean(y_pred == y) * 100) + '%')
visualize_result.py