Updated August 13, 2019

# Machine Learning with Python Part.2 ~Logistic Regression~

Implementing the logistic regression classifier in Python


## What is logistic regression?

Logistic regression is a classification algorithm that assigns data to a discrete set of classes. There are several types of logistic regression: binary (yes/no, pass/fail), multinomial (cats/dogs/rats), and ordinal (low, medium, high). The purpose of binary logistic regression is similar to that of the Perceptron, but there is one key difference: the activation function.

Activation function for Perceptron: Binary step function

$$\phi (z)=\begin{cases} 1 & \text{ if } z>0\\ -1 & \text{ if } z\leq 0 \end{cases}$$
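For comparison with the sigmoid introduced next, the step function above can be sketched in NumPy as follows. The name `binary_step` is an illustrative choice and not taken from the original implementation.

```python
import numpy as np


def binary_step(z):
    """Perceptron activation: 1 where z > 0, otherwise -1."""
    return np.where(np.asarray(z) > 0, 1, -1)


print(binary_step([-2.0, 0.0, 3.5]))  # [-1 -1  1]
```

Note that the output jumps directly between the two class labels, so it carries no information about how confident the prediction is.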

Activation function for logistic regression: Sigmoid function

$$\phi(z)=\frac{1}{1+e^{-z}}$$

Let's graph this function using NumPy and Matplotlib.

```python
# sigmoid_plot.py
import matplotlib.pyplot as plt
import numpy as np


def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


x = np.arange(-7, 7, 0.1)  # inputs from -7 to 7 in steps of 0.1
plt.plot(x, sigmoid(x))
plt.show()
```

As the graph shows, the output of the sigmoid function always lies between 0 and 1. Since a probability also lies in the range 0 to 1, the output of the sigmoid function can be interpreted as a predicted probability.

For the binary classification problem, we set 0.5 as a threshold and define the output as:

$$\hat{y}=\begin{cases}1 & \text{ if } \phi(z)\geq 0.5 \\ 0 & \text{ if } \phi(z)< 0.5 \end{cases}$$
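The thresholding rule above can be sketched as follows. This is a minimal illustration: the helper name `predict_label` and the sample net inputs are hypothetical. Because $\phi(0)=0.5$, thresholding the probability at 0.5 is equivalent to checking whether $z\geq 0$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict_label(z, threshold=0.5):
    """Map net inputs z to class labels 0/1 via the sigmoid output."""
    probabilities = sigmoid(np.asarray(z, dtype=float))
    return np.where(probabilities >= threshold, 1, 0)


z = np.array([-3.0, 0.0, 2.0])  # made-up net inputs for illustration
print(predict_label(z))  # [0 1 1]
```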

## How does logistic regression work?

Some steps are very similar to the Perceptron algorithm, so please refer to the link below if you would like to learn about the Perceptron algorithm or review some basic machine learning concepts.

### 3. Apply activation function

As already explained, logistic regression uses the sigmoid function instead of the binary step function.

```python
# sigmoid_function.py
import numpy as np


def sigmoid(self, z: np.ndarray) -> np.ndarray:
    """Sigmoid function (a method of the classifier class, hence `self`).

    :param z: input of the activation function
    :return: output of the sigmoid function
    """
    return 1 / (1 + np.exp(-z))
```

### 4. Calculate the cost

We still use a cost function to measure the model's performance, but instead of the mean squared error we use the cross-entropy loss. The loss increases as the predicted probability diverges from the actual label: for a true label of 1, a predicted probability of 0.9 costs $-\log(0.9)\approx 0.105$, while a predicted probability of 0.1 costs $-\log(0.1)\approx 2.303$.

$$-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(\phi(z^{(i)}))+(1-y^{(i)})\log(1-\phi(z^{(i)}))\right)$$

where $m$ is the number of examples, $y^{(i)}$ is the true label of the $i$-th example, and $\phi(z^{(i)})$ is the predicted probability for that example.
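A minimal NumPy sketch of this cost is shown below. The function name `cross_entropy_cost`, the sample arrays, and the `eps` clipping are assumptions added for illustration; the clipping is not part of the formula and only keeps the logarithm finite when a predicted probability is exactly 0 or 1.

```python
import numpy as np


def cross_entropy_cost(y, y_prob, eps=1e-15):
    """Mean cross-entropy between true labels y (0/1) and predicted probabilities."""
    y = np.asarray(y, dtype=float)
    # Assumption: clip probabilities away from 0 and 1 so log() stays finite.
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


y_true = np.array([1, 0, 1])
good = np.array([0.9, 0.1, 0.8])  # predictions close to the labels
bad = np.array([0.1, 0.9, 0.2])   # predictions far from the labels
print(cross_entropy_cost(y_true, good) < cross_entropy_cost(y_true, bad))  # True
```

The predictions that agree with the labels give a much smaller cost, which is the behaviour the training procedure relies on.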