Updated July 27, 2019

# Machine Learning with Python Part.1 ~Perceptron~

Implementing the Perceptron algorithm in Python

## What is Perceptron?

The perceptron is one of the simplest types of artificial neural network; it was invented by Frank Rosenblatt in 1957. A single-layer perceptron is typically used for binary classification problems (1 or 0, Yes or No). The goal of the perceptron is to estimate the parameters that best predict the outcome from the input features. The optimal parameters give the best approximation of a decision boundary that separates the input data into two classes. Because a single-layer perceptron can only learn a linear decision boundary, data whose classes cannot be separated by a straight line (or hyperplane) requires a more complex model, such as the multi-layer neural networks used in deep learning.

## How the Perceptron works

### 1. Create and initialize the parameters of the network

In the perceptron, a single layer has two kinds of parameters: weights and biases.
The bias is an additional parameter that is added to the weighted sum of the inputs to shift the output. Unlike the weights, it is not multiplied by any input (equivalently, it is the weight on a constant input $x_0 = 1$), so it is often denoted as $w_0$.
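To see why the bias can be written as $w_0$, the short sketch below uses made-up numbers (3 features, 4 examples stored one per column) and computes the weighted sum twice: once with a separate bias, and once with the bias folded in as the weight of an extra constant input $x_0 = 1$. The two results agree.

```python
import numpy as np

# Made-up values: 3 input features, 4 examples (one example per column).
inputs = np.array([[1.0, 2.0, 0.5, -1.0],
                   [0.0, 1.0, 3.0,  2.0],
                   [4.0, 0.5, 1.0,  0.0]])
weights = np.array([0.2, -0.4, 0.1])
bias = 0.3

# Usual form: weighted sum plus a separate bias term.
z_separate = np.dot(weights, inputs) + bias

# Folded form: prepend a constant input x_0 = 1 and use the bias as weight w_0.
inputs_aug = np.vstack([np.ones((1, inputs.shape[1])), inputs])
weights_aug = np.concatenate([[bias], weights])
z_folded = np.dot(weights_aug, inputs_aug)

print(np.allclose(z_separate, z_folded))  # True
```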

Use small random values for the weights: `np.random.randn(*shape) * 0.01`
Use zero initialization for the biases: `np.zeros(shape)`
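A minimal sketch of this initialization for a single perceptron, assuming one weight per input feature and a single bias. `initialize_parameters` and `n_features` are hypothetical names, and the seed is fixed only to make the example reproducible. Note that `np.random.randn` takes the dimensions as separate arguments, while `np.zeros` takes a shape.

```python
import numpy as np

def initialize_parameters(n_features):
    # Hypothetical helper: one weight per input feature, one bias.
    weights = np.random.randn(n_features) * 0.01  # small random values
    bias = np.zeros(1)                            # bias starts at zero
    return weights, bias

np.random.seed(0)  # fixed seed only so the example is reproducible
weights, bias = initialize_parameters(3)
print(weights.shape, bias)  # (3,) [0.]
```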

### 2. Multiply weights by inputs and sum them up

Calculate the dot product of the weights and the inputs, and then add the bias term. This operation is easy to do with NumPy, the fundamental package for scientific computing in Python:
`np.dot(weights, inputs) + bias`
This sum, usually written $z$, is the input of the activation function and is often called the pre-activation value.
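As an illustration with made-up data (2 features, 3 examples, one example per column of `inputs`), a single `np.dot` call computes $z$ for all examples at once:

```python
import numpy as np

# Made-up data: 2 features x 3 examples (one example per column).
inputs = np.array([[1.0, -2.0,  0.5],
                   [3.0,  1.0, -1.0]])
weights = np.array([0.5, -0.25])
bias = 0.1

# Pre-activation z for every example at once: shape (3,).
z = np.dot(weights, inputs) + bias
print(z)
```

Each entry of `z` is one example's weighted sum plus the bias, e.g. $0.5 \cdot 1 + (-0.25) \cdot 3 + 0.1 = -0.15$ for the first column.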

### 3. Apply activation function

The activation function converts the input signal of a node into its output signal. In more detail, it restricts the output to a certain range or set of values, which makes it possible to assign each example to a category. There are many different types of activation functions. The perceptron uses the binary step function, a threshold-based function that outputs one of two class labels:
$\phi (z)=\begin{cases} 1 & \text{ if } z>0\\ -1 & \text{ if } z\leq 0 \end{cases}$
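The step function can be written with `np.where`, which applies the threshold element-wise to a whole vector of pre-activations; the name `step` is just an illustrative choice:

```python
import numpy as np

def step(z):
    # Binary step: 1 where z > 0, otherwise -1 (z == 0 maps to -1, as in the formula).
    return np.where(z > 0, 1, -1)

print(step(np.array([-0.15, 0.0, 0.6])))  # [-1 -1  1]
```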

### 4. Calculate the cost

In this implementation of the perceptron, the mean squared error (MSE) cost function is used to measure how badly the model is performing. Our goal is to find the parameters that minimize this cost.
The formula of MSE is:
$\frac{1}{2m}\sum_{i=1}^{m}(y_i-\hat{y}_i)^2$
where $m$: number of examples, $y$: true label vector, $\hat{y}$: output prediction vector.
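A small sketch of the formula, with a worked example assuming labels in $\{-1, 1\}$ to match the step function above. A single wrong prediction contributes $(1-(-1))^2 = 4$ to the sum, so with $m = 4$ the cost is $4/(2 \cdot 4) = 0.5$. `mse_cost` is a hypothetical helper name.

```python
import numpy as np

def mse_cost(y, y_hat):
    # (1 / 2m) * sum of squared errors, matching the formula above.
    m = y.shape[0]
    return np.sum((y - y_hat) ** 2) / (2 * m)

y = np.array([1, -1, 1, -1])
y_hat = np.array([1, 1, 1, -1])  # one wrong prediction (second example)
print(mse_cost(y, y_hat))  # 0.5
```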

### 5. Update the parameters

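We need to update the weights and bias using the derivatives of the cost function.
In this implementation, the parameters are updated with an optimization algorithm called gradient descent.
The gradients can be computed as follows:
$dw = \frac{1}{m}X(\hat{y}-y)$, in NumPy: `np.dot(inputs, y_hat - y) / m`
$db = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i-y_i)$, in NumPy: `np.sum(y_hat - y) / m`
where $X$ (`inputs`): input matrix with one column per example, $dw$: derivative of the cost function with respect to the weights, $db$: derivative of the cost function with respect to the bias.
The error appears as $\hat{y}-y$ because the derivative of $\frac{1}{2m}(y_i-\hat{y}_i)^2$ with respect to $\hat{y}_i$ is $\frac{1}{m}(\hat{y}_i-y_i)$; with this sign, subtracting the gradient lowers the cost. The step function has no useful derivative (it is zero wherever it is defined), so these formulas treat $\hat{y}$ as if it were the linear output $z$; plugging the step output back in for $\hat{y}$ gives the classic perceptron learning rule.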
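A sketch of the gradient computation, assuming `inputs` has shape `(n_features, m)` and `y`, `y_hat` have shape `(m,)`. It uses the error $\hat{y}-y$, the derivative of the $\frac{1}{2m}$-scaled squared error with respect to $\hat{y}$ (times $m$), so that subtracting `dw` and `db` is a descent step. `compute_gradients` is a hypothetical helper name and the data is made up.

```python
import numpy as np

def compute_gradients(inputs, y, y_hat):
    """Gradients of the (1/2m) squared-error cost w.r.t. weights and bias.

    inputs: (n_features, m), one column per example
    y, y_hat: (m,) true labels and predictions
    """
    m = y.shape[0]
    error = y_hat - y               # derivative of the cost w.r.t. y_hat, times m
    dw = np.dot(inputs, error) / m  # shape (n_features,)
    db = np.sum(error) / m          # scalar
    return dw, db

# Made-up data: 2 features, 2 examples; only the first example is wrong.
inputs = np.array([[1, 2],
                   [0, 1]])
y = np.array([1, -1])
y_hat = np.array([-1, -1])
dw, db = compute_gradients(inputs, y, y_hat)
```

Here only the first example is misclassified, so `dw` is $[-1, 0]$ and `db` is $-1$. Subtracting them (times the learning rate) raises the first weight and the bias, pushing that example toward a positive output.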

$weights = weights - \alpha \cdot dw$
$bias = bias - \alpha \cdot db$
where $\alpha$ (`learning_rate`): learning rate of the gradient descent update rule ($0 < \alpha < 1$)
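Putting steps 1 to 5 together, the sketch below trains a perceptron on a toy, linearly separable dataset: the logical AND of two binary inputs, with labels $-1$/$1$. The function names, the dataset, the learning rate of 0.1 and the 100 iterations are illustrative assumptions. Because `y_hat` is the step-function output, the update is the batch form of the classic perceptron learning rule rather than an exact gradient of the MSE.

```python
import numpy as np

def step(z):
    # Binary step activation: 1 if z > 0, otherwise -1.
    return np.where(z > 0, 1, -1)

def train_perceptron(inputs, y, learning_rate=0.1, n_iterations=100, seed=0):
    """Hypothetical end-to-end sketch of steps 1-5.

    inputs: (n_features, m) array, one column per example
    y:      (m,) array of labels in {-1, 1}
    """
    np.random.seed(seed)  # fixed seed only so the example is reproducible
    n_features, m = inputs.shape
    # 1. Initialize parameters
    weights = np.random.randn(n_features) * 0.01
    bias = np.zeros(1)
    costs = []
    for _ in range(n_iterations):
        # 2. Weighted sum plus bias
        z = np.dot(weights, inputs) + bias
        # 3. Binary step activation
        y_hat = step(z)
        # 4. MSE cost, recorded for monitoring
        costs.append(np.sum((y - y_hat) ** 2) / (2 * m))
        # 5. Gradients and update (y_hat is the step output, so this is
        #    the batch perceptron rule rather than an exact MSE gradient)
        error = y_hat - y
        dw = np.dot(inputs, error) / m
        db = np.sum(error) / m
        weights = weights - learning_rate * dw
        bias = bias - learning_rate * db
    return weights, bias, costs

def predict(inputs, weights, bias):
    return step(np.dot(weights, inputs) + bias)

# Toy, linearly separable data: logical AND of two binary inputs.
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])
y = np.array([-1, -1, -1, 1])

weights, bias, costs = train_perceptron(X, y)
print(predict(X, weights, bias))  # [-1 -1 -1  1]
print(costs[-1])                  # 0.0
```

With these settings the predictions match the labels well before the 100th iteration; from then on the error is zero and the parameters stop changing. For data that is not linearly separable the cost would not reach zero, and the fixed iteration count is what ends the loop.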
