The activation function
In the last lesson a neuron produced a raw number that could be any size. On its own that number is just a straight-line combination of the inputs. To build something that can bend and curve, each neuron passes that raw number through an activation function, a fixed nonlinear function applied to its output.
Without a nonlinear activation, a deep stack of neurons collapses into a single linear function, no matter how many layers you add.
Squashing the sum
The activation function takes the neuron's raw output $z$ and returns the neuron's final output $a$:
$$a = f(z)$$
Here $z$ is the weighted sum from the previous lesson, $f$ is the chosen activation function, and $a$ is the activated output that the neuron actually passes on. Two common choices:
$$\mathrm{ReLU}(z) = \max(0, z)$$
ReLU returns $z$ when $z$ is positive and returns $0$ otherwise. It keeps positive signals and discards negative ones.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
This is the sigmoid. Here $e$ is the base of the natural exponential, and $z$ is the raw input. The output is always between $0$ and $1$, so it never blows up no matter how large $z$ becomes.
A worked example
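Take the value from the previous lesson, $z = -1$, and apply each function, ReLU and the sigmoid $\sigma$:
$$\mathrm{ReLU}(-1) = \max(0, -1) = 0$$
$$\sigma(-1) = \frac{1}{1 + e^{1}} \approx 0.269$$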
In code:
import math
# the neuron's raw weighted sum from the previous lesson
z = -1.0
# ReLU keeps positive values and replaces negatives with zero
relu = max(0.0, z)
# sigmoid squashes any number into the range between 0 and 1
sigmoid = 1 / (1 + math.exp(-z))
print(relu) # 0.0
print(sigmoid) # 0.2689414213699951
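The one-line sigmoid above is fine for moderate inputs, but in Python `math.exp(-z)` raises `OverflowError` once `z` is a large negative number such as `-1000`. The sketch below shows one common workaround: branch on the sign of `z` so the exponent passed to `math.exp` is never positive. The helper names `relu` and `stable_sigmoid` are illustrative and not part of the lesson.

```python
import math


def relu(z):
    # Keep positive values, replace negatives with zero.
    return max(0.0, z)


def stable_sigmoid(z):
    # Hypothetical helper: same function as 1 / (1 + exp(-z)),
    # rearranged so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so this is between 0 and 1
    return ez / (1.0 + ez)


# Try a spread of raw outputs, including extreme ones.
for z in (-1000.0, -1.0, 0.0, 1.0, 1000.0):
    print(z, relu(z), stable_sigmoid(z))
```

For the extreme inputs the result rounds to exactly `0.0` or `1.0` in floating point, which matches the idea that the sigmoid stays bounded however large the input becomes.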
Why the nonlinearity matters
Suppose neurons had no activation, so each layer just computed a weighted sum. Stacking two such layers on an input $x$ would give $W_2 (W_1 x) = (W_2 W_1) x$, where $W_1$ and $W_2$ are the two layers' weights. The product $W_2 W_1$ is itself just one matrix, so the whole stack reduces to a single weighted sum. Adding more layers changes nothing. The nonlinear $f$ placed between layers is exactly what breaks this collapse and lets depth add real power.
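The collapse can be checked numerically. The sketch below uses small made-up weight matrices `W1` and `W2` and an input `x` (illustrative values, not taken from the lesson), plus hypothetical plain-Python helpers `matvec`, `matmul` and `relu_vec`. It compares two stacked linear layers against the single merged matrix, then repeats the stack with ReLU in between.

```python
def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    # Multiply matrix A by matrix B (both lists of rows).
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu_vec(v):
    # Apply ReLU to every entry of a vector.
    return [max(0.0, vi) for vi in v]


# Illustrative weights and input, chosen only for this example.
W1 = [[1.0, -2.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0], [2.0, -1.0]]
x = [1.0, 1.0]

stacked = matvec(W2, matvec(W1, x))              # two linear layers
collapsed = matvec(matmul(W2, W1), x)            # one merged matrix
with_relu = matvec(W2, relu_vec(matvec(W1, x)))  # ReLU between the layers

print(stacked)    # [-1.0, -2.0]
print(collapsed)  # [-1.0, -2.0]
print(with_relu)  # [0.0, 0.0]
```

The first two results agree because the merged matrix does the same job as the two-layer stack. With ReLU in between, the output for this input no longer matches what that merged matrix produces.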
In the next lesson, we will line up many of these neurons side by side into a single layer and write the whole layer as one compact equation.