Artificial Neural Network

  • Universal function approximator
  • Inspired by the neurons in the brain
  • Among the most powerful artificial intelligence and machine learning algorithms
Biological Neuron | Artificial Neuron
------------------|------------------
Dendrites | Inputs
Cell nucleus (computation unit) | Node (linear function and activation function)
Axon | Output
Synapse | Weight

Perceptron

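If the output and the target differ, the weights are updated so that the output moves closer to the target: $w_i \leftarrow w_i + \eta (t - o) x_i$ and $b \leftarrow b + \eta (t - o)$, where $o = f(\sum_i w_i x_i + b)$ is the output, $t$ the target and $\eta$ the learning rate.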

Variable | Meaning
---------|--------
$x_i$ | Inputs
$w_i$ | Weights
$b$ | Bias
$f$ | Activation function
$\eta$ | Learning rate
$o$ | Output
$t$ | Target
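The update rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function names `step` and `train_perceptron`, the step (threshold) activation, the zero initialization and the learning rate of 1.0 are all assumptions. Logical AND is used because it is linearly separable, so the perceptron is guaranteed to converge.

```python
import numpy as np

def step(z):
    # Threshold activation f: 1 if the net input is non-negative, else 0
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, t, eta=1.0, max_epochs=100):
    w = np.zeros(X.shape[1])  # weights w_i
    b = 0.0                   # bias b
    for _ in range(max_epochs):
        errors = 0
        for x_i, t_i in zip(X, t):
            o = int(step(np.dot(w, x_i) + b))  # output o
            if o != t_i:
                # Move the output toward the target: w <- w + eta (t - o) x
                w += eta * (t_i - o) * x_i
                b += eta * (t_i - o)
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return w, b

# Logical AND (hypothetical toy data)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, t)
print(step(X @ w + b))  # -> [0 0 0 1]
```

With a learning rate of 1.0 and 0/1 inputs every update is an integer step, which keeps the arithmetic exact; a smaller rate also converges on separable data.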

Stopping Rules

  • Maximum training time
  • Maximum number of training cycles (epochs)
  • High enough accuracy
  • Low enough error
  • Weight change threshold
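The rules above are usually checked once per epoch. A minimal sketch, assuming a hypothetical helper `should_stop` and arbitrary example thresholds (none of these names or values come from the notes), where `weight_change` is taken to be the largest absolute weight change during the last epoch:

```python
import time

def should_stop(epoch, start_time, accuracy, error, weight_change,
                max_seconds=60.0, max_epochs=100, target_accuracy=0.99,
                target_error=0.01, min_weight_change=1e-6):
    """Return the name of the first stopping rule that fires, or None."""
    if time.monotonic() - start_time > max_seconds:
        return "maximum training time"
    if epoch >= max_epochs:
        return "maximum number of epochs"
    if accuracy >= target_accuracy:
        return "high enough accuracy"
    if error <= target_error:
        return "low enough error"
    if weight_change < min_weight_change:
        return "weight change below threshold"
    return None  # keep training

start = time.monotonic()
print(should_stop(5, start, accuracy=0.80, error=0.30, weight_change=0.02))   # -> None
print(should_stop(5, start, accuracy=0.995, error=0.30, weight_change=0.02))  # -> high enough accuracy
```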

Cons

  • Can only represent a limited set of functions
  • Can only separate classes that are linearly separable (for example, it cannot learn XOR)
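XOR shows the limitation: no single line separates its two classes, so perceptron training never reaches an epoch without errors. A minimal sketch, assuming the same threshold perceptron as a standard update rule and a hypothetical cap of 1000 epochs:

```python
import numpy as np

def train_perceptron(X, t, eta=1.0, max_epochs=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, t_i in zip(X, t):
            o = 1 if np.dot(w, x_i) + b >= 0 else 0  # threshold activation
            if o != t_i:
                w += eta * (t_i - o) * x_i
                b += eta * (t_i - o)
                errors += 1
        if errors == 0:
            return w, b, True   # converged
    return w, b, False          # gave up after max_epochs

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t_xor = np.array([0, 1, 1, 0])
w, b, converged = train_perceptron(X, t_xor)
pred = (X @ w + b >= 0).astype(int)
print(converged, (pred == t_xor).mean())  # never converges; accuracy stays below 1
```

A hidden layer (the multilayer perceptron below) removes this limitation.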

Multilayer Perceptron

  • Feed-forward neural network (no cycle)
  • Input layer (i), one or more hidden layers (j), and output layer (k)
  • Trained to minimize an error (loss) function

Steps

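  1. Initialize weights and biases to small random values
  2. Forward propagation (from input to output)
    1. For each node j in the next layer, compute the weighted sum of its inputs plus its bias $\theta_j$: $I_j = \sum_i w_{ij} O_i + \theta_j$ (for the input layer, $O_i$ is the i-th input)
    2. Compute the node's output: $O_j = \frac{1}{1 + e^{-I_j}}$
  3. Backward propagation (from output to hidden)
    1. Compute the error of each output node k with target $T_k$: $\delta_k = O_k (1 - O_k)(T_k - O_k)$
    2. Update the hidden-to-output weights: $w_{jk} \leftarrow w_{jk} + \eta\, \delta_k O_j$
    3. Update the output biases: $\theta_k \leftarrow \theta_k + \eta\, \delta_k$
  4. Backward propagation (from hidden to input)
    1. Compute the error of each hidden node j, using the $w_{jk}$ values from before the update in step 3: $\delta_j = O_j (1 - O_j) \sum_k \delta_k w_{jk}$
    2. Update the input-to-hidden weights and the hidden biases: $w_{ij} \leftarrow w_{ij} + \eta\, \delta_j O_i$ and $\theta_j \leftarrow \theta_j + \eta\, \delta_j$
  5. Repeat from step 2 until a stopping criterion is met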

These formulas assume the sigmoid activation function, whose derivative $O(1 - O)$ appears in both error terms.
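The steps map almost line for line onto NumPy. This is a minimal sketch, not a reference implementation: the function name `train_mlp`, per-sample updates, 4 hidden nodes, uniform initialization in [-0.5, 0.5], a learning rate of 0.5 and the XOR toy data are all assumptions. It records the mean squared error per epoch so the effect of training is visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, T, n_hidden=4, eta=0.5, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: small random weights and biases
    W_ij = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))  # input -> hidden
    th_j = rng.uniform(-0.5, 0.5, n_hidden)
    W_jk = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))  # hidden -> output
    th_k = rng.uniform(-0.5, 0.5, T.shape[1])
    losses = []
    for _ in range(epochs):
        sq_err = 0.0
        for O_i, T_k in zip(X, T):
            # Step 2: forward propagation
            O_j = sigmoid(O_i @ W_ij + th_j)
            O_k = sigmoid(O_j @ W_jk + th_k)
            sq_err += np.sum((T_k - O_k) ** 2)
            # Step 3.1: output errors (sigmoid derivative is O(1 - O))
            d_k = O_k * (1 - O_k) * (T_k - O_k)
            # Step 4.1: hidden errors, computed with W_jk before it changes
            d_j = O_j * (1 - O_j) * (W_jk @ d_k)
            # Steps 3.2 and 3.3: hidden-to-output weights and output biases
            W_jk += eta * np.outer(O_j, d_k)
            th_k += eta * d_k
            # Step 4.2: input-to-hidden weights and hidden biases
            W_ij += eta * np.outer(O_i, d_j)
            th_j += eta * d_j
        losses.append(sq_err / len(X))
    return losses, (W_ij, th_j, W_jk, th_k)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets
losses, params = train_mlp(X, T)
print(losses[0], losses[-1])  # the error shrinks during training
```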

Stopping Criteria

  • Fixed number of iterations
  • Error falls below threshold
  • Error on the validation set reaches its minimum (early stopping)
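Because the validation error is noisy, "reaching its minimum" is usually detected with a patience window: stop once it has not improved for a few epochs, and keep the best epoch. A minimal sketch with a hypothetical helper `early_stopping_epoch` and made-up loss values; Keras offers the same idea as `keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss   # new minimum
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch              # no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1

val = [0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.61, 0.7]  # illustrative values
print(early_stopping_epoch(val))  # -> (3, 6)
```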

Gradient Descent

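To reach a local minimum of the error $E$, we follow the negative of the gradient: starting from an initial guess, each weight (and bias) is repeatedly updated by $w \leftarrow w - \eta \frac{\partial E}{\partial w}$, where $\eta$ is the learning rate.

  • Positive slope ($\partial E / \partial w > 0$): decrease the weights and biases
  • Negative slope ($\partial E / \partial w < 0$): increase the weights and biases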
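A one-dimensional example makes the update rule concrete. This sketch assumes a toy error $E(w) = (w - 3)^2$, whose slope is $2(w - 3)$ and whose minimum is at $w = 3$; the helper name `gradient_descent` and the learning rate 0.1 are illustrative choices.

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)  # move against the slope
    return w

slope = lambda w: 2 * (w - 3)  # dE/dw for E(w) = (w - 3)^2
w_min = gradient_descent(slope, w0=0.0)
print(round(w_min, 4))  # -> 3.0

# Positive slope at w = 10, so one step decreases w; negative slope at w = -4, so it increases
print(gradient_descent(slope, w0=10.0, steps=1), gradient_descent(slope, w0=-4.0, steps=1))
```

Each step multiplies the distance to the minimum by $1 - 2\eta = 0.8$, so too large a learning rate ($\eta > 1$ here) would overshoot and diverge.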

Implementation

Steps

  1. Import the required libraries and define a global variable
  2. Load the data
  3. Explore the data
  4. Build the model
  5. Compile the model
  6. Train the model
  7. Evaluate the model accuracy
  8. Save the model
  9. Use the model
  10. Plot the confusion matrix

Code

  • kernel_regularizer=regularizers.l2(0.002) adds an L2 weight penalty to reduce overfitting
  • activation=activations.relu or, equivalently, activation='relu'
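Steps 4 and 5 (build and compile) might look like the sketch below, which uses both options listed above. It assumes TensorFlow 2.x with its bundled Keras, 28×28 input images (so Flatten yields 784 features), a softmax output, the Adam optimizer and sparse categorical cross-entropy; the function name `build_model` and these choices are assumptions rather than the original code.

```python
from tensorflow import keras
from tensorflow.keras import activations, layers, regularizers

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(28, 28)),           # assumed 28x28 grayscale images
        layers.Flatten(),                      # -> (None, 784)
        layers.Dense(128, activation=activations.relu,
                     kernel_regularizer=regularizers.l2(0.002)),  # L2 penalty on weights
        layers.Dense(128, activation="relu",   # string form of the same activation
                     kernel_regularizer=regularizers.l2(0.002)),
        layers.Dense(10, activation="softmax"),  # one node per class
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer class labels
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()  # reports 118,282 trainable parameters
```

Training (step 6) would then be a call such as `model.fit(x_train, y_train, epochs=5, validation_split=0.1)`, where `x_train` and `y_train` are hypothetical arrays loaded in step 2.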

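How to Calculate the Parameters of a Dense Layer

  • Parameters = inputs × neurons (weights) + neurons (biases); Flatten has no parameters
  • Flatten: (None, 784), 0 parameters
  • Dense: (None, 128), 784 × 128 weights + 128 biases = 100,480
  • Dense: (None, 128), 128 × 128 weights + 128 biases = 16,512
  • Dense: (None, 10), 128 × 10 weights + 10 biases = 1,290
  • Total: 118,282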
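The same arithmetic as a small function, assuming fully connected layers with one bias per neuron (the helper name `dense_params` is hypothetical):

```python
def dense_params(n_in, n_out):
    # one weight per (input, neuron) pair plus one bias per neuron
    return n_in * n_out + n_out

layer_sizes = [784, 128, 128, 10]  # Flatten output, then the three Dense layers
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # -> [100480, 16512, 1290] 118282
```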

Validation

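  • Validation loss
    • Average error over the validation set (error = 1 - x/n); a perfect prediction assigns a probability of 1 to the correct label
  • Validation accuracy
    • Fraction of validation samples classified correctly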
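Both metrics can be computed from the predicted class probabilities. This sketch assumes the loss is categorical cross-entropy (the mean of $-\log p$, where $p$ is the probability given to the true label, so a perfect prediction with $p = 1$ contributes 0); the helper `validation_metrics` and the sample data are hypothetical.

```python
import numpy as np

def validation_metrics(probs, labels):
    # probability the model assigned to each sample's true label
    p_true = probs[np.arange(len(labels)), labels]
    loss = float(np.mean(-np.log(np.clip(p_true, 1e-12, 1.0))))  # clip avoids log(0)
    accuracy = float(np.mean(np.argmax(probs, axis=1) == labels))
    return loss, accuracy

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0])
loss, acc = validation_metrics(probs, labels)
print(loss, acc)  # two of three samples correct, so accuracy is 2/3
```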

Model Saving

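  • HDF5 (Hierarchical Data Format version 5) file format, selected by the .h5 extension
from tensorflow.keras.models import load_model

model_name = "model.h5"  # hypothetical file name; the .h5 extension selects HDF5
model.save(model_name)

loaded_model = load_model(model_name)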

Prediction

import numpy as np

predictions = loaded_model.predict(x_test)  # one row of class probabilities per sample
print('predictions:', predictions.shape)
prediction_results = np.argmax(predictions, axis=1)  # index of the most probable class

Confusion Matrix

from sklearn.metrics import confusion_matrix

# First argument: actual labels; second argument: predicted labels
cm = confusion_matrix(y_test, prediction_results)
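To see what the prediction and confusion-matrix steps produce, here is a NumPy-only sketch with made-up probabilities for 3 classes. The helper `confusion_matrix_np` is hypothetical; it follows the same convention as scikit-learn's `confusion_matrix` (rows are actual labels, columns are predicted labels).

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # row = actual label, column = predicted label
    return cm

predictions = np.array([[0.9, 0.05, 0.05],   # illustrative model outputs
                        [0.2, 0.7, 0.1],
                        [0.1, 0.6, 0.3],
                        [0.1, 0.1, 0.8]])
y_test = np.array([0, 1, 2, 2])
prediction_results = np.argmax(predictions, axis=1)  # -> [0 1 1 2]
print(confusion_matrix_np(y_test, prediction_results, n_classes=3))
# -> [[1 0 0]
#     [0 1 0]
#     [0 1 1]]
```

The off-diagonal 1 in row 2, column 1 is the sample of class 2 that was misclassified as class 1.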

Layers and Neurons

  • 1 input layer
    • Number of neurons = number of features
  • 1 output layer
    • Number of neurons = 1 for regression or binary classification; one per class for multiclass (softmax) or multilabel (sigmoid) classification
  • Hidden layers (rules of thumb)
    • Number of layers
      • Linearly separable data: 0
      • Less complex problems: 1 to 2
      • More complex problems: 3 to 5
    • Number of neurons
      • Between the size of the input and the output layer
      • Often decreasing in subsequent layers, so that later layers extract more compact patterns and features

Weights and Biases

  • Weights control the steepness of the activation function, e.g. sigmoid(wx + b)
    • Higher weight: steeper slope
    • Lower weight: softer slope
  • Biases shift the activation function left or right (for a positive weight)
    • Smaller bias: shift right
    • Larger bias: shift left
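For a single sigmoid node $\sigma(wx + b)$ with $w > 0$, the curve is centered where $wx + b = 0$, i.e. at $x = -b/w$, and its slope there is $w \cdot \sigma'(0) = w/4$. The sketch below (hypothetical helper `midpoint_and_slope`) shows both effects listed above and checks the slope numerically.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def midpoint_and_slope(w, b):
    x_mid = -b / w    # where w*x + b = 0, i.e. output 0.5
    slope = w * 0.25  # derivative of sigmoid(w*x + b) at x_mid
    return x_mid, slope

for w, b in [(1.0, 0.0), (5.0, 0.0), (1.0, 2.0), (1.0, -2.0)]:
    print(w, b, midpoint_and_slope(w, b))
# w = 5 gives a slope 5x steeper; b = 2 moves the center to x = -2 (left), b = -2 to x = 2 (right)

# numerical check of the slope with a central difference
w, b, h = 5.0, 1.0, 1e-5
x_mid, slope = midpoint_and_slope(w, b)
numeric = (sigmoid(w * (x_mid + h) + b) - sigmoid(w * (x_mid - h) + b)) / (2 * h)
print(slope, numeric)
```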

Problems

Vanishing Gradient

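  • Gradients shrink as they are propagated back toward the input
  • Parameters of the higher layers (near the output) change drastically
  • Parameters of the lower layers (near the input) barely change
  • Gradients may become effectively zero, so learning is slow or stalls completely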

Exploding Gradient

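  • Gradients grow exponentially as they are propagated back, so all parameters receive huge updates
  • Weights may overflow and become NaN
  • Training becomes unstable (an "avalanche" of ever-larger updates)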
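Both problems come from the chain rule: backpropagating through each layer multiplies the gradient by roughly (weight × activation derivative). This deliberately simplified sketch assumes the same weight and derivative in every layer (the helper `backprop_factor` is hypothetical); real networks vary per layer, but the exponential trend is the same.

```python
def backprop_factor(n_layers, weight, activation_derivative):
    # each layer multiplies the gradient by weight * f'(z)
    return (weight * activation_derivative) ** n_layers

# Sigmoid's derivative is at most 0.25, so with weights near 1 the gradient vanishes
print(backprop_factor(10, 1.0, 0.25))  # about 1e-6 after 10 layers
# Weights of 2 with slope 1 (e.g. ReLU's positive side) make it explode
print(backprop_factor(10, 2.0, 1.0))   # -> 1024.0
```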

Overfitting

  • Learns the details and noise of the training data, so it performs poorly on new data
  • Use a regularizer, which adds a penalty term to the loss, to reduce overfitting
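An L2 regularizer such as `regularizers.l2(0.002)` adds $\lambda \sum w^2$ to the loss, so large weights are penalized and the model is pushed toward smoother functions. A minimal NumPy sketch of that penalty, with hypothetical helper names and made-up weight matrices (biases are not included, matching a kernel-only regularizer):

```python
import numpy as np

def l2_penalty(weight_matrices, lam=0.002):
    # lam * sum of squared weights over all layers
    return lam * sum(float(np.sum(W ** 2)) for W in weight_matrices)

def regularized_loss(data_loss, weight_matrices, lam=0.002):
    return data_loss + l2_penalty(weight_matrices, lam)

W1 = np.array([[1.0, -2.0], [0.5, 0.0]])  # illustrative weights
W2 = np.array([[3.0], [-1.0]])
print(l2_penalty([W1, W2]))              # 0.002 * 15.25
print(regularized_loss(0.5, [W1, W2]))   # data loss plus the penalty
```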

Underfitting

  • Fails to capture the underlying pattern, so it performs poorly on both the training data and new data

Activation Functions

Neural Network | Commonly Used Activation Function
---------------|----------------------------------
MLP | ReLU
CNN | ReLU
RNN | Tanh/Sigmoid

Scenario | Activation Function for Output Layer
---------|-------------------------------------
Regression | Linear
Binary classification | One node, sigmoid
Multiclass classification | One node per class, softmax
Multilabel classification | One node per class, sigmoid

Linear

  • f(x) = x: the output equals the input; used in the output layer for regression

Softmax

  • Outputs probabilities that sum to 1: $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$
  • Used for multiclass classification problems
  • Maps every input, including negative values, to a positive output
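A minimal NumPy softmax for one vector of scores. Subtracting the maximum score first is a common numerical-stability trick (an implementation choice, not from the notes): it leaves the result unchanged but prevents overflow in `exp`.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

p = softmax(np.array([-2.0, 0.0, 3.0]))
print(p, p.sum())  # all entries positive, even for the negative score; sum is 1
```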

ReLU

  • The most common activation function, and simple to compute
  • Less susceptible to the vanishing gradient problem
  • Use “He Normal” or “He Uniform” weight initialization, and scale the input data to the range 0 to 1

Sigmoid

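  • Hidden layer: use “Glorot Normal” or “Glorot Uniform” (also called Xavier) weight initialization, and scale the input data to the range 0 to 1
  • Output layer: produces a value between 0 and 1, interpretable as a probability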

Tanh

  • Use “Glorot Normal” or “Glorot Uniform” (also called Xavier) weight initialization, and scale the input data to the range -1 to 1
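The activation functions and two of the initializers named above can be sketched in NumPy. He Normal draws weights with standard deviation $\sqrt{2 / \text{fan\_in}}$ and Glorot Uniform draws from $[-L, L]$ with $L = \sqrt{6 / (\text{fan\_in} + \text{fan\_out})}$. This is an approximation: Keras' `HeNormal` uses a truncated normal, while this sketch uses a plain normal, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x):
    return x

def relu(x):
    return np.maximum(0.0, x)        # range [0, inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # range (0, 1)

def tanh(x):
    return np.tanh(x)                # range (-1, 1)

def he_normal(fan_in, fan_out):
    # std = sqrt(2 / fan_in); plain (not truncated) normal for simplicity
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def glorot_uniform(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

x = np.linspace(-5.0, 5.0, 11)
print(relu(x), sigmoid(x).round(3), tanh(x).round(3))
W_relu = he_normal(784, 128)         # e.g. weights for a ReLU hidden layer
W_tanh = glorot_uniform(784, 128)    # e.g. weights for a tanh or sigmoid hidden layer
print(W_relu.std(), np.abs(W_tanh).max())
```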