Image Processing

Coordinate System

  • x is increasing from left-to-right
  • y is increasing from top-to-bottom
  • origin (0, 0) is at top-left corner
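
As a minimal sketch of this convention, assuming the image is held in a NumPy array (the layout OpenCV and matplotlib use), the row index is y and the column index is x, so a pixel is read as img[y, x]. The array contents below are made up for illustration.

```python
import numpy as np

# Hypothetical grayscale image with 3 rows (height) and 4 columns (width).
img = np.arange(12, dtype=np.uint8).reshape(3, 4)

rows, cols = img.shape      # rows = height (y extent), cols = width (x extent)
top_left = img[0, 0]        # the origin (x=0, y=0)
x, y = 3, 1                 # 4th column from the left, 2nd row from the top
pixel = img[y, x]           # note the order: row (y) first, then column (x)
```

The usual pitfall is writing img[x, y], which silently reads the wrong pixel on non-square images or raises an IndexError.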

Showing Image

plt.imshow(X, cmap=cmap, vmin=vmin, vmax=vmax)  # pass vmin and vmax by keyword

Grayscale

A single-channel image derived from the brightness (a weighted sum) of the R, G, and B channels

  • Reduce amount of memory if color information is not useful
# Use cv2.COLOR_BGR2GRAY instead if img came from cv2.imread, which returns BGR
cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
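
To make the "brightness of RGB" idea concrete, here is a rough NumPy sketch of the weighted sum, using the luma weights OpenCV documents for RGB-to-gray conversion (0.299, 0.587, 0.114). The helper name rgb_to_gray and the sample pixels are assumptions for illustration, not part of the notes.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an (H, W, 3) uint8 RGB array to an (H, W) uint8 grayscale array."""
    weights = np.array([0.299, 0.587, 0.114])    # R, G, B luma weights
    gray = rgb.astype(np.float64) @ weights       # weighted sum over the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# Hypothetical 1x3 image: pure red, pure green, pure white.
rgb = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)
gray = rgb_to_gray(rgb)
```

The result has one value per pixel instead of three, which is where the memory saving comes from.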

Affine Transformation

  • Preserves collinearity, parallelism, and the ratios of distances between points on the same line
  • Does not necessarily preserve distances and angles
  • e.g. translation, rotation, scaling, shearing
cv2.warpAffine(src, M, dsize, flags, borderMode, borderValue)
# M is the 2x3 transformation matrix

Translation
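
The matrix for this section is not shown in the notes. A common form, sketched here under that assumption, shifts every point by tx pixels to the right and ty pixels down. The names tx, ty, translation_matrix and apply_affine are illustrative only.

```python
import numpy as np

def translation_matrix(tx, ty):
    """2x3 affine matrix that shifts points tx pixels right and ty pixels down."""
    return np.float32([[1, 0, tx],
                       [0, 1, ty]])

def apply_affine(M, x, y):
    """Map a single point (x, y) through a 2x3 affine matrix M."""
    out = M @ np.array([x, y, 1.0])   # homogeneous coordinates (x, y, 1)
    return float(out[0]), float(out[1])

M = translation_matrix(50, 20)
moved = apply_affine(M, 10, 10)       # the point (10, 10) lands at (60, 30)
# With OpenCV installed, the same M would be passed as
# cv2.warpAffine(img, M, (cols, rows)); note that dsize is (width, height).
```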

Reflection

X-axis
Y-Axis

rows is height, cols is width
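
The reflection matrices themselves are missing from the notes. One common choice, sketched below as an assumption, negates one axis and shifts by the image size so the result stays inside the frame. The 4x6 image size is hypothetical.

```python
import numpy as np

rows, cols = 4, 6   # hypothetical image height and width

# Reflection about the x-axis (top-bottom flip): y' = (rows - 1) - y
M_x = np.float32([[1,  0, 0],
                  [0, -1, rows - 1]])

# Reflection about the y-axis (left-right flip): x' = (cols - 1) - x
M_y = np.float32([[-1, 0, cols - 1],
                  [ 0, 1, 0]])

top_left = np.array([0, 0, 1.0])                   # homogeneous coordinates (x, y, 1)
flipped_vertically = (M_x @ top_left).tolist()     # top-left goes to bottom-left
flipped_horizontally = (M_y @ top_left).tolist()   # top-left goes to top-right
```

Either matrix can be handed to cv2.warpAffine with dsize=(cols, rows).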

Rotation

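Angles passed to math.sin and math.cos are in radians; cv2.getRotationMatrix2D takes degrees
(cols//2, rows//2), the image center, is the rotation center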

import math
import cv2
import numpy as np

rows, cols = img.shape[:2]      # img is the image loaded earlier
cx, cy = cols // 2, rows // 2   # rotation center (image center)

# Form the transformation matrix of rotation.
# The angle for math.sin and math.cos should be in radians.
# 45 degrees = pi/4 radians
angle = math.pi / 4
cos_a, sin_a = math.cos(angle), math.sin(angle)

M = np.float32([[ cos_a, sin_a, -cx * cos_a - cy * sin_a + cx],
                [-sin_a, cos_a,  cx * sin_a - cy * cos_a + cy]])

# Another way to generate the same transformation matrix.
# The angle for getRotationMatrix2D should be in degrees.
# 1.0 is the scale factor.
M = cv2.getRotationMatrix2D((cx, cy), 45, 1.0)
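
A quick sanity check on the hand-built matrix above is that the rotation center maps to itself. The sketch below rebuilds the same 2x3 layout in plain NumPy (no OpenCV needed) and applies it to two points. The function name rotation_matrix and the 100x200 image size are assumptions.

```python
import math
import numpy as np

def rotation_matrix(cx, cy, angle_deg, scale=1.0):
    """2x3 matrix rotating about (cx, cy); same layout as the hand-built M."""
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return np.array([[ a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

rows, cols = 100, 200               # hypothetical image height and width
cx, cy = cols // 2, rows // 2
M = rotation_matrix(cx, cy, 45)

center = M @ np.array([cx, cy, 1.0])        # the center must stay where it is
right = M @ np.array([cx + 10, cy, 1.0])    # a point 10 px to the right of the center
```

The point to the right of the center moves up the image (its y decreases), which is a counterclockwise rotation on screen for a positive angle.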

Scaling/Resizing

cv2.resize(src, dsize, dst, fx=0, fy=0, interpolation=cv2.INTER_LINEAR)
  • Interpolation (listed from fastest and lowest quality to slowest and highest quality)
    • cv2.INTER_NEAREST (nearest neighbor interpolation)
    • cv2.INTER_LINEAR (bilinear interpolation)
    • cv2.INTER_CUBIC (bicubic interpolation)

Aspect ratio is width:height; dsize is also given as (width, height)
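
When resizing to a fixed width, the height has to be scaled by the same factor to keep the aspect ratio. A small sketch of that arithmetic follows. The helper name fit_width and the 1920x1080 source size are made up for illustration.

```python
def fit_width(width, height, new_width):
    """Return (new_width, new_height) that keeps the width:height ratio."""
    scale = new_width / width
    return new_width, max(1, round(height * scale))

dsize = fit_width(1920, 1080, 640)   # 16:9 source scaled down to 640 wide
# With OpenCV installed this tuple would be passed as
# cv2.resize(img, dsize, interpolation=cv2.INTER_LINEAR).
```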

Image Operations

  • Point (same coordinate)
    • Brightness adjustment
    • Contrast stretching
    • Gamma correction
    • Grayscale threshold
      • If the threshold T is larger, fewer pixels exceed T, so the result has more black and less white
      • Iterative threshold selection (Otsu’s method is a different approach: it picks the T that maximizes the between-class variance)
        1. Select an initial estimate of the threshold T. A good initial value is the average intensity of the image.
        2. Partition the image into two groups, R1 and R2, using the threshold T
        3. Calculate the mean gray values μ1 and μ2 of the partitions R1 and R2
        4. Compute a new threshold T = (μ1 + μ2) / 2
        5. Repeat steps 2-4 until the mean values μ1 and μ2 do not change between successive iterations
    • Histogram equalization
  • Local (neighborhood)
    • Image smoothing
    • Image edge detection
    • Image sharpening
  • Global (image)
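
The numbered threshold-selection steps above can be sketched as follows. This assumes the new threshold in step 4 is the midpoint of the two group means, which is the usual choice. The function name, the tolerance tol and the sample pixels are illustrative.

```python
import numpy as np

def iterative_threshold(gray, tol=0.5):
    """Iterative mean-based threshold selection (steps 1-5)."""
    pixels = gray.astype(np.float64).ravel()
    t = pixels.mean()                            # step 1: initial estimate
    while True:
        low, high = pixels[pixels <= t], pixels[pixels > t]   # step 2
        if low.size == 0 or high.size == 0:      # one group is empty: stop
            return t
        new_t = (low.mean() + high.mean()) / 2   # steps 3-4
        if abs(new_t - t) < tol:                 # step 5: converged
            return new_t
        t = new_t

# Hypothetical bimodal image: dark pixels near 20, bright pixels near 200.
gray = np.array([[10, 20, 30], [190, 200, 210]], dtype=np.uint8)
T = iterative_threshold(gray)
binary = np.where(gray > T, 255, 0).astype(np.uint8)   # grayscale threshold
```

Pixels above T become white (255) and the rest black (0), so raising T turns more of the image black.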

plt.imread (matplotlib) returns PNG pixels as floats in the range [0, 1]; cv2.imread returns uint8 values in the range [0, 255]

Image Convolution

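  1. Flip the kernel left-right
  2. Flip the kernel top-bottom
  3. Slide the flipped kernel over the image, taking the sum of element-wise products at each position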
  • Resulting image size is calculated by <…>

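  • Boundary handling: ignore the boundary (the output shrinks), OR

  • Zero padding (fastest), cv2.BORDER_CONSTANT with value 0

  • Replicate the edge pixels, cv2.BORDER_REPLICATE

  • Reflection, the image edge is the mirror line (edge pixel repeated), cv2.BORDER_REFLECT

  • Mirror, the edge pixel itself is the mirror line (edge pixel not repeated), cv2.BORDER_REFLECT_101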

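# ddepth is the desired depth (data type) of the output, e.g. cv2.CV_8U or cv2.CV_32F; -1 keeps the source depth
# filter2D computes correlation and does not flip the kernel; flip it yourself first, e.g. cv2.flip(kernel, -1)
cv2.filter2D(src, ddepth, kernel, dst, anchor, delta, borderType=cv2.BORDER_DEFAULT)
  • Prefer applying a small kernel several times over applying one large kernel once; it is cheaper to compute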

  • If the kernel values are all positive and sum to 1, it is a smoothing kernel

  • If the kernel values are a mix of positive and negative and sum to 0, it is a sharpening (edge-emphasizing, high-pass) kernel

  • Applying such a zero-sum kernel to a uniformly colored image gives all zeros
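
To illustrate the flip-then-slide procedure and the two kernel rules, here is a plain NumPy sketch of convolution with the "ignore boundary" option. It is a teaching version, not how cv2.filter2D is implemented. The function name convolve2d and the test image are assumptions.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Ignore boundary' 2-D convolution: flip the kernel, then slide it."""
    flipped = kernel[::-1, ::-1]                 # top-bottom and left-right flip
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1              # output shrinks by kernel size - 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * flipped)
    return out

flat = np.full((5, 5), 7.0)                      # uniformly colored image
box = np.full((3, 3), 1 / 9)                     # all positive, sums to 1
laplacian = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)   # sums to 0

smoothed = convolve2d(flat, box)                 # a flat image stays flat
edges = convolve2d(flat, laplacian)              # no edges, so all zeros
```

With a 5x5 image and a 3x3 kernel the output is 3x3, since no padding is applied.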

Convolutional Neural Network

  • Recognize/classify images

  • Can also be used for NLP, speech, recommendation, image segmentation, medical image analysis, financial time series

  • Each kernel set has one kernel layer per input channel (3 layers for RGB)

  • The number of kernel sets equals the number of features to extract

  • The output depth equals the number of features; within each kernel set, the per-channel results are summed pixel-wise into one feature map

  • In an MLP we train weights and biases; in a CNN the weights are the kernel values, and there is one bias per kernel set

  • For a layer, number of parameters = (kernel width x kernel height x no. of input channels + 1) x no. of kernel sets

  • e.g. a 28x28x3 input with one 5x5x3 kernel set gives a 24x24x1 output

  • Stride

    • Number of pixels the kernel moves at each step
    • The stride must fit the image exactly; otherwise the kernel cannot be applied
  • Output size = (size of image dimension - size of kernel dimension) / stride + 1

  • If the output size is not an integer, the stride does not fit

  • The bigger the stride, the smaller the output image

  • Border (padding) width that keeps the output the same size as the input at stride 1 = (K - 1) / 2, where K is the kernel size

  • With padding, output size = (original + 2 x padding - kernel size) / stride + 1, where padding is the width added to each side
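
The output-size rules above fit in one small function. This sketch assumes the padding p is the width added to each side, and it follows the notes' rule that a non-integer result means the stride does not fit. The function name conv_output_size is illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one dimension, or None if the kernel does not fit.

    n: input size, f: kernel size, p: padding on each side, s: stride.
    """
    span = n + 2 * p - f
    if span < 0 or span % s != 0:
        return None          # the stride does not fit the (padded) input
    return span // s + 1

no_padding = conv_output_size(28, 5)        # 28 - 5 + 1
same_size = conv_output_size(28, 5, p=2)    # p = (5 - 1) / 2 keeps the size
strided = conv_output_size(28, 4, s=2)      # a bigger stride shrinks the output
misfit = conv_output_size(28, 5, s=2)       # 23 / 2 is not an integer
```

Many frameworks round the division down instead of rejecting the configuration, so treat the None case as this sketch's convention rather than a universal rule.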

Pooling Layer

  • Max pooling

  • Average pooling

  • Picks up larger-scale details

  • Reduce dimension
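
A minimal NumPy sketch of both pooling types, assuming non-overlapping 2x2 windows (stride equal to the window size). The function name pool2d and the sample values are made up.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping size x size pooling over a 2-D array (stride = size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    # Split the array into an h x w grid of size x size blocks.
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 8, 1, 0],
              [7, 6, 3, 2]], dtype=np.float64)

max_pooled = pool2d(x, mode="max")    # keeps the largest value of each block
avg_pooled = pool2d(x, mode="mean")   # keeps the average of each block
```

A 4x4 input becomes 2x2, which is the dimension reduction mentioned above.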

Training Process

  1. Initialize all kernels and parameters/weights with random values
  2. Take an input image, do forward propagation, and find the probability of each class
  3. Calculate the total error at the output layer
  4. Use backpropagation to update all kernel values/weights
  5. Repeat steps 2-4 for the images in the training set

Dropout Layer

  • Prevent overfitting
  • The dropout rate is a probability between 0 and 1: the chance that each value is dropped
  • A dropped value is set to 0, i.e. ignored
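
A rough sketch of what a dropout layer does at training time, assuming the "inverted dropout" variant in which surviving values are rescaled by 1 / (1 - rate) so the expected total stays the same. The function name and the fixed seed are illustrative.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_mask = rng.random(x.shape) >= rate      # True where the value survives
    return x * keep_mask / (1.0 - rate)          # rescale the surviving values

rng = np.random.default_rng(0)                   # fixed seed, for repeatability only
activations = np.ones((4, 1000))
dropped = dropout(activations, rate=0.25, rng=rng)
zero_fraction = float(np.mean(dropped == 0))     # roughly the dropout rate
```

At inference time dropout is switched off and the values pass through unchanged.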

One-Hot Encoding

  • Used when the categories have no ordering or numeric relation to each other
  • Prevents the model from treating a larger label value as more important
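
A minimal sketch of one-hot encoding for integer class labels, using rows of an identity matrix. The function name one_hot and the sample labels are assumptions.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels)
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([0, 2, 1], num_classes=3)
# [[1, 0, 0],
#  [0, 0, 1],
#  [0, 1, 0]]
```

Every class is now equally far from every other class, so label 2 no longer looks "bigger" than label 0.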

Layer Calculation Example

trainable parameters = num_kernels * (kernel_height * kernel_width * input_channels + 1)

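from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

num_classes = 10  # matches the 10-unit output layer in the summary below
model = Sequential() # Create a Sequential object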
# Add a convolutional layer with 32 kernels, each of size 3x3
# Use ReLU activation function, padding="valid", strides=(1,1)
# Specify the input size to this convolutional layer: (28,28,1)
# Note: Input size needs to be specified for the first layer only
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
# Add another convolutional layer with 64 kernels, each of size 3x3
# Use ReLU activation function, padding="valid", strides=(1,1)
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
# Add a max pooling layer of size 2 x 2
model.add(MaxPooling2D(pool_size=(2, 2)))
# Add a dropout layer to prevent a model from overfitting
model.add(Dropout(0.25))
# Add a flatten layer to convert the pooled data to a single column
# that is passed to the fully-connected layer
model.add(Flatten())
# Add a dense layer (fully-connected layer) and use ReLU activation function
model.add(Dense(units=128, activation='relu'))
# Add a dropout layer to prevent a model from overfitting
model.add(Dropout(0.5))
# Add a dense layer (fully-connected layer) and use Softmax activation function
model.add(Dense(units=num_classes, activation='softmax'))
# Print the layer-by-layer summary shown below
model.summary()

_____________________________________________________________________________________________
Layer (type)                   Output Shape          Param #     Notes
=============================================================================================
conv2d (Conv2D)                (None, 26, 26, 32)    320         28-3+1, 32*(3*3*1+1)
conv2d_1 (Conv2D)              (None, 24, 24, 64)    18496       26-3+1, 64*(3*3*32+1)
max_pooling2d (MaxPooling2D)   (None, 12, 12, 64)    0
dropout (Dropout)              (None, 12, 12, 64)    0
flatten (Flatten)              (None, 9216)          0
dense (Dense)                  (None, 128)           1179776     9216*128+128
dropout_1 (Dropout)            (None, 128)           0
dense_1 (Dense)                (None, 10)            1290        128*10+10
=============================================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_____________________________________________________________________________________________

Calculations

Output Size

  • output size = (n + 2p - f) / s + 1
  • n - input dimension (width or height)
  • f - filter (kernel) size
  • p - padding added to each side
  • s - stride

Parameters
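
The parameter counts in the model summary above can be reproduced with two small helpers. This is a sketch using the notes' own formulas: weights plus one bias per kernel for convolutional layers, and weights plus one bias per unit for dense layers. The helper names are illustrative.

```python
def conv_params(kernel_h, kernel_w, in_channels, num_kernels):
    """Kernel weights plus one bias per kernel."""
    return num_kernels * (kernel_h * kernel_w * in_channels + 1)

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

counts = [
    conv_params(3, 3, 1, 32),          # conv2d
    conv_params(3, 3, 32, 64),         # conv2d_1
    dense_params(12 * 12 * 64, 128),   # dense, fed by the flattened 12x12x64 tensor
    dense_params(128, 10),             # dense_1
]
total = sum(counts)
```

Pooling, dropout and flatten layers have no trainable parameters, so they contribute 0 to the total.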