Image Processing

Coordinate System

x increases from left to right
y increases from top to bottom
The origin (0, 0) is at the top-left corner

Showing Image

plt.imshow(X, cmap, vmin, vmax)

Grayscale

$$V = 0.299 \times R + 0.587 \times G + 0.114 \times B$$

The weights reflect how bright each of R, G, and B appears to the eye (green contributes most, blue least)

Reduces memory use when color information is not needed

cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
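A minimal sketch of the weighted sum above for a single pixel. The helper name `rgb_to_gray` and the rounding to an integer are assumptions made for illustration; `cv2.cvtColor` applies the same weights to a whole image.

```python
def rgb_to_gray(r, g, b):
    """Weighted sum of one RGB pixel (hypothetical helper, channel values 0-255)."""
    v = 0.299 * r + 0.587 * g + 0.114 * b
    return int(round(v))  # rounding to an integer is an assumption of this sketch


white = rgb_to_gray(255, 255, 255)   # the weights sum to 1, so white stays 255
pure_green = rgb_to_gray(0, 255, 0)  # green carries the largest weight
pure_blue = rgb_to_gray(0, 0, 255)   # blue carries the smallest weight
```

Pure green comes out much brighter than pure blue, which matches the size of the weights.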

Affine Transformation

Preserves collinearity, parallelism, and the ratio of distances between the points
Does not necessarily preserve distances and angles
e.g. translation, rotation, scaling, shearing

cv2.warpAffine(src, M, dsize, flags, borderMode, borderValue)
# M is the 2x3 transformation matrix

Translation

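$$x' = x + t_{x}, \quad y' = y + t_{y}$$

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_{x} \\ 0 & 1 & t_{y} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$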
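A short sketch of what the 2x3 matrix does to one point. `apply_affine` and `translation_matrix` are hypothetical helpers written for this note, not OpenCV functions; `cv2.warpAffine` applies the same kind of matrix to every pixel.

```python
def apply_affine(M, x, y):
    """Multiply the 2x3 matrix M (nested lists) by the column vector (x, y, 1)."""
    x_new = M[0][0] * x + M[0][1] * y + M[0][2]
    y_new = M[1][0] * x + M[1][1] * y + M[1][2]
    return x_new, y_new


def translation_matrix(tx, ty):
    """2x3 translation matrix: identity part plus the shift in the last column."""
    return [[1, 0, tx],
            [0, 1, ty]]


# Shift the point (2, 3) by 5 to the right and 1 up (y grows downwards).
moved = apply_affine(translation_matrix(5, -1), 2, 3)
```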

Reflection

X-axis

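$$x' = x, \quad y' = -y + (rows - 1)$$

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & rows - 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$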

Y-Axis

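$$x' = -x + (cols - 1), \quad y' = y$$

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} -1 & 0 & cols - 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$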

rows is height, cols is width
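To check the two reflection formulas, the sketch below moves every pixel of a tiny nested-list image to its reflected position. The function names are hypothetical; with NumPy arrays the same result is usually obtained by slicing or `cv2.flip`.

```python
def reflect_x_axis(img):
    """Reflect about the X-axis: x' = x, y' = -y + (rows - 1)."""
    rows, cols = len(img), len(img[0])
    out = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            out[-y + (rows - 1)][x] = img[y][x]
    return out


def reflect_y_axis(img):
    """Reflect about the Y-axis: x' = -x + (cols - 1), y' = y."""
    rows, cols = len(img), len(img[0])
    out = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            out[y][-x + (cols - 1)] = img[y][x]
    return out


img = [[1, 2, 3],
       [4, 5, 6]]          # rows = 2 (height), cols = 3 (width)
flipped_vertically = reflect_x_axis(img)
flipped_horizontally = reflect_y_axis(img)
```

The X-axis reflection swaps the rows (top-bottom flip) and the Y-axis reflection reverses each row (left-right flip).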

Rotation

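$$x' = (x - x_{0}) \cos θ + (y - y_{0}) \sin θ + x_{0}, \quad y' = -(x - x_{0}) \sin θ + (y - y_{0}) \cos θ + y_{0}$$

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos θ & \sin θ & -x_{0} \cos θ - y_{0} \sin θ + x_{0} \\ -\sin θ & \cos θ & x_{0} \sin θ - y_{0} \cos θ + y_{0} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$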

Angles passed to sin and cos are in radians
$(x_{0}, y_{0})$ is the rotation center

Converting an angle of θ degrees to r radians:

$$\frac{θ}{180} = \frac{r}{π}, \quad r = \frac{π θ}{180}$$

import math

import cv2
import numpy as np

# rows, cols: image height and width, e.g. rows, cols = img.shape[:2]

# Form the transformation matrix of rotation
# The angle for math.sin and math.cos must be in radians.
# 45 degrees = pi/4 radians
angle = math.pi/4

M = np.float32([[math.cos(angle), math.sin(angle),
                 -(cols//2)*math.cos(angle)-(rows//2)*math.sin(angle) + (cols//2)],
                [-math.sin(angle), math.cos(angle),
                 (cols//2)*math.sin(angle)-(rows//2)*math.cos(angle) + (rows//2)]])

# Another way to generate the required transformation matrix
# The angle for getRotationMatrix2D must be in degrees
# 1.0 is the scale factor
M = cv2.getRotationMatrix2D((cols//2,rows//2), 45, 1.0)
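A pure-Python sketch of the same rotation matrix, useful for checking the formula by hand on single points. `rotation_matrix` and `apply_affine` are hypothetical helper names; the center (10, 20) and the 90-degree angle are arbitrary test values.

```python
import math


def rotation_matrix(angle_deg, x0, y0):
    """2x3 rotation matrix about (x0, y0); the angle is given in degrees."""
    theta = math.radians(angle_deg)  # same as pi * angle_deg / 180
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, -x0 * c - y0 * s + x0],
            [-s, c, x0 * s - y0 * c + y0]]


def apply_affine(M, x, y):
    """Multiply the 2x3 matrix M by the column vector (x, y, 1)."""
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


M = rotation_matrix(90, 10, 20)
center = apply_affine(M, 10, 20)           # the rotation center stays in place
right_of_center = apply_affine(M, 11, 20)  # ends up at about (10, 19)
```

The point one pixel to the right of the center ends up one pixel above it (y decreases), so a positive angle looks like a counter-clockwise turn on screen.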

Scaling/Resizing

cv2.resize(src, dsize, dst, fx = 0, fy = 0, interpolation = INTER_LINEAR)

Interpolation (speed high to low, quality low to high)
- cv2.INTER_NEAREST (nearest neighbor interpolation)
- cv2.INTER_LINEAR (bilinear interpolation)
- cv2.INTER_CUBIC (bicubic interpolation)

aspect ratio is width:height
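When resizing to a target width, the height has to be scaled by the same factor to keep the width:height ratio. A small sketch, with `size_for_width` as a hypothetical helper; the tuple it returns is in (width, height) order, which is the order `dsize` expects.

```python
def size_for_width(width, height, new_width):
    """Return (new_width, new_height) that keeps the width:height ratio."""
    scale = new_width / width
    return new_width, int(round(height * scale))


# A 640x480 (4:3) image resized to a width of 320 keeps the 4:3 ratio.
dsize = size_for_width(640, 480, 320)
```

Passing `fx = fy = 0.5` instead of `dsize` would describe the same resize for this example.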

Image Operations

Point (same coordinate)
- Brightness adjustment
- Contrast stretching $I_{new} = \frac{I - I_{min}}{I_{max} - I_{min}} \times 255$
- Gamma correction
- Grayscale threshold $I_{new} = \begin{cases} 0, & I < T \\ 255, & I \geq T \end{cases}$
  - The larger $T$ is, the fewer pixels reach $T$: more black, less white
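  - Iterative threshold selection (the steps below are the basic iterative scheme; Otsu's method instead picks the $T$ that maximizes the between-class variance)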
    1. Select an initial estimate of the threshold T. A good initial value is the average intensity of the image.
    2. Partition the image into two groups, R1, R2, using the threshold T
    3. Calculate the mean gray values μ1 and μ2 of the partitions, R1, R2
    4. Compute a new threshold $T = \frac{1}{2} (μ_{1} + μ_{2})$
    5. Repeat steps 2-4 until the mean values μ1 and μ2 in successive iterations do not change
- Histogram equalization
Local (neighborhood)
- Image smoothing
- Image edge detection
- Image sharpening
Global (image)
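Two of the point operations above, sketched on a flat list of gray values: contrast stretching and the iterative threshold selection (steps 1 to 5). The function names, the handling of a completely flat image, and the guard for an empty partition are assumptions of this sketch.

```python
def contrast_stretch(pixels):
    """Map [I_min, I_max] linearly onto [0, 255]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]  # flat image: nothing to stretch (assumption)
    return [(p - lo) / (hi - lo) * 255 for p in pixels]


def iterative_threshold(pixels):
    """Repeat T = (mu1 + mu2) / 2 until the two means stop changing."""
    t = sum(pixels) / len(pixels)      # step 1: start from the average intensity
    previous_means = None
    while True:
        r1 = [p for p in pixels if p < t]    # step 2: partition with T
        r2 = [p for p in pixels if p >= t]
        if not r1 or not r2:
            return t                   # one group is empty: stop (assumption)
        means = (sum(r1) / len(r1), sum(r2) / len(r2))  # step 3
        if means == previous_means:    # step 5: means unchanged, so stop
            return t
        previous_means = means
        t = 0.5 * (means[0] + means[1])  # step 4


def apply_threshold(pixels, t):
    """0 below T, 255 at or above T."""
    return [0 if p < t else 255 for p in pixels]


stretched = contrast_stretch([50, 100, 150])
pixels = [10, 20, 30, 200, 210, 220]
T = iterative_threshold(pixels)
binary = apply_threshold(pixels, T)
```

For this sample the dark group averages 20 and the bright group 210, so the threshold settles at 115.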

plt.imread (matplotlib) returns PNG pixels as floats in the range [0, 1]; cv2.imread returns 8-bit values in [0, 255]

Image Convolution

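Flip the kernel left-to-right
Flip the kernel top-to-bottom
Slide the flipped kernel over the image, multiplying and summing at each position (convolve)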

Resulting image size is calculated by <…>
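Ignore the boundary (the output is smaller than the input), OR pad the image first:
Zero padding (fastest)
Replicate (repeat the edge pixel)
Reflection, the edge is the boundary (the edge pixel is repeated)
Mirror, the edge cell is the boundary (the edge pixel is not repeated)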
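The padding options are easier to compare on one row of pixels. The sketch below is a hypothetical `pad_row` helper; reading "edge is boundary" as repeating the edge pixel and "cell is boundary" as not repeating it is an interpretation of the note, not something the source spells out.

```python
def pad_row(row, k, mode):
    """Pad a 1-D list of pixels with k values on each side."""
    if mode == "zero":
        left, right = [0] * k, [0] * k
    elif mode == "replicate":
        left, right = [row[0]] * k, [row[-1]] * k
    elif mode == "reflection":   # reflect about the edge: edge pixel repeated
        left, right = row[:k][::-1], row[-k:][::-1]
    elif mode == "mirror":       # reflect about the edge cell: edge pixel not repeated
        left, right = row[1:k + 1][::-1], row[-k - 1:-1][::-1]
    else:
        raise ValueError("unknown mode: " + mode)
    return left + row + right


row = [1, 2, 3, 4]
padded = {mode: pad_row(row, 2, mode)
          for mode in ("zero", "replicate", "reflection", "mirror")}
# zero:       [0, 0, 1, 2, 3, 4, 0, 0]
# replicate:  [1, 1, 1, 2, 3, 4, 4, 4]
# reflection: [2, 1, 1, 2, 3, 4, 4, 3]
# mirror:     [3, 2, 1, 2, 3, 4, 3, 2]
```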

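# ddepth is the desired depth (data type) of the output, e.g. cv2.CV_8U or cv2.CV_64F; -1 keeps the source depth
# filter2D does not flip the kernel (it computes correlation), so flip the kernel yourself first for a true convolution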
cv2.filter2D(src, ddepth, kernel, dst, anchor, delta, borderType=cv2.BORDER_DEFAULT)

Prefer convolving several times with a small kernel over convolving once with a large kernel; it is cheaper to compute
If the kernel values are all positive and sum to 1, it is a smoothing kernel
If the kernel values are mixed positive and negative and sum to 0, it is a sharpening (high-pass) kernel
Applying such a zero-sum kernel to a uniform-color image gives all zeros
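A pure-Python sketch of convolution with the boundary ignored, to make the flipping step and the zero-sum property concrete. `convolve2d_valid` is a hypothetical helper; the two kernels are common textbook examples, not taken from the source.

```python
def convolve2d_valid(image, kernel):
    """Flip the kernel both ways, then slide it over the image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]  # top-bottom and left-right flip
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            total = 0
            for u in range(kh):
                for v in range(kw):
                    total += flipped[u][v] * image[i + u][j + v]
            out_row.append(total)
        out.append(out_row)
    return out


# A zero-sum kernel applied to a uniform image gives all zeros.
flat = [[7] * 5 for _ in range(5)]
zero_sum = [[0, 1, 0],
            [1, -4, 1],
            [0, 1, 0]]
edges = convolve2d_valid(flat, zero_sum)

# Convolving a single bright pixel reproduces the kernel itself, which only
# happens because the kernel is flipped before sliding.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1
asymmetric = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
response = convolve2d_valid(impulse, asymmetric)
```

Both outputs are 3x3 because a 3x3 kernel only fits in 5 - 3 + 1 = 3 positions per axis when the boundary is ignored.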

Convolutional Neural Network

Recognize/classify images
Can also be used for NLP, speech, recommendation, image segmentation, medical image analysis, financial time series
Each kernel needs one layer per input channel (3 layers for RGB)
The number of kernels (kernel sets) equals the number of features to extract
The output has one channel per feature; each channel is the pixel-wise sum over the kernel's layers
An MLP trains weights and biases; in a CNN the weights are the kernel values, with one bias per kernel set
Number of parameters = (width x height x no_kernel + 1) x no_kernel_sets, where no_kernel is the number of layers (input channels) in each kernel set
e.g. 28x28x3 with 1 5x5x3 kernel ⇒ 24x24x1 output
Stride
- Movement each time
- The kernel must fit the image exactly, otherwise the stride cannot be applied
Output size = (Size of image dimension - Size of kernel dimension) / Stride + 1
If the output size is not an integer, the kernel does not fit
The bigger the stride, the smaller the output image
Border (padding) width needed to keep the image size = (K - 1) / 2, where K is the kernel size
With padding, output size = (original + 2 x padding - kernel size) / stride + 1, where padding is the border width on each side
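The stride and padding rules above can be checked with a few lines of arithmetic. In this sketch `conv_output_size` is a hypothetical helper, padding is counted per side (so 2 x padding pixels are added in total), and returning `None` for a non-fitting stride is a choice made here.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output size along one dimension, or None if the kernel does not fit."""
    span = n + 2 * padding - k
    if span < 0 or span % stride != 0:
        return None  # non-integer result: the stride does not fit the image
    return span // stride + 1


no_padding = conv_output_size(28, 5)           # 24, as in the 28x28x3 example
fits = conv_output_size(7, 3, stride=2)        # (7 - 3) / 2 + 1 = 3
does_not_fit = conv_output_size(8, 3, stride=2)  # (8 - 3) / 2 is not an integer
same_size = conv_output_size(28, 5, padding=(5 - 1) // 2)  # border of (K - 1) / 2
```

With a border of (K - 1) / 2 on each side and stride 1, the output keeps the input size of 28.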

Pooling Layer

Max pooling
Average pooling
Picks up larger-scale details
Reduces dimensions
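A small sketch of both pooling types on a 4x4 image, assuming non-overlapping windows (the window moves by its own size). `pool2d` and `mean` are hypothetical helpers.

```python
def pool2d(image, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    out = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + u][j + v]
                      for u in range(size) for v in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
max_pooled = pool2d(image, 2, max)    # [[6, 8], [14, 16]]
avg_pooled = pool2d(image, 2, mean)   # [[3.5, 5.5], [11.5, 13.5]]
```

Either way the 4x4 input shrinks to 2x2, which is the dimension reduction noted above.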

Training Process

1. Initialize all kernels and parameters/weights with random values
2. Take an input image, run forward propagation, and find the probability of each class
3. Calculate the total error at the output layer
4. Use backpropagation to update all kernel values/weights
5. Repeat steps 2-4 for the images in the training set

Dropout Layer

Prevents overfitting
The dropout ratio is a probability between 0 and 1
Dropped values are set to 0 (ignored)
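A minimal sketch of the idea: each value is dropped (set to 0) with probability equal to the dropout ratio. `dropout` is a hypothetical helper for illustration; the Keras `Dropout` layer additionally scales the kept values by 1 / (1 - rate) during training, which is left out here.

```python
import random


def dropout(values, rate, rng=random):
    """Set each value to 0.0 with probability `rate`; keep it otherwise."""
    return [0.0 if rng.random() < rate else v for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
kept_all = dropout(activations, 0.0)      # ratio 0: nothing is dropped
dropped_all = dropout(activations, 1.0)   # ratio 1: everything is dropped
some_dropped = dropout(activations, 0.5)  # random: about half set to 0.0
```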

One-Hot Encoding

Use when the categories have no order or numeric relation to each other
Prevents the model from treating a bigger label value as more important
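A short sketch of the encoding for integer class labels; `one_hot` is a hypothetical helper and the four-class example is made up.

```python
def one_hot(label, num_classes):
    """Return a list with a single 1 at index `label`."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


# Labels 0..3 become four equally "large" vectors instead of ordered numbers.
encoded = [one_hot(label, 4) for label in (0, 2, 3)]
```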

Layer Calculation Example

trainable parameters = num_kernels * (kernel_size + 1), where kernel_size = kernel width * kernel height * input channels
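The parameter formula can be checked by hand for a model with two 3x3 convolution layers (32 and 64 kernels on a 28x28x1 input) followed by dense layers of 128 and 10 units. `conv_params` and `dense_params` are hypothetical helper names; the layer sizes are the ones used in this example.

```python
def conv_params(kernel_h, kernel_w, in_channels, num_kernels):
    """num_kernels * (kernel_size + 1); the +1 is the bias of each kernel set."""
    return num_kernels * (kernel_h * kernel_w * in_channels + 1)


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


conv1 = conv_params(3, 3, 1, 32)     # 32 * (3*3*1 + 1)  = 320
conv2 = conv_params(3, 3, 32, 64)    # 64 * (3*3*32 + 1) = 18496
dense1 = dense_params(9216, 128)     # 9216*128 + 128    = 1179776
dense2 = dense_params(128, 10)       # 128*10 + 10       = 1290
total = conv1 + conv2 + dense1 + dense2
```

Pooling, dropout, and flatten layers add no trainable parameters, so the total is the sum of these four numbers.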

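# Imports assume the Keras API bundled with TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

num_classes = 10 # 10 output classes, matching the summary below
model = Sequential() # Create a Sequential object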
# Add a convolutional layer with 32 kernels, each of size 3x3
# Use ReLU activation function, padding="valid", strides=(1,1)
# Specify the input size to this convolutional layer: (28,28,1)
# Note: Input size needs to be specified for the first layer only
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
# Add another convolutional layer with 64 kernels, each of size 3x3
# Use ReLU activation function, padding="valid", strides=(1,1)
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
# Add a max pooling layer of size 2 x 2
model.add(MaxPooling2D(pool_size=(2, 2)))
# Add a dropout layer to prevent a model from overfitting
model.add(Dropout(0.25))
# Add a flatten layer to convert the pooled data to a single column
# that is passed to the fully-connected layer
model.add(Flatten())
# Add a dense layer (fully-connected layer) and use ReLU activation function
model.add(Dense(units=128, activation='relu'))
# Add a dropout layer to prevent a model from overfitting
model.add(Dropout(0.5))
# Add a dense layer (fully-connected layer) and use Softmax activation function
model.add(Dense(units=num_classes, activation='softmax'))
 
_________________________________________________________________
Layer (type)                  Output Shape         Param #    Calculation
=================================================================
conv2d (Conv2D)               (None, 26, 26, 32)   320        28-3+1, 32*(3*3*1+1)
conv2d_1 (Conv2D)             (None, 24, 24, 64)   18496      26-3+1, 64*(3*3*32+1)
max_pooling2d (MaxPooling2D)  (None, 12, 12, 64)   0
dropout (Dropout)             (None, 12, 12, 64)   0
flatten (Flatten)             (None, 9216)         0
dense (Dense)                 (None, 128)          1179776    9216*128+128
dropout_1 (Dropout)           (None, 128)          0
dense_1 (Dense)               (None, 10)           1290       128*10+10
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________

Calculations

Output Size

n - input dimension (width or height)
f - filter (kernel) size
p - padding on each side
s - stride

$$\left\lfloor \frac{n - f + 2 \times p}{s} \right\rfloor + 1$$
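Applying the formula layer by layer reproduces the shapes of the example model: 28 to 26 to 24 through the two 3x3 convolutions, then 12 after 2x2 max pooling. `output_size` is a hypothetical helper, and the pooling stride of 2 assumes the window moves by its own size (the Keras default when no stride is given).

```python
def output_size(n, f, p=0, s=1):
    """floor((n - f + 2p) / s) + 1, using integer floor division."""
    return (n - f + 2 * p) // s + 1


after_conv1 = output_size(28, 3)              # 26
after_conv2 = output_size(after_conv1, 3)     # 24
after_pool = output_size(after_conv2, 2, s=2)  # 12
flattened = after_pool * after_pool * 64      # 12 * 12 * 64 = 9216
```

The flattened length of 9216 is the input size of the first dense layer.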
