Image Processing
Coordinate System
- x is increasing from left-to-right
- y is increasing from top-to-bottom
- origin (0, 0) is at top-left corner
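A small sketch of how this convention maps onto array indexing, assuming the image is held as a NumPy array (as OpenCV and Matplotlib do): the first index is the row (y) and the second is the column (x). The array contents are made up for illustration.

```python
import numpy as np

# A made-up grayscale image with 3 rows and 4 columns (height=3, width=4).
img = np.arange(12).reshape(3, 4)

rows, cols = img.shape      # rows is the height (y), cols is the width (x)
top_left = img[0, 0]        # origin (x=0, y=0)
x, y = 3, 1                 # a point 3 to the right of and 1 below the origin
value = img[y, x]           # index as [y, x]: row first, then column
```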
Showing Image
plt.imshow(X, cmap, vmin, vmax)
Grayscale
Based on the brightness of RGB
- Reduce amount of memory if color information is not useful
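As an illustration of "brightness of RGB", the sketch below applies the weighted sum that OpenCV documents for RGB-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B) using plain NumPy. The helper name `rgb_to_gray` and the sample pixels are hypothetical, and rounding may differ slightly from the library call.

```python
import numpy as np

def rgb_to_gray(img_rgb):
    """Weighted sum of the R, G, B channels (hypothetical helper)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = img_rgb.astype(np.float64) @ weights   # (H, W, 3) -> (H, W)
    return np.round(gray).astype(np.uint8)

# Made-up 1x3 image: pure red, pure green, white.
img = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)
gray = rgb_to_gray(img)
```

The grayscale array holds one value per pixel instead of three, which is where the memory saving comes from.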
cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
Affine Transformation
- Preserves collinearity, parallelism, and the ratio of distances between points on the same line
- Does not necessarily preserve distances and angles
- e.g. translation, rotation, scaling, shearing
cv2.warpAffine(src, M, dsize, flags, borderMode, borderValue)
# M is the transformation matrix, shape (2, 3)
Translation
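The translation matrix itself is not shown in these notes, so the following is a sketch under the usual convention: a shift by (tx, ty) uses M = [[1, 0, tx], [0, 1, ty]]. The shift values and the helper `apply_affine` are hypothetical; the helper only transforms point coordinates with NumPy, whereas `cv2.warpAffine` resamples the whole image.

```python
import numpy as np

def apply_affine(M, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=np.float32)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1), dtype=np.float32)])
    return homogeneous @ M.T

tx, ty = 50, 20                       # hypothetical shift: 50 right, 20 down
M = np.float32([[1, 0, tx],
                [0, 1, ty]])
moved = apply_affine(M, [(0, 0), (10, 5)])
```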
Reflection
X-axis
Y-axis
rows is height, cols is width
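The reflection matrices are also missing from these notes. A common formulation, assumed here, negates one coordinate and shifts it back into the image: reflecting about the x-axis flips the image top to bottom, and reflecting about the y-axis flips it left to right. The image size below is hypothetical.

```python
import numpy as np

rows, cols = 4, 6                     # hypothetical image: height 4, width 6

# Reflection about the x-axis (vertical flip): y -> (rows - 1) - y
M_x = np.float32([[1,  0, 0],
                  [0, -1, rows - 1]])

# Reflection about the y-axis (horizontal flip): x -> (cols - 1) - x
M_y = np.float32([[-1, 0, cols - 1],
                  [ 0, 1, 0]])

def transform(M, x, y):
    """Map one (x, y) point through a 2x3 affine matrix."""
    out = M @ np.float32([x, y, 1])
    return float(out[0]), float(out[1])

top_left_after_x = transform(M_x, 0, 0)   # lands on the bottom-left pixel
top_left_after_y = transform(M_y, 0, 0)   # lands on the top-right pixel
```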
Rotation
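Angles passed to math.sin and math.cos are in radians
The rotation center is the point the image turns about; the code below uses the image center (cols//2, rows//2)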
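For reference, a rotation by an angle θ about a center (c_x, c_y) with scale 1 can be written as the 2×3 matrix below. The symbols c_x and c_y are labels chosen here rather than taken from the notes; in the code that follows, the center is (cols//2, rows//2), and the hand-built matrix there expands to the same entries.

```latex
M = \begin{bmatrix}
\cos\theta & \sin\theta & (1-\cos\theta)\,c_x - \sin\theta\,c_y \\
-\sin\theta & \cos\theta & \sin\theta\,c_x + (1-\cos\theta)\,c_y
\end{bmatrix}
```

Substituting (x, y) = (c_x, c_y) returns (c_x, c_y), so the center stays fixed under the transformation.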
# Form the transformation matrix of rotation
# math.sin and math.cos expect the angle in radians.
# 45 degrees = pi/4 radians
angle = math.pi/4
M = np.float32([[math.cos(angle), math.sin(angle),
-(cols//2)*math.cos(angle)-(rows//2)*math.sin(angle) + (cols//2)],
[-math.sin(angle), math.cos(angle),
(cols//2)*math.sin(angle)-(rows//2)*math.cos(angle) + (rows//2)]])
# Another way to generate the required transformation matrix
# The angle for getRotationMatrix2D should be in degrees
# 1.0 is the scale factor
M = cv2.getRotationMatrix2D((cols//2,rows//2), 45, 1.0)
Scaling/Resizing
cv2.resize(src, dsize, dst, fx = 0, fy = 0, interpolation = INTER_LINEAR)
- Interpolation (speed high to low, quality low to high)
cv2.INTER_NEAREST (nearest neighbor interpolation)
cv2.INTER_LINEAR (bilinear interpolation)
cv2.INTER_CUBIC (bicubic interpolation)
aspect ratio is width:height
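To keep the aspect ratio when resizing, one option is to pick the target width and derive the height from it. The sketch below only computes the `dsize` tuple; note that `cv2.resize` expects `dsize` as (width, height). The function name and the sample sizes are hypothetical.

```python
def dsize_for_width(orig_width, orig_height, new_width):
    """Return (width, height) for cv2.resize that keeps the aspect ratio."""
    scale = new_width / orig_width
    new_height = round(orig_height * scale)
    return (new_width, new_height)

# Hypothetical 640x480 image (aspect ratio 4:3) shrunk to 320 pixels wide.
dsize = dsize_for_width(640, 480, 320)
```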
Image Operations
- Point (same coordinate)
- Brightness adjustment
- Contrast stretching
- Gamma correction
- Grayscale threshold
- If the threshold T is larger, fewer pixels exceed T, so the output has more black and less white
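- Otsu’s method (the steps listed here are the iterative mean-based threshold selection; Otsu’s method proper picks the T that maximizes the between-class variance)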
- Select an initial estimate of the threshold T. A good initial value is the average intensity of the image.
- Partition the image into two groups, R1, R2, using the threshold T
- Calculate the mean gray values μ1 and μ2 of the partitions, R1, R2
- Compute a new threshold T = (μ1 + μ2) / 2
- Repeat steps 2-4 until the mean values μ1 and μ2 in successive iterations do not change
- Histogram equalization
- Local (neighborhood)
- Image smoothing
- Image edge detection
- Image sharpening
- Global (image)
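The iterative threshold selection listed under Otsu's method above can be sketched in NumPy as follows. It follows the listed steps, assuming the new threshold is the midpoint of the two means, T = (μ1 + μ2) / 2. It is not `cv2.threshold` with `cv2.THRESH_OTSU`, which searches for the threshold that maximizes the between-class variance. The function name, the tolerance `eps`, and the sample pixels are hypothetical.

```python
import numpy as np

def iterative_threshold(gray, eps=0.5):
    """Iteratively refine a global threshold T for a grayscale image."""
    pixels = np.asarray(gray, dtype=np.float64).ravel()
    T = pixels.mean()                       # step 1: start from the average intensity
    while True:
        r1 = pixels[pixels > T]             # step 2: split into two groups
        r2 = pixels[pixels <= T]
        if r1.size == 0 or r2.size == 0:    # flat image: nothing to separate
            return T
        mu1, mu2 = r1.mean(), r2.mean()     # step 3: mean of each group
        new_T = (mu1 + mu2) / 2             # step 4: assumed midpoint update
        if abs(new_T - T) < eps:            # step 5: stop once T has settled
            return new_T
        T = new_T

# Made-up image with a dark cluster and a bright cluster.
gray = np.array([[10, 12, 11, 200],
                 [ 9, 13, 210, 205]])
T = iterative_threshold(gray)
binary = (gray > T).astype(np.uint8) * 255  # pixels above T become white
```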
Reading a PNG with Matplotlib's plt.imread returns float pixel values in the range [0, 1]; cv2.imread returns uint8 values in [0, 255]
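Point operations are simplest to write for pixel values in [0, 1], the range mentioned above. The sketch below shows brightness adjustment, contrast stretching, and gamma correction on a made-up array. The formulas are common textbook forms and are assumptions here, since the notes do not spell them out.

```python
import numpy as np

img = np.array([0.2, 0.4, 0.5, 0.6])          # made-up pixels in [0, 1]

# Brightness adjustment: add a constant, then clip back into range.
brighter = np.clip(img + 0.5, 0.0, 1.0)

# Contrast stretching: map [min, max] of the image linearly onto [0, 1].
stretched = (img - img.min()) / (img.max() - img.min())

# Gamma correction: out = in ** gamma (gamma < 1 brightens, gamma > 1 darkens).
gamma_dark = img ** 2.0
gamma_bright = img ** 0.5
```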
Image Convolution
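- Flip the kernel left to right
- Flip the kernel top to bottom
- Convolve: slide the flipped kernel over the image and sum the element-wise products at each position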
- Resulting image size is calculated by <…>
- Ignore boundary, OR
- Zero padding (fastest)
- Replicate
- Reflection, edge is boundary
- Mirror, cell is boundary
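The boundary options above correspond to `np.pad` modes, which makes them easy to compare on a single row. The pairing with OpenCV border types in the comments is an interpretation of the list, not something the notes state.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

zero      = np.pad(row, 2, mode="constant")   # zero padding      (cv2.BORDER_CONSTANT)
replicate = np.pad(row, 2, mode="edge")       # replicate         (cv2.BORDER_REPLICATE)
reflect   = np.pad(row, 2, mode="symmetric")  # edge is boundary  (cv2.BORDER_REFLECT)
mirror    = np.pad(row, 2, mode="reflect")    # cell is boundary  (cv2.BORDER_REFLECT_101)
```

With "edge is boundary" the border pixel is repeated in the padding; with "cell is boundary" the border pixel acts as the mirror axis and is not repeated.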
# ddepth is the desired depth (data type) of the output, e.g. cv2.CV_64F; -1 keeps the source depth
# filter2D computes correlation and does not flip the kernel; flip it yourself first for true convolution
cv2.filter2D(src, ddepth, kernel, dst, anchor, delta, borderType=cv2.BORDER_DEFAULT)
- Prefer convolving several times with a small kernel over one pass with a large kernel; it is cheaper to compute
- If the kernel values are all positive and sum to 1, it is a smoothing kernel
- If the kernel values are a mix of positive and negative and sum to 0, it is a sharpening kernel
- If a sharpening kernel is applied to a pure (single) color image, the resulting values are all 0
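A minimal NumPy sketch of the convolution steps in this section: flip the kernel in both directions, then slide it over the image with no padding ("ignore boundary"). The function `convolve2d_valid` and the sample kernels are hypothetical, and the loop favors clarity over speed.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """True 2-D convolution without padding (output shrinks by kernel size - 1)."""
    flipped = np.flipud(np.fliplr(kernel))     # left-right flip, then top-bottom flip
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            window = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * flipped)
    return out

flat = np.full((5, 5), 7.0)                    # a "pure color" image

smooth = np.ones((3, 3)) / 9.0                 # all positive, sums to 1
zero_sum = np.array([[ 0, -1,  0],             # mixed signs, sums to 0
                     [-1,  4, -1],
                     [ 0, -1,  0]], dtype=np.float64)

smoothed = convolve2d_valid(flat, smooth)      # a flat image stays flat
edges = convolve2d_valid(flat, zero_sum)       # a zero-sum kernel cancels out
```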
Convolutional Neural Network
- Recognize/classify images
- Can also be used for NLP, speech, recommendation, image segmentation, medical image analysis, financial time series
- Based on the color channels, each kernel set needs n kernel layers (3 layers for RGB)
- Based on the number of features, we have n kernel sets
- The output has one channel per feature; within a kernel set, the per-layer results are summed pixel-wise
- For an MLP we train weights and biases; for a CNN, the weights are in the kernels and there is one bias per kernel set
- Number of parameters = (width x height x no_kernel + 1) x no_kernel_sets, where no_kernel is the number of kernel layers in a set (one per input channel)
- e.g. 28x28x3 with 1 5x5x3 kernel ⇒ 24x24x1 output
- Stride
- Movement each time
- Have to fit the image, otherwise cannot apply
- Output size = (Size of image dimension - Size of kernel dimension) / Stride + 1
- If the output size is not an integer, the stride does not fit
- The bigger the stride, the smaller the output image
- Border pixel width = (K - 1) / 2 keeps the output the same size as the input at stride 1, where K is the kernel size
- With padding, output size = (original + 2 x padding - kernel size) / stride + 1, where padding is the border width added on each side
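The stride rules above can be checked with a small helper. This is a sketch: `conv_output_size` is a hypothetical name, and `padding` is taken to mean the border width added on each side, so it counts twice in the formula.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one dimension, or None if the kernel does not fit."""
    span = size + 2 * padding - kernel
    if span < 0 or span % stride != 0:
        return None                      # non-integer result: the stride does not fit
    return span // stride + 1

no_padding = conv_output_size(28, 5)                        # 28x28 image, 5x5 kernel
same_size = conv_output_size(28, 5, padding=(5 - 1) // 2)   # border width (K - 1) / 2
strided = conv_output_size(28, 4, stride=2)
misfit = conv_output_size(28, 5, stride=2)                  # (28 - 5) / 2 is not an integer
```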
Pooling Layer
- Max pooling
- Average pooling
- Picks up larger-scale details
- Reduces dimensions
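A short NumPy sketch of both pooling variants with a non-overlapping 2x2 window. The function `pool2d` and the input values are hypothetical; rows or columns that do not fill a complete window are dropped, which is one possible convention.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping size x size pooling of a 2-D array (stride = size)."""
    h, w = x.shape
    h, w = h - h % size, w - w % size            # drop rows/cols that do not fit
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [5, 7, 1, 1],
              [0, 2, 9, 4],
              [2, 4, 3, 8]], dtype=np.float64)

max_pooled = pool2d(x, mode="max")
avg_pooled = pool2d(x, mode="average")
```

Each output is 2x2, half the input size in each dimension.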
Training Process
- Initialize all kernels and parameters/weights with random values
- Takes input image, do forward propagation, find the probability of each class
- Calculate total error at the output layer
- Backpropagation to update all kernel values/weights
- Repeat steps 2-4 for each image in the training set
Dropout Layer
- Prevent overfitting
- Dropout ratio is a probability between 0 and 1
- Dropped values are set to 0 (ignored)
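A sketch of what a dropout layer does during training, using the "inverted dropout" convention in which surviving values are scaled by 1 / (1 - rate) so the expected sum is unchanged (the Keras Dropout layer documents this scaling). The function name and the seed are hypothetical.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero each element with probability `rate`; rescale the survivors."""
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
activations = np.ones(10_000)
dropped = dropout(activations, rate=0.25, rng=rng)

zero_fraction = np.mean(dropped == 0.0)  # close to the dropout ratio
```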
One-Hot Encoding
- Use when the categories have no ordering or numeric relation to each other
- Prevents the model from treating a bigger label value as more important
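A minimal sketch of one-hot encoding for integer class labels. The helper `one_hot` and the label meanings are hypothetical.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded

# Hypothetical labels for three classes, e.g. 0 = cat, 1 = dog, 2 = bird.
encoded = one_hot([0, 2, 1], num_classes=3)
```

Every class gets a vector of the same magnitude, so label 2 is not treated as "larger" than label 0.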
Layer calculation example
trainable parameters = num_kernels * (kernel_height * kernel_width * input_channels + 1)
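# Imports assume the Keras API bundled with TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

num_classes = 10 # Assumed from the 10-unit output layer in the summary below
model = Sequential() # Create a Sequential object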
# Add a convolutional layer with 32 kernels, each of size 3x3
# Use ReLU activation function, padding="valid", strides=(1,1)
# Specify the input size to this convolutional layer: (28,28,1)
# Note: Input size needs to be specified for the first layer only
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
# Add another convolutional layer with 64 kernels, each of size 3x3
# Use ReLU activation function, padding="valid", strides=(1,1)
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
# Add a max pooling layer of size 2 x 2
model.add(MaxPooling2D(pool_size=(2, 2)))
# Add a dropout layer to prevent a model from overfitting
model.add(Dropout(0.25))
# Add a flatten layer to convert the pooled data to a single column
# that is passed to the fully-connected layer
model.add(Flatten())
# Add a dense layer (fully-connected layer) and use ReLU activation function
model.add(Dense(units=128, activation='relu'))
# Add a dropout layer to prevent a model from overfitting
model.add(Dropout(0.5))
# Add a dense layer (fully-connected layer) and use Softmax activation function
model.add(Dense(units=num_classes, activation='softmax'))
_________________________________________________________________
Layer (type)                   Output Shape          Param #    Calculation
=================================================================
conv2d (Conv2D)                (None, 26, 26, 32)    320        28-3+1, 32*(3*3*1+1)
conv2d_1 (Conv2D)              (None, 24, 24, 64)    18496      26-3+1, 64*(3*3*32+1)
max_pooling2d (MaxPooling2D)   (None, 12, 12, 64)    0
dropout (Dropout)              (None, 12, 12, 64)    0
flatten (Flatten)              (None, 9216)          0
dense (Dense)                  (None, 128)           1179776    9216*128+128
dropout_1 (Dropout)            (None, 128)           0
dense_1 (Dense)                (None, 10)            1290       128*10+10
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
Calculations
Output Size
output = floor((n + 2p - f) / s) + 1
- n - input dimension (width or height)
- f - filter (kernel) size
- p - padding on each side
- s - stride
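Using the symbols above with the usual formula floor((n + 2p - f) / s) + 1, the sketch below walks through the layers of the Keras model and reproduces the shapes and parameter counts in its summary. Function names are hypothetical, and max pooling is treated as a 2x2 window with stride 2 (the Keras default when `strides` is not given).

```python
def conv_output(n, f, p=0, s=1):
    """Output size along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def conv_params(kernel_h, kernel_w, in_channels, num_kernels):
    """Weights per kernel set plus one bias, times the number of kernel sets."""
    return num_kernels * (kernel_h * kernel_w * in_channels + 1)

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units

size = conv_output(28, 3)                 # conv2d:   28 -> 26
p1 = conv_params(3, 3, 1, 32)             # 32 kernels over 1 input channel
size = conv_output(size, 3)               # conv2d_1: 26 -> 24
p2 = conv_params(3, 3, 32, 64)            # 64 kernels over 32 input channels
size = conv_output(size, 2, s=2)          # 2x2 max pooling: 24 -> 12
flat = size * size * 64                   # flatten: 12 * 12 * 64
p3 = dense_params(flat, 128)              # dense
p4 = dense_params(128, 10)                # dense_1
total = p1 + p2 + p3 + p4                 # matches "Total params" in the summary
```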