Source Coding
Ensuring bits are used efficiently with the help of compress.
Compression
Lossless
- Nothing is lost
- Reversible
- Accuracy
e.g. financial record, scientific data
Lossy
- Not reversible
- Higher compression ratio
- Remove features that are not noticeable
e.g. MP3, JPEG (Joint Photographic Experts Group), DVD
Fixed-Length Code
We need a minimum of bits for symbols
Variable-Length Code
Taking into consideration of frequency of occurrence of the symbols
Prefix Code
A code which no codeword is a prefix of another codeword
In terms of a tree, all codewords must be at the leaves of the code tree so that they are uniquely decodable.
Huffman Code
- Reversible
- Most frequent symbols have shorter codewords
- Solutions are not unique, but they have the same average bits/symbol
Developed by David Albert Huffman
Steps
- Compute the relative frequency for each symbol
- Sort by descending order to draw the leaf nodes
- Combine the most infrequent two symbols
- Compute the new relative frequency (addition of the two original ones)
- Assign 0 and 1 to based on some kind of rules (e.g. based on higher probability)
- Repeat 1-5 until reaching root node
- Convert into prefix code
Average Bits Per Symbol
A symbol can be a number of symbols, divide this by that number to get the “real” average bits per symbol (e.g. AA, BB, AB, BA).
This can never be smaller than the entropy
MPEG (Moving Picture Experts Group)
MP3 (Audio Layer 3 of the MPEG-1 Multimedia Coding Standard)
Uses lossy compression to reduce to data required to store and transmit audio files
Coding in done in the frequency domain instead of time domain (similar to how human perceive sound)
Perceptual noise shaping
Eliminate the following parts to reduce file size, without changing the sound noticeably
- We cannot hear certain frequency
- We can hear some frequency better than others
- We can only hear the louder sound, when there are sounds with adjacent frequencies
MPEG-2
Applications
- Broadcast satellite
- Cable TV
- Commercial DVD moves
- HDTV
Techniques
- Inter-frame prediction
- Coding in the frequency domain (Discrete Cosine Transform/DCT)
MPEG-4
Applications
- Web
- CD distribution
- Conversation
- Video-conference
- Broadcast television
- Motion estimation
Techniques
- Motion estimation (predicting how blocks of pixels will move)
Channel Coding
Detecting errors with the help of redundant bits
Code Rate
Closer to 1 is better
For N x N two-dimension parity code,
Another representation is (total bits, data bits)
e.g. (8, 4) means transmit 8 bits for every 4 data bits,
Problems
- Attenuation
- Distortion
- Noise
Parity Check
- Even parity (number of 1s is even)
- Odd parity (number of 1s is odd)
Only error detection of odd number of errors
HKID
- Write down the 7 digitals (e.g. A123456)
- Map the digitals into number (A → 1, Z → 26)
- Multiply by 8 → 2
- Find their sum
- Divide by 11
- 11 - remainder is the check digit (if the result is 10, the check digital is A)
Repetition Code
Repeat the input bit for at least 3 times
- Cannot correct 2-bit or 3-bit errors
- Inefficient
Two-Dimension Parity Code
Can fix single-bit error
- Arrange data in N x N block
- Add the parity bit for each row and column
Block Error Rate
Take 3 bits block (111) as an example, with
| 0 bit wrong | 1 bit wrong | 2 bits wrong | 3 bits wrong |
|---|---|---|---|
| 110 | 001 | 000 | |
| 101 | 010 | ||
| 011 | 100 |
Since 1 bit wrong can be correct, the error rate is the probability of 2 bits wrong and 3 bits wrong
Channel Capacity
if coding rate < capacity, we can correct all error bits when number of bits become large