Source Coding

Ensuring bits are used efficiently with the help of compress.

Compression

Lossless

  • Nothing is lost
  • Reversible
  • Accuracy

e.g. financial record, scientific data

Lossy

  • Not reversible
  • Higher compression ratio
  • Remove features that are not noticeable

e.g. MP3, JPEG (Joint Photographic Experts Group), DVD

Fixed-Length Code

We need a minimum of bits for symbols

Variable-Length Code

Taking into consideration of frequency of occurrence of the symbols

Prefix Code

A code which no codeword is a prefix of another codeword

In terms of a tree, all codewords must be at the leaves of the code tree so that they are uniquely decodable.

Huffman Code

  • Reversible
  • Most frequent symbols have shorter codewords
  • Solutions are not unique, but they have the same average bits/symbol

Developed by David Albert Huffman

Steps
  1. Compute the relative frequency for each symbol
  2. Sort by descending order to draw the leaf nodes
  3. Combine the most infrequent two symbols
  4. Compute the new relative frequency (addition of the two original ones)
  5. Assign 0 and 1 to based on some kind of rules (e.g. based on higher probability)
  6. Repeat 1-5 until reaching root node
  7. Convert into prefix code

Average Bits Per Symbol

A symbol can be a number of symbols, divide this by that number to get the “real” average bits per symbol (e.g. AA, BB, AB, BA).

This can never be smaller than the entropy

MPEG (Moving Picture Experts Group)

MP3 (Audio Layer 3 of the MPEG-1 Multimedia Coding Standard)

Uses lossy compression to reduce to data required to store and transmit audio files

Coding in done in the frequency domain instead of time domain (similar to how human perceive sound)

Perceptual noise shaping

Eliminate the following parts to reduce file size, without changing the sound noticeably

  • We cannot hear certain frequency
  • We can hear some frequency better than others
  • We can only hear the louder sound, when there are sounds with adjacent frequencies

MPEG-2

Applications
  • Broadcast satellite
  • Cable TV
  • Commercial DVD moves
  • HDTV
Techniques
  • Inter-frame prediction
  • Coding in the frequency domain (Discrete Cosine Transform/DCT)

MPEG-4

Applications
  • Web
  • CD distribution
  • Conversation
  • Video-conference
  • Broadcast television
  • Motion estimation
Techniques
  • Motion estimation (predicting how blocks of pixels will move)

Channel Coding

Detecting errors with the help of redundant bits

Code Rate

Closer to 1 is better
For N x N two-dimension parity code,

Another representation is (total bits, data bits)
e.g. (8, 4) means transmit 8 bits for every 4 data bits,

Problems

  • Attenuation
  • Distortion
  • Noise

Parity Check

  • Even parity (number of 1s is even)
  • Odd parity (number of 1s is odd)

Only error detection of odd number of errors

HKID

  1. Write down the 7 digitals (e.g. A123456)
  2. Map the digitals into number (A 1, Z 26)
  3. Multiply by 8 2
  4. Find their sum
  5. Divide by 11
  6. 11 - remainder is the check digit (if the result is 10, the check digital is A)

Repetition Code

Repeat the input bit for at least 3 times

  • Cannot correct 2-bit or 3-bit errors
  • Inefficient

Two-Dimension Parity Code

Can fix single-bit error

  1. Arrange data in N x N block
  2. Add the parity bit for each row and column

Block Error Rate

Take 3 bits block (111) as an example, with

0 bit wrong1 bit wrong2 bits wrong3 bits wrong
110001000
101010
011100

Since 1 bit wrong can be correct, the error rate is the probability of 2 bits wrong and 3 bits wrong

Channel Capacity

if coding rate < capacity, we can correct all error bits when number of bits become large