Exam 2

Problem No. 1: Entropy Coding

The alphabet A = {0, 1, 2, 3, 4} has the pdf {0.4, 0.3, 0.2, 0.05, 0.05}.

(a) Compute the average codeword length for an optimal code.

The Huffman code is the optimal code and is computed from the probabilities of occurrence of the symbols. Construction of the Huffman code: repeatedly merge the two least probable nodes (0.05 + 0.05 = 0.1, then 0.1 + 0.2 = 0.3, then 0.3 + 0.3 = 0.6, then 0.6 + 0.4 = 1.0), labeling the branches 0 and 1 at each merge. This yields:

Symbol   P(A)    Codeword   Length
0        0.4     1          1
1        0.3     00         2
2        0.2     010        3
3        0.05    0110       4
4        0.05    0111       4

Average codeword length = 0.4(1) + 0.3(2) + 0.2(3) + 0.05(4) + 0.05(4) = 2.0 bits/symbol.

(b) Is there a non-prefix code that has a lower expected codeword length than the optimal prefix code? If so, give an example.

Yes, there can be a non-prefix code whose expected codeword length is less than that of the optimal prefix code. Since such a code need not be uniquely decodable, there is no prefix restriction on the assignment of the codewords. Consider the following non-prefix code:

Symbol   P(A)    Codeword   Length
0        0.4     0          1
1        0.3     1          1
2        0.2     00         2
3        0.05    01         2
4        0.05    10         2

Average codeword length = 0.4(1) + 0.3(1) + 0.2(2) + 0.05(2) + 0.05(2) = 1.3 bits/symbol.

So a non-prefix code with an average codeword length less than that of the optimal prefix code exists.

(c) Suppose sending a codeword through this channel costs an amount of money nonlinearly related to the codeword length: a codeword of length l costs $2^l. For example, the cost of a codeword of length 2 is $4, and the cost of a codeword of length 3 is $8. Design a code that delivers a good compression rate while minimizing the overall cost.

A good compression ratio is achieved when the bit rate is minimized. Consider the optimal code, i.e., the Huffman code from (a). Summing the per-symbol costs over its codeword lengths, the cost is

2^1 + 2^2 + 2^3 + 2^4 + 2^4 = 2 + 4 + 8 + 16 + 16 = $46.

Consider instead a non-prefix code, for example the one shown in (b). The cost for the non-prefix code is

2^1 + 2^1 + 2^2 + 2^2 + 2^2 = 2 + 2 + 4 + 4 + 4 = $16.

Thus the non-prefix code minimizes the cost and also delivers a good compression ratio. (Both the average lengths and the costs are checked numerically in the first sketch after Problem 2.)

Problem No. 2: Binary Symmetric Channels (BSC) and Capacity

For the BSC with crossover probability p (transition probabilities p and 1 - p):

(a) Find the value of p that minimizes the capacity of the channel. Explain.

The capacity of the binary symmetric channel is C = 1 - H(p), where H(p) = -p log2(p) - (1 - p) log2(1 - p) is the binary entropy function. For the capacity to be minimum, H(p) must be maximum. The maximum value of H(p) is 1, and H(p) = 1 when p = 0.5. Hence the value of p that minimizes the capacity of a binary symmetric channel is p = 0.5, where C = 0.

(b) Compute the capacity of the cascade of two channels.

When two binary symmetric channels with crossover probability p are connected in cascade, the overall channel flips a bit exactly when one stage flips it and the other does not. The cascade is therefore equivalent to a single binary symmetric channel with crossover probability

p_eq = p(1 - p) + (1 - p)p = 2p(1 - p).

The capacity of the cascade of two channels is

C = 1 - H(2p(1 - p)).

(c) Explain to what value the capacity of n identical cascaded BSC channels converges as n becomes large.

From (b), the effective crossover probability of two cascaded channels is closer to 0.5 than that of a single channel (for 0 < p < 0.5, 2p(1 - p) > p). Similarly, as n increases, the effective crossover probability moves still closer to 0.5, so H approaches 1 and the capacity decreases. Hence, for sufficiently large n, the capacity converges to zero.
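A minimal Python sketch to double-check the arithmetic in Problem 1 above. The probabilities and codeword lengths come from the two tables; the cost model cost = $2^l is the one given in part (c).

    probs = [0.4, 0.3, 0.2, 0.05, 0.05]

    huffman_lengths    = [1, 2, 3, 4, 4]   # codewords 1, 00, 010, 0110, 0111
    non_prefix_lengths = [1, 1, 2, 2, 2]   # codewords 0, 1, 00, 01, 10

    def avg_length(p, lengths):
        # Expected codeword length sum(p_i * l_i), in bits/symbol.
        return sum(pi * li for pi, li in zip(p, lengths))

    def total_cost(lengths):
        # Overall cost when a codeword of length l costs $2**l, summed over symbols.
        return sum(2 ** l for l in lengths)

    print(avg_length(probs, huffman_lengths))     # 2.0 bits/symbol
    print(avg_length(probs, non_prefix_lengths))  # ~1.3 bits/symbol
    print(total_cost(huffman_lengths))            # 46 (dollars)
    print(total_cost(non_prefix_lengths))         # 16 (dollars)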
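The convergence in Problem 2(c) can also be seen numerically. This is a sketch, not part of the original solution: the crossover probability p = 0.1 is an arbitrary illustrative choice, and the recurrence q <- q(1 - p) + (1 - q)p composes one more BSC stage onto a cascade whose effective crossover probability is q.

    import math

    def H(p):
        # Binary entropy in bits, with H(0) = H(1) = 0 by convention.
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def capacity(p):
        # Capacity of a BSC with crossover probability p: C = 1 - H(p).
        return 1 - H(p)

    def cascade_crossover(p, n):
        # Effective crossover probability of n identical BSCs in cascade.
        q = 0.0  # a cascade of zero stages never flips the bit
        for _ in range(n):
            q = q * (1 - p) + (1 - q) * p  # flip in exactly one of the two events
        return q

    p = 0.1  # illustrative value
    for n in (1, 2, 5, 20, 100):
        q = cascade_crossover(p, n)
        print(n, round(q, 4), round(capacity(q), 6))
    # The effective crossover tends to 0.5 and the capacity tends to 0 as n grows.

For n = 2 this reproduces q = 2p(1 - p) = 0.18 and the capacity 1 - H(0.18) from part (b).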
Problem No. 3: Differential Entropy

(a) A continuous random variable X has a pdf f(x) over the stated support. Compute the entropy.

The differential entropy is h(X) = -∫ f(x) log2 f(x) dx, and the integral is evaluated using integration by parts.

(b) A continuous random variable has a pdf g(x), a scaled version of f(x). Compute the entropy and compare to (a). Explain any differences.

Integrating by parts as in (a) gives the entropy of g. The difference from the value in (a) is explained by the scaling theorem, which states that h(aX) = h(X) + log2|a|; hence the answer differs from that of part (a) by the log2|a| term.

(c) A continuous random variable has a pdf that is a translation of f(x). Compute the entropy. Explain.

The translation theorem can be used to compute the entropy. It states that h(X + c) = h(X) for any constant c. Since this pdf is a translation of the one in part (a), the entropy is the same. Hence, the entropy equals the answer to part (a).

(d) Compute the relative entropy between f (in part a) and g (in part b): D(f||g).

By definition, D(f||g) = ∫ f(x) log2( f(x) / g(x) ) dx. Expanding the logarithm as log2 f(x) - log2 g(x) splits this into integrals of the same form as in parts (a) and (b); integrating by parts, we get the value of D(f||g).
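A hedged numerical sketch for part (d). The exam's actual densities f and g are not restated here, so the exponential densities below, with made-up rates lam_f and lam_g, are hypothetical stand-ins used only to illustrate how D(f||g) = ∫ f(x) log2(f(x)/g(x)) dx can be evaluated; the true f and g from parts (a) and (b) should be substituted.

    import math

    lam_f, lam_g = 2.0, 1.0  # hypothetical rate parameters, not from the exam

    def f(x):
        return lam_f * math.exp(-lam_f * x)

    def g(x):
        return lam_g * math.exp(-lam_g * x)

    def relative_entropy(f, g, lo=1e-9, hi=50.0, steps=100_000):
        # Midpoint Riemann-sum approximation of D(f||g) in bits over [lo, hi].
        dx = (hi - lo) / steps
        total = 0.0
        for i in range(steps):
            x = lo + (i + 0.5) * dx
            fx, gx = f(x), g(x)
            if fx > 0 and gx > 0:
                total += fx * math.log2(fx / gx) * dx
        return total

    print(relative_entropy(f, g))
    # For these exponentials the closed form is
    # log2(lam_f / lam_g) + (lam_g / lam_f - 1) / math.log(2)  ~= 0.2787 bits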