NPTEL Reinforcement Learning Week 4 Assignment Answers 2025

1. State True/False
The state transition graph for any MDP is a directed acyclic graph.

  • True
  • False
Answer :-

2. Consider the following statements:
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q), without accessing the MDP parameters.

Which of these statements are true?

  • Only (ii)
  • Only (iii)
  • Only (i), (ii)
  • Only (i), (iii)
  • Only (ii), (iii)
Answer :- 
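
To see why statement (iii) can hold without the model while (ii) cannot, note that acting greedily with respect to the optimal q-function only requires comparing q-values, whereas acting greedily with respect to the optimal value function requires a one-step lookahead through the transition probabilities and rewards. A minimal numpy sketch (the MDP numbers below are invented purely for illustration):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, used only for illustration.
# Greedy action selection from the optimal q-function needs no model:
q_star = np.array([[1.0, 3.0],    # q*(s0, a0), q*(s0, a1)
                   [2.5, 0.5]])   # q*(s1, a0), q*(s1, a1)
policy_from_q = q_star.argmax(axis=1)        # pi(s) = argmax_a q*(s, a)

# Greedy action selection from v* requires the MDP parameters P and R:
# pi(s) = argmax_a sum_s' P(s'|s,a) * (R(s,a) + gamma * v*(s'))
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s']
              [[0.7, 0.3], [0.4, 0.6]]])
R = np.array([[0.0, 1.0],                    # R[s, a] (expected reward)
              [0.5, 0.0]])
v_star = q_star.max(axis=1)
gamma = 0.9
policy_from_v = (R + gamma * np.einsum('sap,p->sa', P, v_star)).argmax(axis=1)

print(policy_from_q, policy_from_v)
```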

3. Which of the following is a benefit of using RL algorithms for solving MDPs?

  • They do not require the state of the agent for solving an MDP.
  • They do not require the action taken by the agent for solving an MDP.
  • They do not require the state transition probability matrix for solving an MDP.
  • They do not require the reward signal for solving an MDP.
Answer :- 

4. Consider the following equations:

Which of the above are correct?

  • Only (i)
  • Only (i), (ii)
  • Only (ii), (iii)
  • Only (i), (iii)
  • (i), (ii), (iii)
Answer :- 

5. State True/False
While solving MDPs, in case of discounted rewards, the value of γ (discount factor) cannot affect the optimal policy.

  • True
  • False
Answer :- 
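
A small worked example (with made-up rewards) shows how the discount factor can change which action is optimal: one action pays a small reward immediately, the other pays a larger reward one step later.

```python
# Toy illustration (invented rewards): from the start state, action "now"
# yields the reward sequence [1, 0, 0] and action "later" yields [0, 10, 0].
def discounted_return(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

for gamma in (0.05, 0.9):
    now   = discounted_return([1, 0, 0], gamma)
    later = discounted_return([0, 10, 0], gamma)
    best  = "now" if now > later else "later"
    print(f"gamma={gamma}: return(now)={now:.2f}, return(later)={later:.2f} -> prefer '{best}'")
```

With a small gamma the immediate reward wins; with a large gamma the delayed reward wins, so the choice of gamma can change the optimal policy.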

6. Consider the following statements for a finite MDP (I is the identity matrix of dimensions |S|×|S|, where S is the set of all states, and Pπ is a stochastic matrix):
(i) An MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If 0 ≤ γ < 1, then the rank of the matrix I − γPπ is equal to |S|.
(iv) If 0 ≤ γ < 1, then the rank of the matrix I − γPπ is less than |S|.

Which of the above statements are true?

  • Only (ii), (iii)
  • Only (ii), (iv)
  • Only (i), (iii)
  • Only (i), (ii), (iii)
Answer :-
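
Statement (iii) can be checked numerically: for a row-stochastic Pπ and 0 ≤ γ < 1, every eigenvalue of γPπ has modulus at most γ < 1, so I − γPπ is invertible and therefore has rank |S|. A sketch with a randomly generated stochastic matrix (chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S, gamma = 5, 0.9
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)            # make rows sum to 1 (stochastic)

M = np.eye(S) - gamma * P
print(np.linalg.matrix_rank(M))                       # -> 5, i.e. |S|
print(np.max(np.abs(np.linalg.eigvals(gamma * P))))   # -> 0.9, strictly below 1
```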

7. Consider an MDP with 3 states A, B, C. From each state, we can go to either of the other two states; i.e., from state A we can perform 2 actions, which lead to states B and C respectively. The rewards for all transitions are: r(A,B)=2 (the reward if we go from A to B), r(B,A)=5, r(B,C)=7, r(C,B)=10, r(A,C)=1, r(C,A)=12. The discount factor is 0.7. Find the value function for the policy given by π(A)=C (if we are in state A, we choose the action to go to C), π(B)=A and π(C)=B. Report the answer as [vπ(A), vπ(B), vπ(C)].

  • [10.2, 16.7, 20.2]
  • [14.2, 16.5, 15.1]
  • [15.9, 16.1, 21.3]
  • [12.2, 6.2, 14.5]
Answer :- 
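
One way to work this out is to write the Bellman equations for the given deterministic policy and solve the resulting linear system vπ = rπ + γPπvπ, i.e. vπ = (I − γPπ)⁻¹ rπ. A short numpy sketch using the rewards and discount factor from the question:

```python
import numpy as np

gamma = 0.7
# State order: A, B, C. Under pi, A -> C, B -> A, C -> B.
P_pi = np.array([[0, 0, 1],
                 [1, 0, 0],
                 [0, 1, 0]], dtype=float)
r_pi = np.array([1.0, 5.0, 10.0])   # r(A,C), r(B,A), r(C,B)

# Solve (I - gamma * P_pi) v = r_pi for the policy's value function.
v_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(np.round(v_pi, 1))            # [v(A), v(B), v(C)]
```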

8. Suppose x is a fixed point of the function A, y is a fixed point of the function B, and x = BA(x), where BA is the composition of B and A. Consider the following statements:

(i) x is a fixed point for B
(ii) x=y
(iii) BA(y)=y

Which of the above must be true?

  • Only (i)
  • Only (ii)
  • Only (i), (ii)
  • (i), (ii), (iii)
Answer :- 
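
Statement (i) always holds, since A(x) = x gives x = BA(x) = B(A(x)) = B(x). A concrete counterexample (functions chosen only for illustration) shows why (ii) and (iii) need not hold:

```python
# A(z) = z**2 has fixed points {0, 1}; B is the identity, so every real
# number is a fixed point of B.
A = lambda z: z ** 2
B = lambda z: z

x, y = 1.0, 2.0                     # x is fixed for A, y is fixed for B
assert A(x) == x and B(y) == y
assert B(A(x)) == x                 # x = BA(x), as in the premise

print(B(x) == x)                    # (i)   True:  x = B(A(x)) = B(x)
print(x == y)                       # (ii)  False: B can have many fixed points
print(B(A(y)) == y)                 # (iii) False: A(y) = 4, so BA(y) = 4 != y
```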

9. Which of the following is not a valid norm function? (x is a D-dimensional vector)

Answer :- 
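
The candidate functions for this question are not reproduced in the text above, but any candidate can be tested against the norm axioms: non-negativity, definiteness, absolute homogeneity (||a·x|| = |a|·||x||), and the triangle inequality. A sketch with one illustrative candidate, the squared Euclidean length, which fails homogeneity:

```python
import numpy as np

# Illustrative candidate only: f(x) = sum_i x_i^2 (squared length, no sqrt).
candidate = lambda x: np.sum(x ** 2)

x = np.array([1.0, 2.0, 3.0])
a = 2.0
# Homogeneity would require candidate(a*x) == |a| * candidate(x).
print(candidate(a * x), abs(a) * candidate(x))   # 56.0 vs 28.0 -> not a norm
```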

10. Which of the following is a contraction mapping in any norm?

  • T([x1,x2])=[0.5x1,0.5x2]
  • T([x1,x2])=[2x1,2x2]
  • T([x1,x2])=[2x1,3x2]
  • T([x1,x2])=[x1+x2,x1−x2]
Answer :-
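
Each option is a linear map T(x) = Mx, so T is a contraction exactly when ||M(x − y)|| ≤ c·||x − y|| for some c < 1. A quick numerical check of the operator (spectral) norm of each matrix, here in the Euclidean norm:

```python
import numpy as np

# For a linear map T(x) = Mx, the smallest Lipschitz constant in the
# Euclidean norm is the operator (spectral) norm of M.
maps = {
    "[0.5x1, 0.5x2]":   np.array([[0.5, 0.0], [0.0, 0.5]]),
    "[2x1, 2x2]":       np.array([[2.0, 0.0], [0.0, 2.0]]),
    "[2x1, 3x2]":       np.array([[2.0, 0.0], [0.0, 3.0]]),
    "[x1+x2, x1-x2]":   np.array([[1.0, 1.0], [1.0, -1.0]]),
}
for name, M in maps.items():
    print(f"T = {name}: operator norm = {np.linalg.norm(M, 2):.3f}")
# Only the 0.5 scaling has operator norm below 1; for that map
# ||T(x) - T(y)|| = 0.5 ||x - y|| holds in every norm, not just the 2-norm.
```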