NPTEL Reinforcement Learning Week 11 Assignment Answers 2025


1. Recall that in MAXQ Value Function Decomposition, we draw a 'call graph' where nodes are 'tasks' and edges show dependencies between the tasks. Which of the following is true about the graph?

  • The graph must be a tree
  • The graph must be a DAG
  • The graph can be any regular graph without self loops
  • Any directed graph can be a call graph
Answer :- 

2. Which actions among the following are primitive actions?
i. Left
ii. Drop off
iii. Navigate
iv. put-down

  • i, ii, iii, iv
  • ii, iii
  • i, iv
  • None of the above
Answer :- 

3. From the discussion in class, Navigate is not a single sub-task; that is why it is followed by a parameter 't'. What is the parameter 't', per the class discussion?

  • the number of times ’Pick up’ or ’Drop off’ have called sub-task Navigate
  • the maximum number of primitive actions permitted to finish sub-task
  • the destination (in this case, one of R, G, B, Y)
  • None of the above
Answer :- 

4. State True/False: the left-to-right ordering in the call graph above is important, and sub-tasks should be performed in that order.

  • True
  • False
Answer :- 

5. Suppose the passenger is always either inside the taxi or at one of the four pickup/drop-off locations, so there are 5 states for the passenger's location. For the given taxi world, what is the number of states that suffices to capture all the information?

  • 18
  • 18*5
  • 18*5*4
  • None of the above
Answer :- 
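The counting principle behind Question 5 can be sketched as follows: each independent state variable multiplies into the joint state count. Which factors are actually required for this course's taxi world is exactly what the question probes, so the numbers below use the classic 5x5 Gym Taxi layout purely for comparison, not as the answer.

```python
# Illustrative sketch of the state-counting principle (names are assumptions,
# not part of the question). Independent state variables multiply.

def state_space_size(taxi_positions: int, passenger_locations: int,
                     destinations: int) -> int:
    """Joint state count = product of the sizes of each state variable."""
    return taxi_positions * passenger_locations * destinations

# Classic 5x5 Gym Taxi-v3 layout, shown only for comparison:
# 25 cells x 5 passenger locations (4 landmarks + in-taxi) x 4 destinations.
print(state_space_size(25, 5, 4))  # → 500
```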

6. We define the action value for MAXQ as qπ(i, s, a) = vπ(a, s) + Cπ(i, s, a), where qπ(i, s, a) can be interpreted as the expected return when you are in sub-task i and state s and decide to perform sub-task a. Assume that in taking a you receive reward r1, and after a completes you receive reward r2 while finishing sub-task i. Choose the correct value of Cπ(i, s, a) from the following.

  • Cπ(i,s,a)=r2
  • Cπ(i,s,a)=r1+r2
  • Cπ(i,s,a)=r1
  • None of the above
Answer :- 
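The decomposition in Question 6 can be sketched directly in code. This is a minimal illustration of the identity q(i, s, a) = v(a, s) + C(i, s, a), not the full MAXQ-Q algorithm; the task names and numbers are assumptions for illustration only.

```python
# Minimal sketch of the MAXQ value decomposition q(i,s,a) = v(a,s) + C(i,s,a).
# All containers and values below are illustrative assumptions.

v = {}  # v[(a, s)]: expected return from executing sub-task a in state s
C = {}  # C[(i, s, a)]: expected return for *completing* parent task i
        # after sub-task a finishes

def q(i, s, a):
    """Decomposed action value: value of a itself plus the completion value."""
    return v.get((a, s), 0.0) + C.get((i, s, a), 0.0)

# Per the question's setup: r1 accrues while executing a (it lives in v),
# while r2 accrues after a completes (it lives in C).
v[("Navigate", "s0")] = -1.0          # e.g. cost incurred by the sub-task
C[("Root", "s0", "Navigate")] = 20.0  # return earned after it completes
print(q("Root", "s0", "Navigate"))    # → 19.0
```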

7. In the MAXQ approach to solving a problem, suppose that sub-task Mi invokes sub-task Mj. Do the pseudo-rewards of Mj have any effect on sub-task Mi?

  • Yes
  • No
Answer :- 

8. Which of the following can be considered as a good option in Hierarchical RL?

  • An option that can be reused often
  • An option that can cut down exploration
  • An option that helps in transfer learning
  • None of the above
Answer :- 

9. State True or False:
In the MAXQ framework, in the presence of pseudo-rewards, two completion functions need to be maintained for every (i, s, a): one for the pseudo-rewards and one for the underlying MDP rewards.

  • True
  • False
Answer :- 
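The two-completion-function idea in Question 9 can be sketched as follows, in the spirit of Dietterich's MAXQ-Q: one function folds in pseudo-rewards and drives action selection inside task i, while the other tracks only real MDP rewards and is what i's parent sees. The names, update form, and discounting below are a simplified assumption (e.g. using a plain gamma rather than gamma raised to the sub-task's duration), not the course's exact pseudocode.

```python
# Hedged sketch of maintaining two completion functions per (i, s, a).
# C_tilde includes pseudo-rewards (internal, for choosing actions in i);
# C uses only real MDP rewards (external, reported to i's parent).

C_tilde = {}
C = {}

def update(i, s, a, s_next, a_star, pseudo_r, v_a_star,
           gamma=0.9, alpha=0.1):
    """One simplified completion-function update after sub-task a ends in
    s_next, where a_star is the greedy next action (chosen via C_tilde)
    and v_a_star is its value estimate. Illustrative, not exact MAXQ-Q."""
    key = (i, s, a)
    # Internal target folds in the pseudo-reward for completing a:
    target_tilde = pseudo_r + gamma * (
        v_a_star + C_tilde.get((i, s_next, a_star), 0.0))
    # External target bootstraps only from real-reward quantities:
    target = gamma * (v_a_star + C.get((i, s_next, a_star), 0.0))
    C_tilde[key] = (1 - alpha) * C_tilde.get(key, 0.0) + alpha * target_tilde
    C[key] = (1 - alpha) * C.get(key, 0.0) + alpha * target
```

Keeping the two functions separate is what prevents pseudo-rewards from leaking into the values the parent task uses, which is the point the question tests.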

10. Which of the following can be achieved when we use the MAXQ approach to solve a problem?

  • Solutions that are hierarchically optimal
  • Solutions that are Recursively optimal
  • Solutions that are Flat optimal
  • None of these
Answer :- 