NPTEL Reinforcement Learning Week 11 Assignment Answers 2025


1. Recall that in MAXQ Value Function Decomposition, we draw a 'call graph' where nodes are 'tasks' and edges show dependencies between the tasks. Which of the following is true about the graph?

  • The graph must be a tree
  • The graph must be a DAG
  • The graph can be any regular graph without self loops
  • Any directed graph can be a call graph
Answer :- 

2. Which actions among the following are primitive actions?
i. Left
ii. Drop off
iii. Navigate
iv. put-down

  • i, ii, iii, iv
  • ii, iii
  • i, iv
  • None of the above
Answer :- 

3. From the discussion in class, Navigate is not a single sub-task; that is why it is followed by a parameter 't'. What is the parameter 't', per the class discussion?

  • the number of times ’Pick up’ or ’Drop off’ have called sub-task Navigate
  • the maximum number of primitive actions permitted to finish sub-task
  • the destination (in this case, one of R, G, B, Y)
  • None of the above
Answer :- 

4. State True/False: the left-to-right ordering in the call graph above is important, and sub-tasks should be performed in that order.

  • True
  • False
Answer :- 

5. Suppose the passenger is always either inside the taxi or at one of the four pickup/drop-off locations, so there are 5 states for the passenger's location. For the given taxi world, what is the number of states that suffices to capture all the information?

  • 18
  • 18*5
  • 18*5*4
  • None of the above
Answer :- 
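The counting principle behind Question 5 can be sketched as follows: each independent state variable multiplies into the joint state count. Which factors are actually required for this course's taxi world is exactly what the question probes, so the numbers below use the classic 5x5 Gym Taxi layout purely for comparison, not as the answer.

```python
# Illustrative sketch of the state-counting principle (names are assumptions,
# not part of the question). Independent state variables multiply.

def state_space_size(taxi_positions: int, passenger_locations: int,
                     destinations: int) -> int:
    """Joint state count = product of the sizes of each state variable."""
    return taxi_positions * passenger_locations * destinations

# Classic 5x5 Gym Taxi-v3 layout, shown only for comparison:
# 25 cells x 5 passenger locations (4 landmarks + in-taxi) x 4 destinations.
print(state_space_size(25, 5, 4))  # → 500
```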

6. We define the action value for MAXQ as qπ(i, s, a) = vπ(a, s) + Cπ(i, s, a), where qπ(i, s, a) can be interpreted as the expected return when you are in sub-task i and state s and decide to perform sub-task a. Assume that in taking a you receive reward r1, and after a completes you receive reward r2 while finishing sub-task i. Choose the correct value of Cπ(i, s, a) from the following.

  • Cπ(i,s,a)=r2
  • Cπ(i,s,a)=r1+r2
  • Cπ(i,s,a)=r1
  • None of the above
Answer :- 
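The decomposition in Question 6 can be sketched directly in code. This is a minimal illustration of the identity q(i, s, a) = v(a, s) + C(i, s, a), not the full MAXQ-Q algorithm; the task names and numbers are assumptions for illustration only.

```python
# Minimal sketch of the MAXQ value decomposition q(i,s,a) = v(a,s) + C(i,s,a).
# All containers and values below are illustrative assumptions.

v = {}  # v[(a, s)]: expected return from executing sub-task a in state s
C = {}  # C[(i, s, a)]: expected return for *completing* parent task i
        # after sub-task a finishes

def q(i, s, a):
    """Decomposed action value: value of a itself plus the completion value."""
    return v.get((a, s), 0.0) + C.get((i, s, a), 0.0)

# Per the question's setup: r1 accrues while executing a (it lives in v),
# while r2 accrues after a completes (it lives in C).
v[("Navigate", "s0")] = -1.0          # e.g. cost incurred by the sub-task
C[("Root", "s0", "Navigate")] = 20.0  # return earned after it completes
print(q("Root", "s0", "Navigate"))    # → 19.0
```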

7. In the MAXQ approach to solving a problem, suppose that sub-task Mi invokes sub-task Mj. Do the pseudo-rewards of Mj have any effect on sub-task Mi?

  • Yes
  • No
Answer :- 

8. Which of the following can be considered as a good option in Hierarchical RL?

  • An option that can be reused often
  • An option that can cut down exploration
  • An option that helps in transfer learning
  • None of the above
Answer :- 

9. State True or False:
In the MAXQ framework, in the presence of pseudo-rewards, two completion functions need to be maintained for every (i, s, a): one for the pseudo-rewards and one for the underlying MDP rewards.

  • True
  • False
Answer :- 
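The two-completion-function idea in Question 9 can be sketched as follows, in the spirit of Dietterich's MAXQ-Q: one function folds in pseudo-rewards and drives action selection inside task i, while the other tracks only real MDP rewards and is what i's parent sees. The names, update form, and discounting below are a simplified assumption (e.g. using a plain gamma rather than gamma raised to the sub-task's duration), not the course's exact pseudocode.

```python
# Hedged sketch of maintaining two completion functions per (i, s, a).
# C_tilde includes pseudo-rewards (internal, for choosing actions in i);
# C uses only real MDP rewards (external, reported to i's parent).

C_tilde = {}
C = {}

def update(i, s, a, s_next, a_star, pseudo_r, v_a_star,
           gamma=0.9, alpha=0.1):
    """One simplified completion-function update after sub-task a ends in
    s_next, where a_star is the greedy next action (chosen via C_tilde)
    and v_a_star is its value estimate. Illustrative, not exact MAXQ-Q."""
    key = (i, s, a)
    # Internal target folds in the pseudo-reward for completing a:
    target_tilde = pseudo_r + gamma * (
        v_a_star + C_tilde.get((i, s_next, a_star), 0.0))
    # External target bootstraps only from real-reward quantities:
    target = gamma * (v_a_star + C.get((i, s_next, a_star), 0.0))
    C_tilde[key] = (1 - alpha) * C_tilde.get(key, 0.0) + alpha * target_tilde
    C[key] = (1 - alpha) * C.get(key, 0.0) + alpha * target
```

Keeping the two functions separate is what prevents pseudo-rewards from leaking into the values the parent task uses, which is the point the question tests.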

10. Which of the following can be achieved when we use the MAXQ approach to solve a problem?

  • Solutions that are hierarchically optimal
  • Solutions that are Recursively optimal
  • Solutions that are Flat optimal
  • None of these
Answer :- 