NPTEL Reinforcement Learning Week 11 Assignment Answers 2025
1. Recall that in MAXQ Value Function Decomposition, we draw a "call graph" in which nodes are tasks and edges show the dependencies between tasks. Which of the following is true about this graph?
- The graph must be a tree
- The graph must be a DAG
- The graph can be any regular graph without self loops
- Any directed graph can be a call graph
Answer :-
2. Which actions among the following are primitive actions?
i Left
ii Drop off
iii Navigate
iv put-down
- i, ii, iii, iv
- ii, iii
- i, iv
- None of the above
Answer :-
3. From the discussion in class, it was said that Navigate is not a single sub-task; that is why it takes a parameter 't'. From the class discussion, what is the parameter 't'?
- the number of times 'Pick up' or 'Drop off' have called the sub-task Navigate
- the maximum number of primitive actions permitted to finish the sub-task
- the destination (in this case, one of R, G, B, Y)
- None of the above
Answer :-
4. State True/False. The left-to-right ordering in the above call graph is important, and sub-tasks should be performed in that order.
- True
- False
Answer :-
5. Suppose the passenger is always either inside the taxi or at one of the four pickup/drop-off locations, so there are 5 states for the passenger's location. Then, for the given taxi-world, what is the number of states that suffices to capture all the information?
- 18
- 18*5
- 18*5*4
- None of the above
Answer :-
6. We define the action value for MAXQ as q^π(i,s,a) = v^π(a,s) + C^π(i,s,a), where q^π(i,s,a) can be interpreted as the expected return when you are in sub-task i and state s and decide to perform sub-task a. Assume that in carrying out a you get reward r1, and after a completes you get reward r2 while finishing sub-task i. Choose the correct value of C^π(i,s,a) from the following.
- C^π(i,s,a) = r2
- C^π(i,s,a) = r1 + r2
- C^π(i,s,a) = r1
- None of the above
Answer :-
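The decomposition in question 6 can be sketched in code. The toy task names and numbers below are assumed purely for illustration (they are not the taxi-world values from the lecture); the structure follows Dietterich's MAXQ recursion, where the value of a composite task is the best child's value plus its completion value.

```python
def q_value(i, s, a, V, C):
    """MAXQ action value: q(i,s,a) = V(a,s) + C(i,s,a) -- the value of
    executing sub-task a in state s, plus the completion value for
    finishing the parent sub-task i afterwards."""
    return V[(a, s)] + C[(i, s, a)]

def v_value(i, s, children, V, C):
    """Value of composite task i in state s: maximize q over its children."""
    return max(q_value(i, s, a, V, C) for a in children[i])

# Hypothetical toy numbers: one composite task 'root' with two children.
V = {("a1", "s0"): 1.0, ("a2", "s0"): 2.0}                   # V(a, s)
C = {("root", "s0", "a1"): 5.0, ("root", "s0", "a2"): 3.0}   # C(i, s, a)
children = {"root": ["a1", "a2"]}

print(v_value("root", "s0", children, V, C))  # max(1+5, 2+3) = 6.0
```

Note how the recursion separates the reward earned inside sub-task a from the reward earned completing i after a returns; that split is exactly what question 6 asks about.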
7. In the MAXQ approach to solving a problem, suppose that sub-task M_i invokes sub-task M_j. Do the pseudo-rewards of M_j have any effect on sub-task M_i?
- Yes
- No
Answer :-
8. Which of the following can be considered as a good option in Hierarchical RL?
- An option that can be reused often
- An option that can cut down exploration
- An option that helps in transfer learning
- None of the above
Answer :-
9. State True or False:
In the MAXQ framework, in the presence of pseudo-rewards, two completion functions need to be maintained for every (i, s, a): one for the pseudo-rewards and one for the underlying MDP rewards.
- True
- False
Answer :-
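The bookkeeping that question 9 refers to can be sketched as follows. This is a simplified single-step version of Dietterich's MAXQ-Q update, with assumed toy names and a fixed learning rate: C_tilde folds in the pseudo-reward and would be used inside sub-task i to pick actions, while C tracks only real MDP rewards and is what the parent task sees.

```python
from collections import defaultdict

alpha, gamma = 0.5, 1.0            # assumed toy learning rate and discount
C = defaultdict(float)             # completion function, real rewards only
C_tilde = defaultdict(float)       # completion function with pseudo-rewards

def update(i, s, a, s_next, a_star, pseudo_r, V):
    """One simplified completion-function update after sub-task a finishes
    in s_next; a_star is the greedy child action there. The pseudo-reward
    enters only the C_tilde target."""
    target = gamma * (V[(a_star, s_next)] + C[(i, s_next, a_star)])
    C[(i, s, a)] += alpha * (target - C[(i, s, a)])
    target_t = gamma * (pseudo_r + V[(a_star, s_next)] + C_tilde[(i, s_next, a_star)])
    C_tilde[(i, s, a)] += alpha * (target_t - C_tilde[(i, s, a)])

# Hypothetical usage: child value V(navigate, s1) = 2.0, pseudo-reward 1.0.
V_child = {("navigate", "s1"): 2.0}
update("root", "s0", "pickup", "s1", "navigate", pseudo_r=1.0, V=V_child)
print(C[("root", "s0", "pickup")], C_tilde[("root", "s0", "pickup")])
```

Keeping the two tables separate is what prevents the pseudo-rewards, which exist only to shape behavior inside a sub-task, from leaking into the values reported to the parent.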
10. Which of the following can be achieved when we use the MAXQ approach to solve a problem?
- Solutions that are hierarchically optimal
- Solutions that are recursively optimal
- Solutions that are flat optimal
- None of these
Answer :-