NPTEL Reinforcement Learning Week 8 Assignment Answers 2025

1. You are given a training set of vectors Φ, where each row of the matrix Φ corresponds to the k attributes of a single training sample. Suppose you are asked to find a linear function that minimizes the mean squared error for a given set of stationary targets y using linear regression. State whether the following statement is true or false.
Statement: If the column vectors of Φ are linearly independent, then there exists a unique linear function that minimizes the mean squared error.

  • True
  • False
Answer :- 
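
The setting of question 1 can be explored numerically. The sketch below assumes a small made-up Φ (4 samples, k = 2 attributes) and made-up targets y; the variable names are illustrative only. It checks the column rank of Φ and compares the normal-equation solution with NumPy's general least-squares solver.

```python
import numpy as np

# Hypothetical training set: 4 samples (rows), k = 2 attributes (columns).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0],
                [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # made-up stationary targets

# Linearly independent columns means Phi has full column rank (rank == k).
rank = np.linalg.matrix_rank(Phi)

# With full column rank, Phi^T Phi is invertible, so the normal equations
# (Phi^T Phi) w = Phi^T y have exactly one solution.
w_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# General least-squares solver, for comparison.
w_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(rank, w_normal, w_lstsq)
```

If a column of Φ were a multiple of another, `np.linalg.solve` would raise an error here because ΦᵀΦ would be singular.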

2. Which of the following statements are true?

  • Function approximation allows us to deal with continuous state spaces.
  • A lookup table is a linear function approximator.
  • State aggregates do not overlap in coarse-coding.
  • None of the above.
Answer :- 
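
To make the relationship between lookup tables and linear approximators concrete, the sketch below assumes a tiny 4-state problem with one-hot features. The function names and table values are hypothetical. With one indicator feature per state, the linear form w·φ(s) simply reads out the table entry for s.

```python
def one_hot(state, n_states):
    """Indicator feature vector: 1 at the state's index, 0 elsewhere."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]

def linear_value(weights, features):
    """Linear function approximator: v(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features))

# Made-up table of state values; the same numbers serve as the weight vector.
table = [0.5, -1.0, 2.0, 0.0]

values = [linear_value(table, one_hot(s, len(table))) for s in range(len(table))]
print(values)
```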

3. In which of the following cases would defining the loss of a function approximator as $\sum_{s \in S} (\hat{V}(s) - V(s))^2$ lead to poor performance? Consider 'relevant' states to be those that are visited frequently when executing near-optimal policies.

  • Large state space with small percentage of relevant states.
  • Small state space with large percentage of relevant states.
  • Large state space with large percentage of relevant states.
  • None of the above.
Answer :- 

4. Assertion: It is not possible to use look-up table based methods to solve continuous state or action space problems. (Assume discretization of continuous space is not allowed)
Reason: For continuous state or action space, there are an infinite number of states/actions.

  • Both Assertion and Reason are true, and Reason is the correct explanation for Assertion.
  • Both Assertion and Reason are true, but Reason is not the correct explanation for Assertion.
  • Assertion is true, Reason is false.
  • Both Assertion and Reason are false.
Answer :- 

5. Assertion: If we make incremental updates for a linear approximation of the value function $\hat{v}$ under a policy π, using gradient descent to minimize the mean squared error between $\hat{v}(s_t)$ and bootstrapped targets $R_t + \gamma \hat{v}(s_{t+1})$, then we will eventually converge to the same solution that we would have reached if we had used the true $v_\pi$ values as targets instead.
Reason: Each update moves $\hat{v}$ closer to $v_\pi$, so eventually the bootstrapped targets $R_t + \gamma \hat{v}(s_{t+1})$ will converge to the true $v_\pi(s_t)$ values.

(Assume that we sample on-policy)

  • Both Assertion and Reason are true, and Reason is the correct explanation for Assertion.
  • Both Assertion and Reason are true, but Reason is not the correct explanation for Assertion.
  • Assertion is true and Reason is false.
  • Both Assertion and Reason are false.
Answer :- 
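
The incremental update described in question 5 is commonly written as a semi-gradient TD(0) step, in which the bootstrapped target is treated as a constant when taking the gradient. The sketch below is a minimal illustration under that assumption; the function names, feature vectors, and step size are hypothetical and not taken from the assignment.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def td0_update(w, phi, reward, phi_next, gamma, alpha):
    """One semi-gradient TD(0) step for a linear value estimate v(s) = w . phi(s).

    The target reward + gamma * v(s') is held fixed, so the gradient of the
    squared error with respect to w is taken only through v(s).
    """
    td_error = reward + gamma * dot(w, phi_next) - dot(w, phi)
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]

# Single illustrative transition with made-up features and reward.
w = td0_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0], gamma=0.9, alpha=0.1)
print(w)
```

Replacing the bootstrapped target with a sampled return would give the corresponding update toward Monte Carlo targets, which is the comparison the assertion is about.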

6.

Answer :- 

7. Which of the following is/are true about the LSTD and LSTDQ algorithms?

  • Both are iterative algorithms, where the estimates of the parameters are updated using the gradient information of the loss function.
  • Both LSTD and LSTDQ can reuse samples.
  • Both LSTD and LSTDQ are linear function approximation methods.
  • None of the above
Answer :- 
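
For reference, LSTD in its standard form accumulates a matrix A and a vector b from a batch of transitions and then solves a linear system for the weights. The sketch below assumes a made-up two-state episodic chain with one-hot features; the `lstd` helper and the data are hypothetical and only meant to show the shape of the computation.

```python
import numpy as np

def lstd(samples, features, gamma):
    """Batch LSTD: solve A w = b with
    A = sum phi(s) (phi(s) - gamma * phi(s'))^T and b = sum r * phi(s).
    A transition with s_next = None is terminal (next features are zero).
    """
    k = len(next(iter(features.values())))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in samples:
        phi = features[s]
        phi_next = features[s_next] if s_next is not None else np.zeros(k)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

# Made-up chain: state 0 -> state 1 (reward 1), state 1 -> terminal (reward 2).
features = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
samples = [(0, 1.0, 1), (1, 2.0, None)]

w = lstd(samples, features, gamma=0.9)
print(w)
```

The same stored `samples` list can be passed to `lstd` again, for example with a different `gamma`, without collecting new experience.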

8. Assertion: When minimizing mean-squared-error to approximate the value of states under a given policy π, it is important that we draw samples on-policy.
Reason: Sampling on-policy makes the training data approximately reflect the steady-state distribution of states under the policy π.

  • Both Assertion and Reason are true, and Reason is the correct explanation for Assertion.
  • Both Assertion and Reason are true, but Reason is not the correct explanation for Assertion.
  • Assertion is true and Reason is false.
  • Both Assertion and Reason are false.
Answer :- 

9. Tile coding is a method of state aggregation for gridworld problems. Consider the following statements.
(i) The number of indicators for each state is equal to number of tilings.
(ii) Tile coding cannot be used in continuous state spaces.
(iii) Tile coding is also a form of Coarse coding.

Which of the above statements are true?

  • (iii) only
  • (i), (iii)
  • (i) only
  • (i), (ii), (iii)
Answer :- 
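
A small sketch can help when reasoning about the tile-coding statements. It assumes a one-dimensional state in [0, 1) and uniformly offset tilings; the `active_tiles` function and its parameters are hypothetical choices for illustration, not a reference implementation.

```python
import math

def active_tiles(x, n_tilings=4, tiles_per_tiling=5, low=0.0, high=1.0):
    """Return the index of the active tile in each tiling for a scalar state x.

    Each tiling is shifted by a fraction of the tile width and gets one extra
    tile so that the shifted grid still covers [low, high).
    """
    tile_width = (high - low) / tiles_per_tiling
    indices = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings
        tile = int(math.floor((x - low + offset) / tile_width))
        # Flatten (tiling, tile) into a single feature index.
        indices.append(t * (tiles_per_tiling + 1) + tile)
    return indices

a = active_tiles(0.37)
b = active_tiles(0.38)
print(a, b)
```

Nearby states such as 0.37 and 0.38 share active tiles, which is the overlapping, coarse-coding-style generalization that tile coding relies on.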

10.

Answer :- 