NPTEL Introduction to Machine Learning Week 5 Assignment Answers 2025

1. Consider a feedforward neural network that performs regression on a p-dimensional input to produce a scalar output. It has m hidden layers and each of these layers has k hidden units. What is the total number of trainable parameters in the network? Ignore the bias terms.

  • pk + mk² + k
  • pk + (m−1)k² + k
  • p² + (m−1)pk + k
  • p² + (m−1)pk + k²
Answer :- 
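
For intuition on Question 1, here is a minimal Python sketch that counts weight-matrix entries layer by layer, ignoring biases. The function name count_weights and the example values p = 10, m = 3, k = 5 are purely illustrative, not part of the assignment.

```python
def count_weights(p: int, m: int, k: int) -> int:
    """Count trainable weights (biases ignored) in a fully connected network:
    p-dimensional input -> m hidden layers of k units each -> scalar output."""
    layer_sizes = [p] + [k] * m + [1]
    # Each consecutive pair of layers contributes an (out x in) weight matrix.
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Illustrative values only.
print(count_weights(p=10, m=3, k=5))  # 10*5 + 2*5*5 + 5*1 = 105
```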

2.

Answer :- 

3. Consider a two-layered neural network y = σ(W^(B) σ(W^(A) x)). Let h = σ(W^(A) x) denote the hidden layer representation. W^(A) and W^(B) are arbitrary weights. Which of the following statement(s) is/are true? Note: ∇_g(f) denotes the gradient of f w.r.t. g.

  • ∇_h(y) depends on W^(A)
  • ∇_W^(A)(y) depends on W^(B)
  • ∇_W^(A)(h) depends on W^(B)
  • ∇_W^(B)(y) depends on W^(A)
Answer :- 
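
For Question 3, a small NumPy sketch of the forward pass and the chain rule can make the gradient dependencies concrete. The shapes (p = 3, k = 4) and the helper names forward and grad_y_wrt_h are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W_A, W_B, x):
    h = sigmoid(W_A @ x)   # hidden representation h = sigma(W^(A) x)
    y = sigmoid(W_B @ h)   # scalar output y = sigma(W^(B) h); W_B has shape (1, k)
    return h, y

def grad_y_wrt_h(W_B, h):
    # Chain rule: dy/dh = sigma'(W^(B) h) * W^(B)
    s = sigmoid(W_B @ h)
    return (s * (1.0 - s)) * W_B   # shape (1, k)

rng = np.random.default_rng(0)
W_A = rng.normal(size=(4, 3))      # k = 4 hidden units, p = 3 inputs
W_B = rng.normal(size=(1, 4))
x = rng.normal(size=3)

h, y = forward(W_A, W_B, x)
print(grad_y_wrt_h(W_B, h))
```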

4. Which of the following statement(s) about the initialization of neural network weights is/are true for a network that uses the sigmoid activation function?

  • Two different initializations of the same network could converge to different minima
  • For a given initialization, gradient descent will converge to the same minima irrespective of the learning rate.
  • Initializing all weights to the same constant value leads to undesirable results
  • Initializing all weights to very large values leads to undesirable results
Answer :- 
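
The constant-initialization statement in Question 4 can be checked directly: with every weight set to the same value, all rows of the hidden-layer gradient come out identical, so the hidden units never differentiate from one another. A minimal NumPy sketch (the network sizes and the constant 0.5 are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Every weight initialized to the same constant (0.5 here, chosen arbitrarily).
W1 = np.full((4, 3), 0.5)        # 3 inputs -> 4 hidden units
W2 = np.full((1, 4), 0.5)        # 4 hidden units -> 1 output
x = np.array([0.2, -0.1, 0.4])
y_true = 1.0

# Forward pass with sigmoid activations.
h = sigmoid(W1 @ x)
y = sigmoid(W2 @ h)

# Backpropagate a squared-error loss. Because all weights are equal, every row of
# the gradient w.r.t. W1 is identical, so the hidden units stay interchangeable
# after the update (the symmetry is never broken).
dz2 = 2.0 * (y - y_true) * y * (1.0 - y)
dh = (W2.T @ dz2) * h * (1.0 - h)
dW1 = np.outer(dh, x)
print(dW1)   # all rows are the same
```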

5.

Answer :- 

6. A geometric distribution is defined by the p.m.f. f(x; p) = (1−p)^(x−1) p for x = 1, 2, …. Given the samples [4, 5, 6, 5, 4, 3] drawn from this distribution, find the MLE of p.

  • 0.111
  • 0.222
  • 0.333
  • 0.444
Answer :- 
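
For Question 6, the log-likelihood of the geometric p.m.f. can be maximized either in closed form or numerically; the short Python sketch below does both, using the samples given in the question (the grid-search cross-check is just an illustration).

```python
import numpy as np

samples = np.array([4, 5, 6, 5, 4, 3])

# Closed form: for f(x; p) = (1 - p)^(x - 1) * p, setting the derivative of the
# log-likelihood to zero gives p_hat = 1 / mean(x).
p_closed_form = 1.0 / samples.mean()

# Numerical cross-check: evaluate the log-likelihood on a grid over (0, 1).
p_grid = np.linspace(0.001, 0.999, 999)
log_lik = ((samples[:, None] - 1) * np.log(1.0 - p_grid) + np.log(p_grid)).sum(axis=0)
p_numeric = p_grid[np.argmax(log_lik)]

print(p_closed_form, p_numeric)
```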

7. Consider a Bernoulli distribution with p = 0.7 (the true value of the parameter). We draw samples from this distribution and compute an MAP estimate of p by assuming a prior distribution over p. Let N(μ, σ²) denote a Gaussian distribution with mean μ and variance σ². Distributions are normalized as needed. Which of the following statement(s) is/are true?

  • If the prior is N(0.6,0.1), we will likely require fewer samples for converging to the true value than if the prior is N(0.4,0.1)
  • If the prior is N(0.4,0.1), we will likely require fewer samples for converging to the true value than if the prior is N(0.6,0.1)
  • With a prior of N(0.1,0.001), the estimate will never converge to the true value, regardless of the number of samples used.
  • With a prior of U(0,0.5) (i.e. uniform distribution between 0 and 0.5), the estimate will never converge to the true value, regardless of the number of samples used.
Answer :- 
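
For Question 7, a grid-based MAP estimate makes the effect of the prior visible. The sample size of 50, the random seed, and the grid resolution below are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.random(50) < 0.7            # Bernoulli(0.7) draws
heads, n = samples.sum(), samples.size

# Posterior (up to a constant) = likelihood x prior, evaluated on a grid over (0, 1).
p_grid = np.linspace(0.001, 0.999, 999)
log_lik = heads * np.log(p_grid) + (n - heads) * np.log(1.0 - p_grid)

def map_estimate(log_prior):
    return p_grid[np.argmax(log_lik + log_prior)]

# Gaussian priors N(mu, 0.1), written up to an additive constant.
for mu in (0.6, 0.4):
    log_prior = -((p_grid - mu) ** 2) / (2.0 * 0.1)
    print("N(%.1f, 0.1):" % mu, map_estimate(log_prior))

# A U(0, 0.5) prior assigns zero density above 0.5, so the MAP estimate
# cannot move past 0.5 no matter how many samples are drawn.
log_prior_uniform = np.where(p_grid <= 0.5, 0.0, -np.inf)
print("U(0, 0.5):", map_estimate(log_prior_uniform))
```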

8. Which of the following statement(s) about parameter estimation techniques is/are true?

  • To obtain a distribution over the predicted values for a new data point, we need to compute an integral over the parameter space.
  • The MAP estimate of the parameter gives a point prediction for a new data point.
  • The MLE of a parameter gives a distribution of predicted values for a new data point.
  • We need a point estimate of the parameter to compute a distribution of the predicted values for a new data point.
Answer :- 
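
The first statement in Question 8 refers to the posterior predictive distribution, which in standard Bayesian notation (not taken from the assignment text) is the integral

```latex
p(x_{\text{new}} \mid \mathcal{D}) = \int p(x_{\text{new}} \mid \theta)\, p(\theta \mid \mathcal{D})\, \mathrm{d}\theta
```

By contrast, a point estimate such as the MLE or MAP plugs a single value θ̂ into p(x_new | θ̂).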

9. In a classification setting, it is common in machine learning applications to minimize the discrete cross entropy loss given by H_CE(p, q) = −Σᵢ pᵢ log qᵢ, where pᵢ and qᵢ are the true and predicted distributions (pᵢ ∈ {0, 1} depending on the label of the corresponding entry). Which of the following statement(s) about minimizing the cross entropy loss is/are true?

  • Minimizing H_CE(p, q) is equivalent to minimizing the (self) entropy H(q)
  • Minimizing H_CE(p, q) is equivalent to minimizing H_CE(q, p)
  • Minimizing H_CE(p, q) is equivalent to minimizing the KL divergence D_KL(p || q)
  • Minimizing H_CE(p, q) is equivalent to minimizing the KL divergence D_KL(q || p)
Answer :- 
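
For Question 9, the standard decomposition of cross entropy (a textbook identity, not part of the assignment text) is

```latex
H_{CE}(p, q) = -\sum_i p_i \log q_i
             = \underbrace{-\sum_i p_i \log p_i}_{H(p)} + \underbrace{\sum_i p_i \log \frac{p_i}{q_i}}_{D_{KL}(p \,\|\, q)}
```

Since p is the fixed label distribution, H(p) does not depend on the predicted distribution q.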

10. Which of the following statement(s) about activation functions is/are NOT true?

  • Non-linearity of activation functions is not a necessary criterion when designing very deep neural networks
  • Saturating non-linear activation functions (derivative → 0 as x → ±∞) avoid the vanishing gradients problem
  • Using the ReLU activation function avoids all problems arising due to gradients being too small.
  • The dead neurons problem in ReLU networks can be fixed using a leaky ReLU activation function
Answer :- 
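
For the last option in Question 10, a leaky ReLU keeps a small slope on the negative side instead of a flat zero. A minimal NumPy sketch (the slope alpha = 0.01 is the conventional default, used here only for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Negative inputs keep a small slope alpha instead of a flat zero, so the
    # gradient never becomes exactly zero and units cannot get permanently stuck off.
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))
print(leaky_relu(z))
```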