NPTEL Data Analytics with Python Week 11 Assignment Answers 2025
1. Which library is used for calculating distance measures in clustering with Python?
- distance_matrix
- scipy.spatial
- scipy_spatial
- distance.matrix
Answer :-
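The following is a minimal sketch of computing pairwise distances with the `scipy.spatial.distance` module. It assumes NumPy and SciPy are installed, and the three sample points are made up for illustration. `pdist` returns the condensed upper triangle of the distance matrix, and `squareform` expands it into the full symmetric matrix.

```python
import numpy as np
from scipy.spatial import distance

# Three hypothetical 2-D observations (rows = objects, columns = variables).
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Condensed pairwise Euclidean distances: d(0,1), d(0,2), d(1,2).
condensed = distance.pdist(X, metric="euclidean")

# Full n x n symmetric distance matrix with zeros on the diagonal.
D = distance.squareform(condensed)

# Single-pair helpers from the same module.
d_euclid = distance.euclidean(X[0], X[1])      # straight-line distance
d_manhattan = distance.cityblock(X[0], X[1])   # sum of absolute differences

print(D)
print(d_euclid, d_manhattan)
```

Other metrics such as `"cityblock"` or `"minkowski"` can be passed to `pdist` through the `metric` argument.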
2. The formula for computing the dissimilarity between two objects described by categorical variables is as follows. Here p is the total number of categorical variables and m is the number of matches (variables on which the two objects have the same state).
- D(i, j) = (p - m) / p
- D(i, j) = (p - m) / m
- D(i, j) = (m - p) / p
- D(i, j) = (m - p) / m
Answer :-
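The simple-matching dissimilarity described in the question is sketched below. It assumes each object is given as a tuple of categorical states of equal length. The function name `categorical_dissimilarity` and the example objects are hypothetical.

```python
def categorical_dissimilarity(obj_i, obj_j):
    """Simple-matching dissimilarity (p - m) / p for two categorical objects."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must have the same number of variables")
    p = len(obj_i)                                        # total number of variables
    m = sum(a == b for a, b in zip(obj_i, obj_j))         # number of matching states
    return (p - m) / p


# Hypothetical objects: 3 variables, 2 of which match.
obj_a = ("red", "small", "round")
obj_b = ("red", "large", "round")

d_ab = categorical_dissimilarity(obj_a, obj_b)   # (3 - 2) / 3
d_aa = categorical_dissimilarity(obj_a, obj_a)   # identical objects
print(d_ab, d_aa)
```

Identical objects give 0, and objects that differ on every variable give 1.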
3. For a data set with 7 objects, the interval-scaled variable ‘f’ has the following measurements, which contain one outlying value:
f = (1, 2, 3, 4, 5, 8, 50)
Select the correct option.
- Std deviation (std_f) and mean absolute deviation (s_f) are equally affected by the outlier.
- Mean absolute deviation (s_f) is more affected by the outlier.
- Std deviation (std_f) is less affected by the outlier.
- Std deviation (std_f) is more affected by the outlier.
Answer :-
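The sketch below compares the two dispersion measures on the data from the question. The standard deviation squares each deviation, so the single large value 50 inflates it more than the mean absolute deviation, which uses unsquared deviations. The code assumes the population standard deviation (`statistics.pstdev`). A sample standard deviation would divide by n - 1 and come out slightly larger.

```python
import statistics

f = [1, 2, 3, 4, 5, 8, 50]

m_f = sum(f) / len(f)                               # mean of f
std_f = statistics.pstdev(f)                        # population standard deviation
s_f = sum(abs(x - m_f) for x in f) / len(f)         # mean absolute deviation

# How far the outlier sits from the mean, measured in each unit of spread.
z_outlier_std = (50 - m_f) / std_f
z_outlier_mad = (50 - m_f) / s_f

print(f"std_f = {std_f:.2f}, s_f = {s_f:.2f}")
print(f"outlier z (std) = {z_outlier_std:.2f}, outlier z (MAD) = {z_outlier_mad:.2f}")
```

Here std_f comes out at about 16.3 and s_f at about 11.3. The outlier therefore looks less extreme when measured in standard deviations.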
4. Select the correct statement about standardization from the following options:
- Standardizing the data always gives inefficient results when forming clusters
- Standardizing the data is always beneficial during clustering analysis
- Variables that have a meaningful absolute value may not be effective after standardization during clustering analysis
- Outliers cannot be detected using standardized data
Answer :-
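One common way to standardize an interval-scaled variable is the z-score z = (x - m_f) / s_f, where m_f is the mean and s_f is the mean absolute deviation. The sketch below applies it to the measurements from question 3. After standardization the outlier still stands out clearly. The cut-off `OUTLIER_Z = 3.0` is a hypothetical choice made for illustration only.

```python
f = [1, 2, 3, 4, 5, 8, 50]

m_f = sum(f) / len(f)                          # mean
s_f = sum(abs(x - m_f) for x in f) / len(f)    # mean absolute deviation

# Standardized measurements (z-scores based on s_f).
z = [(x - m_f) / s_f for x in f]

OUTLIER_Z = 3.0  # hypothetical threshold for flagging outliers
flagged = [x for x, z_x in zip(f, z) if abs(z_x) > OUTLIER_Z]

for x, z_x in zip(f, z):
    print(f"{x:>3} -> z = {z_x:+.2f}")
print("flagged as outliers:", flagged)
```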
5. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations (except in cases with a bad local minimum).
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
- 1, 3 and 4
- 1, 2, 3 and 4
- 2 and 3
- None of these
Answer :-
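The following is a minimal pure-Python K-means sketch that checks each of the four stopping rules listed in the question: an iteration cap, unchanged assignments, unchanged centroids, and an optional RSS threshold. The function name `kmeans`, its parameters, and the toy data are hypothetical. Initial centroids are k randomly sampled input points.

```python
import math
import random


def kmeans(points, k, max_iter=100, rss_threshold=None, seed=0):
    """Plain K-means on coordinate tuples.

    Returns (centroids, labels, stop_reason, iterations_run).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = None
    for iteration in range(1, max_iter + 1):  # condition 1: fixed iteration cap
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        # Residual sum of squares of points around their assigned centroids.
        rss = sum(math.dist(p, new_centroids[lab]) ** 2 for p, lab in zip(points, new_labels))

        if new_labels == labels:  # condition 2: assignments unchanged
            return new_centroids, new_labels, "assignments unchanged", iteration
        if new_centroids == centroids:  # condition 3: centroids unchanged
            return new_centroids, new_labels, "centroids unchanged", iteration
        if rss_threshold is not None and rss < rss_threshold:  # condition 4: RSS small enough
            return new_centroids, new_labels, "RSS below threshold", iteration
        labels, centroids = new_labels, new_centroids
    return centroids, labels, "max iterations reached", max_iter


# Two well-separated hypothetical blobs.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, reason, n_iter = kmeans(data, k=2)
print(labels, reason, n_iter)
```

In practice conditions 2 and 3 usually fire together, because unchanged assignments produce unchanged centroids on the next update.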
6. In the figure linked below, if you draw a horizontal line across the y-axis at y = 2, how many clusters will be formed?
https://drive.google.com/file/d/1pZaKZa6CDK-Hzn0Iar6LTWPA_zlpStW7/view?usp=sharing
- 1
- 2
- 3
- 4
Answer :-
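The linked figure is not reproduced here, so the sketch below uses made-up 1-D data and assumes the figure is a dendrogram. Cutting a dendrogram with a horizontal line at height t keeps every merge made at or below t. For monotone linkages, the number of clusters is therefore n minus the number of merges at or below t. SciPy's `fcluster` with `criterion="distance"` performs the same cut. SciPy and NumPy are assumed to be installed.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 1-D observations (not the data behind the linked figure).
X = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [10.0]])

# Single-linkage merge history; column 2 of Z holds the merge heights.
Z = linkage(X, method="single")

cut_height = 2.0  # the horizontal line y = 2
labels = fcluster(Z, t=cut_height, criterion="distance")
n_clusters = len(set(labels))

# Equivalent count: observations minus merges performed at or below the cut.
n_from_merges = len(X) - int(np.sum(Z[:, 2] <= cut_height))

print(Z[:, 2], labels, n_clusters, n_from_merges)
```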
7. Which of the following clustering methods requires a merging approach?
- Partitional
- Naive Bayes
- Hierarchical
- None of the above
Answer :-
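The merging idea can be sketched as bottom-up (agglomerative) clustering. Every point starts as its own cluster, and the two closest clusters are merged repeatedly until k clusters remain. Single-link distance is used here as one possible inter-cluster dissimilarity. The helper names `single_link` and `agglomerate` and the toy data are hypothetical.

```python
import math


def single_link(a, b):
    """Minimum pairwise distance between clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start: one singleton cluster per point
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        # Drop the two merged clusters and append their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


data = [(0,), (0.5,), (1,), (5,), (5.5,), (10,)]
result = sorted(sorted(c) for c in agglomerate(data, k=3))
print(result)
```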
8. State True or False: Hierarchical clustering should primarily be used for data exploration
- True
- False
Answer :-
9. State True or False: For finding dissimilarity between two clusters in hierarchical clustering, average-link is the only metric used
- True
- False
Answer :-
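Several inter-cluster dissimilarities are used in hierarchical clustering besides average-link, for example single-link (minimum pairwise distance), complete-link (maximum pairwise distance) and centroid distance. The sketch below computes all four for two small made-up clusters A and B. The function names are hypothetical.

```python
import math


def single_link(a, b):
    return min(math.dist(p, q) for p in a for q in b)


def complete_link(a, b):
    return max(math.dist(p, q) for p in a for q in b)


def average_link(a, b):
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid_link(a, b):
    def centroid(c):
        return tuple(sum(axis) / len(c) for axis in zip(*c))
    return math.dist(centroid(a), centroid(b))


A = [(0, 0), (0, 2)]
B = [(3, 0), (6, 0)]

for name, fn in [("single", single_link), ("complete", complete_link),
                 ("average", average_link), ("centroid", centroid_link)]:
    print(f"{name:>8}: {fn(A, B):.3f}")
```

The four criteria give different values for the same pair of clusters, so the choice of linkage can change which clusters get merged first.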
10. If two variables, V1 and V2, are used for clustering, which of the following are true for K-means clustering with k = 3?
1. If V1 and V2 have a correlation of 1, the cluster centroids will lie in a straight line
2. If V1 and V2 have a correlation of 0, the cluster centroids will lie in a straight line
- 1 only
- 2 only
- 1 and 2
- None of the above
Answer :-
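Both statements can be checked numerically. With a correlation of 1, every point satisfies V2 = a·V1 + b, and the mean of points on a line stays on that line, so any three centroids are collinear. With a correlation of 0, nothing forces collinearity. The sketch below builds one hypothetical example of each case. The groupings stand in for the three clusters, and the helper names are made up for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def centroid(group):
    return (sum(p[0] for p in group) / len(group), sum(p[1] for p in group) / len(group))


def collinear(a, b, c, tol=1e-9):
    """True if the three 2-D points lie on one line (zero cross product)."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) < tol


# Case 1: V2 = 2*V1 + 1, so the correlation is 1.
line_groups = [[(v, 2 * v + 1) for v in r] for r in (range(1, 4), range(4, 7), range(7, 10))]
line_points = [p for g in line_groups for p in g]
r_line = pearson([p[0] for p in line_points], [p[1] for p in line_points])
line_centroids = [centroid(g) for g in line_groups]

# Case 2: three groups whose combined V1 and V2 values are uncorrelated.
cross_groups = [[(-1, 0), (1, 0)], [(9, 0), (11, 0)], [(5, 9), (5, 11)]]
cross_points = [p for g in cross_groups for p in g]
r_cross = pearson([p[0] for p in cross_points], [p[1] for p in cross_points])
cross_centroids = [centroid(g) for g in cross_groups]

print(r_line, line_centroids, collinear(*line_centroids))
print(r_cross, cross_centroids, collinear(*cross_centroids))
```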