NPTEL Business Intelligence & Analytics Week 9 Assignment Answers 2025
1. Which of the following statements is NOT true about clustering algorithms?
- K-medoids algorithm uses actual data points as cluster representatives, while K-modes algorithm employs modes to assess similarity in categorical data.
- K-means algorithm calculates the mean of points within a cluster to determine the centroid, while K-modes algorithm utilizes modes to evaluate similarity in categorical data.
- K-medoids algorithm is generally more robust to outliers and noise compared to K-means algorithm.
- The K-means algorithm always produces better results than K-medoids for all types of datasets.
Answer :-
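The options above contrast how K-means and K-medoids pick a cluster representative. The Python sketch below compares the two on a small cluster with one outlier. K-means uses the coordinate-wise mean. K-medoids uses the actual data point with the smallest total distance to the others. The data and the helper names `mean_centroid` and `medoid` are hypothetical and chosen only for illustration.

```python
from math import dist

def mean_centroid(points):
    """K-means style representative: coordinate-wise mean (need not be a data point)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def medoid(points):
    """K-medoids style representative: the actual point with the smallest
    total Euclidean distance to all points in the cluster."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))

# Hypothetical cluster: four nearby points plus one extreme outlier at (100, 0).
cluster = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (100.0, 0.0)]
print(mean_centroid(cluster))  # (22.0, 0.0): dragged toward the outlier
print(medoid(cluster))         # (3.0, 0.0): stays with the bulk of the data
```

The outlier moves the mean from the middle of the group to 22. The medoid stays at 3. This is why K-medoids is usually described as more robust to outliers.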
2. Which of the following statements is true about the agglomerative hierarchical clustering method?
- Agglomerative hierarchical clustering follows a top-down approach, starting with all objects in one cluster and splitting them iteratively.
- In agglomerative hierarchical clustering, each data point starts as an individual cluster, and clusters are merged iteratively.
- The merging process in agglomerative clustering is random and does not depend on distance measures.
- Agglomerative hierarchical clustering requires exactly n + 1 iterations to form the final clustering structure.
Answer :-
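The following is a naive sketch of the bottom-up process. It assumes single linkage and Euclidean distance, and both are only illustrative choices: other linkages follow the same loop. Every point starts as its own cluster, and the two closest clusters are merged until one cluster remains. The function name `agglomerate` and the toy points are hypothetical.

```python
from itertools import combinations
from math import dist

def agglomerate(points):
    """Naive bottom-up (agglomerative) clustering with single linkage.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(points))]  # each point starts alone

    def linkage(a, b):
        # Single linkage: distance between the closest members of the two clusters.
        return min(dist(points[i], points[j]) for i in a for j in b)

    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        d = linkage(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)  # merge the closest pair
        history.append((sorted(a), sorted(b), d))
    return history

pts = [(0, 0), (0, 1), (5, 5), (5, 7)]
for step in agglomerate(pts):
    print(step)
```

With these four points the merges happen at distances 1.0, 2.0 and about 6.40. That is n - 1 = 3 merges in total. The recorded merge distances are the heights a dendrogram would plot.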
3. Which hierarchical clustering method computes all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and records the smallest of these dissimilarities?
- Single linkage
- Average linkage
- Complete linkage
- Centroid linkage
Answer :-
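The linkage options differ only in how they summarise the dissimilarity between two clusters. The Python sketch below uses Euclidean distance and two hypothetical clusters, A and B. The function names are illustrative.

```python
from math import dist
from statistics import mean

def single_linkage(A, B):
    return min(dist(a, b) for a in A for b in B)   # smallest pairwise dissimilarity

def complete_linkage(A, B):
    return max(dist(a, b) for a in A for b in B)   # largest pairwise dissimilarity

def average_linkage(A, B):
    return mean(dist(a, b) for a in A for b in B)  # mean of all pairwise dissimilarities

def centroid_linkage(A, B):
    ca = tuple(map(mean, zip(*A)))                 # centroid of A
    cb = tuple(map(mean, zip(*B)))                 # centroid of B
    return dist(ca, cb)                            # distance between the centroids

A = [(0, 0), (0, 2)]
B = [(3, 0), (6, 0)]
for f in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(f.__name__, round(f(A, B), 3))
```

Single linkage returns 3.0 here, which is the closest pair, (0, 0) and (3, 0). Complete linkage returns the farthest pair instead.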
4. A dendrogram in hierarchical clustering is a _____________ representation that shows how clusters are merged at different levels.
- linear
- tree-like
- tabular
- circular
Answer :-
5. ___________ is an unsupervised learning algorithm that groups data points into clusters based on similarity.
- Linear Regression
- K-Means Clustering
- Decision Tree
- Logistic Regression
Answer :-
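The following is a minimal sketch of the standard (Lloyd's) K-means loop in plain Python. It assumes Euclidean distance and user-supplied initial centroids. The function name and toy data are hypothetical. Library implementations add smarter initialisation and tolerance-based stopping.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means starting from the given initial centroids."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new == centroids:  # converged: centroids no longer move
            break
        centroids = new
    return centroids, clusters

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(pts, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```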
6. A data scientist is working with a dataset where the number of fraudulent transactions is significantly lower than the number of legitimate transactions. Which technique would be most suitable to handle this class imbalance?
- PCA
- SMOTE
- Decision Tree
- t-SNE
Answer :-
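SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea. The data, the function name and its parameters are hypothetical. In practice a maintained implementation, such as `SMOTE` in the imbalanced-learn package (`imblearn.over_sampling`), would normally be used instead.

```python
import random
from math import dist

def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE sketch: synthesise minority samples by interpolating
    between a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        neighbours = sorted((p for p in minority if p != x), key=lambda p: dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn, in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

fraud = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]  # hypothetical minority class
new_rows = smote(fraud, n_new=4)
print(new_rows)
```

Each synthetic row lies on a line segment between two real minority rows. The minority class therefore grows without exact duplicates.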
7. In a 3-dimensional space represented by coordinates (x, y, z), two cluster centroids, A and B, have coordinates A(1, 5, 8) and B(7, 3, 2). Calculate the Euclidean distance between these centroids to determine their dissimilarity. Round your answer to two decimal places.
- 8.72 units
- 7.11 units
- 8.54 units
- 9.38 units
Answer :-
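The following short Python check performs the Euclidean distance calculation for question 7, using the coordinates given in the question.

```python
from math import dist, sqrt

A = (1, 5, 8)
B = (7, 3, 2)
squared = sum((a - b) ** 2 for a, b in zip(A, B))  # 36 + 4 + 36
print(squared, round(sqrt(squared), 2))  # 76 8.72
print(round(dist(A, B), 2))              # 8.72 (math.dist computes the same thing)
```

The squared coordinate differences are 36, 4 and 36. The distance is therefore √76 ≈ 8.72.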
8. The elbow method in K-means clustering is commonly used to:
- Identify the convergence threshold
- Optimize the starting centroids
- Determine the ideal number of clusters
- Choose the distance metric
Answer :-
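The following small Python sketch shows the elbow idea. It computes the within-cluster sum of squares (WCSS) for increasing k, then looks for the point where adding clusters stops paying off. To keep the result deterministic, it does not run K-means. Instead, it uses an exact brute-force optimum for one-dimensional data, where optimal clusters are contiguous runs of the sorted values. The data and function names are hypothetical.

```python
from itertools import combinations

def wcss(groups):
    """Within-cluster sum of squares: the quantity K-means tries to minimise."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((x - m) ** 2 for x in g)
    return total

def best_wcss_1d(values, k):
    """Exact minimum WCSS for 1-D data: try every way to cut the sorted
    values into k contiguous runs."""
    xs = sorted(values)
    best = float("inf")
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0, *cuts, len(xs))
        groups = [xs[bounds[i]:bounds[i + 1]] for i in range(k)]
        best = min(best, wcss(groups))
    return best

data = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical data with three obvious groups
for k in range(1, 6):
    print(k, round(best_wcss_1d(data, k), 2))
```

For this data the WCSS falls steeply from k = 1 to k = 3: it is 548 at k = 1, 127.5 at k = 2 and 6 at k = 3. After that it barely changes: 4.5 at k = 4 and 3 at k = 5. The bend (the "elbow") is therefore at k = 3.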
9. What will be the Manhattan distance for observation (8, 8) from cluster centroid C1 in the second iteration?
- 12
- 8
- 10
- 14
Answer :-
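The data table and the second-iteration value of C1 that question 9 relies on are not included here. The centroid in the sketch below is therefore a hypothetical placeholder, not the value from the original problem. The sketch only shows how a Manhattan distance is computed.

```python
def manhattan(p, q):
    """Manhattan (city-block) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

observation = (8, 8)
c1 = (3.0, 4.0)  # HYPOTHETICAL centroid; substitute C1 from the original problem
print(manhattan(observation, c1))  # 9.0  (|8 - 3| + |8 - 4|)
```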
10. A dendrogram is used in _____________ clustering to visualize the merging of clusters.
- Hierarchical
- K-Means
- DBSCAN
- Spectral
Answer :-
11. ____________ are used to determine how distances between clusters are measured in hierarchical clustering.
- Partitioning methods
- Linkage measures
- Cross-validation
- Density measures
Answer :-
12. Which of the following best describes the divisive hierarchical clustering method?
- Probabilistic
- Deterministic
- Stochastic
- Non-parametric
Answer :-
13. Density-based clustering methods group data points based on density, requiring that each core point’s neighborhood within a specified radius contains at least a minimum number of points.
- True
- False
Answer :-
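The core-point condition in the statement can be checked directly. The following is a minimal Python sketch with hypothetical data and parameter names (`eps` for the radius, `min_pts` for the minimum count). It assumes the common convention that a point counts itself as part of its own neighbourhood.

```python
from math import dist

def core_points(points, eps, min_pts):
    """Indices of core points: at least min_pts points (the point itself
    included) lie within distance eps."""
    cores = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if dist(p, q) <= eps)
        if neighbours >= min_pts:
            cores.append(i)
    return cores

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(core_points(pts, eps=1.0, min_pts=3))  # [0, 1, 2, 3]
```

The four corners of the unit square each have three points within radius 1. The isolated point (5, 5) has only itself, so it is not a core point.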
14. The clustering objective function seeks to achieve which of the following?
- High similarity within clusters, high similarity between clusters
- Low similarity within clusters, low similarity between clusters
- High similarity within clusters, low similarity between clusters
- Low similarity within clusters, high similarity between clusters
Answer :-
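One way to see the clustering objective is with distances, where a small distance means high similarity. The following Python sketch compares two hypothetical partitions of the same four points, using mean pairwise distance within clusters and between clusters. The function names are illustrative.

```python
from itertools import combinations
from math import dist
from statistics import mean

def mean_within(clusters):
    """Average distance between points in the same cluster."""
    return mean(dist(p, q) for c in clusters for p, q in combinations(c, 2))

def mean_between(clusters):
    """Average distance between points in different clusters."""
    return mean(dist(p, q) for a, b in combinations(clusters, 2) for p in a for q in b)

good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]  # nearby points grouped together
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]   # distant points grouped together
for name, part in (("good", good), ("bad", bad)):
    print(name, mean_within(part), round(mean_between(part), 2))
```

The good partition has a small within-cluster distance (1.0) and a large between-cluster distance (about 10.02). That corresponds to high similarity within clusters and low similarity between them.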
15. The k-modes method is a variant of k-means that is specifically used for clustering:
- Numerical data
- Sequential data
- Nominal data
- Time-series data
Answer :-
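K-modes adapts K-means to categorical data by making two substitutions. Euclidean distance is replaced with a simple matching dissimilarity (the count of attributes that differ). The mean is replaced with the per-attribute mode. The following is a minimal Python sketch of those two pieces, with hypothetical records and function names.

```python
from collections import Counter

def matching_dissimilarity(x, y):
    """Number of attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))

def cluster_mode(records):
    """Cluster representative: the most frequent category in each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

cluster = [("red", "small", "yes"), ("red", "large", "yes"), ("blue", "small", "yes")]
mode = cluster_mode(cluster)
print(mode)                                                   # ('red', 'small', 'yes')
print(matching_dissimilarity(("blue", "large", "no"), mode))  # 3
```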