NPTEL Business Intelligence & Analytics Week 8 Assignment Answers 2025
1. In a RandomForestClassifier, what does setting n_estimators=100 mean?
- The number of samples used for each tree
- The number of trees in the forest
- The number of features considered at each split
- The maximum depth of each tree
Answer :-
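One way to check what n_estimators controls is to fit a small forest and count the fitted trees. The sketch below is illustrative only; the iris dataset and the random_state value are assumptions, not part of the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Iris is a stand-in dataset chosen for illustration (an assumption).
X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# estimators_ holds the individual fitted decision trees.
n_trees = len(rf.estimators_)
print(n_trees)
```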
2. Imagine you have a decision tree that perfectly fits your training data but performs poorly on test data. What technique can help address this issue?
- Increasing the tree depth.
- Growing additional branches on the tree
- Adding more features to the dataset.
- Applying pruning to avoid overfitting.
Answer :-
3. If you want to visualize the decision-making process of a decision tree x in scikit-learn, which function would you use?
- tree.plot_tree(x)
- tree.evaluate_tree(x)
- tree.train_tree(x)
- tree.prune_tree(x)
Answer :-
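For a quick look at a fitted tree without a plotting backend, scikit-learn also offers tree.export_text, which prints the split rules as text. This sketch swaps in export_text for the graphical call; the iris data and max_depth=2 are assumptions made for illustration.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()

# x is a small fitted decision tree (depth limited only to keep the output short).
x = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
x.fit(iris.data, iris.target)

# Text rendering of the decision rules, one line per node.
rules = tree.export_text(x, feature_names=list(iris.feature_names))
print(rules)

# With matplotlib installed, tree.plot_tree(x) draws the same structure as a figure.
```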
4. Using scikit-learn, you generate a classification report to evaluate a model predicting exam grade categories. Which of the following metrics is NOT included in the report?
- Precision
- Recall
- F1-score
- Mean Squared Error
Answer :-
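Printing a report for a toy prediction shows which columns classification_report produces. The grade labels below are invented for illustration.

```python
from sklearn.metrics import classification_report

# Hypothetical grade categories, not taken from the assignment.
y_true = ["A", "B", "B", "C", "A", "C"]
y_pred = ["A", "B", "C", "C", "A", "B"]

report = classification_report(y_true, y_pred)
print(report)  # header row lists precision, recall, f1-score, support
```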
5. In a Random Forest classifier, what function does the Gini criterion serve?
- To measure the prediction accuracy of the forest.
- To identify the most important features for classification.
- To determine the best feature for splitting at each node in a decision tree.
- To calculate the purity of the final node.
Answer :-
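The Gini criterion can be sketched in a few lines of plain Python. This is not scikit-learn's implementation; the helper names gini and split_impurity are hypothetical, and the pass/fail labels are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children of a candidate split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["pass", "pass", "fail", "fail"]

# A split that separates the classes scores lower than one that does not.
clean = split_impurity(["pass", "pass"], ["fail", "fail"])
mixed = split_impurity(["pass", "fail"], ["pass", "fail"])
print(gini(parent), clean, mixed)
```

A tree builder compares candidate splits by this weighted impurity and keeps the one with the lowest value.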
6. What action does rf.fit(X_train, Y_train) perform when working with a Random Forest classifier in scikit-learn?
- It selects the best features for the training data
- It trains the Random Forest model using the provided training data
- It predicts the target values for a new set of inputs.
- It calculates the training data accuracy.
Answer :-
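The effect of fit can be observed by trying to predict before and after calling it. A minimal sketch, assuming the iris data and a default train/test split as placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=10, random_state=0)

# Before fit there are no trees, so predict raises NotFittedError.
try:
    rf.predict(X_test)
    usable_before_fit = True
except NotFittedError:
    usable_before_fit = False

# fit builds the trees from the training data and returns the estimator itself.
returned = rf.fit(X_train, Y_train)
predictions = rf.predict(X_test)
```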
7. You are using the make_classification function from scikit-learn to generate a dataset for predicting whether graduate level students will pass or fail a course. The dataset includes features such as hours of study, previous grades, and mental health factors. What does the n_samples parameter control in this function?
- The number of features (e.g., study hours, grades, mental health factors).
- The number of classes (e.g., Pass/Fail) in the dataset.
- The number of data points (students) to be generated, including all their characteristics.
- The degree of noise in the dataset, such as random errors in student data.
Answer :-
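The shapes returned by make_classification make the role of each size parameter visible. The specific values below (200 rows, 5 features) are arbitrary assumptions.

```python
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=200,    # rows, e.g. one per student
    n_features=5,     # columns, e.g. study hours, grades, ...
    n_informative=3,
    n_redundant=0,
    n_classes=2,      # e.g. Pass/Fail
    random_state=0,
)

print(X.shape, y.shape)
```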
8. What function does ccp_alpha serve in decision tree pruning in scikit-learn?
- Defines the minimum number of leaf nodes required.
- Controls the number of samples required to split a node
- Sets the threshold that helps decide which nodes to prune based on cost complexity.
- Determines the maximum depth of the tree.
Answer :-
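Comparing node counts with and without ccp_alpha gives a feel for cost-complexity pruning. This is a sketch only; the breast cancer dataset and the value 0.01 are assumptions picked for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Default ccp_alpha=0.0 means no pruning: the tree grows until leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# A positive ccp_alpha removes subtrees whose impurity reduction does not
# justify their added complexity.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)

print(full.tree_.node_count, pruned.tree_.node_count)
```

In practice, a suitable value is usually chosen by validation, for example from the candidates returned by cost_complexity_pruning_path.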
9. When using the roc_curve function from sklearn.metrics, which of the following statements is true?
- The roc_curve function computes the precision-recall curve for binary classification models.
- The roc_curve function requires predicted probabilities or decision function scores as inputs, not just class labels.
- The roc_curve function can be used to evaluate multi-class classification models without modifications.
- The roc_curve function returns the Receiver Operating Characteristic (ROC) curve plot by default.
Answer :-
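Calling roc_curve on a tiny binary example shows what it takes in and what it hands back. The labels and scores below are made-up values.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical binary labels and continuous scores (e.g. predicted probabilities).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns three arrays; plotting is a separate step.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(fpr, tpr, thresholds, auc)
```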
10. In NumPy, what is the output of the np.shape function when applied to an array?
- The data type of the array
- A tuple representing the size of each dimension of the array.
- The number of dimensions of the array.
- The total number of elements in the array
Answer :-
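The quantities named in the options can be compared side by side on a small array. The 2 x 3 array here is an arbitrary example.

```python
import numpy as np

a = np.zeros((2, 3))

shape = np.shape(a)  # size of each dimension
ndim = a.ndim        # number of dimensions
size = a.size        # total number of elements
dtype = a.dtype      # element data type

print(shape, ndim, size, dtype)
```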
11. Which of the following best describes the difference between classification and regression trees in the CART algorithm?
- Classification trees predict continuous variables, while regression trees predict categorical variables.
- Regression trees predict categorical variables, while classification trees predict continuous variables.
- Classification trees predict categorical variables, while regression trees predict continuous variables.
- Regression trees are computationally expensive, while classification trees are computationally inexpensive.
Answer :-
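scikit-learn exposes the two CART variants as separate estimators, which makes the difference in target type easy to see. The study-hours data below is hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical data: hours studied, a pass/fail label, and a numeric exam score.
hours = [[1], [2], [3], [6], [7], [8]]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]
score = [35.0, 40.0, 45.0, 70.0, 75.0, 80.0]

clf = DecisionTreeClassifier(random_state=0).fit(hours, outcome)
reg = DecisionTreeRegressor(random_state=0).fit(hours, score)

label = clf.predict([[7]])[0]  # a category
value = reg.predict([[7]])[0]  # a number
print(label, value)
```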
12. What is the role of entropy in decision trees?
- It measures the accuracy of the tree’s predictions
- It determines the optimal number of splits in the tree
- It quantifies the disorder or impurity in a node
- It calculates the variance of the data in each node
Answer :-
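Entropy for the labels in a node can be computed directly from class proportions. A minimal sketch; the function name entropy and the sample labels are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

mixed_node = entropy(["spam", "spam", "ham", "ham"])  # evenly mixed classes
pure_node = entropy(["spam", "spam", "spam"])         # a single class
print(mixed_node, pure_node)
```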
13. You are creating a model to identify spam emails that you receive in your college mail ID. Which of the following defines a False Positive (FP) in this context?
- An email correctly identified as not spam
- An email predicted as not spam that is actually spam
- An email predicted as spam that is actually not spam
- An email correctly identified as spam
Answer :-
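Tallying the four confusion-matrix cells by hand helps keep the terms straight. In this sketch 1 means spam (the positive class) and 0 means not spam; the labels are invented.

```python
# Hypothetical ground truth and predictions: 1 = spam, 0 = not spam.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam flagged as spam
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged as spam
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail left alone
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through

print(tp, fp, tn, fn)
```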
14. If the true positive value is 20 and the false negative value is 5, what is the recall score for the classification model?
- 0.8
- 0.9
- 0.7
- None of the above
Answer :-
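Recall is defined as TP / (TP + FN), so the figures in the question can be plugged in directly. The helper name recall is just for illustration.

```python
def recall(true_positives, false_negatives):
    """Fraction of actual positives that the model found."""
    return true_positives / (true_positives + false_negatives)

score = recall(20, 5)
print(score)
```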
15. Which method is used to evaluate a model by splitting the data into multiple subsets?
- Gradient Descent
- Cross-validation
- Principal Component Analysis
- Regularization
Answer :-
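Evaluating on several train/test partitions of the same data can be done with cross_val_score. A minimal sketch, assuming the iris data, a decision tree, and 5 folds as placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# cv=5 splits the data into 5 folds; each fold serves once as the test set.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```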