NPTEL Business Intelligence & Analytics Week 12 Assignment Answers 2025

1. In text mining, what is the process of breaking text into smaller units called?

  • Lemmatization
  • Stopword removal
  • Stemming
  • Tokenization
Answer :-
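
A minimal sketch of word-level tokenization, assuming a simple regular-expression rule: lowercase the text, then keep each run of ASCII letters, digits, or apostrophes as a token. The function name `tokenize` is hypothetical. Libraries such as NLTK and spaCy ship more careful tokenizers that handle punctuation, contractions, and Unicode.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep each run of ASCII letters, digits, or apostrophes as one token.
    # Punctuation and whitespace act as separators and are dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Text mining breaks text into tokens."))
# ['text', 'mining', 'breaks', 'text', 'into', 'tokens']
```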

2. In sentiment analysis, which dataset assigns each word an integer score reflecting the strength of its positive or negative sentiment?

  • NRC sentiment dataset
  • SentiWordNet
  • AFINN sentiment dataset
  • VADER sentiment dataset
Answer :- 
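
AFINN-style scoring can be sketched as a dictionary lookup followed by a sum. The lexicon below is a hypothetical five-word excerpt written for illustration: its values follow AFINN's convention of integers between -5 and +5, but they are not copied from the published list.

```python
# Hypothetical AFINN-style lexicon: integer scores from -5 (most negative) to +5 (most positive).
# Illustrative values only, not taken from the published AFINN word list.
LEXICON = {"good": 3, "great": 3, "okay": 1, "bad": -3, "terrible": -3}

def lexicon_score(tokens):
    # Sum the integer scores of known words; words missing from the lexicon add 0.
    return sum(LEXICON.get(tok, 0) for tok in tokens)

print(lexicon_score("good food and great service".split()))     # 6
print(lexicon_score("okay food but terrible service".split()))  # -2
```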

3. A cybersecurity system uses TF-IDF to detect suspicious keywords in threat reports. If a specific term appears frequently in one report but rarely in others, what does its term frequency (TF) measure?

  • The number of reports containing the term
  • The total number of words in the security database
  • The similarity between different threat reports
  • The term’s relative importance within that threat report
Answer :- 
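
A sketch of term frequency computed inside a single document, assuming the common normalized form: occurrences of the term divided by the document's token count. The report text is made up for illustration; some texts use the raw count or a log-scaled count instead.

```python
from collections import Counter

def term_frequency(term, doc_tokens):
    # Occurrences of the term in this one document, divided by the document's length.
    # Other documents in the collection play no part in TF.
    return Counter(doc_tokens)[term] / len(doc_tokens)

# Hypothetical threat report, already tokenized.
report = "phishing link sent phishing kit".split()
print(term_frequency("phishing", report))  # 0.4
```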

4. A higher Phi coefficient in word co-occurrence analysis suggests:

  • No relationship between two words
  • A strong association between two words appearing together
  • A weak correlation between two words
  • That one word is always followed by the other in a sequence
Answer :- 
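
The Phi coefficient for a pair of words can be computed from a 2×2 table of document counts. A minimal sketch; the cell names n11, n10, n01, n00 are labels chosen here, and the counts in the usage lines are hypothetical.

```python
import math

def phi_coefficient(n11, n10, n01, n00):
    # n11: documents containing both words
    # n10: documents containing only the first word
    # n01: documents containing only the second word
    # n00: documents containing neither word
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return numerator / denominator

print(phi_coefficient(40, 0, 0, 60))    # 1.0: the words always appear together
print(phi_coefficient(25, 25, 25, 25))  # 0.0: no association
```

Values range from -1 to +1; the closer to +1, the more consistently the two words occur in the same documents.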

5. What advantage do bigrams provide in text analysis compared to single words?

  • Bigrams eliminate the need for stopword removal
  • Bigrams are better than all other types of n-grams
  • Bigrams always have higher frequency counts than individual words
  • Bigrams offer more context and capture structural relationships between words
Answer :- For Answers Click Here 
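
Bigrams can be generated by pairing each token with its successor. In this minimal sketch (the helper name `bigrams` is hypothetical), the pair ('not', 'good') keeps a negation that single words would lose.

```python
def bigrams(tokens):
    # zip() pairs each token with the next one; the last token has no successor.
    return list(zip(tokens, tokens[1:]))

print(bigrams("this is not good".split()))
# [('this', 'is'), ('is', 'not'), ('not', 'good')]
```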

6. A search engine computes the cosine similarity between a user query and multiple documents. If one document has a similarity score of 0, what does this imply?

  • The document shares no common words with the query
  • The document has a partial match with the query
  • The document has the highest relevance
  • The document is highly relevant to the query
Answer :- 
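
A minimal sketch of cosine similarity between a query and a document, assuming both are represented as raw word-count vectors. Because counts are never negative, the dot product, and therefore the score, is 0 exactly when no word appears in both texts. The query and document strings are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    a, b = Counter(tokens_a), Counter(tokens_b)
    # The dot product only needs the words the two texts share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty text as having no similarity
    return dot / (norm_a * norm_b)

query = "cheap flights".split()
print(cosine_similarity(query, "hotel booking deals".split()))  # 0.0
print(cosine_similarity(query, "cheap flights to rome".split()))
```

With TF-IDF weights instead of raw counts, a shared word whose weight is 0 also contributes nothing, so the same reasoning applies to the words that carry weight.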

7. In a dataset of 250 research papers, the words “ocean” and “reef” do not appear together in 15 papers. However, both “ocean” and “reef” are found in 60 papers. Meanwhile, “ocean” appears alone in 20 papers, and “reef” is found without “ocean” in 25 papers. Based on this data, what is the Phi coefficient measuring the correlation between the occurrence of “ocean” and “reef” in this dataset?

  • 0.69
  • 0.95
  • 0.21
  • 0.88
Answer :- 
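
A worked version of the Phi formula with the question's counts, taking n11 = 60 (both words), n10 = 20 ("ocean" only) and n01 = 25 ("reef" only). The cell n00 (neither word) depends on how the 15 papers are read. If they are the papers containing neither word, the four cells total 120 rather than the stated 250; if instead n00 = 250 - 60 - 20 - 25 = 145, the total matches. Both readings are shown:

```latex
\begin{align*}
\phi &= \frac{n_{11}n_{00} - n_{10}n_{01}}
             {\sqrt{(n_{11}+n_{10})(n_{01}+n_{00})(n_{11}+n_{01})(n_{10}+n_{00})}} \\
n_{00} = 15:\quad \phi &= \frac{60 \cdot 15 - 20 \cdot 25}{\sqrt{80 \cdot 40 \cdot 85 \cdot 35}}
  = \frac{400}{\sqrt{9\,520\,000}} \approx 0.13 \\
n_{00} = 145:\quad \phi &= \frac{60 \cdot 145 - 20 \cdot 25}{\sqrt{80 \cdot 170 \cdot 85 \cdot 165}}
  = \frac{8200}{\sqrt{190\,740\,000}} \approx 0.59
\end{align*}
```

Neither result coincides exactly with one of the listed options.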

8. If a word appears in only one document of a corpus, what can be said about its IDF score?

  • It will be negative
  • It will be low
  • It will be high
  • It will be zero
Answer :- 
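
A sketch of inverse document frequency using the unsmoothed form idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. Libraries often add smoothing terms, so their exact values differ; the three-document corpus here is hypothetical.

```python
import math

def idf(term, documents):
    # documents: list of sets of words; df counts the documents that contain the term.
    df = sum(1 for doc in documents if term in doc)
    # Unsmoothed form: the term must occur in at least one document, otherwise df is 0.
    return math.log(len(documents) / df)

docs = [{"ocean", "reef", "coral"}, {"ocean", "tide"}, {"ocean", "storm"}]
print(idf("coral", docs))  # ln(3/1), about 1.0986: a rare term gets a high IDF
print(idf("ocean", docs))  # ln(3/3) = 0.0: a term in every document gets none
```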

9. What is the main role of Inverse Document Frequency (IDF) in TF-IDF analysis?

  • Assign higher importance to common words like “the” and “is”
  • Rank documents based on total word count
  • Reduce the weight of frequently occurring words across documents
  • Ensure all words are treated equally
Answer :- 
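
To see IDF down-weighting common words, here is a minimal TF-IDF sketch, assuming TF = count / document length and IDF = ln(N / df) without smoothing. The three short documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists. Returns one {term: weight} dict per document.
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the reef is healthy".split(),
    "the ocean is warm".split(),
    "the reef is bleached".split(),
]
w = tf_idf(docs)
print(w[0]["the"])                     # 0.0: "the" occurs in every document
print(w[0]["healthy"] > w[0]["reef"])  # True: rarer terms keep more weight
```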

10. In text classification, how does the Bag of Words model process text?

  • Uses deep learning to understand the meaning of words
  • Converts words into numerical representations based on presence and frequency
  • Ignores word frequency and focuses only on synonyms
  • Retains sentence structure while analyzing text
Answer :- 
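
A minimal Bag of Words sketch in plain Python: build a shared vocabulary, then turn each document into a vector of word counts. Word order is discarded, so "good movie" and "movie good" map to the same vector. The helper name and example texts are hypothetical; scikit-learn's `CountVectorizer` does the same job in practice.

```python
from collections import Counter

def bag_of_words(docs):
    # Shared vocabulary across all documents, sorted for a stable column order.
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # One column per vocabulary word; the value is how often it occurs in this document.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["good movie".split(), "bad movie bad plot".split()])
print(vocab)    # ['bad', 'good', 'movie', 'plot']
print(vectors)  # [[0, 1, 1, 0], [2, 0, 1, 1]]
```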

11. Cluster profiling is used to determine the optimal number of clusters in a dataset.

  • True
  • False
Answer :- 

12. Cross-sectional data is collected from the same subjects over multiple time periods.

  • True
  • False
Answer :- 

13. A data scientist is preprocessing text for a sentiment analysis model. What would they likely do with words like “the,” “and,” “is,” and “of”?

  • Remove them as stop words
  • Convert them into their root forms
  • Merge them into a single feature
  • Assign them higher weights for analysis
Answer :- 
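
Stop-word removal amounts to filtering tokens against a stop list. The six-word list below is a hypothetical minimal example; NLTK and spaCy provide longer curated lists. For sentiment work it may be worth keeping negations such as "not" off the list, since dropping them can reverse a sentence's polarity.

```python
# Hypothetical minimal stop list for illustration.
STOP_WORDS = {"the", "and", "is", "of", "a", "to"}

def remove_stop_words(tokens):
    # Keep only tokens whose lowercase form is not in the stop list.
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]

print(remove_stop_words("The plot of the movie is slow and dull".split()))
# ['plot', 'movie', 'slow', 'dull']
```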

14. Which of the following is NOT a function of stemming or lemmatization?

  • Reducing different word variations to a common form
  • Enhancing text classification by normalizing words
  • Improving text search accuracy
  • Converting text into numerical vectors
Answer :- 
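
A toy suffix-stripping stemmer makes the contrast concrete: stemming maps word variants to a shared form, but its output is still text, and turning text into numerical vectors is a separate step such as Bag of Words or TF-IDF. This is a hypothetical illustration, not the Porter algorithm; NLTK's `PorterStemmer` or a lemmatizer would be used in practice.

```python
# Toy suffix-stripping stemmer (illustrative only; not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping a stem of at least three letters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["connecting", "connected", "connects", "connect"]])
# ['connect', 'connect', 'connect', 'connect']
```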

15. Which of the following is NOT an example of a corpus?

  • A collection of legal documents used in NLP
  • A dataset of medical research papers
  • A collection of all Shakespeare’s works
  • A single email from a spam filter dataset
Answer :-