In the text mining process, the text is first preprocessed by deriving a smaller set of _________ from the larger set of words contained in a collection of documents.


A) tokens
B) stems
C) terms
D) stack

E) C) and D)
F) All of the above
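
As a rough illustration of the preprocessing step this question describes, the sketch below reduces the words in a small corpus to a smaller vocabulary by tokenizing, dropping stop words, and stemming. The stop-word list, the suffix rules, and the function names are assumptions made for this example; a real project would normally use an established stemmer such as Porter's.

```python
import re

# Hypothetical stop-word list and suffix rules, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}
SUFFIXES = ("ing", "ed", "s")

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Strip one common suffix; a stand-in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_terms(documents):
    """Return the sorted set of stemmed, non-stop-word tokens in the corpus."""
    terms = set()
    for doc in documents:
        for token in tokenize(doc):
            if token not in STOP_WORDS:
                terms.add(crude_stem(token))
    return sorted(terms)

corpus = ["The store is opening early", "Stores opened in the morning"]
terms = extract_terms(corpus)
print(terms)  # ['early', 'morn', 'open', 'store']
```

Ten words across the two documents collapse to four entries, which is the kind of reduction the question refers to.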

The process of extracting useful information from text data is known as __________.


A) text mining
B) tokenization
C) stemming
D) corpus

E) A) and B)
F) C) and D)

Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?


A) 100
B) 125
C) 150
D) 175

E) A) and B)
F) A) and C)
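
The arithmetic behind this question can be sketched from the standard definition of the lift ratio: lift = confidence / (support of the consequent), where the support of the consequent is the fraction of all transactions that contain it. The function name below is an assumption for this example.

```python
def consequent_support_count(confidence, lift, n_transactions):
    """Solve lift = confidence / (count / n) for the consequent's count."""
    consequent_support = confidence / lift  # fraction of all transactions
    return consequent_support * n_transactions

count = consequent_support_count(confidence=0.75, lift=1.875, n_transactions=250)
print(round(count))  # 100
```

Here 0.75 / 1.875 = 0.4, and 0.4 × 250 = 100 transactions.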

In preparing categorical variables for analysis, it is usually best to


A) convert the categories to numeric representations.
B) convert the categories to binary, dummy variables.
C) combine as many categories as possible.
D) let them remain categorical.

E) All of the above
F) A) and C)
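
For reference, converting a categorical variable to binary dummy variables might look like the minimal sketch below. The helper name `to_dummies`, the `is_` prefix, and the sample data are assumptions made for this example.

```python
def to_dummies(values):
    """Encode each categorical value as a dict of 0/1 dummy variables."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

colors = ["red", "green", "red", "blue"]
rows = to_dummies(colors)
print(rows[0])  # {'is_blue': 0, 'is_green': 0, 'is_red': 1}
```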

__________ can be used to partition observations so as to obtain clusters with the least amount of information loss due to the aggregation.


A) Single linkage
B) Ward's method
C) Average group linkage
D) Dendrogram

E) None of the above
F) A) and C)
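
One common way to quantify the "information loss" from merging two clusters is the increase in the within-cluster sum of squared distances to the centroid, which is the criterion Ward's method minimizes at each merge. The sketch below computes that increase for one candidate merge; the function names and sample points are assumptions for this example.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_merge_cost(cluster_a, cluster_b):
    """Increase in total within-cluster SSE if the two clusters are merged."""
    return sse(cluster_a + cluster_b) - sse(cluster_a) - sse(cluster_b)

a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 0.0), (4.0, 2.0)]
cost = ward_merge_cost(a, b)
print(cost)  # 16.0
```

A hierarchical procedure using this criterion would evaluate the cost for every pair of clusters and merge the pair with the smallest increase.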

Single linkage is a method of calculating dissimilarity between clusters by


A) considering only the two most dissimilar observations in the two clusters.
B) computing the average dissimilarity between every pair of observations between the two clusters.
C) considering only the two most similar observations in the two clusters.
D) considering the distance between the cluster centroids.

E) C) and D)
F) B) and C)
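
The linkage definitions in the options above can be compared side by side. The sketch below, with assumed function names and sample points, treats distance as the dissimilarity measure, so the "most similar" pair is the closest pair and the "most dissimilar" pair is the farthest pair.

```python
import math
from itertools import product

def pair_distances(cluster_a, cluster_b):
    """Euclidean distance for every cross-cluster pair of observations."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the two closest (most similar) observations."""
    return min(pair_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the two farthest (most dissimilar) observations."""
    return max(pair_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Mean distance over all cross-cluster pairs."""
    d = pair_distances(cluster_a, cluster_b)
    return sum(d) / len(d)

a = [(0, 0), (1, 0)]
b = [(3, 0), (6, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
# 2.0 6.0 4.0
```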

A method for modifying variables that reduces bias prior to cluster analysis is


A) standardization.
B) weighting.
C) removing outliers.
D) randomizing.

E) B) and D)
F) B) and C)
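
To show what standardization does to variables measured on different scales, the sketch below converts each variable to z-scores (subtract the mean, divide by the sample standard deviation). The data are hypothetical: three customers' ages and annual spending, invented for this example.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical observations: age in years and spending in dollars.
ages = [25, 53, 39]
spend = [350, 420, 385]

z_ages = standardize(ages)
z_spend = standardize(spend)
print(z_ages)   # [-1.0, 1.0, 0.0]
print(z_spend)  # [-1.0, 1.0, 0.0]
```

After standardization both variables are expressed in standard-deviation units, so neither dominates a distance calculation merely because of its units.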

The goal of __________ is to use the variable values to identify relationships between observations.


A) unsupervised learning
B) data mining
C) McQuitty's method
D) Ward's method

E) A) and C)
F) None of the above

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the


A) matching coefficient.
B) Jaccard's coefficient.
C) Euclidean distance.
D) antecedent.

E) B) and C)
F) A) and B)
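
The two similarity coefficients named in the options can be sketched for observations made up of 0/1 dummy variables. The matching coefficient counts every agreement, while Jaccard's coefficient ignores variables where both observations are 0. The function names and sample vectors are assumptions for this example, and the Jaccard function assumes at least one variable is nonzero in one of the observations.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    agree = sum(1 for a, b in zip(u, v) if a == b)
    return agree / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among variables that are not 0-0 in both."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    return both_one / (len(u) - both_zero)

u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```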

A collection of text documents to be analyzed is called a ___________.


A) book
B) corpus
C) library
D) consequent

E) All of the above
F) A) and B)

A __________ refers to the number of times a collection of items occurs together in a transaction data set.


A) consequent
B) validation count
C) support count
D) antecedent

E) None of the above
F) A) and D)
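
Counting how often a collection of items occurs together can be sketched as follows. The transactions and the function name are hypothetical, made up for this example.

```python
def count_together(transactions, itemset):
    """Number of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"jelly"},
    {"bread", "jelly"},
]
n_bread_milk = count_together(transactions, {"bread", "milk"})
print(n_bread_milk)  # 2
```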

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a


A) dendrogram.
B) scatter chart.
C) decile-wise lift chart.
D) cumulative lift tree.

E) B) and C)
F) A) and D)

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?


A) The short leg
B) The long leg
C) The hypotenuse
D) Euclidean distance is not related to right triangles.

E) All of the above
F) A) and D)

__________ is a method of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.


A) Single linkage
B) Complete linkage
C) Average linkage
D) Average group linkage

E) A) and B)
F) A) and C)

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.


A) 66.21
B) 72.28
C) 75.39
D) 88.57

E) B) and C)
F) A) and C)
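
The calculation can be checked with a short sketch that uses the vectors u and v exactly as given in the question. The helper name is an assumption for this example; Python's built-in `math.dist` returns the same value.

```python
import math

def euclidean_distance(u, v):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = (25, 350)
v = (53, 420)
d = euclidean_distance(u, v)
print(round(d, 2))  # 75.39
```

The differences are 28 years and $70, so the distance is sqrt(28² + 70²) = sqrt(5684) ≈ 75.39.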

Which of the following is true of Euclidean distance?


A) It is used to measure dissimilarity between categorical variable observations.
B) It is not affected by the scale on which variables are measured.
C) It increases with the increase in similarity between variable values.
D) It is commonly used as a method of measuring dissimilarity between quantitative observations.

E) A) and B)
F) None of the above

k-means clustering is the process of


A) agglomerating observations into a series of nested groups based on a measure of similarity.
B) organizing observations into distinct groups based on a measure of similarity.
C) reducing the number of variables to consider in data mining.
D) estimating the value of a continuous outcome variable.

E) None of the above
F) A) and C)
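
For context, a bare-bones version of the k-means procedure is sketched below: assign each observation to its nearest centroid, recompute the centroids, and repeat until nothing changes. The function names, sample points, and starting centroids are assumptions for this example; practical implementations add random restarts and more careful initialization.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def update(points, labels, centroids):
    """Mean of each cluster's members; an empty cluster keeps its centroid."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```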

Single linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.


A) most similar
B) most different
C) farthest apart
D) closest

E) A) and B)
F) A) and C)

Euclidean distance can be used to measure the distance between __________ in cluster analysis.


A) objects
B) clusters
C) observations
D) ward

E) B) and D)
F) C) and D)

The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?


A) 0.5
B) 1
C) 1.5
D) 2

E) B) and C)
F) All of the above
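
The ratio described in this question can be computed as in the sketch below. It assumes that "average distance in a cluster" means the mean pairwise distance between the cluster's observations, which is one possible reading; the function names and sample points are likewise assumptions for this example.

```python
import math
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def avg_within_distance(points):
    """Mean Euclidean distance over all pairs of observations in a cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def separation_ratios(cluster_a, cluster_b):
    """Between-centroid distance divided by each cluster's within distance."""
    between = math.dist(centroid(cluster_a), centroid(cluster_b))
    return (between / avg_within_distance(cluster_a),
            between / avg_within_distance(cluster_b))

a = [(0, 0), (2, 0)]
b = [(10, 0), (14, 0)]
ratios = separation_ratios(a, b)
print(ratios)  # (5.5, 2.75)
```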
