Question 1

__________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation.

Accepted Answer

A)  Single linkage
B)  Ward's method
C)  Average group linkage
D)  Dendrogram
E) C) and D)
F) A) and D)

Question 2

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a

Accepted Answer

A)  dendrogram.
B)  scatter chart.
C)  decile-wise lift chart.
D)  cumulative lift tree.
E) All of the above
F) A) and B)

Question 3

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

Accepted Answer

A)  matching coefficient.
B)  Jaccard's coefficient.
C)  Euclidean distance.
D)  antecedent.
E) A) and D)
F) A) and B)
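
The two coefficients named in the options can be compared with a small sketch. It assumes each observation is a tuple of 0/1 dummy variables; the function names and the sample observations are hypothetical and chosen only for illustration. The matching coefficient counts agreement on both 1s and 0s, while Jaccard's coefficient ignores pairs where both values are 0.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Share of 1-1 agreements among variables that are not 0 for both."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    considered = len(u) - both_zero
    # If every variable is 0 for both observations, the ratio is undefined;
    # returning 0.0 here is an assumption made for this sketch.
    return both_one / considered if considered else 0.0


# Hypothetical observations described by five dummy variables.
u = (1, 0, 0, 1, 0)
v = (1, 0, 0, 0, 0)

print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```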

Question 4

Single linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.

Accepted Answer

A)  most similar
B)  most different
C)  farthest apart
D)  closest
E) B) and C)
F) None of the above

Question 5

A method for modifying variables that reduces bias prior to cluster analysis is

Accepted Answer

A)  standardization.
B)  weighting.
C)  removing outliers.
D)  randomizing.
E) B) and D)
F) All of the above
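
As an illustration of one of the listed options, standardization can be sketched as a z-score transformation, which puts variables measured on different scales on a comparable footing before distances are computed. The data below are hypothetical, and the use of the sample standard deviation is an assumption of this sketch.

```python
import statistics


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the sample std dev."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(x - mean) / sd for x in values]


# Hypothetical customers: age in years and spending in dollars.
ages = [25, 53, 41, 33]
spending = [350, 420, 1200, 80]

z_ages = standardize(ages)
z_spending = standardize(spending)

# After standardization both variables have mean 0 and standard deviation 1,
# so neither dominates a distance calculation purely because of its units.
print([round(z, 2) for z in z_ages])
print([round(z, 2) for z in z_spending])
```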

Question 6

The goal of __________ is to use the variable values to identify relationships between observations.

Accepted Answer

A)  unsupervised learning
B)  data mining
C)  McQuitty's method
D)  Ward's method
E) C) and D)
F) A) and D)

Question 7

k-means clustering is the process of

Accepted Answer

A)  agglomerating observations into a series of nested groups based on a measure of similarity.
B)  organizing observations into distinct groups based on a measure of similarity.
C)  reducing the number of variables to consider in data-mining.
D)  estimating the value of a continuous outcome variable.
E) C) and D)
F) A) and B)

Question 8

A collection of text documents to be analyzed is called a ___________.

Accepted Answer

A)  book
B)  corpus
C)  library
D)  consequent
E) C) and D)
F) A) and B)

Question 9

Which of the following is true of Euclidean distance?

Accepted Answer

A)  It is used to measure dissimilarity between categorical variable observations.
B)  It is not affected by the scale on which variables are measured.
C)  It increases with the increase in similarity between variable values.
D)  It is commonly used as a method of measuring dissimilarity between quantitative observations.
E) All of the above
F) C) and D)

Question 10

Euclidean distance can be used to measure the distance between __________ in cluster analysis.

Accepted Answer

A)  objects
B)  clusters
C)  observations
D)  ward
E) B) and D)
F) B) and C)

Question 11

Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?

Accepted Answer

A)  100
B)  125
C)  150
D)  175
E) C) and D)
F) A) and B)
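
The arithmetic behind this question can be sketched as follows, assuming the usual definition of the lift ratio as confidence divided by the share of all transactions that contain the consequent. The function name is hypothetical.

```python
def consequent_support_count(confidence, lift_ratio, n_transactions):
    """Number of transactions containing the consequent.

    Assumes lift_ratio = confidence / (consequent_count / n_transactions),
    so consequent_count = (confidence / lift_ratio) * n_transactions.
    """
    consequent_share = confidence / lift_ratio
    return consequent_share * n_transactions


count = consequent_support_count(confidence=0.75, lift_ratio=1.875, n_transactions=250)
print(round(count))  # 100
```

With these inputs the consequent appears in 0.75 / 1.875 = 0.4 of the 250 transactions.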

Question 12

The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?

Accepted Answer

A)  0.5
B)  1
C)  1.5
D)  2
E) A) and B)
F) A) and C)

Question 13

In the text mining process, the text is first preprocessed by deriving a smaller set of _________ from the larger set of words contained in a collection of documents.

Accepted Answer

A)  tokens
B)  stems
C)  terms
D)  stack
E) A) and D)
F) B) and C)

Question 14

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

Accepted Answer

A)  66.21
B)  72.28
C)  75.39
D)  88.57
E) A) and D)
F) C) and D)
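
A minimal sketch of the calculation, using the coordinates given in the vectors u = (25, 350) and v = (53, 420). The function name is hypothetical, and the variables are left unstandardized, as in the question.

```python
import math


def euclidean_distance(u, v):
    """Square root of the summed squared differences across all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = (25, 350)  # (age, dollars spent)
v = (53, 420)

d = euclidean_distance(u, v)
print(round(d, 2))  # 75.39
```

Here the squared differences are 28² = 784 and 70² = 4,900, so the distance is the square root of 5,684.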

Question 15

In preparing categorical variables for analysis, it is usually best to

Accepted Answer

A)  convert the categories to numeric representations.
B)  convert the categories to binary, dummy variables.
C)  combine as many categories as possible.
D)  let them remain categorical.
E) A) and C)
F) A) and B)

Question 16

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?

Accepted Answer

A)  The short leg
B)  The long leg
C)  The hypotenuse
D)  Euclidean distance is not related to right triangles.
E) A) and D)
F) None of the above

Question 17

The process of extracting useful information from text data is known as __________.

Accepted Answer

A)  text mining
B)  tokenization
C)  stemming
D)  corpus
E) A) and C)
F) A) and B)

Question 18

A __________ refers to the number of times a collection of items occurs together in a transaction data set.

Accepted Answer

A)  consequent
B)  validation count
C)  support count
D)  antecedent
E) None of the above
F) B) and C)

Question 19

Single linkage is a method of calculating dissimilarity between clusters by

Accepted Answer

A)  considering only the two most dissimilar observations in the two clusters.
B)  computing the average dissimilarity between every pair of observations between the two clusters.
C)  considering only the two most similar observations in the two clusters.
D)  considering the distance between the cluster centroids.
E) None of the above
F) A) and B)

Question 20

__________ is a method of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

Accepted Answer

A)  Single linkage
B)  Complete linkage
C)  Average linkage
D)  Average group linkage
E) B) and D)
F) B) and C)
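
The linkage definitions in Questions 19 and 20 can be contrasted with a short sketch. It assumes each cluster is a list of numeric points and that dissimilarity between observations is Euclidean distance; the function names and sample clusters are hypothetical.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Distance between the two most similar (closest) observations."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar (farthest) observations."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Mean distance over all between-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters on a line, so the distances are easy to check by hand.
a = [(0, 0), (1, 0)]
b = [(4, 0), (6, 0)]

print(single_linkage(a, b))    # 3.0
print(complete_linkage(a, b))  # 6.0
print(average_linkage(a, b))   # 4.5
```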

Exam 4: Descriptive Data Mining
