Question 1

The X axis of a lift chart shows

Accepted Answer

A)  number of actual Class 1 records identified.
B)  ratio of decile mean to overall mean.
C)  the number of actual Class 1 records.
D)  the ratio of the overall mean to the decile mean.
E) A) and B)
F) A) and C)

Question 2

Separate error rates with respect to the false negative and false positive cases are computed to take into account the

Accepted Answer

A)  asymmetric costs in misclassification.
B)  symmetric weights of these two cases.
C)  distortions due to outliers.
D)  effect of sampling error.
E) C) and D)
F) B) and D)
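
To illustrate what separate error rates look like, here is a minimal sketch that uses the counts from the confusion matrix in Question 10. It assumes the usual definitions (Class 1 error rate = false negatives divided by actual Class 1 observations; Class 0 error rate = false positives divided by actual Class 0 observations). The variable names are illustrative and not from the source.

```python
# Counts taken from the Question 10 confusion matrix (rows = actual, columns = predicted).
true_pos = 224    # actual 1, predicted 1
false_neg = 85    # actual 1, predicted 0
false_pos = 28    # actual 0, predicted 1
true_neg = 3258   # actual 0, predicted 0

# Error rate among actual Class 1 observations (false negatives).
class1_error_rate = false_neg / (true_pos + false_neg)
# Error rate among actual Class 0 observations (false positives).
class0_error_rate = false_pos / (false_pos + true_neg)

print(f"Class 1 error rate: {class1_error_rate:.4f}")
print(f"Class 0 error rate: {class0_error_rate:.4f}")
```

The two rates can differ sharply even when the overall error rate is small, which is why they are reported separately.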

Question 3

____________ is a category of data-mining techniques in which an algorithm learns how to predict or classify an outcome variable of interest.

Accepted Answer

A)  Supervised Learning
B)  Unsupervised Learning
C)  Dimension Reduction
D)  Data Sampling
E) A) and C)
F) All of the above

Question 4

Data-mining methods for predicting an outcome based on a set of input variables are referred to as

Accepted Answer

A)  supervised learning.
B)  unsupervised learning.
C)  dimension reduction.
D)  data sampling.
E) A) and D)
F) A) and C)

Question 5

____________ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data-mining process.

Accepted Answer

A)  Data sampling
B)  Data partitioning
C)  Model construction
D)  Model assessment
E) All of the above
F) A) and C)

Question 6

Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of

Accepted Answer

A)  data exploration.
B)  data partitioning.
C)  data preparation.
D)  model assessment.
E) B) and D)
F) All of the above

Answer: A

Question 7

_____ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.

Accepted Answer

A)  Underfitting
B)  Overfitting
C)  Oversampling
D)  Undersampling
E) A) and C)
F) A) and D)

Question 8

______________ is NOT a step of the data-mining process.

Accepted Answer

A)  Data sampling
B)  Data partitioning
C)  Model construction
D)  Supervised learning
E) A) and B)
F) A) and C)

Question 9

______________ involves descriptive statistics, data visualization, and clustering.

Accepted Answer

A)  Data exploration
B)  Data partitioning
C)  Data preparation
D)  Model assessment
E) B) and C)
F) All of the above

Answer: A

Question 10

Given the following classification confusion matrix, what is the overall error rate?
$$\begin{array}{|l|c|c|}
\hline
\text{Classification Confusion Matrix} \\
\hline
& \text{Predicted Class} \\
\hline
\text{Actual Class} & 1 & 0 \\
\hline
1 & 224 & 85 \\
\hline
0 & 28 & 3{,}258 \\
\hline
\end{array}$$

Accepted Answer
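
A worked calculation, assuming the overall error rate is defined as the number of misclassified observations divided by the total number of observations. The counts come from the matrix above; the variable names are illustrative only.

```python
# Counts read from the confusion matrix (rows = actual class, columns = predicted class).
n11 = 224    # actual 1, predicted 1
n10 = 85     # actual 1, predicted 0 (misclassified)
n01 = 28     # actual 0, predicted 1 (misclassified)
n00 = 3258   # actual 0, predicted 0

total = n11 + n10 + n01 + n00
misclassified = n10 + n01
overall_error_rate = misclassified / total

print(f"{misclassified} / {total} = {overall_error_rate:.4f}")
# prints "113 / 3595 = 0.0314"
```

Under that definition the overall error rate is about 3.14%.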


Question 11

Misclassifying an actual ______ observation as a(n)  ______ observation is known as a false positive.

Accepted Answer

A)  Class 0, Class 1
B)  Class 1, Class 0
C)  error, accuracy
D)  false, true
E) All of the above
F) A) and D)
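
As a small illustration of the terminology, the sketch below counts false positives and false negatives from paired labels. The six observations are hypothetical and chosen only for the example; it assumes the common convention that Class 1 is the "positive" class.

```python
# Hypothetical labels for six observations (1 = Class 1, 0 = Class 0).
actual    = [1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 0, 1]

pairs = list(zip(actual, predicted))

# False positive: actually Class 0 but predicted as Class 1.
false_positives = sum(1 for a, p in pairs if a == 0 and p == 1)
# False negative: actually Class 1 but predicted as Class 0.
false_negatives = sum(1 for a, p in pairs if a == 1 and p == 0)

print(false_positives, false_negatives)
```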

Question 12

The test set is the data set used to

Accepted Answer

A)  build the data mining model.
B)  estimate accuracy of candidate models on unseen data.
C)  estimate accuracy of final model on unseen data.
D)  show counts of actual versus predicted class values.
E) A) and D)
F) B) and C)
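
For context, a minimal sketch of partitioning data into training, validation, and test sets. The 50/30/20 split, the function name `partition`, and the fixed seed are assumptions made for illustration; the source does not specify them.

```python
import random


def partition(records, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                    # used to build candidate models
    valid = shuffled[n_train:n_train + n_valid]   # used to compare candidate models
    test = shuffled[n_train + n_valid:]           # held out until a final model is chosen
    return train, valid, test


train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

The three sets are disjoint, so the held-out portion stays unseen while models are built and compared.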

Question 13

A _____ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.

Accepted Answer

A)  regression tree
B)  scatter chart
C)  classification tree
D)  classification confusion matrix
E) A) and D)
F) A) and C)

Question 14

A(n) _______________ is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables.

Accepted Answer

A)  record
B)  data point
C)  classification
D)  location
E) B) and D)
F) B) and C)

Question 15

_______ compares the number of actual Class 1 observations identified when observations are considered in decreasing order of their estimated probability with the number identified if observations are randomly selected.

Accepted Answer

A)  Cumulative lift
B)  Classification confusion
C)  Decile-wise lift chart
D)  ROC curve
E) B) and D)
F) C) and D)
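
The comparison described in the question can be sketched as follows. The eight scored observations are hypothetical, and the random-selection baseline is assumed to be the expected Class 1 count when k observations are drawn at random.

```python
# Hypothetical validation data: (estimated probability of Class 1, actual class).
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
          (0.4, 0), (0.3, 0), (0.2, 1), (0.1, 0)]

# Consider observations in decreasing order of estimated probability.
ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
total_class1 = sum(actual for _, actual in ranked)
n = len(ranked)

found = 0
rows = []
for k, (_, actual) in enumerate(ranked, start=1):
    found += actual
    # Class 1 observations expected among k randomly selected observations.
    expected_random = k * total_class1 / n
    rows.append((k, found, expected_random))

for k, found_k, expected in rows:
    print(k, found_k, expected)
```

Each row pairs the number of observations considered with the Class 1 count found by ranking and the count expected from random selection.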

Question 16

Estimation methods are also referred to as

Accepted Answer

A)  prediction methods.
B)  clustering methods.
C)  association methods.
D)  supervised methods.
E) All of the above
F) A) and D)

Question 17

An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed a(n)

Accepted Answer

A)  false negative.
B)  false positive.
C)  residual.
D)  outlier.
E) All of the above
F) A) and B)

Question 18

___________ is a generalization of linear regression for predicting a categorical outcome variable.

Accepted Answer

A)  Multiple linear regression
B)  Logistic regression
C)  Discriminant analysis
D)  Cluster analysis
E) B) and D)
F) B) and C)

Question 19

The percent of misclassified records out of the total records in the validation data is known as the

Accepted Answer

A)  overall error rate.
B)  error.
C)  accuracy.
D)  class.
E) None of the above
F) C) and D)

Answer: A

Question 20

Given the following classification confusion matrix, what is the accuracy?
$$\begin{array}{|l|c|c|}
\hline
\text{Classification Confusion Matrix} \\
\hline
& \text{Predicted Class} \\
\hline
\text{Actual Class} & 1 & 0 \\
\hline
1 & 224 & 85 \\
\hline
0 & 28 & 3{,}258 \\
\hline
\end{array}$$

Accepted Answer
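
A worked equation, assuming accuracy is defined as the number of correctly classified observations divided by the total number of observations (equivalently, one minus the overall error rate):

$$\text{accuracy} = \frac{224 + 3{,}258}{224 + 85 + 28 + 3{,}258} = \frac{3{,}482}{3{,}595} \approx 0.9686$$

Under that definition the accuracy is about 96.86%.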


Exam 9: Predictive Data Mining
