Ch 4 Midterm

24 July 2022
70 test answers

question
In data mining, classification models help in prediction.
answer
True
question
The data mining in cancer research case study explains that data mining methods are capable of extracting patterns and ________ hidden deep in large and complex medical databases.
answer
relationships
question
List five reasons for the growing popularity of data mining in the business world.
answer
• More intense competition at the global scale driven by customers' ever-changing needs and wants in an increasingly saturated marketplace
• General recognition of the untapped value hidden in large data sources
• Consolidation and integration of database records, which enables a single view of customers, vendors, transactions, etc.
• Consolidation of databases and other data repositories into a single location in the form of a data warehouse
• The exponential increase in data processing and storage technologies
• Significant reduction in the cost of hardware and software for data storage and processing
• Movement toward the demassification (conversion of information resources into nonphysical form) of business practices
question
In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.
answer
False
question
In the terrorist funding case study, an observed price ________ may be related to income tax avoidance/evasion, money laundering, or terrorist financing.
answer
deviation
question
If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."
answer
True
question
All of the following statements about data mining are true EXCEPT:
answer
The ideas behind it are relatively new.
question
List 3 common data mining myths and realities.
answer
1) Myth: Data mining provides instant, crystal-ball-like predictions. Reality: Data mining is a multistep process that requires deliberate, proactive design and use.
2) Myth: Data mining is not yet viable for mainstream business applications. Reality: The current state of the art is ready to go for almost any business type and/or size.
3) Myth: Data mining requires a separate, dedicated database. Reality: Because of the advances in database technology, a dedicated database is not required.
4) Myth: Only those with advanced degrees can do data mining. Reality: Newer Web-based tools enable managers of all educational levels to do data mining.
5) Myth: Data mining is only for large firms that have lots of customer data. Reality: If the data accurately reflect the business or its customers, any company can use data mining.
question
Describe cluster analysis and some of its applications.
answer
Cluster analysis is an exploratory data analysis tool for solving classification problems. The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the degree of association is strong among members of the same cluster and weak among members of different clusters. Cluster analysis is an essential data mining method for classifying items, events, or concepts into common groupings called clusters. The method is commonly used in biology, medicine, genetics, social network analysis, anthropology, archaeology, astronomy, character recognition, and even in MIS development. As data mining has increased in popularity, the underlying techniques have been applied to business, especially to marketing. Cluster analysis has been used extensively for fraud detection (both credit card and e-commerce fraud) and market segmentation of customers in contemporary CRM systems.
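Study note: the idea of grouping cases so that members of a cluster are close to each other and far from other clusters can be sketched with k-means on one-dimensional data. The answer above does not name an algorithm, so k-means is an assumption here, and the spending figures and starting centers are made up for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then
    move each center to the mean of the values assigned to it."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center when a cluster received no values.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer, with two obvious segments.
spend = [10, 12, 11, 50, 52, 49]
centers, clusters = k_means_1d(spend, centers=[10, 50])
```

Real market-segmentation work would use several attributes per customer and a library implementation; this only shows the assign-then-update loop.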
question
Data are often buried deep within very large ________, which sometimes contain data from several years.
answer
databases
question
Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?
answer
clustering
question
K-fold cross-validation is also called sliding estimation.
answer
False
question
In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.
answer
False
question
Patterns have been manually ________ from data by humans for centuries, but the increasing volume of data in modern times has created a need for more automatic approaches.
answer
extracted
question
Statistics and data mining both look for data sets that are as large as possible.
answer
False
question
The cost of data storage has plummeted recently, making data mining feasible for more firms.
answer
True
question
In the Dell case study, engineers, working closely with marketing, used lean software development strategies and numerous technologies to create a highly scalable, singular ________.
answer
data mart
question
Clustering partitions a collection of things into segments whose members share
answer
similar characteristics.
question
What does the scalability of a data mining method refer to?
answer
its ability to construct a prediction model efficiently given a large amount of data
question
List and briefly describe the six steps of the CRISP-DM data mining process.
answer
Step 1: Business Understanding. The key element of any data mining study is to know what the study is for. Answering such a question begins with a thorough understanding of the managerial need for new knowledge and an explicit specification of the business objective regarding the study to be conducted.
Step 2: Data Understanding. A data mining study is specific to addressing a well-defined business task, and different business tasks require different sets of data. Following the business understanding, the main activity of the data mining process is to identify the relevant data from many available databases.
Step 3: Data Preparation. The purpose of data preparation (or more commonly called data preprocessing) is to take the data identified in the previous step and prepare it for analysis by data mining methods. Compared to the other steps in CRISP-DM, data preprocessing consumes the most time and effort; most believe that this step accounts for roughly 80 percent of the total time spent on a data mining project.
Step 4: Model Building. Here, various modeling techniques are selected and applied to an already prepared data set in order to address the specific business need. The model-building step also encompasses the assessment and comparative analysis of the various models built.
Step 5: Testing and Evaluation. In step 5, the developed models are assessed and evaluated for their accuracy and generality. This step assesses the degree to which the selected model (or models) meets the business objectives and, if so, to what extent (i.e., do more models need to be developed and assessed).
Step 6: Deployment. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise. In many cases, it is the customer, not the data analyst, who carries out the deployment steps.
question
In the opening case, police detectives used data mining to identify possible new areas of inquiry.
answer
False
question
The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.
answer
False
question
Because of its successful application to retail business problems, association rule mining is commonly called ________.
answer
market-basket analysis
question
In the Influence Health case, the company was able to evaluate over ________ million records in only two days.
answer
195
question
In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as
answer
association rule mining.
question
Prediction problems where the variables have numeric values are most accurately defined as
answer
regressions.
question
Fayyad et al. (1996) defined ________ in databases as a process of using data mining methods to find useful information and patterns in the data.
answer
knowledge discovery
question
Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.
answer
False
question
Converting continuous valued numerical variables to ranges and categories is referred to as discretization.
answer
True
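Study note: discretization can be sketched as mapping each numeric value to the range it falls in. The cut points and labels below are hypothetical, chosen only to illustrate the idea.

```python
from bisect import bisect_right


def discretize(value, cut_points, labels):
    """Map a continuous value to a category label.

    cut_points must be sorted ascending; a value equal to a cut point
    falls into the higher range.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# Hypothetical age ranges: <18, 18-39, 40-64, 65+.
age_cuts = [18, 40, 65]
age_labels = ["minor", "young adult", "middle-aged", "senior"]
ages = [17, 25, 40, 64, 65, 90]
age_groups = [discretize(a, age_cuts, age_labels) for a in ages]
```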
question
Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for ________.
answer
data mining
question
Whereas ________ starts with a well-defined proposition and hypothesis, data mining starts with a loosely defined discovery statement.
answer
statistics
question
Data preparation, the third step in the CRISP-DM data mining process, is more commonly known as ________.
answer
data preprocessing
question
In the data mining in Hollywood case study, how successful were the models in predicting the success or failure of a Hollywood movie?
answer
The researchers claim that these prediction results are better than any reported in the published literature for this problem domain. Fusion classification methods attained up to 56.07% accuracy in correctly classifying movies and 90.75% accuracy in classifying movies within one category of their actual category. The SVM classification method attained up to 55.49% accuracy in correctly classifying movies and 85.55% accuracy in classifying movies within one category of their actual category.
question
Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?
answer
classification
question
Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.
answer
False
question
During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.
answer
True
question
Customer ________ management extends traditional marketing by creating one-on-one relationships with customers.
answer
relationship
question
Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.
answer
False
question
What does the robustness of a data mining method refer to?
answer
its ability to overcome noisy data to make somewhat accurate predictions
question
List six common data mining mistakes.
answer
• Selecting the wrong problem for data mining
• Ignoring what your sponsor thinks data mining is and what it really can and cannot do
• Leaving insufficient time for data preparation
• Looking only at aggregated results and not at individual records
• Being sloppy about keeping track of the data mining procedure and results
• Ignoring suspicious findings and quickly moving on
• Running mining algorithms repeatedly and blindly
• Believing everything you are told about the data
• Believing everything you are told about your own data mining analysis
• Measuring your results differently from the way your sponsor measures them
question
In the Dell case study, the largest issue was how to properly spend the online marketing budget.
answer
False
question
Describe the role of the simple split in estimating the accuracy of classification models.
answer
The simple split (or holdout or test sample estimation) partitions the data into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two-thirds of the data as the training set and the remaining one-third as the test set. The training set is used by the inducer (model builder), and the built classifier is then tested on the test set. An exception to this rule occurs when the classifier is an artificial neural network. In this case, the data is partitioned into three mutually exclusive subsets: training, validation, and testing.
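Study note: a minimal sketch of the simple split described above, assuming the common two-thirds / one-third partition. The function name, the fixed seed, and the use of integers as stand-in records are illustrative choices, not part of the source.

```python
import random


def simple_split(records, train_fraction=2 / 3, seed=0):
    """Partition records into two mutually exclusive subsets:
    a training set and a test (holdout) set."""
    shuffled = list(records)
    # Shuffle a copy so the split is random but reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 30 stand-in records: 20 go to training, 10 to testing.
train_set, test_set = simple_split(range(30))
```

For a neural network, the same idea would be extended to three subsets (training, validation, testing), as the answer notes.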
question
In lessons learned from the Target case, what legal warnings would you give another retailer using data mining for marketing?
answer
If you look at this practice from a legal perspective, you would conclude that Target did not use any information that violates customer privacy; rather, they used transactional data that most every other retail chain is collecting and storing (and perhaps analyzing) about their customers. What was disturbing in this scenario was perhaps the targeted concept: pregnancy. There are certain events or concepts that should be off limits or treated extremely cautiously, such as terminal disease, divorce, and bankruptcy.
question
Briefly describe five techniques (or algorithms) that are used for classification modeling.
answer
• Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena.
• Statistical analysis. Statistical techniques were the primary classification algorithm for many years until the emergence of machine-learning techniques. Statistical classification techniques include logistic regression and discriminant analysis.
• Neural networks. These are among the most popular machine-learning techniques that can be used for classification-type problems.
• Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case into the most probable category.
• Bayesian classifiers. This approach uses probability theory to build classification models based on the past occurrences that are capable of placing a new instance into a most probable class (or category).
• Genetic algorithms. This approach uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples.
• Rough sets. This method takes into account the partial membership of class labels to predefined categories in building models (collection of rules) for classification problems.
question
The data field "ethnic group" can be best described as
answer
nominal data.
question
Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by
answer
removing identifiers such as names and social security numbers.
question
Which of the following is a data mining myth?
answer
Data mining requires a separate, dedicated database.
question
As described in the Influence Health case study, customers are more often ________ services from a variety of healthcare service providers before selecting one.
answer
comparing
question
Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.
answer
True
question
One way to accomplish privacy and protection of individuals' rights when data mining is by ________ of the customer records prior to applying data mining applications, so that the records cannot be traced to an individual.
answer
de-identification
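Study note: one simple form of de-identification is dropping the fields that directly identify a person before the records reach the mining step. The field names and records below are hypothetical.

```python
def de_identify(records, identifier_fields):
    """Return copies of the records with direct identifier fields removed."""
    return [
        {field: value for field, value in record.items()
         if field not in identifier_fields}
        for record in records
    ]


# Hypothetical customer records; "name" and "ssn" are the assumed identifiers.
customers = [
    {"name": "A. Example", "ssn": "000-00-0000", "age": 54, "basket_total": 82.5},
    {"name": "B. Sample", "ssn": "000-00-0001", "age": 31, "basket_total": 19.0},
]
cleaned = de_identify(customers, identifier_fields={"name", "ssn"})
```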
question
A data mining study is specific to addressing a well-defined business task, and different business tasks require
answer
different sets of data.
question
In estimating the accuracy of data mining (or other) classification models, the true positive rate is
answer
the ratio of correctly classified positives divided by the total positive count.
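Study note: the true positive rate can be computed from the four confusion-matrix counts. The labels below are made-up data; the helper names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted positive, really positive
        elif p and not a:
            fp += 1      # predicted positive, really negative (false positive)
        elif a:
            fn += 1      # predicted negative, really positive
        else:
            tn += 1      # predicted negative, really negative
    return tp, fp, tn, fn


def true_positive_rate(actual, predicted):
    """Correctly classified positives divided by the total positive count."""
    tp, _, _, fn = confusion_counts(actual, predicted)
    if tp + fn == 0:
        raise ValueError("no positive cases in the actual labels")
    return tp / (tp + fn)


actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
tpr = true_positive_rate(actual, predicted)  # 3 of the 4 positives are found
```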
question
In ________, a classification method, the complete data set is randomly split into mutually exclusive subsets of approximately equal size and tested multiple times on each left-out subset, using the others as a training set.
answer
k-fold cross validation
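Study note: a minimal sketch of how k-fold cross-validation partitions the data, assuming k = 5 and integers as stand-in records. Model training and scoring are left out; only the fold logic is shown, and the function name is illustrative.

```python
import random


def k_fold_splits(records, k=10, seed=0):
    """Yield (training set, left-out test fold) pairs, one per fold."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled records into k mutually exclusive folds
    # of approximately equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test_fold = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test_fold


splits = list(k_fold_splits(range(20), k=5))
```

Each record appears in exactly one test fold, so every record is used for testing once and for training k - 1 times.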
question
While prediction is largely experience and opinion based, ________ is data and model based.
answer
forecasting
question
Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.
answer
False
question
The basic idea behind a(n) ________ is that it recursively divides a training set until each division consists entirely or primarily of examples from one class.
answer
decision tree
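Study note: the recursive-division idea can be sketched on a single numeric feature. This toy version picks the threshold that misclassifies the fewest training rows, which is an assumption made for brevity; production algorithms typically use measures such as information gain or the Gini index. The data and names are hypothetical.

```python
from collections import Counter


def majority(rows):
    """Return (most common label, its count) for rows of (value, label)."""
    return Counter(label for _, label in rows).most_common(1)[0]


def build_tree(rows, min_purity=1.0):
    """Recursively split rows until each division is (mostly) one class."""
    label, count = majority(rows)
    values = sorted({x for x, _ in rows})
    # Stop when the division is pure enough or cannot be split further.
    if count / len(rows) >= min_purity or len(values) < 2:
        return label
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        # Rows that would be misclassified if each side predicted its majority.
        errors = sum(len(side) - majority(side)[1] for side in (left, right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "left": build_tree(left, min_purity),
        "right": build_tree(right, min_purity),
    }


def classify(tree, x):
    """Follow thresholds down the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree


rows = [(1, "no"), (2, "no"), (3, "yes"), (4, "yes"), (5, "no")]
tree = build_tree(rows)
```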
question
List four myths associated with data mining.
answer
• Data mining provides instant, crystal-ball-like predictions.
• Data mining is not yet viable for business applications.
• Data mining requires a separate, dedicated database.
• Only those with advanced degrees can do data mining.
• Data mining is only for large firms that have lots of customer data.
question
Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings?
answer
CRISP-DM
question
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?
answer
insurance
question
Ratio data is a type of categorical data.
answer
False
question
Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.
answer
False
question
There has been an increase in data mining to deal with global competition and customers' more sophisticated ________ and wants.
answer
needs
question
When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.
answer
True
question
________ was proposed in the mid-1990s by a European consortium of companies to serve as a nonproprietary standard methodology for data mining.
answer
CRISP-DM
question
What is the main reason parallel processing is sometimes used for data mining?
answer
because of the massive data amounts and search efforts involved
question
The ________ is the most commonly used algorithm to discover association rules. Given a set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum number of the itemsets.
answer
Apriori algorithm
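Study note: a simplified, level-wise sketch in the spirit of Apriori. It finds itemsets that appear in at least a minimum number of transactions, growing candidates one item at a time and discarding any candidate with an infrequent subset. The baskets and the threshold are made up, and rule generation (confidence, lift) is omitted.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support_count):
    """Return {itemset: count} for itemsets found in at least
    min_support_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # 1-item candidates
    frequent = {}
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(survivors)
        size = len(next(iter(survivors))) + 1 if survivors else 0
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == size}
        # Prune step: every k-item subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in survivors
                          for s in combinations(c, size - 1))]
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
result = frequent_itemsets(baskets, min_support_count=3)
```

With a threshold of 3, all four single items and four of the pairs qualify; no three-item set appears often enough.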
question
Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from
answer
analyzing the vast data amounts routinely collected.
question
In the Influence Health case study, what was the goal of the system?
answer
increasing service use
question
All of the following statements about data mining are true EXCEPT
answer
the process aspect means that data mining should be a one-step process to results.
question
In the Target case study, why did Target send maternity ads to a teen?
answer
Target's analytic model suggested she was pregnant based on her buying habits.