Data dredging

9/23/2023

There are infinite ways to incorrectly reason from data, some of which are much more obvious than others. Given that people have been making these mistakes for so long, many statistical fallacies have been identified and can be explained. The good thing is that once they are identified and studied, they can be avoided. Let's have a look at a few of the more common fallacies and see how we can avoid them.

Out of interest, when misuse of statistics is not intentional, the process bears a resemblance to cognitive biases, which Wikipedia defines as "tendencies to think in certain ways that can lead to systematic deviations from a standard of rationality or good judgment." The former builds incorrect reasoning on top of data and its explicit and active analysis, while the latter reaches a similar outcome much more implicitly and passively. That's not hard and fast, as there is definitely overlap between these two phenomena. The end result is the same, however: plain ol' wrong.

Here are five statistical fallacies - traps - which data scientists should be aware of and definitely avoid. Failing to do so can be catastrophic for both data outcomes and a data scientist's credibility.

In an attempt to demonstrate just how obvious and simplistic statistical fallacies can be, let's start off with the classic which everyone should already know: cherry picking. We can put this in the category of other easily recognizable fallacies, such as the Gambler's Fallacy, False Causality, biased sampling, overgeneralization, and many others.

Data dredging, sometimes referred to as "data fishing," is a data mining practice in which large volumes of data are analyzed in search of any possible relationships among the data. The traditional scientific method, in contrast, begins with a hypothesis and follows with an examination of the data. Sometimes conducted for unethical purposes, data dredging often circumvents traditional data mining techniques and may lead to premature conclusions. Data dredging is sometimes described as "seeking more information from a data set than it actually contains."

Data dredging sometimes results in relationships between variables being announced as significant when, in fact, the data require more study before such an association can legitimately be determined. Many variables may be related through chance alone; others may be related through some unknown factor. To make a valid assessment of the relationship between any two variables, further study is required in which isolated variables are contrasted with a control group. Data dredging is sometimes used to present an unexamined concurrence of variables as if it led to a valid conclusion, prior to any such study.

Although data dredging is often used improperly, it can be a useful means of finding surprising relationships that might not otherwise have been discovered. However, because the concurrence of variables does not constitute information about their relationship (which could, after all, be merely coincidental), further analysis is required to yield any useful conclusions.
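To make the "related through chance alone" point concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the original text: the function names `pearson` and `dredge`, the sample sizes, and the cutoff of 0.361 (roughly the two-tailed 5% critical value of the correlation coefficient for 30 observations). It correlates a target with many columns of pure noise, counts how many look "significant," and then rechecks the best-looking one on fresh observations.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def dredge(n_vars=200, n_obs=30, r_crit=0.361, seed=0):
    """Search noise for 'relationships', then retest the best one.

    Returns (hits, best_r_explore, best_r_confirm).
    r_crit is an assumed, approximate 5% cutoff for n_obs = 30.
    """
    rng = random.Random(seed)
    # Every series is independent Gaussian noise, so no real
    # relationship exists between the target and any candidate.
    target = [rng.gauss(0, 1) for _ in range(2 * n_obs)]
    candidates = [
        [rng.gauss(0, 1) for _ in range(2 * n_obs)] for _ in range(n_vars)
    ]

    # Exploration: correlate each candidate with the target on the
    # first half of the observations only.
    explore = [pearson(c[:n_obs], target[:n_obs]) for c in candidates]
    hits = sum(1 for r in explore if abs(r) > r_crit)

    # Pick the most impressive-looking candidate from the exploration.
    best = max(range(n_vars), key=lambda i: abs(explore[i]))

    # Confirmation: re-measure that one candidate on the held-out half.
    confirm = pearson(candidates[best][n_obs:], target[n_obs:])
    return hits, explore[best], confirm


if __name__ == "__main__":
    hits, r_explore, r_confirm = dredge()
    print(f"candidates passing the cutoff by chance: {hits}")
    print(f"best correlation while dredging: {r_explore:.2f}")
    print(f"same variable on fresh data:     {r_confirm:.2f}")
```

With a 5% cutoff, about 5% of the unrelated candidates would be expected to pass on average, which is why a relationship found by dredging is better treated as a hypothesis to test on separate data than as a conclusion.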
