Exploratory data analysis in the context of data mining and resampling.

Keywords

exploratory data analysis
data mining
resampling
cross-validation
data visualization
clustering
classification trees
neural networks

How to Cite

Ho Yu, C. (2010). Exploratory data analysis in the context of data mining and resampling. International Journal of Psychological Research, 3(1), 9–22. https://doi.org/10.21500/20112084.819

Abstract

Several misconceptions about exploratory data analysis (EDA) are widespread today. One is that EDA is opposed to statistical modeling. In fact, the essence of EDA is not to set aside all modeling and preconceptions; rather, researchers are urged not to begin an analysis with nothing but a strong preconception, so modeling remains legitimate in EDA. In addition, the nature of EDA has been changing with the emergence of new methods and its convergence with other methodologies, such as data mining and resampling. Conventional conceptual frameworks of EDA may therefore no longer be able to accommodate this trend. In this article, EDA is introduced in the context of data mining and resampling, with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, powerful techniques for accomplishing these three goals, respectively, are illustrated with concrete examples.
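
As a rough illustration of how resampling can support variable selection, the sketch below fits a one-split classification tree (a "stump") inside a 5-fold cross-validation loop and records which variable each fold selects. It is not the procedure or software used in the article: the data are synthetic, the stump is a simplified stand-in for a full classification tree, and all names (`fit_stump`, `k_fold`, `selected`, `mean_accuracy`) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(X, y):
    """Return (variable, threshold, left_label, right_label) for the
    single split with the lowest weighted Gini impurity."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a split must leave cases on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                left_label = int(2 * sum(left) >= len(left))
                right_label = int(2 * sum(right) >= len(right))
                best = (score, j, t, left_label, right_label)
    return best[1:]

def predict(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label

def k_fold(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Synthetic data (an assumption for illustration): three candidate
# variables, of which only variable 0 drives the class, with 10% label noise.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(120)]
y = [int(row[0] > 0.5) ^ int(rng.random() < 0.1) for row in X]

k = 5
selected = Counter()   # how often each variable is chosen across folds
accuracies = []        # held-out accuracy per fold
for held_out in k_fold(len(X), k, rng):
    held = set(held_out)
    train = [i for i in range(len(X)) if i not in held]
    stump = fit_stump([X[i] for i in train], [y[i] for i in train])
    selected[stump[0]] += 1
    hits = sum(predict(stump, X[i]) == y[i] for i in held_out)
    accuracies.append(hits / len(held_out))

mean_accuracy = sum(accuracies) / k
print("variable chosen per fold:", dict(selected))
print("mean held-out accuracy:", round(mean_accuracy, 3))
```

A variable that is chosen in every fold and predicts well on the held-out cases is a more credible candidate than one that surfaces in a single fit, which is the sense in which resampling guards exploratory findings against overfitting.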
