Data Mining and Statistics for Decision Making / Stéphane Tufféry.

Bibliographic Details
Online Access: Full Text (via ProQuest)
Main Author: Tufféry, Stéphane (Author)
Format: eBook
Language: English, French
Published: New York, NY John Wiley & Sons 2011.
Series: Wiley Series in Computational Statistics.
Subjects: Data mining; Statistical decision.

MARC

LEADER 00000cam a2200000 c 4500
001 b9658133
003 CoU
005 20200821172030.3
006 m o d
007 cr |||||||||||
008 150129s2011 gw o 000 0 eng
019 |a 711780360  |a 765144014  |a 769189252  |a 769849270  |a 771999468  |a 772397870  |a 799078712  |a 816879070  |a 852505227  |a 961503244  |a 961597673  |a 962613432  |a 962729284  |a 988429457  |a 991924332  |a 992926383  |a 1004475275  |a 1004783163  |a 1008892615  |a 1013730948  |a 1017950860  |a 1021224489  |a 1055368157  |a 1066410657  |a 1077278673  |a 1081214494  |a 1153518407  |a 1162542199 
020 |a 9780470979167 
020 |a 047097916X 
020 |a 9780470979174  |q (electronic bk.) 
020 |a 0470979178  |q (electronic bk.) 
020 |a 9780470979280  |q (electronic bk.) 
020 |a 0470979283  |q (electronic bk.) 
020 |z 1283373971 
020 |z 9781283373975 
020 |z 9780470688298  |q (hardback) 
020 |z 0470688297 
020 |a 9786613373977 
020 |a 6613373974 
024 3 |a 9780470979167 
024 8 |a urn:nbn:de:101:1-201501291327 
024 8 |a 9786613373977 
024 3 |a 9780470979280 
024 8 |a urn:nbn:de:101:1-201501032208 
035 |a (OCoLC)ebqac716215543 
035 |a (OCoLC)716215543  |z (OCoLC)711780360  |z (OCoLC)765144014  |z (OCoLC)769189252  |z (OCoLC)769849270  |z (OCoLC)771999468  |z (OCoLC)772397870  |z (OCoLC)799078712  |z (OCoLC)816879070  |z (OCoLC)852505227  |z (OCoLC)961503244  |z (OCoLC)961597673  |z (OCoLC)962613432  |z (OCoLC)962729284  |z (OCoLC)988429457  |z (OCoLC)991924332  |z (OCoLC)992926383  |z (OCoLC)1004475275  |z (OCoLC)1004783163  |z (OCoLC)1008892615  |z (OCoLC)1013730948  |z (OCoLC)1017950860  |z (OCoLC)1021224489  |z (OCoLC)1055368157  |z (OCoLC)1066410657  |z (OCoLC)1077278673  |z (OCoLC)1081214494  |z (OCoLC)1153518407  |z (OCoLC)1162542199 
037 |a ebqac792450 
040 |a GWDNB  |b ger  |c GWDNB  |d YDXCP  |d DG1  |d COO  |d B24X7  |d E7B  |d CDX  |d REDDC  |d EBLCP  |d DEBSZ  |d N$T  |d OCLCF  |d IDEBK  |d DEBBG  |d S3O  |d AZK  |d MOR  |d LIP  |d PIFAG  |d ZCU  |d LIV  |d MERUC  |d TEFOD  |d SAV  |d MERER  |d U3W  |d UUM  |d COCUF  |d ICG  |d INT  |d VT2  |d AU@  |d WYU  |d TKN  |d DKC  |d OL$  |d UKCRE  |d VLY  |d BRF  |d UIU  |d GWDNB 
041 1 |a eng  |h fre 
044 |c XA-DE-BW 
049 |a GWRE 
050 4 |a QA76.9.D343  |b T84 2011 
066 |c (S 
100 1 |a Tufféry, Stéphane  |e Verfasser  |4 aut. 
245 1 0 |a Data Mining and Statistics for Decision Making  |c Stéphane Tufféry. 
264 1 |a New York, NY  |b John Wiley & Sons  |c 2011. 
300 |a Online-Ressource. 
336 |a Text  |b txt  |2 rdacontent/ger. 
337 |a Computermedien  |b c  |2 rdamedia/ger. 
338 |a Online-Ressource  |b cr  |2 rdacarrier/ger. 
347 |a data file  |2 rda. 
490 0 |a Wiley Series in Computational Statistics. 
500 |a Lizenzpflichtig. 
505 0 |6 880-01  |a Front Matter -- Overview of Data Mining -- The Development of a Data Mining Study -- Data Exploration and Preparation -- Using Commercial Data -- Statistical and Data Mining Software -- An Outline of Data Mining Methods -- Factor Analysis -- Neural Networks -- Cluster Analysis -- Association Analysis -- Classification and Prediction Methods -- An Application of Data Mining: Scoring -- Factors for Success in a Data Mining Project -- Text Mining -- Web Mining -- Appendix A: Elements of Statistics -- Appendix B: Further Reading -- Index. 
505 8 |a Machine generated contents note: Preface -- Foreword -- Contents -- Overview of data mining -- 1.1. What is data mining? -- 1.2. What is data mining used for? -- 1.3. Data Mining and statistics -- 1.4. Data mining and information technology -- 1.5. Data mining and protection of personal data -- 1.6. Implementation of data mining -- The development of a data mining study -- 2.1. Defining the aims -- 2.2. Listing the existing data -- 2.3. Collecting the data -- 2.4. Exploring and preparing the data -- 2.5. Population segmentation -- 2.6. Drawing up and validating predictive models -- 2.7. Synthesizing predictive models of different segments -- 2.8. Iteration of the preceding steps -- 2.9. Deploying the models -- 2.10. Training the model users -- 2.11. Monitoring the models -- 2.12. Enriching the models -- 2.13. Remarks -- 2.14. Life cycle of a model -- 2.15. Costs of a pilot project -- Data exploration and preparation -- 3.1. The different types of data -- 3.2. Examining the distribution of variables -- 3.3. Detection of rare or missing values -- 3.4. Detection of aberrant values -- 3.5. Detection of extreme values -- 3.6. Tests of normality -- 3.7. Homoscedasticity and heteroscedasticity -- 3.8. Detection of the most discriminating variables -- 3.9. Transformation of variables -- 3.10. Choosing ranges of values of continuous variables -- 3.11. Creating new variables -- 3.12. Detecting interactions -- 3.13. Automatic variable selection -- 3.14. Detection of collinearity -- 3.15. Sampling -- Using commercial data -- 4.1. Data used in commercial applications -- 4.2. Special data -- 4.3. Data used by business sector -- Statistical and data mining software -- 5.1. Types of data mining and statistical software -- 5.2. Essential characteristics of the software -- 5.3. The main software packages -- 5.4. Comparison of R, SAS and IBM SPSS -- 5.5. How to reduce processing time -- An outline of data mining methods -- 6.1. A note on terminology -- 6.2. 
Classification of the methods -- 6.3. Comparison of the methods -- 6.4. Using these methods in the business world -- Factor analysis -- 7.1. Principal component analysis -- 7.2. Variants of principal component analysis -- 7.3. Correspondence analysis -- 7.4. Multiple correspondence analysis -- Neural networks -- 8.1. General information on neural networks -- 8.2. Structure of a neural network -- 8.3. Choosing the training sample -- 8.4. Some empirical rules for network design -- 8.5. Data normalization -- 8.6. Learning algorithms -- 8.7. The main neural networks -- Automatic clustering methods -- 9.1. Definition of clustering -- 9.2. Applications of clustering -- 9.3. Complexity of clustering -- 9.4. Clustering structures -- 9.5. Some methodological considerations -- 9.6. Comparison of factor analysis and clustering -- 9.7. Intra-class and inter-class inertias -- 9.8. Measurements of clustering quality -- 9.9. Partitioning methods -- 9.10. Hierarchical ascending clustering -- 9.11. Hybrid clustering methods -- 9.12. Neural clustering -- 9.13. Clustering by aggregation of similarities -- 9.14. Clustering of numeric variables -- 9.15. Overview of clustering methods -- Finding associations -- 10.1. Principles -- 10.2. Using taxonomy -- 10.3. Using supplementary variables -- 10.4. Applications -- 10.5. Example of use -- Classification and prediction methods -- 11.1. Introduction -- 11.2. Inductive and transductive methods -- 11.3. Overview of classification and prediction methods -- 11.4. Classification by decision tree -- 11.5. Prediction by decision tree -- 11.6. Classification by discriminant analysis -- 11.7. Prediction by linear regression -- 11.8. Classification by logistic regression -- 11.9. Developments in logistic regression -- 11.10. Bayesian methods -- 11.11. Classification and prediction by neural networks -- 11.12. Classification by support vector machines (SVMs) -- 11.13. Prediction by genetic algorithms -- 11.14. 
Improving the performance of a predictive model -- 11.15. Bootstrapping and aggregation of models -- 11.16. Using classification and prediction methods -- An application of data mining: scoring -- 12.1. The different types of score -- 12.2. Using propensity scores and risk scores -- 12.3. Methodology -- 12.4. Implementing a strategic score -- 12.5. Implementing an operational score -- 12.6. The kinds of scoring solutions used in a business -- 12.7. An example of credit scoring (data preparation) -- 12.8. An example of credit scoring (modelling by logistic regression) -- 12.9. An example of credit scoring (modelling by DISQUAL discriminant analysis) -- 12.10. A brief history of credit scoring -- Factors for success in a data mining project -- 13.1. The subject -- 13.2. The people -- 13.3. The data -- 13.4. The IT systems -- 13.5. The business culture -- 13.6. Data mining: eight common misconceptions -- 13.7. Return on investment -- Text mining -- 14.1. Definition of text mining -- 14.2. Text sources used -- 14.3. Using text mining -- 14.4. Information retrieval -- 14.5. Information extraction -- 14.6. Multi-type data mining -- Web mining -- 15.1. The aims of web mining -- 15.2. Global analyses -- 15.3. Individual analyses -- 15.4. Personal analyses -- Appendix: Elements of statistics -- 16.1. A brief history -- 16.2. Elements of statistics -- 16.3. Statistical tables -- Further reading -- 17.1. Statistics and data analysis -- 17.2. Data mining and statistical learning -- 17.3. Text mining -- 17.4. Web mining -- 17.5. R software -- 17.6. SAS software -- 17.7. IBM SPSS software -- 17.8. Websites -- Index. 
650 0 |a Data mining. 
650 0 |a Statistical decision. 
650 7 |a Data mining.  |2 fast  |0 (OCoLC)fst00887946. 
650 7 |a Statistical decision.  |2 fast  |0 (OCoLC)fst01132059. 
776 0 8 |i Druckausg.  |z 9780470688298. 
856 4 0 |u https://ebookcentral.proquest.com/lib/ucb/detail.action?docID=792450  |z Full Text (via ProQuest) 
880 0 0 |6 505-01/(S  |g Contents note continued:  |g 11.9.1.  |t Logistic regression on individuals with different weights --  |g 11.9.2.  |t Logistic regression with correlated data --  |g 11.9.3.  |t Ordinal logistic regression --  |g 11.9.4.  |t Multinomial logistic regression --  |g 11.9.5.  |t PLS logistic regression --  |g 11.9.6.  |t generalized linear model --  |g 11.9.7.  |t Poisson regression --  |g 11.9.8.  |t generalized additive model --  |g 11.10.  |t Bayesian methods --  |g 11.10.1.  |t naive Bayesian classifier --  |g 11.10.2.  |t Bayesian networks --  |g 11.11.  |t Classification and prediction by neural networks --  |g 11.11.1.  |t Advantages of neural networks --  |g 11.11.2.  |t Disadvantages of neural networks --  |g 11.12.  |t Classification by support vector machines --  |g 11.12.1.  |t Introduction to SVMs --  |g 11.12.2.  |t Example --  |g 11.12.3.  |t Advantages of SVMs --  |g 11.12.4.  |t Disadvantages of SVMs --  |g 11.13.  |t Prediction by genetic algorithms --  |g 11.13.1.  |t Random generation of initial rules --  |g 11.13.2.  |t Selecting the best rules --  |g 11.13.3.  |t Generating new rules --  |g 11.13.4.  |t End of the algorithm --  |g 11.13.5.  |t Applications of genetic algorithms --  |g 11.13.6.  |t Disadvantages of genetic algorithms --  |g 11.14.  |t Improving the performance of a predictive model --  |g 11.15.  |t Bootstrapping and ensemble methods --  |g 11.15.1.  |t Bootstrapping --  |g 11.15.2.  |t Bagging --  |g 11.15.3.  |t Boosting --  |g 11.15.4.  |t Some applications --  |g 11.15.5.  |t Conclusion --  |g 11.16.  |t Using classification and prediction methods --  |g 11.16.1.  |t Choosing the modelling methods --  |g 11.16.2.  |t training phase of a model --  |g 11.16.3.  |t Reject inference --  |g 11.16.4.  |t test phase of a model --  |g 11.16.5.  |t ROC curve, the lift curve and the Gini index --  |g 11.16.6.  |t classification table of a model --  |g 11.16.7.  |t validation phase of a model --  |g 11.16.8.  
|t application phase of a model --  |g 12.  |t application of data mining: scoring --  |g 12.1.  |t different types of score --  |g 12.2.  |t Using propensity scores and risk scores --  |g 12.3.  |t Methodology --  |g 12.3.1.  |t Determining the objectives --  |g 12.3.2.  |t Data inventory and preparation --  |g 12.3.3.  |t Creating the analysis base --  |g 12.3.4.  |t Developing a predictive model --  |g 12.3.5.  |t Using the score --  |g 12.3.6.  |t Deploying the score --  |g 12.3.7.  |t Monitoring the available tools --  |g 12.4.  |t Implementing a strategic score --  |g 12.5.  |t Implementing an operational score --  |g 12.6.  |t Scoring solutions used in a business --  |g 12.6.1.  |t In-house or outsourced --  |g 12.6.2.  |t Generic or personalized score --  |g 12.6.3.  |t Summary of the possible solutions --  |g 12.7.  |t example of credit scoring (data preparation) --  |g 12.8.  |t example of credit scoring (modelling by logistic regression) --  |g 12.9.  |t example of credit scoring (modelling by DISQUAL discriminant analysis) --  |g 12.10.  |t brief history of credit scoring --  |t References --  |g 13.  |t Factors for success in a data mining project --  |g 13.1.  |t subject --  |g 13.2.  |t people --  |g 13.3.  |t data --  |g 13.4.  |t IT systems --  |g 13.5.  |t business culture --  |g 13.6.  |t Data mining: eight common misconceptions --  |g 13.6.1.  |t No a priori knowledge is needed --  |g 13.6.2.  |t No specialist staff are needed --  |g 13.6.3.  |t No statisticians are needed ('you can just press a button') --  |g 13.6.4.  |t Data mining will reveal unbelievable wonders --  |g 13.6.5.  |t Data mining is revolutionary --  |g 13.6.6.  |t You must use all the available data --  |g 13.6.7.  |t You must always sample --  |g 13.6.8.  |t You must never sample --  |g 13.7.  |t Return on investment --  |g 14.  |t Text mining --  |g 14.1.  |t Definition of text mining --  |g 14.2.  |t Text sources used --  |g 14.3.  |t Using text mining --  |g 14.4.  
|t Information retrieval --  |g 14.4.1.  |t Linguistic analysis --  |g 14.4.2.  |t Application of statistics and data mining --  |g 14.4.3.  |t Suitable methods --  |g 14.5.  |t Information extraction --  |g 14.5.1.  |t Principles of information extraction --  |g 14.5.2.  |t Example of application: transcription of business interviews --  |g 14.6.  |t Multi-type data mining --  |g 15.  |t Web mining --  |g 15.1.  |t aims of web mining --  |g 15.2.  |t Global analyses --  |g 15.2.1.  |t What can they be used for --  |g 15.2.2.  |t structure of the log file --  |g 15.2.3.  |t Using the log file --  |g 15.3.  |t Individual analyses --  |g 15.4.  |t Personal analysis --  |g Appendix A  |t Elements of statistics --  |g A.1.  |t brief history --  |g A.1.1.  |t few dates --  |g A.1.2.  |t From statistics ... to data mining --  |g A.2.  |t Elements of statistics --  |g A.2.1.  |t Statistical characteristics --  |g A.2.2.  |t Box and whisker plot --  |g A.2.3.  |t Hypothesis testing --  |g A.2.4.  |t Asymptotic, exact, parametric and non-parametric tests --  |g A.2.5.  |t Confidence interval for a mean: Student's t test --  |g A.2.6.  |t Confidence interval of a frequency (or proportion) --  |g A.2.7.  |t relationship between two continuous variables: the linear correlation coefficient --  |g A.2.8.  |t relationship between two numeric or ordinal variables: Spearman's rank correlation coefficient and Kendall's tau --  |g A.2.9.  |t relationship between n sets of several continuous or binary variables: canonical correlation analysis --  |g A.2.10.  |t relationship between two nominal variables: the Χ2 test --  |g A.2.11.  |t Example of use of the Χ2 test --  |g A.2.12.  |t relationship between two nominal variables: Cramer's coefficient --  |g A.2.13.  |t relationship between a nominal variable and a numeric variable: the variance test (one-way ANOVA test) --  |g A.2.14.  |t Cox semi-parametric survival model --  |g A.3.  |t Statistical tables --  |g A.3.1.  
|t Table of the standard normal distribution --  |g A.3.2.  |t Table of Student's t distribution --  |g A.3.3.  |t Chi-Square table --  |g A.3.4.  |t Table of the Fisher-Snedecor distribution at the 0.05 significance level --  |g A.3.5.  |t Table of the Fisher-Snedecor distribution at the 0.10 significance level --  |g Appendix B  |t Further reading --  |g B.1.  |t Statistics and data analysis --  |g B.2.  |t Data mining and statistical learning --  |g B.3.  |t Text mining --  |g B.4.  |t Web mining --  |g B.5.  |t R software --  |g B.6.  |t SAS software --  |g B.7.  |t IBM SPSS software --  |g B.8.  |t Websites. 
907 |a .b96581335  |b 08-25-20  |c 10-03-17 
998 |a web  |b  - -   |c f  |d b   |e z  |f eng  |g gw   |h 0  |i 2 
915 |a - 
956 |a Ebook Central Academic Complete 
956 |b Ebook Central Academic Complete 
999 f f |i 0b56ad97-faa4-59d6-bb70-f6da01a59139  |s 4526576b-8535-5a1e-a3c7-f99e1b61c6cb 
952 f f |p Can circulate  |a University of Colorado Boulder  |b Online  |c Online  |d Online  |e QA76.9.D343 T84 2011  |h Library of Congress classification  |i web  |n 1