Data Mining and Statistics for Decision Making / Stéphane Tufféry

Online Access: Full Text (via ProQuest)
Main Author: Tufféry, Stéphane
Format: eBook
Language: English (translated from French)
Published: New York, NY : John Wiley & Sons, 2011
Series: Wiley Series in Computational Statistics
Subjects: Data mining; Statistical decision
MARC
LEADER 00000cam a2200000 c 4500
001    b9658133
003    CoU
005    20200821172030.3
006    m o d
007    cr |||||||||||
008    150129s2011 gw o 000 0 eng
019    |a 711780360 |a 765144014 |a 769189252 |a 769849270 |a 771999468 |a 772397870 |a 799078712 |a 816879070 |a 852505227 |a 961503244 |a 961597673 |a 962613432 |a 962729284 |a 988429457 |a 991924332 |a 992926383 |a 1004475275 |a 1004783163 |a 1008892615 |a 1013730948 |a 1017950860 |a 1021224489 |a 1055368157 |a 1066410657 |a 1077278673 |a 1081214494 |a 1153518407 |a 1162542199
020    |a 9780470979167
020    |a 047097916X
020    |a 9780470979174 |q (electronic bk.)
020    |a 0470979178 |q (electronic bk.)
020    |a 9780470979280 |q (electronic bk.)
020    |a 0470979283 |q (electronic bk.)
020    |z 1283373971
020    |z 9781283373975
020    |z 9780470688298 |q (hardback)
020    |z 0470688297
020    |a 9786613373977
020    |a 6613373974
024 3  |a 9780470979167
024 8  |a urn:nbn:de:101:1-201501291327
024 8  |a 9786613373977
024 3  |a 9780470979280
024 8  |a urn:nbn:de:101:1-201501032208
035    |a (OCoLC)ebqac716215543
035    |a (OCoLC)716215543 |z (OCoLC)711780360 |z (OCoLC)765144014 |z (OCoLC)769189252 |z (OCoLC)769849270 |z (OCoLC)771999468 |z (OCoLC)772397870 |z (OCoLC)799078712 |z (OCoLC)816879070 |z (OCoLC)852505227 |z (OCoLC)961503244 |z (OCoLC)961597673 |z (OCoLC)962613432 |z (OCoLC)962729284 |z (OCoLC)988429457 |z (OCoLC)991924332 |z (OCoLC)992926383 |z (OCoLC)1004475275 |z (OCoLC)1004783163 |z (OCoLC)1008892615 |z (OCoLC)1013730948 |z (OCoLC)1017950860 |z (OCoLC)1021224489 |z (OCoLC)1055368157 |z (OCoLC)1066410657 |z (OCoLC)1077278673 |z (OCoLC)1081214494 |z (OCoLC)1153518407 |z (OCoLC)1162542199
037    |a ebqac792450
040    |a GWDNB |b ger |c GWDNB |d YDXCP |d DG1 |d COO |d B24X7 |d E7B |d CDX |d REDDC |d EBLCP |d DEBSZ |d N$T |d OCLCF |d IDEBK |d DEBBG |d S3O |d AZK |d MOR |d LIP |d PIFAG |d ZCU |d LIV |d MERUC |d TEFOD |d SAV |d MERER |d U3W |d UUM |d COCUF |d ICG |d INT |d VT2 |d AU@ |d WYU |d TKN |d DKC |d OL$ |d UKCRE |d VLY |d BRF |d UIU |d GWDNB
041 1  |a eng |h fre
044    |c XA-DE-BW
049    |a GWRE
050  4 |a QA76.9.D343 |b T84 2011
066    |c (S
100 1  |a Tufféry, Stéphane |e Verfasser |4 aut.
245 10 |a Data Mining and Statistics for Decision Making |c Stéphane Tufféry.
264  1 |a New York, NY |b John Wiley & Sons |c 2011.
300    |a Online-Ressource.
336    |a Text |b txt |2 rdacontent/ger.
337    |a Computermedien |b c |2 rdamedia/ger.
338    |a Online-Ressource |b cr |2 rdacarrier/ger.
347    |a data file |2 rda.
490 0  |a Wiley Series in Computational Statistics.
500    |a Lizenzpflichtig.
505 0  |6 880-01 |a Front Matter -- Overview of Data Mining -- The Development of a Data Mining Study -- Data Exploration and Preparation -- Using Commercial Data -- Statistical and Data Mining Software -- An Outline of Data Mining Methods -- Factor Analysis -- Neural Networks -- Cluster Analysis -- Association Analysis -- Classification and Prediction Methods -- An Application of Data Mining: Scoring -- Factors for Success in a Data Mining Project -- Text Mining -- Web Mining -- Appendix A: Elements of Statistics -- Appendix B: Further Reading -- Index.
505 8  |a Machine generated contents note: Preface -- Foreword -- Contents -- Overview of data mining -- 1.1. What is data mining? -- 1.2. What is data mining used for? -- 1.3. Data Mining and statistics -- 1.4. Data mining and information technology -- 1.5. Data mining and protection of personal data -- 1.6. Implementation of data mining -- The development of a data mining study -- 2.1. Defining the aims -- 2.2. Listing the existing data -- 2.3. Collecting the data -- 2.4. Exploring and preparing the data -- 2.5. Population segmentation -- 2.6. Drawing up and validating predictive models -- 2.7. Synthesizing predictive models of different segments -- 2.8. Iteration of the preceding steps -- 2.9. Deploying the models -- 2.10. Training the model users -- 2.11. Monitoring the models -- 2.12. Enriching the models -- 2.13. Remarks -- 2.14. Life cycle of a model -- 2.15. Costs of a pilot project -- Data exploration and preparation -- 3.1. The different types of data -- 3.2. Examining the distribution of variables -- 3.3. Detection of rare or missing values -- 3.4. Detection of aberrant values -- 3.5. Detection of extreme values -- 3.6. Tests of normality -- 3.7. Homoscedasticity and heteroscedasticity -- 3.8. Detection of the most discriminating variables -- 3.9. Transformation of variables -- 3.10. Choosing ranges of values of continuous variables -- 3.11. Creating new variables -- 3.12. Detecting interactions -- 3.13. Automatic variable selection -- 3.14. Detection of collinearity -- 3.15. Sampling -- Using commercial data -- 4.1. Data used in commercial applications -- 4.2. Special data -- 4.3. Data used by business sector -- Statistical and data mining software -- 5.1. Types of data mining and statistical software -- 5.2. Essential characteristics of the software -- 5.3. The main software packages -- 5.4. Comparison of R, SAS and IBM SPSS -- 5.5. How to reduce processing time -- An outline of data mining methods -- 6.1. A note on terminology -- 6.2. Classification of the methods -- 6.3. Comparison of the methods -- 6.4. Using these methods in the business world -- Factor analysis -- 7.1. Principal component analysis -- 7.2. Variants of principal component analysis -- 7.3. Correspondence analysis -- 7.4. Multiple correspondence analysis -- Neural networks -- 8.1. General information on neural networks -- 8.2. Structure of a neural network -- 8.3. Choosing the training sample -- 8.4. Some empirical rules for network design -- 8.5. Data normalization -- 8.6. Learning algorithms -- 8.7. The main neural networks -- Automatic clustering methods -- 9.1. Definition of clustering -- 9.2. Applications of clustering -- 9.3. Complexity of clustering -- 9.4. Clustering structures -- 9.5. Some methodological considerations -- 9.6. Comparison of factor analysis and clustering -- 9.7. Intra-class and inter-class inertias -- 9.8. Measurements of clustering quality -- 9.9. Partitioning methods -- 9.10. Hierarchical ascending clustering -- 9.11. Hybrid clustering methods -- 9.12. Neural clustering -- 9.13. Clustering by aggregation of similarities -- 9.14. Clustering of numeric variables -- 9.15. Overview of clustering methods -- Finding associations -- 10.1. Principles -- 10.2. Using taxonomy -- 10.3. Using supplementary variables -- 10.4. Applications -- 10.5. Example of use -- Classification and prediction methods -- 11.1. Introduction -- 11.2. Inductive and transductive methods -- 11.3. Overview of classification and prediction methods -- 11.4. Classification by decision tree --
11.5. Prediction by decision tree -- 11.6. Classification by discriminant analysis -- 11.7. Prediction by linear regression -- 11.8. Classification by logistic regression -- 11.9. Developments in logistic regression -- 11.10. Bayesian methods -- 11.11. Classification and prediction by neural networks -- 11.12. Classification by support vector machines (SVMs) -- 11.13. Prediction by genetic algorithms -- 11.14. Improving the performance of a predictive model -- 11.15. Bootstrapping and aggregation of models -- 11.16. Using classification and prediction methods -- An application of data mining: scoring -- 12.1. The different types of score -- 12.2. Using propensity scores and risk scores -- 12.3. Methodology -- 12.4. Implementing a strategic score -- 12.5. Implementing an operational score -- 12.6. The kinds of scoring solutions used in a business -- 12.7. An example of credit scoring (data preparation) -- 12.8. An example of credit scoring (modelling by logistic regression) -- 12.9. An example of credit scoring (modelling by DISQUAL discriminant analysis) -- 12.10. A brief history of credit scoring -- Factors for success in a data mining project -- 13.1. The subject -- 13.2. The people -- 13.3. The data -- 13.4. The IT systems -- 13.5. The business culture -- 13.6. Data mining: eight common misconceptions -- 13.7. Return on investment -- Text mining -- 14.1. Definition of text mining -- 14.2. Text sources used -- 14.3. Using text mining -- 14.4. Information retrieval -- 14.5. Information extraction -- 14.6. Multi-type data mining -- Web mining -- 15.1. The aims of web mining -- 15.2. Global analyses -- 15.3. Individual analyses -- 15.4. Personal analyses -- Appendix: Elements of statistics -- 16.1. A brief history -- 16.2. Elements of statistics -- 16.3. Statistical tables -- Further reading -- 17.1. Statistics and data analysis -- 17.2. Data mining and statistical learning -- 17.3. Text mining -- 17.4. Web mining -- 17.5. R software -- 17.6. SAS software -- 17.7. IBM SPSS software -- 17.8. Websites -- Index.
650  0 |a Data mining.
650  0 |a Statistical decision.
650  7 |a Data mining. |2 fast |0 (OCoLC)fst00887946.
650  7 |a Statistical decision. |2 fast |0 (OCoLC)fst01132059.
776 08 |i Druckausg. |z 9780470688298.
856 40 |u https://ebookcentral.proquest.com/lib/ucb/detail.action?docID=792450 |z Full Text (via ProQuest)
880 00 |6 505-01/(S |g Contents note continued: |g 11.9.1. |t Logistic regression on individuals with different weights -- |g 11.9.2. |t Logistic regression with correlated data -- |g 11.9.3. |t Ordinal logistic regression -- |g 11.9.4. |t Multinomial logistic regression -- |g 11.9.5. |t PLS logistic regression -- |g 11.9.6. |t generalized linear model -- |g 11.9.7. |t Poisson regression -- |g 11.9.8. |t generalized additive model -- |g 11.10. |t Bayesian methods -- |g 11.10.1. |t naive Bayesian classifier -- |g 11.10.2. |t Bayesian networks -- |g 11.11. |t Classification and prediction by neural networks -- |g 11.11.1. |t Advantages of neural networks -- |g 11.11.2. |t Disadvantages of neural networks -- |g 11.12. |t Classification by support vector machines -- |g 11.12.1. |t Introduction to SVMs -- |g 11.12.2. |t Example -- |g 11.12.3. |t Advantages of SVMs -- |g 11.12.4. |t Disadvantages of SVMs -- |g 11.13. |t Prediction by genetic algorithms -- |g 11.13.1. |t Random generation of initial rules -- |g 11.13.2. |t Selecting the best rules -- |g 11.13.3. |t Generating new rules -- |g 11.13.4. |t End of the algorithm -- |g 11.13.5. |t Applications of genetic algorithms -- |g 11.13.6. |t Disadvantages of genetic algorithms -- |g 11.14. |t Improving the performance of a predictive model -- |g 11.15. |t Bootstrapping and ensemble methods -- |g 11.15.1. |t Bootstrapping -- |g 11.15.2. |t Bagging -- |g 11.15.3. |t Boosting -- |g 11.15.4. |t Some applications -- |g 11.15.5. |t Conclusion -- |g 11.16. |t Using classification and prediction methods -- |g 11.16.1. |t Choosing the modelling methods -- |g 11.16.2. |t training phase of a model -- |g 11.16.3. |t Reject inference -- |g 11.16.4. |t test phase of a model -- |g 11.16.5. |t ROC curve, the lift curve and the Gini index -- |g 11.16.6. |t classification table of a model -- |g 11.16.7. |t validation phase of a model -- |g 11.16.8. |t application phase of a model -- |g 12. |t application of data mining: scoring -- |g 12.1. |t different types of score -- |g 12.2. |t Using propensity scores and risk scores -- |g 12.3. |t Methodology -- |g 12.3.1. |t Determining the objectives -- |g 12.3.2. |t Data inventory and preparation -- |g 12.3.3. |t Creating the analysis base -- |g 12.3.4. |t Developing a predictive model -- |g 12.3.5. |t Using the score -- |g 12.3.6. |t Deploying the score -- |g 12.3.7. |t Monitoring the available tools -- |g 12.4. |t Implementing a strategic score -- |g 12.5. |t Implementing an operational score -- |g 12.6. |t Scoring solutions used in a business -- |g 12.6.1. |t In-house or outsourced? -- |g 12.6.2. |t Generic or personalized score -- |g 12.6.3. |t Summary of the possible solutions -- |g 12.7. |t example of credit scoring (data preparation) -- |g 12.8. |t example of credit scoring (modelling by logistic regression) -- |g 12.9. |t example of credit scoring (modelling by DISQUAL discriminant analysis) -- |g 12.10. |t brief history of credit scoring -- |t References -- |g 13. |t Factors for success in a data mining project -- |g 13.1. |t subject -- |g 13.2. |t people -- |g 13.3. |t data -- |g 13.4. |t IT systems -- |g 13.5. |t business culture -- |g 13.6. |t Data mining: eight common misconceptions -- |g 13.6.1. |t No a priori knowledge is needed -- |g 13.6.2. |t No specialist staff are needed -- |g 13.6.3. |t No statisticians are needed ('you can just press a button') -- |g 13.6.4. |t Data mining will reveal unbelievable wonders -- |g 13.6.5. |t Data mining is revolutionary --
|g 13.6.6. |t You must use all the available data -- |g 13.6.7. |t You must always sample -- |g 13.6.8. |t You must never sample -- |g 13.7. |t Return on investment -- |g 14. |t Text mining -- |g 14.1. |t Definition of text mining -- |g 14.2. |t Text sources used -- |g 14.3. |t Using text mining -- |g 14.4. |t Information retrieval -- |g 14.4.1. |t Linguistic analysis -- |g 14.4.2. |t Application of statistics and data mining -- |g 14.4.3. |t Suitable methods -- |g 14.5. |t Information extraction -- |g 14.5.1. |t Principles of information extraction -- |g 14.5.2. |t Example of application: transcription of business interviews -- |g 14.6. |t Multi-type data mining -- |g 15. |t Web mining -- |g 15.1. |t aims of web mining -- |g 15.2. |t Global analyses -- |g 15.2.1. |t What can they be used for? -- |g 15.2.2. |t structure of the log file -- |g 15.2.3. |t Using the log file -- |g 15.3. |t Individual analyses -- |g 15.4. |t Personal analysis -- |g Appendix |t A Elements of statistics -- |g A.1. |t brief history -- |g A.1.1. |t few dates -- |g A.1.2. |t From statistics ... to data mining -- |g A.2. |t Elements of statistics -- |g A.2.1. |t Statistical characteristics -- |g A.2.2. |t Box and whisker plot -- |g A.2.3. |t Hypothesis testing -- |g A.2.4. |t Asymptotic, exact, parametric and non-parametric tests -- |g A.2.5. |t Confidence interval for a mean: Student's t test -- |g A.2.6. |t Confidence interval of a frequency (or proportion) -- |g A.2.7. |t relationship between two continuous variables: the linear correlation coefficient -- |g A.2.8. |t relationship between two numeric or ordinal variables: Spearman's rank correlation coefficient and Kendall's tau -- |g A.2.9. |t relationship between n sets of several continuous or binary variables: canonical correlation analysis -- |g A.2.10. |t relationship between two nominal variables: the Χ2 test -- |g A.2.11. |t Example of use of the Χ2 test -- |g A.2.12. |t relationship between two nominal variables: Cramer's coefficient -- |g A.2.13. |t relationship between a nominal variable and a numeric variable: the variance test (one-way ANOVA test) -- |g A.2.14. |t Cox semi-parametric survival model -- |g A.3. |t Statistical tables -- |g A.3.1. |t Table of the standard normal distribution -- |g A.3.2. |t Table of Student's t distribution -- |g A.3.3. |t Chi-Square table -- |g A.3.4. |t Table of the Fisher-Snedecor distribution at the 0.05 significance level -- |g A.3.5. |t Table of the Fisher-Snedecor distribution at the 0.10 significance level -- |g Appendix B |t Further reading -- |g B.1. |t Statistics and data analysis -- |g B.2. |t Data mining and statistical learning -- |g B.3. |t Text mining -- |g B.4. |t Web mining -- |g B.5. |t R software -- |g B.6. |t SAS software -- |g B.7. |t IBM SPSS software -- |g B.8. |t Websites.
907    |a .b96581335 |b 08-25-20 |c 10-03-17
998    |a web |b - - |c f |d b |e z |f eng |g gw |h 0 |i 2
915    |a -
956    |a Ebook Central Academic Complete
956    |b Ebook Central Academic Complete
999 ff |i 0b56ad97-faa4-59d6-bb70-f6da01a59139 |s 4526576b-8535-5a1e-a3c7-f99e1b61c6cb
952 ff |p Can circulate |a University of Colorado Boulder |b Online |c Online |d Online |e QA76.9.D343 T84 2011 |h Library of Congress classification |i web |n 1