Data Analysis : A Gentle Introduction for Future Data Scientists.

This slim volume provides an approachable guide to the basic ideas and techniques of probability and statistics, as well as more advanced techniques such as generalised linear models, classification using logistic regression, and support-vector machines.

Bibliographic Details
Online Access: Full Text (via EBSCO)
Main Author: Upton, Graham
Other Authors: Brawn, Dan
Format: Electronic eBook
Language: English
Published: Oxford : Oxford University Press, Incorporated, 2023.
Table of Contents:
  • Cover
  • Title page
  • Copyright
  • Contents
  • Preface
  • 1 First steps
  • 1.1 Types of data
  • 1.2 Sample and population
  • 1.2.1 Observations and random variables
  • 1.2.2 Sampling variation
  • 1.3 Methods for sampling a population
  • 1.3.1 The simple random sample
  • 1.3.2 Cluster sampling
  • 1.3.3 Stratified sampling
  • 1.3.4 Systematic sampling
  • 1.4 Oversampling and the use of weights
  • 2 Summarizing data
  • 2.1 Measures of location
  • 2.1.1 The mode
  • 2.1.2 The mean
  • 2.1.3 The trimmed mean
  • 2.1.4 The Winsorized mean
  • 2.1.5 The median
  • 2.2 Measures of spread
  • 2.2.1 The range
  • 2.2.2 The interquartile range
  • 2.3 Boxplot
  • 2.4 Histograms
  • 2.5 Cumulative frequency diagrams
  • 2.6 Step diagrams
  • 2.7 The variance and standard deviation
  • 2.8 Symmetric and skewed data
  • 3 Probability
  • 3.1 Probability
  • 3.2 The rules of probability
  • 3.3 Conditional probability and independence
  • 3.4 The total probability theorem
  • 3.5 Bayes' theorem
  • 4 Probability distributions
  • 4.1 Notation
  • 4.2 Mean and variance of a probability distribution
  • 4.3 The relation between sample and population
  • 4.4 Combining means and variances
  • 4.5 Discrete uniform distribution
  • 4.6 Probability density function
  • 4.7 The continuous uniform distribution
  • 5 Estimation and confidence
  • 5.1 Point estimates
  • 5.1.1 Maximum likelihood estimation (mle)
  • 5.2 Confidence intervals
  • 5.3 Confidence interval for the population mean
  • 5.3.1 The normal distribution
  • 5.3.2 The Central Limit Theorem
  • 5.3.3 Construction of the confidence interval
  • 5.4 Confidence interval for a proportion
  • 5.4.1 The binomial distribution
  • 5.4.2 Confidence interval for a proportion (large sample case)
  • 5.4.3 Confidence interval for a proportion (small sample)
  • 5.5 Confidence bounds for other summary statistics
  • 5.5.1 The bootstrap
  • 5.6 Some other probability distributions
  • 5.6.1 The Poisson and exponential distributions
  • 5.6.2 The Weibull distribution
  • 5.6.3 The chi-squared (χ²) distribution
  • 6 Models, p-values, and hypotheses
  • 6.1 Models
  • 6.2 p-values and the null hypothesis
  • 6.2.1 Two-sided or one-sided
  • 6.2.2 Interpreting p-values
  • 6.2.3 Comparing p-values
  • 6.2.4 Link with confidence interval
  • 6.3 p-values when comparing two samples
  • 6.3.1 Do the two samples come from the same population?
  • 6.3.2 Do the two populations have the same mean?
  • 7 Comparing proportions
  • 7.1 The 2 × 2 table
  • 7.2 Some terminology
  • 7.2.1 Odds, odds ratios, and independence
  • 7.2.2 Relative risk
  • 7.2.3 Sensitivity, specificity, and related quantities
  • 7.3 The R × C table
  • 7.3.1 Residuals
  • 7.3.2 Partitioning
  • 8 Relations between two continuous variables
  • 8.1 Scatter diagrams
  • 8.2 Correlation
  • 8.2.1 Testing for independence
  • 8.3 The equation of a line
  • 8.4 The method of least squares
  • 8.5 A random dependent variable, Y
  • 8.5.1 Estimation of σ²