### Synopses & Reviews

Biostatistics can be a language nearly indecipherable to those not trained to speak it. In Topics in Biostatistics, a broad survey of biostatistical methods, techniques are illustrated by clear, step-by-step instructions that can be carried out with paper, a pencil, and a calculator. Covering subjects as diverse as descriptive statistics, study design, statistical inference, and linear and logistic regression, this volume invites the reader to better understand the language of statistics, whether to collaborate with biostatisticians or to decipher software manuals. Following the highly successful Methods in Molecular Biology™ format, each protocol offers readily reproducible results and, in this case, gives the reader a concise introduction to complicated statistical methods. Comprehensive and enlightening, Topics in Biostatistics is the perfect statistical resource for scientists in all disciplines attempting to comprehend this challenging mathematical field.

#### Review

From the reviews: "It is useful to biologists as well as statisticians. The book describes the use of appropriate statistical methods and offers guidelines on selecting an appropriate test. ... The main strength of the book is the clear exposition of statistical tests. ... As biostatisticians we strongly recommend this book for libraries where research on molecular biology is going on, and students ... in biostatistics. ... helpful in understanding the essence of the statistical techniques to a wide range of researchers and students in biology and medicine." (Mukesh Srivastava and M. Abbas, Journal of Applied Statistics, Vol. 36 (4), April, 2009)

#### Synopsis

This book presents a multidisciplinary survey of biostatistical methods, each illustrated with hands-on examples. It introduces advanced statistical methods, including how to choose and work with statistical packages. Specific topics of interest include microarray analysis, missing data techniques, power and sample size, and statistical methods in genetics. The book is an essential resource for researchers at every stage of their careers.

#### Synopsis

Basic Biostatistics presents a multidisciplinary survey of biostatistical methods, each illustrated with hands-on examples. The elementary methods covered include descriptive statistics, study design, statistical inference, categorical variables, evaluation of diagnostic tests, comparison of means, linear regression, and logistic regression. These introductory methods create a portfolio of biostatistical techniques for both novice and expert researchers. More complicated statistical methods are introduced as well, including those that require either collaboration with a biostatistician or the use of a statistical package. Specific topics of interest include microarray analysis, missing data techniques, power and sample size, and statistical methods in genetics. Expert advice is given on when to seek statistical help and how to conduct a meeting with a statistical collaborator or consultant. Basic Biostatistics is an essential resource for researchers at every stage of their careers.

### Table of Contents

**1. Study Design: The Basics.** Hyun Ja Lim and Raymond Hoffmann, Medical College of Wisconsin

1. Introduction
2. Experimental Studies
    - 2.1 Randomized controlled studies
    - 2.2 Historically controlled studies
    - 2.3 Crossover studies
    - 2.4 Factorial designs
    - 2.5 Cluster or group allocation designs
3. Randomization
    - 3.1 Complete or simple randomization
    - 3.2 Block randomization
    - 3.3 Stratified randomization
4. Blinding/Masking
5. Biases
6. Analyses
    - 6.1 Compliance
    - 6.2 Intention-to-treat (ITT) analysis
    - 6.3 As received and per-protocol (PP) analysis
    - 6.4 Subgroup analysis
    - 6.5 Exploratory analyses
7. Study Interpretation

**2. Observational Study Design.** Raymond Hoffmann and Hyun Ja Lim, Medical College of Wisconsin

1. Introduction
2. Cohort Studies
3. Prospective Cohort Studies and Retrospective Cohort Studies
4. Case-Control Studies
    - 4.1 Odds Ratios
    - 4.2 Choice of Controls
    - 4.2 Case-Control Genetic Association Studies
    - 4.3 Matching and Case-Control Studies
    - 4.4 Biases in Case-Control Studies
    - 4.5 Cross-Sectional Studies
5. Outcomes
6. More on Odds Ratios and Relative Risks
    - 6.1 Relative Risks
    - 6.2 Odds Ratios
7. Summary

**3. Descriptive Statistics.** Todd Nick, Cincinnati Children's Hospital

1. Types of Data
2. Measures of location and spread
3. Normal distribution
4. Distribution of a mean
5. Distribution of a variance (including degrees of freedom)
6. Distribution of a proportion

**4. Basic Principles of Statistical Inference.** Wanzhu Tu, Indiana University School of Medicine

1. Introduction
2. Parameter Estimation
    - 2.1 Point Estimation
    - 2.2 Confidence Interval Estimation
        - 2.2.1 Large Sample Confidence Interval for the Mean
        - 2.2.2 Student t-distribution
        - 2.2.3 Small Sample Confidence Interval for the Mean
        - 2.2.4 Simultaneous Inference: Bonferroni's Multiplicity Adjustment
        - 2.2.5 Confidence Interval for the Variance
        - 2.2.6 One-Sided Confidence Intervals
3. Hypothesis Testing
    - 3.1 Understanding Hypothesis Testing
    - 3.2 One sample t test
    - 3.3 An Alternative Decision Rule: P-value
    - 3.4 Errors, Power, and Sample Size
    - 3.5 Statistical Significance and Practical Significance

**5. Statistical Inference on Categorical Variables.** Susan Perkins, Indiana University School of Medicine

1. Introduction
    - 1.1 What is Categorical Data?
    - 1.2 Categorical Data Distributions
    - 1.3 General Notation
    - 1.4 Statistical Analysis Using Categorical Data
2. The Binomial Distribution and the Normal Approximation to the Binomial Distribution
    - 2.1 The Binomial Experiment
    - 2.2 The Binomial Distribution
    - 2.3 The Normal Approximation to the Binomial
3. Estimation and Testing of Single Proportions/Two Proportions
    - 3.1 Estimation of a Single Proportion or the Difference Between Two Proportions
    - 3.2 Hypothesis Testing with a Single Proportion or the Difference Between Two Proportions
    - 3.3 Assumptions
4. Tests of Association
    - 4.1 2x2 Tables
    - 4.2 RxC Tables
    - 4.3 Relationship Between Tests of Independence and Homogeneity
    - 4.4 Fisher's Exact Test
5. McNemar's Test
6. Sample Size Estimation
7. Discussion

**6. Development and Evaluation of Classifiers.** Todd A. Alonzo, University of Southern California, and Margaret Sullivan Pepe, Fred Hutchinson Cancer Research Center and University of Washington

1. Introduction
2. Measures of Classification Accuracy
    - 2.1 True and False Positive Fractions
    - 2.2 Predictive Values
    - 2.3 Diagnostic Likelihood Ratios
    - 2.4 ROC Curves
    - 2.5 Selecting a Measure of Accuracy
3. Basics of Study Design
    - 3.1 Case-control versus Cohort Designs
    - 3.2 Paired versus Unpaired Designs
    - 3.3 Blinding
    - 3.4 Avoiding Bias
    - 3.5 Factors Affecting Test Performance
4. Estimating Performance from Data
    - 4.1 Single binary test
    - 4.2 Comparison of TPF and FPF for two binary tests
        - 4.2.1 Unpaired Design
        - 4.2.2 Paired Design
    - 4.3 Estimating ROC Curves and Summary Indices
        - 4.3.1 Empirical ROC curve
        - 4.3.2 Binormal ROC curve\*
    - 4.4 Comparing ROC Curves\*
        - 4.4.1 Empirical ROC curves\*
        - 4.4.2 Binormal ROC curve\*
5. Combining Tests
    - 5.1 Binary tests
    - 5.2 Continuous tests
6. Additional Topics
    - 6.1 Verification Bias
    - 6.2 Errors in the Reference Test
    - 6.3 Regression
    - 6.4 Evaluating Usefulness
7. Summary

**7. Comparison of Means.** Nancy Berman, Statistical Research Associates

1. Introduction
2. Test Statistics
    - 2.1 The t-Test
    - 2.2 The F Distribution
3. t-Tests Comparing Two Means
    - 3.1 Paired Samples
    - 3.2 The Two Sample t-Test in Independent Groups
4. Tests of Central Tendency when the Distribution is not Normal
    - 4.1 The Sign Test for a Single Sample
    - 4.2 The Wilcoxon Signed Rank Test for Paired Samples
    - 4.3 The Wilcoxon-Mann-Whitney Test to Compare Two Groups
5. Comparisons of Means in more than Two Groups: ANOVA
    - 5.1 ANOVA
    - 5.2 Contrasts
    - 5.3 A priori comparisons
    - 5.4 Posteriori Contrasts
6. Kruskal-Wallis test
7. Sample size considerations

**8. Correlation and Simple Linear Regression.** Lynn E. Eberly, University of Minnesota

1. Introduction
2. Correlation
    - 2.1 Pearson Product-Moment Correlation Coefficient
        - 2.1.1 Estimation and Interpretation
        - 2.1.2 Inference
    - 2.2 Spearman Rank Correlation Coefficient
3. Simple Linear Regression
    - 3.1 The Linear Relation
    - 3.2 Estimation of the Linear Relation
    - 3.3 The Simple Linear Regression Model
    - 3.4 Regression Through the Origin
4. Diagnostics: Assessing the Regression Model Fit
    - 4.1 What to Assess
    - 4.2 Tools Used to Assess
        - 4.2.1 Plot of Residuals vs. X
        - 4.2.2 Summary Plots of Residuals
        - 4.2.3 Plot of Residuals vs. Additional Predictor Variables
    - 4.3 When Assessments Show a Problem
5. Inferences from the Regression Model
    - 5.1 Inferences About the Estimated Linear Relation
    - 5.2 Inferences about Y
    - 5.3 Effect of Departures from Normality
6. ANOVA Tables for Regression
7. Study Design for Simple Linear Regression
8. Discussion

**9. Multiple Linear Regression.** Lynn E. Eberly, University of Minnesota

1. Introduction
2. Regression with Multiple Explanatory Variables
    - 2.1 Multiple Linear Regression Model
    - 2.2 Inference
    - 2.3 Overall ANOVA Table
    - 2.4 Partitioning the ANOVA Table by Predictor
3. Assessing Model Fit
    - 3.1 What to Assess
    - 3.2 Tools Used to Assess
    - 3.3 When Assessments Show a Problem
4. Special Cases: Polynomials and Interactions\*
    - 4.1 Polynomial Regression
    - 4.2 Regression with Interactions
5. Parallelism: Comparing the Linear Trend Across Groups\*
    - 5.1 The ANOVA-Regression Connection: Class Variables for Groups
    - 5.2 Regressions with Continuous and Class Variables
    - 5.3 Interactions with Class Variables
6. Variable Selection: Choosing Among Many Explanatory Variables\*
    - 6.1 Overview of Automatic Selection Procedures
    - 6.2 Stepwise Selection Procedures
    - 6.3 Cautionary Notes
7. Discussion

**10. General Linear Models.** Edward H. Ip, Wake Forest University School of Medicine

1. Introduction
2. ANOVA Table
    - 2.1 A simple ANOVA Table for the Common Mean Model
    - 2.2 One-way ANOVA Table
    - 2.3 Two-way ANOVA Table
    - 2.4 Other Cases of One- and Two-way ANOVA
3. F-Tests
    - 3.1 Distributions of Sums of Squares
    - 3.2 Example of F-test
4. Testing of Nested Hypotheses
5. Summary

**11. Linear Mixed Effects Models.** Ann L. Oberg and Douglas W. Mahoney, Division of Biostatistics, Health Sciences Research, Mayo Foundation

1. Introduction
2. Random Block Design
3. Multiple Sources of Variation
4. Correlated Data and Random Effects Regression
5. Model Fitting
6. Power and Sample Size
7. Extensions

**12. Design and Analysis of Experiments.** Jonathan J. Shuster, University of Florida

1. Introduction
2. The Completely Randomized Block Design
    - 2.1 Parametric Basis of Inference: Student's t-test
    - 2.2 Large Sample Inference
    - 2.3 Allocation of the Sample to the Two Treatments: Is 50-50 Best?
3. Randomized Block Designs
4. Stratified Designs
    - 4.1 How to Plan the Sample Size of a Stratified Study
    - 4.2 Post Stratification in a Completely Randomized Design
5. Crossover Designs
6. Two by Two Factorial Designs
7. Randomized Designs with Random Effects
8. Summary

**13. Analysis of Change.** James J. Grady, University of Texas Medical Branch

1. Introduction
2. The One Group Study, Pre- and Post-Test Design
    - 2.1 Graphical Displays and Other Data Summaries
    - 2.2 Assessing Statistical Significance
3. A More Complicated Design: One-Group Study with Baseline and Two Follow-Up Times
4. Repeated Measures Designs
5. Comparison of Change Among Subgroups in a One-Group Study
    - 5.1 Analysis Using Tests for Paired Data
    - 5.2 Analysis of Change Using Analysis of Covariance

**14. Logistic Regression.** Todd G. Nick and Kathleen M. Campbell, Cincinnati Children's Hospital

1. Introduction
2. Example: Effect of TGF-β1 Gene Polymorphism on Renal Dysfunction after Liver Transplantation in Children
3. Measures of Effect for Categorical Outcomes
    - 3.1 Odds and Odds Ratio
    - 3.2 Relative Risk and Absolute Risk Measures
4. Logistic Regression
    - 4.1 Formulating a Model
    - 4.2 Relationship between Logit and Probability Scale
    - 4.3 Interpretation of Coefficients
    - 4.4 Odds Ratio
5. Simple Logistic Regression Model
    - 5.1 Results of Fitting a Simple Logistic Regression Model
    - 5.2 Coding Categorical Predictors
        - 5.2.1 Binary Predictor Variable
        - 5.2.2 Nominal and Ordinal Predictor Variables
    - 5.3 Continuous Predictor Variables
        - 5.3.1 Interpretation of Odds Ratios
        - 5.3.2 Relaxing the Linearity Assumption
6. Logistic Regression with Multiple Predictors
    - 6.1 Introduction
    - 6.2 Model Assuming Additivity
    - 6.3 Model with Interaction among Predictors
    - 6.4 Other issues with Logistic Regression
        - 6.4.1 Global test of Model and Testing a Group of Predictors
        - 6.4.2 Assessing Lack of Fit and Influential Data
        - 6.4.3 Assessing Predictive Accuracy
        - 6.4.4 Sample Size/Power and Automated Selection Routines
7. Conclusion

**15. Survival Analysis.** Hongyu Jiang, Harvard University, and Jason P. Fine, University of Wisconsin

1. Introduction
2. Censoring Versus Failure
3. Life Table Methods
4. Kaplan-Meier Curves
5. Log-rank Test
6. Proportional Hazards Model
7. Application
8. Summary

**16. Basic Bayesian Methods.** Mark E. Glickman, Boston University School of Public Health, and David A. van Dyk, University of California, Irvine

1. Fundamentals of a Bayesian Analysis
    - 1.1 Data Models
    - 1.2 Prior Distribution
    - 1.3 From the Likelihood to the Posterior Distribution
    - 1.4 Posterior Summaries
    - 1.5 Predictive Distributions
2. Application to Multi-Level Models
    - 2.1 Monte Carlo Methods
    - 2.2 Multi-Level Models
3. Other Resources

**17. Overview of Missing Data Techniques.** Ralph B. D'Agostino, Jr., Wake Forest University School of Medicine

1. Introduction
2. Notation
3. Missing Data Mechanisms
    - 3.1 Missing Completely at Random (MCAR)
    - 3.2 Missing at Random (MAR)
    - 3.3 Missing Not at Random (MNAR)/Nonignorable Missing Data
4. Ad Hoc Methods for Handling Missing Data
    - 4.1 Complete Case Analysis
    - 4.2 Last observation carried forward (LOCF)
    - 4.3 Mean/Regression Imputation
5. Model Based Approaches to Missing Data
    - 5.1 Likelihood Based Modeling
    - 5.2 Stochastic Imputation: Single and Multiple
6. Summary

**18. Statistical Topics in the Laboratory Sciences.** Curtis A. Parvin, Washington University School of Medicine

1. Introduction
2. Estimating Analytical Imprecision
    - 2.1 The Precision Performance Study Quality Control
    - 2.2 Confidence Interval for Total Imprecision
3. Designing a Laboratory Quality Control Strategy
    - 3.1 Defining Laboratory Quality
    - 3.2 Quality Control Performance Measures
        - 3.2.1 Batch Mode Testing
        - 3.2.2 Continuous Mode Testing
4. Establishing Reference Ranges
    - 4.1 Reference Limit Estimation
    - 4.2 Confidence Intervals for a Reference Limit
    - 4.3 Sample Size Considerations
5. Summary

**19. Power and Sample Size.** L. Douglas Case and Walter T. Ambrosius, Wake Forest University School of Medicine

1. Introduction
2. Power of a Test
    - 2.1 Two-Sided Hypothesis Tests
    - 2.2 Simple versus Composite Hypotheses
    - 2.3 One-Sided versus Two-Sided Hypothesis Tests
3. Sample Size Determination
    - 3.1 Continuous Outcomes: One Group
    - 3.2 Continuous Outcomes: Two Groups
    - 3.3 Dichotomous Outcomes
    - 3.4 Calculation of Power or Sample Size
4. Other Considerations
5. Conclusion

**20. Microarray Analysis.** Grier P. Page, Stanislav O. Zakharkin, Kyoungmi Kim, Tapan Mehta, Lang Chen, and Kui Zhang, University of Alabama at Birmingham (all) and University of California, Davis (KK)

1. Abstract
2. Introduction
3. Materials
    - 3.1 What is a Microarray?
    - 3.2 Types of Microarrays
4. Define Objectives of the Study
5. Experimental Design for Microarray Studies
    - 5.1 Randomization
    - 5.2 Replication
        - 5.2.1 Types of Replication
        - 5.2.2 Power and Sample Size
    - 5.3 Design of the Experiment
        - 5.3.1 Reference Design
        - 5.3.2 Incomplete and Complete Balanced Block Design
        - 5.3.3 Loop Design
6. Microarray Analysis
    - 6.1 Image Processing from cDNA and Long Oligo Arrays
        - 6.1.1 Gridding/Addressing
        - 6.1.2 Segmentation
        - 6.1.3 Information Extraction
    - 6.2 Image Analysis of Affymetrix® GeneChip™ Microarrays
    - 6.3 Normalization of DNA Data
    - 6.4 Analysis
        - 6.4.1 Class Prediction Analysis
        - 6.4.2 Class Discovery Analysis
        - 6.4.3 Class Differentiation Analysis
        - 6.4.4 Adjusting for Multiple Testing
7. Interpretation
8. Validation of Microarray Experiments
9. Microarray Informatics
    - 9.1 Data Handling
    - 9.2 MIAME and Standards
    - 9.3 Databases
10. Conclusions

**21. Statistical Methods in Genetics.** Carl D. Langefeld, Wake Forest University School of Medicine, and Tasha E. Fingerlin, University of Colorado at Denver and Health Sciences Center

1. Fundamental Concepts in Human Genetics
2. Estimation of Familial Aggregation
3. Estimation of Allele Frequency
4. Hardy-Weinberg Principle
5. Linkage Disequilibrium
6. Testing for Genetic Association
    - 6.1 Association Methods for Unrelated Individuals
    - 6.2 Haplotype Analysis
7. Population Stratification
8. Family-Based Methods of Association
    - 8.1 Affected Family-Based Control Designs (AFBAC)
    - 8.2 Transmission/Disequilibrium Tests
    - 8.3 Pedigree Disequilibrium Tests
9. Quantitative Measures
10. High density SNP Mapping
    - 10.1 Restricted Regions
    - 10.2 Genome-Wide Association
11. Summary

**22. Genome Mapping Statistics and Bioinformatics.** Josyf Mychaleckyj, Wake Forest University School of Medicine

1. Introduction
2. Genome Sequence Mapping
3. Discrete Sequence Matching
    - 3.1 Exact Matching of Query Strings
        - 3.1.1 Finding Sequence Matches
        - 3.1.2 How Many Exact Matches Are Expected?
        - 3.1.3 Genome Distribution of Query String Matches
        - 3.1.4 Distribution of Non-Overlapping Query String Matches
    - 3.2 Inexact Matching of a Query Sequence
    - 3.3 Joint Mapping of Two Query Sequences
    - 3.4 Mapping Analysis of Over-represented Sequences
    - 3.5 Extended Length Query Sequences
        - 3.5.1 Sequence Alignment Algorithms Are Needed For Longer Sequences
        - 3.5.2 Discontiguous (Gapped) Sequence Mapping
    - 3.6 Significance of Genome Search Results
4. Programs for Mapping Discrete Sequences
5. Discussion

**23. Working with a Statistician.** Nancy Berman, Statistical Research Associates, and Christina Gullion, Kaiser Permanente

1. Introduction
2. Why Work with a Statistician?
3. When to seek statistical help
4. Collaborator versus Consultant
5. Roles and Tasks in a Statistical Consulting Relationship
    - 5.1 Introductory meeting(s)
        - 5.1.1 Describing the problem
        - 5.1.2 Supplemental Material
        - 5.1.3 Learning about the Statistician
    - 5.2 Specific Tasks
    - 5.3 Ongoing Process
6. Business and Professional Arrangements
    - 6.1 Expectations Regarding Payment for Statistical Services
    - 6.2 What Costs are Included?
    - 6.3 The Timetable
    - 6.4 Flexibility
    - 6.5 Authorship
    - 6.6 Confidentiality and Security
7. Consulting with More than One Individual