Synopses & Reviews
With its highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis. But if you're a biologist with little or no programming experience, starting out in Perl can be a challenge. Many biologists have a difficult time learning how to apply the language to bioinformatics. The most popular Perl programming books are often too theoretical and too focused on computer science for a non-programming biologist who needs to solve very specific problems.
Beginning Perl for Bioinformatics is designed to get you quickly over the Perl language barrier by approaching programming as an important new laboratory skill, revealing Perl programs and techniques that are immediately useful in the lab. Each chapter focuses on solving a particular bioinformatics problem or class of problems, starting with the simplest and increasing in complexity as the book progresses. Each chapter includes programming exercises and teaches bioinformatics by showing and modifying programs that deal with various kinds of practical biological problems. By the end of the book you'll have a solid understanding of Perl basics, a collection of programs for such tasks as parsing BLAST and GenBank, and the skills to take on more advanced bioinformatics programming. Some of the later chapters focus in greater detail on specific bioinformatics topics. This book is suitable for use as a classroom textbook, for self-study, and as a reference.
The book covers:
- Programming basics and working with DNA sequences and strings
- Debugging your code
- Simulating gene mutations using random number generators
- Regular expressions and finding motifs in data
- Arrays, hashes, and relational databases
- Regular expressions and restriction maps
- Using Perl to parse PDB records, annotations in GenBank, and BLAST output
This text is designed to get you quickly over the Perl language barrier by approaching programming as an important new laboratory skill, revealing Perl programs and techniques that are immediately useful in the lab. Each chapter focuses on solving a particular bioinformatics problem.
A practical introduction to Perl designed for biologists with little or no programming experience. The book approaches programming as an important new laboratory skill, and shows many Perl programs and Perl programming techniques that can be immediately useful in the lab. Each chapter focuses on a problem or class of problems in bioinformatics, and shows how to use Perl to solve them.
About the Author
James Tisdall has worked as a musician, as a programmer at Bell Labs (where he programmed for speech research and discovered a formal language for musical rhythm), and as a bioinformaticist at Mercator Genetics in Menlo Park, California, and at Fox Chase Cancer Center in Philadelphia. He has a B.A. in mathematics from the City College of New York and an M.S. in computer science from Columbia University, and he is working towards a Ph.D. in computer science at the University of Pennsylvania. In his spare time, Jim teaches computer music at the Settlement Music School in Philadelphia. He is also the author of O'Reilly's Beginning Perl for Bioinformatics.
Table of Contents
Preface; What Is Bioinformatics?; About This Book; Who This Book Is For; Why Should I Learn to Program?; Structure of This Book; Conventions Used in This Book; Comments and Questions; Acknowledgments
Chapter 1: Biology and Computer Science; 1.1 The Organization of DNA; 1.2 The Organization of Proteins; 1.3 In Silico; 1.4 Limits to Computation
Chapter 2: Getting Started with Perl; 2.1 A Low and Long Learning Curve; 2.2 Perl's Benefits; 2.3 Installing Perl on Your Computer; 2.4 How to Run Perl Programs; 2.5 Text Editors; 2.6 Finding Help
Chapter 3: The Art of Programming; 3.1 Individual Approaches to Programming; 3.2 Edit--Run--Revise (and Save); 3.3 An Environment of Programs; 3.4 Programming Strategies; 3.5 The Programming Process
Chapter 4: Sequences and Strings; 4.1 Representing Sequence Data; 4.2 A Program to Store a DNA Sequence; 4.3 Concatenating DNA Fragments; 4.4 Transcription: DNA to RNA; 4.5 Using the Perl Documentation; 4.6 Calculating the Reverse Complement in Perl; 4.7 Proteins, Files, and Arrays; 4.8 Reading Proteins in Files; 4.9 Arrays; 4.10 Scalar and List Context; 4.11 Exercises
Chapter 5: Motifs and Loops; 5.1 Flow Control; 5.2 Code Layout; 5.3 Finding Motifs; 5.4 Counting Nucleotides; 5.5 Exploding Strings into Arrays; 5.6 Operating on Strings; 5.7 Writing to Files; 5.8 Exercises
Chapter 6: Subroutines and Bugs; 6.1 Subroutines; 6.2 Scoping and Subroutines; 6.3 Command-Line Arguments and Arrays; 6.4 Passing Data to Subroutines; 6.5 Modules and Libraries of Subroutines; 6.6 Fixing Bugs in Your Code; 6.7 Exercises
Chapter 7: Mutations and Randomization; 7.1 Random Number Generators; 7.2 A Program Using Randomization; 7.3 A Program to Simulate DNA Mutation; 7.4 Generating Random DNA; 7.5 Analyzing DNA; 7.6 Exercises
Chapter 8: The Genetic Code; 8.1 Hashes; 8.2 Data Structures and Algorithms for Biology; 8.3 The Genetic Code; 8.4 Translating DNA into Proteins; 8.5 Reading DNA from Files in FASTA Format; 8.6 Reading Frames; 8.7 Exercises
Chapter 9: Restriction Maps and Regular Expressions; 9.1 Regular Expressions; 9.2 Restriction Maps and Restriction Enzymes; 9.3 Perl Operations; 9.4 Exercises
Chapter 10: GenBank; 10.1 GenBank Files; 10.2 GenBank Libraries; 10.3 Separating Sequence and Annotation; 10.4 Parsing Annotations; 10.5 Indexing GenBank with DBM; 10.6 Exercises
Chapter 11: Protein Data Bank; 11.1 Overview of PDB; 11.2 Files and Folders; 11.3 PDB Files; 11.4 Parsing PDB Files; 11.5 Controlling Other Programs; 11.6 Exercises
Chapter 12: BLAST; 12.1 Obtaining BLAST; 12.2 String Matching and Homology; 12.3 BLAST Output Files; 12.4 Parsing BLAST Output; 12.5 Presenting Data; 12.6 Bioperl; 12.7 Exercises
Chapter 13: Further Topics; 13.1 The Art of Program Design; 13.2 Web Programming; 13.3 Algorithms and Sequence Alignment; 13.4 Object-Oriented Programming; 13.5 Perl Modules; 13.6 Complex Data Structures; 13.7 Relational Databases; 13.8 Microarrays and XML; 13.9 Graphics Programming; 13.10 Modeling Networks; 13.11 DNA Computers
Appendix A: Resources; A.1 Perl; A.2 Computer Science; A.3 Linux; A.4 Bioinformatics; A.5 Molecular Biology
Appendix B: Perl Summary; B.1 Command Interpretation; B.2 Comments; B.3 Scalar Values and Scalar Variables; B.4 Assignment; B.5 Statements and Blocks; B.6 Arrays; B.7 Hashes; B.8 Operators; B.9 Operator Precedence; B.10 Basic Operators; B.11 Conditionals and Logical Operators; B.12 Binding Operators; B.13 Loops; B.14 Input/Output; B.15 Regular Expressions; B.16 Scalar and List Context; B.17 Subroutines and Modules; B.18 Built-in Functions
Colophon