Synopses & Reviews
This book shows you how to use two Unix utilities, lex andyacc, in program development. These tools help programmers build compilers and interpreters, but they also have a wider range of applications.The second edition contains completely revised tutorial sections for novice users and reference sections for advanced users. This edition is twice the size of the first and has an expanded index.The following material has been added:
- Each utility is explained in a chapter that covers basic usage and simple, stand-alone applications
- How to implement a full SQL grammar, with full sample code
- Major MS-DOS and Unix versions of lex and yacc are explored in depth, including AT&T lex and yacc, Berkeley yacc, Berkeley/GNU Flex, GNU Bison, MKS lex andyacc, and Abraxas PCYACC
Synopsis
Shows programmers how to use two UNIX utilities, lex and yacc, in program development. This second edition contains completely revised tutorial sections for novice users and reference sections for advanced users. Twice the size of the original book, this edition features an expanding index; an explanation of each utility that covers basic usage and simple, stand-alone applications; and more.
Description
Includes bibliographical references (p. 339-340) and index.
About the Author
Gregory Satir helps develop online publishing tools in the Portland, Oregon, office of Electronic Book Technologies. He graduated with a B.S. in computer science from Brown University. Doug Brown is a consultant/contractor in Beaverton, Oregon. He has been developing software for circuit simulation, synthesis, and testing since 1977. Doug coauthored lex & yacc, another O'Reilly & Associates Nutshell Handbook. He received an M.S. in electrical engineering from the University of Illinois at Urbana-Champaign in 1976.
John R. Levine writes, lectures, and consults on Unix and compiler topics. He moderates the online comp.compilers discussion group at Usenet. He worked on Unix versions Lotus 1-2-3 and the Norton Utilities and was one of the architects of AIX for the IBM RT PC. He received a Ph.D in computer science from Yale in 1984.
Tony Mason is currently a member of the AFS development team at Transarc Corporation, a small start-up company specializing in distributed systems software. Previously, he worked with the Distributed Systems Group at Stanford University in the area of distributed operating systems and data communications. He received a B.S. in mathematics from the University of Chicago in 1987.
Table of Contents
Table of Contents
Preface
What's New in the Second Edition
Scope of This Book
Availability of Lex and Yacc
Sample Programs
Conventions Used in This Handbook
Acknowledgments
1. Lex and Yacc
The Simplest Lex Program
Recognizing Words with Lex
Symbol Tables
Grammars
Parser-Lexer Communication
The Parts of Speech Lexer
A Yacc Parser
The Rules Section
Running Lex and Yacc
Lex vs. Hand-written Lexers
Exercises
2. Using Lex
Regular Expressions
Examples of Regular Expressions
A Word Counting Program
Parsing a Command Line
Start States
A C Source Code Analyzer
Summary
Exercises
3. Using Yacc
Grammars
Recursive Rules
Shift/Reduce Parsing
What Yacc Cannot Parse
A Yacc Parser
The Definition Section
The Rules Section
Symbol Values and Actions
The Lexer
Compiling and Running a Simple Parser
Arithmetic Expressions and Ambiguity
When Not to Use Precedence Rules
Variables and Typed Tokens
Symbol Values and 0nion
Symbol Tables
Functions and Reserved Words
Reserved Words in the Symbol Table
Interchangeable Function and Variable Names
Building Parsers with Make
Summary
Exercises
4. A Menu Generation Language
Overview of the MGL
Developing the MGL
Building the MGL
Initialization
Screen Processing
Termination
Sample MGL Code
Exercises
5. Parsing SQL
A Quick Overview of SQL
Relational Data Bases
Manipulating Relations
Three Ways to Use SQL
The Syntax Checker
The Lexer
Error and Main Routines
The Parser
Definitions
Top Level Rules
The Schema Sublanguage
The Module Sublanguage
The Manipulation Sublanguage
Odds and Ends
Using the Syntax Checker
Embedded SQL
Changes to the Lexer
Changes to the Parser
Auxiliary Routines
Using the Preprocessor
Exercises
6. A Reference for Lex Specifications
Structure of a Lex Specification
Definition Section
Rules Section
User Subroutines
BEGIN
Bugs
Ambiguous Lookahead
AT&T Lex
Flex
Character Translations
Context Sensitivity
Left Context
Right Context
Definitions (Substitutions)
ECHO
Include Operations (Logical Nesting of Files)
File Chaining with yywrap()
File Nesting
Input from Strings
AT&T Lex
Flex
Abraxas Pclex
MKS Lex
POSIX Lex
input()
Internal Tables (N Declarations)
lex Library
main()
Other Library Routines
Line Numbers and yylineno
Literal Block
Multiple Lexers in One Program
Combined Lexers
Multiple Lexers
output()
Portability of Lex Lexers
Porting Lex Specifications
Porting Generated C Lexers
Regular Expression Syntax
Metacharacters
POSIX Extensions
REJECT
Returning Values from yylex()
Start States
unput()
yyinput(), yyoutput(), yyunput()
yyleng
yyless()
yylex()
User Code in yylex()
yymore()
yytext
Enlarging yytext
yywrap()
7. A Reference for Yacc Grammars
Structure of a Yacc Grammar
Symbols
Definition Section
Rules Section
User Subroutines Section
Actions
Embedded Actions
Symbol Types for Embedded Actions
Obsolescent Feature
Ambiguity and Conflicts
Types of Conflicts
Bugs in Yacc
Real Bugs
Infinite Recursion
Unreal Bugs
End Marker
Error Token and Error Recovery
423936dent Declaration
Inherited Attributes ($0)
Symbol Types for Inherited Attributes
Lexical Feedback
Literal Block
Literal Tokens
Portability of Yacc Parsers
Porting Yacc Grammars
Porting Generated C Lexers
Precedence, Associativity, and Operator Declarations
Precedence and Associativity
Operator Declarations
Using Precedence and Associativity to Resolve Conflicts
Typical Uses of Precedence
Recursive Rules
Left and Right Recursion
Rules
Special Characters
Start Declaration
Symbol Values
Declaring Symbol Types
Example
Explicit Symbol Types
Tokens
Token Numbers
Token Values
type Declaration
0nion Declaration
Variant and Multiple Grammars
Combined Parsers
Multiple Parsers
Recursive Parsing
Lexers for Multiple Parsers
y.output Files
Yacc Library
main()
yyerror()
YYABORT
YYACCEPT
YYBACKUP
yyclearin
yydebug and YYDEBUG
YYDEBUG
yydebug
yyerrok
YYERROR
yyerror()
yyparse()
YYRECOVERING()
8. Yacc Ambiguities and Conflicts
The Pointer Model and Conflicts
Types of Conflicts
Parser States
Contents of y.output
Review of Conflicts in y.output
Common Examples of Conflicts
Expression Grammars
IF-THEN-ELSE
Nested List Grammars
How Do I Fix the Conflict?
IF-THEN-ELSE (Shift/Reduce)
Loop Within a Loop (Shift/Reduce)
Expression Precedence (Shift/Reduce)
Limited Lookahead (Shift/Reduce or Reduce/Reduce)
Overlap of Alternatives (Reduce/Reduce)
Summary
Exercises
9. Error Reporting and Recovery
Error Reporting
Better Lex Error Reports
Error Recovery
Yacc Error Recovery
Where to Put Error Tokens
Compiler Error Recovery
Exercises
A. AT&T Lex
Error Messages
B. AT&T Yacc
Options
Error Messages
C. Berkeley Yacc
Options
Error Messages
Fatal Errors
Regular Errors
Warnings
Informative Messages
D. GNU Bison
Differences
E. Flex
Flex Differences
Options
Error Messages
Flex Versions of Lexer Examples
F. MKS lex and yacc
Differences
New Features
G. Abraxas lex and yacc
Differences
New Features
H. POSIX lex and yacc
Options
Differences
I. MGL Compiler Code
MGL Yacc Source
MGL Lex Source
Supporting C Code
J. SQL Parser Code
Yacc Parser
Cross-reference
Lex Scanner
Supporting Code
Glossary
Bibliography
Index
Figures
3. Using Yacc
3-1 A parse tree
3-2 A parse using recursive rules
3-3 Ambiguous input 2+3x4
5. Parsing SQL
5-1 Two relational tables
8. Yacc Ambiguities and Conflicts
8-1 Ambiguous input expr - expr - expr
Examples
1. Lex and Yacc
1-1 Word recognizer ch1-02.l
1-2 Lex example with multiple parts of speech ch1-03.l
1-3 Lexer with symbol table (part 1 of 3) ch1-04.l
1-4 Lexer with symbol table (part 2 of 3) ch1-04.l
1-5 Lexer with symbol table (part 3 of 3) ch1-04.l
1-6 Lexer to be called from the parser ch1-05.l
1-7 Simple yacc sentence parser ch1-05.y
1-8 Extended English parser ch1-06.y
1-9 A lexer written in C
1-10 The same lexer written in lex
2. Using Lex
2-1 Lex specification for decimal numbers
2-2 User subroutines for word count program ch2-02.l
2-3 Multi-file word count program ch2-03.l
2-4 Lex specification to parse command-line input ch2-04.l
2-5 Lex specification to parse a command line ch2-05.l
2-6 Lex command scanner with filenames ch2-06.l
2-7 Start state example ch2-07.l
2-8 Broken start state example ch2-08.l
2-9 C source analyzer ch2-09.l
3. Using Yacc
3-1 The calculator grammar with expressions and precedence ch3-02.y
3-2 Calculator grammar with variables and real values ch3-03.y
3-3 Lexer for calculator with variables and real values ch3-03.l
3-4 Header for parser with symbol table ch3hdr.h
3-5 Rules for parser with symbol table ch3-04.y
3-6 Symbol table routine ch3-04.pgm
3-7 Lexer with symbol table ch3-04.l
3-8 Final calculator header ch3hdr2.h
3-9 Rules for final calculator parser ch3-05.y
3-10 User subroutines for final calculator parser ch3-05.y
3-11 Final calculator lexer ch3-05.l
3-12 Makefile for the calculator
4. A Menu Generation Language
4-1 First version of MGL lexer
4-2 First version of MGL parser
4-3 Grammar with items and actions
4-4 Grammar with command identifiers
4-5 Grammar with titles
4-6 Complete MGL grammar
4-7 MGL lex specification
4-8 Alternative lex specification
4-9 MGL main() routine
4-10 Screen end code
5. Parsing SQL
5-1 Example of SQL module language
5-2 Example of embedded SQL
5-3 The first SQL lexer
5-4 Definition section of first SQL parser
5-6 Schema sublanguage, top part
5-7 Schema sublanguage, base tables
5-8 Schema view definitions
5-9 Schema privilege definitions
5-10 Cursor definition
5-11 Manipulation sublanguage, top part
5-12 Simple manipulative statements
5-13 FETCH statement
5-14 INSERT statement
5-15 DELETE statement
5-16 UPDATE statement
5-17 Scalar expressions
5-18 SELECT statement, query specifications and expressions
5-19 Table expressions
5-20 Search conditions
5-21 Conditions for embedded SQL
5-22 Makefile for SQL syntax checker
5-23 Definitions in embedded lexer
5-24 Embedded lexer rules
5-25 Highlights of embedded SQL text support routines
5-26 Output from embedded SQL preprocessor
6. A Reference for Lex Specifications
6-1 Taking flex input from a string
E. Flex
E-1 Flex specification to parse a command line ape-05.l
E-2 Flex command scanner with filenames ape-06.l