The network is everywhere. At the office, machines are wired together into local area networks, and the local networks are interconnected via the Internet. At home, personal computers are either intermittently connected to the Internet or, increasingly, connected via "always-on" cable and DSL modems. New wireless technologies, such as Bluetooth, promise to vastly expand the network realm, embracing everything from cell phones to kitchen appliances.
Such an environment creates tremendous opportunities for innovation. Whole new classes of applications are now predicated on the availability of high-bandwidth, always-on connectivity. Interactive games allow players from around the globe to compete on virtual playing fields and the instant messaging protocols let them broadcast news of their triumphs to their friends. New peer-to-peer systems, such as Napster and Gnutella, allow people to directly exchange MP3 audio files and other types of digital content. The SETI@Home project takes advantage of idle time on the millions of personal computers around the world to search for signs of extraterrestrial life in a vast collection of cosmic noise.
The ubiquity of the network allows for more earthbound applications as well. With the right knowledge, you can write a robot that will fetch and summarize prices from competitors' Web sites; a script to page you when a certain stock drops below a specified level; a program to generate daily management reports and send them off via e-mail; a server that centralizes some number-crunching task on a single high-powered machine, or alternatively distributes that task among the multiple nodes of a computer cluster.
Whether you are searching for the best price on a futon or for life in a distant galaxy, you'll need to understand how network applications work in order to take full advantage of these opportunities. You'll need a working understanding of the TCP/IP protocol--the common denominator for all Internet-based communications and the most common protocol in use in local area networks as well. You'll need to know how to connect to a remote program, to exchange data with that program, and what to do when something goes wrong. To work with existing applications, such as Web servers, you'll have to understand how the application-level protocols are built on top of TCP/IP, and how to deal with common data exchange formats such as XML and MIME.
This book uses the Perl programming language to illustrate how to design and implement practical network applications. Perl is an ideal language for network programming for a number of reasons. First, like the rest of the language, Perl's networking facilities were designed to make the easy things easy. It takes just two lines of code to open a network connection to a server somewhere on the Internet and send it a message. A fully capable Web server can be written in a few dozen lines of code.
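As a rough illustration of the "two lines of code" claim, the sketch below wraps the connect-and-send step in a small helper. It assumes the IO::Socket::INET module that ships with Perl 5. The helper name send_message, and the host and port mentioned afterward, are placeholders invented for this example rather than code taken from later chapters.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: open a TCP connection to $host:$port and send one line.
sub send_message {
    my ($host, $port, $message) = @_;

    # Essential line 1: connect to the remote server.
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "Can't connect to $host:$port: $@";

    # Essential line 2: send the message.
    $socket->print("$message\n");

    close $socket;
    return 1;
}
```

Calling send_message('localhost', 2007, 'hello') would deliver one line to whatever server happens to be listening on that (hypothetical) port. Everything other than the new() call and the print is error handling and cleanup.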
Second, Perl's open architecture has encouraged many talented programmers to contribute to an ever-expanding library of useful third-party modules. Many of these modules provide powerful interfaces to common network applications. For example, after loading the LWP::Simple module, a single function call allows you to fetch the contents of a remote Web page and store it in a variable. Other third-party modules provide intuitive interfaces to e-mail, FTP, net news, and a variety of network databases.
Perl also provides impressive portability. Most of the applications developed in this book will run without modification on UNIX machines, Windows boxes, Macintoshes, VMS systems, and OS/2.
However, the most compelling reason to choose Perl for network application development is that it allows you to fully exploit the power of TCP/IP. Perl provides you with full access to the same low-level networking calls that are available to C programs and other natively compiled languages. You can create multicast applications, implement multiplexed servers, and design peer-to-peer systems. Using Perl, you can rapidly prototype new networking applications and develop interfaces to existing ones. Should you ever need to write a networking application in C or Java, you'll be delighted to discover how much of the Perl API carries over into these languages.
This Book's Audience
Network Programming with Perl is written for novice and intermediate Perl programmers. I assume you know the basics of Perl programming, including how to write loops, how to construct if-else statements, how to write regular expression pattern matches, the concept of the automatic $_ variable, and the basics of arrays and hashes.
You should have access to a Perl interpreter and some experience writing, running, and debugging scripts. Just as important, you should have access to a computer that is connected both to a local area network and to the Internet! Although the recipes in Chapter 10 on setting Perl-based network servers to start automatically when a machine is booted do require superuser (administrative) access, none of the other examples require privileged access to a machine.
This book does take advantage of the object-oriented features in Perl version 5 and higher, but most chapters do not assume a deep knowledge of this system. Chapter 1 addresses all the details you will need as a casual user of Perl objects.
This book is not a thorough review of the TCP/IP protocol at the lowest level, nor is it a guide to installing and configuring network hubs, routers, and name servers. Many good books on the mechanics of the TCP/IP protocol and network administration are listed in the references in Appendix D.
Roadmap
This book is organized into four main parts: Basics, Developing Clients for Common Services, Developing TCP Client/Server Systems, and Advanced Topics.
Part I, Basics, introduces the fundamentals of TCP/IP network communications.
- Chapters 1 and 2, Networking Basics and Processes, Pipes, and Signals, review Perl's functions and variables for input and output, discuss the exceptions that can occur during I/O operations, and use the piped filehandle as the basis for introducing sockets. These chapters also review Perl's process model, including signals and forking, and introduce Perl's object-oriented extensions.
- Chapter 3, Introduction to Berkeley Sockets, discusses the basics of Internet networking and describes IP addresses, network ports, and the principles of client/server applications. It then turns to the Berkeley Socket API, which provides the programmer's interface to TCP/IP.
- Chapters 4 and 5, The TCP Protocol and The IO::Socket API and Simple TCP Applications, show the basics of TCP, the networking protocol that provides reliable stream-oriented communications. These chapters demonstrate how to create client and server applications and then introduce examples that show the power of these techniques as well as some common roadblocks.
Part II, Developing Clients for Common Services, looks at a collection of the best third-party modules that developers have contributed to the Comprehensive Perl Archive Network (CPAN).
- Chapter 6, FTP and Telnet, introduces modules that provide access to the FTP file-sharing service, as well as the flexible Net::Telnet module, which allows you to create clients to access all sorts of network services.
- E-mail is still the dominant application on the Internet, and Chapter 7, SMTP: Sending Mail, introduces half of the equation. This chapter shows you how to create e-mail messages on the fly, including binary attachments, and send them to their destinations.
- Chapter 8, POP, IMAP, and NNTP: Processing Mail and Netnews, covers the other half of e-mail, explaining modules that make it possible to receive mail from mail drop systems and process their contents, including binary attachments.
- Chapter 9, Web Clients, discusses the LWP module, which provides everything you need to talk to Web servers, download and process HTML documents, and parse XML.
Part III, Developing TCP Client/Server Systems--the longest part of the book--discusses the alternatives for designing TCP-based client/server systems. The major example used in these chapters is an interactive psychotherapist server, based on Joseph Weizenbaum's classic Eliza program.
- Chapter 10, Forking Servers and the inetd Daemon, covers the common type of TCP server that forks a new process to handle each incoming connection. This chapter also covers the UNIX and Windows inetd daemons, which allow programs not specifically designed for networking to act as servers.
- Chapter 11, Multithreaded Applications, explains Perl's experimental multithreaded API, and shows how it can greatly simplify the design of TCP clients and servers.
- Chapters 12 and 13, Multiplexed Operations and Nonblocking I/O, discuss the select() call, which enables an application to process multiple I/O streams concurrently without using multiprocessing or multithreading.
- Chapter 14, Bulletproofing Servers, discusses techniques for enhancing the reliability and maintainability of network servers. Among the topics are logging, signal handling, and exceptions, as well as the important topic of network security.
- Chapter 15, Preforking and Prethreading, presents enhancements to the forking and threading models discussed in earlier chapters. These enhancements increase a server's ability to perform well under heavy loads.
- Chapter 16, IO::Poll, discusses an alternative to select() available on UNIX platforms. This module allows applications to multiplex multiple I/O streams using an API that some people find more natural than select()'s.
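The multiplexing idea behind Chapters 12 and 13 can be sketched in a few lines. This example assumes the standard IO::Select module, which is an object-oriented wrapper around select(). The function name ready_handles is invented here for illustration and is not taken from those chapters.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: given a timeout in seconds and a list of filehandles,
# return those handles that have data waiting to be read. The call waits at
# most $timeout seconds and needs no extra processes or threads.
sub ready_handles {
    my ($timeout, @handles) = @_;
    my $select = IO::Select->new(@handles);
    return $select->can_read($timeout);
}
```

A server built this way would typically call something like ready_handles() in a loop, reading only from the handles that are returned, so that one slow connection does not stall the others.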
Part IV, Advanced Topics, addresses techniques that are useful for specialized applications.
- Chapter 17, TCP Urgent Data, is devoted to TCP urgent or "out of band" data. This technique is often used in highly interactive applications in which the user urgently needs to signal the remote server.
- Chapters 18 and 19, The UDP Protocol and UDP Servers, introduce the User Datagram Protocol, which provides a lightweight, message-oriented communications service. Chapter 18 introduces the protocol, and Chapter 19 shows how to design UDP servers. The major example in these and the next two chapters is a live online chat and messaging system written entirely in Perl.
- Chapters 20 and 21, Broadcasting and Multicasting, extend the UDP discussion by showing how to build one-to-all and one-to-many message broadcasting systems. In these chapters we extend the chat system to take advantage of automatic server discovery and multicasting.
- Chapter 22, UNIX-Domain Sockets, shows how to create lightweight communications channels between processes on the same machine. This can be useful for specialized applications such as loggers.
The Many Versions of Perl
All good things evolve to meet changing conditions, and Perl has gone through several major changes in the course of its short life. This book was written for versions of Perl in the 5.X series (5.003 and higher recommended). At the time I wrote this preface (August 2000), the most recent version of Perl was 5.6, with the release of 5.7 expected imminently. I expect that Perl versions 5.8 and 5.9 (assuming there will be such versions) will be compatible with the code examples given here as well.
Over the horizon, however, is Perl version 6. Version 6, which is expected to be in early alpha form by the summer of 2001, will fix many of the idiosyncrasies and misfeatures of earlier versions of Perl. In so doing, however, it is expected to break most existing scripts. Fortunately, the Perl language developers are committed to developing tools to automatically port existing scripts to version 6. With an eye to this, I have tried to make the examples in this book generic, avoiding the more obscure Perl constructions.
Cross-Platform Compatibility
More serious are the differences between implementations of Perl on various operating systems. Perl started out on UNIX (and Linux) systems, but has been ported to many different operating systems, including Microsoft Windows, the Macintosh, VMS, OS/2, Plan9, and others. In principle, a script written for the Windows platform will run on UNIX or Macintosh without modification.
The problem is that the I/O subsystem (the part of the system that manages input and output operations) is the part that differs most dramatically from operating system to operating system. This restricts the ability of Perl to make its I/O system completely portable. While Perl's basic I/O functionality is identical from port to port, some of the more sophisticated operations are either missing or behave significantly differently on non-UNIX platforms. This affects network programming, of course, because networking is fundamentally about input and output.
In this book, Chapters 1 through 9 use generic networking calls that will run on all platforms. The exceptions to this rule are the last example in Chapter 5, which calls fork(), a function that isn't implemented on the Macintosh, and some of the introductory discussion in Chapter 2 of process management on UNIX systems. The techniques discussed in these chapters are all you need for the vast majority of client programs, and are sufficient to get a simple server up and running. Chapters 10 through 22 deal with more advanced topics in server design.
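One way a script can cope with a missing fork() is to ask Perl's own build configuration before relying on it. The following is only a sketch. It assumes the standard Config module and its d_fork and d_pseudofork entries (the latter describes emulated fork on some non-UNIX builds and may be absent on older versions of Perl). The helper name can_fork is made up for this example.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: report whether this build of Perl offers fork(),
# either natively (d_fork) or through emulation (d_pseudofork).
sub can_fork {
    return 1 if $Config{d_fork};
    return 1 if $Config{d_pseudofork};
    return 0;
}

print can_fork()
    ? "fork() is available on $^O\n"
    : "fork() is not available on $^O\n";
```

A portable script could use a check like this to fall back to a non-forking design on platforms where the call is missing.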
The nice thing is that the non-UNIX ports of Perl are improving rapidly, and there is a good chance that new features will be available at the time you read this.
Getting the Code for the Code Examples
All the sample scripts and modules discussed in this book are available on the Web in ZIP and TAR/GZIP formats. The URL for downloading the source is http://www.modperl.com/perl_networking. This page also includes instructions for unpacking and installing the source code.
Installing Modules
Many of Perl's networking modules are preinstalled in the standard distribution. Others are third-party modules that you must download and install from the Web. Most third-party modules are written in pure Perl, but some, including several that are mentioned in this book, are written partly in C and must be compiled before they can be used.
CPAN is a large Web-based collection of contributed Perl modules. You can get access to it via a Web or FTP browser, or by using a command-line application built into Perl itself.
Installing from the Web
To find a CPAN site near you, point your Web browser at http://www.cpan.org/. This will present a page that allows you to search for specific modules, or to browse the entire list of contributed modules sorted in various ways. When you find the module you want, download it to disk.
Perl modules are distributed as gzipped tar archives. You can unpack them like this:
% gunzip -c Digest-MD5-2.00.tar.gz | tar xvf -
Digest-MD5-2.00/
Digest-MD5-2.00/typemap
Digest-MD5-2.00/MD2/
Digest-MD5-2.00/MD2/MD2.pm
...
Once the archives are unpacked, you'll enter the newly created directory and give the perl Makefile.PL, make, make test, and make install commands. These will build, test, and install the module.
% cd Digest-MD5-2.00
% perl Makefile.PL
Testing alignment requirements for U32...
Checking if your kit is complete...
Looks good
Writing Makefile for Digest::MD2
Writing Makefile for Digest::MD5
% make
mkdir ./blib
mkdir ./blib/lib
mkdir ./blib/lib/Digest
...
% make test
make[1]: Entering directory '/home/lstein/Digest-MD5-2.00/MD2'
make[1]: Leaving directory '/home/lstein/Digest-MD5-2.00/MD2'
PERL_DL_NONLAZY=1 /usr/local/bin/perl -I./blib/arch -I./blib/lib...
t/digest............ok
t/files.............ok
t/md5-aaa...........ok
t/md5...............ok
t/rfc2202...........ok
t/sha1..............skipping test on this platform
All tests successful.
Files=6, Tests=291, 1 secs ( 1.37 cusr 0.08 csys = 1.45 cpu)
% make install
make[1]: Entering directory '/home/lstein/Digest-MD5-2.00/MD2'
make[1]: Leaving directory '/home/lstein/Digest-MD5-2.00/MD2'
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.so
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.bs
...
On UNIX systems, you may need superuser privileges to perform the final step. If you don't have such privileges, you can install the modules in your home directory. At the perl Makefile.PL step, provide a PREFIX= argument with the path of your home directory. For example, assuming your home directory can be found at /home/jdoe, you would type:
% perl Makefile.PL PREFIX=/home/jdoe
The rest of the install procedure is identical to what was shown earlier.
If you are using a custom install directory, you must tell Perl to look in this directory for installed modules. One way to do this is to add the name of the directory to the environment variable PERL5LIB. For example:
setenv PERL5LIB /home/jdoe # C shell
PERL5LIB=/home/jdoe; export PERL5LIB # Bourne shell
Another way is to place the following line at the top of each script that uses an installed module:
use lib '/home/jdoe';
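To check that Perl is actually searching a custom directory, you can print its module search path, which is held in the built-in @INC array. This is a minimal sketch that reuses the /home/jdoe example path from the text. Substitute your own directory.

```perl
use strict;
use warnings;
use lib '/home/jdoe';   # example path from the text; substitute your own

# Print the directories Perl will search for modules, in order.
# Directories added with "use lib" are placed ahead of the standard ones.
print "$_\n" for @INC;
```

If the directory does not appear in the output, the use lib line (or the PERL5LIB setting) is not taking effect for that script.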
Installing from the Command Line
A simpler way to do the same thing is to use Andreas Koenig's wonderful CPAN shell. With it, you can search, download, build, and install Perl modules from a simple command-line shell. The install command does it all:
% perl -MCPAN -e shell
cpan shell -- CPAN exploration and modules installation (v1.40)
ReadLine support enabled
cpan> install MD5
Running make for GAAS/Digest-MD5-2.00.tar.gz
Fetching with LWP: ftp://ftp.cis.ufl.edu/pub/perl/CPAN/authors/id/GAAS/Digest-MD5-2.00.tar.gz
CPAN: MD5 loaded ok
Fetching with LWP: ftp://ftp.cis.ufl.edu/pub/perl/CPAN/authors/id/GAAS/CHECKSUMS
...
Checksum for /home/lstein/.cpan/sources/authors/id/GAAS/Digest-MD5-2.00.tar.gz ok
Digest-MD5-2.00/
Digest-MD5-2.00/typemap
...
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.so
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.bs
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/MD5/MD5.so
...
Writing /usr/local/lib/perl5/site_perl/i586-linux/auto/MD5/.packlist
Appending installation info to /usr/local/lib/perl5/i586-linux/5.00404/perllocal.pod
cpan> exit
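After an installation, whichever method you used, a short script can confirm that Perl is able to load the module. The sketch below is illustrative only. The helper name module_version is invented, and No::Such::Module is a deliberately nonexistent name used to show the failure case.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of $module,
# or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;

    # Turn Digest::MD5 into Digest/MD5.pm, the form require() expects.
    (my $file = "$module.pm") =~ s{::}{/}g;

    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(Digest::MD5 No::Such::Module)) {
    my $version = module_version($module);
    print defined $version
        ? "$module $version is installed\n"
        : "$module is not installed\n";
}
```

The eval block keeps a failed require from terminating the script, so the same check can be run over a whole list of modules.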
Installing Modules with the Perl Package Manager
These examples all assume that you have UNIX-compatible versions of the gzip, tar, and make commands. Virgin Windows systems do not have these utilities. The Cygwin package, available from http://www.cygnus.com/cygwin/, provides these utilities as part of a complete set of UNIX-compatible tools.
It is easier, however, to use the ActiveState Perl Package Manager (PPM). This Perl script is installed by default in the ActiveState distribution of Perl, available at http://www.activestate.com. Its interface is similar to the command-line CPAN interface shown in the previous section, except that it can install precompiled binaries as well as pure-Perl scripts. For example:
C:\WINDOWS> ppm
PPM interactive shell (1.1.3) - type 'help' for available commands.
PPM> install MD5
Install package 'MD5?' (y/N) : Y
Retrieving package 'MD5'
Installing C:\Perl\site\lib\auto\MD5\MD5.bs
Installing C:\Perl\site\lib\auto\MD5\MD5.dll
Installing C:\Perl\site\lib\auto\MD5\MD5.exp
Installing C:\Perl\site\lib\auto\MD5\MD5.lib
Installing C:\Perl\site\lib\MD5.pm
Installing C:\Perl\site\lib\auto\MD5\autosplit.ix
Writing C:\Perl\site\lib\auto\MD5\.packlist
PPM> exit
Quit!
C:\WINDOWS>
Installing Modules from MacPerl
The MacPerl Module Porters site, http://pudge.net/cgi-bin/mmp.plx, contains a series of modules that have been ported for use in MacPerl. A variety of helper programs have been developed to make module installation easier on the Macintosh. The packages are described at http://pudge.net/macperl/macperlmodinstall.html, which also gives instructions on downloading and installing them.
Online Documentation
In addition to books and Web sites, Network Programming with Perl refers to two major sources of online information, Internet RFCs and Perl POD documentation.
Internet RFCs
The specifications of all the fundamental protocols of the Internet are described in a series of Requests for Comment (RFCs) submitted to the Internet Engineering Task Force (IETF). These documents are numbered sequentially. For example, RFC 1927--"Suggested Additional MIME Types for Associating Documents"--was the 1927th RFC submitted. Some of these RFCs eventually become Internet Standards, in which case they are given sequentially numbered STD names. However, most of them remain RFCs. Even though the RFCs are unofficial, they are the references that people use to learn the details of networking protocols and to validate that a particular implementation is correct.
The RFC archives are mirrored at many locations on the Internet, and maintained in searchable form by several organizations. One of the best archives is maintained at http://www.faqs.org/rfcs/. To retrieve an RFC from this site, go to the indicated page and type the number of the desired RFC in the text field labeled "Display the document by number." The document will be delivered in a minimally HTMLized form. This page also allows you to search for standards documents, and to search the archive by keywords and phrases. If you prefer a text-only form, the www.faqs.org site contains a link to their FTP site, where you can find and download the RFCs in their original form.
Plain Old Documentation
Much of Perl's internal documentation comes in Plain Old Documentation (POD) format. POD files are mostly plain text, with a few markup elements inserted to indicate headings, subheadings, and itemized lists.
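For readers who have not seen POD before, here is a small, made-up script with documentation embedded in it. The script name greet.pl and the greeting() function are invented for illustration. The markup directives shown (=head1, =head2, =over, =item, =back, and =cut) are standard POD.

```perl
use strict;
use warnings;

=head1 NAME

greet.pl - a made-up script used only to illustrate POD markup

=head1 DESCRIPTION

Text between the markup lines is ordinary prose.

=head2 Functions

=over 4

=item greeting($name)

Returns a greeting for $name.

=back

=cut

# When Perl runs the script, it skips everything from the first =head1
# directive down to the =cut line.
sub greeting {
    my ($name) = @_;
    return "Hello, $name";
}

print greeting('world'), "\n";
```

Running the script prints the greeting as usual, while POD-aware tools read only the documentation portion and ignore the code.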
When you installed Perl, the POD documentation was installed as well. The POD files are located in the pod subdirectory of the Perl library directory. You can either read them directly, or use the perldoc script to format and display them in a text pager such as more.
To use perldoc, type the command and the name of the POD file you wish to view. The best place to start is the Perl table of contents, perltoc:
% perldoc perltoc
This will give you a list of other POD pages that you can display.
For a quick summary of a particular Perl function, perldoc accepts the -f flag. For example, to see a summary of the socket() function, type:
% perldoc -f socket
For Macintosh users, the MacPerl distribution comes with a "helper" application called shuck. This adds POD viewing facilities to the MacPerl Help menu.