From f914515cdb80edb1627cb6a6a95ae047e33a65ab Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond"
Date: Tue, 8 Jul 1997 21:23:54 +0000
Subject: Initial revision

svn path=/trunk/; revision=1161

Design Notes On Fetchmail

These notes are for the benefit of future hackers and maintainers. The following sections are both functional and narrative, and are meant to be read from beginning to end.


History

A direct ancestor of the fetchmail program was originally authored (under the name popclient) by Carl Harris. I took over development in June 1996 and subsequently renamed the program `fetchmail' to reflect the addition of IMAP support. In early November 1996 Carl officially ended support for the last popclient versions.

Before accepting responsibility for the popclient sources from Carl, I had investigated and used and tinkered with every other UNIX remote-mail forwarder I could find, including fetchpop1.9, PopTart-0.9.3, get-mail, gwpop, pimp-1.0, pop-perl5-1.2, popc, popmail-1.6 and upop. My major goal was to get a header-rewrite feature like fetchmail's working so I wouldn't have reply problems anymore.

Despite having done a good bit of work on fetchpop1.9, when I found popclient I quickly concluded that it offered the solidest base for future development. I was convinced of this primarily by the presence of multiple-protocol support. The competition didn't do POP2/RPOP/APOP, and I was already having vague thoughts of maybe adding IMAP. (This would advance two other goals: learning IMAP and getting comfortable writing TCP/IP client software.)

Until popclient 3.05 I was simply following out the implications of Carl's basic design. He already had daemon.c in the distribution, and I wanted daemon mode almost as badly as I wanted the header rewrite feature. The other things I added were bug fixes or minor extensions.

After 3.1, when I put in SMTP-forwarding support (more about this below), the nature of the project changed -- it became a carefully thought-out attempt to render obsolete every other program in its class. The name change quickly followed.


The rewrite option

RFC 1123 stipulates that MTAs ought to canonicalize the addresses of outgoing mail so that From:, To:, Cc:, Bcc: and other address headers contain only fully qualified domain names. Failure to do so can break the reply function on many mailers.

This problem only becomes obvious when a reply is generated on a machine different from where the message was delivered. The two machines will have different local username spaces, potentially leading to misrouted mail.

Most MTAs (and sendmail in particular) do not canonicalize address headers in this way (violating RFC 1123). Fetchmail therefore has to do it. This was the first feature I added to the ancestral popclient.
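Here is a minimal Python sketch of the idea. It assumes the simplest possible rule: any address without an `@` gets qualified with the name of the mail server the message was fetched from. The names `qualify_address`, `rewrite_header` and `ADDRESS_HEADERS` are hypothetical. Real RFC 822 parsing (comments, quoted strings, group syntax) is much harder than the naive comma split used here.

```python
# Sketch only: hypothetical names, not fetchmail's actual rewrite code.
# Address headers this sketch rewrites (an illustrative subset).
ADDRESS_HEADERS = {"from", "to", "cc", "bcc", "reply-to"}


def qualify_address(addr: str, server_domain: str) -> str:
    """Append @server_domain to a bare local name; leave qualified ones alone."""
    addr = addr.strip()
    if not addr or "@" in addr:
        return addr
    if addr.endswith(">") and "<" in addr:
        # "Display Name <joe>" form: qualify the part inside the brackets.
        head, _, local = addr[:-1].rpartition("<")
        return f"{head}<{local}@{server_domain}>"
    return f"{addr}@{server_domain}"


def rewrite_header(line: str, server_domain: str) -> str:
    """Canonicalize one header line such as 'To: joe, sue@example.org'."""
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() not in ADDRESS_HEADERS:
        return line
    # Naive comma split: a quoted display name containing a comma would break it.
    parts = [qualify_address(a, server_domain) for a in value.split(",")]
    return f"{name}: " + ", ".join(parts)
```

For example, with the server name `pop.example.com`, the header `To: joe, sue@example.org` becomes `To: joe@pop.example.com, sue@example.org`. A reply generated on another machine then goes back to the right host.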


Reorganization

The second thing I did was reorganize and simplify popclient a lot. Carl Harris's implementation was very sound, but exhibited a kind of unnecessary complexity common to many C programmers. He treated the code as central and the data structures as support for the code. As a result, the code was beautiful but the data structure design was ad hoc and rather ugly (at least to this old LISP hacker).

I was able to improve matters significantly by reorganizing most of the program around the `query' data structure and eliminating a bunch of global context. This especially simplified the main sequence in fetchmail.c and was critical in enabling the daemon mode changes.
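The shape of that change can be sketched in Python: everything one poll needs travels in a per-host record rather than in globals. The `Query` fields, `poll`, `poll_all` and the injected `fetch` callable are hypothetical stand-ins. They do not reproduce the actual layout of fetchmail's query structure.

```python
# Sketch only: hypothetical fields and functions, not fetchmail's struct query.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Query:
    """Everything one poll of one host needs, instead of scattered globals."""
    server: str
    protocol: str
    username: str
    password: str
    keep: bool = False          # leave fetched messages on the server
    fetched: int = 0            # per-host state the poll updates in place
    errors: List[str] = field(default_factory=list)


def poll(q: Query, fetch: Callable[[Query], int]) -> None:
    """Poll one host; all context is read from, and written back to, q."""
    try:
        q.fetched += fetch(q)
    except OSError as exc:
        q.errors.append(str(exc))


def poll_all(queries: List[Query], fetch: Callable[[Query], int]) -> int:
    """Main sequence: the same code runs once per host, cycle after cycle."""
    for q in queries:
        poll(q, fetch)
    return sum(q.fetched for q in queries)
```

`poll` reads and writes only its own record. A daemon can therefore rerun the same main sequence over the same list of queries on every cycle without resetting any global state.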


IMAP support and the method table

The next step was IMAP support. I initially wrote the IMAP code as a generic query driver and a method table. The idea was to have all the protocol-independent setup logic and flow of control in the driver, and the protocol-specific stuff in the method table.

Once this worked, I rewrote the POP3 code to use the same organization. The POP2 code kept its own driver for a couple more releases, until I found sources of a POP2 server to test against (the breed seems to be nearly extinct).

The purpose of this reorganization, of course, is to trivialize the development of support for future protocols as much as possible. All mail-retrieval protocols have to have pretty similar logical design by the nature of the task. By abstracting out that common logic and its interface to the rest of the program, both the common and protocol-specific parts become easier to understand.

Furthermore, many kinds of new features can instantly be supported across all protocols by modifying the one driver module.
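Below is a minimal Python sketch of the split between driver and method table. It assumes each protocol hook simply returns the command lines it would send; server replies are not modeled. The `Method` fields, `do_protocol` and the IMAP tag numbering are illustrative, not fetchmail's actual method table.

```python
# Sketch only: a hypothetical method table, not fetchmail's driver.c.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Method:
    """Protocol-specific hooks; each returns the command lines to send."""
    name: str
    login: Callable[[str, str], List[str]]
    fetch: Callable[[int], List[str]]
    logout: Callable[[], List[str]]


def do_protocol(method: Method, user: str, password: str,
                message_count: int) -> List[str]:
    """Generic driver: owns the flow of control, delegates the wire syntax."""
    transcript: List[str] = []
    transcript += method.login(user, password)
    for n in range(1, message_count + 1):
        transcript += method.fetch(n)
    transcript += method.logout()
    return transcript


POP3 = Method(
    name="POP3",
    login=lambda u, p: [f"USER {u}", f"PASS {p}"],
    fetch=lambda n: [f"RETR {n}"],
    logout=lambda: ["QUIT"],
)

IMAP = Method(
    name="IMAP",
    login=lambda u, p: [f"a001 LOGIN {u} {p}", "a002 SELECT INBOX"],
    fetch=lambda n: [f"a{n + 2:03d} FETCH {n} RFC822"],
    logout=lambda: ["a999 LOGOUT"],
)
```

Supporting another protocol means writing one more table. A change to `do_protocol`, such as a per-poll message limit, applies to every protocol at once.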


Implications of SMTP forwarding

The direction of the project changed radically when Harry Hochheiser sent me his scratch code for forwarding fetched mail to the SMTP port. I realized almost immediately that a reliable implementation of this feature would make all the other delivery modes obsolete.

Why mess with all the complexity of configuring an MDA or setting up lock-and-append on a mailbox when port 25 is guaranteed to be there on any platform with TCP/IP support in the first place? Especially when this means retrieved mail is guaranteed to look like normal sender-initiated SMTP mail, which is really what we want anyway.

Clearly, the right thing to do was (1) hack SMTP forwarding support into the generic driver, (2) make it the default mode, and (3) eventually throw out all the other delivery modes.
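To make the forwarding path concrete, here is a Python sketch of the client half of an RFC 821 SMTP transaction for one fetched message. It only builds the command lines. It does not open port 25 or wait for replies, and `smtp_dialogue` is a hypothetical name. The one subtle step is transparency: a body line that starts with a period gets a second period, so it cannot be mistaken for the end-of-data marker.

```python
# Sketch only: builds the command lines; no socket I/O, no reply handling.
from typing import List


def smtp_dialogue(helo_host: str, sender: str, recipients: List[str],
                  body_lines: List[str]) -> List[str]:
    """Return the client side of an RFC 821 SMTP transaction, line by line."""
    lines = [f"HELO {helo_host}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{r}>" for r in recipients]
    lines.append("DATA")
    for line in body_lines:
        # Transparency (RFC 821, section 4.5.2): double a leading period so
        # that a body line consisting of "." cannot end the message early.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")       # end-of-data marker
    lines.append("QUIT")
    return lines
```

A real client would also wait for the server's `354` reply after `DATA` before sending the body, and would check every reply code.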

I hesitated over step 3 for some time, fearing to upset long-time popclient users dependent on the alternate delivery mechanisms. In theory, they could immediately switch to .forward files or their non-sendmail equivalents to get the same effects. In practice the transition might have been messy.

But when I did it (see the NEWS note on the great options massacre) the benefits proved huge. The cruftiest parts of the driver code vanished. Configuration got radically simpler -- no more grovelling around for the system MDA and user's mailbox, no more worries about whether the underlying OS supports file locking.

Also, the only way fetchmail could lose mail vanished. If you specified localfolder and the disk got full, your mail got lost. This can't happen with SMTP forwarding because your SMTP listener won't return OK unless the message can be spooled or processed.
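The safety argument rests on one rule: delete the message from the server only after the listener has accepted it. The Python sketch below shows that check. It assumes the reply lines to the end of DATA have already been read, and the function names are hypothetical. In a multi-line SMTP reply, continuation lines look like `250-...` and the last line looks like `250 ...`, so only the final line's code counts.

```python
# Sketch only: hypothetical helpers for deciding when deletion is safe.
from typing import List


def final_reply_code(reply_lines: List[str]) -> int:
    """Return the numeric code of a (possibly multi-line) SMTP reply."""
    last = reply_lines[-1]
    if len(last) < 3 or not last[:3].isdigit() or (len(last) > 3 and last[3] == "-"):
        raise ValueError(f"malformed or incomplete reply: {last!r}")
    return int(last[:3])


def may_delete_from_server(reply_after_data: List[str], keep: bool = False) -> bool:
    """Only a 2xx reply to the end of DATA means the listener took the message."""
    return not keep and 200 <= final_reply_code(reply_after_data) < 300
```

A `4xx` reply (for example, a full spool) or a `5xx` reply leaves the message on the server for the next poll. That is the recovery path that delivery to a local folder on a full disk could not provide.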

Also, performance improved (though not so you'd notice it in a single run). Another not insignificant benefit of this change was that the manual page got a lot simpler.

Later, I had to bring --mda back in order to allow handling of some obscure situations involving dynamic SLIP. But I found a much simpler way to do it.

The moral? Don't hesitate to throw away superannuated features when you can do it without loss of effectiveness. I tanked a couple I'd added myself and have no regrets at all. As Saint-Exupéry said, "Perfection [in design] is achieved not when there is nothing more to add, but rather when there is nothing more to take away." This program isn't perfect, but it's trying.


The most-requested features that I will never add, and why not:


1. Password encryption in .fetchmailrc

The reason there's no facility to store passwords encrypted in the .fetchmailrc file is that this doesn't actually add protection.

Anyone who has acquired the access needed to read your .fetchmailrc file (which should be mode 0600, readable and writable only by you) will be able to run fetchmail as you anyway -- and if it's your password they're after, they'd be able to rip the necessary decoder out of the fetchmail code itself to get it.

All .fetchmailrc encryption would do is give a false sense of security to people who don't think very hard.
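The argument rests on file permissions rather than encryption, so the check that matters is whether anyone other than the owner can read the file. Here is a minimal Python sketch, assuming POSIX permission bits. `rcfile_is_private` is a hypothetical helper, not something this document says fetchmail provides.

```python
# Sketch only: a hypothetical permission check, assuming POSIX mode bits.
import os
import stat


def rcfile_is_private(path: str) -> bool:
    """True if no group or other permission bits are set (e.g. mode 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```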


2. Truly concurrent queries to multiple hosts

Occasionally I get a request for this on "efficiency" grounds. These people aren't thinking either. True concurrency would do nothing to lessen fetchmail's total IP volume. The best it could possibly do is change the usage profile to shorten the duration of the active part of a poll cycle at the cost of increasing its demand on IP volume per unit time.

If one could thread the protocol code so that fetchmail didn't block while waiting for a protocol response, but instead switched to processing another host query, one might get an efficiency gain (close to constant loading at the single-host level).

Fortunately, I've only seldom seen a server that incurred significant wait time on an individual response. I judge the gain from this not worth the hideous complexity increase it would require in the code.
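The volume argument is simple arithmetic. The Python sketch below assumes three hosts that each send 100,000 bytes in 10 seconds. It also assumes that polling them together does not slow any one of them down, which is optimistic on a shared link. `poll_profile` is a hypothetical helper.

```python
# Sketch only: idealized arithmetic, ignoring link contention.
from typing import List, Tuple


def poll_profile(transfers: List[Tuple[int, float]],
                 concurrent: bool) -> Tuple[int, float, float]:
    """Summarize one poll cycle.

    transfers: (bytes, seconds) for each host, assuming a host's transfer
    takes the same time whether or not other hosts are polled alongside it.
    Returns (total_bytes, wall_clock_seconds, average_bytes_per_second)."""
    total = sum(nbytes for nbytes, _ in transfers)
    if concurrent:
        wall = max(secs for _, secs in transfers)   # all hosts at once
    else:
        wall = sum(secs for _, secs in transfers)   # one host after another
    return total, wall, total / wall
```

Polled one after another, the three hosts deliver 300,000 bytes over 30 seconds (10,000 bytes/s). Polled concurrently, they deliver the same 300,000 bytes over 10 seconds (30,000 bytes/s). The total volume never changes; only the peak rate goes up.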


Multidrop and alias handling

I decided to add the multidrop support partly because some users were clamoring for it, but mostly because I thought it would shake bugs out of the single-drop code by forcing me to deal with addressing in full generality. And so it proved.

There are three important aspects of the features for handling multiple-drop aliases and mailing lists which future hackers should be careful to preserve.


  1. The logic path for single-recipient mailboxes doesn't involve header parsing or DNS lookups at all. This is important -- it means the code for the most common case can be much simpler and more robust.


  2. The multidrop handling does not rely on doing the equivalent of passing the message to sendmail -oem -t. Instead, it explicitly mines members of a specified set of local usernames out of the header.


  3. We do not attempt delivery to multidrop mailboxes in the presence of DNS errors. Before each multidrop poll we probe DNS to see if we have a nameserver handy. If not, the poll is skipped. If DNS crashes during a poll, the error return from the next nameserver lookup aborts message delivery and ends the poll. The daemon mode will then quietly spin until DNS comes up again, at which point it will resume delivering mail.


When I designed this support, I was terrified of doing anything that could conceivably cause a mail loop (you should be too). That's why the code as written can only append local names (never @-addresses) to the recipients list.
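Here is a Python sketch of mining local recipients out of address headers. The real code needs DNS to decide whether a domain is the mail server; this sketch assumes that question has already been answered by a fixed set of domain names. `multidrop_recipients` and its parameters are hypothetical. The loop-safety property from the paragraph above is structural here: the function can only ever return names drawn from `local_names`.

```python
# Sketch only: a fixed domain set stands in for the DNS checks the real code needs.
from typing import Iterable, List, Set


def multidrop_recipients(header_addresses: Iterable[str],
                         mailserver_domains: Set[str],
                         local_names: Set[str]) -> List[str]:
    """Mine local recipients out of address headers for a multidrop box.

    Only bare names from local_names are ever returned, never '@'
    addresses, so the result cannot point mail back off-site."""
    found: List[str] = []
    for addr in header_addresses:
        user, at, domain = addr.strip().lower().partition("@")
        # An address counts only if it is unqualified or in one of our domains.
        if at and domain not in mailserver_domains:
            continue
        if user in local_names and user not in found:
            found.append(user)
    return found
```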

The code in mxget.c is nasty, no two ways about it. But it's utterly necessary, because there are a lot of MX pointers out there. It really ought to be a (documented!) entry point in the bind library.


DNS error handling

Fetchmail's behavior on DNS errors is to suppress forwarding and deletion of the individual message that each occurs in, leaving it queued on the server for retrieval on a subsequent poll. The assumption is that DNS errors are transient, due to temporary server outages.

Unfortunately, this means that if a DNS error is permanent, a message can be perpetually stuck in the server mailbox. We've had a couple of bug reports of this kind, due to subtle RFC822 parsing errors in the fetchmail code that resulted in impossible things getting passed to the DNS lookup routines.

There are alternative ways to handle the problem: ignore DNS errors (treating them as a non-match on the mailserver domain), or forward messages with errors to fetchmail's invoking user in addition to any other recipients. These would fit an assumption that DNS lookup errors are likely to be permanent problems associated with an address.
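The current behavior and the two alternatives can be compared side by side as policies applied to one fetched message. This Python sketch assumes the caller has already collected the local recipients that matched without error. `DnsPolicy`, `dispose` and the invoking-user parameter are hypothetical names.

```python
# Sketch only: hypothetical policy switch, not fetchmail's actual code.
from enum import Enum
from typing import List, Tuple


class DnsPolicy(Enum):
    KEEP_ON_SERVER = "keep"       # current behavior: assume the error is transient
    TREAT_AS_NONMATCH = "ignore"  # alternative 1: the failing address isn't ours
    ALSO_TO_INVOKER = "forward"   # alternative 2: also deliver to the invoking user


def dispose(recipients: List[str], dns_error: bool, policy: DnsPolicy,
            invoking_user: str) -> Tuple[List[str], bool]:
    """Return (deliver_to, delete_from_server) for one fetched message."""
    if not dns_error:
        return recipients, True
    if policy is DnsPolicy.KEEP_ON_SERVER:
        return [], False          # no forwarding, no deletion: retry next poll
    if policy is DnsPolicy.TREAT_AS_NONMATCH:
        return recipients, True   # deliver to whoever matched cleanly
    deliver = list(recipients)
    if invoking_user not in deliver:
        deliver.append(invoking_user)
    return deliver, True
```

Only the first policy can leave a message stuck forever. The other two trade that risk for occasionally delivering a message to the wrong place or to an extra mailbox.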


Lessons learned


1. Server-side state is essential

The person(s) responsible for removing LAST from POP3 deserve to suffer. Without it, a client has no way to know which messages in a box have been read by other means, such as an MUA running on the server.

The POP3 UID feature described in RFC1725 to replace LAST is insufficient. The only problem it solves is tracking which messages have been read by this client -- and even that requires tricky, fragile implementation.
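A Python sketch of UID-based tracking shows both what it can and cannot do. It assumes the server's `UIDL` listing has been parsed into (message number, unique id) pairs, and that the client saves the ids it has seen between polls. It also assumes every fetch succeeds. `new_messages` is a hypothetical helper.

```python
# Sketch only: client-side bookkeeping a UIDL-based client has to do itself.
from typing import List, Set, Tuple


def new_messages(server_uidl: List[Tuple[int, str]],
                 seen_uids: Set[str]) -> Tuple[List[int], Set[str]]:
    """Pick messages this client has not fetched before.

    Returns (numbers_to_fetch, ids_to_remember_for_next_poll)."""
    to_fetch = [num for num, uid in server_uidl if uid not in seen_uids]
    # Forget ids no longer on the server, or the saved list grows forever.
    remember = {uid for _, uid in server_uidl}
    return to_fetch, remember
```

Note the blind spot. A message already read by an MUA on the server still looks new, because the client knows only its own history.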

The underlying lesson is that maintaining accessible server-side `seen' state bits associated with Status headers is indispensable in a Unix/RFC822 mail server protocol. IMAP gets this right.


2. Readable text protocol transactions are a Good Thing

A nice thing about the general class of text-based protocols that SMTP, POP2, POP3, and IMAP belong to is that client/server transactions are easy to watch and transaction code correspondingly easy to debug. Given a decent layer of socket utility functions (which Carl provided), it's easy to write protocol engines and not hard to show that they're working correctly.
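Part of why such protocols are easy to debug is that every reply shows its status in its first few characters. Here is a minimal Python sketch for POP3 status lines, which begin with `+OK` or `-ERR`. `pop3_status` is a hypothetical helper.

```python
# Sketch only: classify one POP3 status line.
from typing import Tuple


def pop3_status(line: str) -> Tuple[bool, str]:
    """Split a POP3 status line into (ok, text)."""
    line = line.rstrip("\r\n")
    if line.startswith("+OK"):
        return True, line[3:].lstrip()
    if line.startswith("-ERR"):
        return False, line[4:].lstrip()
    raise ValueError(f"not a POP3 status line: {line!r}")
```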

This is an advantage not to be despised! Because of it, this project has been interesting and fun -- no serious or persistent bugs, no long hours spent looking for subtle pathologies.


3. IMAP is a Good Thing

If there were a standard IMAP equivalent of the POP3 APOP validation, POP3 would be completely obsolete.


4. SMTP is the Right Thing

In retrospect it seems clear that this program (and others like it) should have been designed to forward via SMTP from the beginning. This lesson may be applicable to other Unix programs that now call the local MDA/MTA as a program.


5. Syntactic noise can be your friend

The optional `noise' keywords in the rc file syntax started out as a late-night experiment. The English-like syntax they allow is considerably more readable than the traditional terse keyword-value pairs you get when you strip them all out. I think there may be a wider lesson here.
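Here is a Python sketch of how noise keywords can cost the parser nothing: the lexer drops them before the grammar ever sees the tokens. The noise-word set and separator characters below are assumptions for illustration, not a specification of fetchmail's rc grammar. A real lexer would also have to respect quoted strings, so that a password containing `and` survives.

```python
# Sketch only: NOISE and the separators are assumed, not fetchmail's grammar.
from typing import List

NOISE = {"and", "with", "has", "wants", "options"}   # assumed noise-word set


def significant_tokens(line: str) -> List[str]:
    """Split an rc-file line and drop noise words and separator punctuation."""
    for sep in ",;:":
        line = line.replace(sep, " ")
    return [t for t in line.split() if t.lower() not in NOISE]
```

Both spellings of the same poll entry reduce to the same token stream. That is why the English-like form costs nothing in the grammar.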


Motivation and validation

It is truly written: the best hacks start out as personal solutions to the author's everyday problems, and spread because the problem turns out to be typical for a large class of users. So it was with Carl Harris and the ancestral popclient, and so with me and fetchmail.

It's gratifying that fetchmail has become so popular. Until just before 1.9 I was designing strictly to my own taste. The multi-drop mailbox support and the new --limit option were the first features to go in that I didn't need myself.

By 1.9, four months after I started hacking on popclient and a month after the first fetchmail release, there were literally a hundred people on the fetchmail-friends contact list. That's pretty powerful motivation. And they were a good crowd, too, sending fixes and intelligent bug reports in volume. A user population like that is a gift from the gods, and this is my expression of gratitude.

The beta testers didn't know it at the time, but they were also the subjects of a sociological experiment. The results are described in my paper, The Cathedral And The Bazaar, available on the Fetchmail home page.

Credits

Special thanks go to Carl Harris, who built a good solid code base and then tolerated me hacking it out of recognition. And to Harry Hochheiser, who gave me the idea of the SMTP-forwarding delivery mode.

Other significant contributors to the code have included Dave Bodenstab (error.c code and --syslog), George Sipe (--monitor and --interface), Gordon Matzigkeit (netrc.c), Al Longyear (UIDL support), and Nalin Dahyabhai (Kerberos V4 support).


Conclusion

At this point, the fetchmail code appears to be pretty stable. It will probably undergo substantial change only if and when support for a new retrieval protocol or authentication method is added.


Relevant RFCs

Not all of these describe standards explicitly used in fetchmail, but they all shaped the design in one way or another.


RFC821   SMTP protocol
RFC822   Mail header format
RFC937   Post Office Protocol - Version 2
RFC974   MX routing
RFC976   UUCP mail format
RFC1081  Post Office Protocol - Version 3
RFC1123  Host requirements (modifies 821, 822, and 974)
RFC1176  Interactive Mail Access Protocol - Version 2
RFC1203  Interactive Mail Access Protocol - Version 3
RFC1225  Post Office Protocol - Version 3
RFC1344  Implications of MIME for Internet Mail Gateways
RFC1413  Identification server
RFC1428  Transition of Internet Mail from Just-Send-8 to 8-bit SMTP/MIME
RFC1460  Post Office Protocol - Version 3
RFC1521  MIME: Multipurpose Internet Mail Extensions
RFC1652  SMTP Service Extension for 8bit-MIMEtransport
RFC1725  Post Office Protocol - Version 3
RFC1730  Interactive Mail Access Protocol - Version 4
RFC1731  IMAP4 Authentication Mechanisms
RFC1732  IMAP4 Compatibility With IMAP2 And IMAP2bis
RFC1734  POP3 AUTHentication command
RFC1869  SMTP Service Extensions (ESMTP spec)
RFC1870  SMTP Service Extension for Message Size Declaration
RFC1891  SMTP Service Extension for Delivery Status Notifications
RFC1893  Enhanced Mail System Status Codes
RFC1894  An Extensible Message Format for Delivery Status Notifications
RFC1939  Post Office Protocol - Version 3
RFC1985  SMTP Service Extension for Remote Message Queue Starting
RFC2060  Internet Message Access Protocol - Version 4rev1
RFC2061  IMAP4 Compatibility With IMAP2bis
RFC2062  Internet Message Access Protocol - Obsolete Syntax


Eric S. Raymond <esr@snark.thyrsus.com>