Figure 23.1: INN architecture (simplified for clarity) When (Virtual web hosting)
Figure 23.1: INN architecture (simplified for clarity) When receiving an article, innd first looks up its message ID in the history file. Duplicate articles are dropped and the occurrences are optionally logged. The same goes for articles that are too old or lack some required header field, such as Subject:.136 If innd finds that the article is acceptable, it looks at the Newsgroups: header line to find out what groups it has been posted to. If one or more of these groups are found in the active file, the article is filed to disk. Otherwise, it is filed to the special group junk. Individual articles are kept below /var/spool/news, also called the news spool. Each newsgroup has a separate directory, in which each article is stored in a separate file. The file names are consecutive numbers, so that an article in comp.risks may be filed as comp/risks/217, for instance. When innd finds that the directory it wants to store the article in does not exist, it creates it automatically. Apart from storing articles locally, you may also want to pass them on to outgoing feeds. This is governed by the newsfeeds file that lists all downstream sites along with the newsgroups that should be fed to them. Just like innd’s receiving end, the processing of outgoing news is handled by a single interface, too. Instead of doing all the transport-specific handling itself, innd relies on various backends to manage the transmission of articles to other news servers. Outgoing facilities are collectively dubbed channels. Depending on its purpose, a channel can have different attributes that determine exactly what information innd passes on to it. For an outgoing NNTP feed, for instance, innd might fork the innxmit program at startup, and, for each article that should be sent across that feed, pass its message ID, size, and filename to innxmit’s standard input. For an outgoing UUCP feed, on the other hand, it might write the article’s size and file name to a special logfile, which is head by a different process at regular intervals in order to create batches and queue them to the UUCP subsystem. Besides these two examples, there are other types of channels that are not strictly outgoing feeds. These are used, for instance, when archiving certain newsgroups, or when generating overview information. Overview information is intended to help newsreaders thread articles more efficiently. Old-style newsreaders had to scan all articles separately in order to obtain the header information required for threading. This would put an immense strain on the server machine, especially when using NNTP; furthermore, it was very slow.137 The overview mechanism alleviates this problem by prerecording all relevant headers in a separate file (called .overview) for each newsgroup. This information can then be picked up by newsreaders either by reading it directly from the spool directory, or by using the XOVER command when connected via NNTP. INN has the innd daemon feed all articles to the overchan command, which is attached to the daemon through a channel. We’ll see how this is done when we discuss configuring news feeds later. 136 This is indicated by the Date: header field; the limit is usually two weeks. 137 Threading 1,000 articles when talking to a loaded server could easily take around five minutes, which only the most dedicated Usenet addict would find acceptable.