Why does a message get split into multiple messages with no headers?

If you are processing UUCP mailbox files, messages are separated by a line starting with "From " (i.e., the word "From" followed by a space). Some mail software prefixes body lines that begin with "From " with a `>' to keep MUAs from incorrectly treating them as message separators. However, some mail software does not.

To avoid incorrect separator detection, many MUAs apply a stricter test for separators than a leading "From ". MHonArc, by default, treats any line starting with "From " as a message separator, which can lead to incorrect message termination if a "From " line in a message body has not been escaped with a `>'.

To fix the problem, use the MSGSEP resource to instruct MHonArc to use a stricter test for detecting a message separator. The following MSGSEP resource setting is known to work well:

^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+
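To see why the stricter pattern helps, here is a minimal Python sketch (not part of MHonArc) that applies both tests to two sample lines. The function name is_separator and the sample lines are invented for illustration; only the regular expression is taken from the MSGSEP setting above.

```python
import re

# Stricter separator pattern, copied from the MSGSEP setting above.
STRICT_SEP = re.compile(r"^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+")
# Loose test: any line that begins with "From ".
LOOSE_SEP = re.compile(r"^From ")


def is_separator(line, strict=True):
    """Return True if the line passes the chosen separator test."""
    pattern = STRICT_SEP if strict else LOOSE_SEP
    return pattern.match(line) is not None


# Hypothetical sample lines, invented for illustration.
samples = [
    "From ehood@example.org Sat Oct  3 15:44:56 1998",  # a real separator
    "From here on, the build went smoothly.",           # unescaped body text
]

for line in samples:
    print(is_separator(line, strict=False), is_separator(line), line)
```

The loose test accepts both lines, while the stricter pattern accepts only the line that carries an address, a date, and a time.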

If this fails, you can try the CONLEN resource, available in v2.0 and later. The CONLEN resource, when set, tells MHonArc to use the Content-Length fields in the message header. Use this feature only if your MTA sets this field accurately.
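The idea behind Content-Length based splitting can be sketched in Python as follows. This is only an illustration of the general technique, not MHonArc's actual code: the function split_mbox and the sample mailbox are hypothetical, and the sketch counts characters rather than bytes, which is equivalent only for plain ASCII input.

```python
def split_mbox(text):
    """Split mbox text into messages, trusting Content-Length when present."""
    lines = text.splitlines(keepends=True)
    messages, i = [], 0
    while i < len(lines):
        if not lines[i].startswith("From "):
            i += 1  # skip blank or stray lines between messages
            continue
        start = i
        i += 1
        length = None
        # Read header lines up to the first blank line.
        while i < len(lines) and lines[i].strip():
            name, _, value = lines[i].partition(":")
            if name.lower() == "content-length":
                length = int(value.strip())
            i += 1
        i += 1  # step past the blank line that ends the header
        if length is not None:
            # Consume the stated amount of body text, ignoring "From " lines.
            consumed = 0
            while i < len(lines) and consumed < length:
                consumed += len(lines[i])
                i += 1
        else:
            # No Content-Length: fall back to the next "From " line.
            while i < len(lines) and not lines[i].startswith("From "):
                i += 1
        messages.append("".join(lines[start:i]))
    return messages


# Hypothetical two-message mailbox; the first body has an unescaped "From ".
sample = (
    "From a@example.org Sat Oct  3 15:44:56 1998\n"
    "Subject: one\n"
    "Content-Length: 24\n"
    "\n"
    "From now on it is fine.\n"
    "\n"
    "From b@example.org Sat Oct  3 15:45:10 1998\n"
    "Subject: two\n"
    "\n"
    "Second body.\n"
)

print(len(split_mbox(sample)))
```

Because the first message states its body length, the unescaped "From now on" line stays inside that message instead of starting a new one. An inaccurate Content-Length would misplace the boundary, which is why the field must be reliable.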

Can I move a message from one archive to another?

No. In order to achieve the same effect, you must add the original, unprocessed, message to the destination archive, then remove the appropriate HTML version of the message from the source archive.

Can I reconstruct a database from the HTML messages?

Yes. v2.3 of MHonArc introduced a utility program called mha-dbrecover. It gets installed with the other MHonArc files during the installation process. See the documentation for usage information.

Is it safe to add messages to an archive as they are received?

Yes. MHonArc performs archive locking to prevent multiple MHonArc processes from writing to an archive at the same time. This locking allows MHonArc to be used safely to add messages as they are received.


As an archive increases in size, performing updates as each message is received takes more processing time. Therefore, for large archives, you may need to do updates through a periodic batch process (for example, via cron(8)) to avoid time-out problems from MHonArc.

So it is safe. How do I do it?

Many users use Procmail <URL:http://www.ii.com/internet/robots/procmail/> to call MHonArc to archive messages. Procmail provides the ability to preprocess mail as it arrives to do selective processing and automated tasks with your mail.

For illustrative purposes, the following simple example shows a possible way of archiving messages as they arrive without using a tool like Procmail. This example assumes you are on a Unix-based system using sendmail as the mail transfer agent. Please refer to documentation about sendmail if you are not familiar with it (sendmail, 2nd edition, from O'Reilly is an excellent source).

The approach shown here uses a .forward file in the home directory of the account whose mail you want archived. For this example, let's assume it is my account. Here is how to set up the .forward file to invoke MHonArc on incoming mail:

\ehood, "|/home/ehood/bin/webnewmail #ehood"

NOTE on .forward entry:

The "\ehood" tells sendmail to still deposit the incoming message in my mail spool file. The "#ehood" Bourne shell comment is needed to ensure the command differs from that of any other user. Otherwise, sendmail may not invoke the program for you or the other user.

webnewmail is a Perl program that calls MHonArc with the appropriate arguments. A wrapper program is used instead of calling MHonArc directly to keep the .forward file simple, but you can call MHonArc directly if you want. Here is the code to the webnewmail program:

#!/usr/local/bin/perl
# Edit above path to point to where perl is on your system
# (/usr/local/bin/perl is only an example location).

##	Specify a package to protect names from MHonArc.

package WebNewMail;

##	Edit to point to installed mhonarc.

$MHonArc = "/home/ehood/bin/mhonarc";

##	Define ARGV (ARGV is same across all packages).
##	Edit options as required/desired.

@ARGV = ("-add",
	 "-outdir", "/home/ehood/public_html/newmail");

##	Just require mhonarc, this prevents the overhead of a
##	fork/exec.

require $MHonArc;

The webnewmail program has to have the executable bit set. This is achieved by using "chmod a+x webnewmail".

How can I do it with Majordomo lists?

Here is a template for archiving messages as they arrive for a Majordomo list; these entries go in your sendmail aliases file (for example, /etc/aliases):

xxxx:                "|/usr/lib/majordomo/wrapper resend -l xxxx xxxx-outgoing"
xxxx-outgoing:       :include:/var/lib/majordomo/lists/xxxx, xxxx-mhonarc
xxxx-request:        list-admin-address
owner-xxxx:          list-admin-address
xxxx-owner:          list-admin-address

xxxx-mhonarc:        "|/usr/lib/majordomo/wrapper mhonarc -add -quiet -outdir /home/httpd/html/yyyyyyy -rcfile rcs.mrc" 

Replace the placeholders (such as xxxx, yyyyyyy, and list-admin-address) with what is appropriate for your configuration.

Can I get MHonArc to filter messages to different archives?

No. This is outside of MHonArc's scope. You can write your own filter, using the method described in the previous question, to scan the message header and invoke MHonArc with the proper arguments. Or you can use a tool like Procmail <URL:http://www.ii.com/internet/robots/procmail/>. Here are some messages from users about using Procmail:

... some text deleted ...

Here is what I use in .procmailrc to archive the mhonarc list:

NEWDATE="`/usr/bin/date +%Y-%m`"
* ^Sender:.*owner-mhonarc@
        :0 c

        :0 c
        | /local/mail/mhonarc-1.2.2/mailarchive -add mhonarc "$NEWDATE"

Mailarchive is nothing more than a wrapper around mhonarc with my long
list of options.

P.S. Procmail itself comes with an example manual page. It's worth
     looking into it.

You can actually dispense with the wrapper if you use environment
variables to pass options to MHonArc, but I'm sure Achim has a good
reason for doing it his way.  Just for the purposes of comparison,
here's how I do it:

eeeweb% cat .procmailrc
#Set on when debugging
#Replace `mail' with your mail directory (Pine uses mail, Elm uses Mail)
#Directory for storing procmail log and rc files
#Path and options for mhonarc
MHONARC='/dcs/packages/infosys/bin/mhonarc -add -quiet -umask 022 -idxfname inde
* ^Originator:.*@classes.uci.edu
:0 E

and then in the file .procmail/rc.classlists or rc.otherlists (depending
on the Originator: of the message), lots of the following:

# Procmail Entry for uci-www
:0 E
* ^TOuci-www
  :0 c

  |$MHONARC -rcfile $MHHOME/uci-www/0-rcfile.html -outdir $MHHOME/uci-www

Eric D. Friedman
... some text deleted ...

I use procmail to drive mhonarc archives from Majordomo.  I set up a
single pseudouser and drive several archives from the one pseudouser. 

Here's a sample .forward file:

"|/usr/ucb/rsh cappuccino \"set IFS=' '; exec
/usr/local/procmail/bin/procmail #widget\""

Another example is:

"|/bin/csh -c \"set IFS=' '; exec /usr/local/procmail/bin/procmail

Two reasons to use the "rsh cappuccino":
1. doesn't require the user to be able to login to server, although
   the username must still be valid
2. gets the processing load off the mail server

Here's an example .procmail recipe:


# widget: list short description
:0 H
* ^List-Name: widget
  # The rotate call (under construction) does archive rotation
  # leave commented!
  #:0c i
  #| /home/web-arch/bin/rotate /usr/local/web/webarchive/widget

  # Put the mail in the mailbox, which is used by archiver to re-generate
  # the html indexes
  :0 cA

  # The mhonarc call examines mbox, turns the mail messages into .html
  # documents, and compiles the indexes.
  # -reverse -treverse\
  :0 ia
  | /usr/local/mhonarc/bin/mhonarc \
    -idxfname index.shtml \
    -tidxfname threads.shtml \
    -rcfile widget.rc\
    -outdir /usr/local/web/webarchive/widget/current \


I have a directory per archive, and put the current period in directory
"current".  Then I have an index page per archive that indexes the
periods, plus gives information about the list and how to
subscribe/unsubscribe.  The widget.rc file resides in the pseudouser's
home directory.

Note the condition

* ^List-Name: widget

in the recipe. I put the following in the majordomo list's config file:

message_headers   <<  END
List-Name: widget
END

This adds the "List-Name" header to messages, which is what procmail
filters for.

Hope this helps

Paul McKinley
Unix SysAdmin Contractor

Does MHonArc support the "no archive" flag in messages?

No. However, you can use a pre-processor like Procmail to do the filtering. Here is a message sent to the MHonArc mailing list:

> Subscribers who don't want their messages to be archived
> could add a "no archive" flag within their mail.

The most common way to do this is by checking for the existence
of an 'X-no-archive: yes' or 'Restrict: no-external-archive' header.

> As I'm invoking MHonArc through a procmail recipe I guess
> it's possible to do this within the recipe.

Very easy:

   # If people don't want to be archived, then remove their
   # message
   :0
   * ^(X-no-archive: yes|Restrict: no-external-archive)
   /dev/null
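For anyone filtering with their own script instead of Procmail, the same header test can be sketched in Python using the standard email package. The function name wants_no_archive is hypothetical; the two header names and values are the ones mentioned in the message above.

```python
from email import message_from_string


def wants_no_archive(raw_message):
    """Return True if the header carries one of the two opt-out fields."""
    msg = message_from_string(raw_message)
    # Header names are matched case-insensitively by the email package.
    x_no_archive = (msg.get("X-no-archive") or "").strip().lower()
    restrict = (msg.get("Restrict") or "").strip().lower()
    return (x_no_archive.startswith("yes")
            or restrict.startswith("no-external-archive"))


# Hypothetical messages, invented for illustration.
opted_out = "X-No-Archive: Yes\nSubject: private\n\nPlease do not archive.\n"
normal = "Subject: public\n\nX-no-archive: yes appears only in the body.\n"

print(wants_no_archive(opted_out), wants_no_archive(normal))
```

A wrapper script could run a check like this first and simply skip calling MHonArc when it returns True. Only the header is examined, so the phrase appearing in a message body has no effect.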


Is it safe to specify -add when no archive exists?

Yes. If MHonArc sees that no archive exists when performing an add, it will automatically create the archive.


Make sure the file maillist.html (or the value of the IDXFNAME resource) does not exist if no archive exists and -add has been specified. Otherwise, unpredictable output of the maillist.html file may result if maillist.html is not in the proper format.

Why are there "jumps" in message numbers?

Big gaps in the message number sequence may occur if you have defined the MAXSIZE resource and you have MHonArc rescan a mail folder to add new messages. The problem occurs when MHonArc reads in messages that will automatically get deleted due to MAXSIZE. The messages subject to automatic deletion are the oldest ones. If the input contains old messages that will get deleted at the end of processing, the old messages still use up message numbers, since the messages to be deleted are not determined until all input is read. Since MHonArc does not keep information about deleted messages, if the messages are fed into MHonArc again, the "jumping" will occur again (and the jump will get larger with each additional update).
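The effect can be reproduced with a small toy model in Python. This is a simplified simulation of the behavior described above, not MHonArc's code: the update function, the tuple layout, and the starting number 0 are assumptions made for illustration.

```python
def update(archive, next_num, folder, maxsize):
    """Toy archive update: number unseen messages, then trim to maxsize."""
    for date, msgid in folder:
        if msgid not in archive:
            # Every unseen message consumes a number, even if it is
            # old enough to be deleted at the end of this run.
            archive[msgid] = (date, next_num)
            next_num += 1
    # Deletion is decided only after all input has been read.
    excess = max(0, len(archive) - maxsize)
    oldest_first = sorted(archive, key=lambda m: archive[m][0])
    for msgid in oldest_first[:excess]:
        del archive[msgid]
    return next_num


archive, next_num = {}, 0
folder = [(d, "m%d" % d) for d in range(1, 6)]  # five messages, oldest first
history = []
for new_date in (None, 6, 7):
    if new_date is not None:
        folder.append((new_date, "m%d" % new_date))  # one new message arrives
    # The whole folder is rescanned on every run.
    next_num = update(archive, next_num, folder, maxsize=3)
    history.append(sorted(num for _, num in archive.values()))
    print(history[-1])
```

In this model the surviving message numbers are 2, 3, 4 after the first run, then 3, 4, 7, then 4, 7, 11: each rescan re-reads the previously deleted messages, burns a number on each, and widens the gap.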

To avoid the problem, try to pass only new, never-processed messages to MHonArc instead of having MHonArc rescan the same mail folder for new messages. Another approach is to set either the EXPIREAGE or EXPIREDATE resource (available in v2.0 beta 2 or later). These work as an alternative to MAXSIZE and help prevent message number jumping, since expiration of a message is checked when it is initially read (bypassing the assignment of a message number).

Why do some messages get re-added each time MHonArc processes a mail folder?

This condition may occur when you have MHonArc examine the same folder periodically to add any new messages. If there are messages in the folder without message-ids, then those messages will be re-added each time MHonArc runs.

Why? Well, MHonArc uses message-ids for determining if a message has been archived, or not. Therefore, if a message-id is missing for a message, then MHonArc believes it is new.

In general, mail has message-ids. They get assigned by MTAs. However, if messages are generated by a CGI program or other software that is not mail-specific, then the program in question should create a message-id. Otherwise, you will need to move already-processed messages into a different area so MHonArc does not read them again.
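As a rough sketch of what "the program should create a message-id" could look like, the following Python function adds a Message-ID header to a message that lacks one, using the standard email package. The function name ensure_message_id and the example.org domain are placeholders, not anything MHonArc provides.

```python
from email import message_from_string
from email.utils import make_msgid


def ensure_message_id(raw_message, domain="example.org"):
    """Return the message text, adding a Message-ID header if none exists."""
    msg = message_from_string(raw_message)
    if msg["Message-ID"] is None:
        # make_msgid() returns a new unique id in angle brackets.
        msg["Message-ID"] = make_msgid(domain=domain)
    return msg.as_string()


# Hypothetical message produced by a form handler, with no Message-ID.
raw = "From: cgi@example.org\nSubject: form submission\n\nbody text\n"
print(ensure_message_id(raw))
```

The id should be assigned once, when the message is created and before it is stored in the folder. Generating a fresh id on every rescan would defeat the purpose, because each run would then see a different message-id.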

A related problem is messages showing up again in the archive after you have deleted them with RMM. MHonArc does not keep track of deleted message-ids. Therefore, if you want to make sure that a message will not reappear in the archive after it has been explicitly deleted via RMM, make sure to remove the message from the input source.

How do I remove messages from an archive?

Automatic removal can be done via the EXPIREAGE or EXPIREDATE resources (available in v2.0 beta 2, or later).

Explicit message removal can be done with the RMM resource. Please read the RMM resource page for more information and examples.


98/10/03 15:44:56
Copyright © 1997-1998, Earl Hood, ehood@medusa.acs.uci.edu