MIA: Volunteer: Administrator's Handbook II

Technical Guide to Archive Administration
Part II

Work with your rifle nearby!

Links: Check on a regular basis to ensure that your archive does not have link problems! For our archive to work in any medium, whether on a user's local computer or on our CD, and to ensure that it can be mirrored, three basic guidelines must be followed when making links.

1. All links should be relative:

Not: http://www.marxists.org/archive/marx/index.htm

2. Links must be made directly to a file name:

Not: archive/marx/

3. Do not switch to the root directory with a starting slash:

Not: /archive/marx/index.htm
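The three guidelines above lend themselves to an automated check. Below is a minimal sketch in Python, not an official MIA tool: the function names, the rule messages, and the decision to treat only links into marxists.org as "absolute" violations (so that links to outside sites are left alone) are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Matches the value of href="..." or href='...' in an HTML page.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

RULE_1 = "rule 1: link into the archive is absolute, not relative"
RULE_2 = "rule 2: link does not point directly to a file name"
RULE_3 = "rule 3: link starts from the root directory"


def check_link(href):
    """Return the list of guidelines that one link target breaks."""
    if href.startswith("#"):
        return []  # anchor on the same page, nothing to check
    parsed = urlparse(href)
    host = parsed.hostname or ""
    # Assumption: only absolute links into the archive itself count.
    if parsed.scheme in ("http", "https") and (
        host == "marxists.org" or host.endswith(".marxists.org")
    ):
        return [RULE_1]
    if parsed.scheme or parsed.netloc:
        return []  # link to an outside site, mailto:, etc.
    problems = []
    path = parsed.path
    if path.startswith("/"):
        problems.append(RULE_3)
    last = path.rsplit("/", 1)[-1]
    # A directory link ends in "/", "." or "..", or has no file extension.
    if last in ("", ".", "..") or "." not in last:
        problems.append(RULE_2)
    return problems


def check_page(html):
    """Map every problem link found in an HTML page to its violations."""
    report = {}
    for href in HREF_RE.findall(html):
        problems = check_link(href)
        if problems:
            report[href] = problems
    return report
```

Calling check_page() on the text of an .htm file returns an empty dictionary when every link follows the guidelines. The check for a file name is only a guess based on the presence of a file extension, so treat its reports as hints to review by hand.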

Perl Script: A good deal of the markup process can be automated, particularly when working with books. This script will save you hours of html work when marking up a book, taking care of about 95% of the work for you. You start with a single text file that you've scanned. The script will split it into separate chapters and name them accordingly, build a table of contents linking to each chapter, and format every page into our standard html format. All in about 5 to 10 seconds, depending on the size of the file. :)

Macintosh OS X: Open your Applications folder, then the Utilities folder. Double click the application called "Terminal". Now it is just a matter of drag and drop. First, drag the script onto the Terminal window, and you'll see a "path name" to that file come up. Type in a single space, and then drag the text file you want converted onto the Terminal window! Easy, isn't it? :) Then just hit enter. You'll get a couple of questions in the Terminal, like what is the name of the book and so on; answer them, and you'll have the job done! :)

Windows: If you are using Microsoft Windows you need to download and install Perl. Perl is a programming language well suited to text processing, and it is free. Click on the Start menu, then select "All Programs". Inside that open the folder "Accessories", and open the program called "Command Prompt" (named "MS-DOS Prompt" on older versions of Windows). This part can be slightly difficult. First, we want to get the script and the file we want to convert in the same place, so let's say your desktop. Different versions of Windows put the desktop in different places, so we will assume you are using Windows XP. When you open the Command Prompt program, it starts you in your "home" directory. So, you need to type:

cd Desktop

Then press enter. Since we have the files on your desktop, all we need to do is type:

perl markup-book.pl file.txt

Of course, file.txt is actually whatever you have named the file that you want to convert. You'll get a couple of questions in the Command Prompt window, like what is the name of the book and so on; answer them, and you'll have the job done! :)

If you have any problems or questions, ask one of the other volunteers.
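To give a feel for what the script automates, here is a rough sketch in Python of the splitting and table-of-contents steps. It is not the MIA Perl script and makes no claim to match it: the rule that a chapter starts on a line beginning with "Chapter", the file names ch01.htm, ch02.htm, and so on, and the function names are all assumptions for illustration.

```python
import re
from html import escape

# Assumption: every chapter starts on a line that begins with "Chapter".
CHAPTER_RE = re.compile(r"^Chapter\b.*$", re.MULTILINE)


def split_chapters(text):
    """Split one scanned text into (file_name, title, body) tuples.

    Anything before the first chapter heading is ignored.
    """
    matches = list(CHAPTER_RE.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs up to the next heading, or to the end.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        file_name = f"ch{i + 1:02d}.htm"  # hypothetical naming scheme
        chapters.append((file_name, match.group(0).strip(), body))
    return chapters


def build_contents(book_title, chapters):
    """Build a contents list with one relative link per chapter file."""
    lines = [f'<p class="index">{escape(book_title)}</p>']
    for file_name, title, _body in chapters:
        lines.append(
            f'<p class="index"><a href="{file_name}">{escape(title)}</a></p>'
        )
    return "\n".join(lines)
```

Note that each link in the contents list is relative and points directly to a file name, in line with the link guidelines at the top of this page.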

Tx2html: An Emacs Lisp script that converts .tx files to web pages. Files with the extension .tx are called "Poor Man's Markup" (or PMM). Each file contains at most two pages, a verso and/or a recto. The idea with PMM files is to model an image of a verso/recto, using plain ASCII text, with minimal markup. This helps detect errors in less-than-perfect output from OCR programs. Some TeX and HTML markup is allowed, but not much. The script produces HTML pages with a uniform look-and-feel. The PMM (.tx) files can be used for other purposes besides web pages, such as transformation into XML, which can help in archiving for Project Gutenberg.

MIA style: Grab our style sheet to reference as you read along. About half the stuff in that sheet is there for particular "tweaks", and will be easy to understand once you see how the basics work. Also, be aware that we have a basic document template (copy the source of the html) to get you started.

Basic paragraphs: Most paragraphs will simply use the <p> tag. Note that all tags need to be closed, i.e. at the end of each paragraph, use </p> to close it. Some paragraphs shouldn't have an indent on the first line; this paragraph is a good example! In this case, use the tag <p class="fst">, which is exactly the same as the <p> tag, except without the first line indent.

Quoted material: For material that is quoted by the author, use the tag <p class="quote">. When the quoted material presents a numbered series of points ("1)..., 2)..., 3)..."), the text shouldn't have a first line indent, so use <p class="quoteb">. The two quote tags are otherwise identical. Finally, for quoted material that is a subsection of other quoted material, the margin needs to be increased, so use <p class="quotec">.

Indented material: When the author is making her own subsections in the text, use <p class="indent">. This element works essentially the same way as <p class="quote">: when the author makes a number of points, use <p class="indentb">; when the author makes a further subsection within a section, use <p class="indentc">.

Different sections on the same page: A few documents have multiple headers on them, dividing the document into separate sections. To show this, at the top of the page write <p class="index">Document Contents</p>, then use another <p class="index"> tag and inside it make links to all the sections of the document. A section header within a document should be marked with an <h4> tag, and its anchor can be named s1, s2, or the name of the section. Subsections can be marked with <h5> or, if the section header needs to be aligned to the left, with <h6>.
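The contents list and the matching section headers have to agree on their anchor names, which is easy to get wrong by hand. The following Python sketch generates both from a list of section titles. The function name, the <br> separator between links, and the placement of the name anchor inside the <h4> are assumptions for illustration; check the document template for the exact layout we use.

```python
from html import escape


def section_index(titles):
    """Return (contents_html, headers) for a page split into sections.

    Sections are anchored as s1, s2, ... in the order given.
    """
    links = []
    headers = []
    for number, title in enumerate(titles, start=1):
        anchor = f"s{number}"
        safe_title = escape(title)  # turn & and < into entities
        links.append(f'<a href="#{anchor}">{safe_title}</a>')
        # Assumption: the anchor sits inside the section's <h4> header.
        headers.append(f'<h4><a name="{anchor}">{safe_title}</a></h4>')
    contents = (
        '<p class="index">Document Contents</p>\n'
        '<p class="index">' + "<br>\n".join(links) + "</p>"
    )
    return contents, headers
```

Paste the first returned string at the top of the page, and each header string at the start of its section.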

Separating different sections: To separate different sections on the same page, you can insert a <hr class="section">. Otherwise, you may simply like to put some space between one section and another; for this use <p class="skip">&#160;</p>. This is our "skip a line" tag. It needs the "&#160;" (i.e. a non-breaking space) so that the html will register it as a valid line.


By the Author: For the author's footnotes, use the tag: <sup class="anote"><a href="#foot-1" name="doc-1">(1)</a></sup>. The href is the link down to the footnote text, and name="doc-1" is an anchor that lets the corresponding number at the bottom of the page link back to the note in the text. Thus, at the bottom of the page, the footnote would look like: <sup class="anote"><a href="#doc-1" name="foot-1">(1)</a></sup>.

By the Editor: There are four types of notes we make:

Editor's note specific to the document: Use these for notes such as translations of foreign text, historical references, etc. Use <sup class="ednote"><a href="#footed-1" name="doced-1">[A]</a></sup>. Note that the notation is by letter, not number, and is in brackets, not parentheses; this makes the distinction clear between an author's and an editor's footnote. Again, you want to reverse the name and href in the footnote at the bottom.

By the Editor inside the text: This is primarily used for editorial corrections, when words are left out or an additional word would clarify the meaning. These notes should be in brackets and surrounded by a <span class="inote"> tag, which makes the text a dull grey, like a pencil mark. You can also use the "inote" tags for short editorial notes, i.e. for short translations of foreign text, citations the author didn't include, etc.

Document information notes: These are notes that describe the document (i.e. in the context of an event, etc). For these notes use the tag: <p class="pagenote">.
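Because each footnote needs two pieces of markup with the href and name swapped, a small helper can keep the pairs consistent. This Python sketch builds the in-text marker and the bottom-of-page marker for the author and editor footnotes described above. The function names are made up for illustration, and the editor helper assumes no more than 26 notes per page.

```python
import string


def author_note(n):
    """Return (in_text, at_bottom) markup for author footnote number n."""
    in_text = (
        f'<sup class="anote"><a href="#foot-{n}" name="doc-{n}">'
        f"({n})</a></sup>"
    )
    # At the bottom of the page the href and name are reversed.
    at_bottom = (
        f'<sup class="anote"><a href="#doc-{n}" name="foot-{n}">'
        f"({n})</a></sup>"
    )
    return in_text, at_bottom


def editor_note(n):
    """Return (in_text, at_bottom) markup for editor footnote number n.

    Editor notes are shown as bracketed letters: 1 is [A], 2 is [B].
    """
    letter = string.ascii_uppercase[n - 1]  # assumes 1 <= n <= 26
    in_text = (
        f'<sup class="ednote"><a href="#footed-{n}" name="doced-{n}">'
        f"[{letter}]</a></sup>"
    )
    at_bottom = (
        f'<sup class="ednote"><a href="#doced-{n}" name="footed-{n}">'
        f"[{letter}]</a></sup>"
    )
    return in_text, at_bottom
```

For example, author_note(1) returns the same two tags shown in the "By the Author" paragraph, and editor_note(1) returns the [A] tag together with its reversed counterpart.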

Further HTML Resources: You can read loads of stuff on the web about HTML! :) Below are two good guides, as well as two validators. It is important to use validators for all your html, to ensure good compatibility with the many different platforms and web browsers out there:
CSS pointers
W3C HTML Validation Service
WDG HTML Validator

Contact the Marxists Internet Archive Admin Committee for further information