Augmented Plain Text (APT) Version 1.0
simplified web authoring
Copyright ©2002 Murray Altheim. All Rights Reserved.
The Augmented Plain Text (APT) specification is a design for a simple set of keyword tokens that, when added to a plain text document, enable an APT processor to autogenerate valid XHTML documents. This can be used for conversion of existing plain text sources for the Web, or as a simple alternative to hypertext authoring.
This document is intended for review and comment by interested parties. It is a “work in progress,” currently has no formal status, and its publication should not be construed as endorsement by any corporate or academic body. This document may be updated, replaced, rendered obsolete by other documents, or removed from circulation at any time. It is inappropriate to use this document as reference material, or cite it as anything other than a “work in progress.” Distribution of this document is unlimited.
The 'APT' notation is a simple, augmented notation designed to simplify creation of XHTML documents from existing text sources such as email messages. Most current editing software has some facility to generate plain text output, with some claiming to generate HTML. Unfortunately, the "HTML" generated by nearly every known product is far from adhering to any known HTML specification. In the case of MS Word (the worst example), the output is so convoluted that a slew of translators have been written to turn its "HTML" into something akin to HTML, though in many cases not without substantial losses of content (and certainly formatting).
APT was designed to fill a different niche, namely for those wishing to author in plain text, those who have existing text sources, or who'd like an XHTML-valid document with an autogenerated table of contents and hierarchically-numbered sections from what is ostensibly a plain text document (with a few simple codes added). Yes, APT is simple. It doesn't have every known Web feature, doesn't create JavaScript buttons or fry your bacon for you. It does simplify web authoring for those who think web authoring should be simple and straightforward. You can spend some time playing with the CSS stylesheet if you really want your output to look different than the default, or take the output document as input for further processing (perhaps adding your own JavaScript buttons and bacon).
An APT document looks something like this:
#APT 1.0
#AUTHOR Tim Bunwich
#TITLE Not a Normal Day at the Park
Today I went to http://centralpark.org/home.html #LINK Central Park
to feed some squirrels, a thing I do most every day. Well, for some reason
the squirrels seemed agitated, a bit put off at simply accepting the nuts
I was handing out.
I was at the point of positioning a piece of pecan in front of this big
black squirrel's nose when suddenly he ran up my arm and stood on the top
of my head, then began squealing wildly. I froze in place, not knowing
quite what to do. These little buggers have very sharp claws and teeth,
and me with no hair... well, I was afraid for my scalp.
All around me were squirrels, all just a few feet from my Berkenstock'd
toes. I knew my life was about to change for the worse.
Pretty simple? Note that inline HTML links are a Level 2 feature, so the inline link isn't active in this sample. Here's the above fragment converted into XHTML, plus some more samples of before and after:
| before | after | description |
|---|---|---|
| sample1.apt | sample-apt1.html | a simple APT conversion showing a mostly plain text source |
| sample2.apt | sample-apt2.html | a display of the hierarchical headings and autogenerated TOC |
| sample3.apt | sample-apt3.html | the same as sample 2 except in Spanish |
| sample4.apt | sample-apt4.html | an exercise of Level 1 and some Level 2 features |
HTML bases its formatting model on a distinction between block and inline content, that is, content that appears as separated blocks in paragraph style, and content that appears within the individual lines of those blocks.
Likewise (since the destination of APT processing is XHTML), APT's
formatting model uses the same distinction. Blocks of plain text
separated by one or more blank lines are considered separate paragraphs
of content, and several of the APT statements expect blocks of content,
scanning until a blank line. These include
#P (paragraph),
#PRE (preformatted),
and #DFN (definition).
The rest of the APT statements scan until the end of the line in
which the keyword occurs. Examples include
#AUTHOR (author name),
and #H1 through
#H6 (heading levels 1 through 6).
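The block half of this model can be sketched in a few lines of Python. This is only an illustration of "blocks are separated by one or more blank lines"; the function name split_blocks is hypothetical and not part of APT.

```python
def split_blocks(text):
    """Split plain text into blocks separated by one or more blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)      # still inside the current block
        elif current:
            blocks.append("\n".join(current))  # a blank line closes the block
            current = []
    if current:
        blocks.append("\n".join(current))      # last block has no trailing blank line
    return blocks
```

Each returned block would then become one paragraph (or one block statement) in the output.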
APT uses keywords that start with a hash symbol (e.g., #AUTHOR)
occurring in column 1 (i.e., the beginning of a line) to denote an APT
statement. APT parsers should ignore keywords occurring elsewhere, or
unknown keywords (perhaps emitting a warning when this occurs), so that other
instances of hash characters followed by unknown tokens have no effect (other
than to be replicated in the output file).
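A minimal sketch of the column-1 rule, assuming the Level 1.0 keyword list given in this document. The names KNOWN_KEYWORDS and parse_statement are hypothetical, and the warning a processor might emit for unknown keywords is left out.

```python
import re

# Level 1.0 keywords as listed in this document.
KNOWN_KEYWORDS = {
    "APT", "TITLE", "SUB", "AUTHOR", "EMAIL", "REF", "REV", "TOC", "COM",
    "H1", "H2", "H3", "H4", "H5", "H6", "H", "DFN", "LI", "PRE", "HR", "LINK",
}

# A hash in column 1, a keyword, then optional whitespace and content.
STATEMENT = re.compile(r"^#([A-Z][A-Z0-9]*)(?:[ \t]+(.*))?$")


def parse_statement(line):
    """Return (keyword, content) for a known column-1 statement, else None."""
    match = STATEMENT.match(line.rstrip("\n"))
    if match is None or match.group(1) not in KNOWN_KEYWORDS:
        return None  # plain text: replicated in the output unchanged
    return match.group(1), (match.group(2) or "").strip()
```

A hash that is indented, sits mid-line, or is followed by an unknown token yields None, so the line passes through as ordinary text.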
The APT syntax is designed according to implementation levels, to allow for varying levels of support. The core level, Level 1.0, is quite simple but provides support for headings and other block text types. Level 2.0 provides general linking support, with Level 3.0 providing inclusions and other advanced features.
Level 1.0 and 2.0 processors will autogenerate the document title, heading markup, divisions and paragraphs, as well as a table of contents using heading titles. Optional features include autogeneration of hierarchical section numbers. Note that if necessary, a hash character can be escaped using its XML character reference equivalent ("&#35;"). A processor should be labeled according to its implementation level. If an APT processor receives an APT source document whose level is greater than its implementation supports, it should refuse to process the document.
Most APT statements start with an APT keyword beginning in column one and continue to the end of the line. Lines may be continued with a backslash onto the next line. The exception to the column-one rule (and there is only one) is the inline form of #LINK, the APT Level 2 augmentation of Level 1's #LINK statement.
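Backslash continuation could be handled as a pre-pass before statements are recognized. The sketch below is one possible reading: the backslash is dropped, the following line is appended as-is, and surrounding whitespace is left untouched. The function name is hypothetical, and the special treatment of a lone backslash inside #PRE is not covered.

```python
def join_continued_lines(lines):
    """Merge each line ending in a backslash with the line that follows it."""
    joined, pending = [], ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]       # drop the backslash, keep accumulating
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending)         # a trailing backslash on the last line
    return joined
```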
The following keywords should be supported in all APT Level 1.0 processors:
#APT WS level
#TITLE WS content
#SUB WS content
#AUTHOR WS content
#EMAIL WS content
#REF WS linkURL
#REV WS content
#TOC
#COM WS content
#H1 WS content
#H2 WS content
#H3 WS content
#H4 WS content
#H5 WS content
#H6 WS content
#H WS content
#DFN WS term "|" definition
#LI WS content
#PRE WS content
#HR WS [percentage]
#LINK WS linkURL linkText
The keywords #APT, #TITLE, #SUB,
#AUTHOR, #EMAIL, #REF, #REV,
and #TOC are considered APT header keywords, and should only
occur once in an APT file, grouped at the top (order is unimportant).
"WS" stands for whitespace: space or tab characters.
The following keywords should be supported in all APT Level 2.0 processors:
linkURL [#LINK WS link text]
#IMG WS imageURL WS alt-text
#LOGO WS imageURL WS alt-text
#FIG WS imageURL WS id WS alt-text
#SETHEAD WS content
The following keywords should be supported in all APT Level 3.0 processors:
#INCLUDE WS linkURL
Plaintext (i.e., whitespace-based) tables are supported using three statements:
#TABLE WS [parameters]
#TH WS [column markers] [heading titles]
#END
Plaintext-based forms are currently under development.
[Other Level 3 features under consideration.]
Quick Reference:
| level | keywords |
|---|---|
| 1 | #APT, #TITLE, #SUB, #AUTHOR, #EMAIL, #REF, #REV, #TOC, #COM, #H1, #H2, #H3, #H4, #H5, #H6, #H, #DFN, #LI, #PRE, #HR, #LINK |
| 2 | #LINK, #IMG, #LOGO, #FIG |
| 3 | #SETHEAD, #INCLUDE, #TABLE, #TH, #END |
#APT WS number
The #APT keyword signals the APT processor as to the
expected implementation level necessary for correctly processing
the document. Accepted implementation levels are "1.0", "2.0" or "3.0".
If this APT statement occurs, it should be the first statement in
the document. If missing, "1.0" is the default.
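The level rules (default of "1.0", refusal when the document asks for more than the processor supports) might be sketched like this. All names here are hypothetical, and skipping blank lines before the first statement is an assumption rather than something the text requires.

```python
ACCEPTED_LEVELS = ("1.0", "2.0", "3.0")


class UnsupportedLevelError(Exception):
    """The document asks for a higher level than this processor supports."""


def declared_level(lines):
    """Return the level named by a leading #APT statement, or "1.0" if absent."""
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # assumption: blank lines before the first statement are skipped
        if line.startswith("#APT") and parts[0] == "#APT" and len(parts) == 2:
            if parts[1] not in ACCEPTED_LEVELS:
                raise ValueError(f"unknown APT level: {parts[1]}")
            return parts[1]
        break  # the first statement is not #APT, so the default applies
    return "1.0"


def check_level(lines, supported="1.0"):
    """Return the document level, refusing documents above `supported`."""
    level = declared_level(lines)
    if float(level) > float(supported):
        raise UnsupportedLevelError(
            f"document needs Level {level}, processor supports Level {supported}"
        )
    return level
```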
#TITLE WS title
The #TITLE keyword indicates that the text following
is the document title. This should only occur once in an APT file,
and a warning should be generated by the processor on encountering
more than one. This title is used as the HTML <title>
as well as the displayed document title at the top of the document
itself.
#SUB WS subtitle
The #SUB keyword indicates that the text following
is the document subtitle. This should only occur once in an APT file,
and a warning should be generated by the processor on encountering
more than one.
#AUTHOR WS author name
The #AUTHOR keyword indicates that the text following
is the author name. This should only occur once in an APT file,
and a warning should be generated by the processor on encountering
more than one.
#EMAIL WS email address
The #EMAIL keyword indicates that the text following
is the contact email address. This should only occur once in an APT file,
and a warning should be generated by the processor on encountering
more than one.
#REF WS url
The #REF keyword indicates that the text following
is a URL reference to the document. This should only occur once in
an APT file, and a warning should be generated by the processor on
encountering more than one.
#REV WS revision number
The #REV keyword indicates that the text following
is the revision number of the document. This should only occur once
in an APT file, and a warning should be generated by the processor
on encountering more than one.
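The once-only rule shared by the header keywords above could be enforced in a single pass. In this sketch the first occurrence wins and later ones only produce a warning; that choice, and the names HEADER_KEYWORDS and collect_header, are assumptions made for illustration.

```python
# Header keywords as listed in this document.
HEADER_KEYWORDS = {"APT", "TITLE", "SUB", "AUTHOR", "EMAIL", "REF", "REV", "TOC"}


def collect_header(lines):
    """Return (header, warnings); header maps each keyword to its content."""
    header, warnings = {}, []
    for number, line in enumerate(lines, start=1):
        parts = line.split(None, 1)
        if not parts or not line.startswith("#"):
            continue                      # not a column-1 statement
        keyword = parts[0][1:]
        if keyword not in HEADER_KEYWORDS:
            continue
        if keyword in header:
            # Assumption: keep the first value and only warn about repeats.
            warnings.append(f"line {number}: more than one #{keyword}")
            continue
        header[keyword] = parts[1].strip() if len(parts) > 1 else ""
    return header, warnings
```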
#TOC WS [title prefix]
The #TOC keyword indicates that a table of contents is
desired in the output document. Absent this keyword, a table of contents
will not be generated, though some processors may wish to generate a
table of contents at some default location (Ceryle has a property
"urn:ceryle:properties:defaultTableOfContents" that
controls this behaviour; its default is false).
The optional literal string (i.e., in quotes) following the keyword is used as the prefix string for the section numbers appearing in the headings and table of contents. If an empty string (""), no text is used (just the section numbers); if "NONE" no section numbers will appear. A typical string might be "Section" or "Sec".
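As a rough illustration of how the prefix string interacts with section numbers, the sketch below keeps one counter per heading level. The dotted number format ("1.2") and the class name SectionNumberer are assumptions; the specification does not fix either.

```python
class SectionNumberer:
    """Produce labels such as "Section 1.2" for successive headings."""

    def __init__(self, prefix=""):
        self.prefix = prefix          # "" means numbers only, "NONE" means no numbers
        self.counters = [0] * 6       # one counter per heading level 1-6

    def label(self, level):
        """Advance the counter for heading `level` (1-6) and return its label."""
        if self.prefix == "NONE":
            return ""
        self.counters[level - 1] += 1
        for deeper in range(level, 6):
            self.counters[deeper] = 0  # a new section restarts its subsections
        number = ".".join(str(n) for n in self.counters[:level])
        return f"{self.prefix} {number}".strip()
```

A heading level that is skipped (say, #H3 directly under #H1) shows up as a 0 in the number; a real processor might prefer to warn instead.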
#COM WS comment
The #COM keyword indicates that the text following
is a text comment, to be used in creating an XHTML comment.
Such comments are visible in the XHTML source, but are not displayed
by Web browsers.
#H1 WS level 1 heading text
#H2 WS level 2 heading text
#H3 WS level 3 heading text
#H4 WS level 4 heading text
#H5 WS level 5 heading text
#H6 WS level 6 heading text
The #H1 through #H6 keywords indicate that
the text following is content for use in a heading. Such headings
and the sections they head play a role in the general document hierarchy.
#H WS heading text
The #H keyword indicates that the text following
is heading text for use in a neutral heading. Such headings
are used when the heading and its section are not to play a role
in the document hierarchy, when the document does not have or need
such a hierarchy, or when your life is already too complicated to
care about such things as heading levels.
Neutral headings still create a wrapper <div> element, so that document manipulation via division is possible. Such "neutral divisions" are never nested; they are always leaf node divisions in the overall division structure.
This keyword is useful for expressing the document abstract, foreword, preface, or other front or back matter sections. It will have a class attribute of "ndivx" (where 'x' is the display heading level of 1-6) so that display can be customized in a stylesheet.
#DFN WS term WS "|" WS definition
The #DFN keyword indicates that the text following
is a term and its definition, divided by a vertical bar and whitespace.
If a series of definitions are located together without intervening
codes, blank lines, or text, the group will be treated together as
an XHTML definition list (<dl> element).
#LI WS list item text
The #LI keyword indicates that the text following
is a list item. If a series of list items are located together
without intervening codes, blank lines, or text, the group will
be treated together as an XHTML unordered list (<ul>
element) with a class attribute of "list" so that
display can be customized in a stylesheet.
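Grouping consecutive list items might look like the following sketch. Escaping of the item text with html.escape is an assumption about how a processor would treat markup characters, and render_list_items is a hypothetical name.

```python
from html import escape


def render_list_items(lines):
    """Convert runs of consecutive #LI lines into <ul class="list"> blocks."""
    output, items = [], []

    def flush():
        # Emit the pending run of items, if any, as one unordered list.
        if items:
            output.append('<ul class="list">')
            output.extend(f"  <li>{escape(item)}</li>" for item in items)
            output.append("</ul>")
            items.clear()

    for line in lines:
        parts = line.split(None, 1)
        if parts and parts[0] == "#LI" and line.startswith("#"):
            items.append(parts[1].strip() if len(parts) > 1 else "")
        else:
            flush()                   # any other line ends the current list
            output.append(line)
    flush()
    return output
```

The same grouping pattern would apply to runs of #DFN statements and the <dl> element.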
#PRE WS preformatted content
The #PRE keyword indicates that the text following
is preformatted text. In such content, whitespace is preserved.
This is used for display of poetry, program code, ASCII tables
or art, etc. The preformatted block will continue until the
first blank line. Lines will be continued if their last character
is a backslash ("\"). A line containing only a backslash is
considered as one blank line.
#HR WS [width]
The #HR keyword indicates display of a horizontal
line in the text. An optional width may be included as a percentage.
If a width is included, the line will be left-justified.
#LINK WS linkURL linkText
In APT Level 1, links must begin on a line with the #LINK
keyword, and cannot occur inline.
The following APT statement
#LINK http://www.altheim.com/lit/everest.html archy climbs everest
would be rendered as a link to archy climbs everest.
url WS [#LINK WS link text]
In APT Level 1, links must begin on a line with the #LINK
keyword, and cannot occur inline.
With APT Level 2, all http:, ftp:, and file:
URLs embedded in APT source files are converted to links automatically*, so the #LINK keyword can be
considered optional in cases where you want the link text to show up
as a URL. Now if a URL is immediately followed (after some whitespace)
by a #LINK keyword, the text following the keyword up until
the end of the line is used as the link text.
* The APT processor scans for the strings "http://", "ftp://" and "file://" and ends the URL with the first whitespace encountered.
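The scanning rule in the footnote translates fairly directly into a regular expression. This is a sketch under the stated rule only (the URL ends at the first whitespace, and an optional #LINK supplies the link text up to the end of the line); the function name autolink is hypothetical, and text outside the link is passed through without escaping.

```python
import re
from html import escape

# A URL up to the first whitespace, optionally followed by "#LINK link text".
URL_WITH_OPTIONAL_TEXT = re.compile(
    r"((?:http|ftp|file)://\S+)(?:[ \t]+#LINK[ \t]+(.*))?"
)


def autolink(line):
    """Replace embedded URLs in one (already joined) line with XHTML links."""
    def replace(match):
        url, text = match.group(1), match.group(2)
        label = text.strip() if text else url   # no #LINK: show the URL itself
        return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'

    return URL_WITH_OPTIONAL_TEXT.sub(replace, line)
```

Because the URL ends only at whitespace, trailing punctuation such as a full stop becomes part of the link, exactly as the rule is written.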
The following APT statement
Amazing is the http://www.altheim.com/murray/compbun.html #LINK The \
Comprehensive Bunny Name List
would be rendered as a link to The Comprehensive Bunny Name List. Note the use of the backslash.
#IMG WS url WS alternate text
The #IMG keyword indicates display of an image.
The source URL follows the APT keyword, then after the first whitespace
following the URL all text until the end of line is used as the
alternate text (used when the image is not displayed, in text-only
browsers, and as an aid for the sight-impaired).
The following APT statement
#IMG img/neocortext-net.png Neocortext.net: Where Brain and Document Meet
would be rendered as the image located at img/neocortext-net.png, relative to the location of the generated HTML file, with the remaining text as 'alt' text. In this example, the image file is located in a subdirectory of the document's directory named "img". If desired, an absolute URL can be used rather than a relative reference.
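A sketch of how the #IMG statement could be split into its URL and alternate text. The exact attributes a processor emits are not specified here, so the bare <img /> element below is an assumption, as is the name render_img.

```python
from html import escape


def render_img(line):
    """Render a '#IMG url alternate text' statement as an XHTML <img /> element."""
    parts = line.split(None, 2)       # keyword, URL, then everything else
    if len(parts) < 2 or parts[0] != "#IMG" or not line.startswith("#"):
        return None
    url = parts[1]
    alt = parts[2].strip() if len(parts) > 2 else ""
    return f'<img src="{escape(url, quote=True)}" alt="{escape(alt, quote=True)}" />'
```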
#LOGO WS url WS alternate text
The #LOGO keyword indicates display of an image at the
top right of the page as a logo. The source URL follows the APT keyword,
then after the first whitespace following the URL all text until the
end of line is used as the alternate text.
#FIG WS url WS id WS alternate text
The #FIG keyword indicates display of an image as a figure.
The source URL follows the APT keyword, then the figure ID, then all
text until the end of line is used as the alternate text. The ID may
be used to link to the figure.
#SETHEAD WS ["heading prefix" WS ] initial heading number
Because documents are sometimes broken up into multiple web files, it is necessary to be able to set the initial heading number of a document so that it doesn't start a new hierarchy (i.e., displaying a "1."). The number is of the form of the desired section number.
Following the #SETHEAD keyword is an optional, double quote
delimited string containing a text string to be prefixed in front of
each heading, e.g., the word "Section" in "Section 1.1", "Section 1.2".
This can be set to an empty string ("") if no text is desired, or
to "NONE" if no section numbers are desired at all.
Note:
This keyword may appear anywhere in a file in order to allow
setting of the heading prefix and a section number, but results
may be rather unpredictable if care is not taken.
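Parsing the #SETHEAD statement might be sketched as below. It assumes the initial heading number is written in dotted decimal form (for example 3.2), which the text suggests but does not define; parse_sethead is a hypothetical name.

```python
import re

# Optional double-quoted prefix, then a dotted section number such as 3.2
SETHEAD = re.compile(
    r'^#SETHEAD[ \t]+(?:"([^"]*)"[ \t]+)?(\d+(?:\.\d+)*)\.?[ \t]*$'
)


def parse_sethead(line):
    """Return (prefix, counters) for a #SETHEAD line, or None if malformed."""
    match = SETHEAD.match(line)
    if match is None:
        return None
    prefix = match.group(1)           # None when the optional string is absent
    counters = [int(part) for part in match.group(2).split(".")]
    return prefix, counters
```

The returned counters could be used to seed whatever state the processor keeps for section numbering.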
#INCLUDE WS url
The #INCLUDE keyword indicates that an external APT file
is to be transcluded at the point in which it occurs. Heading information
in the transcluded file is ignored.
Example:
#APT 3.0
#AUTHOR Don Marquis
#TITLE The Cruise of the Jasper B.
#SUB Night, Tempest, Love and Battle
#INCLUDE "http://www.altheim.com/lit/jasperb.15.html"
Transclusions: To include an external file as an answer to a question:
...
#DFN My first question is how many monkeys? |
#INCLUDE answer1.html
Ed.Note:
This will include either an APT or XHTML document, in this case the
XHTML document 'jasperb.15.html'. Because the included file may be
an APT, HTML or XHTML file, and the latter two may turn out not to
be well-formed content, the processor must be capable of cleaning
up the content to a degree necessary to include without corrupting
the output, so error handling is currently the biggest design question.
#TABLE [ WS parameter ]*
The #TABLE keyword begins a series of statements considered
together as a table.
The end of the table content is marked by a #END
keyword (absent this keyword, parsers will scan to the end of the file). The
#TABLE keyword may be followed (on the same line) by
parameters of the form name="value" separated by whitespace.
These allow setting of optional table attributes including
summary, width, border, frame,
rules, cellpadding and cellspacing.
Whitespace is used to delineate both rows and columns. Column breaks are indicated by the first line of the table, using vertical bar characters (|) as indicators. Absent a first line whose first non-whitespace character is a vertical bar, the table will be created as one column. Row breaks are empty lines, or lines containing only whitespace. Use of tab characters is not recommended due to potential confusion (i.e., editor configuration may alter display of tabs), but when they occur are each converted to four space characters.
Table headings are set by the #TH
statement, using vertical bar characters as delimiters between individual
table headings. If there is a mismatch between the number of headings
and the number of indicated columns (as described in the above paragraph),
a warning should be issued.
Example:
#TABLE border="0" width="80%"
#TH Qty. | Description | Price (ea.) | Total
       |                      |         |
3doz.    Free Range Eggs        £1.55     £4.65
         12 medium 12s

2lbs.    Cornhill Bacon         £1.79     £3.58
         cooked smoked 40g

1 loaf   Hovis white bread      £0.59     £0.59
         medium sliced 800g

                                Total     £8.82
#END
Note:
In the above example, note that the vertical bar characters of the
heading (#TH) do not line up with those in the first
line of data. In the former case they are used as delimiters between
heading cell contents, in the latter as column indicators. But since
extra whitespace is not significant within table cells, there would
have been no harm in aligning them, and indeed it would make the table
easier to read during editing of the APT file.
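The column rule can be sketched as follows. It assumes each vertical bar in the marker line sits at a break between two columns (so three bars give four columns), which the text above implies but does not state outright. Both function names are hypothetical.

```python
def column_breaks(marker_line):
    """Return the character positions of the vertical bars in the marker line."""
    line = marker_line.replace("\t", "    ")   # tabs become four spaces
    if not line.lstrip().startswith("|"):
        return []                               # no marker line: a single column
    return [index for index, char in enumerate(line) if char == "|"]


def split_cells(row_line, breaks):
    """Cut one table line into cells at the given break positions."""
    line = row_line.replace("\t", "    ")
    cells, start = [], 0
    for position in breaks:
        cells.append(line[start:position].strip())
        start = position + 1   # the character at the break itself is ignored
    cells.append(line[start:].strip())
    return cells
```

Lines belonging to the same row (everything up to the next blank line) would be split the same way and their cells joined column by column.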
#TH WS [ heading_title "|" ] [ WS + ]
The #TH keyword indicates the table headings for the current
table. If the first character of the last line is a "+" character, that
line will be used to set the column breaks for the table.
Since alignment of the column divisions requires alignment of the "+"
characters, it's recommended that this string of characters begin on a
newline, using a backslash to begin the new line (see the example below).
See #TABLE for more information.
Note that any characters occurring at the column breaks are ignored, as they are expected to be whitespace. The following example shows conversion of a code table:
#TABLE
#TH | UN Code | Country Name | ISO 3166 Code \
+ +
246 Finland FIN
250 France FRA
254 French Guiana GUF
258 French Polynesia PYF
266 Gabon GAB
270 Gambia GMB
268 Georgia GEO
276 Germany DEU
288 Ghana GHA
292 Gibraltar GIB
300 Greece GRC
304 Greenland GRL
#END
#END
The #END keyword indicates the end of a table.
See #TABLE for more information.
Ed.Note:
This section is currently in progress...
If APT were not to support the generation of hypertext documents, it might be considered rather deficient, though creating a web of hyperlinks can also be quite difficult to manage.
A link consists of a link and its target. Each link in a hypertext document has a purpose — discerning this purpose simplifies to some extent their creation and management.
An APT source document must be a text document. This source document undergoes a number of transformations, enumerated below:
Ed.Note:
This section is currently in progress...
#P (paragraph) keyword
with the explicit goal of avoiding the conversion of existing
XHTML markup.
Feedback on the APT design can be sent to its author at <m.altheim@open.ac.uk>