Basic Formatting Suggestions for DDB Entries


Topics:

A. Introduction

B. Basic DDB Entry Format

C. Basic DDB Entry Format (Simple XML Markup)

D. Basic DDB Entry Format (Fully Developed XML Markup)

 

Updated 2005.01.05


A. Introduction

First and foremost, please understand well: the usage of XML tagging is not necessary for contributing to the DDB. We will happily accept contributions in popular word processor file formats with no XML markup whatsoever. If, however, you are interested in going a step or two beyond that, and would like to learn something about how we encode our materials, then please read on.

B. Basic DDB Entry Format

Up to now, the basic organization of a DDB entry has been like this (with some abridgments for the sake of simplicity):


Headword: (Han characters)

 

Pronunciations:

 

Chinese (Pinyin):

Chinese (Wade-Giles):

Korean (Hangul):

Korean (Ministry of Education System):

Korean (McCune-Reischauer):

Japanese (Katakana):

Japanese (Hepburn):

Translation: (Simple, short-phrase equivalent of the headword, if available)

Explanation: (Detailed explanation of the entry headword)


If you were adding a term, you would type the Chinese characters next to "Headword." You would then add the pronunciations for the languages you know; someone else can supply the readings for the languages you can't handle. After the pronunciations, we usually try to offer one (or up to a few) common renderings of the term. If it were a person, place, temple, etc., we would just supply the commonly used name, such as "Zongmi," "Dongshan," or "Jinglingsi." If it were a concept, we would give its usual English rendering, such as "middle way." This is followed by a detailed explanation, which can have multiple nodes for multiple contributors, as necessary.

Let's look at an example. This is an entry regarding the Korean monk Iryŏn. It is an entry for which I provided minimal information many years ago, and which badly needs to be expanded. But its present brevity makes it useful here:


Headword: 一然

 

Pronunciations:

 

Chinese (Pinyin): Yīrán

Chinese (Wade-Giles): I-jan

Korean (Ministry of Education System): Iryeon

Korean (McCune-Reischauer): Iryŏn

Japanese (Hepburn): Ichinen

 

Translation: Iryeon

Explanation: (1206-1289) An important Goryeo monk. A prolific writer, who is most famous for his Samguk Yusa 三國遺事, a collection of facts and anecdotes which is a basic text for the study of the history of Korean Buddhism.

 


 

C. Basic DDB Entry Format (Simple XML Markup)

Now, for XML. Rather than starting off with an explanation of XML theory, I think it is simpler if I just re-present the above example using a simplified form of XML.

 

<entry>

<hdwd>一然</hdwd>

<pron_list>

<pron>Yīrán</pron>

<pron>I-jan</pron>

<pron>Iryeon</pron>

<pron>Iryŏn</pron>

<pron>Ichinen</pron>

</pron_list>

 

<trans>Iryeon</trans>

<sense> (1206-1289) An important Goryeo monk. A prolific writer, who is most famous for his <title>Samguk Yusa</title> 三國遺事, a collection of facts and anecdotes which is a basic text for the study of the history of Korean Buddhism.</sense>

</entry>

 


 

If you look at this for a minute, you will see that there is not much difference between the first example and the XML-tagged example. The basic difference is that here we are using opening and closing tags to delimit information. You will notice that inside the <sense> tags, the title of Iryeon's text, Samguk Yusa, is enclosed in the tags <title></title>, indicating that this is the name of a written work. We also use similar tags for <term>technical terms</term>, <foreign>foreign words</foreign>, and other elements. When this entry is published as HTML, the words inside these tags will automatically be italicized. We can also use these tags to build indexes. I will provide a full list of DDB tags later for those who are interested in working with XML.
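
For readers curious how the italicizing and indexing mentioned above might work, here is a minimal sketch in Python using only the standard library. It is not the DDB's actual publishing code. The function names, the abridged sense text, and the choice of which tags are italicized are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# The simplified entry from the example above (pronunciations omitted,
# sense text abridged).
ENTRY_XML = """<entry>
<hdwd>一然</hdwd>
<trans>Iryeon</trans>
<sense>(1206-1289) An important Goryeo monk. A prolific writer, who is most famous for his <title>Samguk Yusa</title> 三國遺事, a collection of facts and anecdotes.</sense>
</entry>"""

# Assumption: these three inline tags are the ones shown in italics.
ITALIC_TAGS = {"title", "term", "foreign"}


def sense_to_html(sense):
    """Render a <sense> element as an HTML paragraph, italicizing inline tags."""
    parts = [escape(sense.text or "")]
    for child in sense:
        inner = escape("".join(child.itertext()))
        parts.append(f"<i>{inner}</i>" if child.tag in ITALIC_TAGS else inner)
        # Text that follows the inline tag belongs to the same paragraph.
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts).strip() + "</p>"


def title_index(entry):
    """Map each tagged work title to the headword of the entry mentioning it."""
    headword = entry.findtext("hdwd")
    return {t.text: headword for t in entry.iter("title")}


entry = ET.fromstring(ENTRY_XML)
html_sense = sense_to_html(entry.find("sense"))
index = title_index(entry)
print(html_sense)
print(index)
```

Because the title is marked with its own tag, the same markup serves both purposes: the display routine and the index routine each simply look for <title>.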

If you can cooperate by using this simple level of XML structuring, it would be greatly appreciated. But once again, it is not absolutely necessary for the task.
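
For those who would rather keep typing in the plain format of section B, the mapping to the simple XML form is mechanical enough to script. The following is only a rough sketch, assuming Python's standard library; the function build_entry and its parameters are hypothetical and are not part of any DDB tool.

```python
import xml.etree.ElementTree as ET


def build_entry(headword, pronunciations, translation, explanation):
    """Build a simple <entry> element from the plain-format fields."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "hdwd").text = headword
    pron_list = ET.SubElement(entry, "pron_list")
    for reading in pronunciations:
        ET.SubElement(pron_list, "pron").text = reading
    ET.SubElement(entry, "trans").text = translation
    ET.SubElement(entry, "sense").text = explanation
    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(entry)
    return entry


entry = build_entry(
    headword="一然",
    pronunciations=["Yīrán", "I-jan", "Iryeon", "Iryŏn", "Ichinen"],
    translation="Iryeon",
    explanation="(1206-1289) An important Goryeo monk.",
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

This sketch treats the explanation as plain text, so inline tags such as <title> would still need to be added by hand.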

D. Basic DDB Entry Format (Fully Developed XML Markup)

The above example shows the barest XML framework—what are called ELEMENT tags. The tags <entry>, <pron>, <title>, etc. are all known as "elements" in XML parlance. But elements can also be enhanced by a very useful secondary layer of information, which is known as ATTRIBUTE information. Please see the same entry, again presented in a manner much closer to the way it is actually contained in our data set:


 

<entry added_by="cmuller" add_date="1990-09-21" update="">

<hdwd>一然</hdwd>

<pron_list>

<pron lang="zh" system="py" resp="c.wittern">Yīrán</pron>

<pron lang="zh" system="wg" resp="cmuller">I-jan</pron>

<pron lang="ko" system="mc" resp="cmuller">Iryeon</pron>

<pron lang="ko" system="mr" resp="cmuller">Iryŏn</pron>

<pron lang="ja" system="kk" resp="cmuller">イチネン</pron>

<pron lang="ja" system="hb" resp="cmuller">Ichinen</pron>

</pron_list>

<sense_area>

<trans resp="cmuller"><person_entry loc="ko">Iryeon</person_entry> </trans>

<sense resp="cmuller"> (1206-1289) An important Goryeo monk. A prolific writer, who is most famous for his <title lang="ko">Samguk Yusa</title> 三國遺事, a collection of facts and anecdotes which is a basic text for the study of the history of Korean Buddhism.</sense>

</sense_area>

</entry>


 

I believe that the point of most of the attributes should be obvious, but one of the most important, to which I would like to draw your attention, is "resp", which means "responsibility" and thus "accreditation." Unlike its counterparts in paper publishing, a digital reference work encoded in XML allows us to give credit to the person responsible for every small part of the <entry>. Thus, if someone wanted to add another <sense> element (or "node") to this entry, it could easily be done, giving that person credit in the "resp" attribute.
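
To make the idea of adding a credited <sense> node concrete, here is a small sketch in Python (standard library only). The entry is abridged from the example above. The helper names, the contributor ID "jdoe", and the added text are all made up for illustration; this is not how the DDB editors necessarily process contributions.

```python
import xml.etree.ElementTree as ET

# Abridged from the fully marked-up entry above.
ENTRY_XML = """<entry added_by="cmuller" add_date="1990-09-21" update="">
<hdwd>一然</hdwd>
<sense_area>
<trans resp="cmuller">Iryeon</trans>
<sense resp="cmuller">(1206-1289) An important Goryeo monk.</sense>
</sense_area>
</entry>"""


def add_sense(entry, text, resp):
    """Append a new <sense> node, crediting `resp` as its contributor."""
    sense = ET.SubElement(entry.find("sense_area"), "sense", {"resp": resp})
    sense.text = text
    return sense


def contributors(entry):
    """Return a sorted list of every `resp` value used anywhere in the entry."""
    return sorted({el.get("resp") for el in entry.iter() if el.get("resp")})


entry = ET.fromstring(ENTRY_XML)
# "jdoe" is a hypothetical contributor ID; the text is a placeholder.
add_sense(entry, "Placeholder text for an additional explanation.", resp="jdoe")
credited = contributors(entry)
print(credited)
print(ET.tostring(entry, encoding="unicode"))
```

The original <sense> keeps its own "resp" value, so each contributor's portion remains separately credited.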

Also commonly used in the DDB is the "lang" attribute, which tells us the language of the text or foreign word that will be italicized. For texts, we also have a "prov" (provenance) attribute. For temples and geographical entries, we have a "loc" (location) attribute. There are a number of others as well.

Using attributes allows for all kinds of programming possibilities, including various font transformations on presentation, creation of detailed indexes, and so forth.
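
As one hedged illustration of such possibilities, the sketch below (Python, standard library only) uses the "lang" and "system" attributes from the entry above to pull out readings and to build a small index. The function names are hypothetical, and this is not a description of the DDB's actual software.

```python
import xml.etree.ElementTree as ET

# Pronunciation list copied from the fully marked-up entry above.
ENTRY_XML = """<entry added_by="cmuller" add_date="1990-09-21" update="">
<hdwd>一然</hdwd>
<pron_list>
<pron lang="zh" system="py" resp="c.wittern">Yīrán</pron>
<pron lang="zh" system="wg" resp="cmuller">I-jan</pron>
<pron lang="ko" system="mc" resp="cmuller">Iryeon</pron>
<pron lang="ko" system="mr" resp="cmuller">Iryŏn</pron>
<pron lang="ja" system="kk" resp="cmuller">イチネン</pron>
<pron lang="ja" system="hb" resp="cmuller">Ichinen</pron>
</pron_list>
</entry>"""


def readings(entry, lang):
    """Return {system: reading} for every <pron> in the given language."""
    return {
        p.get("system"): p.text
        for p in entry.iter("pron")
        if p.get("lang") == lang
    }


def index_by_reading(entries, lang, system):
    """Map each reading in one transcription system to its headword."""
    index = {}
    for entry in entries:
        reading = readings(entry, lang).get(system)
        if reading:
            index[reading] = entry.findtext("hdwd")
    return index


entry = ET.fromstring(ENTRY_XML)
korean = readings(entry, "ko")
hepburn_index = index_by_reading([entry], "ja", "hb")
print(korean)
print(hepburn_index)
```

With many entries in the list, the same function would yield, for example, a Hepburn-ordered index of headwords without any change to the data.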

However, once again, for those for whom this is a headache, it is fine if you want to terminate your exposure to XML here. Ensuing discussions will go into a bit more detail on XML for those who are interested, so you may ignore these if you wish.

Also, please feel free to write to me with questions at acmuller[at]jj.em-net.jp.

Regards,

Chuck