The Haeinsa EBTI Meeting

by Christian Wittern and Urs App


This article (which is also published in the Electronic Bodhidharma No. 4) is a report on the third meeting of the Electronic Buddhist Text Initiative, which took place from Sep 30 to Oct 2, 1994, at Haeinsa Monastery on Mount Kaya in Korea.

Oct. 1, 1994

The meeting was attended by around 40 participants from Asia, North America, and Europe, representing both the research community and various monastic institutions. It was hosted by the Goryeo Daejanggyong Institute of Haeinsa Monastery under the direction of Ven. Chongnim, and organized with the help of Prof. Jae-ryong Shim (Seoul National University) and Prof. Lewis Lancaster (University of California, Berkeley, U.S.A.).

The meeting opened with a reception at Haeinsa International Hotel on Sep 30. The next morning, after a morning service and a short dharma talk by the spiritual leader of the monastery, there were welcoming words from the director of the Koryo Daejanggyong Institute, Ven. Chongnim; the president of the EBTI, Prof. Lancaster; and the EBTI's coordinator, Prof. Urs App (Hanazono University, Kyoto, Japan).

Prof. Urs App, associate director of the International Research Institute for Zen-Buddhism (IRIZ), opened the first session by describing the "Zen Knowledgebase" project that he has led since 1990. It aims at collecting, processing, and publishing (mostly in electronic form) all kinds of information related to Zen Buddhism. This includes a marked-up text database, information about texts, information about primary and secondary literature, persons, reference works, maps, works of art, etc. In describing how the project has changed since its inception, Prof. App emphasized the need to stay self-critical and adaptable to new needs, and to engage in solid basic research at each step of the project to avoid getting stranded with worthless data. A major part of his project has been to communicate the problems and findings at every stage to other interested parties, so as to produce not just results but also information about how they were achieved.

The next speaker was Prof. John McRae (Cornell University, U.S.A.), who showed slides and screenshots introducing a project he is just about to start: the input of manuscripts from China's Yunnan province. As work has not yet actually begun, he gave some background on how he became interested in these manuscripts and on how he plans to interweave the electronic texts with audio and visual information. His second topic was the World Wide Web site he is planning at Cornell, which will make this information available to Internet sites anywhere in the world.

As the last topic before lunch, Prof. T. Supachai (Mahidol University, Thailand) gave a presentation on the computerization of the Siam edition of the Pali Canon, a project that began in 1987 at Mahidol University, Bangkok. The input of 45 volumes of canonical text and 57 volumes of commentary has since been completed and put on a CD-ROM, together with a fast search engine. Prof. Supachai demonstrated his software, which comes as an integrated system that does not allow for interaction with other programs and prevents export of the text to a standalone text editor. He was supported by Ven. Phra Thammapidok, who introduced the content of the canon, and by Prof. Wangsawang (Mahidol University). Searches for multiple keywords by Boolean expressions and a synopsis of found keywords in their context (KWIC, keyword in context) have not yet been implemented but will hopefully be integrated in upcoming versions. So far, no tags have been added to the text except what was necessary for search and for linking the main text body to the commentaries.
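
For readers unfamiliar with the two search features named above, the following Python sketch shows roughly what a keyword-in-context listing and a Boolean AND search do. It is purely illustrative and is not drawn from the Mahidol software; the function names `kwic` and `boolean_and`, the `width` parameter, and the sample sentence are all assumptions made for this example.

```python
def kwic(text, keyword, width=20):
    """Return one context line per occurrence of keyword in text.

    Each line shows up to `width` characters before and after the hit,
    with the keyword itself set off in square brackets.
    Matching is case-insensitive; this simple version assumes that
    lowercasing does not change the length of the text (true for ASCII).
    """
    lines = []
    lowered = text.lower()
    needle = keyword.lower()
    start = lowered.find(needle)
    while start != -1:
        end = start + len(needle)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        lines.append(f"{left}[{text[start:end]}]{right}")
        start = lowered.find(needle, end)
    return lines


def boolean_and(documents, keywords):
    """Return the documents that contain every keyword (a Boolean AND)."""
    return [d for d in documents
            if all(k.lower() in d.lower() for k in keywords)]


# Hypothetical sample text, chosen only to give two hits for one keyword.
sample = "Sabbe dhamma anatta. Sabbe sankhara anicca."
for line in kwic(sample, "sabbe", width=10):
    print(line)
```

A production system for a canon of this size would search a prebuilt index rather than scan the text on every query; the linear scan here is only meant to make the idea concrete.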

After a short break that gave the participants the opportunity to experience authentic monastic food at the monastery's canteen, we went on a tour of the monastery. It took us to the main Buddha Hall, where the monastic services are held, and to the repository of the more than 80,000 wooden printing blocks of the Tripitaka Koreana. The preservation and computerization of these printing blocks are the major objectives of the Daejanggyong Institute headed by Ven. Chongnim, who later gave us a brief introduction to the scope and early achievements of his institute's Tripitaka Koreana input project (update May 1995: this project has now received considerable support from the Samsung company and is progressing quickly). The printing blocks of the Tripitaka have been perfectly preserved for more than 700 years because the buildings that house them were built in a way that produces an ideal climate. Exactly how this is achieved is not yet completely understood by today's researchers.

The afternoon session opened with a demonstration of a system developed by Mr. Chuang Derming of Academia Sinica, Taiwan, that allows users to mark up the document structure of a text and use this as a basis for a knowledge structure. The strikingly simple user interface lets the user select tags from a table and attach them directly to a selected part of the text. It even allows for overlapping tags, which can be useful when tagging documents. Unfortunately, there is as yet no interface to standard document markup languages like SGML, a serious drawback for long-term projects that need both exchangeability and stability.
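
Overlapping tags are awkward to express in strictly nested markup, which is one reason the feature is worth noting. One common way to represent them is standoff annotation, where each tag is stored as a character range beside the text rather than inside it. The Python sketch below is an assumption for illustration only and does not describe how the Academia Sinica system works internally; the `Tag` class, the tag names, and the sample sentence are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str   # tag chosen from the project's tag table
    start: int  # offset of the first tagged character
    end: int    # offset one past the last tagged character


def overlaps(a, b):
    """True if two tags share text but neither contains the other."""
    shares_text = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shares_text and not (a_in_b or b_in_a)


# Hypothetical sample: the "phrase" tag runs across a sentence boundary,
# so it cannot be nested inside the "sentence" tag.
text = "Thus have I heard. At one time the Buddha was staying at Rajagriha."
tags = [
    Tag("sentence", 0, 18),
    Tag("phrase", 5, 30),
    Tag("person", 35, 41),
]

for tag in tags:
    print(tag.name, repr(text[tag.start:tag.end]))
```

Because the tags live outside the text, any number of them can cover the same characters; the price is that offsets must be kept in step whenever the text itself is edited.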

Ven. Valananda from Colombo, Sri Lanka, then talked about the Singhalese Pali Canon Project, an attempt to input works in Pali and Sinhala script that were written on ola leaves. To date, about 35 MB of data have been input, but proofreading is still ongoing. As this is done through voluntary service, the process is rather slow and may take approximately 5 years to complete. A program that allows access and data retrieval has already been built. Search results can be saved to a file and used with any word processor, for example for pasting them into scholarly documents.

Ven. Yifa, Ven. Yung Chin and Ven. Yung Ming from Fo Kuang Shan Monastery, Taiwan, came next to report on their efforts to computerize the Fo Kuang Shan Buddhist Dictionary and to produce their own edition of a Buddhist Canon. Both projects were first planned as printed editions, with the publication of the dictionary completed in 1987. At a later stage, the desirability of electronic versions was recognized, and efforts are under way to realize them. The dictionary was printed with metal type, so input is being attempted by scanning the printed pages. After that, a considerable number of Buddhist texts were input and edited on computer. Thus the planned electronic version will be built from the same source. Problems encountered have been largely due to the fact that the available character set was too limited and did not contain the desired forms of characters (glyphs). To overcome this, additional characters have been created, but only if no character with exactly the same meaning could be found in the code set. For the typesetting system, additional characters have been created to ensure the correct appearance on the printed page. So far, about 14 MB of Chan texts have been input; the publication of the printed version in 51 volumes is scheduled to begin by the end of 1994. Work on a CD-ROM of these texts will begin in January 1995.

Robert Chilton of the Asian Classics Input Project (ACIP) followed with a demonstration of the CD-ROM that his group has produced so far. Besides providing texts and information for scholars, ACIP also supports the Tibetan input operators and their educational institutions. All texts are provided free of charge; for the CD-ROM, a modest donation of US$ 15 is required. As Robert showed us, the CD contains a built-in search menu that allows a text to be located by catalogue number, author, subject, or title. The material on the CD amounts to about 25,000 pages dating from 200 AD to modern times. Some of the texts included have not been finally released but are awaiting proofreading. So far, no markup except for basic location markers is included. However, Mr. Chilton and others mentioned in discussion that the meeting had convinced them of the necessity of markup and of the need to address it at early project stages.

After a break, Urs App from Hanazono University showed his recent work on producing electronic transcriptions of Dunhuang manuscripts. He began by introducing some basic principles of the European text-critical tradition, developed mainly in dealing with the various manuscripts of European classical literature and the Bible. This tradition relies heavily on imperfections and "bad texts": it is precisely these that make it possible to trace the dependencies and filiations of texts. Thus mistakes and imperfections are prized, while in Dunhuangology the prevalent tendency is still to eliminate all imperfections in order to produce a "good text." He emphasized that electronic versions of such manuscripts allow comparison that goes far beyond simple textual differences and can take a variety of other factors into consideration (for example, various kinds of corrections, lacunae, differences of ink, etc.). Many observations about the manuscripts that cannot be included in a printed apparatus can be recorded in the form of tags in an electronic version; they can be retrieved where needed and also used for other purposes such as elaborate collations. He then showed how such electronic transcriptions can be used to create a number of different views of a collated text.

Christian Wittern, researcher with the IRIZ, came next to show his work on extending the number of characters available to common applications. He first compared two large code sets of Chinese characters, CCCII and CNS, both from Taiwan. He concluded that the way CCCII encodes variants, through a particular scheme for assigning the internal codes, makes it difficult to use in practice. He therefore prefers to use CNS as an extension of the popular Big-5 code used in Taiwan and elsewhere; CNS is backward compatible with Big-5. Using the approach Mr. Wittern outlined, Big-5 based systems can gain access to an additional 35,000 characters, bringing the total number of usable characters to about 48,000. He then demonstrated how such an approach can actually be implemented and used in popular operating environments, showing how a character is searched for and selected from a database to be pasted into a document.

Mr. Chuang Derming from Academia Sinica followed and demonstrated how his markup program can be used in practice for the structural markup of documents. He showed how the table of tags can be created from scratch and thus be adapted to the individual requirements of different texts and projects. This turned out to be very easy, as it can be done entirely with drag-and-drop operations on the screen.

This concluded the afternoon session and gave the participants time for a break in the monastic canteen. The evening session began at eight, when Lou Burnard introduced the complex framework of the 'Guidelines for Electronic Text Encoding and Interchange'. As these Guidelines are an attempt to enable electronic encoding of any text of any time in any language, the resulting printed version is an impressive 1300 pages long. Lou Burnard helped the audience find a way into these detailed recommendations with his 'Chicago Pizza Model', which separates base, core and toppings. The base is usually prose, verse or drama (just as the base of a pizza comes in a limited number of flavors), the core is what comes with any text (like the cheese on a pizza), and the toppings can be an assortment of anything that is required in addition. With this flexible model, a great variety of texts can be encoded with identical or at least compatible elements for the main parts, and with specialized elements added where special requirements call for them.

The morning session of October 2nd was devoted to a practical exercise entitled "Creating a Buddhist DTD". Jan Nattier of Indiana University began the session with observations and remarks on what elements would need to be distinguished in Buddhist texts. Under the direction of C. M. Sperberg-McQueen, a group tagging session took place on the opening sentences of the Heart Sutra in a translation by Lewis Lancaster. It turned out immediately that some of the categories in the TEI Header cannot be easily applied to such texts; for example, our text had no known author but rather two translators, namely, Kumarajiva and Lewis Lancaster. In the course of tagging, most phenomena that any participant felt needed tagging could in fact be marked with an appropriate element from the TEI DTD.
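
To make the header problem concrete, here is one possible way to record a text with no author and two translators, written as a small Python sketch that builds the markup with the standard library. It is not the markup produced at the session: the element names `titleStmt`, `title`, `respStmt`, `resp` and `name` follow the TEI header, but the choice to use one `respStmt` per translator, the helper name `title_statement`, and the wording of the `resp` content are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def title_statement(title, translators):
    """Build a <titleStmt> with no <author> and one <respStmt> per translator."""
    stmt = ET.Element("titleStmt")
    ET.SubElement(stmt, "title").text = title
    for name in translators:
        resp_stmt = ET.SubElement(stmt, "respStmt")
        # <resp> states the kind of responsibility, <name> who bears it.
        ET.SubElement(resp_stmt, "resp").text = "translator"
        ET.SubElement(resp_stmt, "name").text = name
    return stmt


stmt = title_statement("Heart Sutra", ["Kumarajiva", "Lewis Lancaster"])
print(ET.tostring(stmt, encoding="unicode"))
```

The sketch simply leaves the author element out and states each translator's responsibility explicitly, which avoids forcing a translator into an author slot.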

EBTI Business

The afternoon session dealt with the business of EBTI. The following was discussed and decided:

Seoul session

On the morning of Oct. 2, the meeting was continued at Dongguk University in Seoul. Numerous visitors from various universities and other institutions came to hear Howie Lan and to see Prof. T. Supachai's demonstration. Howie Lan from Berkeley spoke about character codes and Unicode. He explained the logic and coverage of Unicode in relation to other East Asian character codes. So far, the main aim of Unicode is to pull together all extant national and corporate character codes into a single code set into which any of these codes can easily be translated. With respect to Chinese characters, however, a unification of the various national code sets is necessary. Going into technical detail, he commented on the internal logic of the coding and on the strategy employed for the unification of characters. He also urged interested parties to submit any additional characters they need to the national standardization authorities involved in designing additions to Unicode.
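
The idea of a common code set into which national codes can be translated can be illustrated with a few lines of Python, whose standard library happens to ship codecs for several East Asian encodings. This is a present-day sketch and not something shown at the meeting; the helper name `convert` and the choice of the character U+4E2D as an example are assumptions made for illustration.

```python
def convert(data, source_codec, target_codec):
    """Re-encode bytes from one national code to another via Unicode."""
    text = data.decode(source_codec)   # national code -> Unicode
    return text.encode(target_codec)   # Unicode -> other national code


# U+4E2D is a common Chinese character ("middle"); after unification it has
# a single Unicode code point, whichever national code it came from.
character = "\u4e2d"
big5_bytes = character.encode("big5")              # Taiwan
gb_bytes = convert(big5_bytes, "big5", "gb2312")   # mainland China

print(big5_bytes.hex(), gb_bytes.hex(), hex(ord(character)))
```

The two national codes assign different byte values to the character, yet both decode to the same Unicode code point, which is what makes Unicode usable as a pivot between them.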

After these last talks and demonstrations, the participants were invited to lunch, and the EBTI meeting was declared closed.