Information Systems, Context Dictionaries, Hartmann von Aue
and Those Little Glass Beads
This tractate was written by Prof. Roy A. Boggs to honor Prof. Kurt Gärtner upon his retirement. A more complete version appeared in: Magister et amicus. Festschrift für Kurt Gärtner, ed. by Frank Shaw and Václav Bok, Wien: Edition Praesens - Verlag für Literatur- und Sprachwissenschaft, 2003, pp. 265 - 268.
In the beginning it was the beads. When accurately arranged, they provided man with a tool for managing numbers and for calculating totals. The time was perhaps somewhere around 2000 BC and the tool was the abacus. Ever since this time beads have played a role in the history of the world. In western literature their use reached a figurative zenith in Hermann Hesse's Glass Bead Game when Master Ludi raised his game to a final, exquisite level. His was a somewhat abstract and isolated world, but his game was in itself a marvel. Yet, for all his success with his beads, Master Ludi was confronted with a world moving forward outside the game, a world in which he was less comfortable. He would have to come to terms with it.
Since the beginning of the year 2000 AD many of us still have our beads. For the lexicographer these beads go by many labels, including lemmata, word forms, tagmemes, adverbs, and the like. The process has changed little over the last century. Wilhelm Müller described the basic process in the introduction to the Benecke, Müller, Zarncke (BMZ) Middle High German Dictionary (Leipzig, 1854). One determined the existence of a lemma, collected related word forms displaying interesting usage, and entered these on little slips of paper for manipulation and eventual display in paper documents such as lexical studies and dictionaries. This is also the process that was used for the Oxford English Dictionary (OED). One reads of limitations on the number and completeness of citations, limitations on page sizes and on the number of pages, and limitations by the publishers on the final format because of time and financial considerations.
An Internet and CD-ROM version of the BMZ is available from the University of Trier under the guidance of Kurt Gärtner. It presents an excellent reproduction of the original and makes the dictionary readily available to all. The electronic version includes links to other important MHG dictionaries and reference works, such as Lexer's MHG dictionary and the Findebuch zum mittelhochdeutschen Wortschatz. As valuable as the material may be, it remains a historical snapshot and cannot be accorded academic validity. Its contents remain as limited now as they were over a century ago.
Modern lexicography is little different. While technology may now be employed to make lists, retrieve instances from databases and manage documents, little has changed for many lexicographers as they prepare for publication. Sidney Landau, in an interesting tour of modern dictionary writing, stresses again and again the constraints imposed by publishers, mainly due to financial considerations. (Sidney I. Landau, Dictionaries. The Art and Craft of Lexicography. Cambridge, 1989.) The objective of a commercial dictionary is and remains profit, and for the user the phrase is 'What You See Is What You Get.'
Even during the congress on medieval German lexicography at the University of Göttingen in the spring of 1998 one heard comments regarding the limitations set by modern publishers that continue to delimit final results. "So many wonderful quotations must be left behind." "How many columns can the average article span?" "Which is the best logical construction for a dictionary: grammatical, topical, etc.?" Which is the best way to arrange our little glass beads in an intricate manner to please both the publisher and the public - whoever this may be?
As information scientists we also are not free from our little glass beads. They are called bits and bytes, tuples, instances, objects and the like. Processes include Internet applications, data warehousing/mining and knowledge bases. As with the lexicographer above, data are collected, manipulated and displayed. They are assembled into giant repositories where routines seek to find relationships among the data and to extract new and useful constructs. We seek to gather as many data as possible in an attempt to discover what we know about ourselves and what we can learn about others. The processes are continually expanding with advances in technology.
However, the information scientist very often has so much data that perspectives are lost and results cannot be verified. They are simply generated and proudly displayed for approval. There exists a mountain of little beads without form or substance. There may exist no publisher to limit output. Yet, in the end someone's computer routine dictates ultimate structures and opportunities. It is often a matter of an unknown who and a not-yet-conceptualized how.
The two paths of the lexicographer and the information scientist would seem to be similar. Yet, there are distinct differences. The world of the lexicographer as described above is a top-down process. The world of the information scientist is a bottom-up process. It is a matter of approach and basic assumptions. There is a conflict over who is in control, the supplier or the consumer. For instance, a dictionary maker supplies a finished product. What you see is what you get. There is no more, even if the dictionary is presented 'on-line.' With an information systems process, the consumer searches for insights and relationships. As the assembled data grow, the process is always open and never ending. For the consumer there is always more. Now, with the advent of the Internet, the two disciplines might seem to be going in different directions.
The Germanist has not been oblivious to the world of information systems. As early as 1971, a colloquium at the University of Mannheim entertained the idea that information systems might offer new tools for working with the masses of individual pieces of data. There have been several such colloquia since this date (for example, in Tübingen, Trier, and Würzburg). Results indicate that at least some Germanists recognize that there is indeed a new world with which they must come to terms. If they are going to organize their beads in a more useful and effective manner, then they will have to assimilate the opportunities offered by the new tools developed by information science - with the caveat that the world of information science is one that seeks to expand horizons. The system provides opportunities. The user decides what and when. And it must be duty free!
Of special interest here is the world of dictionary making. Germanists understand how beads have been used in the past to prepare the great printed dictionaries: BMZ, Lexer, and even those such as the Iwein dictionary. Their compilers were masters and their results sometimes astounding, but these dictionaries have become outdated, limiting, and confusing. They can't really be trusted. Why were some texts included and not others, why were some lemmata included and not others, why were some word forms included and not others? And, of course, what does one do when a compiler used semantics to structure an entry when a grammatical structure would have been clearer, and vice versa? What does one do when meanings for prefixes and compounds are not treated consistently or are not even present?
The answer to such questions is not to assemble more beads and use information systems to attempt to redefine old questions. To do this is to ask the wrong questions. To do this is to be limiting, even selfish. To do this is to fail to employ the new opportunities information systems offer. The old game may be familiar, but it is no longer interesting. The answer is to use information technology to organize the beads in a manner that expands rather than limits. The answer is to offer the user tools that are open, and, above all, which let the user ask the questions and seek the answers. Unless, of course, compilers think they have both the questions and the answers. This has never been the case! The user is far better left to the user's own devices.
For information systems, this user is of central importance. Looking at a MHG dictionary from an information systems viewpoint, one might accept the argument that such a dictionary is designed for the Germanist in the widest sense of the word, which includes anyone from student to specialist. But it must then be understood that the student, who represents by a very large margin the largest group of users, wants answers to questions. The specialist, part of a small minority, wants data with which to formulate questions. This distinction is very important and must never be forgotten.
When a student looks up a word, he or she wants to find a meaning for that word. The dictionary is a tool, not a didactic device, nor is it a series of word studies arranged by some preordained structure, which remains to be discovered and agreed upon. The task is actually very simple. When the student accesses a dictionary to look up a word form (!), the student expects to find an entry and to find a meaning for that word form in that verse in that text. If the student would like more data, then the dictionary must offer the student more data, until the student has exhausted the system's resources. When the student begins to assemble data to formulate new and useful constructs, the student becomes the specialist. The dictionary must never be limiting, must never be didactic, and must never fail to provide all the data over which it has control. The information systems world is one of expanding horizons.
Since all of the beads must be available all the time, the question naturally becomes one of sources. If the user proceeds from a word form, then where are the initial word forms to be found? The easy answer is in the primary texts, but then which ones? Since by far the largest number of users of the dictionary are the students, then one might assume that the texts they use should in the first instance all be available in the primary corpus - and every word form in these texts accounted for and available. One begins with the texts most used and then adds new texts as resources permit. If a specialist wants to add a specific text of limited interest, he or she is free to do so. The task is to begin to serve the greatest number of users and then to continue to grow.
Since all of the beads must be available and manipulable, there are basic questions of organization and of media. The first step is to rule out the printed page as the primary medium. Forget it. Anything written is old and limiting before it ever sees the light of day. One hears of interesting examples being lost along the way, of limiting the number of columns, of limiting the number of volumes, and of the needs of the publisher. Old questions. Perhaps an interesting exercise among players, but in an information world a waste of time and effort.
The real questions concern data organization, cross-referencing, tagging, entities, etc. The trick is to be both comprehensive and unique in a manner that permits present and future control of the data. Texts can be prepared in an information system for worldwide access far more quickly than they can be condensed into inflexible tomes bound for dusty library stacks. By using media such as the Internet, a dictionary can begin to function the day the first text is prepared. It can be expanded, improved, and restructured in an immediate environment, and the results made instantly available.
The Hartmann von Aue Knowledge Base suggests some of these unlimited possibilities. It contains a Context Dictionary of the current critical edition of 'Der Arme Heinrich', in which all of the word forms are accounted for with lemmata, grammatical forms, translations, and notes. For those wanting to go further and to examine supporting historical documents, enlargeable color images of the various manuscripts and corresponding transcriptions are available. Further research is supported by a lemmatized concordance, a reverse index, a rhyme index and a name register. All of these are generated upon request. Make a change to the critical edition and all of the rest is automatically updated. Here is something for the scholar as well as the student - in one place and available, free of charge, and world-wide.
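How such tools can all hang on a single annotated edition may be easier to see in a small sketch. The Python below is only an illustration under stated assumptions, not the Knowledge Base's actual implementation: the Token record, the function names, and the lemma and gloss annotations for the opening couplet are all hypothetical and chosen for the example. It shows one way a concordance, a reverse index and a rhyme index might be derived on request from the same list of annotated word forms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word form in the edition (hypothetical record layout)."""
    verse: int
    form: str
    lemma: str
    gloss: str


# Opening couplet; the lemmata and glosses are illustrative annotations only.
EDITION = [
    Token(1, "Ein", "ein", "a"),
    Token(1, "ritter", "ritter", "knight"),
    Token(1, "sô", "sô", "so"),
    Token(1, "gelêret", "lêren", "learned"),
    Token(1, "was", "wesen", "was"),
    Token(2, "daz", "daz", "that"),
    Token(2, "er", "er", "he"),
    Token(2, "an", "an", "in"),
    Token(2, "den", "der", "the"),
    Token(2, "buochen", "buoch", "books"),
    Token(2, "las", "lesen", "read"),
]


def verses(tokens):
    """Rebuild the text of each verse from its word forms."""
    lines = defaultdict(list)
    for t in tokens:
        lines[t.verse].append(t.form)
    return {number: " ".join(forms) for number, forms in lines.items()}


def look_up(tokens, verse, form):
    """Entries for one word form in one verse (the student's question)."""
    return [t for t in tokens
            if t.verse == verse and t.form.lower() == form.lower()]


def concordance(tokens):
    """Lemma -> list of (verse, word form, verse text)."""
    text = verses(tokens)
    index = defaultdict(list)
    for t in tokens:
        index[t.lemma].append((t.verse, t.form, text[t.verse]))
    return dict(index)


def reverse_index(tokens):
    """Distinct word forms, ordered by their spelling read backwards."""
    forms = {t.form.lower() for t in tokens}
    return sorted(forms, key=lambda form: form[::-1])


def rhyme_index(tokens):
    """Verse-final word form -> verses it closes (tokens in text order)."""
    final = {}
    for t in tokens:
        final[t.verse] = t.form.lower()  # the last token seen wins
    index = defaultdict(list)
    for verse, form in final.items():
        index[form].append(verse)
    return dict(index)
```

Because nothing is stored except the annotated word forms, a correction to the edition needs no further bookkeeping: the next call to any of these functions reflects it.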
Of special importance here is the Context Dictionary. It is specific only to the current critical edition of 'Der Arme Heinrich' and contains entries and data for all of the word forms, nothing more and nothing less! Additional context dictionaries are planned for each of Hartmann's other works. These can then be merged to form a Hartmann von Aue context dictionary on-line - which will account for all of the word forms in all of Hartmann's works, nothing more and nothing less. This can in turn be combined with other such dictionaries for other authors, step by step, until one has assembled a data mine from which one can retrieve whatever is required at any given time from anywhere. Data will be instantly available for questions that have yet to be asked. Sounds like a lot of work? This depends on one's viewpoint.
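The merging step can be pictured with a short sketch. It is a hedged illustration only: the dictionary layout (lemma mapped to a list of attestations), the merge function, and the sigla "AH" and "Iw" are assumptions made for the example, and the handful of entries is sample data rather than content taken from the Knowledge Base.

```python
from collections import defaultdict


def merge(*dictionaries):
    """Combine context dictionaries, keeping every attestation exactly once."""
    merged = defaultdict(list)
    for dictionary in dictionaries:
        for lemma, attestations in dictionary.items():
            merged[lemma].extend(attestations)
    # Sort lemmata and attestations so the result is stable and browsable.
    return {lemma: sorted(set(found)) for lemma, found in sorted(merged.items())}


# Each attestation is (work, verse, word form); sample entries only.
arme_heinrich = {
    "an": [("AH", 2, "an")],
    "ritter": [("AH", 1, "ritter")],
}
iwein = {
    "an": [("Iw", 1, "an")],
    "güete": [("Iw", 1, "güete")],
}

hartmann = merge(arme_heinrich, iwein)
```

In this sketch the lemma "an" ends up with one attestation from each work, while the total number of attestations equals the sum of the two inputs: nothing is added and nothing is lost. A further author's dictionary could be passed to the same function in the same way.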
Scholarship can continue to be paper-bound with all the implied limitations and costs; or we can recognize that academic contributions are just that, contributions, and we can make them and all of the supporting tools easily and freely accessible. The material on this web site has not simply been put here. A great many students, scholars and institutions have made contributions. Use it profitably and then contribute wherever you can.
The year 2000 BC, and indirectly the abacus, was mentioned above. A new MHG dictionary is promised for 2025 AD. Both employ little beads. The question is whether new, non-linear structures will be developed which will take advantage of new information opportunities. One can try to improve an old exercise or one can challenge assumptions and create new opportunities. But whatever one does, as Master Ludi knew, there is indeed another world out there.