Linguistics

When ALLC, our association, was founded, linguistics was understood as one of the main pillars of humanities computing. Rightly so, because the very first uses of the computer in the humanities were aimed at completing linguistic tasks or, in a broader sense, tasks centred on the manipulation of data in linguistic form. Just think of the pioneering linguistic analyses by Father Busa, the collection of machine-readable dictionaries and corpora in general, and the early attempts at machine translation. Another boost for linguistic tasks came from TEI, the Text Encoding Initiative, which in the past decade has not only produced a standard that is usable and used for a wide variety of text-related (and thus linguistically interpretable and meaningful) projects, but also influences the digital representation of an ever-widening range of textual and non-textual information. It is therefore quite in line with the development of humanities computing in general to give an overview of the contributions of linguistics and to suggest possible new roles that linguistics can play in our multi-faceted, increasingly multidisciplinary field of applied science.

Where are we now?

Since any application must be based on certain firm theoretical foundations, there is a logical relation between the two main branches of linguistics, theoretical and applied, which must go hand in hand in completing linguistic computing tasks.

Theoretical linguistics and humanities computing

Dictionaries

One of the early uses of the computer was to make machine-readable dictionaries and to collect relatively large corpora of a language, such as the Brown Corpus. Besides providing an inventory of the given language, these resources also served as a basis on which ambitious natural language processing projects could start.

Parsing and synthesis

The collection of even unedited linguistic data offered the linguist the chance to describe the underlying structure of a language and to support applications for non-linguistic tasks. Thus, the development of parsing techniques in syntax and morphology made early machine translation systems possible and laid the foundations for the more sophisticated, speech-based language technology tasks of the future.

Cognitive science

As an emerging and rapidly developing field, cognitive science needs input from many disciplines, including linguistics. Especially in the past decade, we have been witnessing the computer-aided study of language, linguistic form and communication, which is contributing both to a better understanding of cognitive processes and to practical applications based on such research.

Applied linguistics

Text analysis

Text analysis is probably one of the broadest concepts in humanities computing, encompassing all research and applications related to linguistic form. Most work is done in the field of information retrieval of every kind, including literary text analysis in general, automatic indexing, stylistics, authorship studies and the retrieval of grammatical as well as semantic and pragmatic information, including automatic abstracting. In addition, publication support is a highly useful area of text analysis, which makes all kinds of text-based and multimedia information available to the largest public ever.

Text analysis is the field of applied science in which linguistics and literature are most dependent on each other, where the dividing lines between these two often competing disciplines are essentially disappearing, and where each benefits from the approaches and results of the other.

Corpus linguistics and corpus creation

Since the first electronic corpus (the Brown Corpus) was created, a new linguistic discipline, corpus linguistics, has developed. It takes performance as its starting point and tries to describe languages systematically on the basis of naturally occurring oral and written speech.

For a long time corpus linguistics existed only for English or for varieties of English, because the corpora that were created were corpora of English, or because the English corpora were the only ones freely available to the scientific community.

In the meantime corpus linguistics has also established itself for other languages, above all the Romance languages. However, whereas English corpus linguistics started off with the study of contemporary written speech, Romance corpus linguistics is at the moment mainly concerned either with oral speech or with ancient written speech. This difference in interest is reflected in corpus projects such as C-Oral-Rom, which aims at the creation of comparable oral speech corpora for the main Romance languages, or ItalAnt, in which a marked-up corpus of ancient Italian texts is being created.

(Foreign) language teaching

Traditionally, applied linguistics simply meant language teaching to many. The introduction of the computer into language teaching and learning has produced significant results by making this special learning process accessible and efficient for millions. Computer techniques from language teaching methodology have also spread to other disciplines, showing the power of applied computing beyond linguistics proper or even humanities computing.

Language technology

With the introduction of the computer into linguistics, the first seeds of language technology were also sown. Whereas almost half a century ago language technology mainly meant machine translation, technological development and new consumer demands have shifted the emphasis to speech technologies. The new fields of computer-based applied linguistics include the development of speech recognition and synthesis as well as automatic question answering systems.

Methodology/Tools

Methodology is an essential constituent of our joint efforts in humanities computing. A properly chosen methodology can serve several different fields of the humanities.

Corpus studies

The study of corpora is a methodology used equally for linguistic, literary and general text analysis tasks.
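One basic tool of corpus study is the keyword-in-context (KWIC) concordance, which lists every occurrence of a word together with its neighbours. The following is only a minimal sketch under simple assumptions: the function name kwic, the regular-expression tokenizer and the sample sentence are all illustrative and do not come from any particular corpus project.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Assumption: tokens are lower-cased runs of word characters; a real
    corpus would normally use a language-specific tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Hypothetical sample text, used only for illustration.
sample = "The corpus is small. A corpus of speech is hard to build."
for left, kw, right in kwic(sample, "corpus"):
    print(f"{left:>20} | {kw} | {right}")
```

The same listing serves a linguist looking at collocations and a literary scholar looking at the use of a motif, which is why the method carries over between fields.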

Quantitative analyses

Data of any kind are analysed by quantitative, mathematical and statistical methods to offer generalizations which were not possible prior to the introduction of humanities computing.
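As a small illustration of such quantitative analysis, word frequencies and the type-token ratio (distinct words divided by running words) can be computed in a few lines. This is a sketch only: the helper names, the naive tokenizer and the sample sentence are assumptions made for the example, and the type-token ratio is just one of many possible measures.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lower-cased word tokens (assumption: naive regex tokenizer)."""
    return Counter(re.findall(r"\w+", text.lower()))

def type_token_ratio(text):
    """Distinct word types divided by running tokens; 0.0 for empty text."""
    freqs = word_frequencies(text)
    tokens = sum(freqs.values())
    return len(freqs) / tokens if tokens else 0.0

# Hypothetical sample text, used only for illustration.
sample = "the cat sat on the mat and the dog sat too"
print(word_frequencies(sample).most_common(2))
print(round(type_token_ratio(sample), 3))
```

Because the ratio depends on text length, comparisons between texts are usually made on samples of equal size.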

Markup

Any significant generalization is based on data. Data can be obtained by parsing and other kinds of analysis; however, human markup remains an indispensable prerequisite for the retrieval of any complex information.
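To make the role of markup concrete, the sketch below retrieves words by their annotated part of speech from a small XML fragment, using Python's standard xml.etree.ElementTree module. The fragment is a hypothetical, TEI-like sample: the element name w and the attributes pos and lemma are assumptions chosen for the example, not a statement about the encoding of any specific corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-annotated sentence; element and attribute names
# are illustrative assumptions.
SAMPLE = """<s>
  <w pos="DET" lemma="the">The</w>
  <w pos="NOUN" lemma="corpus">corpora</w>
  <w pos="VERB" lemma="grow">grew</w>
</s>"""

def words_with_pos(xml_text, pos):
    """Return (surface form, lemma) pairs for words tagged with `pos`."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("lemma"))
            for w in root.iter("w")
            if w.get("pos") == pos]

print(words_with_pos(SAMPLE, "NOUN"))
```

Without the annotation, finding "corpora" as a form of "corpus" would require a morphological analyser; with it, the query is a simple attribute lookup.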

Where are we going?

Present developments build fundamentally on earlier achievements, with new technologies and new demands playing an increasing part.

Theoretical linguistics

Logical modelling of language

In one of the promising new fields of theoretical linguistics, an increasing number of projects model linguistic structure with the help of the computer. Programming languages such as PROLOG and Lisp, long used for the realization of applied linguistic tasks, now also provide the environment for modelling linguistic structure; such models can be considered computer simulations of the structures predicted by a given theory.
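The idea of simulating the structures predicted by a theory can be sketched with a toy phrase-structure grammar and a recognizer that tests whether a sentence is licensed by it. The example is written in Python rather than PROLOG or Lisp purely for illustration; the grammar rules, the lexicon and the function names are invented for this sketch and make no claim about any actual linguistic theory.

```python
# Toy grammar and lexicon: illustrative assumptions only.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "linguist": "N", "corpus": "N",
    "builds": "V", "sleeps": "V",
}

def parse(symbol, words, start):
    """Return the set of positions where `symbol` can end when parsed from `start`."""
    if symbol not in GRAMMAR:  # lexical category: match a single word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            return {start + 1}
        return set()
    ends = set()
    for rule in GRAMMAR[symbol]:
        positions = {start}
        for part in rule:  # thread every reachable position through the rule
            positions = {e for p in positions for e in parse(part, words, p)}
        ends |= positions
    return ends

def accepts(sentence):
    """True if the whole sentence can be derived from S."""
    words = sentence.lower().split()
    return len(words) in parse("S", words, 0)

print(accepts("the linguist builds a corpus"))
print(accepts("linguist the builds"))
```

This simple top-down strategy terminates only because the toy grammar has no left-recursive rules; a grammar of realistic size would call for a chart parser or a comparable technique.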

Models of Universal Grammar

One of the fundamental concerns of theoretical linguistics is the description of Universal Grammar, i.e. the grammar that is supposed to form the abstract underlying basis of all languages. If we just think of the enormous importance of more efficient language teaching methodologies, it is clear that the study of Universal Grammar matters far beyond the theoreticians' interests. Computer modelling of Universal Grammar can thus reach out to every layer of society.

Applied linguistics

Corpus linguistics and corpus creation

The experience gained from corpus building, and the insights into the regularities of oral and written speech that corpus linguistics has so far permitted, will lead to the creation of more advanced corpora and to the development of a theory of speech, of the speakers' knowledge and of corpora.

There is already an urgent need for more complex and adequate software, and this need will become even stronger as corpus linguistics establishes itself more firmly. As more and more courses in corpus linguistics are established at universities, the need for corpora that are available and analysable over the web will grow. There will also be a growing need for financially affordable (in contrast to the Tuscan world centre, for example) summer schools, training courses and working seminars.

(Foreign) language teaching

We are witnessing new ways and methodologies of language teaching, which essentially involve the increased incorporation and integration of multimedia elements and non-linear structures. In addition, speech technology is entering the field: not only will programs speak to the language learner, but they will also evaluate the student's spoken responses.

Speech technology

Linguistics will have an increasing share in solving practical industry-level speech technology tasks for the consumer market as well as education in general.

Methodology/Tools

We cannot determine the position and role of any theoretical or applied research without considering its supporting methodologies. Probably most essential is the fact that most linguistic applications use some sort of markup: markup for text analysis, for information retrieval, and for semantic and pragmatic annotation. In doing so, they increasingly rely on mathematical, statistical and probabilistic methods. Methodology appears to be shared across humanities computing as a whole, so knowledge gained in one area of research has a very good chance of being applied in another, be it linguistics, literature, arts, music or publication. Common features of methodology successfully unite the academically diverse fields within humanities computing.

What should our agenda be?

Survey

Although we have many years of experience in humanities computing, we need a broader picture of what we actually do and of what our actual needs and plans are in order to define our present position and future goals. We need to set up a database of research groups, types of activities, methodologies, goals and prospects so as to have a firm basis for strategic planning. We hope that conferences like the present one, as well as submissions to our journals, will contribute reliably to this effort.

"Virtual Institute of Humanities Computing"

The strength of our community may eventually be multiplied by the organizational force of a virtual institute, which may offer

  • online courses on methodology as well as on specific humanities computing tasks, especially for beginners,
  • distance diplomas in humanities computing, for which the academic background of a network of universities may be required,
  • online publications, especially for work in progress, to stimulate dialogue and provide a forum for talented young scholars,
  • a software repository,
  • workshops by leading representatives of various fields of humanities computing to disseminate and enhance accumulated knowledge and constantly increase the share of computing in humanities research in general.

Software initiative

A forum (a working seminar or the like) should be created where people who actually do corpus-based research can come together with programmers to create good interactive and integrated tools for the scientific study of corpora, for making corpora available and analysable over the web, and for the teaching of languages with the help of corpora.