Words on the web: a new direction for semantics
A Website Management Programme for students in the Bachelor of Arts with Honours Degree in Language Information Science, offered by CityU's Department of Chinese, Translation and Linguistics, is proving a great success, with large numbers of students from all three years now taking part. The programme is run by Dr Jonathan Webster, Acting Head of CityU's Department of English and Communication. It is one of several projects in which Dr Webster is involved in affiliation with the Institute of Chinese Linguistics (ICL). "The programme has grown considerably from when there were maybe five or six students interested to the present, when almost the entire class is involved. It reflects a great change in student awareness of the importance of the web and IT, in terms of their future careers."
The programme, which has been running for several years, gives students hands-on experience in managing a web server. The initial focus was on building a website directly related to topics in Chinese language and linguistics. Over the years this has expanded as students have become involved in other projects, such as building a "grammar surgery" for the English Language Centre. This is a computer-assisted language-learning tool for students, with the fun slant of going to the "surgery" to visit the "grammar doctor".
Machine translation project
One of the more challenging projects currently being undertaken in affiliation with the ICL is the application of semantic web technology to an example-based machine translation (EBMT) project. Other colleagues from the Department of Chinese, Translation and Linguistics participating in the project along with Dr Webster are Dr K K Sin, Dr H Pan and Mr Caesar Lun.
The initial task is to design a best-match algorithm for translated text spans ranging in size and scope from words to phrases, clauses and sentence patterns. The algorithm will be rigorously tested and human input of improved translations will be constantly incorporated into the corpus in order to build up and develop the learning ability of the algorithm. In turn, this will enhance the accuracy, consistency and intelligibility of the translated text.
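To make the idea of a best-match lookup more concrete, here is a minimal sketch in Python. It is not the project's algorithm: it assumes the example base is a plain dictionary of source spans and stored translations, and it scores similarity with the standard library's `difflib.SequenceMatcher` over word tokens. The function names `best_match` and `add_correction`, the sample spans and the placeholder target strings are all hypothetical.

```python
from difflib import SequenceMatcher


def best_match(query, example_base):
    """Return (source_span, translation, score) for the most similar stored span."""
    query_tokens = query.lower().split()
    best = None
    for source, target in example_base.items():
        # ratio() is 1.0 for identical token sequences and 0.0 for no overlap.
        score = SequenceMatcher(None, query_tokens, source.lower().split()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    return best


def add_correction(example_base, source, improved_translation):
    """Store a human-improved translation so that later lookups return it."""
    example_base[source] = improved_translation


# Toy example base; the target strings are placeholders, not real translations.
example_base = {
    "the court": "TARGET_1",
    "the court may order": "TARGET_2",
    "a person": "TARGET_3",
}

source, target, score = best_match("the court may order costs", example_base)
print(source, target, round(score, 2))

add_correction(example_base, "the court may order", "TARGET_2_REVISED")
print(best_match("the court may order", example_base))
```

The second call illustrates the feedback loop described above in its simplest form: once a reviewer's improved translation is written back, the same span retrieves the improved version.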
Going through the phases
The EBMT project, which is funded by the University Grants Committee, has three phases: example acquisition, example application and example-base management.
The first phase, example acquisition, is nearly complete. It has involved aligning the text of the 25-million-word Bilingual Laws Information System (BLIS) Corpora. The alignment is carried out at several linguistic levels, including word, phrase, clause and sentence. "The BLIS corpus was selected because it was an amazingly good rich text to work with and it was translated by experts. Progress has been quite good in this very difficult phase at the beginning, which is to get the examples by doing the alignment between English and Chinese," said Dr Webster.
Phase two, the example application phase, is currently in progress and deals with how existing examples are used to facilitate translation. The main issues are identifying useful examples in an input sentence, determining the sequence of identified examples to be used in composing the translation, and further manipulating the target-language parts to render the composition. This is, in effect, the translation process itself.
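The first two of these steps can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the team's method: examples are looked up by exact string match, the sequence is chosen greedily by taking the longest stored span at each position, and the final step of manipulating the target-language parts (reordering, for instance) is left out. The names `identify_examples` and `compose` and the placeholder translations are hypothetical.

```python
def identify_examples(tokens, example_base, max_len=5):
    """Cover the input left to right, taking the longest stored span each time."""
    segments = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in example_base:
                segments.append((span, example_base[span]))
                i += n
                break
        else:
            # No stored example starts here: keep the word, marked untranslated.
            segments.append((tokens[i], None))
            i += 1
    return segments


def compose(segments):
    """Join the target parts in source order; untranslated words go in brackets."""
    return " ".join(
        target if target is not None else f"[{source}]"
        for source, target in segments
    )


# Toy example base with placeholder target strings.
example_base = {
    "the": "T_THE",
    "the court": "T_COURT",
    "may order": "T_MAY_ORDER",
}

segments = identify_examples("the court may order costs".split(), example_base)
print(segments)
print(compose(segments))  # T_COURT T_MAY_ORDER [costs]
```

Greedy longest-match is only one way to pick a sequence; a fuller system could score alternative segmentations and choose the best overall cover.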
Within a year, the team hopes to have a prototype of the example application phase in which the examples are held in a database and used to improve translation.
The third phase concerns the management of the example-base where the examples are stored in such a way as to facilitate subsequent retrieval. This method draws on advances in semantic web technology.
The semantic web approach provides the means for rendering information in a machine-processable form. "Basically, the web is now moving in a direction of how you can represent the meaning of the text, instead of just having a repository of documents. Instead it will have a rich knowledge base from which to draw information," Dr Webster explained.
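As a rough illustration of what "machine-processable" can mean, the sketch below stores a few facts as subject-predicate-object triples, the basic data model used by RDF, and derives new facts from them with simple rules. It is a toy under stated assumptions: the predicate names `translation_of` and `synonym_of`, the three rules and the sample words are hypothetical and are not taken from the project.

```python
def infer(triples):
    """Apply the rules repeatedly until no new triples can be derived."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "translation_of":
                # Rule 1: translation is symmetric.
                new.add((o, "translation_of", s))
            if p == "synonym_of":
                # Rule 2: synonymy is symmetric.
                new.add((o, "synonym_of", s))
                # Rule 3: a word shares the translations of its synonym.
                for s2, p2, o2 in facts:
                    if s2 == o and p2 == "translation_of":
                        new.add((s, "translation_of", o2))
        if new <= facts:
            return facts
        facts |= new


def translations(word, facts):
    """Look up every stored or inferred translation of a word."""
    return sorted(o for s, p, o in facts if s == word and p == "translation_of")


# Only two facts are stated explicitly; the rest are inferred.
stated = [
    ("tribunal", "synonym_of", "court"),
    ("court", "translation_of", "法院"),
]

facts = infer(stated)
print(translations("tribunal", facts))
print(translations("法院", facts))
```

A fixed-format lookup on the two stated facts would return nothing for "tribunal"; with the rules applied, the store can answer the query even though that fact was never entered directly.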
Bilingual dictionary database
Another project Dr Webster is working on uses semantic web technology with a bilingual dictionary database. This will be useful for natural language processing as well as being a practical tool for any users of the web.
"Many databases today are very fixed - you input something and retrieve information following a fixed format. With this technology you will be able to store information and retrieve it using rules and inference. The only way you can do that is if your database is rich in terms of knowledge."