The authors of this paper propose a prototype machine translation system that translates scientific English sentences into Arabic sentences. The system is based on natural language processing and machine learning. It is applied to statistics, an important mathematical subfield taught in mathematics departments. The system was analyzed, designed and developed, and the authors tested it on a number of statistical statements. These tests demonstrate its validity as a prototype system.
Machine Translation (MT) is one of the applications of Natural Language Processing (NLP) [
This definition stresses that MT is not simply the substitution of words for other words; like human translation, it involves applying complex linguistic rules, especially in morphology, syntax and semantics. In other words, a computer can be used to translate from a source language to a target language: it can translate an entire document automatically and then present it to a human. When, instead, a human composes a translation, perhaps calling on a computer for help with specific tasks such as looking up specialized words and expressions in a dictionary, the process is called human translation. There is a gray area between human and machine translation, in which the computer may retrieve whole sentences of previously translated text and make minor adjustments as needed. Even in this gray area, however, each sentence was originally the result of either human translation or machine translation. The label MT should be reserved for the case in which the computer performs both the initial translation of the sentences and any subsequent manipulations; software that only assists with such manipulations could be called translator tools.
There are three basic approaches being used for developing MT systems [
The idea of using computers to translate, or to help translate, human languages is almost as old as the computer itself. Before email, before word processing and before command-line interfaces, machine translation (MT) was one of the first two computer applications designed to act on words instead of numbers (the other was code breaking). But really good MT turned out to be so hard to achieve that the task exhausted the top-end computing resources of every generation that attempted it. Regardless, machine translation is going stronger than ever, fueled by the globalization of the Net. Today, all over the world, software designers, programmers, hardware engineers, neural-network experts, AI specialists, linguists and cognitive scientists are enlisted in the effort to teach computers how to port words and ideas from language to language [
New technological, political and socioeconomic developments worldwide, starting in the 1980s, contributed to a revival of machine translation research and development. These developments include the strides made in information technology, a rapid fall in the cost of computing power, globalization, and increasing demand for translation from multinational companies and governments. They were by no means the prime mover behind MT research and development; they simply helped increase its pace. Translation and Interpretation asserts that research and development of MT has been going on since the 1950s, “engaging some of the best minds in computing, linguistics and artificial intelligence,” [
As English is a universal language, most MT research involving Arabic concentrates on translation between English and Arabic. Automatic English-to-Arabic translation is still an active area, and progress in it will help simplify communication between Arabs and other countries. Many systems have been implemented specifically for English-to-Arabic MT:
Hoda M.O. Mokhtar et al. [
Beesly [
Al-Anzi et al. [
Commercial MT systems also exist, such as “Al-Mutarjim AlArabey”, which translates English text into Arabic [1,13], and “golden Al-Wafi translator”, which also translates English text into Arabic [
The works above propose generic MT systems, but domain-specific MT systems are essential for scientific applications. Martha Palmer et al. [
Translation into Arabic is still limited, and its results are not yet fully satisfactory [
Although English is a universal language and most MT research involving Arabic concentrates on translation between English and Arabic, only a few translation systems exist for scientific fields. The authors therefore chose statistics as the specific domain of the proposed system. Statistics is an important mathematical subfield in mathematics departments, and such a system can ease teaching and research there.
The main idea behind the proposed MT system is to translate Source Language (SL) sentences into Target Language (TL) sentences by parsing each source sentence, replacing the source words with their target-language equivalents as specified in a bilingual dictionary, and then rearranging their order to suit the rules of the target language.
The first stage of processing involves the parser [
Because English is a universal language, most Arabic MT research concentrates on translation between English and Arabic [6, 1, 5]. This could help simplify communication between Arabs and other countries.
The current system is proposed for scientific English-Arabic automatic translation. It consists of three main modules, as shown in
• Analysis module: This is used to analyze the input text.
• Transformer module: This is used to translate English sentence structures and words.
• Generation module: This is used to produce the target Arabic sentences behind the system's input-output interface, taking special MT requirements into account.
The proposed system architecture is shown in
The proposed system is centered on domain-specific data stored in an English-Arabic bilingual dictionary compiled from statistics books, because an efficient MT system for this domain relies heavily on domain-specific statistical vocabulary. Most English-to-Arabic translation systems do not support such vocabulary; any domain-specific MT system would have to be extensively augmented with additional domain-specific vocabulary.
The proposed system is composed of three main components, as described in the previous section. The first, the analysis module, analyzes the input English sentence, using the dictionary and a suitable grammar to produce a source (English) structure. The second, the transformer module, translates the source (English) structure and words into the target (Arabic) structure and words. Finally, the generation module produces the target (Arabic) text. The flow of the whole process is shown in
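The three-module flow described above can be sketched in Python (the authors' system is written in C# .NET). This is a minimal illustration under stated assumptions, not the authors' implementation: `TOY_LEXICON`, `analyze`, `transform`, `generate` and `translate` are hypothetical names, the lexicon holds only two words, and generation applies just one reordering rule (adjective after noun) taken from the synthesis rules of Arabic.

```python
# Hypothetical toy lexicon: English word -> (Arabic equivalent, part of speech).
# The real system uses an SQL Server dictionary and a full PSG parser.
TOY_LEXICON = {
    "clever": ("ماهر", "ADJ"),
    "boy": ("ولد", "N"),
}


def analyze(sentence):
    """Analysis module: split the sentence into words and tag each one."""
    return [(word, TOY_LEXICON[word][1]) for word in sentence.lower().split()]


def transform(source_structure):
    """Transformer module: replace each English word with its Arabic equivalent."""
    return [(TOY_LEXICON[word][0], tag) for word, tag in source_structure]


def generate(target_structure):
    """Generation module: move adjectives after their nouns, then join the words."""
    words = list(target_structure)
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "N":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in words)


def translate(sentence):
    """Run the three modules in sequence."""
    return generate(transform(analyze(sentence)))


print(translate("clever boy"))  # ولد ماهر
```

Keeping each module as a separate function mirrors the modular design the paper describes: any stage can be extended (a larger dictionary, more reordering rules) without touching the others.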
The analysis is done through two main phases: the scanner phase and the parser phase.
4.3.2.1. Scanner
The English sentence is entered into the proposed system, which passes it to the scanner. The scanner divides the English sentence into words by splitting it at each space. The output of this step is a list of English words that is ready to be passed to the parser. The authors call this output list “English_words_list”, assign the number of words in the sentence to the variable “English_words_list_length”, and preserve the order of the words, as shown in
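A minimal Python sketch of the scanner, assuming the sentence arrives as a plain string. The function name `scan` is hypothetical. Lowercasing and stripping punctuation are assumptions added so that the output matches the word list used in the parsing example below; the paper itself only specifies splitting at spaces.

```python
import string


def scan(sentence):
    """Scanner: split an English sentence into an ordered list of words."""
    # str.split() with no argument splits at runs of whitespace, so repeated
    # spaces do not produce empty strings the way split(" ") would.
    raw_words = sentence.split()
    # Assumption (not stated in the paper): lowercase each word and strip
    # leading/trailing punctuation such as the final period.
    english_words_list = [w.strip(string.punctuation).lower() for w in raw_words]
    english_words_list = [w for w in english_words_list if w]
    english_words_list_length = len(english_words_list)
    return english_words_list, english_words_list_length


words, length = scan("The statistics is the analysis of the data.")
print(words, length)
```

The word order is preserved because the list comprehensions process the words left to right, which the parser relies on.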
4.3.2.2. Parser
Analysis (parsing) of English sentences means applying a selected set of English grammar rules, chosen to cover most of the sentence patterns that could make up an abstract encountered by the system, in order to analyze (parse) the input English sentence and produce an English structure. The rules were implemented using Phrase Structure Grammar (PSG) [
The parser accepts the “English_words_list” that makes up a sentence and outputs a list of parts of speech, such as noun, verb, determiner, auxiliary, adjective and preposition, as shown in
The statistics is the analysis of the data.
To parse this sentence, whose English_words_list is [the, statistics, is, the, analysis, of, the, data], the grammar rules are written in a special notation that linguists often use. In this notation, a rule consists of a left-hand side (LHS) and a right-hand side (RHS) connected by an arrow (→) [
S → NP VP
VP → V NP PP
PP → P NP
NP → (DET) N
DET → the
N → statistics
N → analysis
N → data
V → is
P → of
Here S denotes a sentence, NP a noun phrase, VP a verb phrase, PP a prepositional phrase, DET a determiner (an article such as “the”, “a” or “an”), N a noun, V a verb and P a preposition.
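The phrase structure rules above can be turned into a small recursive-descent parser. This Python sketch is one possible realization, not the authors' parser: `LEXICON`, `parse`, `parts_of_speech` and `ParseError` are hypothetical names, the lexicon covers only the example sentence, and parse trees are represented as nested tuples.

```python
# Hypothetical lexicon encoding the word-level rules (DET -> the, N -> data, ...).
LEXICON = {
    "the": "DET",
    "statistics": "N",
    "analysis": "N",
    "data": "N",
    "is": "V",
    "of": "P",
}


class ParseError(Exception):
    """Raised when the words do not match the grammar."""


def parse(words):
    """Parse with S -> NP VP, VP -> V NP PP, PP -> P NP, NP -> (DET) N."""
    pos = 0

    def peek():
        # Category of the next unread word, or None at the end of the input
        return LEXICON.get(words[pos]) if pos < len(words) else None

    def expect(category):
        nonlocal pos
        if peek() != category:
            raise ParseError(f"expected {category} at position {pos}")
        node = (category, words[pos])
        pos += 1
        return node

    def noun_phrase():
        # NP -> (DET) N: the determiner is optional
        children = [expect("DET")] if peek() == "DET" else []
        children.append(expect("N"))
        return ("NP", *children)

    def prep_phrase():
        return ("PP", expect("P"), noun_phrase())

    def verb_phrase():
        return ("VP", expect("V"), noun_phrase(), prep_phrase())

    tree = ("S", noun_phrase(), verb_phrase())
    if pos != len(words):
        raise ParseError("words left over after S")
    return tree


def parts_of_speech(tree):
    """Collect the leaf categories of a parse tree, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [label]
    return [tag for child in children for tag in parts_of_speech(child)]


words = ["the", "statistics", "is", "the", "analysis", "of", "the", "data"]
print(parts_of_speech(parse(words)))
# ['DET', 'N', 'V', 'DET', 'N', 'P', 'DET', 'N']
```

Each nonterminal becomes one function, so adding a rule to the grammar means adding or extending one function; a sentence outside the grammar raises `ParseError` instead of producing a partial structure.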
Semantics is concerned with the meaning of words and how they combine to form sentence meanings. There are many ways of representing word meanings; one involves associating words with semantic features that correspond to their sense components. Associating words with semantic features is useful because some words impose semantic constraints on the kinds of words they can occur with.
After obtaining the English_parts_of_speech_list, the system assigns semantic features to every word in English_words_list. These features describe relations between grammatical categories such as “Subject” and “Object” and (deep) categories such as “Agent” and “Effect”, and they reduce ambiguity when choosing the meanings of words.
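As a hedged illustration of attaching role features to words, the sketch below labels the head nouns around the main verb as “Subject” and “Object”. The positional heuristic, the function name `assign_roles` and the example sentence are assumptions; deep roles such as “Agent” and “Effect” would need lexical information that the paper does not spell out, so they are not modeled here.

```python
def assign_roles(words, tags):
    """Attach a grammatical-role feature to the head nouns around the main verb.

    Heuristic assumed for this sketch: the last noun before the verb is the
    Subject and the first noun after the verb is the Object.
    """
    verb_index = tags.index("V")  # assumes exactly one main verb (ValueError if none)
    roles = [None] * len(words)
    nouns_before = [i for i in range(verb_index) if tags[i] == "N"]
    nouns_after = [i for i in range(verb_index + 1, len(tags)) if tags[i] == "N"]
    if nouns_before:
        roles[nouns_before[-1]] = "Subject"
    if nouns_after:
        roles[nouns_after[0]] = "Object"
    return [{"word": w, "pos": t, "role": r} for w, t, r in zip(words, tags, roles)]


for feature in assign_roles(["the", "boy", "reads", "the", "book"],
                            ["DET", "N", "V", "DET", "N"]):
    print(feature)
```

Role features of this kind are what a later reordering step needs in order to move the subject after the verb in an Arabic verbal sentence.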
The transformation is done through two phases: building a bilingual dictionary and English-to-Arabic transformation.
4.3.3.1. Building a Bilingual Dictionary
The bilingual dictionary is an English-to-Arabic dictionary that contains English words and their Arabic translations. The authors used an SQL Server database as the associated database management system. To populate the database tables, the authors collected words from various traditional dictionaries and statistics books to cover the statistical vocabulary that may appear in an input sentence. The dictionary also stores each word's attributes, such as type, gender, tense, number and meaning.
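The dictionary table can be sketched with Python's built-in `sqlite3` module, used here as a self-contained stand-in for the SQL Server database the authors describe. The table name `bilingual_dictionary`, its column names and the sample rows are hypothetical; they only mirror the attributes listed above (type, gender, number and tense, with the Arabic word as the meaning).

```python
import sqlite3

# Illustrative rows (english, arabic, word_type, gender, grammatical_number, tense).
# These are not the authors' dictionary contents.
SAMPLE_ENTRIES = [
    ("book", "كتاب", "noun", "masculine", "singular", None),
    ("boy", "ولد", "noun", "masculine", "singular", None),
    ("analysis", "تحليل", "noun", "masculine", "singular", None),
    ("clever", "ماهر", "adjective", "masculine", "singular", None),
    ("play", "يلعب", "verb", None, None, "present"),
]


def build_dictionary(entries):
    """Create an in-memory bilingual dictionary table and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE bilingual_dictionary (
               english            TEXT PRIMARY KEY,
               arabic             TEXT NOT NULL,
               word_type          TEXT NOT NULL,
               gender             TEXT,
               grammatical_number TEXT,
               tense              TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO bilingual_dictionary VALUES (?, ?, ?, ?, ?, ?)", entries
    )
    conn.commit()
    return conn


def lookup(conn, english_word):
    """Return (arabic, word_type, gender, grammatical_number, tense) or None."""
    return conn.execute(
        "SELECT arabic, word_type, gender, grammatical_number, tense "
        "FROM bilingual_dictionary WHERE english = ?",
        (english_word.lower(),),
    ).fetchone()


conn = build_dictionary(SAMPLE_ENTRIES)
print(lookup(conn, "Book"))
```

Parameterized queries (`?` placeholders) keep lookups safe for arbitrary input words; moving to a server database would change the connection code but not the table design.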
4.3.3.2. English-to-Arabic Transformation
The module accepts “English_words_list” and “English_parts_of_speech_list” and outputs “Arabic_words_list”. The system looks up the translation of each English word in the bilingual dictionary and obtains the equivalent Arabic word, according to the transformer flow chart shown in
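A minimal sketch of the transformation step, assuming the dictionary is keyed by both the English word and its part of speech, so that the parser's output can disambiguate words such as “mean” (the statistical average versus the verb). `TOY_DICTIONARY`, `transform` and the pass-through behavior for unknown words are assumptions for illustration. Word order is deliberately left unchanged, since reordering belongs to the generation module.

```python
# Hypothetical entries keyed by (English word, part of speech).
TOY_DICTIONARY = {
    ("mean", "N"): "متوسط",  # the statistical mean (average)
    ("mean", "V"): "يعني",   # the verb "to mean"
    ("clever", "ADJ"): "ماهر",
    ("boy", "N"): "ولد",
}


def transform(english_words_list, english_parts_of_speech_list):
    """Map each (word, part of speech) pair to its Arabic equivalent, in order."""
    if len(english_words_list) != len(english_parts_of_speech_list):
        raise ValueError("every word needs exactly one part-of-speech tag")
    arabic_words_list = []
    for word, tag in zip(english_words_list, english_parts_of_speech_list):
        # Assumption: words missing from the dictionary are passed through untranslated
        arabic_words_list.append(TOY_DICTIONARY.get((word.lower(), tag), word))
    return arabic_words_list


print(transform(["clever", "boy"], ["ADJ", "N"]))
```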
The generation module produces the translated Arabic sentence after the transformation rules have been applied. It works through two phases: synthesis rules of Arabic and Arabic morphology.
4.3.4.1. Synthesis Rules of Arabic
In this phase the system accepts “Arabic_words_list” and outputs a sentence in the target (Arabic) language. It is the penultimate phase, which reorders the translated words according to various Arabic rules, as shown in
1) In a verbal sentence, Arabic uses verb-subject-object order:
a) The subject of the English sentence is placed after the verb.
b) The object of the English sentence is placed after the subject.
2) In a nominal sentence (one built around a noun phrase), Arabic keeps the same word order as the English sentence.
3) If the English sentence matches the rule NP → DET N, then DET is placed before N. For example, “the book” becomes “الكتاب” in Arabic, where the article “ال” is written as a prefix attached to the noun “كتاب”.
4) If the English sentence matches the rule NP → ADJ N, where ADJ means adjective, then ADJ is placed after N. For example, “clever boy” in English becomes “ولد ماهر” in Arabic.
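Rules 1, 3 and 4 above can be sketched as list manipulations over (Arabic word, tag) pairs. This Python sketch assumes the subject and object noun phrases have already been identified and translated; the function names and the sentence “the boy reads the book” (يقرأ الولد الكتاب) are illustrative assumptions. Following Arabic orthography, the sketch writes the article ال attached to the noun as a prefix.

```python
def reorder_noun_phrase(noun_phrase):
    """Apply rules 3 and 4 to a list of (arabic_word, tag) pairs."""
    result = []
    i = 0
    while i < len(noun_phrase):
        word, tag = noun_phrase[i]
        next_tag = noun_phrase[i + 1][1] if i + 1 < len(noun_phrase) else None
        if tag == "ADJ" and next_tag == "N":
            # Rule 4: NP -> ADJ N, the adjective follows the noun in Arabic
            result.extend([noun_phrase[i + 1][0], word])
            i += 2
        elif tag == "DET" and next_tag == "N":
            # Rule 3: NP -> DET N, the article precedes the noun as a prefix
            result.append(word + noun_phrase[i + 1][0])
            i += 2
        else:
            result.append(word)
            i += 1
    return result


def reorder_verbal_sentence(subject_np, verb, object_np):
    """Rule 1: English subject-verb-object becomes Arabic verb-subject-object."""
    return [verb] + reorder_noun_phrase(subject_np) + reorder_noun_phrase(object_np)


sentence = reorder_verbal_sentence(
    [("ال", "DET"), ("ولد", "N")],   # "the boy"
    "يقرأ",                           # "reads"
    [("ال", "DET"), ("كتاب", "N")],  # "the book"
)
print(" ".join(sentence))  # يقرأ الولد الكتاب
```

Rule 2 needs no code in this sketch, because a nominal sentence keeps the English order.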
4.3.4.2. Arabic Morphology
After obtaining the translated Arabic sentence, the system applies Arabic morphological rules to obtain a satisfactory Arabic sentence.
For example, in “the girls play in the garden” the subject “girls” is feminine plural, so the translation of the verb “play” should be “يلعبن”, which combines the translation of “play”, “يلعب”, with the morphological rule “ن النسوه” (the feminine plural nūn). In contrast, if the sentence were “the boys play in the garden”, the translation should be “يلعبون”, because “boys” is masculine plural; it combines “يلعب” with the morphological rule “ون للجمع المذكر” (the masculine plural suffix).
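The agreement rule in this example can be sketched as a suffix table indexed by the subject's gender and number. The sketch covers only the cases mentioned above, plus the unchanged masculine singular form; other forms, such as the feminine singular تلعب, change the verb's prefix and are deliberately left out. `SUFFIXES` and `inflect_verb` are hypothetical names.

```python
# Suffix appended to the verb stem, keyed by the subject's (gender, number).
SUFFIXES = {
    ("feminine", "plural"): "ن",    # feminine plural nūn: يلعب -> يلعبن
    ("masculine", "plural"): "ون",  # masculine plural: يلعب -> يلعبون
    ("masculine", "singular"): "",  # stem unchanged: يلعب
}


def inflect_verb(stem, gender, number):
    """Return the verb form agreeing with a subject of the given gender and number."""
    try:
        return stem + SUFFIXES[(gender, number)]
    except KeyError:
        raise NotImplementedError(
            f"no agreement rule for {gender} {number} in this sketch"
        ) from None


print(inflect_verb("يلعب", "feminine", "plural"))   # يلعبن
print(inflect_verb("يلعب", "masculine", "plural"))  # يلعبون
```

Raising an error for unsupported combinations, rather than silently returning the stem, makes gaps in the rule table visible during testing.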
The system was evaluated through a number of successful tests on sentences chosen randomly from the field of mathematical studies, which demonstrate its validity. The dictionary was implemented by choosing statistical vocabulary from statistics books, Elias, Almawred and others [
The proposed system is implemented in the Visual C# .NET programming environment, with SQL Server as the associated database management system. Some screenshots of the results are shown in Figures 8(a)-(e).
In this work, the authors propose an approach to English-to-Arabic translation based on NLP that fills a gap in the field of scientific translation. The transformer approach employed in the proposed system combines English grammar rules, structural transformation rules and Arabic morphological synthesis rules. The system was analyzed, designed and implemented using C# .NET and SQL Server. The results are successful Arabic translations of scientific English sentences. The proposed system produces translations of high syntactic and semantic quality. In addition, its simplicity and modularity make future modification and extension easy. In future work, the authors intend to make this prototype system more realistic.