QUESTION

I'm working on an application that takes a special database of words and their word classes and uses it to determine the word class of each word in a given sentence. I'm now trying to find out whether the word classes found in English are also found in non-European languages. If not, how would you define word classes for Pacific, Asiatic and other languages?

So far, I can take the sentence "the boy has a shirt", tag each word properly and, using a Spanish database, convert that same text to the Spanish "el niño tiene una camisa". In essence, I'm creating a multi-functional translation engine, but it won't be used to translate; it's more for simple human-to-machine translation.

NOTE: I wanted to post this on StackExchange, but the question was more tied to linguistics than programming.

{ asked by Everyone }

ANSWER

From "Word classes and parts of speech" (pdf), a 2001 paper:

Despite the theoretical problems in defining word classes in general, in practice it is often not difficult to agree on the use of these terms in a particular language. This is because nouns, verbs, and adjectives show great similarities in their behavior across languages. [...]

The general properties of nouns, verbs, and adjectives ... are sufficient to establish these classes without much doubt in a great many languages. However, again and again linguists report on languages where such a threefold subdivision does not seem appropriate. Particularly problematic are adjectives ... but languages lacking a noun-verb distinction are also claimed to exist ..., and ... adverbs ... present difficulties in all languages.

And towards the end:

Hengeveld (1992a) proposed that major word classes can either be lacking in a language (then it is called rigid) or a language may not differentiate between two word classes (then it is called flexible). Thus, 'languages without adjectives' ... are either flexible in that they combine nouns and adjectives in one class (N/Adj), or rigid in that they lack adjectives completely.

Hengeveld claims that besides the English type, where all four classes (V - N - Adj - Adv) are differentiated and exist, there are only three types of rigid languages (V - N - Adj, e.g., Wambon; V - N, e.g., Hausa; and V, e.g., Tuscarora), and three types of flexible languages (V - N - Adj/Adv, e.g., German; V - N/Adj/Adv, e.g., Quechua; V/N/Adj/Adv, e.g., Samoan).
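For the purposes of a tagger, one way to put this typology to work is as a per-language table of which classes exist and which are merged. What follows is only a minimal Python sketch under that assumption: the names ORDER, SYSTEMS and project are made up for illustration, the table covers just the seven example languages from the quote, and it says nothing about how to decide which group an individual word belongs to.

```python
# Canonical order of the four major classes, used only to build stable labels.
ORDER = ("V", "N", "Adj", "Adv")

# Hypothetical encoding of the typology quoted above.
# Each language maps to a list of groups:
#   - classes sharing a group are not differentiated (flexible),
#   - classes missing from every group are lacking altogether (rigid).
SYSTEMS = {
    "English":   [{"V"}, {"N"}, {"Adj"}, {"Adv"}],
    "Wambon":    [{"V"}, {"N"}, {"Adj"}],
    "Hausa":     [{"V"}, {"N"}],
    "Tuscarora": [{"V"}],
    "German":    [{"V"}, {"N"}, {"Adj", "Adv"}],
    "Quechua":   [{"V"}, {"N", "Adj", "Adv"}],
    "Samoan":    [{"V", "N", "Adj", "Adv"}],
}


def project(language, english_class):
    """Map an English-style class onto the class system of `language`.

    Returns a label such as "N/Adj/Adv" for a merged class, or None
    when the language has no class covering `english_class`.
    """
    for group in SYSTEMS[language]:
        if english_class in group:
            return "/".join(c for c in ORDER if c in group)
    return None


if __name__ == "__main__":
    for lang in SYSTEMS:
        print(lang, [project(lang, c) for c in ORDER])
```

A None result is the interesting case for a tagger: in a rigid language without adjectives, whatever an English adjective expresses has to be handled some other way, and this table alone cannot tell you how.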

Universal language support is tricky at best: as far as I know, linguists are still arguing about which aspects of language are universal and to what extent. (Universal here meaning "applying to every natural language that could conceivably be used by a human being".)

From a practical standpoint, e.g. for the purposes of making a program, there is also the question of how common the various language types are. For example, Tuscarora, mentioned above as an example, has a grand total of 52 speakers (according to Wikipedia), so how far out of your way you are willing to go to support it may come down to a business decision.

IANAL (with the L here meaning Linguist); I merely read up on similar topics in a similar context a few years ago.

{ answered by Everyone }