Here is the latest revision of an article I wrote for the November 2003 edition of the internal technical publication of the company I work for.
 

Lojban, the UML, and the SWH


Summary

coi. rodo .i mi'e jexOm.

While learning the Lojban language¹, I found myself drawing some Unified Modeling Language (UML) class diagrams to help me better grasp the Lojban grammar concepts. I was doing domain modeling in the linguistics field! Was it a UML Whorfian effect?

This article is about a constructed language called Lojban, the UML, and the Sapir-Whorf Hypothesis (SWH).

Table of Contents

Constructing Languages
Learning the Lojban Grammar Concepts
   Lojban Basic Predications
   Lojban Word Categories
   More Lojban Word Categories
   Lojban Tanru
   Description Sumti
   Would You Like More of This?

Why Did I Create this Domain Model?
Back to the Sapir-Whorf Hypothesis
Experiencing UML Whorfian Effects?
Be Liberated from Whorfian Mind-locks
ki'e .i co'o
Acknowledgements
References
Endnotes


Constructing Languages

Designing new languages or studying artificial languages is a most fascinating activity, in my opinion. When I say new languages, I mean new tongues, not new programming languages (although inventing or learning new programming languages can be fascinating too).

Esperanto² is the most successful artificial language today. Clearly, it has its roots in European natural languages, but it has been built with a regular grammar and a regular way of building families of words. Esperantists promote Esperanto as a universal second language for culturally neutral international communication.

There are thousands of other artificial languages (also known as constructed languages, or "conlangs"). Some of them are "art languages," practiced only by their inventor or by a whole community,³ like Tolkien's Elvish languages⁴ or Klingon,⁵ the language of the Klingon aliens of Star Trek.

Since its inception, Lojban has evolved into an original, usable, and interesting language. Why would anybody want to learn and practice Lojban? Basically, because it's a mind-expanding experience, and because it's fun!

At the very beginning, in 1955, Lojban was called Loglan (both names mean "Logical Language"). It was an experiment to test a linguistic research concept known as the Sapir-Whorf Hypothesis [9, 10] (see the later section "Back to the Sapir-Whorf Hypothesis"). The SWH has many different formulations, depending on whether you take the "strong" or the "weak" variant:

  • SWH strong formulation:

    Language shapes the way we think, and determines what we can think about.

  • SWH weak formulation:

    The language spoken by a linguistic community has an influence on its culture (what this society does and thinks).

  • SWH negative (and rather strong) formulation:

    The limits of the language one speaks are the limits of the world one inhabits.

One of the issues with the SWH is that nobody really agrees on what exactly it states. We will come back to the SWH later.

Today, the Logical Language Group has moved away from its original objective of testing the SWH (which is very difficult to demonstrate, by the way) and has developed Lojban as an "engineering language," a subcategory of constructed languages.

Lojban is now a beautifully designed language, and it is fascinating partly because it is different. The grammar, the vocabulary, everything has been very carefully built in a logical way, yet it doesn't look like any spoken or written natural language. That doesn't mean that it is ugly or difficult to learn, and "logical language" doesn't at all mean that you can't be fuzzy in what you say or that you can't write poetry!

For more about Lojban, see The Complete Lojban Language [1]. Chapter 2, "A Quick Tour of Lojban Grammar, with Diagrams," is a good place to start and to get a feel for what Lojban is. Actually, the book doesn't contain many diagrams...

Learning the Lojban Grammar Concepts

This section is about the Lojban language, for which I describe some grammar concepts with UML class diagrams. If you're lost, feel free at any time to skip to the next section, "Why Did I Create this Domain Model?"

Lojban Basic Predications

Since Lojban grammar is rather unusual, it is explained here from the start using Lojban's own terms. (Note: Lojban terms keep the same form in the singular and the plural here, since Lojban does not mark plurals with a suffix such as "s".)

A standard Lojban sentence expresses a predication, which is called a bridi. In English, all the following sentences, although built from different grammatical entities, also express predications, which can be paraphrased with relationships:

  1. I am your father (to be + noun) = to-be-father-of (father => I, child => you)
  2. You are big (to be + adjective) = to-be-big (who-is-big => you)
  3. I go to Paris (active verb) = to-go (goer => I, destination => Paris)
  4. I give you this (active verb) = to-give (donor => I, gift => this, beneficiary => you)
  5. That is green (to be + adjective) = to-be-green (what/who => that)
  6. You are a cat (to-be + noun) = to-be-a-cat (what/who => you)

Lojban Pronunciation

Letter  Sounds like
u       /oo/, like in "cool"
o       /o/, like in "show"
c       /sh/, like in "show"
g       /g/, like in "god"
s       /ss/, never /z/
j       /zh/, like in French "bonjour", or like the "s" in "pleasure"
'       /h/, like in "hello"
x       /kh/, like in the Arabic "Khaled", the Scottish "loch", or the German "Bach"

The translations into Lojban give the following bridi:

  1. mi patfu do
  2. do barda
  3. mi klama la paris.
  4. mi dunda ti do
  5. ta crino
  6. do mlatu

Note: Lojban root words, like patfu, barda, klama, etc., were built algorithmically from words in the six most widely spoken languages at the time: Chinese, Hindi, English, Russian, Spanish, and Arabic.

For each relationship, a default place structure (programmers would say a signature) has been defined. The number of a place tells which role is played by the argument that occupies it, and such an argument is called a sumti. The centerpiece of the bridi, called the selbri, expresses the relationship itself. So, typically, a bridi has the form shown in Figure 1.

 


sumti selbri sumti sumti ...
Figure 1. Lojban bridi Structure
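To make Figure 1 concrete, here is a minimal Python sketch that splits the simple example bridi into a selbri and numbered places. It is not part of any Lojban tool. It assumes a hypothetical SELBRI_WORDS table holding the brivla used in this article, and it handles only pronoun sumti and la followed by a name. It also refuses a bridi that starts with its selbri, because it does not model that case.

```python
from __future__ import annotations

# Hypothetical lookup table: the one-word selbri used in this article's examples.
SELBRI_WORDS = {"patfu", "barda", "klama", "dunda", "crino", "mlatu", "kurji"}


def parse_bridi(text: str) -> tuple[str, dict[str, str]]:
    """Split a simple bridi into its selbri and its numbered places (x1, x2, ...)."""
    tokens = text.split()
    selbri = None
    sumti: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "cu":
            pass  # cu only marks where the selbri begins; it fills no place
        elif tok in SELBRI_WORDS:
            if selbri is not None:
                raise ValueError("this sketch only handles a single one-word selbri")
            if not sumti:
                raise ValueError("this sketch expects at least one sumti before the selbri")
            selbri = tok
        elif tok == "la":
            # The gadri "la" and the name after it form a single sumti.
            if i + 1 >= len(tokens):
                raise ValueError("'la' must be followed by a name")
            sumti.append(f"la {tokens[i + 1]}")
            i += 1
        else:
            sumti.append(tok)  # a sumti cmavo such as mi, do, ti, ta, ko
        i += 1
    if selbri is None:
        raise ValueError("no known selbri found")
    # Places are numbered by sumti order, whichever side of the selbri they sit on.
    places = {f"x{n}": s for n, s in enumerate(sumti, start=1)}
    return selbri, places


print(parse_bridi("mi dunda ti do"))      # ('dunda', {'x1': 'mi', 'x2': 'ti', 'x3': 'do'})
print(parse_bridi("mi klama la paris."))  # ('klama', {'x1': 'mi', 'x2': 'la paris.'})
```

With this sketch, parse_bridi("ko ko kurji") and parse_bridi("ko kurji ko") return the same places. This matches endnote 6, which says that only the order of the sumti counts.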
 
Lojban Grammar Glossary

Word     Definition
bridi    predicate
sumti    argument
selbri   predicate relation
cmavo    structure word
gadri    article
cmene    proper name
brivla   predicate word
gismu    root word
valsi    word
lujvo    compound predicate word
tanru    phrase compound

Lojban Word Categories

If we look back to the examples in Lojban, we see different kinds of words:

  • mi, do, la, ti, ta belong to the category of small grammatical words called cmavo.
  • among these, la is an article (gadri) introducing the name paris. (a name is called a cmene in Lojban).
  • mi, do, ti, ta are sumti cmavo, a bit like pronouns.
  • patfu, barda, klama, dunda, crino, mlatu are all brivla, i.e., words that express a relationship and carry the meaning; more precisely, these brivla are gismu, i.e., root words.

 

"STOP!" you may say. Don't you feel the need to draw some diagrams to help yourself at this point? Well, I do.

Figure 2. Categories of Lojban Words

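The word categories above can largely be told apart by the shape of the words. The following Python sketch uses a deliberately simplified heuristic, which is my assumption and much cruder than the real morphology rules in [1]. The rules are:

  • names end in a consonant;
  • cmavo contain no pair of adjacent consonants;
  • gismu have the shape CVCCV or CCVCV;
  • any other word with a consonant pair and a final vowel is taken to be some other brivla.

The helper names shape and classify are hypothetical.

```python
VOWELS = set("aeiouy")
CONSONANTS = set("bcdfgjklmnprstvxz")


def shape(word: str) -> str:
    """Map each Lojban letter to C or V; apostrophes and other marks are skipped."""
    return "".join(
        "V" if ch in VOWELS else "C"
        for ch in word
        if ch in VOWELS or ch in CONSONANTS
    )


def classify(word: str) -> str:
    """Guess a word's category from its shape (simplified heuristic, not full morphology)."""
    w = word.lower().strip(".")  # names are written between pauses, e.g. "paris."
    s = shape(w)
    if not s:
        raise ValueError(f"not a Lojban word: {word!r}")
    if s[-1] == "C":
        return "cmene"   # names end in a consonant
    if "CC" not in s:
        return "cmavo"   # structure words have no pair of adjacent consonants
    if s in ("CVCCV", "CCVCV"):
        return "gismu"   # root words have one of two five-letter shapes
    return "brivla"      # another brivla: a lujvo or a fu'ivla


words = ["mi", "la", "paris.", "patfu", "klama", "gerzda", "ki'e"]
print([classify(w) for w in words])
# ['cmavo', 'cmavo', 'cmene', 'gismu', 'gismu', 'brivla', 'cmavo']
```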
 


More Lojban Word Categories

In other words, Lojban has no such category as noun, verb, adjective, or adverb. It has relationships, called bridi, with one or more words that constitute the selbri at the center.

In

  • do mamta mi ("you are-a-mother-of me" i.e., "you are my mother")

or in

  • do patfu mi ("you are-a-father-of me" i.e., "you are my father")

mamta and patfu play the role of the selbri. They are different brivla. A brivla is a content word; it can be:

  • a gismu, built into the language
  • a lujvo, derived from a combination of gismu
  • a fu'ivla, borrowed from another language and adapted to Lojban
 
Figure 3. Kinds of Lojban brivla
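Figure 3 is a plain UML generalization, so it maps directly onto inheritance. Here is a minimal Python sketch with class names of my own choosing. Fuhivla stands for fu'ivla, and its source_language attribute is a hypothetical addition.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Brivla(ABC):
    """Abstract content word: the UML generalization of the three kinds below."""
    word: str

    @abstractmethod
    def origin(self) -> str:
        """Describe where this brivla comes from."""


@dataclass(frozen=True)
class Gismu(Brivla):
    def origin(self) -> str:
        return "root word built into the language"


@dataclass(frozen=True)
class Lujvo(Brivla):
    veljvo: tuple[str, ...] = ()  # the gismu of the tanru it contracts

    def origin(self) -> str:
        return "compound of " + " ".join(self.veljvo)


@dataclass(frozen=True)
class Fuhivla(Brivla):
    source_language: str = ""  # hypothetical attribute: the lending language

    def origin(self) -> str:
        return f"borrowed from {self.source_language}, adapted to Lojban"


words: list[Brivla] = [Gismu("patfu"), Lujvo("gerzda", ("gerku", "zdani"))]
for b in words:
    print(b.word, "->", b.origin())
# patfu -> root word built into the language
# gerzda -> compound of gerku zdani
```

Making Brivla abstract mirrors the diagram: every brivla is one of the three kinds, so Brivla("x") raises a TypeError.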


We have already used some gismu. These gismu are formally defined like this:

  • patfu: x1 is a father of x2
  • barda: x1 is big/large in property/dimension(s) x2 as compared with standard/norm x3
  • klama: x1 comes/goes to destination x2 from origin x3 via route x4 using means/vehicle x5
  • dunda: x1 [donor] gives/donates gift/present x2 to recipient/beneficiary x3 [without payment/exchange]
  • crino: x1 is green/verdant [color adjective]
  • mlatu: x1 is a cat/[puss/pussy/kitten] [feline animal] of species/breed x2

Here x1, x2, … represent the arguments (the sumti) accepted in the predicate (the bridi) when these gismu play the role of a selbri. The arguments are optional. When they are present, their order in the bridi determines which place each one fills. (There are ways to change this order while keeping the same meaning, but they are beyond the scope of this presentation.)
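Since a place structure works like a signature with optional trailing parameters, binding sumti to places can be sketched in a few lines of Python. The role names in PLACES are my abridgements of the definitions above, and bind is a hypothetical helper, not part of any Lojban tool.

```python
from __future__ import annotations

# Abridged role names taken from the gismu definitions above.
PLACES: dict[str, list[str]] = {
    "patfu": ["father", "child"],
    "barda": ["big thing", "property/dimension", "standard/norm"],
    "klama": ["goer", "destination", "origin", "route", "means/vehicle"],
    "dunda": ["donor", "gift", "recipient"],
    "crino": ["green thing"],
    "mlatu": ["cat", "species/breed"],
}


def bind(gismu: str, sumti: list[str]) -> tuple[dict[str, str], list[str]]:
    """Fill the places of a gismu in order; trailing places may stay unspecified."""
    roles = PLACES[gismu]
    if len(sumti) > len(roles):
        raise ValueError(f"{gismu} has only {len(roles)} places")
    labels = [f"x{n} ({role})" for n, role in enumerate(roles, start=1)]
    filled = dict(zip(labels, sumti))      # zip stops at the last sumti given
    unspecified = labels[len(sumti):]      # the places nobody mentioned
    return filled, unspecified


filled, open_places = bind("klama", ["mi", "la paris."])
print(filled)       # {'x1 (goer)': 'mi', 'x2 (destination)': 'la paris.'}
print(open_places)  # ['x3 (origin)', 'x4 (route)', 'x5 (means/vehicle)']
```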

Lojban Tanru

A selbri can also be a tanru, a metaphor built from a sequence of brivla, such as:

  • mi sutra bajra (I am a quick runner / I run quickly / I quickly run)
  • do barda nanla (you are a big boy)
  • mi dunda patfu (I am the father-who-gives)

Where:

  • sutra: x1 is fast/swift/quick/hasty/rapid at doing/being/bringing about x2 (event/state)
  • bajra: x1 runs on surface x2 using limbs x3 with gait x4
  • nanla: x1 is a boy/lad [young male person] of age x2 immature by standard x3

 

Note that the meaning of a tanru may be fuzzy.

In a tanru, the left part is called the seltau; it is a modifier for the rightmost brivla in the tanru, which is called the tertau. A tanru has the place structure of its tertau.

A tanru may be more complex, with more than two brivla. Complex tanru follow a "left-grouping" rule, which can be overridden with the cmavo bo, a top-priority grouping operator. For example, with the following additional vocabulary:

  • cmalu: x1 is small in property/dimension(s) x2 (ka) as compared with standard/norm x3
  • nixli: x1 is a girl [young female person] of age x2 immature by standard x3
  • ckule: x1 is school/institute/academy at x2 teaching subject(s) x3 to audience x4 operated by x5

you can build the following complex tanru. All of them translate as "this is a small girl school," an ambiguous English phrase; the glosses show how each grouping disambiguates it:

  • ta cmalu nixli ckule ("left-grouping rule" semantics)
    ta cmalu bo nixli ckule (same meaning as above)
    This is a small-girl school (a school for small girls)
  • ta cmalu nixli bo ckule
    This is a small girl-school (a small school for girls)

 

A tanru may be modeled with a variant of the Composite Pattern as shown in Figure 4.

Figure 4. Lojban tanru Basic Structure

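The Composite Pattern of Figure 4, together with the left-grouping rule and bo, can be sketched in Python as follows. Brivla is the leaf, Tanru is the composite, and group_tanru is a hypothetical helper that builds the tree. The place structures are abridged from the definitions above. This sketch nests a chain such as A bo B bo C to the right; that choice is my assumption, so check [1] before relying on it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Brivla:
    """Leaf of the composite: one predicate word and its place structure."""
    word: str
    places: tuple[str, ...] = ()

    def place_structure(self) -> tuple[str, ...]:
        return self.places

    def __str__(self) -> str:
        return self.word


@dataclass(frozen=True)
class Tanru:
    """Composite: a seltau (modifier) applied to a tertau (the modified part)."""
    seltau: Selbri
    tertau: Selbri

    def place_structure(self) -> tuple[str, ...]:
        return self.tertau.place_structure()  # a tanru has its tertau's places

    def __str__(self) -> str:
        return f"({self.seltau} {self.tertau})"


Selbri = Union[Brivla, Tanru]


def group_tanru(tokens: list[Union[Brivla, str]]) -> Selbri:
    """Group brivla left to right; the cmavo "bo" binds its neighbours first."""
    runs: list[list[Brivla]] = []  # each run: brivla chained together by bo
    expect_word, after_bo = True, False
    for tok in tokens:
        if tok == "bo":
            if expect_word:
                raise ValueError("bo must stand between two brivla")
            expect_word, after_bo = True, True
        else:
            if after_bo:
                runs[-1].append(tok)
            else:
                runs.append([tok])
            expect_word, after_bo = False, False
    if expect_word:
        raise ValueError("empty tanru or dangling bo")
    units: list[Selbri] = []
    for run in runs:
        # Assumption of this sketch: a chain "A bo B bo C" groups to the right.
        unit: Selbri = run[-1]
        for b in reversed(run[:-1]):
            unit = Tanru(b, unit)
        units.append(unit)
    result = units[0]
    for unit in units[1:]:
        result = Tanru(result, unit)  # default left-grouping rule
    return result


cmalu = Brivla("cmalu", ("small thing", "property/dimension", "standard"))
nixli = Brivla("nixli", ("girl", "age", "standard"))
ckule = Brivla("ckule", ("school", "location", "subject", "audience", "operator"))

print(group_tanru([cmalu, nixli, ckule]))        # ((cmalu nixli) ckule)
print(group_tanru([cmalu, nixli, "bo", ckule]))  # (cmalu (nixli ckule))
print(group_tanru([cmalu, nixli, ckule]).place_structure()[0])  # school
```

Because Tanru delegates place_structure to its tertau, the whole tree answers with the places of its rightmost brivla, however deeply it is nested.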


Do you remember the lujvo, which is a kind of brivla? I said that a lujvo is derived from a combination of gismu. The Lojban vocabulary is founded on a list of about 1350 gismu, and building lujvo is the main way to extend this vocabulary, besides borrowing fu'ivla. A lujvo is built by contracting a tanru and fixing its meaning (a tanru may be ambiguous, and only its context of use disambiguates it).

Let's consider:

  • gerku: x1 is a dog/canine of species/breed x2
  • zdani: x1 is a nest/house/lair/den/[home] of/for x2

The following tanru

  • gerku zdani

means "a house that has something to do with some dog or dogs." It may mean any of the following:

  • houses occupied by dogs
  • houses shaped by dogs
  • dogs which are also houses (e.g., houses for fleas)
  • houses named after dogs

If you want the meaning "doghouse," fix it into a lujvo. To do so, you combine rafsi (affixes) associated with each gismu in the basic dictionary (the exact rules won't be described here).

  • gerku has ger as rafsi (and also ge'u)
  • zdani has zda as rafsi

For "doghouse," we can now build a new word from gerku zdani, and set its meaning and place structure:

  • gerzda
    for which:
    x1 = x1 of zdani = nest
    x2 = x2 of zdani = inhabitant = x1 of gerku = dog

gerku zdani is said to be the veljvo of gerzda.

So there is a relationship between a lujvo and a tanru (its veljvo), which involves the rafsi of the participating gismu. See Figure 5.

 

Figure 5. A More Complete tanru Model

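The relationship in Figure 5 can be sketched in Python for the doghouse example. The sketch is deliberately naive and rests on my own assumptions. The hypothetical naive_lujvo simply joins the first listed rafsi of each gismu, and it only checks that the result ends in a vowel, as every brivla must. The real lujvo-making rules (choice of rafsi, hyphen letters, and so on) are described in [1] and are not implemented here.

```python
from __future__ import annotations

# Rafsi from the article's example; a real tool would load the full rafsi list.
RAFSI: dict[str, list[str]] = {
    "gerku": ["ger", "ge'u"],
    "zdani": ["zda"],
}


def naive_lujvo(veljvo: list[str]) -> str:
    """Join the first listed rafsi of each gismu (no hyphenation; only the final vowel is checked)."""
    word = "".join(RAFSI[gismu][0] for gismu in veljvo)
    if word[-1] not in "aeiou":
        raise ValueError(f"{word!r} cannot be a brivla: it must end in a vowel")
    return word


# Place structure of the new lujvo, expressed as (source gismu, source place).
GERZDA_PLACES = {
    "x1": ("zdani", "x1"),  # the nest/house
    "x2": ("zdani", "x2"),  # its inhabitant, which is also x1 of gerku: the dog
}

print(naive_lujvo(["gerku", "zdani"]))  # gerzda
```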
 

Description Sumti

Descriptors turn a selbri place into a "description sumti." All the x1, x2, … in the previous examples were filled by pronouns (sumti cmavo), except in one example, "la paris.", where an article (a gadri: la) turns the cmene "paris." into a sumti. There are other gadri to use with a gismu. Suppose I want to say "My mother gives the green cat to the big girl." You need something to fill the places of "give": x1 (the donor), x2 (the gift), and x3 (the beneficiary). The cmavo "le" extracts the first place of the bridi built around a single brivla or tanru. Combined with "se," it extracts the second place, with "te" the third place, and so on. For example:

  • le dunda (the donor)
  • le se dunda (the gift)
  • le te dunda (the beneficiary)
  • le mlatu (the cat)
  • le se mlatu (the species of the cat)
  • le crino mlatu (the cat that has something to do with green-ness)
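The effect of se and te on what le describes can be sketched as a permutation of the place structure, where each converter swaps x1 with another place. In this Python sketch, the helpers convert and le are hypothetical names of my own. Besides se and te, the sketch includes ve and xe, the converters for the fourth and fifth places.

```python
from typing import List, Optional

# Place structures abridged from the definitions above.
DUNDA = ["donor", "gift", "recipient"]
MLATU = ["cat", "species/breed"]

# Each converter swaps x1 with one other place.
CONVERTERS = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def convert(places: List[str], converter: str) -> List[str]:
    """Return the place structure of '<converter> selbri' (x1 swapped with xN)."""
    n = CONVERTERS[converter]
    if n > len(places):
        raise ValueError(f"no x{n} place to swap with")
    swapped = list(places)  # copy, so the original structure is untouched
    swapped[0], swapped[n - 1] = swapped[n - 1], swapped[0]
    return swapped


def le(places: List[str], converter: Optional[str] = None) -> str:
    """Which place 'le [converter] selbri' describes: the (converted) x1."""
    if converter is not None:
        places = convert(places, converter)
    return places[0]


print(le(DUNDA), le(DUNDA, "se"), le(DUNDA, "te"))  # donor gift recipient
print(le(MLATU, "se"))                               # species/breed
print(convert(DUNDA, "te"))                          # ['recipient', 'gift', 'donor']
```

The converted structure of te dunda also explains the last example below: in le barda nixli cu te dunda le crino mlatu, the girl fills x1 (recipient) and the cat fills x2 (gift).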

 

So:

  • le mi mamta cu dunda le crino mlatu le barda nixli
    My mother gives the green cat to the big girl
  • le crino mlatu cu se dunda
    The green cat is given (to someone by somebody)
    The green cat is a gift
  • le barda nixli cu te dunda le crino mlatu
    The big girl is given the green cat
    Somebody gives the green cat to the big girl

 

Note: "cu" is a cmavo used to introduce the selbri. Without it, in the first example above, "mamta dunda" would be interpreted as a tanru, meaning something like "a giver that has something to do with a mother," or a "motherly giver." So you need something to separate the end of the first sumti from the beginning of the selbri, and "cu" plays this role. It is optional when the first sumti is simple, like a sumti cmavo, but it is mandatory when the first sumti is more complex.

If you think about it, descriptors are used to turn a selbri into a sumti. If you study Lojban, you'll see how "events" are used to turn a whole bridi into a selbri.

These sentences are in fact object representations (instances) of the following class diagram, which is an enhancement of Figure 1, where the selbri and sumti classes have now been turned into interfaces.

 

Figure 6. Lojban Grammatical Concepts
 

Would You Like More of This?

This may look complex, because the explanations were very quick and not very gradual.
Of course, a text or a dialog would use many grammatical features that won't be described in this short article (the events, the Lojban time/space tense system, etc., or even much simpler constructs). If you think you might be interested in Lojban, check out the documents and lessons at the www.lojban.org Web site. The Lojban community is really friendly to beginners; feel free to ask questions on the mailing lists.

Why Did I Create this Domain Model?

Why did I do all that diagramming? Confronted with new concepts, I felt the need to represent them and their relationships. What I have now is only a map of the concepts, with a lot of blank areas. Working on a domain model is exactly this: building an enhanced glossary of the concepts. Domain models are mostly class diagrams, but, of course, you can model beyond that. Modeling aids understanding, but it has its limitations: you still have your original task at hand. Suppose you want to build an application, for example, a dedicated structured editor for writing Lojban texts and helping to fix them automatically, or a translator, or a computer-aided tutorial. You may have to build entirely different models for that, maybe reusing only small parts of the domain model in the application design. It depends on the application itself and on the way you analyze its use cases.

For Lojban, like in any other field, a domain model is a valuable and essential artifact in a project, but, by definition, the domain model doesn't depend on the project itself.

Back to the Sapir-Whorf Hypothesis

The SWH is named after two linguists, Edward Sapir (1884-1939) and Benjamin Whorf (1897-1941). It states that the way people think is strongly affected by their native languages. The subject is controversial, and it has drawn attacks from, for example, Noam Chomsky, the father of generative grammar. Today, the SWH [8, 9, 10, 11] is widely accepted in its weak form.

I am not a linguist and won't go into deep linguistic debates, but I like this question: are all languages equivalent, mere means of communication? Or is the SWH true, and do languages shape (or limit, or extend) the way we think? If language is like a tool to cut reality into slices, a tool to describe reality and think about it, maybe different languages end up with different slices, more precise in some domains and less precise in others.

Let's consider what a (so-called) Whorfian effect could be like. People who fluently speak several languages all find, depending on the situation, that the ideas they want to express are easier to formulate in one of their languages than in the others. This may depend not on the ideas themselves, but rather on a complex interaction between the idea, the person, and the way he/she learnt these different languages. Let's take another example: in the previous sentence, I used "he/she." Some languages have a third-person pronoun that doesn't depend on gender. The point is that a given language, which reflects a culture and is the historical result of a never-ending elaboration process, can sometimes limit how something is communicated. Note: in Lojban, sumti cmavo carry no indication of gender or number.

Much has been written about the SWH, and many flame wars have taken place on the Internet. One conclusion is that the SWH is almost impossible to prove. Proving it was the goal at the inception of Lojban: observing how people would start to think differently while learning a new, logical language. People now talk more about Whorfian effects, and about Whorfian mind-locks [5, 6], the special case where Whorfian effects are negative.

Experiencing UML Whorfian Effects?

The SWH is very difficult to demonstrate for natural languages. It is very difficult to invent a new culturally neutral language, teach it to people from different cultures, and wait for Whorfian effects to manifest.

Let's consider our software engineering "culture": we have our own languages; we share common knowledge, problems, and solutions. I can talk about some things with any software engineer anywhere in the world and feel more commonality than with my next-door neighbor. But our engineering field is far less complex than a real culture. Still, it already has its own history, and it is peculiar in being strongly focused on inventing the best languages. Our goal is always to improve the way we solve our engineering problems, by inventing programming languages, like LISP, Prolog, Smalltalk, Ada, Erlang, and so many others, and design and analysis languages like the UML.

I have had some systems engineering discussions with people who build software very different from mine. To find some common ground, I suggested modeling a simple fire alarm system for a house. One person, who was used to building software for controlling an aircraft engine unit, saw everything as a control loop, with outputs fed back to modify what is done with the inputs. Another saw everything as chains of functions with filters. So, even within our software (or software-intensive systems) engineering culture, many different subcultures can be found. Each of them uses a different representation and codifies knowledge in its own special languages.

From this example, and many other examples, we can infer that there is a strong relationship between a language and an "engineering culture" (what engineers do and think). We are very close to the SWH here, in its weak formulation. Other questions about such a possible Engineering SWH: Does the engineering language we practice limit the solution space we can explore? (This is a strong negative formulation.) Does the engineering language we practice shape the way we engineer and determine the solutions we can imagine?

In software engineering, the UML is today's language of choice for analysis and design. And since the UML is a language, could there be something like a UML Whorfian effect? Faced with a complex problem to solve, what do you visualize in your mind? If your reflex, as an engineer, is to create (mentally or physically) UML models and diagrams, then I think you are directly experiencing a (positive) Whorfian effect.

Students are now taught the UML at school. Those of us who have been in the field for some time discovered OO programming, then OO design, and then OO analysis. We read and participated in the elaboration of methods and visual representations of our design results and analysis results. We experienced a paradigm shift in how we implement and think about our engineering practices. A language is not only something we learn; it is first and foremost something we practice. Practicing the UML gives us new reflexes for solving problems.

Now, wouldn't the UML, by its very nature, eliminate solution paths one could take to solve a problem? If this is the case, you're experiencing a Whorfian effect too, but a negative one. Such negative Whorfian effects, or Whorfian constraints, should drive UML enhancements. There is another difference between an "engineering culture" and a "real culture": as engineers, we have much more freedom to change our communication languages. Doing so is an engineering activity in its own right.

Be Liberated from Whorfian Mind-locks

We have seen that since the UML is a language, there could be UML Whorfian effects (positive or negative). Positive ones would be that learning and practicing the UML might enable us (by giving us new mental structures) to view the (software/systems) world differently. Or, more simply put, UML might enable us to practice engineering differently.

Eric Steven Raymond, a theorist of the free software movement, is well known for "The Cathedral and the Bazaar" [12], and less known for "Tolkien's Tengwar: A romantic orthography for Lojban" [7]. In his Jargon File [5], Raymond defines Whorfian mind-locks (Jeff Prothero's term [6]):

"Software designs are sometimes restricted in avoidable ways by mental habits a developer has picked up from a particular language or environment (perhaps a now-obsolete one) and never discarded."

An example of that is the well-known joke:

"Good FORTRAN programmers can program in FORTRAN with any programming language."

Maybe the UML could liberate us from some Whorfian mind-locks.

What would a UML Whorfian effect feel like? Actually, nobody really knows. I think that it almost happened to me when I started to learn Lojban. Be warned that it could happen to you, dear super-modelers. Then share it when it does! Maybe it would simply manifest itself as the release of some old Whorfian mind-lock.

Be warned, too, that the UML is not the "final word" in software engineering. Don't get caught in UML mind-locks when it comes to imagining new solutions for new problems.

ki'e .i co'o

To finish, just in case you're lost in Lojbanistan during your next holidays, here is a Lojbanic survival kit:

  • coi (hello)
  • mi na jimpe (I don't understand)
  • mi xagji (I am hungry)
  • ma do cmene (what's your name?)
  • mi prami do (I love you)
  • ki'e (thank you)
  • co'o (bye)
  • ko ko kurji⁶ (take care of yourself)


Acknowledgements

Catherine Southwood really helped improve the English in this article, and made many useful suggestions. Many thanks to her! .i ki'e doi. katrin.


References

Books

[1] John Cowan. The Complete Lojban Language. A Logical Language Group Publication, 1997. (Partially available online.)

[2] Nick Nicholas and John Cowan. What is Lojban? .i la lojban. mo. A Logical Language Group Publication, 2003.

[3] Robin Turner and Nick Nicholas. Lojban for beginners. http://www.opoudjis.net/lojbanbrochure/lessons/book1.html

Web Sites

[4] The Lojban official website: http://www.lojban.org

Online Articles

[5] Eric Steven Raymond's Jargon file extract: http://catb.org/~esr/jargon/html/W/Whorfian-mind-lock.html

[6] Jeff Prothero's original thought: http://www.lojban.org/files/papers/4thtense

[7] Eric Steven Raymond's article: Tolkien's Tengwar: A romantic orthography for Lojban http://catb.org/~esr/tengwar/lojban-tengwar.html

[8] What is Lojban? (and the SWH): http://www.lojban.org/files/draft-textbook/lesson01

[9] Lojban and the SWH, discussions: http://www.lojban.org/files/why-lojban/swh.txt

[10] Presentation of the SWH and compilation of links: http://www.usingenglish.com/speaking-out/linguistic-whorfare.html

[11] And of course, Lojban, UML, and the SWH can be found in the Wikipedia: http://www.wikipedia.org

Other Web Sites and Articles

[12] http://www.catb.org/~esr/writings/cathedral-bazaar/ Eric Steven Raymond's seminal essay about the open-source hacker culture.

[13] http://www.uea.org The World Esperanto Association.

[14] http://www.langmaker.com about Model Languages & The Art of Language Making (Conlang).

[15] http://www.elvish.org The Elvish Linguistic Fellowship.

[16] http://www.kli.org The Klingon Language Institute.

[17] Wanted: A World Language, by Edward Sapir, 1931: http://www.langmaker.com/sapir.htm


Endnotes

1. See www.lojban.org [4]
2. See www.uea.org [13]
3. See www.langmaker.com [14]
4. See www.elvish.org [15]
5. See www.kli.org [16]
6. ko ko kurji is the same as ko kurji ko (only the order of the sumti counts in a bridi, not their absolute position). ko is the imperative form of do.
From the Lojban FAQ: "ko kurji do" commands "Take care of you(rself)" but "ko kurji ko" commands both that "You take care of yourself," and "Allow yourself to be taken care of by you," with a resulting double emphasis that indicates an especial priority or responsibility for self-focus.