book – In Traction

January 20, 2008

SmartKom

At Ubicomp 2007, there was a book stand by Springer just outside the conference room. On the last day, the volunteer behind the stand told me that I could choose one of the books that were still lying there. I didn’t see anything interesting at first. Since a few people at our institute are working on multimodal systems, I picked the book SmartKom: Foundations of Multimodal Dialogue Systems.

During the holidays, I read the first part of the book and noticed that it was relevant to me after all. SmartKom was a large four-year project about multimodal dialogue systems. They developed a system that provides symmetric multimodality in a mixed-initiative dialogue system with an embodied conversational agent. There is also a follow-up project that should end in 2007: SmartWeb. SmartWeb goes beyond SmartKom in supporting open-domain question answering using the entire (Semantic) Web as its knowledge base.

Symmetric multimodality means that every input mode (e.g. speech, gesture, facial expression) is also available for output, and vice versa. Multimodal interaction is one way to make interaction between humans and computers more intuitive. Human dialogue is not only based on speech but also on nonverbal communication such as gesture, gaze, facial expression, and body posture. One of the major characteristics of human-human interaction is the coordinated use of different modalities (e.g. allowing all modalities to refer to or depend upon each other). Symmetric multimodality combined with a mixed-initiative conversational agent results in more intuitive interaction. The SmartKom system reduces recognition errors through modality fusion: by considering multiple input modalities together (e.g. speech, facial expression and gesture), it can estimate the user’s intention more accurately.
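To make the idea of modality fusion a bit more concrete, here is a minimal sketch of one common approach (late fusion by weighted averaging). This is not SmartKom's actual algorithm: the intent names, the confidence scores and the weights are all made up for illustration. Each recognizer reports a confidence per candidate intention, and the fused score is the weighted average across modalities.

```python
from collections import defaultdict


def fuse(hypotheses, weights):
    """Combine per-modality confidence scores into one ranking.

    hypotheses: {modality: {intent: confidence in [0, 1]}}
    weights:    {modality: relative trust in that recognizer}
    Returns (best_intent, fused_scores).
    """
    fused = defaultdict(float)
    total = sum(weights[m] for m in hypotheses)
    for modality, scores in hypotheses.items():
        for intent, confidence in scores.items():
            # Weighted average: each modality contributes in proportion to its weight.
            fused[intent] += weights[modality] / total * confidence
    best = max(fused, key=fused.get)
    return best, dict(fused)


# Hypothetical recognizer outputs: speech alone is ambiguous and slightly
# favours the wrong intent, while the pointing gesture is clear.
observations = {
    "speech": {"select_cinema": 0.45, "select_channel": 0.55},
    "gesture": {"select_cinema": 0.90, "select_channel": 0.10},
}
weights = {"speech": 0.5, "gesture": 0.5}  # assumed equal trust

best, scores = fuse(observations, weights)
print(best)
```

In this toy example, speech on its own would pick the wrong intention, but the combined evidence favours the one the gesture points at.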

SmartKom has been used in several application scenarios: in public telephone booths, home entertainment systems, mobile systems and in a car environment. The last part of the book discusses techniques to evaluate multimodal dialogue systems, which should be an interesting read.

January 7, 2008

Creativity and scientific thinking

During the holidays, I spent some time reading about creativity and the basic principles of scientific research.

We (researchers) are supposed to come up with innovative ideas, but no one ever told us how to do that exactly. Great ideas are often said to be discovered by accident. People assume creativity is a talent, something you’re good at or bad at. However, according to Edward De Bono, creativity is a process we can steer. He came up with the concept of lateral thinking, which consists of a set of techniques to deliberately shift away from our traditional thinking patterns. I am currently reading his book De Bono’s Thinking Course. Although I am still a bit sceptical, let’s see where it leads me.

The second topic I had a brief look at is how to do research. I came across a book called On Being A Scientist, which is a great reminder of your responsibility as a researcher. It also discusses a few case studies of dubious scientific methods. Richard Feynman (Nobel Prize in Physics) has another interesting take on misconduct in science, or, as he calls it, Cargo Cult Science.

A few motivational articles I had a look at are You and Your Research by Richard Hamming and Technology and Courage by Ivan Sutherland. For more specific advice, I always enjoy Simon Peyton Jones’ slides. Finally, I had a quick browse through a list of books every computer researcher should have read by Philip Dutré.

December 2, 2007

Making things talk

A few weeks ago I came across a blog post by Cati Vaucelle about Making Things Talk, the new book by Tom Igoe. The book deals with building smart, communicating things. It is organized around specific projects and uses practical examples to explain different technologies. Tom works at NYU ITP (where Adam Greenfield also works).

Through a series of simple projects, this book teaches you how to get your creations to communicate with one another by forming networks of smart devices that carry on conversations with you and your environment. Whether you need to plug some sensors in your home to the Internet or create a device that can interact wirelessly with other creations, Making Things Talk explains exactly what you need.

The book seemed really useful to me to learn how to build smart things and prototype a ubicomp environment. Unfortunately I was never really exposed to electronics, so this might be a good way to catch up. I pointed the book out to Kris, who ordered a copy afterwards. I had a quick look at it, and I must say it is well-written and fun to read. You need some hardware to really dive in though.

The author uses Processing and Arduino as the basic building blocks. I was pleasantly surprised that the programming environment works perfectly under Mac OS X and GNU/Linux (while it also supports Windows). I would also like to experiment with it at home, for instance to build a remote-controlled mood light. Apparently a Wii Nunchuk is also pretty popular for connecting to Arduino, as it sports a 3-axis accelerometer, a joystick and two buttons for under $20 and uses the I2C protocol.

November 20, 2007

Beyond the desktop metaphor: Lifestreams and Haystack

I spent part of my lazy Sunday reading a few articles in Beyond the Desktop Metaphor: Designing Integrated Digital Work Environments, a book that Kris dropped on my desk a few weeks ago. It gives an overview of the state of the art in integrated digital work environments and is edited by Victor Kaptelinin and Mary Czerwinski.

I went through the chapters on Lifestreams by Eric Freeman and David Gelernter and Haystack by David R. Karger.

Lifestreams was an alternative to the desktop metaphor, developed from 1994 onwards, that aimed to be a better way to organize your personal electronic information. The work was primarily motivated by the limitations of a static (hierarchical) filesystem. The problem with organizing our documents in the filesystem hierarchy is that information generally falls into fuzzy categories and that it is impossible for users to come up with categories that remain unambiguous over time. Furthermore, users are forced to name their files, which often results in meaningless file names such as “draft1.doc” and “draft2.doc”. Names are an ineffective way of categorizing information, since their value decays over time. Traditionally, people do not name their documents, as pointed out by Thomas Malone in his paper How do people organize their desks? Implications for the design of office information systems. He noticed that people often just create nameless stacks of related documents on their desk. Freeman and Gelernter discuss a few other problems with the desktop metaphor, such as the lack of support for archiving, reminding and summarizing. The desktop metaphor does not make it easy to archive information, that is, to put it somewhere we can retrieve it later while also removing it from our periphery. Users often place information on their desktop to remind them of tasks to do, or leave an email in their inbox to remind them that they still need to reply to it. As the desktop has no semantic notion of reminding, users are just working around the system. Finally, summaries are needed in order to cope with all our electronic information. The authors state that summaries are often application-centric (e.g. an overview of your photo albums, a summary of your music, etc.) instead of system-wide.

I found it interesting that the authors do not see their architecture as another metaphor, but as a unified idea or system. They refer to Nelson’s concept of virtuality as opposed to metaphorics. Nelson (who also coined the term hypertext) argues that adherence to a metaphor prevents the emergence of things that are genuinely new. Trying to adhere to a metaphor may lead to strange results when new functions are added, for example having to drag a CD icon to the trash to eject it on Mac OS X.

A lifestream is a time-ordered stream of documents that functions as a diary of a user’s electronic life. Every document he or she creates is stored in the lifestream. Moving forward from the tail to the present, the stream contains more recent documents. Moving beyond the present into the future, the stream contains documents that the user will need (e.g. reminders, calendar items, etc.). The system has a few primitive operations that together support transparent storage, organization through directories on demand, archiving, reminding and summaries: new, copy, find and summarize. New and copy are used to create or copy documents in the lifestream or between lifestreams. Documents do not have to be named. The find operation allows users to search their documents. It creates a substream with the results of the query. These substreams are not static, but are updated on the fly whenever new documents that are relevant to their query appear. Users can allow substreams to persist, in order to quickly find information they need regularly (e.g. “emails from Joe”). Finally, summarize compresses a substream into an overview document. The method of summarizing varies according to the content of the substream (e.g. a music playlist, a prioritized to-do list, etc.). The figure below shows the Lifestreams user interface:

It’s interesting to see that many of the ideas first explored in Lifestreams are currently supported by several applications. Archiving was one of Gmail’s defining characteristics (“never lose a message again!”) when it was first released. Apple’s iApps such as iTunes offer summarization, dynamic substreams (“smart playlists”) and time-based visualizations. Desktop search tools such as Google Desktop, Apple Spotlight and Beagle offer a way to quickly find items on your computer. Some of them also offer saved searches (which are again similar to dynamic substreams). The authors also discuss this evolution. However, they feel that desktop search, while definitely a step in the right direction, is not sufficient, since it only works if you know what to look for. People really need good browse engines instead of search engines. This statement is also made in the next chapter on Haystack, where it is called orienteering.
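The Lifestreams primitives described earlier (new, copy, find and summarize) are simple enough to sketch in a few lines of Python. This is only my own toy model, not the original implementation: the class names, the substring-based query and the one-line summary format are all assumptions. The point it illustrates is that a substream is a live view rather than a static result list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Document:
    content: str
    timestamp: datetime  # may lie in the future (e.g. a reminder)


class Lifestream:
    def __init__(self):
        self._docs: List[Document] = []

    def new(self, content, timestamp):
        """Create an unnamed document and keep the stream time-ordered."""
        doc = Document(content, timestamp)
        self._docs.append(doc)
        self._docs.sort(key=lambda d: d.timestamp)
        return doc

    def copy(self, doc, target, timestamp):
        """Copy a document into this or another lifestream."""
        return target.new(doc.content, timestamp)

    def find(self, query):
        """Return a live substream of documents whose content matches the query."""
        return Substream(self, lambda d: query.lower() in d.content.lower())


class Substream:
    """A live view: evaluated on each access, so later documents show up too."""

    def __init__(self, stream, predicate):
        self._stream = stream
        self._predicate = predicate

    def documents(self):
        return [d for d in self._stream._docs if self._predicate(d)]

    def summarize(self):
        docs = self.documents()
        return f"{len(docs)} document(s): " + "; ".join(d.content for d in docs)


stream = Lifestream()
stream.new("Email from Joe: draft ready", datetime(2007, 11, 1))
from_joe = stream.find("joe")  # persistent substream, created early
stream.new("Email from Joe: meeting on Friday", datetime(2007, 11, 5))
stream.new("Reminder: submit paper", datetime(2007, 12, 1))
print(from_joe.summarize())
```

Note that the second email from Joe appears in the substream even though it arrived after the substream was created, which is the behaviour that smart playlists and saved searches later made familiar.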

Haystack can be seen as a generalization of Lifestreams. Haystack is a way to visualize and organize a user’s information, but does not restrict the visualization and categorization to be time-based. The authors try to find a solution for the fact that current applications force users to manage information in the way that the application designer envisioned it. This might not be the most natural way for the users, so Haystack gives the users more control over what kinds of information they store and how to visualize and manage it. In traditional email applications, for example, we can only categorize by the labels that are predefined (e.g. sender, subject, etc.), but not by our own features such as “needs to be handled by such-and-such a date”. The information may even be in the application, but no appropriate interface is offered to use it. Furthermore, every application manages its own data independently, while we might want to relate data from different applications (e.g. emails, articles, blog posts, pictures, songs, people, etc.). A user might also want to add a new data type. Consider the location field in a calendar event: this is just a string, while the user might want a richer presentation (Google Calendar can do this by linking to Google Maps, by the way). Existing applications are very bad at extending existing types, since they offer no way of displaying the new type, no operations for acting on it and no way of connecting it to other information objects in the application.

Haystack has a generic user interface architecture that supports impressive personalization. Users can for instance create a new “Send to Joe” operation by filling in part of the “Send to” operation, and saving it. Objects can be dragged onto each other to connect them: dragging an object onto a collection adds it to the collection, while dragging an object onto a dialog box argument binds that argument to the dragged item.
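For programmers, the “Send to Joe” example looks a lot like partial application: you fix some arguments of a general operation and save the result as a new, more specific operation. Here is a rough analogy in Python; the send_to function and its return value are hypothetical and have nothing to do with Haystack's actual code.

```python
from functools import partial


def send_to(recipient, item):
    """Hypothetical generic operation: send an item to a recipient."""
    return f"Sent {item!r} to {recipient}"


# Fill in one argument and save the result as a new operation.
send_to_joe = partial(send_to, "Joe")

print(send_to_joe("paper-draft.pdf"))
```

The difference is of course that Haystack lets end users do this through the user interface, without writing any code.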

Custom workspaces can be constructed by drag and drop. The figure below shows a workspace specialized for writing a particular research paper, presenting among other things relevant references, coauthors and outstanding to-dos.

The system uses Semantic Web technology (more specifically RDF and URIs) to represent information objects, their attributes and relationships to other information objects. However, the authors do not enforce schemata (such as RDFS or OWL), in order to allow users to organize information the way they want. It is after all difficult to create an ontology that serves everyone’s needs. Consider for example the composer attribute of a symphony concept. A reasonable constraint is to restrict composers to be people. But this will prevent a user who is interested in computer music from entering a particular computer program as the composer. The authors state that schemata may be of great advisory value, but they argue against enforcing them. Apparently this is also known as a semi-structured data model.
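The composer example can be illustrated with a small sketch of what an advisory (rather than enforced) schema could look like. This is my own simplification in plain Python, not Haystack's data model or a real RDF library: the TripleStore class, the “ex:” identifiers and the warning text are all invented for the example. The store accepts every triple, and the schema is only used to produce a hint.

```python
class TripleStore:
    """Toy subject-predicate-object store with a schema that only advises."""

    def __init__(self, advisory_schema=None):
        self.triples = set()
        # predicate -> type the object is usually expected to have
        self.advisory_schema = advisory_schema or {}
        self.types = {}  # resource identifier -> type name

    def set_type(self, resource, type_name):
        self.types[resource] = type_name

    def add(self, subject, predicate, obj):
        """Store the triple; return a warning string if it deviates from the schema."""
        warning = None
        expected = self.advisory_schema.get(predicate)
        actual = self.types.get(obj)
        if expected is not None and actual != expected:
            warning = f"{predicate} usually points to a {expected}, got {actual}"
        self.triples.add((subject, predicate, obj))  # stored regardless of the warning
        return warning


store = TripleStore(advisory_schema={"ex:composer": "Person"})
store.set_type("ex:beethoven", "Person")
store.set_type("ex:algo_composer", "Program")

w1 = store.add("ex:symphony9", "ex:composer", "ex:beethoven")
w2 = store.add("ex:computer_piece", "ex:composer", "ex:algo_composer")
print(w1)
print(w2)
```

An enforcing schema would have rejected the second triple; here the computer music fan simply gets a hint and the information is stored anyway.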

I think this is the most impressive Semantic Web application I have seen, although I am also looking forward to testing Twine and Powerset. I have barely touched upon everything that Haystack can do in this blog post, so if you are not yet convinced, have a look at a paper that is pretty similar to the book chapter. The level of customization supported by Haystack reminded me of the Meta-UI concept (which I see as a user interface to manipulate an interactive system or its user interface) as discussed by Coutaz at Tamodia’06.

Although Lifestreams and Haystack would certainly improve the way we manage our data, I feel they both ignore an important type of information: information in the physical world. After all, a substantial amount of the information we process is non-digital. Last year, I had a project proposal for the course Actuele Trends in HCI (translated: “Current Trends in HCI”) on improving the way we work with digital and physical information. Given that the students had little time for this project, the result was pretty nice.

November 11, 2007

Thoughts on speed reading

On Monday afternoon, I participated in a Smart Reading course together with a few colleagues. Although the basic techniques of speed reading were explained, it left me wanting to know more. Since I don’t feel like paying more than a thousand Euros for a full, three-day course, I started to look for some more information on the topic.

If you want more or less the same information that we received in the course, have a look at this excellent overview of speed reading techniques.

For those of you who want to speed read through information on your computer display instead of in books, there is software available that uses the technique of Rapid Serial Visual Presentation to help you read faster. One of these applications is RapidReader. They have a nice video illustrating that reading faster doesn’t significantly hamper your comprehension:

[youtube:http://www.youtube.com/watch?v=hs6CGBlqulk]

There are also some free alternatives, such as Spreeder (an online speed reader) and dictator.
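The basic idea behind Rapid Serial Visual Presentation is simple enough to sketch yourself: show one word at a time at a fixed position, for a duration derived from the target reading speed. Below is a minimal Python sketch under that assumption. It is not how RapidReader or Spreeder work internally; real tools probably vary the time per word (e.g. longer for long words or punctuation), which this version ignores.

```python
import time


def rsvp_schedule(text, wpm):
    """Return (word, seconds_on_screen) pairs for a fixed reading speed."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    interval = 60.0 / wpm  # seconds per word
    return [(word, interval) for word in text.split()]


def present(text, wpm, show=print, sleep=time.sleep):
    """Show the words one by one; show and sleep can be swapped out for testing."""
    for word, seconds in rsvp_schedule(text, wpm):
        show(word)
        sleep(seconds)


schedule = rsvp_schedule("Reading one word at a time", 300)
for word, seconds in schedule:
    print(f"{word}: {seconds:.2f}s")
```

At 300 words per minute each word stays on screen for 0.2 seconds; calling present(text, 300) would actually play the words back at that pace in the terminal.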

Apparently there is a yearly contest called the World Championship Speed Reading Competition. The current record holder is Sean Adam with 3850 words per minute with comprehension. There were also some famous people in history who could speed read, including Jacques Bergier and US presidents John F. Kennedy and Jimmy Carter. There are also some claims of a child prodigy who could read more than 400 000 words per minute, but that might be attributed to her photographic memory.

Although a lot of the claims around speed reading are unrealistic and it is surrounded by the typical vagueness of pseudoscience, the idea still intrigues me. I went to the book shop yesterday and found a few books (some exclusively in Dutch, others translated from English) that seem interesting to have a look at. I also included books on mind mapping, since this is the technique used to summarize the books you read. There is another book in English that seems to be recommended by a few people: Breakthrough Rapid Reading by Peter Kump.

The Dutch books I might have a look at are:

Gebruik je hersens
Snellezen
Mindmappen
Gebruik je verstand