Tag: research

How to efficiently perform a literature review

Research is to see what everybody else has seen, and to think what nobody else has thought.
— Albert Szent-Gyorgyi (Hungarian Biochemist, 1937 Nobel Prize for Medicine, 1893-1986)

I got the idea to document the way I read papers from a discussion with Lode (he uses mostly the same approach as I do). I’m pretty certain this is obvious for many people, but I hope it may still be useful for others.

In this post I focus on processing lots of (related) papers in the least amount of time. I will not discuss how you find interesting papers, since this is fairly easy. Reading prestigious conference proceedings or journals in your field usually allows you to end up with a couple of seed papers that are representative for your specific area of interest. If you do research in ubiquitous computing, reading the proceedings of Ubicomp, Pervasive or Percom, and journals such as Personal and Ubiquitous Computing will get you along quite nicely. If you focus on the human-computer interaction aspects of ubiquitous computing, throw in the proceedings of CHI and UIST, together with the TOCHI and Human-Computer Interaction journals. The papers you find here will be stepping stones to related papers (and conferences) using the simple technique I will discuss in this post.

I estimate that of all papers I read, I only process 1 out of 5 in full detail. Most of the time when I find a paper that seems interesting, I start with skimming the paper briefly. I focus mostly on the abstract, the last part of the introduction (which should list the contributions), relevant pictures in the main matter, and the conclusion/discussion. I do several passes over the paper to validate my early understanding of what the authors did. I also go through the related work and references. This allows me to see if the authors refer to work that I don’t know but which could be interesting (which I mark for later processing). This will usually take little more than 5 minutes.

Now I have a pretty good idea whether the paper is interesting to me. If it’s not, I just go through the interesting references (if any) using the same process and stop there. If it is interesting, I go one step further, treating the paper as a seed paper that allows me to find other relevant work.

If the paper is not too recent, there will probably be (lots of) other articles that refer to this paper. Even if the paper is fairly new, it is still worth looking up papers that already refer to it. Looking up these articles allows me to discover relevant related work and to see how others critically describe this paper. A critical review could contradict of the results of the paper (e.g. through replication, such as Shumin Zhai’s work on target expansion), or might offer new insights (such as suggesting a new direction for research related to the original paper, as Yvonne Roger’s Ubicomp’06 paper did for Mark Weiser’s original Ubicomp paper). Both types of critical reviews are very useful for your own understanding of the paper.

I usually use Google Scholar to find citations (and downloadable versions of articles), since it indexes the major publisher databases (e.g. the ACM Digital Library, IEEE Xplore or SpringerLink):

After entering a title in Google Scholar, it presents you with a list of matching papers. Below each item, there is a link with the number of papers that cite it (indicated in orange in the figure below). In this case there are 118 citations:

After clicking on this link, Google Scholar shows all articles that cite the previous one, sorted by their relevance (in other words, the number of times these articles are themselves cited):

This approach to reading papers has one disadvantage: your reading list will quickly grow. You can end up with 30 more papers to read starting from just 1 interesting seed paper (which is what happened to me when I looked for papers that cited Intelligibility and Accountability: Human Considerations in Context-Aware Systems by Victoria Bellotti and Keith Edwards). In my opinion, this is actually an advantage as it allows you to quickly find many related (and hopefully interesting) papers. Furthermore, because you’re skimming papers, you never spend more than 5 minutes on papers that are not relevant or interesting.

After I have found a significant number of papers that I want to read, I of course will go through them in more detail. Papers that are very relevant for my work or give me the feeling they deserve a more thorough reading (e.g. because they provide many new insights), will be read from beginning to end. A few examples that I also discussed in my blog are: Range: Exploring Implicit Interaction through Electronic Whiteboard Design, Evaluating User Interface Systems Research and two chapters from Beyond the Desktop Metaphor: Designing Integrated Digital Work Environments.

A few related guides:
* Research Techniques by Alan Dix (especially the slides about gathering information)
* Reading AI from How to do Research At the MIT AI Lab

Thanks to Lode for having a look at an early draft of this post, and for providing the example of replication in HCI by Shumin Zhai.

Wendy Ju’s implicit interaction framework

I recently read an interesting CSCW 2008 paper by Wendy Ju: Range: Exploring Implicit Interaction through Electronic Whiteboard Design. She describes a framework for implicit interaction and applies it to the design of an interactive whiteboard application called Range.

The paper is situated in the field of ubiquitous computing. The goal of Mark Weiser‘s vision of ubiquitous computing was calm computing, where calm reflects the desired state of mind of the user. Invisibility in ubicomp is more about enabling seamless accomplishment of task than staying beneath notice. Just as a good, well-balanced hammer “disappears” in the hands of a carpenter and allows him or her to concentrate on the big picture, computers should participate in a similar magic disappearing act. Calm computing moves between the center and the periphery of attention. The periphery informs without overwhelming the user, but the user can still move to the center to get control. The implicit interaction framework presented in this paper contributes to how calm computing can be effectively realized. It allows to reason about the way users can mitigate actions the system takes, and is a good complement to Eric Horvitz’s work on mixed-initiative interaction.

Implicit interactions enable communication and action without explicit input or output.One way that an action can be implicit is if the exchange occurs outside the attentional foreground of the user (e.g. auto-saving files, filtering spam, and ubicomp interaction). The other way is if the exchange is initiated by the computer system rather than by the user (e.g. an email alert, screen saver, etc.). Although it may seem strange that something that grabs attention is implicit, the key factor is that the interaction is based on an implied demand for information or action, not an explicit one.

The implicit interaction framework divides the space of possible interactions along the axes of attentional demand and initiative:


Attentional demand is the degree of cognitive and perceptual load imposed on the user by the interactive system. Foreground interactions require a greater degree of focus, concentration and consciousness and are exclusive of other focal targets. Background interactions are peripheral, have less demand and can occur in parallel with other interactions.

Initiative is an indicator of how much presumption the interactive system uses in the interaction. Interactions that are initiated and driven by the user explicitly are called reactive interactions, while interactions initiated by the system based on inferred desire or demand are proactive interactions.

The implicit interaction framework builds on Bill Buxton’s background/foreground model. Buxton’s model assumes attention and initiative are inherently linked. On the contrary, this framework decouples attention and initiative in two separate axes. Buxton’s foreground corresponds to the reactive/foregound quadrant, while his background corresponds to the proactive/background quadrant.

An example: a word processing program that …

  • auto-saves because you command it to is situated in the reactive/foreground quadrant
  • auto-saves because you have set it to do so every 10 minutes is situated in the reactive/background quadrant
  • auto-saves because it feels that a lot of changes have been made is situated in the proactive/background quadrant

Being proactive means the word processing program is acting with greater presumption with respect to the needs and desires of the user.

Designers can manipulate the proactivity and reactivity by (1) dictating the order of actions (does the system act first or wait for the user to act?); or by (2) choosing the degree of initiative (does the sytem act, offer to act, ask if it should act, or merely indicate that it can act?); or by (3) gathering more data to ensure the certainty of the need for an action or when they design features to mitigate the potential cost of error for the action. Even in the reactive realm, the degree of initiative can vary based on the amount that the user needs to maintain ongoing control and oversight of an action in progress.

Ju discusses 3 implicit interaction techniques:

  • user reflection
  • system demonstration
  • override

User reflection is how the system indicates what it feels the users are doing or would like to have done.

A good example are modern spell-checking programs. Early versions had to be invoked explicitly, and engaged the user in an explicit dialog about potentially misspelled words to repair. Current spell-checking programs run continuously in the background allowing users to more easily notice potential errors. The implicit alert of this interaction is far more seamless than that of earlier spell-check programs. A similar example is the continous compilation used by modern IDEs such as Eclipse and Visual Studio. Earlier programs only showed compile errors when the user explicitly activated the compile command.

System demonstration is how the system shows the user what it is doing or what it is going to do.

In Range, the whiteboard animates its transition from ambient display mode (where it displays a set of images related to the workspace) to drawing mode (where users can make sketches and diagrams) as a demonstration-of-action that calls more attention to the mode change than a sudden switch would, and provides a handle for override (see later).

Override techniques allow users to repair misinterpretation of the user’s state or to interrupt or stop the system from engaging in proactive action.

This usually occurs after one of the previous techniques (user reflection or system demonstration) alert the user to some inference or action that is undesirable. Override is distinct from “undo” because it is targeted at countering the action of the system rather than reverting a command by the user.

An example of override in Range is that in the transition between modes users are able to “grab” digital content to use it as part of the whiteboard contents, or to stop the motion of objects that are being moved to make space for drawings.

The main contribution of this framework compared to prior models for implicit interaction lies in the key variable of initiative. Without this variable, it would not be possible to distinguish user reflection techniques from system demonstration techniques or to map the role of override.

In conclusion, a very interesting paper that offers a framework to reason about proactive user interfaces and make sure that users are always in control.

AVI 2008

Although it’s a bit late (almost a month after the facts), I finally found some time to blog about Advanced Visual Interfaces (AVI) 2008 in Naples, where Jan presented our paper about Gummy.

Gummy title slide at AVI 2008

I liked it very much: the conference had good quality papers but was still reasonably small (around 150 attendants), and of course the weather and the Italian food were great We arrived on Tuesday which gave us some time to explore the city and take the ferry to Capri (a great suggestion by Robbie).

Piazza del Plebiscito

Vesuvius

Capri

I am not going to discuss the conference program into detail this time, but will just highlight a couple of interesting papers. Possibly one of the coolest papers was “Exploring Video Streams using Slit-Tear Visualizations” by Anthony Tang (video). Another presentation I enjoyed was “TapTap and MagStick: Improving One-Handed Target Acquisition on Small Touch-screens” by Anne Roudaut (video). It seems there is lots of related work in this area (e.g. Shift, ThumbSpace, etc.). Peter Brandl presented two interesting papers: Bridging the Gap between Real Printouts and Digital Whiteboards and Combining and Measuring the Benefits of Bimanual Pen and Direct-Touch Interaction on Horizontal Interfaces. He was brave enough to do an impressive live demo for the first paper Oh, and he also covered the conference in a blog post.

The first paper in our session, titled “A Mixed-Fidelity Prototyping Tool for Mobile Devices” by Marco de Sá, introduced a tool to easily design prototypes and evaluate them in real-life situations. The system was well thought out and serves a real need. I can imagine that we could use this kind of tool in a user-centered UI design course. The second paper in our session was “Model-based Layout Generation” by Sebastian Feuerstack. I already met Sebastian at CADUI 2006. They presented a generic layout model based on constraints. It reminded me a bit of the layout model Yves worked on for our EIS 2007 paper. They used the Cassowary constraint solver, which I also used for my MSc thesis on constraint-based layouts for UIML. Sebastian told me he got the idea from my demo at CADUI 2006. I forgot to add a certain constraint (the layout of the UI was thus underconstrained), which by coincidence had no effect on the user interface everytime I tested it. Of course, when I showed the demo it did have an effect This clearly illustrated that constraint solvers are sometimes unpredictable (see Past, present, and future of user interface software tools by Myers et al.). Sebastian’s solution to this problem was to hide the constraints from the designer and generate them automatically from a graphical layout model.

Juan Manuel Gonzalez Calleros — who I met at CADUI 2006, TAMODIA 2006 and a few other occasions — presented a poster and a paper at the workshop on haptics. He took a few pictures while Jan was presenting (thanks again Juan!). Here are Juan and Jan discussing UsiXML vs UIML

Jan and Juan discussing UsiXML vs UIML

Overall, the comments on our work were positive, although of course one of the biggest problems is still the lack of support for multi-screen interfaces. As Jan is actively hacking on Gummy these days, I don’t think it will take very long for this to be included in the tool

Back to the future: Smalltalk

I spent some time last weekend looking into Smalltalk again. The first time I did this was somewhere around 2004, when I played around with Ruby and discovered that it was strongly influenced by Smalltalk. Back then I watched an old video by Dan Ingalls on object-oriented programming which finally made me fully understand the essence of OOP: it’s all about messaging

[googlevideo:http://video.google.com/videoplay?docid=-2058469682761344178]

In my personal opinion, this video (or at least the message that Dan tries to communicate) should be better integrated in OOP courses at universities. Another invaluable resource for grasping these ideas is Design Principles Behind Smalltalk, again by Dan Ingalls. Of course, it’s difficult to understand what OOP is about if you have to learn it through a weak implementation. We learned the basics of OOP in C++ for example, which would be blasphemy to Alan Kay He once said Actually I made up the term object-oriented, and I can tell you I did not have C++ in mind. Here’s his definition of OOP:

OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.

Before I looked into Smalltalk, to my understanding objects just contained a bunch of methods or functions that had access to the object’s context. I did not really grasp the idea that objects just respond to messages (or method calls in my definition). The real difference about this is that in Smalltalk messages are dynamically dispatched at runtime. A method is the function or subroutine that is invoked in response to the sending of a message, which will be matched to the message name (or selector) at runtime. In contrast, method calls in C++, Java and C# are statically bound at compile-time. There is thus a distinction between the semantics (or message) and implementation strategy (or method) in Smalltalk. Decoupling these allows for more flexibility, such as objects that cache all incoming messages until their database connection is fully set up, after which they replay these messages, or objects that forward messages to other objects (which might have even been passed in at runtime). This is of one of the aspects of extreme late binding in Alan Kay’s definition of OOP.

It’s exactly this run-time lookup of methods that enables effortless polymorphism. As explained in the video, at some point the intermediate factorial result will become an instance of LargeInteger, while in previous iterations it was an instance of SmallInteger. The multiplication message (*) is sent to this object, after which the correct method in the class LargeInteger is looked up for handling the message, allowing the existing code to continue to work. Java, C# and C++ have all inherited this feature (although C++ requires explicitly declaring methods as virtual for this to work, due to efficiency reasons). Smalltalk can even realize polymorphism without inheritance (also known as duck typing), although this is not shown in this video. Smalltalk has implicit interfaces: an object’s interface is the messages it responds to. If two objects both respond to a certain message, they are interchangeable (even at runtime). Traditional languages such as Java or C++ only support inheritance-based polymorphism (although something similar to duck typing can be achieved with C++ templates). Here’s the explanation by Dan Ingalls:

Polymorphism: A program should specify only the behavior of objects, not their representation.

A conventional statement of this principle is that a program should never declare that a given object is a SmallInteger or a LargeInteger, but only that it responds to integer protocol. Such generic description is crucial to models of the real world. Consider an automobile traffic simulation. Many procedures in such a system will refer to the various vehicles involved. Suppose one wished to add, say, a street sweeper. Substantial amounts of computation (in the form of recompiling) and possible errors would be involved in making this simple extension if the code depended on the objects it manipulates. The message interface establishes an ideal framework for such an extension. Provided that street sweepers support the same protocol as all other vehicles, no changes are needed to include them in the simulation:

More details on the differences between Smalltalk and current OOP languages are explained in Smalltalk: Getting The Message. I believe that understanding the original philosophy behind OOP helps you be a better object-oriented programmer in any language. Ramon Leon discusses the common mistake of magic objects which is an interesting read.

But let’s get to the point of why I started looking into Smalltalk again. At the moment, I mostly program in C# (and sometimes in Java), but I often feel frustrated with both languages. After being exposed to Ruby and Python, I feel like static typing requires me to write too much code and helps the compiler more than it helps me. Furthermore, Java seems to be overly engineered with all the factories, manager, readers and writers, while C# is often inconsistent or lacking in its implementation (e.g. anonymous methods are not really closures). Both languages are becoming increasingly complex with the addition of more and more features. Generics for example is just not necessary in a dynamically typed language. The problem with scripting languages such as Ruby and Python however, is that they are often interpreted and slow. I experimented a bit with JRuby (a Ruby implementation in Java with full access to Java’s class library) but that didn’t satisfy my needs either. After trying to code a simple Hello World Swing application in JRuby, I was stunned that it still required me to wrap code inside an ActionListener like Java does, while I really just wanted to pass in a Ruby block.

Update:: Nick Sieger pointed out that a newer version of JRuby does allow blocks to be passed in.

Other people have also been struggling with languages such as Java or C# (e.g. Jamie Zawinski, Mark Miller and Steve Yegge) or are looking for alternatives (e.g. Martin Fowler and Tim Bray). I think the popularity of Ruby might motivate more people to have a look at Smalltalk. Furthermore, if you know Ruby, it’s easier to get acquainted with Smalltalk. Besides lots of similarities in the class library (the Kernel class, the times message on numbers, etc.), Ruby already introduces the notion that everything is an object, objects in Ruby communicate through messages and Ruby has blocks. However, Ruby is not really equivalent to Smalltalk yet. Ruby introduced extra syntax to be more familiar to people that were used to C-style programming languages, thereby losing part of Smalltalk’s flexibility. In fact, the beauty of Smalltalk is that its entire syntax easily fits on a postcard. If you look closely at this example, even a conditional test in Smalltalk is implemented using messaging on objects. You just send the message ifFalse to an instance of the class Boolean, and pass in a code block you want to have executed when the value is false. It’s turtles all the way down.

Another problem I came across when developing in Java or C# (or in any other OOP language I used) was the difficulty of changing class hierarchies. Very often, due to time constraints, a design is just left in its original state, and the new requirements are supported by performing a quick hack. I suspect this problem is especially prevalent in so-called “research code” It gets even worse when programming in teams. Although this problem is generally known in software engineering and several strategies have been proposed to deal with it, I wondered why the promise of OOP failed here. Wasn’t OOP supposed to improve the situation and make spaghetti code obsolete?

Jeffrey Massung asked himself a similar question: What if the philosophy (OOP) wasn’t the problem, but the implementation (language) was?, and decided to write a 2D DirectX game in Smalltalk. It seems Smalltalk did indeed allow for easier design changes. Self (a language derived from Smalltalk) tries to alleviate the aforementioned problem by specializing through cloning of existing objects instead of through class hierarchies. It’s funny to note that the problem wasn’t that bad in Smalltalk, since you could still easily change the hierarchy, unlike in languages such as Java or C++.

The real power of Smalltalk is not its syntax, but the entire environment. I believe this is also key to understanding OOP. The current languages and tools (e.g. IDEs) we use for doing object-oriented programming are just weak implementations of the original Smalltalk environment. When working in Smalltalk, you are working in a world of running objects, there are no files or applications, everything is an object. For example, version control systems in Smalltalk are actually aware of the semantics of your code, they are not just text-based. When merging code they can show you what methods have been changed, added or removed, what classes were changed, allow you to decide which changes you want to keep, etc.. Although I think Bazaar is a great, it doesn’t come close to this way of working. Smalltalk allows live debugging and code changes, which is tremendeously useful. Ever wished that you could fix a problem while you’re debugging and immediately check if your solution works without having to recompile your application and start the entire process again? In Smalltalk (and Lisp) that’s possible. If you want to find out more about why Smalltalk is way ahead of current mainstream OOP languages, have a look at Ramon Leon’s Why Smalltalk.

Update:: Scott Lewis commented that I should have emphasized that Smalltalk is mostly written in Smalltalk: So when you subclass any object, you can go back up the chain of inherited objects and see how everything works. Likewise when you hit an error/bug, the debugger lets you delve about as deeply as you could possibly want into what is going wrong, and why it is an error. This is indeed a powerful aspect of Smalltalk, and an example of how it was influenced by Lisp.

Besides reading about Smalltalk, I have also been experimenting a bit with Squeak. Squeak is an open source implementation of the Smalltalk programming language and environment, created by its original designers. Squeak runs bit-identical on many platforms (including Windows CE/PocketPC). I will leave my Squeak experiments for another blog post though

To conclude, it seems that we are very good at ignoring the past. We just take our current systems for granted, and use them as a reference frame for future innovations. Marshall McLuhan once phrased it like this: We drive into the future using only our rearview mirror. I believe this is true in HCI research as well, as people like Dan Olsen have pointed out. He argued that our existing system models are barriers to the inclusion of many of the interactive techniques that have been developed. He gave the example of the recent surge in vision-based systems and multi-touch input devices, which get forced in a standard mouse point model because that is all that our systems support:

Multiple input points and multiple users are all discarded when compressing everything into the mouse/keyboard input model. Lots of good research into input techniques will never be deployed until better systems models are created to unify these techniques for application developers.

Research on toolkits is a lot less popular these days. We try to map everything into existing models, and always feel like we have to support legacy applications, which hampers significant progress. Bill Buxton has also studied innovation in HCI, and questioned the progress we made in the last 20 years.

I think the reason why so many great work was done by the early researchers in our field (e.g. Ivan Sutherland, Douglas Engelbart and Alan Kay) is — besides that they were very creative and intelligent people — that there was not that much previous work, they just had to start from scratch. Alan Kay once asked Ivan Sutherland how it was possible that he had invented computer graphics, done the first object oriented software system and the first real time constraint solver all by himself in one year, after which Sutherland responded I didn’t know it was hard.

Gummy UI improvements

I am currently working together with Jan on improving the Gummy tool (website still under construction). We have come a long way since Jan wrote the first version for his Master’s thesis. I figured it might be interesting to share a few screenshots of different phases in the development:

Here’s the first version (June 2007):

Gumme screenshot of June 2007

A version with roughly the same UI but lots of architectural improvements (November 2007):

Gummy around the end of 2007

The current version with an improved UI (March 2008):

Gummy screenshot of March 2008

I think the most significant improvement is the new toolbox. Designers can now more easily distinguish between different widgets. In previous versions, some widgets were hard to distinguish. We added labels to each widget in the toolbox, and improved the inline rendering of the widgets. Notice that these images are still not predefined icons, but real widgets rendered to a bitmap. However, in the current version we chose to render them at their optimal size and then scale the images down.

Rendering widgets to bitmaps also forced us to migrate to Windows. We started our development on GNU/Linux using Mono but had to switch due to an annoying bug in Mono’s implementation of Control.DrawToBitmap. Future versions should be able to run in Mono again when this bug is fixed though.

Update: Jan sent me a screenshot of an even older version of Gummy, dated back to December 19, 2006:

Gummy screenshot of December 19, 2006