Category: Uncategorized

Small update on face detection post

Just a quick update on my previous post. I hooked my code up to a D-Bus daemon, and used it to detect if I was sitting behind my desk or not.

I had to position the camera a bit lower, because it would sometimes not recognize my face when I was looking at the bottom of my screen. Having a dual-monitor setup doesn’t help either, since you should ideally position the webcam in the middle of the two displays to have good coverage. I positioned the webcam in the middle of the two displays at the bottom and tilted it upwards, which gives good results:

I guesstimated that if the system couldn’t detect a face for a period of 10 seconds, I would be away from my desk (or at least not paying attention to the screen). I played around a bit with pygtk and notify-python to create a status icon and show notification bubbles whenever my status changed. Here’s what happens when it detects that I’m away from my desk (notice the bubble in the upper right corner):

When I return, the system will change its status icon accordingly and notify me again:

OpenCV is fun!

Fun with Python, OpenCV and face detection

I had some fun with Gary Bishop’s OpenCV Python wrapper this morning. I wanted to try out OpenCV for detecting faces using a web cam. This could be used for instance to see if someone is sitting behind his desk or not. I used Gary’s Python wrapper since I didn’t want to code in C++.

I didn’t know where to start, so I searched for existing OpenCV face detection examples. I found a blog post by Nirav Patel explaining how to use OpenCV’s official Python bindings to perform face detection. Nirav will be working on a webcam module for Pygame for the Google Summer of Code.

I managed to rewrite Nirav’s example to get it working with CVtypes:

Here’s the code. Although it’s just a quick and dirty hack, it might be useful to others. It requires CVtypes and OpenCV, and was tested on Ubuntu Hardy with a Logitech QuickCam Communicate Deluxe webcam. You will need Nirav’s Haar cascade file as well.

import sys
from CVtypes import cv
 
def detect(image):
    image_size = cv.GetSize(image)
 
    # create grayscale version
    grayscale = cv.CreateImage(image_size, 8, 1)
    cv.CvtColor(image, grayscale, cv.BGR2GRAY)
 
    # create storage
    storage = cv.CreateMemStorage(0)
    cv.ClearMemStorage(storage)
 
    # equalize histogram
    cv.EqualizeHist(grayscale, grayscale)
 
    # detect objects
    cascade = cv.LoadHaarClassifierCascade('haarcascade_frontalface_alt.xml', cv.Size(1,1))
    faces = cv.HaarDetectObjects(grayscale, cascade, storage, 1.2, 2, cv.HAAR_DO_CANNY_PRUNING, cv.Size(50, 50))
 
    if faces:
        print 'face detected!'
        for i in faces:
            cv.Rectangle(image, cv.Point( int(i.x), int(i.y)),
                         cv.Point(int(i.x + i.width), int(i.y + i.height)),
                         cv.RGB(0, 255, 0), 3, 8, 0)
 
if __name__ == "__main__":
    print "OpenCV version: %s (%d, %d, %d)" % (cv.VERSION,
                                               cv.MAJOR_VERSION,
                                               cv.MINOR_VERSION,
                                               cv.SUBMINOR_VERSION)
 
    print "Press ESC to exit ..."
 
    # create windows
    cv.NamedWindow('Camera', cv.WINDOW_AUTOSIZE)
 
    # create capture device
    device = 0 # assume we want first device
    capture = cv.CreateCameraCapture(0)
    cv.SetCaptureProperty(capture, cv.CAP_PROP_FRAME_WIDTH, 640)
    cv.SetCaptureProperty(capture, cv.CAP_PROP_FRAME_HEIGHT, 480)    
 
    # check if capture device is OK
    if not capture:
        print "Error opening capture device"
        sys.exit(1)
 
    while 1:
        # do forever
 
        # capture the current frame
        frame = cv.QueryFrame(capture)
        if frame is None:
            break
 
        # mirror
        cv.Flip(frame, None, 1)
 
        # face detection
        detect(frame)
 
        # display webcam image
        cv.ShowImage('Camera', frame)
 
        # handle events
        k = cv.WaitKey(10)
 
        if k == 0x1b: # ESC
            print 'ESC pressed. Exiting ...'
            break

A known problem is that pressing the escape key doesn’t quit the program. Might be something wrong in my use of the cv.WaitKey function. Meanwhile you can just use Ctrl+C. All in all, the face detection works pretty well. It doesn’t recognize multiple faces yet, but that might be due to the training data. It would be interesting to experiment with OpenCV’s support for eye tracking in the future.

Update: the script does recognize multiple faces in a frame. Yesterday when Alex stood at my desk, it recognized his face as well. I think it didn’t work before because I used cv.Size(100, 100) for the last parameter of cv.HaarDetectObjects instead of cv.Size(50, 50). This parameter indicates the minimum face size (in pixels). When people were standing around my desk, they were usually farther away from the camera. Their face was then probably smaller than 100×100 pixels.

Just a quick note on ctypes. I remember when I created PydgetRFID that I tried to use libphidgets’ SWIG-generated Python bindings, but couldn’t get them to work properly. I had read about ctypes, and decided to use it for creating my own wrapper around libphidgets. Within a few hours I had a working prototype. When you’re struggling with SWIG-generated Python bindings, or have some C library without bindings that you would like to use, give ctypes a try. Gary Bishop wrote about a couple of interesting ctypes tricks to make the process easier.

Bzr vs git, the sequel

A while ago I noticed Jordan Mantha repeated my old bzr vs git benchmark.

I was curious to see if things changed. It seemed that both systems have improved in speed, but there were no shocking results.

There are some differences between our experiments though. Jordan wonders how long it took for me to commit my changes after adding. In fact, I didn’t do that directly but did a diff before committing. I don’t know why, I think I just forgot to commit. This clearly showed the big difference between bzr and git before and after committing. Bzr was a lot faster than git before committing, while git outperforms bzr after committing. I guess this is because git makes heavy use of data structures created during a commit.

Bzr took almost 4 minutes to diff after committing, while git only took about a second. This clearly showed a problem with bzr, it started walking the entire tree again, while git could immediately see that nothing changed. Bzr exacerbated the same problem when performing a status, it was unnecessarily slow. Jordan’s tests showed that the status problem was solved, but doing a diff when nothing has changed is still a major performance issue on big trees.

Committing a small change was a lot faster for both git and bzr. And it should! This is after all a really common operation.

Jordan did another comparison with bzr, git and hg, which showed that hg is actually fairly close to git in terms of performance.

AVI 2008

Although it’s a bit late (almost a month after the facts), I finally found some time to blog about Advanced Visual Interfaces (AVI) 2008 in Naples, where Jan presented our paper about Gummy.

Gummy title slide at AVI 2008

I liked it very much: the conference had good quality papers but was still reasonably small (around 150 attendants), and of course the weather and the Italian food were great We arrived on Tuesday which gave us some time to explore the city and take the ferry to Capri (a great suggestion by Robbie).

Piazza del Plebiscito

Vesuvius

Capri

I am not going to discuss the conference program into detail this time, but will just highlight a couple of interesting papers. Possibly one of the coolest papers was “Exploring Video Streams using Slit-Tear Visualizations” by Anthony Tang (video). Another presentation I enjoyed was “TapTap and MagStick: Improving One-Handed Target Acquisition on Small Touch-screens” by Anne Roudaut (video). It seems there is lots of related work in this area (e.g. Shift, ThumbSpace, etc.). Peter Brandl presented two interesting papers: Bridging the Gap between Real Printouts and Digital Whiteboards and Combining and Measuring the Benefits of Bimanual Pen and Direct-Touch Interaction on Horizontal Interfaces. He was brave enough to do an impressive live demo for the first paper Oh, and he also covered the conference in a blog post.

The first paper in our session, titled “A Mixed-Fidelity Prototyping Tool for Mobile Devices” by Marco de Sá, introduced a tool to easily design prototypes and evaluate them in real-life situations. The system was well thought out and serves a real need. I can imagine that we could use this kind of tool in a user-centered UI design course. The second paper in our session was “Model-based Layout Generation” by Sebastian Feuerstack. I already met Sebastian at CADUI 2006. They presented a generic layout model based on constraints. It reminded me a bit of the layout model Yves worked on for our EIS 2007 paper. They used the Cassowary constraint solver, which I also used for my MSc thesis on constraint-based layouts for UIML. Sebastian told me he got the idea from my demo at CADUI 2006. I forgot to add a certain constraint (the layout of the UI was thus underconstrained), which by coincidence had no effect on the user interface everytime I tested it. Of course, when I showed the demo it did have an effect This clearly illustrated that constraint solvers are sometimes unpredictable (see Past, present, and future of user interface software tools by Myers et al.). Sebastian’s solution to this problem was to hide the constraints from the designer and generate them automatically from a graphical layout model.

Juan Manuel Gonzalez Calleros — who I met at CADUI 2006, TAMODIA 2006 and a few other occasions — presented a poster and a paper at the workshop on haptics. He took a few pictures while Jan was presenting (thanks again Juan!). Here are Juan and Jan discussing UsiXML vs UIML

Jan and Juan discussing UsiXML vs UIML

Overall, the comments on our work were positive, although of course one of the biggest problems is still the lack of support for multi-screen interfaces. As Jan is actively hacking on Gummy these days, I don’t think it will take very long for this to be included in the tool

Demo video of a Smalltalk environment

Just a quick update to my previous post. I can imagine that my discussion of the advantages of Smalltalk might be a bit abstract for people who never used it.

So here’s a short demo video of a Solitaire game running in a Smalltalk environment (via David Buck). It clearly illustrates features such as full introspection (e.g. by using the object browser) and live “fix-and-continue” debugging:

I found another video showing live code updates in Smalltalk while invoking native libraries in the background (more specifically, OpenGL):

I think these videos are useful for demonstrating the power of Smalltalk's environment. On a side note, I discovered Objective-C supports some form of "fix-and-continue" debugging as well.