12 February 2013

A product of the University of Michigan and Adobe Research, PixelTone understands your voice commands and gestures, which makes it easier to edit images on a small screen. But the big advantage may be that you only need to state your goals, not tell the software how to get there.

The white paper, PixelTone: A Multimodal Interface for Image Editing, explains the problem:

Not only are interfaces for photo editing often complex, but they also expect the user to learn the language of image processing. Users must understand image properties such as hue, saturation, levels, and cropping, and learn how they are changed and combined to achieve a desired effect. To add complexity, effective image edits are often localized to a specific region, e.g., to brighten a face, recolor an eye, or make a sunset more vivid; this task usually requires sophisticated direct manipulation.

But what if you could just tell the software to brighten the top part of the image? Or what if you could point to a figure, say the name ("Richard III"), and then tell the software to brighten Richard III's skull a bit?

Well, you can. Here's the demo running on an iPad:

It's experimental, but the team has already developed "a customized natural language interpreter that maps user phrases to specific image processing operations."
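To get a feel for what mapping phrases to operations might involve, here is a toy keyword matcher in Python. It is not PixelTone's interpreter, which the white paper describes as customized and is surely more sophisticated. The vocabulary, the EditCommand type, the interpret function and the numeric amounts are all assumptions made up for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary: these keywords, property names, and amounts are
# illustrative assumptions, not PixelTone's actual lexicon.
OPERATIONS = {
    "brighten": ("brightness", +1),
    "darken": ("brightness", -1),
    "sharpen": ("sharpness", +1),
    "blur": ("sharpness", -1),
    "saturate": ("saturation", +1),
}
REGIONS = ("top", "bottom", "left", "right")
MODIFIERS = {"a bit": 0.1, "slightly": 0.1, "a lot": 0.5}
DEFAULT_AMOUNT = 0.25


@dataclass
class EditCommand:
    prop: str              # image property to change
    amount: float          # signed change, on an arbitrary 0..1 scale
    region: Optional[str]  # coarse region keyword, or None for the whole image


def interpret(phrase: str) -> Optional[EditCommand]:
    """Map a spoken phrase to an edit command, or None if no verb is recognized."""
    text = phrase.lower()
    words = re.findall(r"[a-z']+", text)

    # The first recognized verb decides the property and the sign of the change.
    op = next((OPERATIONS[w] for w in words if w in OPERATIONS), None)
    if op is None:
        return None
    prop, sign = op

    # Optional region keyword and magnitude modifier.
    region = next((w for w in words if w in REGIONS), None)
    size = next((v for k, v in MODIFIERS.items() if k in text), DEFAULT_AMOUNT)
    return EditCommand(prop, sign * size, region)
```

With this sketch, interpret("Brighten the top part of the image") yields a brightness increase of 0.25 limited to the "top" region, while a phrase with no known verb returns None. A real interpreter would also have to handle synonyms, named selections such as "Richard III", and phrases that state a goal rather than an operation.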

You still need to use a gesture to indicate things like blur direction or local selections, but pointing is one of those things you learn very early in life.

It's listening and understanding that tend to take a while. And it looks like PixelTone is getting there fast.

