Tamara L. Berg
Assistant Professor, SUNY Stony Brook
1411 Computer Science
Stony Brook, NY 11794
tlberg - at - cs.sunysb.edu
(646) 509-3361

Member of the Stony Brook consortium for
Digital Arts, Culture & Technology (cDACT).

CEWIT office - Room 214

Research

My main research area is Digital Media, specifically organizing large collections of images with associated text by developing techniques in Natural Language Processing and Computer Vision. Today, billions of images with associated text are available in web pages, captioned photographs from news sources, video with speech or closed captioning, and other sources. To organize, search, and exploit these enormous collections, we have developed methods that effectively combine information from both the visual and textual sources. Past projects include automatically identifying people in news photographs, classifying images from the web, and finding iconic images in consumer photo collections. I am also generally interested in bringing together people and expertise from various areas of Digital Media, including digital art, music, and cultural studies.

Teaching

Spring 2009 - CSE/ISE 364 Advanced Multimedia.
Fall 2008 - CSE 690 Internet Vision.

Bio

I graduated with a Ph.D. from the Computer Science Department at UC Berkeley in the Spring of 2007, advised by Professor David Forsyth, and was a member of the Berkeley Computer Vision Group. I spent 2007-2008 as a post-doc at Yahoo! Research developing various digital media projects, including the automatic annotation of consumer photographs. I am currently an Assistant Professor at Stony Brook University and am looking for excited, motivated graduate students. Please email me if you are interested in joining my group.

-------------------------------------------------------------------------------------------------------------------------------------------------


Projects
 

Faces In the Wild

We show that a large and realistic face dataset can be built from news photographs and their associated captions. This dataset is more realistic than the usual face recognition datasets because it contains faces captured "in the wild" in a variety of configurations with respect to the camera, with a variety of expressions, and under illumination of widely varying color. We obtain 44,773 faces from approximately half a million captioned news images. We then automatically link names, obtained by running a named entity recognizer on the captions, with faces, obtained by running a face detector on the images. Initially we use a simple clustering method, which produces fair results. However, the context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We therefore improve our results significantly by linking the clustering process with a language model that learns the probability that an individual is depicted given the context in which the name appears in the caption. Once the training procedure is over, we have a large, accurately labeled set of 30,281 faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation. We also produce a face dictionary of news photographs that is organized according to the people present and can be searched by individual.
Demo: Face Dictionary
Dataset: Faces In the Wild
Dataset: Labeled Faces In the Wild
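
As a rough illustration of the "simple clustering" step described above, the sketch below assigns each detected face to one of the names found in its own caption, using a k-means-style loop. It is not the system used in the project: the feature vectors, the function names (cluster_faces, mean), and the choice of Euclidean distance are all assumptions made for the example, and the language model over caption context is left out.

```python
from math import dist


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_faces(features, candidates, iterations=10):
    """Give each face one of the names that appear in its caption.

    features:   one feature vector per face (hypothetical descriptors)
    candidates: for each face, the non-empty list of names in its caption
    """
    # Initialise: every face takes the first name in its caption.
    labels = [names[0] for names in candidates]
    for _ in range(iterations):
        # Update step: one centroid per name from the faces assigned to it.
        groups = {}
        for vec, name in zip(features, labels):
            groups.setdefault(name, []).append(vec)
        centroids = {name: mean(vecs) for name, vecs in groups.items()}

        # Assignment step: nearest centroid, restricted to caption names.
        new_labels = []
        for vec, names, old in zip(features, candidates, labels):
            known = [n for n in names if n in centroids]
            if known:
                new_labels.append(min(known, key=lambda n: dist(vec, centroids[n])))
            else:
                new_labels.append(old)

        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


# Toy data: faces near (0, 0) belong to "A", faces near (10, 10) to "B".
features = [[0, 0], [10, 10], [0.5, 0.2], [9.5, 10.1], [0.1, 0.1], [10.2, 9.9]]
candidates = [["A"], ["B"], ["B", "A"], ["A", "B"], ["A"], ["B"]]
labels = cluster_faces(features, candidates)
```

The caption acts as a hard constraint here: a face can only ever receive a name that appears in its own caption, which is what makes news captions such a useful source of weak labels.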
 

Animals On the Web

We have built a set of classifiers to recognize several animal categories: Alligator, Ant, Bear, Beaver, Dolphin, Frog, Giraffe, Leopard, Monkey and Penguin. Using Google Web Search, we identify a pool of candidate images for a given query. These images are then re-ranked by our system using information extracted from both the surrounding text and the images themselves. This gives us quite a good pool of images for each class. We also demonstrate, for the monkey class, that we can extend this pool of images quite easily using a set of related queries. We produce a startlingly good set of results for complex web data.
Demo: Animals on the Web
Dataset: Animals on the Web Dataset
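
The re-ranking idea above can be sketched, under strong simplifying assumptions, as a weighted combination of a text-based score and an image-based score for each candidate. The field names (text_score, image_score), the rerank function, and the linear weighting are hypothetical and chosen only for illustration; they are not the classifiers used in the project.

```python
def rerank(candidates, text_weight=0.5):
    """Sort candidate images by a weighted mix of text and image evidence.

    candidates:  list of dicts with hypothetical 'text_score' and
                 'image_score' fields, each assumed to lie in [0, 1]
    text_weight: share of the combined score given to the text cue
    """
    def combined(candidate):
        return (text_weight * candidate["text_score"]
                + (1 - text_weight) * candidate["image_score"])

    # Highest combined score first.
    return sorted(candidates, key=combined, reverse=True)


# Toy pool of search results for one query.
pool = [
    {"url": "a", "text_score": 0.9, "image_score": 0.1},
    {"url": "b", "text_score": 0.6, "image_score": 0.8},
    {"url": "c", "text_score": 0.2, "image_score": 0.3},
]
both_cues = [c["url"] for c in rerank(pool)]
text_only = [c["url"] for c in rerank(pool, text_weight=1.0)]
```

In this toy pool, image "a" has promising surrounding text but weak visual evidence, so it leads the text-only ranking and drops below "b" once both cues are combined.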

 

Ranking Iconic Images

We define an iconic image for an object category (e.g., the Eiffel Tower) as an image with a large, clearly delineated instance of the object in a characteristic aspect. We show that such iconic images exist for a variety of objects and argue that these are the images most relevant to that category. Given a large set of images noisily labeled with a common theme, say a Flickr tag, we show how to rank these images according to how well they represent a visual category. We also generate a binary segmentation for each image indicating roughly where the subject is located. The segmentation procedure is learned from data on a small set of iconic images from a few training categories and then applied to several other test categories. We rank the segmented test images according to shape and appearance similarity against a set of 5 hand-labeled images per category. We compute three rankings of the data: a random ranking of the images within the category, a ranking using similarity over the whole image, and a ranking using similarity applied only within the subject of the photograph. We then evaluate the rankings qualitatively and with a user study.
Demo: Ranked Iconic Images
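
To make the difference between the whole-image ranking and the subject-only ranking concrete, here is a minimal sketch. It swaps the shape and appearance similarity described above for a much simpler stand-in: normalised intensity histograms compared by histogram intersection. The data layout (flat pixel lists in [0, 1] with a binary mask) and all function names are assumptions made for the example.

```python
def histogram(pixels, mask=None, bins=4):
    """Normalised intensity histogram over pixels where mask is 1.

    pixels: flat list of intensities in [0, 1]
    mask:   flat binary list of the same length, or None for all pixels
    """
    counts = [0] * bins
    for i, value in enumerate(pixels):
        if mask is None or mask[i]:
            counts[min(int(value * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_images(images, exemplars, use_mask=True, bins=4):
    """Rank images by their best similarity to any hand-labeled exemplar."""
    def feature(image):
        mask = image["mask"] if use_mask else None
        return histogram(image["pixels"], mask, bins)

    exemplar_features = [feature(e) for e in exemplars]

    def score(image):
        f = feature(image)
        return max(intersection(f, e) for e in exemplar_features)

    return sorted(images, key=score, reverse=True)


# Toy data: the mask marks the first two pixels as the subject.
exemplar = {"name": "ex", "pixels": [0.9, 0.9, 0.1, 0.1], "mask": [1, 1, 0, 0]}
bright = {"name": "X", "pixels": [0.9, 0.8, 0.1, 0.1], "mask": [1, 1, 0, 0]}
dark = {"name": "Y", "pixels": [0.1, 0.1, 0.9, 0.9], "mask": [1, 1, 0, 0]}

subject_only = [im["name"] for im in rank_images([dark, bright], [exemplar])]
whole_image = [im["name"] for im in rank_images([dark, bright], [exemplar], use_mask=False)]
```

In the toy data, both test images have the same overall intensity distribution as the exemplar, so whole-image similarity cannot separate them; only the subject-restricted comparison puts the image with a matching subject first.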


 

-------------------------------------------------------------------------------------------------------------------------------------------------

Publications

-------------------------------------------------------------------------------------------------------------------------------------------------