CSE 591: Recognizing People, Objects, and Actions

Instructor: Tamara Berg  (tlberg -at- cs.sunysb.edu)
Office: 1411 Computer Science
Lectures: Tues/Thurs 11:20am-12:40pm, Room N310, Social and Behavioral Sciences
Office Hours: Tues/Thurs 12:40-1:40pm, and by appointment
Course Webpage: http://tamaraberg.com/teaching/Fall_09


Recognition is one of the core pursuits of computer vision. In recognition, one attempts to attach semantics to visual data such as images or video. Object recognition is an important subtopic in which one builds models to recognize object categories or instances. Other subtopics include activity recognition (building descriptions of what people are doing from visual data), face recognition (attaching identities to pictures or video of faces), and detection (localizing all instances of a particular category in an image). This course will look at both historical and current methods for recognizing objects, people, actions, and scenes in images and video. Students will have a chance to define their own problems and work on solutions through a course project.

  • Objects - single instance or category based
  • People - faces, pedestrians, pose, and actions
  • Scenes - whole image features, recognition in context, surfaces
  • Recognition by the human visual system
  • Recognition using vision + other modalities

Tentative Schedule

Date | Topic | Readings | Presenter | Assignments
Sept 1 | Intro & Overview of Course | - | Tamara | Get access to Matlab, do a tutorial.
Sept 3 | Computer Vision Review | Vision is getting easier every day | Tamara | Get access to Matlab, do a tutorial. HW0 out.
Sept 8 | Bag of feature models - Discriminative | Object Recognition from Local Scale-Invariant Features; Learning Globally-Consistent Local Distance Functions for Shape-Based Classification | Tamara | -
Sept 10 | Bag of feature models - Generative | Visual Categorization with Bags of Keypoints; Discovering Objects and Their Location in Images | Tamara | -
Sept 15 | Spatial Models | Object Class Recognition by Unsupervised Scale-Invariant Learning; Shape Matching and Object Recognition Using Low Distortion Correspondence | Vicente, Tamara | HW1 out.
Sept 17 | Face Detection | A Statistical Method for 3D Object Detection Applied to Faces and Cars; Robust Real-Time Face Detection | Aravinda, Debaleena | -
Sept 22 | Face Recognition | Face Recognition using Eigenfaces; Face Recognition Based on Fitting a 3D Morphable Model | Anupam, Kiwon | -
Sept 24 | Recognition by the human visual system | Face recognition by humans: 20 results | Tamara, Bharti | -
Sept 29 | Correction day - no class | - | - | HW2 out.
Oct 1 | Recognition by the human visual system | The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search | Guest Lecture - Greg Zelinsky | -
Oct 6 | Recognition by the human visual system (cont) | - | See slides from Sept 24 | -
Oct 8 | Categories pro/con, group discussion | Principles of Categorization; Categories (Aristotle); 100 years of Psychology of Concepts | In class group discussions | See me in office hours to discuss projects.
Oct 13 | Categories pro/con, group debate | - | Group presentations | See me in office hours to discuss projects.
Oct 15 | Project Proposals | In class presentations | whole class | Prepare 5-10 minute presentation
Oct 20 | Recognizing Attributes | Attribute and Simile Classifiers for Face Verification | Guest Lecture - Alex Berg | -
Oct 22 | Intro to People & Actions | - | Tamara | -
Oct 27 | Pedestrian Detection | Pedestrian Detection in Crowded Scenes; Histograms of Oriented Gradients for Human Detection | |
Oct 29 | Pose Estimation in images | Recovering Human Body Configurations: Combining Segmentation and Recognition; Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations | |
Nov 3 | Project Update Presentations | In class presentations | whole class | Prepare 5-10 minute presentation
Nov 5 | Pose Estimation in images (cont) | - | - | -
Nov 10 | Action Recognition | Recognizing Action at a Distance; Learning Realistic Human Actions from Movies | |
Nov 12 | Scenes | On the semantics of a glance at a scene; Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories | |
Nov 17 | Recognition in Context | Objects in Context (Xufeng cont) | |
Nov 19 | Project Update Presentations | In class presentations | whole class | Prepare 5-10 minute presentation
Nov 24 | Recognizing surfaces | Recovering Surface Layout from an Image; Parsing Images of Architectural Scenes | |
Nov 26 | Thanksgiving - no class | - | - | -
Dec 1 | Pictures & Other Meta-Data | Im2GPS; Estimating Age, Gender and Identity using First Name Priors | |
Dec 3 | Guest Lecture - Fernando de la Torre | *1310 Computer Science* | - | -
Dec 8 | Final Project Presentations | In class presentations | Sagnik-Debaleena, Bharti, Yifan-Jose-Ritwik-Fatih, Anupam, Xufeng-Aravinda | In class presentations
Dec 10 | Final Project Presentations | In class presentations | Visruth-Piyush, Vicente, Hiep, Jonathan, Thanadit, Taj | In class presentations
Dec 15 | - | - | - | Final Project Write-Up Due via email

There will be 3-4 short homeworks during the first month and a half of the course to get students acquainted with computer vision and recognition. Over the final two months of the course, students will develop and present a project related to recognition. Students will also be responsible for leading one class paper discussion. One-paragraph summaries of each paper will be due before the start of class.

Grading will consist of: Assignments (30%), Project (40%), Paper presentation (10%), Paper summaries (10%), Participation (10%).

No prior experience in computer vision is required to take this course. Homeworks should be done individually, but projects may be done in groups. Homeworks will be completed in Matlab.

Submit all paper summaries, homeworks, and project presentations to: cse591@gmail.com

Useful links

Student Matlab licenses can be purchased from MathWorks for $99 - Link
Matlab tutorial by Hany Farid and Eero Simoncelli - Link
A more comprehensive Matlab tutorial by David Griffiths - Link

Label Me - Link
Tiny Images - Link
Code for downloading Flickr images - Link

Computing Features
SIFT features - Link
Scale Invariant Interest Points - Link
Affine Covariant Regions - Link
Shape Contexts - Link
Gist - Link

Other Useful Software
Various Code from INRIA - Link
Various Code from Oxford - Link
Various useful machine learning tools - Link

Reference Books
Forsyth, D. A. and Ponce, J. Computer Vision: A Modern Approach, Prentice Hall, 2003.
Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision, Cambridge University Press, 2002.
Palmer, S. E. Vision Science: Photons to Phenomenology, MIT Press, 1999.