CS 790-133: Recognizing People, Objects, and Actions

Instructor: Tamara Berg  (tlberg -at- cs.unc.edu)
Office: FB 236
Lectures: Mon/Wed 11:00-12:15pm Rm SN 011
Office Hours: (tentatively) Wed 2:30-3:30 and by appointment
Course Webpage: http://tamaraberg.com/teaching/Fall_13/790-133


Introduction

Recognition is a core pursuit of computer vision. In recognition, one attempts to attach semantics to visual data such as images or video. One important subtopic is object recognition, where one builds models to recognize object categories or instances. Other subtopics include activity recognition (building descriptions of what people are doing from visual data), face recognition (attaching identities to pictures or video of faces), and detection (localizing all instances of a particular category in an image). This course will look at methods for recognizing objects, people, actions, and scenes in images and video. It will also review recent work that moves beyond traditional outputs toward more complex structured outputs for images, such as attribute recognition and image description.


Topics
  • Objects - single instance or category based
  • People - faces, pedestrians, pose, and actions
  • Scenes - whole image features, recognition in context, surfaces
  • Moving beyond traditional outputs - attributes, aesthetics, complex structured outputs

Grading

Grading will consist of projects (40%), topic presentations (40%), and participation (20%). Students will have a chance to present a relevant research topic of their choice in small groups. They will also define and implement a recognition project over the semester. Students are also expected to attend class, read the assigned papers, and actively participate in group discussions. If students do not demonstrate that they are reading the assigned papers, the class may be asked to turn in short paper summaries prior to class. There will not be any exams or formal homework assignments.

Prerequisites and Target Audience

No prior experience in computer vision is required, although some exposure to image processing, machine learning, or graphics would be helpful. A previous course in linear algebra is recommended. The course will start with some basic background and then move to reading and discussion of relevant research papers and projects. This course is targeted toward graduate students with an interest in computer vision. Undergrads may register with permission of the instructor.

Email me or drop by my office if you have any questions!

Topic Presentations

Students will form small groups to prepare a presentation on a research topic related to the course (group size will be determined based on enrollment). Each topic will be presented over a series of 2 lectures. Students should read several papers related to their selected topic, then present a high-level, cohesive summary of the topic (this should go beyond just detailing the specifics of 2-3 papers). Groups should also select 2-3 papers for the entire class to read and post them on the course website. Potential papers related to each topic are posted here, but students may also select their own relevant papers.

Projects

Students will implement course projects over the last 2 months of the semester. Projects can range from implementation of a research paper to original research. Project topics related to your research interests are encouraged. Projects may be completed individually or in small groups, in the programming language of your choice. Projects will be evaluated based on 3 presentations (proposal, update, and final presentation) and a final written report, with a demo video if appropriate.

Tentative Schedule

Date | Topic | Readings | Presenter | To Do
Aug 21 | Intro - Slides | - | tamara | -
Aug 26 | Computer Vision Review - Slides | - | tamara | -
Aug 28 | Features Review - Slides | - | tamara | Form groups for topic presentations (potential papers for each topic here).
Sep 2 | No Class - Labor Day | - | - | -
Sep 4 | Machine Learning Review - Slides | - | tamara | -
Sep 9 | Recognizing Objects (BoF models, spatial models) - Slides | "Visual Categorization with Bags of Keypoints", "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories" | tamara | -
Sep 11 | Recognizing Objects (BoF models, spatial models) - Slides | "Shape Matching and Object Recognition Using Low Distortion Correspondence" | tamara | -
Sep 16 | Recognizing Objects (BoF models, spatial models) | - | tamara | -
Sep 18 | People (introduction) - Slides | - | tamara | -
Sep 23 | Recognizing People (faces, actions) - Slides | "Face recognition via sparse sampling", "Active Shape Models - Their training and applications" | group 1: kishore, aniket, keethan, dinghuang | -
Sep 25 | Recognizing People (faces, actions) - Slides2 | "Learning Realistic Human Actions from Movies" | group 1: kishore, aniket, keethan, dinghuang | -
Sep 30 | Localization (detection, pose) - Slides | "Histograms of Oriented Gradients for Human Detection", "Object Detection with Discriminatively Trained Part-Based Models" | group 2: andrew, chun-wei, lu, qingyu | -
Oct 2 | Localization (detection, pose) - Slides | "Recognition Using Visual Phrases" | group 2: andrew, chun-wei, lu, qingyu | -
Oct 7 | Project Proposals | - | all | Prepare 5 minute proposal presentation
Oct 9 | Project Proposals | - | all | Prepare 5 minute proposal presentation
Oct 14 | Scenes (introduction) | - | tamara | -
Oct 16 | Recognizing Scenes (recognition, parsing, surface recovery) - Slides1, Slides2 | "SuperParsing: Scalable Nonparametric Image Parsing with Superpixels", "Recovering Surface Layout from an Image" | group 3: hyo jin, ian, hongsheng, meng, young-woon | -
Oct 21 | Recognizing Scenes (recognition, parsing, surface recovery) - Slides3, Slides4, Slides5 | "From 3D Scene Geometry to Human Workspace" | group 3: hyo jin, ian, hongsheng, meng, young-woon | -
Oct 23 | To Categorize or not to categorize - Slides1, Slides2 | "Cognition & Categorization", "TagProp: Discriminative Metric Learning" | group 4: jared, joshua, schuyler, brian | -
Oct 28 | To Categorize or not to categorize - Slides3, Slides4 | "Exemplar SVM" | group 4: jared, joshua, schuyler, brian | -
Oct 30 | Project Updates | - | all | Prepare 5 minute project update presentation & submit 2 page write-up detailing progress
Nov 4 | Project Updates | - | all | Prepare 5 minute project update presentation & submit 2 page write-up detailing progress
Nov 6 | Recognizing Attributes (as mid-level representation, relative attributes) - Slides1, Slides, Slides3 | "Relative attributes", "Multi-attribute queries: To Merge or Not to Merge?" | group 5: tianxiang, wen, yi, ke | -
Nov 11 | Recognizing Attributes (as mid-level representation, relative attributes) - Slides1, Slides2 | "Attribute and Simile Classifiers for Face Verification" | group 5: tianxiang, wen, yi, ke | -
Nov 13 | Recognizing perceptual phenomena (aesthetics, memorability) | "Finding Iconic Images" | group 6: dave, priyadarshi, sangwoo, niti | -
Nov 18 | Recognizing perceptual phenomena (aesthetics, memorability) | "Assessing the aesthetic quality of photographs using generic image descriptors", "What makes an image memorable?" | group 6: dave, priyadarshi, sangwoo, niti | -
Nov 20 | What's next? (generating image descriptions, humans in the loop, large scale...) | - | tamara | -
Nov 25 | What's next? (generating image descriptions, humans in the loop, large scale...) | - | tamara | -
Nov 27 | No Class - Thanksgiving | - | - | -
Dec 2 | Project Presentations | Prepare 5 minute project presentation | all | 1) Dinghuang, 2) Hongsheng, 3) HyoJin & Meng, 4) Yi & Ke, 5) Dave, 6) Kishore, 7) Priyadarshi, 8) Andrew & Chun-wei, 9) Sangwoo, 10) Brian
Dec 4 | Project Presentations | Prepare 5 minute project presentation | all | 1) Schuyler, 2) Keethan, 3) Wen, 4) Tianxiang, 5) Aniket & Niti, 6) Joshua, 7) Jared, 8) Lu, 9) Young-woon, 10) Qingyu
Dec 6 | - | - | - | Final Project Reports and Demos/Videos due


Useful links

MATLAB
UNC students can get MATLAB from ITS here

Data
Label Me - Link
Tiny Images - Link
Code for downloading Flickr images - Link

Computing Features
SIFT features - Link
Scale Invariant Interest Points - Link
Affine Covariant Regions - Link
Shape Contexts - Link
Gist - Link

Other Useful Software
Various Code from INRIA - Link
Various Code from Oxford - Link
Various useful machine learning tools - Link

Reference Books
Forsyth, D. A. and Ponce, J. Computer Vision: A Modern Approach, Prentice Hall, 2003.
Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision, Cambridge University Press, 2002.
Palmer, S. E. Vision Science: Photons to Phenomenology, MIT Press, 1999.

Disclaimer
The professor reserves the right to make changes to the syllabus, including project due dates. These changes will be announced as early as possible.