CS 790-133: Recognizing People, Objects, and Actions

Instructor: Tamara Berg  (tlberg -at- cs.unc.edu)
Office: FB 236
Lectures: Mon/Wed 10:10-11:25am Rm FB 008
Office Hours: Mon 11:25-12:25 and by appointment
Course Webpage: http://tamaraberg.com/teaching/Spring_16/790-133


Recognition is a core pursuit of computer vision. In recognition one attempts to attach semantics to visual data such as images or video. One important subtopic of recognition is object recognition where one builds models to recognize object categories or instances. Other subtopics include: activity recognition (building descriptions of what people are doing from visual data), face recognition (attaching identities to pictures or video of faces), and detection (localizing all instances of a particular category in an image). This course will look at methods for recognizing objects, people, actions, and scenes in images and video. It will also review recent work on moving beyond traditional outputs toward more complex structured outputs for images such as methods for attribute recognition, image description, and question answering.

Email me or drop by my office if you have any questions!

  • Objects - single instance or category based
  • People - faces, pedestrians, pose, and actions
  • Scenes - recognition in context, surfaces, parsing
  • Other Recognition tasks - attributes, aesthetics, memorability
  • Moving beyond traditional outputs - descriptions, question-answering


Grading will consist of projects/assignments (40%), topic presentations (20%), paper summaries (10%), brainstorming and participation (30%). Students will have a chance to present a relevant research topic in small groups. Additionally, they will define and implement a project related to visual recognition over the semester. Students are also expected to attend class, read and summarize assigned papers, and actively participate in group discussions and brainstorming sessions. There will not be any exams.
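As a rough illustration of how the percentages above combine, here is a minimal sketch of a weighted total. It assumes each component is scored on a 0-100 scale and combined linearly, which the syllabus does not state. The component names and the example scores are hypothetical.

```python
# Weights come from the grading breakdown above.
# The dictionary keys are illustrative names, not official categories.
WEIGHTS = {
    "projects_assignments": 0.40,
    "topic_presentations": 0.20,
    "paper_summaries": 0.10,
    "brainstorming_participation": 0.30,
}


def final_grade(scores):
    """Combine per-component scores (assumed 0-100) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "projects_assignments": 90,
    "topic_presentations": 85,
    "paper_summaries": 100,
    "brainstorming_participation": 80,
}

print(round(final_grade(example_scores), 1))
```

Note that brainstorming and participation together carry more weight than the topic presentation, so regular attendance and idea write-ups matter as much as the larger deliverables.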

All submissions/assignments should be emailed to: comp790.133@gmail.com.

Prerequisites and Target Audience: No prior experience in computer vision is required although some exposure to image processing, machine learning, or graphics is highly recommended. A previous course in linear algebra is also highly recommended. The course will start with some basic background and then move to reading and discussion of relevant research papers and projects. This course is targeted toward graduate students with an interest in computer vision. Undergrads may register with permission of the instructor.

Topic Presentations

Students will form small groups to prepare a presentation on a research topic related to the course (group size will be determined based on enrollment). Each topic will be presented over a series of 2 lectures. Students should read several papers related to their selected topic, then present a high-level, cohesive summary of the topic (this should go beyond just detailing the specifics of a couple of papers). Each group should also select 3-4 papers for the entire class to read and post them on the course website. Groups should arrange to meet with the instructor 2 weeks prior to their presentations to go over their presentation draft.

Brainstorming Sessions

After each topic, we will hold one class dedicated to brainstorming ideas surrounding the topic. Prior to class, students should submit a 2-page write-up and 2 slides proposing a novel idea related to the topic. Students will then present and discuss their ideas in an informal brainstorming session. Novelty can be expressed in many ways, e.g. a novel technical extension to an existing problem, a new sub-problem related to the topic, a dataset that would be useful for expanding work on the topic, a commercial application that could be enabled by work on the topic, etc.


Students will implement course projects on a topic of their choice over the semester. Projects can range from implementation of an existing research paper to original research. Project topics related to your research interests are encouraged. Projects may be completed individually or in small groups in the programming language of your choice. Projects will be evaluated based on presentations (proposal, 2 updates, and final presentation) and a final written report with demo video if appropriate.

Tentative Schedule

Date | Topic | Readings | Presenter | To Do
Jan 11 | Intro - intro.pptx | - | tamara | -
Jan 13 | Computer Vision Review - visionReview.pptx | - | tamara | -
Jan 18 | No class - holiday | - | tamara | -
Jan 20 | Features Review - features.pptx | - | tamara | -
Jan 25 | Class canceled due to weather | - | tamara | -
Jan 27 | Machine Learning Review - machinelearning1.pptx | - | tamara | install your favorite machine learning tool and image dataset and familiarize yourself with image classification (resource links below)
Feb 1 | Recognizing Objects - objectrecognition1.pptx | "Visual Categorization with Bags of Keypoints"; "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories" | tamara | paper summaries - summary_template.docx
Feb 3 | Recognizing Objects | "ImageNet Classification with Deep Convolutional Neural Networks"; "Very Deep Convolutional Networks for Large-Scale Image Recognition" | tamara | paper summaries
Feb 8 | Brainstorming - Object Recognition | - | all | submit 2 page idea + 2 slides
Feb 10 | Project Proposals | - | all | submit 1 page proposal and prepare 5 minute presentation
Feb 15 | Class canceled due to weather | - | - | -
Feb 17 | Recognizing People - slides1, slides2 | Deep Face: Closing the Gap to Human-Level Performance in Face Verification; Attribute and Simile Classifiers for Face Verification | group 1 (Adam, Zherong, Jae-Sung, Cheng-Yang) | paper summaries
Feb 22 | Recognizing People - slides1, slides2 | Real-Time Human Pose Recognition in Parts from Single Depth Images; Learning Actions from the Web | group 1 (Adam, Zherong, Jae-Sung, Cheng-Yang) | paper summaries
Feb 22, 5pm - make-up class @ Mellow Mushroom | Brainstorming - Person Recognition | - | all | submit 2 page idea + 2 slides
Feb 24 | Localizing objects - slides1, slides2 | Scene Recognition and Weakly Supervised Object Localization with Deformable Part-Based Models; Rich feature hierarchies for accurate object detection and semantic segmentation | group 2 (Yen-Chun, Devan, Yeu-Chern, Ric) | paper summaries
Feb 29 | No class - traveling | - | - | -
March 2 | Localizing objects - slides1, slides2 | Sliding Shapes for 3D Object Detection in Depth Images; Histograms of Oriented Gradients for Human Detection | group 2 (Yen-Chun, Devan, Yeu-Chern, Ric) | paper summaries
March 7 | Brainstorming - Localization | - | all | submit 2 page idea + 2 slides
March 9 | Project Updates | - | all | submit 3 page update and prepare 5 minute presentation
March 14 | No class - Spring Break | - | - | -
March 16 | No class - Spring Break | - | - | -
March 21 | Scene Understanding - slides1, slides2, slides3 | SUN Database: Large-scale Scene Recognition from Abbey to Zoo; Learning Hierarchical Features for Scene Labeling | group 3 (Lee, Jen, Dong, Licheng, Marc) | paper summaries
March 23 | Scene Understanding - slides1, slides2 | Learning Informative Edge Maps for Indoor Scene Layout Prediction; Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture | group 3 (Lee, Jen, Dong, Licheng, Marc) | paper summaries
March 28 | Brainstorming - Scene Understanding | - | all | submit 2 page idea + 2 slides
March 30 | Other Recognition tasks - attributes, perceptual characteristics - slides1 | Understanding and Predicting Image Memorability at a Large Scale; Context-aware saliency detection | group 4 (Anna, Yue, Matt, Chris, Calvin) | paper summaries
April 4 | Project Updates | - | all | submit 3 page update and prepare 5 minute presentation
April 6 | Other Recognition tasks - attributes, perceptual characteristics - slides1, slides2 | Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies; AVA: A Large-Scale Database for Aesthetic Visual Analysis | group 4 (Anna, Yue, Matt, Chris, Calvin) | paper summaries
April 11 | Brainstorming - other tasks | - | all | submit 2 page idea + 2 slides
April 13 | Beyond Recognition - Description & Question-Answering | Collective Generation of Natural Image Descriptions; Show and Tell: A Neural Image Caption Generator | tamara | paper summaries
April 18 | Beyond Recognition - Description & Question-Answering | Visual Madlibs: Fill in the blank Description Generation and Question Answering; VQA: Visual Question Answering | tamara | paper summaries
April 20 | Brainstorming - What's next? | - | all | submit 2 page idea + 2 slides
April 25 | Project Presentations | - | Devan, Jae Sung & Zherong, Yue, Licheng & Ric, Cheng-Yang, Yeu Chern | -
April 27 | Project Presentations | - | Dong, Matt, Adam, Marc, Jen, Lee & Calvin | -
April 29 | Project Write-Ups | - | - | submit 8 page conference formatted paper

Useful links

UNC students who want to use MATLAB can obtain it here

Compilation of computer vision resources (books, code, datasets, classes, tutorials, papers) - link
Caffe deep learning framework - link
TensorFlow deep learning framework - link
Deep learning with Torch - link

Reference Books
Forsyth, David A., and Ponce, J. Computer Vision: A Modern Approach, Prentice Hall, 2003.
Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision, Cambridge University Press, 2002.
Palmer, Stephen E. Vision Science: Photons to Phenomenology, MIT Press, 1999.

The professor reserves the right to make changes to the syllabus, including project due dates. These changes will be announced as early as possible.