CSE 595: Words & Pictures

Instructor: Tamara Berg  (tlberg -at- cs.sunysb.edu)
Office: 1411 Computer Science
Lectures: Tues/Thurs 11:20-12:40pm Rm 2129 CS
Office Hours: Tues/Thurs 3:40-5:10pm
Course Webpage: http://tamaraberg.com/teaching/Spring_11/wordspics


Introduction

This course will explore topics straddling the boundary between Natural Language Processing and Computer Vision. Words and pictures are often naturally linked. Common examples include the billions of web pages containing images and text, captioned news photographs, and YouTube videos with speech or closed captioning. To search, classify, and exploit these collections, it is necessary to use both the visual and the textual information effectively. We will learn how to make use of the complementary nature of words and pictures through topic lectures and analysis of state-of-the-art research. Students will also have a chance to define their own multi-modal problems and solutions through a class project.


Topics
  • Clustering for Image Labeling
  • Recognition as Translation
  • Discriminative classification of images with associated text
  • Generative models of words and pictures
  • Identifying people and pose using captions or scripts
  • Learning visual attributes from images and text
  • Moving beyond objects
  • Categorizing web videos
  • Generating natural language descriptions for images

MS Basic Project Option
  • Sign up with CSE 522 to complete the MS Basic Project Option

Tentative Schedule

Date | Topic | Readings | Presenter | Assignments
Feb 1 | Intro & Overview of Course - Slides | - | Tamara | Get access to Matlab and do a Matlab tutorial (see the Matlab section below)
Feb 3 | Computer Vision Review - Slides | - | Tamara | Get access to Matlab and do a Matlab tutorial (see the Matlab section below)
Feb 8 | Natural Language Processing Review - Slides | - | Tamara | -
Feb 10 | Features & Representations - Slides | - | Tamara | -
Feb 15 | Features & Representations (cont) | - | Tamara | HW1 out
Feb 17 | Bag of Words Models and Clustering - Slides | - | Tamara | -
Feb 22 | Paper Presentations - Slides1, Slides2 | Learning the Semantics of Words and Pictures; Clustering Art | Nikhil, Rohith | -
Feb 24 | Paper Presentations - Slides1, Slides2 | Object Recognition as Machine Translation; Object Recognition as Machine Translation - Part 2: Exploiting Image Database Clustering Models | Martino, Rong | -
March 1 | Classification - Slides | - | Tamara | HW2 out
March 3 | Paper Presentations - Slides1, Slides2 | Animals on the Web; Harvesting Image Databases from the Web | Tamara | -
March 8 | Paper Presentations - Slides1, Slides2 | Building Text Features for Object Image Classification; Watch, Listen & Learn: Co-training on Captioned Images and Videos | Farheen, Pratiksha | -
March 10 | Generative & Topic Models - Slides | - | Tamara | -
March 15 | Paper Presentations - Slides1, Slides2 | Matching Words & Pictures; Unsupervised Learning of Visual Sense Models for Polysemous Words | Jaewoo, Deepak | HW3 out
March 17 | People in Images & Text - Slides | - | Tamara | -
March 22 | Project Proposals - Slides about Projects | Prepare a 5 minute proposal presentation | All | -
March 24 | Paper Presentations - Slides1, Slides2 | Who's in the Picture?; Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation | Tamara, Naresh | -
March 29 | Paper Presentations - Slides1, Slides2 | "Hello! My name is... Buffy" – Automatic Naming of Characters in TV Video; Learning from Ambiguously Labeled Images | Arun, Aneesh | -
March 31 | Paper Presentations - Slides1, Slides2 | Learning Sign Language by Watching TV (using Weakly Aligned Subtitles); Who is "You"? Combining Linguistic and Gaze Features to Resolve Second-Person References in Dialogue | Sanjeev, Sahil | -
April 5 | Project Updates | Prepare a 5 minute project update presentation | All | -
April 7 | Group Project Discussions | In class group meetings | All | -
April 12 | Paper Presentations | Automatic Attribute Discovery and Characterization from Noisy Web Data; Visual Recognition with Humans in the Loop | Tamara | -
April 14 | Paper Presentations - Slides1, Slides2 | Taxonomic Classification for Web-based Videos; YouTubeCat: Learning to Categorize Wild Web Videos | Ravneet, Cajeton | -
April 19 | Spring Break | - | - | -
April 21 | Spring Break | - | - | -
April 26 | Project Updates 2 | Prepare a 5 minute project update presentation | All | -
April 28 | Group Project Discussions | In class group meetings | All | -
May 3 | Paper Presentations - Slides1, Slides2 | How Many Words is a Picture Worth? Automatic Caption Generation for News Images; Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers | Ashish, Sandesh | -
May 5 | Paper Presentations | BabyTalk: Generating Simple Natural Language Descriptions for Images | Girish | -
May 10 | Final Project Presentations | In class presentations | Ravneet, Sahil, Sanjeev; Jaewoo, Cajeton; Pratiksha, Nikhil, Ashish; Deepak, Farheen | -
May 12 | Final Project Presentations | In class presentations | Naresh, Rohith, Sandesh; Aneesh, Arun; Martino, Rong; Girish | -
May 17 | - | - | - | Final Project Write-Up due via email: 8 page document including abstract, introduction (motivation), method, results, & figures


No prior experience in computer vision or natural language processing is required to take this course. Homeworks and projects may be done in groups of up to 3 (please note homework groups on write-ups). Homeworks will be completed in Matlab.

Submit all paper summaries, homeworks, and project presentations to: cse595@gmail.com

Grading
There will be 3 homeworks during the first month and a half of the course to get students acquainted with words and pictures. Over the final two months of the course, students will develop and present a project related to words and pictures. Students will also be responsible for leading one class paper discussion. For each paper presentation day, a one-paragraph summary of one assigned paper of your choice is due before the start of class.

Grading will consist of: Assignments (30%), Project (40%), Paper presentation (10%), Paper summaries (10%), Participation (10%).


Students will be allowed 5 free late days over the semester, to be used on homeworks or the project as they choose. Once those are used, late homeworks and projects will be accepted with a 10% reduction in value per day late.
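
As a rough illustration of the late policy arithmetic, here is a minimal Python sketch. It rests on assumptions the policy above does not spell out: the reduction is taken as 10% of the original value per penalized day (linear, not compounding), the score never drops below zero, and any remaining free late days are applied before a penalty is charged. The function name late_score and its parameters are hypothetical, chosen only for this example.

```python
def late_score(raw_score, days_late, free_days_left):
    """Return (adjusted_score, free_days_remaining) for one submission.

    Assumptions (not stated in the syllabus): free late days are spent
    first, the penalty is a flat 10% of the original value per remaining
    late day, and the score is floored at zero.
    """
    # Spend free late days first, up to the number still available.
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used

    # 10 percentage points off per penalized day, never below 0%.
    percent_kept = max(0, 100 - 10 * penalized_days)

    adjusted = raw_score * percent_kept / 100
    return adjusted, free_days_left - free_used


if __name__ == "__main__":
    # 3 days late with 1 free day left: 2 penalized days, 80% of 80 points.
    print(late_score(80, 3, 1))  # (64.0, 0)
```

Under these assumptions, a submission that is late by no more than the remaining free days keeps its full value.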

Reference Books
1) Forsyth, D. A. and Ponce, J. Computer Vision: A Modern Approach, Prentice Hall, 2003.
2) Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision, Cambridge University Press, 2002.
3) Jurafsky, D. and Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2008.
4) Manning, C. D. and Schuetze, H. Foundations of Statistical Natural Language Processing, MIT Press, 1999.

Matlab
Student Matlab licenses can be purchased from MathWorks for $99.
Matlab tutorial by Hany Farid and Eero Simoncelli
A more comprehensive Matlab tutorial by David Griffiths


Americans with Disabilities Act: If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential.

Academic Integrity: Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Technology & Management, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/

Critical Incident Management: Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Faculty in the HSC Schools and the School of Medicine are required to follow their school-specific procedures.