CSE 595: Words & Pictures

Instructor: Tamara Berg  (tlberg -at- cs.sunysb.edu)
Office: 1411 Computer Science
Lectures: Tues/Thurs 1:00-2:20pm Rm 2129 CS
Office Hours: Tues/Thurs 2:20-3:20pm and by appointment
Course Webpage: http://tamaraberg.com/teaching/Fall_12/wordspics
TA: Wei Liu (weiliu2 -at- cs.stonybrook.edu), Office hours Thurs 4-5pm, Rm 2110 CS


Announcements
  • Get access to Matlab and do a tutorial.
  • Wei Liu will be our TA. Office hours Thurs 4-5pm in 2110.
  • Homeworks may be completed in pairs.
  • You will be allowed 3 free HW late days (total) over the course of the semester.
  • Start thinking about project ideas and come see me in office hours.
  • Schedule updated due to Hurricane Sandy!
  • Links to useful resources for projects are available here.


Course Description
This course will explore topics straddling the boundary between Natural Language Processing and Computer Vision. Words and pictures are often naturally linked. Common examples include the billions of web pages containing both images and text, captioned news photographs, and YouTube videos with speech or closed captioning. To search, classify, and exploit these collections, we need to use both the visual and the textual information effectively. We will learn how to make use of the complementary nature of words and pictures through topic lectures and analysis of state-of-the-art research. Students will also have a chance to define their own multi-modal problems and solutions through a class project.

Topics
  • Clustering Tagged Images
  • Multi-modal Classification
  • Generative and Topic Models
  • Recognition in images with captions or videos with scripts
  • Learning by Watching
  • Inferring semantics from images with associated text
  • Generating natural language descriptions for images

MS Basic Project Option
  • Sign up as CSE 522 to complete the MS Basic Project Option

Tentative Schedule

| Date | Topic | Readings | Presenter | Assignments |
|---|---|---|---|---|
| Aug 28 | Intro & Overview of Course (Slides) | - | Tamara | Get access to Matlab and do a Matlab tutorial (e.g. one of those listed at the bottom of this page) |
| Aug 30 | Computer Vision Review (Slides) | - | Tamara | Get access to Matlab and do a Matlab tutorial (e.g. one of those listed at the bottom of this page) |
| Sept 4 | No Class (Labor Day Holiday) | - | - | - |
| Sept 6 | Natural Language Processing Review (Slides) | - | Tamara | - |
| Sept 11 | Matlab Review (demo.tar.gz) | - | Tamara | - |
| Sept 13 | Natural Language Processing Review (cont.) | - | Tamara | HW1 out |
| Sept 18 | Features & Representations (Slides) | - | Tamara | - |
| Sept 20 | Features & Representations (cont.) | - | Tamara | - |
| Sept 25 | Clustering (Slides) | - | Tamara | - |
| Sept 27 | Classification (Slides) | - | Tamara | HW2 out |
| Oct 2 | Clustering & Classification (Topic Presentation; Slides2) | "Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Information"; "Building Text Features for Object Image Classification" | Group 1 | - |
| Oct 4 | Harvesting Databases From the Web (Topic Presentation; Slides1, Slides2) | "Animals on the Web"; "Harvesting Image Databases from the Web" | Group 2 | - |
| Oct 9 | Video Classification (Topic Presentation; Slides1) | "Taxonomic Classification for Web-based Videos"; "YouTubeCat: Learning to Categorize Wild Web Videos" | Group 3 | - |
| Oct 11 | Graphical Models (Slides) | - | Tamara | HW3 out |
| Oct 16 | Generative Models (Topic Presentation; Slides1) | "Matching Words & Pictures"; "Unsupervised Learning of Visual Sense Models for Polysemous Words" | Group 4 | - |
| Oct 18 | Recognition in Videos with Scripts (Topic Presentation; Slides1, Slides2) | "Learning Human Actions from Movies"; "Watch, Listen & Learn: Co-training on Captioned Images and Videos" | Group 5 | - |
| Oct 23 | Catch-up day | - | - | Please visit office hours to discuss projects |
| Oct 25 | Learning by Watching (Topic Presentation; Slides1) | "Learning Sign Language by Watching TV (using Weakly Aligned Subtitles)"; "Learning to Sportscast: A Test of Grounded Language Acquisition" | Group 6 | Please visit office hours to discuss projects |
| Oct 30 | Hurricane! Project proposals delayed | - | - | - |
| Nov 1 | Hurricane! Project proposals delayed | - | - | - |
| Nov 6 | Project Proposals | - | - | Prepare a 5-minute project proposal. Email presentations to cse595@gmail.com. |
| Nov 8 | Project Proposals | - | - | Prepare a 5-minute project proposal. Email presentations to cse595@gmail.com. |
| Nov 13 | Learning about Semantics I (Topic Presentation; Slides1) | "Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers"; "Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images" | Group 7 | - |
| Nov 15 | Project Updates | - | - | Prepare a 3-minute project update. By now you should have at least collected data and implemented part of your algorithm. Send slides to cse595@gmail.com by midnight Nov 14 so I can merge them into one presentation. |
| Nov 20 | Learning about Semantics II (Topic Presentation; Slides1) | "Automatic Attribute Discovery and Characterization from Noisy Web Data"; "Learning models for object recognition from natural language descriptions" | Group 8 | - |
| Nov 22 | No Class (Thanksgiving) | - | - | - |
| Nov 27 | Other Cues (Topic Presentation; Slides1) | "Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags"; "Who is 'You'? Combining Linguistic and Gaze Features to Resolve Second-Person References in Dialogue" | Group 9 | - |
| Nov 29 | Generating Text for Images | "Im2Text: Describing Images Using 1 Million Captioned Photographs"; "Collective Generation of Natural Image Descriptions" | | |
| Dec 4 | Final Project Presentations | - | - | Presentations due via email to cse595@gmail.com by midnight Dec 3. |
| Dec 6 | Final Project Presentations | - | - | Presentations due via email to cse595@gmail.com by midnight Dec 3. |
| Dec 7 | Project Code Reviews | - | - | Sign up for an appointment time (3:30-8:30pm). |
| Dec 17 | Final Project Write-Ups | - | - | Final project write-ups (8 pages, organized like a conference paper) due via email to cse595@gmail.com. |

There will be 3 homeworks during the first two months of the course to get students acquainted with working on words and pictures. Over the final month of the course, students will develop and present a project related to words and pictures. Each student will also present in one group topic discussion. To encourage participation and discussion, at the end of each class students should turn in a slip of paper noting the number (if any) of significant questions, answers, or comments they contributed in class.

Grading will consist of: Assignments (30%), Project (30%), Participation (30%), Topic Presentation (10%).
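The weights above combine into a final score as a simple weighted average. The sketch below assumes every component is scored on a 0-100 scale, which the syllabus does not state; `final_grade` and the component keys are hypothetical names used only for illustration, not a course tool.

```python
# Hypothetical helper: combines component scores using the syllabus weights.
# Assumption: every component is scored on a 0-100 scale.
WEIGHTS_PERCENT = {
    "assignments": 30,
    "project": 30,
    "participation": 30,
    "topic_presentation": 10,
}

def final_grade(scores):
    """Return the weighted final score (0-100) from a dict of component scores."""
    assert sum(WEIGHTS_PERCENT.values()) == 100  # the weights cover 100% of the grade
    total = sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT)
    return total / 100

# Example: strong homework and participation, weaker topic presentation.
print(final_grade({"assignments": 90, "project": 80,
                   "participation": 100, "topic_presentation": 70}))  # 88.0
```

Keeping the weights as integer percentages avoids floating-point rounding in the sum; only the final division produces a float.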

Homeworks may be completed in pairs with one submission per pair for each assignment. Homeworks may be discussed with anyone in the class, but each pair of students should write and submit their own code. Any evidence of copied, shared, or transmitted source code will be regarded as evidence of academic dishonesty and penalized as necessary. Code from the internet is allowed, but must be cited and the extent of internet code incorporated must be indicated in comments within the code as well as in the homework write-up.

Projects will be completed in groups, and all group members are expected to contribute equally. Individual project grades may be assigned according to each student's contribution.

No prior experience in computer vision or natural language processing is required to take this course. Homeworks will be completed in Matlab, projects in the language of your choice. Submit all paper summaries, homeworks, and project presentations to: cse595@gmail.com. Late homeworks and projects will be accepted with a 10% reduction in value per day late after you have used your 3 free late days.
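The late-day rule is easiest to see with numbers. The sketch below rests on assumptions that the syllabus does not spell out. It treats the 3 free late days as one shared pool spent in submission order. It counts lateness in whole days. It also reads each penalized day as removing a further 10% of the earned score, linearly rather than compounding, with a floor of zero. `apply_late_policy` is a hypothetical helper for illustration only.

```python
# Hypothetical illustration of the late policy. Assumptions (not stated in the syllabus):
#  - the 3 free late days form one shared pool, spent in submission order;
#  - lateness is counted in whole days;
#  - each late day beyond the free ones removes 10% of the earned score,
#    linearly (not compounding), with a floor of zero.
FREE_LATE_DAYS = 3
PENALTY_PERCENT_PER_DAY = 10

def apply_late_policy(submissions):
    """submissions: list of (earned_score, days_late) in submission order.
    Returns the scores after the late policy is applied."""
    free_left = FREE_LATE_DAYS
    adjusted = []
    for earned, days_late in submissions:
        covered = min(days_late, free_left)   # spend free late days first
        free_left -= covered
        penalized_days = days_late - covered  # days that actually cost points
        kept_percent = max(0, 100 - PENALTY_PERCENT_PER_DAY * penalized_days)
        adjusted.append(earned * kept_percent / 100)
    return adjusted

# HW1 is 2 days late (both free), HW2 is 2 days late (1 free, 1 penalized), HW3 is on time.
print(apply_late_policy([(90, 2), (80, 2), (100, 0)]))  # [90.0, 72.0, 100.0]
```

Under a compounding reading, where each penalized day multiplies the score by 0.9, scores with several penalized days would come out slightly higher. The linear reading is used here only because it is the simpler of the two.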

Reference Books
1) Forsyth, D. A. and Ponce, J. Computer Vision: A Modern Approach. Prentice Hall, 2003.
2) Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision. 2nd ed., Cambridge University Press, 2004.
3) Jurafsky, D. and Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd ed., Prentice Hall, 2008.
4) Manning, C. D. and Schütze, H. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

Student Matlab licenses can be purchased from MathWorks for $99.
Matlab tutorial by Hany Farid and Eero Simoncelli
A more comprehensive Matlab tutorial by David Griffiths

Americans with Disabilities Act: If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential.

Academic Integrity: Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Technology & Management, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/

Critical Incident Management: Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Faculty in the HSC Schools and the School of Medicine are required to follow their school-specific procedures.

Religious Holidays: Any student with deadline conflicts due to observance of religious holidays should contact the professor.