- Get access to Matlab and do a tutorial.
- Wei Liu will be our TA. Office hours Thurs 4-5pm in 2110.
- Homeworks may be completed in pairs.
- You will be allowed 3 free HW late days (total) over the course of the semester.
- Start thinking about project ideas and come see me in office hours.
- Schedule updated due to Hurricane Sandy!
- Links to useful resources for projects are available here.
This course will explore topics straddling the boundary between Natural
Language Processing and Computer Vision. Words and pictures are often
naturally linked. Some common examples include the billions of pages on the
web containing images and text, captioned news photographs, and YouTube videos
with speech or closed captioning. To search, classify, and exploit
these collections, it will be necessary to use both the visual and textual
information effectively. We will learn how to make use of the complementary
nature of words and pictures through topic lectures and analysis of
state-of-the-art research. Students will also have a chance to define their own
multi-modal problems and solutions through a class project.
- Clustering Tagged Images
- Multi-modal Classification
- Generative and Topic Models
- Recognition in images with captions or video with scripts
- Learning by Watching
- Inferring semantics from images with associated text
- Generating natural language descriptions for images
MS Basic Project Option
- Sign up as CSE 522 to complete the MS Basic Project Option
|Date||Topic||Readings||Presenter||Assignments|
|Aug 28||Intro & Overview of Course - Slides||-||Tamara||Get access to Matlab, do a Matlab tutorial, e.g. here
|Aug 30||Computer Vision Review - Slides||-||Tamara||Get access to Matlab, do a Matlab tutorial, e.g. here
|Sept 4||No Class (Labor Day Holiday) ||-||-||-
|Sept 6||Natural Language Processing Review - Slides ||-||Tamara ||-
|Sept 11||Matlab Review - demo.tar.gz||-||Tamara||-
|Sept 13||Natural Language Processing Review (cont) ||-||Tamara ||HW1 out
|Sept 18||Features & Representations - Slides||-||Tamara||-
|Sept 20||Features & Representations (cont) ||-||Tamara||-
|Sept 25||Clustering - Slides||-||Tamara||-
|Sept 27||Classification - Slides||-||Tamara||HW2 out
|Oct 2||Clustering & Classification (Topic Presentation) - Slides2||"Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Information", "Building Text Features for Object Image Classification"
|Oct 4||Harvesting Databases From the Web (Topic Presentation) - Slides1, Slides2||"Animals on the Web", "Harvesting Image Databases from the Web"
|Oct 9||Video Classification (Topic Presentation) - Slides1||"Taxonomic Classification for Web-based Videos", "YouTubeCat: Learning to Categorize Wild Web Videos"
|Oct 11||Graphical Models - Slides||-||Tamara||HW3 out
|Oct 16||Generative Models (Topic Presentation) - Slides1||"Matching Words & Pictures", "Unsupervised Learning of Visual Sense Models for Polysemous Words"
|Oct 18||Recognition in Videos with Scripts (Topic Presentation) - Slides1, Slides2||"Learning Human Actions from Movies", "Watch, Listen & Learn: Co-training on Captioned Images and Videos"
|Oct 23||Catch-up day||-||-||Please visit office hours to discuss projects
|Oct 25||Learning by Watching (Topic Presentation) - Slides1||"Learning Sign Language by Watching TV (using Weakly Aligned Subtitles)", "Learning to Sportscast: A Test of Grounded Language Acquisition"||Group 6||Please visit office hours to discuss projects
|Oct 30||Hurricane!||Project Proposals delayed||-||-
|Nov 1||Hurricane!||Project Proposals delayed||-||-
|Nov 6||Project Proposals||-||-||Prepare 5 minute project proposal. Email presentations to firstname.lastname@example.org.
|Nov 8||Project Proposals||-||-||Prepare 5 minute project proposal. Email presentations to email@example.com.
|Nov 13||Learning about Semantics I (Topic Presentation) - Slides1||"Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers", "Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images"
|Nov 15||Project Updates||-||-||Prepare 3 minute project update. You should have at least collected data and implemented some part of your algorithm by now. Send slides to firstname.lastname@example.org by midnight Nov 14, so I can merge into 1 presentation.
|Nov 20||Learning about Semantics II (Topic Presentation) - Slides1||"Automatic Attribute Discovery and Characterization from Noisy Web Data", "Learning models for object recognition from natural language descriptions"
|Nov 22||No Class (Thanksgiving)||-||-||-
|Nov 27||Other Cues (Topic Presentation) - Slides1||"Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags", "Who is "You"? Combining Linguistic and Gaze Features to Resolve Second-Person References in Dialogue"
|Nov 29||Generating Text for Images||"Im2Text: Describing Images Using 1 Million Captioned Photographs", "Collective Generation of Natural Image Descriptions"
|Dec 4||Final Project Presentations||-||-||Presentations due via email to email@example.com by midnight Dec 3.
|Dec 6||Final Project Presentations||-||-||Presentations due via email to firstname.lastname@example.org by midnight Dec 3.
|Dec 7||Project Code Reviews||-||-||Sign up for Appointment Time (3:30-8:30pm)
|Dec 17||Final Project Write-Ups||-||-||Final project write-ups due via email (8 pages organized similarly to a conference paper) to email@example.com.
There will be 3 homeworks during the first two months of the course to get
students acquainted with words and pictures. Over the final month of the course
students will develop and present a project related to words and pictures.
Students will also be responsible for presenting in one group topic discussion.
To increase participation and discussion during class, at the end of each class
students should turn in a paper with the number (if any) of significant
questions, answers, or comments they posed in class.
Grading will consist of: Assignments (30%), Project (30%), Participation (30%),
Topic Presentation (10%).
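As a rough illustration of how the grading weights combine, here is a minimal Python sketch. The function name `course_score`, the 0-100 scale for each component, and the example scores are assumptions made for illustration only; the syllabus specifies only the percentages.

```python
# Weights in percent, taken from the grading breakdown above.
WEIGHTS = {
    "assignments": 30,
    "project": 30,
    "participation": 30,
    "topic_presentation": 10,
}


def course_score(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)
    into a single weighted total on the same scale."""
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / 100


# Hypothetical student, for illustration only.
example = {
    "assignments": 90,
    "project": 80,
    "participation": 100,
    "topic_presentation": 70,
}
print(course_score(example))  # 88.0
```

Note that participation carries the same weight as the assignments or the project, so the end-of-class participation slips matter as much as either.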
Homeworks may be completed in pairs with one submission per pair for each
assignment. Homeworks may be discussed with anyone in the class, but each pair
of students should write and submit their own code. Any evidence of copied,
shared, or transmitted source code will be regarded as evidence of academic
dishonesty and penalized as necessary. Code from the internet is allowed, but must
be cited and the extent of internet code incorporated must be indicated in comments
within the code as well as in the homework write-up.
Projects will be completed in groups, but all
project members are expected to contribute to the project equally. Individual
project grades may be given according to how each student has contributed.
No prior experience in computer vision or natural language processing is
required to take this course. Homeworks will be completed in
Matlab; projects may use the language of your choice. Submit all paper summaries, homeworks, and project presentations
Late homeworks and projects will be accepted with a 10% reduction in value per
day late after use of your 3 free late days.
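The interaction between the free late days and the per-day reduction can be sketched in Python as follows. This is one possible reading of the policy, not an official rule: it assumes the 10% reduction is linear (10% of full value per penalized day, not compounding), that free late days are used up automatically in submission order, and that a grade multiplier never drops below zero. The function name `apply_late_policy` is hypothetical.

```python
FREE_LATE_DAYS = 3    # total for the semester, per the policy above
PENALTY_PER_DAY = 10  # percent of full value per penalized day (assumed linear)


def apply_late_policy(days_late_per_submission):
    """Return (percent of full value kept for each submission, free days left).

    Assumes free late days are spent in submission order until none remain;
    any further late days cost PENALTY_PER_DAY percent each.
    """
    free_left = FREE_LATE_DAYS
    kept = []
    for days_late in days_late_per_submission:
        used = min(days_late, free_left)   # free days applied to this submission
        free_left -= used
        penalized_days = days_late - used  # days that actually cost points
        kept.append(max(0, 100 - PENALTY_PER_DAY * penalized_days))
    return kept, free_left


# Hypothetical semester: HW1 one day late, HW2 on time, HW3 four days late.
print(apply_late_policy([1, 0, 4]))  # ([100, 100, 80], 0)
```

In this example HW1 uses one free day, HW3 uses the remaining two, and the last two late days on HW3 reduce it to 80% of its value.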
1) Forsyth, David A., and Ponce, J. Computer Vision: A Modern Approach, Prentice Hall, 2003.
2) Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision, Cambridge University Press, 2002.
3) Jurafsky, D. and Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2008.
4) Manning, C. D. and Schuetze, H. Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Student Matlab licenses can be purchased from MathWorks for $99 - Link.
Matlab tutorial by Hany Farid and Eero Simoncelli - Link
A more comprehensive Matlab tutorial by David Griffiths - Link
Americans with Disabilities Act:
If you have a physical, psychological, medical or learning
disability that may impact your course work, please contact
Disability Support Services, ECC (Educational Communications
Center) Building, room 128, (631) 632-6748. They will determine
with you what accommodations, if any, are necessary and
appropriate. All information and documentation is confidential.
Academic Integrity:
Each student must pursue his or her academic goals honestly and
be personally accountable for all submitted work. Representing
another person's work as your own is always wrong. Faculty are
required to report any suspected instances of academic
dishonesty to the Academic Judiciary. Faculty in the Health
Sciences Center (School of Health Technology & Management,
Nursing, Social Welfare, Dental Medicine) and School of
Medicine are required to follow their school-specific
procedures. For more comprehensive information on academic
integrity, including categories of academic dishonesty, please
refer to the academic judiciary website at
Critical Incident Management:
Stony Brook University expects students to respect the rights,
privileges, and property of other people. Faculty are required
to report to the Office of Judicial Affairs any disruptive
behavior that interrupts their ability to teach, compromises
the safety of the learning environment, or inhibits students'
ability to learn. Faculty in the HSC Schools and the School of
Medicine are required to follow their school-specific procedures.
Religious Holidays: Any student with deadline conflicts due to observance of
religious holidays should contact the professor.