- Welcome to Language and Vision!
- 1/13/15 Look over the topic list and email me your top 3 discussion choices by Thursday (1/15/15).
- 1/27/15 HW1 is online here, due Feb 16.
- 2/17/15 Classes are cancelled due to weather. Schedule updated accordingly. We will hold a make-up class, tentatively Friday Feb 27 4pm.
- 2/17/15 HW2 is online here, due March 6.
- 3/2/15 If you have a project idea or are looking for a collaborator, post something here.
This course will explore topics straddling the boundary between Natural
Language Processing and Computer Vision. Now that basic visual recognition
algorithms are beginning to work, we can think about predicting higher level
interpretations of images necessary for general image understanding. These
interpretations can be aided by text associated with images/videos and
knowledge about the world learned from language. On the NLP side, images can
help ground language in the physical world, allowing us to develop models for
semantics. Language and Vision is a natural place to explore these questions
as words and pictures are often naturally linked online and in the real world,
and each modality can provide reinforcing information to aid the other.
In this course, we will learn how to make use of the complementary nature of
words and pictures through topic lectures and discussions about state of the
art research. Students will be responsible for completing 2 HW assignments,
reading research papers, and participating in discussions. They will also have
a chance to explore a topic of their choice in more depth through a class
Topics (a subset of these will be covered based on time)|
- Basic Background in NLP, Vision, and Machine Learning
- Features and Representations
- Image Retrieval
- Multi-modal Clustering and Word Sense Disambiguation
- Text as weak labels for image or video classification
- Image/Video Annotation and Natural Language Description Generation
- Natural Language Grounding & Learning by Watching
- Learning Knowledge from the web
- Deep Learning for Images, Text, and multi-modal data
No prior experience in computer vision or natural language processing is
required to take this course, although some knowledge of these areas or of
machine learning will be useful. Students are allowed 3 free late days for
assignments over the semester. Afterward, late assignments will be accepted
with a 10% reduction in value per day late.
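As a concrete illustration of the late policy above, here is a small sketch (the function and variable names are my own, not part of the course materials):

```python
def late_penalty_score(raw_score, days_late, free_days_remaining):
    """Apply the late policy: free late days absorb lateness first;
    each remaining late day reduces the assignment's value by 10%."""
    covered = min(days_late, free_days_remaining)
    penalized_days = days_late - covered
    multiplier = max(0.0, 1.0 - 0.10 * penalized_days)
    return raw_score * multiplier

# e.g. a 90-point assignment turned in 2 days late with 1 free day left:
# one day is penalized at 10%, so the score becomes 90 * 0.9
print(late_penalty_score(90, 2, 1))
```

Note that the penalty applies per assignment, while the 3 free late days are shared across the whole semester.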
Two homework assignments will be given during the first two months of the course to get students acquainted with topics in Language and Vision. Over the final month of the course, students will develop their own project related to language and vision. This will include a proposal presentation, a written update, and a final presentation and written document. Projects should involve some amount of text and image processing, but the exact topic and the amount of language or vision involved can be determined by the student in consultation with the instructor.
Students will also be responsible for reading assigned research papers, submitting short paper summaries, and participating in class discussions. Paper summaries should be submitted in hard copy at the start of each class for which papers are assigned. During one class, students will be in charge of facilitating discussion of an application related to one of the research topics.
Assignments and projects may be completed in pairs, with one submission per pair.
Assignments may be discussed with anyone in the class, but each pair of students
should implement their own solutions. Code from the internet is allowed but must
be cited, and the extent of incorporated internet code must be indicated both in
comments within the code and in the write-up.
Grading will consist of: Assignments (30%), Project (40%), Participation (30%).
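For concreteness, the weighted final grade implied by these percentages can be computed as below (the function name and example scores are illustrative, not part of the course materials):

```python
def final_grade(assignments, project, participation):
    """Combine component scores (each on a 0-100 scale) using the
    course weights: assignments 30%, project 40%, participation 30%."""
    return 0.30 * assignments + 0.40 * project + 0.30 * participation

# e.g. 85 on assignments, 90 on the project, 100 for participation
print(final_grade(85, 90, 100))
```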
|Date||Topic||Readings||Discussion Leads||Assignments|
|Jan 13||Intro to the Course (slides1), Overview of Computer Vision (slides2)||-||-||-|
|Jan 20||Overview of NLP (slides1), Features & Representations (slides2)||-||-||-|
|Jan 27||Image Retrieval (slides, discussion slides)||"PageRank for Product Image Search", "Animals on the Web"||Brian, Andrew||HW1 out, paper summaries|
|Feb 3||Overview of Machine Learning Techniques (slides)||-||-||-|
|Feb 10||Clustering (slides)||"Computing Iconic Summaries of General Visual Concepts", "Who's in the Picture?"||Chen-Yang, Alexis||paper summaries|
|Feb 17||Classes cancelled due to weather||-||-||HW2 out|
|Feb 24||Classification - Text as weak labels (slides)||"Building text features for object image classification", "Learning realistic human actions from movies"||Chris, Chun-Wei||paper summaries|
|Feb 27, 4pm||Make-up class - overview of available project resources + brainstorming (slides1, slides2)||-||-||-|
|March 3||Attributes (slides)||"Automatic Attribute Discovery and Characterization from Noisy Web Data"||Natalie, Yipin||paper summaries|
|March 10||Spring Break||-||-||-|
|March 17||Project Proposals||-||-||10 minute project proposal presentation, 2 page write-up|
|March 24||Description Generation (slides)||"Baby Talk: Understanding and Generating Simple Image Descriptions", "Collective Generation of Natural Image Descriptions", "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge"||Kyle S, Liang, Licheng||paper summaries|
|March 31||Auto-Illustration (slides)||"WordsEye: An Automatic Text-to-Scene Conversion System", "Learning Spatial Knowledge for Text to 3D Scene Generation"||Rob, Matthew||paper summaries|
|April 7||Learning Knowledge from Data||"NEIL: Extracting Visual Knowledge from Web Data", "Bringing Semantics Into Focus Using Visual Abstraction"||Carl, Hasan||4 page project progress report, paper summaries|
|April 14||Deep Learning||"Deep Visual-Semantic Alignments for Generating Image Descriptions"||Eunbyung, Kyle M||paper summaries|
|April 21||Final Project Presentations||-||-||10 minute final project presentation|
|May 4||-||-||-||Final project write-up due (8 pages, conference paper layout)|
1) Forsyth, D. A. and Ponce, J., Computer Vision: A Modern Approach, Prentice Hall, 2003.
2) Hartley, R. and Zisserman, A., Multiple View Geometry in Computer Vision, Cambridge University Press, 2004.
3) Jurafsky, D. and Martin, J. H., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2008.
4) Manning, C. D. and Schütze, H., Foundations of Statistical Natural Language Processing, MIT Press, 1999.