LLLNet

November 10, 2017

About

This is a Keras implementation of the "Look, Listen and Learn" model proposed by R. Arandjelovic and A. Zisserman at DeepMind. The model learns cross-modal features shared between audio and images.

Core Concept

Audio-visual correspondence (AVC) task: the network is trained to predict whether a given video frame and a one-second audio clip come from the same video at the same moment.
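The AVC task reduces to binary classification over (image, audio) pairs: positives take a frame and audio from the same video, negatives pair the frame with audio from a different video. Below is a minimal NumPy sketch of this pair sampling; the function name, shapes, and toy data are illustrative assumptions, not this repository's API:

```python
import numpy as np

def sample_avc_pairs(frames, audios, n_pairs, rng=None):
    """Sample (frame, audio, label) triples for the AVC task.

    frames: array of shape (n_videos, H, W, 3) -- one frame per video
    audios: array of shape (n_videos, T)       -- one audio clip per video
    Positives (label 1) pair a frame with audio from the same video;
    negatives (label 0) pair it with audio from a different video.
    """
    rng = rng or np.random.default_rng(0)
    n = len(frames)
    idx = rng.integers(0, n, size=n_pairs)       # which video each frame comes from
    labels = rng.integers(0, 2, size=n_pairs)    # 1 = corresponding, 0 = mismatched
    audio_idx = idx.copy()
    for k in np.where(labels == 0)[0]:
        # for negatives, shift to a different video so the audio never matches
        audio_idx[k] = (idx[k] + rng.integers(1, n)) % n
    return frames[idx], audios[audio_idx], labels

# toy data: 4 "videos", each frame/clip filled with its video index
frames = np.arange(4).reshape(4, 1, 1, 1).repeat(3, axis=-1)
audios = np.arange(4).reshape(4, 1).repeat(8, axis=-1)
f, a, y = sample_avc_pairs(frames, audios, n_pairs=16)
# positives share the video index, negatives never do
assert all((f[k, 0, 0, 0] == a[k, 0]) == bool(y[k]) for k in range(16))
```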

Differences from the Original Model

  • SqueezeNet is used as the visual CNN, instead of the VGG-style network in the original paper. (See the model figure in the repository.)
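The two-stream design can be illustrated with a tiny NumPy forward pass: the visual subnetwork (SqueezeNet here) and the audio subnetwork each produce an embedding, the embeddings are concatenated, and a small fusion head scores correspondence. All weights and layer sizes below are illustrative placeholders, not the trained model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fusion_head(img_emb, aud_emb, w1, b1, w2, b2):
    """Fuse the two embeddings and score correspondence.

    img_emb: (batch, d_img) embedding from the visual CNN
    aud_emb: (batch, d_aud) embedding from the audio CNN
    Returns (batch, 2) softmax over [no-correspond, correspond].
    """
    x = np.concatenate([img_emb, aud_emb], axis=-1)  # late fusion by concatenation
    h = np.maximum(x @ w1 + b1, 0.0)                 # ReLU hidden layer
    return softmax(h @ w2 + b2)                      # 2-way classification

rng = np.random.default_rng(0)
d_img, d_aud, d_h = 512, 512, 128                    # placeholder dimensions
w1 = rng.normal(scale=0.05, size=(d_img + d_aud, d_h)); b1 = np.zeros(d_h)
w2 = rng.normal(scale=0.05, size=(d_h, 2)); b2 = np.zeros(2)
probs = fusion_head(rng.normal(size=(4, d_img)), rng.normal(size=(4, d_aud)),
                    w1, b1, w2, b2)
assert probs.shape == (4, 2) and np.allclose(probs.sum(axis=-1), 1.0)
```

This mirrors the late-fusion structure of the model: each modality is encoded independently, and only the fusion head sees both.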

Contents

  1. About
  2. Core Concept
  3. Differences from the Original Model