Topic 1. Text Query based Audio Retrieval

- Retrieving audio signals using textual descriptions of their sound content (i.e., audio captions).
- Text queries are manually written audio captions.
- For each text query, the goal of this task is to retrieve audio files from a given dataset and sort them based on how well they match the query.
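
The retrieve-and-sort step described above can be sketched as a similarity ranking. This is a minimal illustration, not a prescribed method: it assumes the text query and every audio file have already been mapped into a shared embedding space by some encoder (not shown), and it uses cosine similarity as the match score. The function names, file names, and the toy 3-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_audio_files(query_embedding, audio_embeddings):
    """Return (file_name, score) pairs sorted from best to worst match."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in audio_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hand-made toy embeddings (assumption: a real system would obtain these
# from trained audio and text encoders).
audio_embeddings = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.0, 0.2, 0.9],
    "car_engine.wav": [0.1, 0.9, 0.2],
}

# Stand-in for an encoded caption such as "a dog barks".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_audio_files(query_embedding, audio_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the file whose embedding points in nearly the same direction as the query is ranked first. Any other similarity or learned scoring function could be substituted without changing the sorting step.
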

Topic 2. Automated Audio Captioning

- The task of general audio content description using free text.
- An inter-modal translation task (not speech-to-text), where a system accepts an audio signal as input and outputs a textual description (i.e., the caption) of that signal.
- Involves modeling concepts (e.g., "muffled sound"), physical properties of objects and the environment (e.g., "the sound of a big car", "people talking in a small and empty room"), and high-level knowledge (e.g., "a clock rings three times").

Research 1. Audio-Text Data Augmentation
Research 2. Audio Captioning
Research 3. Audio-Text Retrieval