Selected publications and preprints.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, et al. (in collaboration with NVIDIA)

Audio-Visual Flamingo: Open Audio-Visual Intelligence for Long and Complex Videos

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, et al. (in collaboration with NVIDIA)

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Arushi Goel, Sreyan Ghosh, Vatsal Agarwal, Nishit Anand, Kaousheik Jayakumar, et al. (in collaboration with NVIDIA)

MMOU - Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Ramaneswaran Selvakumar*, Kaousheik Jayakumar*, et al.

Do Audio-Visual Large Language Models Really See and Hear?

CVPR Paper Logo
Kaousheik Jayakumar, et al.

Multilingual ASR Systems for Indian Languages