
| File size |
|---|
| 133.32MB |
| 3.70GB |
| 2.77GB |
| 3.58GB |
| 3.66GB |
| 3.45GB |
| 3.57GB |
| 3.43GB |
| 3.63GB |
| 3.57GB |
| 3.57GB |
| 3.60GB |
| 3.82GB |
| 3.69GB |
| 3.84GB |
| 3.95GB |
| 3.93GB |
| 3.83GB |
| 3.85GB |
| 3.76GB |
| 3.75GB |
| 3.72GB |
| 3.86GB |
| 3.87GB |
| 3.95GB |
| 3.56GB |
| 3.91GB |
| 4.06GB |
| 3.90GB |
| 3.88GB |
| 3.77GB |
| 3.67GB |
| 3.92GB |
| 3.69GB |
| 3.66GB |
| 3.82GB |
| 3.62GB |
| 3.94GB |
| 4.14GB |
| 3.93GB |
| 4.22GB |
| 3.94GB |
| 3.58GB |
| 3.63GB |
| 4.01GB |
| 3.85GB |
| 3.91GB |
| 3.84GB |
| 3.58GB |
Type: Dataset
Tags: speech isolation, lip reading, face detection
Bibtex:
@article{,
  title= {AVSpeech: Large-scale Audio-Visual Speech Dataset},
  journal= {},
  author= {Ariel Ephrat and Inbar Mosseri and Oran Lang and Tali Dekel and Kevin Wilson and Avinatan Hassidim and William T. Freeman and Michael Rubinstein},
  year= {},
  url= {https://looking-to-listen.github.io/avspeech/},
  abstract= {AVSpeech is a new, large-scale audio-visual dataset comprising speech video clips with no interfering background noises. The segments are 3-10 seconds long, and in each clip the audible sound in the soundtrack belongs to a single speaking person, visible in the video. In total, the dataset contains roughly 4700 hours* of video segments, from a total of 290k YouTube videos, spanning a wide variety of people, languages and face poses. For more details on how we created the dataset, see our paper, Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation (https://arxiv.org/abs/1804.03619). * UPLOADER'S NOTE: This dataset contains 3000 hours of video segments, not the entire 4700 hours. The remaining 1700 hours were not included because the source videos no longer existed on YouTube, had copyright violations, were not available in the United States, or were of poor quality. Over 1 million segments are included in this torrent, each 3-10 seconds long and in 720p resolution. See the README for how to use this dataset.},
  keywords= {speech isolation, lip reading, face detection},
  terms= {},
  license= {},
  superseded= {}
}
by nashoksvy at 2021-08-07 05:33:30 GMT
thanks!
Last edited by nashoksvy at 2021-08-07 05:36:36 GMT
by leg0m4n at 2022-08-15 18:21:36 GMT