AVSpeech: Large-scale Audio-Visual Speech Dataset
Ariel Ephrat and Inbar Mosseri and Oran Lang and Tali Dekel and Kevin Wilson and Avinatan Hassidim and William T. Freeman and Michael Rubinstein

AVSpeech (402 files)
README.txt 1.92kB
clips/xaa.tar 4.07GB
clips/xab.tar 3.84GB
clips/xac.tar 3.67GB
clips/xad.tar 3.60GB
clips/xae.tar 3.76GB
clips/xaf.tar 3.78GB
clips/xag.tar 3.98GB
clips/xah.tar 3.78GB
clips/xai.tar 3.58GB
clips/xaj.tar 3.76GB
clips/xak.tar 3.65GB
clips/xal.tar 3.76GB
clips/xam.tar 3.90GB
clips/xan.tar 3.53GB
clips/xao.tar 3.74GB
clips/xap.tar 3.97GB
clips/xaq.tar 3.72GB
clips/xar.tar 3.88GB
clips/xas.tar 3.79GB
clips/xat.tar 4.15GB
clips/xau.tar 3.52GB
clips/xav.tar 3.78GB
clips/xaw.tar 3.87GB
clips/xax.tar 3.51GB
clips/xay.tar 3.69GB
clips/xaz.tar 3.63GB
clips/xba.tar 3.91GB
clips/xbb.tar 3.52GB
clips/xbc.tar 3.44GB
clips/xbd.tar 3.76GB
clips/xbe.tar 3.42GB
clips/xbf.tar 3.95GB
clips/xbg.tar 3.77GB
clips/xbh.tar 3.77GB
clips/xbi.tar 3.87GB
clips/xbj.tar 3.31GB
clips/xbk.tar 3.53GB
clips/xbl.tar 3.75GB
clips/xbm.tar 3.61GB
clips/xbn.tar 3.64GB
clips/xbo.tar 3.85GB
clips/xbp.tar 3.69GB
clips/xbq.tar 4.03GB
clips/xbr.tar 3.71GB
clips/xbs.tar 3.56GB
clips/xbt.tar 3.93GB
clips/xbu.tar 3.71GB
clips/xbv.tar 3.77GB
clips/xbw.tar 3.70GB
clips/xbx.tar 3.90GB
clips/xby.tar 3.87GB
clips/xbz.tar 3.83GB
clips/xca.tar 3.79GB
clips/xcb.tar 4.09GB
clips/xcc.tar 3.73GB
clips/xcd.tar 3.57GB
clips/xce.tar 3.50GB
clips/xcf.tar 3.80GB
clips/xcg.tar 3.47GB
clips/xch.tar 3.97GB
clips/xci.tar 3.68GB
clips/xcj.tar 3.73GB
clips/xck.tar 3.54GB
clips/xcl.tar 3.81GB
clips/xcm.tar 3.78GB
clips/xcn.tar 3.45GB
clips/xco.tar 3.90GB
clips/xcp.tar 3.48GB
clips/xcq.tar 3.94GB
clips/xcr.tar 3.77GB
clips/xcs.tar 3.67GB
clips/xct.tar 3.68GB
clips/xcu.tar 3.90GB
clips/xcv.tar 3.71GB
clips/xcw.tar 3.80GB
clips/xcx.tar 3.97GB
clips/xcy.tar 3.62GB
clips/xcz.tar 3.72GB
clips/xda.tar 3.92GB
clips/xdb.tar 3.89GB
clips/xdc.tar 3.57GB
clips/xdd.tar 3.55GB
clips/xde.tar 3.71GB
clips/xdf.tar 3.66GB
clips/xdg.tar 3.63GB
clips/xdh.tar 3.98GB
clips/xdi.tar 3.79GB
clips/xdj.tar 3.64GB
clips/xdk.tar 3.78GB
clips/xdl.tar 3.84GB
clips/xdm.tar 3.84GB
clips/xdn.tar 3.72GB
clips/xdo.tar 3.71GB
clips/xdp.tar 3.87GB
clips/xdq.tar 3.63GB
clips/xdr.tar 3.76GB
clips/xds.tar 3.97GB
clips/xdt.tar 4.08GB
clips/xdu.tar 3.96GB
clips/xdv.tar 3.67GB
clips/xdw.tar 3.62GB
clips/xdx.tar 3.69GB
clips/xdy.tar 3.78GB
clips/xdz.tar 3.57GB
clips/xea.tar 3.62GB
clips/xeb.tar 3.59GB
clips/xec.tar 3.97GB
clips/xed.tar 3.83GB
clips/xee.tar 3.87GB
clips/xef.tar 3.74GB
clips/xeg.tar 3.70GB
clips/xeh.tar 3.66GB
clips/xei.tar 3.61GB
clips/xej.tar 3.59GB
clips/xek.tar 3.70GB
clips/xel.tar 3.82GB
clips/xem.tar 3.75GB
clips/xen.tar 3.61GB
clips/xeo.tar 3.70GB
clips/xep.tar 3.64GB
clips/xeq.tar 3.92GB
clips/xer.tar 3.68GB
clips/xes.tar 3.78GB
clips/xet.tar 3.99GB
clips/xeu.tar 3.44GB
clips/xev.tar 3.73GB
clips/xew.tar 4.05GB
clips/xex.tar 3.68GB
clips/xey.tar 3.71GB
clips/xez.tar 3.75GB
clips/xfa.tar 3.91GB
clips/xfb.tar 3.89GB
clips/xfc.tar 3.85GB
clips/xfd.tar 3.80GB
clips/xfe.tar 3.66GB
clips/xff.tar 235.91MB
clips/xfg.tar 3.65GB
clips/xfh.tar 3.90GB
clips/xfi.tar 3.86GB
clips/xfj.tar 3.74GB
clips/xfk.tar 3.63GB
clips/xfl.tar 3.49GB
clips/xfm.tar 3.78GB
clips/xfn.tar 3.95GB
clips/xfo.tar 3.61GB
clips/xfp.tar 3.69GB
clips/xfq.tar 3.96GB
clips/xfr.tar 3.76GB
clips/xfs.tar 3.72GB
clips/xft.tar 3.89GB
clips/xfu.tar 3.86GB
clips/xfv.tar 3.77GB
clips/xfw.tar 3.80GB
clips/xfx.tar 3.63GB
clips/xfy.tar 3.95GB
clips/xfz.tar 3.69GB
clips/xga.tar 3.85GB
clips/xgb.tar 4.00GB
clips/xgc.tar 3.55GB
clips/xgd.tar 3.85GB
clips/xge.tar 3.93GB
clips/xgf.tar 3.81GB
clips/xgg.tar 3.79GB
clips/xgh.tar 3.71GB
clips/xgi.tar 3.84GB
clips/xgj.tar 3.61GB
clips/xgk.tar 3.86GB
clips/xgl.tar 3.46GB
clips/xgm.tar 3.94GB
clips/xgn.tar 3.89GB
clips/xgo.tar 3.90GB
clips/xgp.tar 3.44GB
clips/xgq.tar 4.07GB
clips/xgr.tar 4.07GB
clips/xgs.tar 3.69GB
clips/xgt.tar 3.65GB
clips/xgu.tar 3.80GB
clips/xgv.tar 3.82GB
clips/xgw.tar 4.18GB
clips/xgx.tar 3.55GB
clips/xgy.tar 3.99GB
clips/xgz.tar 3.76GB
clips/xha.tar 3.89GB
clips/xhb.tar 3.97GB
clips/xhc.tar 3.46GB
clips/xhd.tar 3.68GB
clips/xhe.tar 3.56GB
clips/xhf.tar 3.75GB
clips/xhg.tar 3.70GB
clips/xhh.tar 3.67GB
clips/xhi.tar 3.81GB
clips/xhj.tar 3.69GB
clips/xhk.tar 3.85GB
clips/xhl.tar 3.77GB
clips/xhm.tar 3.87GB
clips/xhn.tar 3.89GB
clips/xho.tar 4.03GB
clips/xhp.tar 3.93GB
(Listing truncated; the torrent contains 402 files in total.)
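The listing above suggests the clips are sharded into many archives named clips/x??.tar. As a minimal sketch, assuming each shard is an independently readable tar archive (this is not confirmed here; the README is the authoritative guide), the following Python peeks at the first few members of one shard without extracting it. The function name `list_members` and the example path are hypothetical.

```python
import tarfile


def list_members(archive_path, limit=5):
    """Return up to `limit` regular-file names from one tar shard.

    Assumption: the shard is a standalone tar archive. Nothing is
    extracted; only member headers are read.
    """
    names = []
    with tarfile.open(archive_path, "r") as tar:
        for member in tar:
            if len(names) >= limit:
                break
            if member.isfile():
                names.append(member.name)
    return names


# Example call (hypothetical local path after downloading the torrent):
# print(list_members("AVSpeech/clips/xaa.tar"))
```

Reading only the headers keeps the check cheap on multi-gigabyte shards; if a shard fails to open this way, the archives may need to be joined first, which the README should clarify.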
Type: Dataset
Tags: speech isolation, lip reading, face detection

title= {AVSpeech: Large-scale Audio-Visual Speech Dataset},
journal= {},
author= {Ariel Ephrat and Inbar Mosseri and Oran Lang and Tali Dekel and Kevin Wilson and Avinatan Hassidim and William T. Freeman and Michael Rubinstein},
year= {},
url= {https://looking-to-listen.github.io/avspeech/},
abstract= {AVSpeech is a new, large-scale audio-visual dataset comprising speech video clips with no interfering background noises. The segments are 3-10 seconds long, and in each clip the audible sound in the soundtrack belongs to a single speaking person, visible in the video. In total, the dataset contains roughly 4700 hours* of video segments, from a total of 290k YouTube videos, spanning a wide variety of people, languages and face poses. For more details on how we created the dataset see our paper, Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation (https://arxiv.org/abs/1804.03619).

* UPLOADER'S NOTE: This torrent contains 3000 hours of video segments, not the full 4700 hours. The remaining 1700 hours were omitted because the source videos no longer existed on YouTube, had copyright violations, were not available in the United States, or were of poor quality. The torrent includes over 1 million segments, each 3-10 seconds long and in 720p resolution. See the README for how to use this dataset.},
keywords= {speech isolation, lip reading, face detection},
terms= {},
license= {},
superseded= {}
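
The uploader's figures can be sanity-checked with simple arithmetic. The sketch below assumes that "3000 hours" is the total duration of the included segments and that every segment is between 3 and 10 seconds long, as the note states; the function name `segment_count_bounds` is made up for illustration.

```python
HOURS_INCLUDED = 3000   # total duration stated in the uploader's note
MIN_SEGMENT_S = 3       # shortest segment length, in seconds
MAX_SEGMENT_S = 10      # longest segment length, in seconds


def segment_count_bounds(hours, min_len_s, max_len_s):
    """Return (fewest, most) segments that could fill `hours` of video."""
    total_s = hours * 3600
    fewest = total_s // max_len_s   # every segment at the maximum length
    most = total_s // min_len_s     # every segment at the minimum length
    return fewest, most


low, high = segment_count_bounds(HOURS_INCLUDED, MIN_SEGMENT_S, MAX_SEGMENT_S)
print(low, high)  # 1080000 3600000
```

Under these assumptions the segment count must fall between roughly 1.08 million and 3.6 million, which is consistent with the note's "over 1 million segments".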

10 day statistics (3 downloads)

Average Time 3 days, 21 hrs, 21 mins, 29 secs
Average Speed 4.47MB/s
Best Time 1 day, 04 hrs, 45 mins, 39 secs
Best Speed 14.52MB/s
Worst Time 5 days, 05 hrs, 39 mins, 25 secs
Worst Speed 3.32MB/s