WAV Russian Open Speech To Text (STT/ASR) Dataset v1.0-beta
Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin and Yuri Baburov

ru_open_stt_wav (73 files)
asr_calls_2_val.csv 1.57MB
asr_calls_2_val.tar.gz 814.18MB
asr_public_phone_calls_1.csv 35.51MB
asr_public_phone_calls_1.tar.gz 19.46GB
asr_public_phone_calls_2.csv 83.89MB
asr_public_phone_calls_2.tar.gz_aa 21.47GB
asr_public_phone_calls_2.tar.gz_ab 21.47GB
asr_public_phone_calls_2.tar.gz_ac 12.61GB
asr_public_stories_1.csv 6.64MB
asr_public_stories_1.tar.gz 4.01GB
asr_public_stories_2.csv 10.24MB
asr_public_stories_2.tar.gz 8.07GB
audiobooks_2.tar.gz_aa 21.47GB
audiobooks_2.tar.gz_ab 21.47GB
audiobooks_2.tar.gz_ac 21.47GB
audiobooks_2.tar.gz_ad 21.47GB
audiobooks_2.tar.gz_ae 21.47GB
audiobooks_2.tar.gz_af 21.47GB
audiobooks_2.tar.gz_ag 12.53GB
buriy_audiobooks_2_val.csv 1.06MB
buriy_audiobooks_2_val.tar.gz 500.50MB
private_buriy_audiobooks_2.csv 164.23MB
public_lecture_1.csv 925.52kB
public_lecture_1.tar.gz 633.72MB
public_meta_data_v03.csv 1.47GB
public_series_1.csv 2.71MB
public_series_1.tar.gz 1.82GB
public_speech.tar.gz 256.95GB
public_speech_manifest.csv 130.65MB
public_youtube1120.csv 196.84MB
public_youtube1120.tar.gz 114.39GB
public_youtube1120_hq.csv 53.73MB
public_youtube1120_hq.tar.gz 28.97GB
public_youtube700.csv 104.25MB
public_youtube700.tar.gz_aa 21.47GB
public_youtube700.tar.gz_ab 21.47GB
public_youtube700.tar.gz_ac 21.47GB
public_youtube700.tar.gz_ad 7.44GB
public_youtube700_val.csv 971.59kB
public_youtube700_val.tar.gz 471.52MB
radio_2.csv 68.45MB
radio_2.tar.gz 144.70GB
radio_v4_0.tar.gz 66.10GB
radio_v4_1.tar.gz 66.08GB
radio_v4_2.tar.gz 66.30GB
radio_v4_3.tar.gz 66.16GB
radio_v4_4.tar.gz 66.21GB
radio_v4_5.tar.gz 66.08GB
radio_v4_6.tar.gz 66.19GB
radio_v4_7.tar.gz 66.34GB
radio_v4_8.tar.gz 66.02GB
radio_v4_9.tar.gz 66.30GB
radio_v4_a.tar.gz 66.21GB
radio_v4_add.tar.gz 15.72GB
radio_v4_add_manifest.csv 6.94MB
radio_v4_b.tar.gz 66.25GB
radio_v4_c.tar.gz 66.23GB
radio_v4_d.tar.gz 66.04GB
radio_v4_e.tar.gz 65.98GB
radio_v4_f.tar.gz 66.28GB
radio_v4_manifest.csv 508.21MB
ru_RU.csv 667.64kB
ru_ru.tar.gz 1.48GB
russian_single.csv 443.27kB
russian_single.tar.gz 768.00MB
tts_russian_addresses_rhvoice_4voices.csv 288.19MB
tts_russian_addresses_rhvoice_4voices.tar 10.01MB
tts_russian_addresses_rhvoice_4voices.tar.gz_aa 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ab 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ac 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ad 6.59GB
voxforge_ru.csv 957.10kB
voxforge_ru.tar.gz 1.56GB
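
Several archives in the list above are shipped in pieces with suffixes such as `_aa`, `_ab`, `_ac`. The sketch below shows one way to join such pieces back into a single archive before extraction. It assumes the pieces are sequential byte chunks of one file (the naming matches what the Unix `split` tool produces), which this page does not confirm. The helper name `join_parts` is hypothetical.

```python
import shutil
from pathlib import Path


def join_parts(directory: Path, base_name: str) -> Path:
    """Concatenate base_name_aa, base_name_ab, ... into base_name.

    Assumption: the two-letter suffixes sort in the order in which
    the chunks were originally cut from the archive.
    """
    parts = sorted(directory.glob(base_name + "_??"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {base_name}")
    target = directory / base_name
    with target.open("wb") as out:
        for part in parts:
            # Stream each chunk so multi-gigabyte parts never sit in memory.
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return target
```

For example, `join_parts(Path("ru_open_stt_wav"), "audiobooks_2.tar.gz")` would write `audiobooks_2.tar.gz` next to its seven parts. The joined file needs roughly as much free disk space again as the parts themselves.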
Type: Dataset
Tags: Dataset, russian, asr, stt, TTS

Bibtex:
@article{,
title= {WAV Russian Open Speech To Text (STT/ASR) Dataset v1.0-beta},
journal= {},
author= {Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin and Yuri Baburov},
year= {},
url= {https://github.com/snakers4/open_stt/},
abstract= {v1.0-beta 

Arguably the largest public Russian STT dataset to date:
15 million utterances;
20,000 hours;
2.3 TB (mono .wav files with int16 samples).

For more information please visit https://github.com/snakers4/open_stt/},
keywords= {Dataset, Russian, ASR, STT, TTS},
terms= {},
license= {https://github.com/snakers4/open_stt/#license},
superseded= {}
}
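
The figures in the abstract can be cross-checked against each other. The sketch below derives the implied sample rate and average utterance length from the stated totals. It assumes "2.3 TB" means decimal terabytes and ignores WAV header overhead; the sample rate itself is not stated on this page.

```python
# Totals as stated in the abstract.
TOTAL_BYTES = 2.3e12        # assumption: 1 TB = 10**12 bytes
TOTAL_HOURS = 20_000
UTTERANCES = 15_000_000
BYTES_PER_SAMPLE = 2        # int16, mono

total_seconds = TOTAL_HOURS * 3600
bytes_per_second = TOTAL_BYTES / total_seconds
samples_per_second = bytes_per_second / BYTES_PER_SAMPLE
avg_utterance_seconds = total_seconds / UTTERANCES

print(f"implied sample rate: {samples_per_second:.0f} Hz")
print(f"average utterance length: {avg_utterance_seconds:.1f} s")
```

Under these assumptions the implied rate comes out close to 16 kHz, a common rate for speech corpora, so the three headline numbers are mutually consistent.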


10 day statistics (2 downloads taking more than 30 seconds)

Average Time 2 days, 15 hours, 56 minutes, 45 seconds
Average Speed 8.70MB/s
Best Time 13 hours, 27 minutes, 21 seconds
Best Speed 41.33MB/s
Worst Time 4 days, 18 hours, 26 minutes, 09 seconds
Worst Speed 4.86MB/s