WAV Russian Open Speech To Text (STT/ASR) Dataset v0.3-beta
Anna Slizhikova, Alexander Veysov, Dmitry Voronin, and Yuri Baburov

ru_open_stt_wav (40 files)
asr_public_phone_calls_1.csv 35.51MB
asr_public_phone_calls_1.tar.gz 19.46GB
asr_public_phone_calls_2.csv 83.89MB
asr_public_phone_calls_2.tar.gz_aa 21.47GB
asr_public_phone_calls_2.tar.gz_ab 21.47GB
asr_public_phone_calls_2.tar.gz_ac 12.61GB
asr_public_stories_1.csv 6.64MB
asr_public_stories_1.tar.gz 4.01GB
asr_public_stories_2.csv 10.24MB
asr_public_stories_2.tar.gz 8.07GB
audiobooks_2.tar.gz_aa 21.47GB
audiobooks_2.tar.gz_ab 21.47GB
audiobooks_2.tar.gz_ac 21.47GB
audiobooks_2.tar.gz_ad 21.47GB
audiobooks_2.tar.gz_ae 21.47GB
audiobooks_2.tar.gz_af 21.47GB
audiobooks_2.tar.gz_ag 12.53GB
private_buriy_audiobooks_2.csv 164.23MB
public_lecture_1.csv 925.52kB
public_lecture_1.tar.gz 633.72MB
public_meta_data_v03.csv 1.47GB
public_series_1.csv 2.71MB
public_series_1.tar.gz 1.82GB
public_youtube700.csv 104.25MB
public_youtube700.tar.gz_aa 21.47GB
public_youtube700.tar.gz_ab 21.47GB
public_youtube700.tar.gz_ac 21.47GB
public_youtube700.tar.gz_ad 7.44GB
ru_RU.csv 667.64kB
ru_ru.tar.gz 1.48GB
russian_single.csv 443.27kB
russian_single.tar.gz 768.00MB
tts_russian_addresses_rhvoice_4voices.csv 288.19MB
tts_russian_addresses_rhvoice_4voices.tar 10.01MB
tts_russian_addresses_rhvoice_4voices.tar.gz_aa 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ab 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ac 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ad 6.59GB
voxforge_ru.csv 957.10kB
voxforge_ru.tar.gz 1.56GB
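
Several archives above are published in parts with suffixes such as _aa, _ab, _ac. The listing does not say how the parts were produced. The suffixes look like the output of a byte-wise split, so the sketch below assumes the parts only need to be concatenated in alphabetical order to recover the original .tar.gz before extraction. The function name join_parts and its parameters are hypothetical, chosen for illustration only.

```python
import shutil
from pathlib import Path


def join_parts(directory, prefix, output_name=None):
    """Concatenate prefix_aa, prefix_ab, ... into a single file.

    Assumption: the parts are plain byte-wise chunks of one archive,
    so joining them in alphabetical order restores the original file.
    """
    directory = Path(directory)
    # Match only two-letter lowercase suffixes such as _aa, _ab, _ac.
    parts = sorted(directory.glob(prefix + "_[a-z][a-z]"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {prefix!r} in {directory}")

    output = directory / (output_name or prefix)
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks; the parts are far too large to read at once.
                shutil.copyfileobj(src, out)
    return output
```

For example, join_parts("ru_open_stt_wav", "audiobooks_2.tar.gz") would write ru_open_stt_wav/audiobooks_2.tar.gz from the seven audiobooks_2 parts, provided the assumption about byte-wise splitting holds. The joined file needs as much free disk space again as the parts themselves.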
Type: Dataset
Tags: Dataset, russian, asr, stt

Bibtex:
@article{,
title= {WAV Russian Open Speech To Text (STT/ASR) Dataset v0.3-beta},
journal= {},
author= {Anna Slizhikova and Alexander Veysov and Dmitry Voronin and Yuri Baburov},
year= {},
url= {https://github.com/snakers4/open_stt/},
abstract= {!!! WAV version !!!
v0.4-alpha, added the forgotten txt files

Arguably the largest public Russian STT dataset to date:

4.6 million utterances;
4000 hours;
431 GB (in .wav format, int16).
For more information, see https://github.com/snakers4/open_stt/},
keywords= {Dataset, russian, asr, stt},
terms= {},
license= {https://github.com/snakers4/open_stt/#license},
superseded= {a12a08b39cf3626407e10e01126cf27c198446c2}
}
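
The abstract gives a total duration (4000 hours), a total size (431 GB), and a sample format (int16). As a rough plausibility check, these figures imply an average sampling rate. The sketch below works it out under assumptions that the listing does not confirm: single-channel audio, negligible WAV header overhead, and either decimal or binary gigabytes. The function name implied_sample_rate is hypothetical.

```python
def implied_sample_rate(total_gb, hours, bytes_per_sample=2, channels=1,
                        bytes_per_gb=10**9):
    """Average samples per second implied by total size and duration.

    Assumptions: int16 samples (2 bytes), mono audio, and no allowance
    for WAV headers or rounding in the published totals.
    """
    total_bytes = total_gb * bytes_per_gb
    seconds = hours * 3600
    return total_bytes / seconds / (bytes_per_sample * channels)


if __name__ == "__main__":
    # Try both common meanings of "GB", since the listing does not specify one.
    for label, unit in (("GB = 10^9 bytes", 10**9), ("GB = 2^30 bytes", 2**30)):
        rate = implied_sample_rate(431, 4000, bytes_per_gb=unit)
        print(f"{label}: about {rate:.0f} samples per second")
```

Both results land near 15 to 16 thousand samples per second, which would be plausible for 16 kHz mono audio. The listing does not state the sampling rate, so this is only a consistency check on the published totals.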

