WAV Russian Open Speech To Text (STT/ASR) Dataset
Anna Slizhikova and Alexander Veysov and Dmitry Voronin and Yuri Baburov

ru_open_stt_wav (52 files)
asr_calls_2_val.csv 1.57MB
asr_calls_2_val.tar.gz 814.18MB
asr_public_phone_calls_1.csv 35.51MB
asr_public_phone_calls_1.tar.gz 19.46GB
asr_public_phone_calls_2.csv 83.89MB
asr_public_phone_calls_2.tar.gz_aa 21.47GB
asr_public_phone_calls_2.tar.gz_ab 21.47GB
asr_public_phone_calls_2.tar.gz_ac 12.61GB
asr_public_stories_1.csv 6.64MB
asr_public_stories_1.tar.gz 4.01GB
asr_public_stories_2.csv 10.24MB
asr_public_stories_2.tar.gz 8.07GB
audiobooks_2.tar.gz_aa 21.47GB
audiobooks_2.tar.gz_ab 21.47GB
audiobooks_2.tar.gz_ac 21.47GB
audiobooks_2.tar.gz_ad 21.47GB
audiobooks_2.tar.gz_ae 21.47GB
audiobooks_2.tar.gz_af 21.47GB
audiobooks_2.tar.gz_ag 12.53GB
buriy_audiobooks_2_val.csv 1.06MB
buriy_audiobooks_2_val.tar.gz 500.50MB
private_buriy_audiobooks_2.csv 164.23MB
public_lecture_1.csv 925.52kB
public_lecture_1.tar.gz 633.72MB
public_meta_data_v03.csv 1.47GB
public_series_1.csv 2.71MB
public_series_1.tar.gz 1.82GB
public_youtube1120.csv 196.84MB
public_youtube1120.tar.gz 114.39GB
public_youtube1120_hq.csv 53.73MB
public_youtube1120_hq.tar.gz 28.97GB
public_youtube700.csv 104.25MB
public_youtube700.tar.gz_aa 21.47GB
public_youtube700.tar.gz_ab 21.47GB
public_youtube700.tar.gz_ac 21.47GB
public_youtube700.tar.gz_ad 7.44GB
public_youtube700_val.csv 971.59kB
public_youtube700_val.tar.gz 471.52MB
radio_2.csv 68.45MB
radio_2.tar.gz 144.70GB
ru_RU.csv 667.64kB
ru_ru.tar.gz 1.48GB
russian_single.csv 443.27kB
russian_single.tar.gz 768.00MB
tts_russian_addresses_rhvoice_4voices.csv 288.19MB
tts_russian_addresses_rhvoice_4voices.tar 10.01MB
tts_russian_addresses_rhvoice_4voices.tar.gz_aa 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ab 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ac 21.47GB
tts_russian_addresses_rhvoice_4voices.tar.gz_ad 6.59GB
voxforge_ru.csv 957.10kB
voxforge_ru.tar.gz 1.56GB
Type: Dataset
Tags: Dataset, russian, asr, stt
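
Several archives in the file list are shipped as multiple parts with suffixes such as `_aa`, `_ab`, `_ac`. The sketch below shows one way to group those parts and join them back into a single `.tar.gz` before extraction. It assumes the parts were produced by a plain byte-wise split, so that concatenating them in suffix order restores the original archive; this is not confirmed by the listing itself. The helper names `group_parts` and `join_parts` are hypothetical.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Matches split-archive parts such as "audiobooks_2.tar.gz_aa".
PART_RE = re.compile(r"^(?P<base>.+\.tar\.gz)_(?P<suffix>[a-z]{2})$")


def group_parts(names):
    """Map each archive name to its parts, sorted by suffix (aa, ab, ...)."""
    groups = defaultdict(list)
    for name in names:
        match = PART_RE.match(name)
        if match:
            groups[match.group("base")].append(name)
    return {base: sorted(parts) for base, parts in groups.items()}


def join_parts(directory, base, parts):
    """Concatenate the parts byte-for-byte into directory/base.

    Assumption: the parts are consecutive slices of one archive.
    """
    directory = Path(directory)
    target = directory / base
    with open(target, "wb") as out:
        for part in parts:
            with open(directory / part, "rb") as src:
                shutil.copyfileobj(src, out)
    return target


if __name__ == "__main__":
    download_dir = Path("ru_open_stt_wav")  # directory name from the listing
    if download_dir.is_dir():
        names = [p.name for p in download_dir.iterdir()]
        for base, parts in group_parts(names).items():
            print("joining", parts, "->", join_parts(download_dir, base, parts))
```

The joined file can then be opened with the standard `tarfile` module or any tar tool. Note that joining needs roughly as much free disk space again as the parts themselves.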

Bibtex:
@article{,
title= {WAV Russian Open Speech To Text (STT/ASR) Dataset},
journal= {},
author= {Anna Slizhikova and Alexander Veysov and Dmitry Voronin and Yuri Baburov},
year= {},
url= {https://github.com/snakers4/open_stt/},
abstract= {v0.5-beta added the forgotten txt files

Arguably the largest public Russian STT dataset to date:

7 million utterances;
7000 hours;
850 GB (in .wav format, int16).

For more information, see https://github.com/snakers4/open_stt/},
keywords= {Dataset, russian, asr, stt},
terms= {},
license= {https://github.com/snakers4/open_stt/#license},
superseded= {}
}
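
As a rough sanity check on the figures in the abstract, the following sketch estimates the raw audio size and the mean utterance length. The hour count, utterance count, and int16 sample format come from the abstract; the 16 kHz sample rate and mono channel layout are assumptions that the listing does not state.

```python
HOURS = 7000              # from the abstract
UTTERANCES = 7_000_000    # "7m utterances" from the abstract
BYTES_PER_SAMPLE = 2      # int16
SAMPLE_RATE_HZ = 16_000   # assumption: not stated in this listing
CHANNELS = 1              # assumption: mono audio

seconds = HOURS * 3600
raw_bytes = seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
mean_utterance_s = seconds / UTTERANCES

print(f"raw PCM size: {raw_bytes / 1e9:.1f} GB")
print(f"mean utterance length: {mean_utterance_s:.1f} s")
```

Under these assumptions the estimate is of the same order as the stated 850 GB. The difference could come from rounding of the hour count, per-file WAV headers, or other factors not visible here.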

