OPUS Russian Open Speech To Text Dataset v1.01
Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin

folder ru_open_stt_opus (38 files)
manifests/tts_russian_addresses_rhvoice_4voices.csv 220.26MB
manifests/radio_v4_manifest.csv 515.81MB
manifests/radio_v4_add_manifest.csv 7.03MB
manifests/radio_pspeech_sample_manifest.csv 32.76MB
manifests/radio_2.csv 43.04MB
manifests/public_youtube700_val.csv 679.15kB
manifests/public_youtube700.csv 74.60MB
manifests/public_youtube1120_hq.csv 39.34MB
manifests/public_youtube1120.csv 141.83MB
manifests/public_speech_manifest.csv 132.35MB
manifests/public_series_1.csv 1.92MB
manifests/public_lecture_1.csv 660.11kB
manifests/private_buriy_audiobooks_2.csv 119.40MB
manifests/buriy_audiobooks_2_val.csv 744.95kB
manifests/asr_public_stories_2.csv 7.19MB
manifests/asr_public_stories_1.csv 4.84MB
manifests/asr_public_phone_calls_2.csv 60.34MB
manifests/asr_public_phone_calls_1.csv 26.39MB
manifests/asr_calls_2_val.csv 1.05MB
archives/tts_russian_addresses_rhvoice_4voices.tar.gz 13.86GB
archives/radio_v4_manifest.tar.gz 189.01GB
archives/radio_v4_add_manifest.tar.gz 3.04GB
archives/radio_pspeech_sample_manifest.tar.gz 12.27GB
archives/radio_2.tar.gz 26.45GB
archives/public_youtube700_val.tar.gz 469.33MB
archives/public_youtube700.tar.gz 13.09GB
archives/public_youtube1120_hq.tar.gz 5.31GB
archives/public_youtube1120.tar.gz 20.43GB
archives/public_speech_manifest.tar.gz 50.94GB
archives/public_series_1.tar.gz 319.23MB
archives/public_lecture_1.tar.gz 122.51MB
archives/private_buriy_audiobooks_2.tar.gz 27.74GB
archives/buriy_audiobooks_2_val.tar.gz 496.48MB
archives/asr_public_stories_2.tar.gz 1.50GB
archives/asr_public_stories_1.tar.gz 719.09MB
archives/asr_public_phone_calls_2.tar.gz 10.12GB
archives/asr_public_phone_calls_1.tar.gz 3.41GB
archives/asr_calls_2_val.tar.gz 805.25MB
Type: Dataset
Tags: Dataset, russian, asr, stt, TTS

Bibtex:
@article{,
title= {OPUS Russian Open Speech To Text Dataset v1.01},
journal= {},
author= {Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin},
year= {},
url= {https://github.com/snakers4/open_stt/},
abstract= {v1.0-beta 

Arguably the largest public Russian STT dataset to date:
15 million utterances;
20,000 hours;
2.3 TB (when stored as mono int16 .wav files);

For more information please visit  https://github.com/snakers4/open_stt/},
keywords= {Dataset, russian, asr, stt, TTS},
terms= {https://github.com/snakers4/open_stt/#license},
license= {CC-NC-BY},
superseded= {}
}
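
As a rough consistency check on the figures quoted in the abstract, the sketch below estimates the size of 20,000 hours of mono int16 audio and the average utterance length. The 16 kHz sample rate is an assumption (it is not stated on this page), the helper name `raw_pcm_bytes` is hypothetical, and .wav header overhead is ignored.

```python
# Back-of-the-envelope check of the numbers in the abstract.
HOURS = 20_000              # from the abstract
UTTERANCES = 15_000_000     # from the abstract
BYTES_PER_SAMPLE = 2        # int16, from the abstract
CHANNELS = 1                # mono, from the abstract
SAMPLE_RATE_HZ = 16_000     # ASSUMPTION: not stated on this page


def raw_pcm_bytes(hours, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio in bytes, ignoring .wav headers."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels


total_bytes = raw_pcm_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
avg_utterance_s = HOURS * 3600 / UTTERANCES

print(f"{total_bytes / 1e12:.2f} TB")          # decimal terabytes
print(f"{avg_utterance_s:.1f} s per utterance")
```

Under the assumed sample rate the estimate comes to about 2.3 TB, which matches the quoted size, and the average utterance is a little under five seconds.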

