OPUS Russian Open Speech To Text Dataset v1.01
Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin

ru_open_stt_opus (38 files)
archives/asr_calls_2_val.tar.gz 805.25MB
archives/asr_public_phone_calls_1.tar.gz 3.41GB
archives/asr_public_phone_calls_2.tar.gz 10.12GB
archives/asr_public_stories_1.tar.gz 719.09MB
archives/asr_public_stories_2.tar.gz 1.50GB
archives/buriy_audiobooks_2_val.tar.gz 496.48MB
archives/private_buriy_audiobooks_2.tar.gz 27.74GB
archives/public_lecture_1.tar.gz 122.51MB
archives/public_series_1.tar.gz 319.23MB
archives/public_speech_manifest.tar.gz 50.94GB
archives/public_youtube1120.tar.gz 20.43GB
archives/public_youtube1120_hq.tar.gz 5.31GB
archives/public_youtube700.tar.gz 13.09GB
archives/public_youtube700_val.tar.gz 469.33MB
archives/radio_2.tar.gz 26.45GB
archives/radio_pspeech_sample_manifest.tar.gz 12.27GB
archives/radio_v4_add_manifest.tar.gz 3.04GB
archives/radio_v4_manifest.tar.gz 189.01GB
archives/tts_russian_addresses_rhvoice_4voices.tar.gz 13.86GB
manifests/asr_calls_2_val.csv 1.05MB
manifests/asr_public_phone_calls_1.csv 26.39MB
manifests/asr_public_phone_calls_2.csv 60.34MB
manifests/asr_public_stories_1.csv 4.84MB
manifests/asr_public_stories_2.csv 7.19MB
manifests/buriy_audiobooks_2_val.csv 744.95kB
manifests/private_buriy_audiobooks_2.csv 119.40MB
manifests/public_lecture_1.csv 660.11kB
manifests/public_series_1.csv 1.92MB
manifests/public_speech_manifest.csv 132.35MB
manifests/public_youtube1120.csv 141.83MB
manifests/public_youtube1120_hq.csv 39.34MB
manifests/public_youtube700.csv 74.60MB
manifests/public_youtube700_val.csv 679.15kB
manifests/radio_2.csv 43.04MB
manifests/radio_pspeech_sample_manifest.csv 32.76MB
manifests/radio_v4_add_manifest.csv 7.03MB
manifests/radio_v4_manifest.csv 515.81MB
manifests/tts_russian_addresses_rhvoice_4voices.csv 220.26MB
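
The listing above pairs each path with a human-readable size. As a rough sketch of how the total download size per directory could be computed from it, the snippet below parses lines of that shape. The helper names `parse_size` and `total_bytes` are hypothetical, and treating the units as decimal (1 MB = 10^6 bytes) is an assumption, since the listing does not say whether they are decimal or binary.

```python
import re

# Decimal multipliers. Assumption: the listing may in fact use binary units.
UNITS = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> float:
    """Convert a size such as '805.25MB' into a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(kB|MB|GB|TB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def total_bytes(listing: str, prefix: str) -> float:
    """Sum the sizes of all listed files whose path starts with `prefix`."""
    total = 0.0
    for line in listing.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Each line is '<path> <size>'; the size is the last space-separated field.
        _path, size = line.rsplit(" ", 1)
        total += parse_size(size)
    return total


# Three lines copied from the listing above, used as a small sample.
sample = """archives/public_lecture_1.tar.gz 122.51MB
archives/public_series_1.tar.gz 319.23MB
manifests/public_lecture_1.csv 660.11kB"""

print(f"{total_bytes(sample, 'archives/') / 1e6:.2f} MB in sample archives")
```

Passing the full listing instead of `sample` would give the combined size of the `archives/` or `manifests/` entries under the same unit assumption.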
Type: Dataset
Tags: Dataset, russian, asr, stt, TTS

Bibtex:
@article{,
title= {OPUS Russian Open Speech To Text Dataset v1.01},
journal= {},
author= {Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin},
year= {},
url= {https://github.com/snakers4/open_stt/},
abstract= {v1.0-beta 

Arguably the largest public Russian STT dataset to date:
15 million utterances;
20 000 hours;
2.3 TB (in mono .wav format in int16);

For more information, please visit https://github.com/snakers4/open_stt/},
keywords= {Dataset, russian, asr, stt, TTS},
terms= {https://github.com/snakers4/open_stt/#license},
license= {CC-NC-BY},
superseded= {}
}
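
The abstract quotes both a duration (20 000 hours) and an uncompressed size (2.3 TB as mono int16 .wav). The two figures can be cross-checked with a short calculation. This is only a sketch: the 16 kHz sample rate is an assumption not stated in the listing, the function name `pcm_bytes` is hypothetical, and .wav header overhead is ignored.

```python
def pcm_bytes(hours: float,
              sample_rate_hz: int = 16_000,  # assumption: not stated in the listing
              bytes_per_sample: int = 2,     # int16 = 2 bytes per sample
              channels: int = 1) -> float:   # mono
    """Approximate size of raw PCM audio, ignoring .wav header overhead."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels


size = pcm_bytes(20_000)
print(f"{size / 1e12:.2f} TB")
```

Under these assumptions the result comes out at about 2.3 TB (decimal), which is consistent with the figure in the abstract.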
