Speech data recording, storing, pre-processing
Recording:
- 5-10 hours of speech
- Texts that are not copyrighted, or that we are allowed to use and share under an open licence
- Preferably record studio-quality audio (44.1 kHz, 16-bit minimum) in .wav format
- Collect speaker metadata and signed informed consent forms
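The 44.1 kHz / 16-bit minimum can be checked automatically once files start coming in. The following is only a minimal sketch using Python's standard `wave` module; the function names `check_wav` and `check_folder` are made up for illustration, and the `wave` module reads plain PCM .wav files only, so other encodings may need a different library.

```python
import wave
from pathlib import Path

MIN_SAMPLE_RATE = 44100  # Hz, from the recording requirements
MIN_SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit


def check_wav(path):
    """Return a list of problems found in one .wav file (empty if none)."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    problems = []
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if width < MIN_SAMPLE_WIDTH:
        problems.append(f"bit depth {width * 8} is below {MIN_SAMPLE_WIDTH * 8}")
    return problems


def check_folder(folder):
    """Map each problematic .wav filename in a folder to its problems."""
    report = {}
    for path in sorted(Path(folder).glob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[path.name] = problems
    return report
```

Running `check_folder` on the recording folder after each session would flag files that need to be re-recorded while the speaker is still available.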
While recording, keep track of mistakes, hesitations, and any changes the speaker makes to the text while reading!
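If the text as actually spoken is noted down next to the prompt text, the differences can be listed automatically. This is a rough sketch using Python's standard `difflib`; the function name `text_changes` and the word-level comparison are assumptions for illustration, and hesitations or mispronunciations that do not change the wording still need to be logged by hand.

```python
import difflib


def text_changes(prompt, spoken):
    """List word-level differences between the prompt and what was read.

    Returns tuples of (operation, prompt_words, spoken_words), where
    operation is "replace", "delete" or "insert".
    """
    prompt_words = prompt.split()
    spoken_words = spoken.split()
    matcher = difflib.SequenceMatcher(a=prompt_words, b=spoken_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(prompt_words[i1:i2]), " ".join(spoken_words[j1:j2]))
            )
    return changes


# Example: the speaker read "a" instead of "the".
print(text_changes("the cat sat on the mat", "the cat sat on a mat"))
# → [('replace', 'the', 'a')]
```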
Storing:
- Decide where to store a) the speech data and b) the metadata
Pre-processing:
- Align speech data and text at the phoneme level (use WebMAUS)
- Use a consistent filename system
- Keep one large data table describing the content and metadata of EACH file, keyed by filename
- The table allows mass renaming and other batch Python operations on the files
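As an illustration of how such a table can drive mass renaming, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption: the table is taken to be a CSV file with the columns `filename`, `speaker`, `session` and `utterance`, and the naming pattern `speaker_sNN_uNNNN.wav` is a placeholder for whatever scheme is actually agreed on.

```python
import csv
from pathlib import Path


def build_name(speaker, session, utterance):
    """Build a filename from table fields (hypothetical naming pattern)."""
    return f"{speaker}_s{int(session):02d}_u{int(utterance):04d}.wav"


def read_table(table_path):
    """Read the data table (assumed to be CSV) into a list of dict rows."""
    with open(table_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def plan_renames(rows):
    """Return (old_name, new_name) pairs for files whose name must change."""
    plan = []
    for row in rows:
        new_name = build_name(row["speaker"], row["session"], row["utterance"])
        if new_name != row["filename"]:
            plan.append((row["filename"], new_name))
    targets = [new for _, new in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("two rows map to the same new filename")
    return plan


def apply_renames(folder, plan, dry_run=True):
    """Validate the whole plan first, then rename unless dry_run is set."""
    folder = Path(folder)
    for old, new in plan:
        if not (folder / old).exists():
            raise FileNotFoundError(folder / old)
        if (folder / new).exists():
            raise FileExistsError(folder / new)
    if not dry_run:
        for old, new in plan:
            (folder / old).rename(folder / new)
    return plan
```

The dry run is the default so that a plan can be inspected before any file is touched. After a real rename, the `filename` column of the table needs to be updated as well so that the table and the files stay in sync.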