Working env

  • create a conda env (referred to below as the whisper environment)

  • install Python newer than 3.7 and older than 3.11 (for example, 3.10)

  • install Chocolatey (https://chocolatey.org/install); it is a Windows package manager, so this step applies only on Windows

  • install ffmpeg (the command below is for Debian/Ubuntu)

    sudo apt update && sudo apt install ffmpeg

  • install Rust build support (setuptools-rust, used when a package has to compile Rust extensions)

    pip install setuptools-rust

  • install youtube-dl using:

    sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
    sudo chmod a+rx /usr/local/bin/youtube-dl
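
As a quick sanity check of the steps above, the following sketch reports whether ffmpeg and youtube-dl are on the PATH and whether the active Python falls in the supported range. It assumes the environment was created and activated with something like `conda create -n whisper python=3.10` followed by `conda activate whisper`. The helper names `check_tool` and `python_version_ok` are hypothetical and not part of any tool listed here.

```shell
# Report whether a command is available on the PATH.
# Returns non-zero when the command is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
    return 1
  fi
}

# Succeed only for Python versions newer than 3.7 and older than 3.11.
# Expects a version string such as "3.10" or "3.10.4".
python_version_ok() {
  pv_major="${1%%.*}"
  pv_rest="${1#*.}"
  pv_minor="${pv_rest%%.*}"
  [ "$pv_major" -eq 3 ] && [ "$pv_minor" -gt 7 ] && [ "$pv_minor" -lt 11 ]
}

for tool in ffmpeg youtube-dl; do
  if ! check_tool "$tool"; then
    echo "  -> revisit the install step for $tool"
  fi
done

if command -v python >/dev/null 2>&1; then
  version="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if python_version_ok "$version"; then
    echo "python $version: supported"
  else
    echo "python $version: outside the 3.8 to 3.10 range"
  fi
else
  echo "python: MISSING"
fi
```

Running it inside the activated environment should list each tool as found; any MISSING line points back to the corresponding install step.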
    

路 馃 Repositories

In order to create and manage 🤗 Repositories (datasets, for example), we need to install the huggingface_hub CLI and run the login command. Following the 🤗 docs, this can be done by running these commands in our conda environment:

conda install -c conda-forge huggingface_hub
huggingface-cli login

The huggingface-cli login command asks for an access token, which we can create in our Hugging Face account settings (a token with write access is needed to create repositories). We only have to follow the instructions that appear in the terminal.
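
To confirm that the login worked, a minimal check could look like the sketch below. It relies on the `huggingface-cli whoami` subcommand, which prints the account the stored token belongs to. The helper name `hub_cli_available` is hypothetical, and newer releases of huggingface_hub may expose the same check under a different command name.

```shell
# True when the huggingface-cli executable is on the PATH
# (it should be once the conda environment is active).
hub_cli_available() {
  command -v huggingface-cli >/dev/null 2>&1
}

if hub_cli_available; then
  # Prints the user name tied to the stored token.
  huggingface-cli whoami || echo "whoami failed; try huggingface-cli login again"
else
  echo "huggingface-cli not found; activate the conda environment first"
fi
```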

路 馃 Datasets library

Before we start creating our flamenco audio dataset, we need to set up the environment and install the appropriate packages. Hugging Face (🤗) Datasets is a library for easily accessing and sharing datasets. It works on Python 3.7+. We will follow the installation instructions provided in the 🤗 docs.

We choose the conda option and install it in our whisper environment using:

conda install -c huggingface -c conda-forge datasets
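
Once the install finishes, it can be worth checking that the libraries import from the active environment. The sketch below prints the installed version of each package, or a notice when the import fails. `py_module_version` is a hypothetical helper, and the sketch assumes `python` resolves to the interpreter of the whisper environment.

```shell
# Print the version of an importable Python module, or a notice if the
# import fails (for example when the wrong environment is active).
py_module_version() {
  python -c "import $1; print('$1', $1.__version__)" 2>/dev/null \
    || echo "$1: not importable"
}

py_module_version datasets
py_module_version huggingface_hub
```

A "not importable" line usually means the packages were installed into a different environment than the one currently active.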