# BERT-based Starter Kit for IndoML'24 DataThon

A simple BERT-based baseline for the DataThon @ IndoML'24.

**Data & details:** After registering here, you can get the data here; download the raw data and store it in a directory (ideally called `data/`).

**Preprocess:** Run

```
python src/preprocess.py --data_dir <your_data_directory>
```

**Download BERT model and tokenizer:** You also need the BERT model and tokenizer in the appropriate directories; run:

```
python src/downloadBERT.py
```
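For orientation, here is a minimal sketch of what a download script like this typically does: fetch a BERT checkpoint and tokenizer once via Hugging Face `transformers` and cache them locally, so training runs need no network access. The checkpoint name (`bert-base-uncased`) and save path are assumptions, not taken from the kit's actual code.

```python
# Hypothetical sketch of a BERT download step; names and paths are assumptions.
import os

MODEL_NAME = "bert-base-uncased"            # assumed checkpoint
SAVE_DIR = os.path.join("models", MODEL_NAME)  # assumed save location

def download(model_name: str = MODEL_NAME, save_dir: str = SAVE_DIR) -> str:
    """Fetch the model and tokenizer and save both under save_dir."""
    # Imported inside the function so the sketch can be read (and its
    # constants used) without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    os.makedirs(save_dir, exist_ok=True)
    AutoTokenizer.from_pretrained(model_name).save_pretrained(save_dir)
    AutoModel.from_pretrained(model_name).save_pretrained(save_dir)
    return save_dir

if __name__ == "__main__":
    print(f"Saved BERT files to {download()}")
```

The actual script may use different model names or directories; check `src/downloadBERT.py` and point the trainer at whatever paths it writes.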

**Train & test:** The rest of the code runs on any configuration, from a single CPU through multi-GPU to multi-machine.

```
python3 src/trainer.py --output <some_output_column>
```

The code will automatically pick up multiple GPUs, or you can restrict which ones it uses by prefixing the command with `CUDA_VISIBLE_DEVICES=x,y,z`. Feel free to modify any components of this code as you see fit.
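To illustrate how that prefix works: `CUDA_VISIBLE_DEVICES` limits which GPUs the process can see, and CUDA renumbers the visible devices from 0 (so `CUDA_VISIBLE_DEVICES=2,3` appears as local devices 0 and 1 inside the trainer). A small stdlib-only sketch of that visibility rule, not code from this kit:

```python
# Illustrative sketch of CUDA_VISIBLE_DEVICES semantics (not the kit's code).
import os

def visible_gpu_count(env: dict) -> int:
    """Count GPUs a process may use, given an environment mapping like os.environ."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return -1  # unset: every GPU on the machine is visible
    raw = raw.strip()
    if not raw:
        return 0   # explicitly empty: no GPUs visible
    # "2,3" -> two visible devices, renumbered 0 and 1 inside the process
    return len([d for d in raw.split(",") if d.strip()])

# Launching as `CUDA_VISIBLE_DEVICES=0,1 python3 src/trainer.py ...`
# makes exactly two devices visible to the trainer:
assert visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1"}) == 2
```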

All the best!