A simple BERT-based baseline for DataThon @ IndoML'24.
Data & Details: After registering here, you can get the data from here; download the raw data and store it in a directory (ideally called data/).
Preprocess: Run
python src/preprocess.py --data_dir <your_data_directory>
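As a rough illustration of the step above (the actual src/preprocess.py may be structured differently), the --data_dir flag can be parsed with argparse; the build_parser helper and the data/ default below are illustrative assumptions, not the repo's real code:

```python
import argparse
from pathlib import Path

# Hypothetical sketch of the CLI surface of src/preprocess.py;
# the real script's arguments and defaults may differ.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Preprocess raw DataThon data")
    parser.add_argument(
        "--data_dir",
        type=Path,
        default=Path("data"),  # matches the suggested data/ directory
        help="directory containing the raw downloaded files",
    )
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Preprocessing files under {args.data_dir} ...")
```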
Download BERT model and tokenizer: You also need the BERT model and tokenizer in the appropriate directories; run
python src/downloadBERT.py
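A plausible sketch of what src/downloadBERT.py does, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both are assumptions; the actual script may use a different model or directory layout):

```python
import os

# Hypothetical model name and output layout; the real script may differ.
MODEL_NAME = "bert-base-uncased"

def local_dir(model_name: str, root: str = "models") -> str:
    """Directory where the checkpoint and tokenizer are saved."""
    return os.path.join(root, model_name)

def download(model_name: str = MODEL_NAME) -> str:
    """Fetch the model and tokenizer once and save them for offline use."""
    # Imported lazily so the path helper above works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    target = local_dir(model_name)
    os.makedirs(target, exist_ok=True)
    AutoModel.from_pretrained(model_name).save_pretrained(target)
    AutoTokenizer.from_pretrained(model_name).save_pretrained(target)
    return target

if __name__ == "__main__":
    print(f"Saved to {download()}")
```

Saving with save_pretrained lets the trainer later load the model and tokenizer from a local path instead of hitting the network on every run.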
Train & Test: The rest of the code works in all configurations, from a single CPU to multi-GPU and multi-machine setups.
python3 src/trainer.py --output <some_output_column>
The code will automatically pick up multiple GPUs, or you can restrict which GPUs it sees by prefixing the command with CUDA_VISIBLE_DEVICES=x,y,z.
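For context on the CUDA_VISIBLE_DEVICES prefix: CUDA (and hence PyTorch) only exposes the GPU ids listed in that variable to the process. The variable's semantics are standard CUDA behavior, but the small parser below is only an illustration, not part of this repo:

```python
import os

def visible_gpu_ids(env=None):
    """Return the GPU ids a process launched with CUDA_VISIBLE_DEVICES=x,y,z
    would see, or None when the variable is unset (all GPUs visible).

    Note: only the numeric x,y,z form is handled here; CUDA also accepts
    GPU UUIDs, which this sketch ignores.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # no restriction: every GPU is visible
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    return [int(i) for i in ids]  # e.g. "0,2" maps to GPUs 0 and 2
```

So launching as CUDA_VISIBLE_DEVICES=0,2 python3 src/trainer.py --output <some_output_column> would restrict training to GPUs 0 and 2.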
Feel free to modify any components of this code as you see fit.