v0.1.2
christopher-w-murphy released this 26 Aug 18:38
The attention bias in MosaicBERT has `attn_bias.ndim == 4`, so I generalized `flash_attention_n` to accommodate this.
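As a rough illustration of what accepting a 4-D bias can involve, the sketch below promotes a lower-rank bias shape to four dimensions and checks that it broadcasts against the attention scores. It is not the code from this release: the helper name `promote_bias_shape`, its parameters, and the assumed `(batch, heads, q_len, k_len)` score layout are all hypothetical, and it works on shapes only so that it runs without any tensor library.

```python
from typing import Sequence, Tuple


def promote_bias_shape(
    bias_shape: Sequence[int],
    batch: int,
    heads: int,
    q_len: int,
    k_len: int,
) -> Tuple[int, int, int, int]:
    """Left-pad a bias shape to 4-D and check it broadcasts against the scores.

    Hypothetical helper; assumes scores are laid out as
    (batch, heads, q_len, k_len).
    """
    if not 2 <= len(bias_shape) <= 4:
        raise ValueError(
            f"expected a 2-D, 3-D, or 4-D bias, got {len(bias_shape)}-D"
        )
    # Broadcasting aligns trailing dimensions, so missing leading axes become 1.
    padded = (1,) * (4 - len(bias_shape)) + tuple(bias_shape)
    target = (batch, heads, q_len, k_len)
    for axis, (have, want) in enumerate(zip(padded, target)):
        # Each axis must either match the scores or be 1 (broadcastable).
        if have not in (1, want):
            raise ValueError(
                f"bias axis {axis} has size {have}, expected 1 or {want}"
            )
    return padded


# A 2-D bias shared across batch and heads gains two leading singleton axes.
print(promote_bias_shape((128, 128), batch=8, heads=12, q_len=128, k_len=128))
# A 4-D per-head bias (assumed layout) passes through unchanged.
print(promote_bias_shape((1, 12, 128, 128), batch=8, heads=12, q_len=128, k_len=128))
```

Padding with leading singleton axes follows the usual trailing-dimension broadcasting convention, so a bias that is already 4-D needs no reshaping and only has to pass the compatibility check.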