Releases: flashinfer-ai/flashinfer

v0.0.6

21 Jun 18:47
c146e06

0.0.6 (2024-06-21)

Performance Improvements

  • use 1x4 warp layout for small query length (not activated because of large binary size) (#322) (4e89b4d)

v0.0.5

20 Jun 08:42
a0297e7

0.0.5 (2024-06-20)

Highlights

Acknowledgement

We thank @ibsidorenko, @LiuXiaoxuanPKU, @Yard1, @AgrawalAmey, @xuzhenqi, @mgerstgrasser, @esmeetu, @yz-tang, @HSQ79815, @Qubitium, @shreygupta2809, @sighingnow, @vinx13, @tqchen, @merrymercy, @comaniac, and many others for their contributions and helpful discussions during the 0.0.5 release.

Refactor

  • support any GQA group size for tensor-cores kernels (#301) (c111ca)
  • support any page size for tensor-cores kernels (#306) (82fd8c)
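The two refactors above lift earlier restrictions on the tensor-core kernels: any GQA group size (the ratio of query heads to key/value heads) and any page size now work. As a toy illustration of the head mapping a GQA group size implies (a hypothetical sketch, not flashinfer code):

```python
# Hypothetical sketch of the GQA head mapping (not flashinfer code):
# with H_qo query heads and H_kv key/value heads, query head i reads
# KV head i // (H_qo // H_kv). Any H_qo divisible by H_kv now works.

def kv_head_for_query_head(qo_head: int, num_qo_heads: int, num_kv_heads: int) -> int:
    assert num_qo_heads % num_kv_heads == 0, "group size must be integral"
    group_size = num_qo_heads // num_kv_heads
    return qo_head // group_size

# Example: 32 query heads sharing 8 KV heads -> group size 4.
mapping = [kv_head_for_query_head(i, 32, 8) for i in range(32)]
```

With 32 query heads and 8 KV heads the group size is 4, so query heads 0–3 all read KV head 0; plain MHA is the group-size-1 case and MQA is the `num_kv_heads=1` extreme.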

Features

  • add use_tensor_cores option to decode kernels to accelerate GQA (#317) (3b50dd5)
  • add group gemm operators (#282) (e08ba42)
  • initial support of distributed operators (#289) (03553da)
  • initial support of logits hook (#298) (ab1e2ad)
  • Separate Q and KV dtypes for decode (#286) (5602659)
  • support cuda graph for batched multi-query(prefill/append) attention (#275) (83ceb67)
  • support cuda graph for batched multi-query(prefill/append) attention (#277) (24cc583)
  • support custom attention mask in prefill/append attention kernels (#266) (7304282)
  • fused speculative sampling kernels (#259) (cea2bb)
  • expose sampling APIs in pytorch (#238) (092902)
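The fused speculative sampling kernels (#259) run the standard accept/reject rule of speculative decoding on the GPU. A toy scalar sketch of that rule (a hypothetical illustration of the math, not the kernel's API):

```python
# Hypothetical sketch of the speculative-sampling accept/reject rule
# (standard speculative decoding; the fused kernel batches this on GPU).

def accept_prob(p_target: float, q_draft: float) -> float:
    # A draft token with target prob p and draft prob q is accepted
    # with probability min(1, p / q).
    return min(1.0, p_target / q_draft) if q_draft > 0 else 0.0

def residual_dist(p, q):
    # On rejection, resample from the normalized positive part of (p - q),
    # which keeps the overall output distribution exactly equal to p.
    r = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    z = sum(r)
    return [ri / z for ri in r] if z > 0 else p

p = [0.6, 0.3, 0.1]  # target model probabilities
q = [0.3, 0.6, 0.1]  # draft model probabilities
```

Here the draft over-proposes token 1, so it is accepted only half the time, and a rejection always falls back to token 0, the only token the draft under-proposed.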

v0.0.4

02 May 07:52
62343e6

0.0.4 (2024-05-01)

Features

  • pytorch 2.3 support
  • more gqa group sizes
  • add mma instructions for fp8 (#179) (d305798)
  • mma rowsum for fp8 (#180) (5af935c)
  • support any num_heads for get_alibi_slope (#200) (b217a6f)
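The `get_alibi_slope` change (#200) removes the power-of-two restriction on head counts. For reference, the slope schedule popularized by the ALiBi paper — geometric slopes for power-of-two head counts, with interleaved interpolation otherwise — can be sketched as (illustrative Python, not the library's implementation):

```python
import math

# Illustrative sketch of the standard ALiBi slope schedule for any
# number of heads (not flashinfer's implementation).

def get_alibi_slopes(n_heads: int) -> list[float]:
    def pow2_slopes(n: int) -> list[float]:
        # For power-of-two n, slopes are the geometric sequence
        # 2**(-8/n), 2**(-16/n), ..., 2**(-8).
        start = 2.0 ** (-(2.0 ** -(math.log2(n) - 3)))
        return [start * (start ** i) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return pow2_slopes(n_heads)
    # Otherwise: slopes of the closest smaller power of two, padded with
    # every other slope of the next larger power of two.
    closest = 2 ** math.floor(math.log2(n_heads))
    return (pow2_slopes(closest)
            + pow2_slopes(2 * closest)[0::2][: n_heads - closest])

slopes = get_alibi_slopes(8)  # 2**-1, 2**-2, ..., 2**-8
```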

Bug Fixes

  • fix python package dispatch error message (#182) (8eed01c)

v0.0.3

08 Mar 10:06
238563f

0.0.3 (2024-03-08)

Misc

  • add stream argument in BeginForwardFunction of TVMWrapper (#164) (fabfcb5)

Performance Improvements

  • multiply q by sm_scale in decode kernels (#144) (660c559)
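The change in #144 folds the softmax scale into the query once, instead of multiplying every attention logit by `sm_scale` inside the decode loop; the two are algebraically identical since (s·q)·k = s·(q·k). A toy sketch of that identity (illustrative only, assuming the usual sm_scale = 1/√head_dim):

```python
import math

def logits_scaled_after(q, keys, sm_scale):
    # Naive form: scale every dot product inside the loop over keys.
    return [sm_scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]

def logits_scaled_before(q, keys, sm_scale):
    # Optimized form: scale q once up front, then use plain dot products.
    qs = [qi * sm_scale for qi in q]
    return [sum(qi * ki for qi, ki in zip(qs, k)) for k in keys]

q = [1.0, 2.0, 3.0, 4.0]
keys = [[0.5, 0.1, 0.2, 0.3], [1.0, 0.0, 1.0, 0.0]]
sm_scale = 1.0 / math.sqrt(len(q))
```

Both produce the same logits, but the second form does one multiply per query element instead of one per key, which matters in a decode kernel that iterates over the whole KV cache.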

Release v0.0.2

16 Feb 11:38
1b75874

Changelog

  • Support RoPE position info in batch prefill/decode kernels #69 (C++ API only)
  • Use Torch's current stream for ops #111
  • Add pre-built wheels for different pytorch versions #110
  • Add pre-built wheels for py39 #114
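The RoPE change (#69) lets the batch prefill/decode kernels take per-token position info and apply the rotary embedding on the fly. For reference, the standard interleaved-pair RoPE rotation at a given position can be sketched as (toy Python illustrating the math, not the CUDA kernel):

```python
import math

# Toy sketch of rotary position embedding (RoPE), interleaved-pair
# variant: rotate each pair (x[2i], x[2i+1]) by pos * theta**(-2i/d).

def apply_rope(x: list[float], pos: int, theta: float = 10000.0) -> list[float]:
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [x[i] * c - x[i + 1] * s,
                x[i] * s + x[i + 1] * c]
    return out
```

Because each pair is a pure rotation, position 0 is the identity and the vector norm is preserved at every position — which is why the kernels can apply it in-register without changing attention's scale.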

Release v0.0.1

31 Jan 19:03
c55cd60