bug: partial unit tests failed #479

Open
zhyncs opened this issue Aug 28, 2024 · 2 comments
zhyncs (Member) commented Aug 28, 2024

latest main, A100

for ele in $(ls); do python3 -m pytest ${ele}; done
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0
rootdir: /flashinfer/python
plugins: anyio-4.4.0
collected 270 items

test_alibi.py ..............F..............F.........................................................................................................................sss..ss...s... [ 61%]
......sss..ss...s.........sss..ss...s.........sss..ss...s.........sss..ss...s.........sss..ss...s........                                                                           [100%]

======================================================================================== FAILURES =========================================================================================
_________________________________________________________________________ test_single_decode_alibi[128-32-33001] __________________________________________________________________________

seq_len = 33001, num_heads = 32, head_dim = 128

    @pytest.mark.parametrize("seq_len", [1, 9, 81, 729, 33001])
    @pytest.mark.parametrize("num_heads", [4, 8, 32])
    @pytest.mark.parametrize("head_dim", [128, 256])
    def test_single_decode_alibi(
        seq_len,
        num_heads,
        head_dim,
    ):
        q = torch.randn(num_heads, head_dim).to(0).half()
        k = torch.randn(seq_len, num_heads, head_dim).to(0).half()
        v = torch.randn(seq_len, num_heads, head_dim).to(0).half()

        o = flashinfer.single_decode_with_kv_cache(q, k, v, pos_encoding_mode="ALIBI")
        mask = torch.ones(1, seq_len, dtype=torch.bool).to(0)
        o_ref = alibi_attention(q.unsqueeze(0), k, v, mask).squeeze(0)
>       torch.testing.assert_close(
            o.cpu().numpy(), o_ref.cpu().numpy(), rtol=1e-3, atol=1e-3
        )
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 7 / 4096 (0.2%)
E       Greatest absolute difference: 0.00244140625 at index (2, 1) (up to 0.001 allowed)
E       Greatest relative difference: 3.302734375 at index (2, 19) (up to 0.001 allowed)

test_alibi.py:39: AssertionError
_________________________________________________________________________ test_single_decode_alibi[256-32-33001] __________________________________________________________________________

seq_len = 33001, num_heads = 32, head_dim = 256

    @pytest.mark.parametrize("seq_len", [1, 9, 81, 729, 33001])
    @pytest.mark.parametrize("num_heads", [4, 8, 32])
    @pytest.mark.parametrize("head_dim", [128, 256])
    def test_single_decode_alibi(
        seq_len,
        num_heads,
        head_dim,
    ):
        q = torch.randn(num_heads, head_dim).to(0).half()
        k = torch.randn(seq_len, num_heads, head_dim).to(0).half()
        v = torch.randn(seq_len, num_heads, head_dim).to(0).half()

        o = flashinfer.single_decode_with_kv_cache(q, k, v, pos_encoding_mode="ALIBI")
        mask = torch.ones(1, seq_len, dtype=torch.bool).to(0)
        o_ref = alibi_attention(q.unsqueeze(0), k, v, mask).squeeze(0)
>       torch.testing.assert_close(
            o.cpu().numpy(), o_ref.cpu().numpy(), rtol=1e-3, atol=1e-3
        )
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 3 / 8192 (0.0%)
E       Greatest absolute difference: 0.0018310546875 at index (0, 227) (up to 0.001 allowed)
E       Greatest relative difference: 0.01119232177734375 at index (0, 227) (up to 0.001 allowed)

test_alibi.py:39: AssertionError
================================================================================= short test summary info =================================================================================
FAILED test_alibi.py::test_single_decode_alibi[128-32-33001] - AssertionError: Tensor-likes are not close!
FAILED test_alibi.py::test_single_decode_alibi[256-32-33001] - AssertionError: Tensor-likes are not close!
================================================================== 2 failed, 232 passed, 36 skipped in 69.47s (0:01:09) ===================================================================
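For scale, with the standard ALiBi slope schedule (assumed here for illustration; flashinfer's internal slope and scaling convention may differ), the largest-magnitude bias at the failing sequence length is already in the tens of thousands, i.e. the same order as the -5e4 sentinel discussed in the comment below:

import torch

# Back-of-the-envelope check using the standard ALiBi slope schedule,
# 2^(-8*i/num_heads) for heads i = 1..num_heads. This schedule is an
# assumption for illustration; flashinfer's internals may differ.
num_heads, seq_len = 32, 33001
slopes = 2.0 ** (-8.0 * torch.arange(1, num_heads + 1) / num_heads)

# The most negative bias sits at the oldest key position:
# -slope_max * (seq_len - 1).
max_bias = (slopes.max() * (seq_len - 1)).item()
print(f"largest |ALiBi bias| ~ {max_bias:.0f}")  # ~27750 for these parameters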
zhyncs self-assigned this on Aug 28, 2024
yzh119 (Collaborator) commented Aug 29, 2024

This is because flashinfer currently uses -5e4 as a surrogate for -inf, and when the sequence length is large the ALiBi bias can drop below -5e4. The main reason for choosing -5e4 is that -inf breaks some operations (producing nan), and we want the value to lie within the valid range of the data type of m, the running max (fp32 in almost all cases, but we offer an fp16 option when allow_fp16_qk_reduction=True).
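
A minimal sketch of the failure mode (hypothetical logit values, not flashinfer's actual kernel code): once legitimate ALiBi-biased logits fall to or below the -5e4 sentinel, the "masked" entries are no longer dominated and the softmax diverges from the true -inf reference:

import torch

# Hypothetical logits near/below the sentinel; the last position is masked.
logits = torch.tensor([-6e4, -5.5e4, -4.9e4], dtype=torch.float32)
mask = torch.tensor([True, True, False])

# True masking: the masked entry contributes exactly zero probability.
p_true = torch.softmax(torch.where(mask, logits, torch.tensor(float("-inf"))), dim=0)

# Sentinel masking: the masked entry, pinned at -5e4, is now the *largest*
# logit, so it absorbs essentially all of the probability mass.
p_sentinel = torch.softmax(torch.where(mask, logits, torch.tensor(-5e4)), dim=0)

print(p_true)      # tensor([0., 1., 0.])
print(p_sentinel)  # tensor([0., 0., 1.])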

zhyncs (Member, Author) commented Aug 29, 2024

OK
