
[Question] Sampling kernel only support FP32 now? #531

Open · yz-tang opened this issue Oct 15, 2024 · 1 comment

yz-tang (Contributor) commented Oct 15, 2024

I looked at test_sampling.cu and it only contains FP32 tests. I tried using FP16, but it does not work.

yzh119 (Collaborator) commented Oct 16, 2024

It's easy to add support for fp16.

In the call to sampling::SamplingFromProb shown below (and in all the other functions in this file), we currently cast all inputs to fp32 before launching the kernel:

```cpp
probs = probs.to(torch::kFloat32);
uniform_samples = uniform_samples.to(torch::kFloat32);
cudaStream_t torch_current_stream = c10::cuda::getCurrentCUDAStream(device.index());
auto samples = torch::empty({batch_size}, torch::dtype(torch::kInt32).device(device));
cudaError_t status = sampling::SamplingFromProb(static_cast<float*>(probs.data_ptr()),
                                                static_cast<float*>(uniform_samples.data_ptr()),
```

To use the fp16 kernels, we just need to dispatch on the input data type using the dispatch macro:

```cpp
#define DISPATCH_PYTORCH_DTYPE_TO_CTYPE_FP16(pytorch_dtype, c_type, ...) \
```
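
Roughly, a dispatched call could look like the sketch below. This is not the actual binding code: it assumes SamplingFromProb is templated on the element type, that the macro runs its body with `c_type` bound to the CUDA type matching `probs.scalar_type()`, and that the trailing kernel arguments (including the `vocab_size` name) mirror the existing fp32 call above.

```cpp
// Hypothetical sketch: dispatch on the probs dtype instead of casting to fp32.
// Assumes the macro binds `c_type` to the matching CUDA type (e.g. half for
// torch::kFloat16) and invokes the lambda; the trailing arguments are assumed
// to mirror the existing fp32 call.
bool success = DISPATCH_PYTORCH_DTYPE_TO_CTYPE_FP16(probs.scalar_type(), c_type, [&] {
  cudaError_t status = sampling::SamplingFromProb(
      static_cast<c_type*>(probs.data_ptr()),
      static_cast<c_type*>(uniform_samples.data_ptr()),
      static_cast<int*>(samples.data_ptr()),
      batch_size, vocab_size, torch_current_stream);
  return status == cudaSuccess;
});
TORCH_CHECK(success, "SamplingFromProb failed to dispatch dtype ", probs.scalar_type());
```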

But as you mentioned, fp16 might fail in some extreme cases because the fp16 probabilities might not sum to 1 anymore.
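
One possible workaround for that caveat (a hedged sketch using the PyTorch C++ API, not necessarily what the library should do) is to renormalize the fp16 probabilities in fp32 before sampling, since rounding a normalized distribution to half precision can leave the row sums slightly off from 1:

```cpp
#include <torch/torch.h>

// Hypothetical helper: accumulate/normalize in fp32, then round back to fp16,
// so each row of probabilities sums to 1 up to fp16 rounding.
torch::Tensor renormalize_fp16_probs(const torch::Tensor& probs_fp16) {
  auto probs_fp32 = probs_fp16.to(torch::kFloat32);
  auto row_sums = probs_fp32.sum(/*dim=*/-1, /*keepdim=*/true);
  return (probs_fp32 / row_sums).to(torch::kFloat16);
}
```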
