Accuracy of Phi-3.5 Vision Models in Web-LLM is way off #586

Open
atlury opened this issue Sep 30, 2024 · 4 comments

@atlury

atlury commented Sep 30, 2024

I have been testing the accuracy of the Phi-3.5 Vision model in Web-LLM (https://chat.webllm.ai/#/chat) against the NVIDIA NIM deployment at https://build.nvidia.com/microsoft/phi-3_5-vision-instruct

1. The NVIDIA NIM results

(a) Image - square root image
Question - describe the image mathematically and solve it
Answer - The image shows a hand-drawn representation of a square root symbol with the number 25 underneath it. Mathematically, this can be represented as √25. The square root of 25 is 5, so the solution to the mathematical expression is 5.

(b) Image - x+y=4 image
Question - describe the image mathematically and solve it
Answer - The image shows a simple algebraic equation and two separate equations. The main equation is 'x + y = 4', which is a linear equation with two variables, x and y. The first sub-equation is 'x = 2', which gives the value of x. The second sub-equation is 'y = ?', which is asking for the value of y. To solve for y, we substitute the value of x into the main equation: 2 + y = 4. By subtracting 2 from both sides, we find that y = 2. Therefore, the solution to the system of equations is x = 2 and y = 2.

(c) Image - x^2+y^2=100 image
Question - describe the image mathematically and solve it
Answer - The image contains a handwritten equation and a question. The equation is:
x^2 + y^2 = 100
The question asks for the value of y when x is given as 5. To solve for y, we substitute x with 5 in the equation:
(5)^2 + y^2 = 100
25 + y^2 = 100
Now, we solve for y^2:
y^2 = 100 - 25
y^2 = 75
Taking the square root of both sides to solve for y:
y = ±√75
y = ±√(25 * 3)
y = ±5√3
Therefore, the value of y is ±5√3.
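
For reference, the three NIM answers above can be checked with a few lines of Python. This is only a sanity check of the arithmetic and was not part of the original report; the variable names are illustrative.

```python
import math

# (a) Square root of 25
root = math.sqrt(25)

# (b) x + y = 4 with x = 2, so y = 4 - x
x = 2
y = 4 - x

# (c) x^2 + y^2 = 100 with x = 5, so y^2 = 100 - 25
y_squared = 100 - 5 ** 2
y_positive = math.sqrt(y_squared)  # the negative root is -y_positive

print(root, y, y_squared, y_positive)

# The positive root should match the simplified form 5 * sqrt(3).
assert math.isclose(y_positive, 5 * math.sqrt(3))
```

All three NIM answers are consistent with this check, so the baseline output is at least arithmetically correct.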

2. The Web-LLM (fp32) results from https://chat.webllm.ai/#/chat

(a) Image - square root image
Question - describe the image mathematically and solve it
Answer - The image provided does not contain a clear mathematical express to interpret or solve it......

(b) Image - x+y=4 image
Question - describe the image mathematically and solve it
Answer - The image provided seems to contain a hand-drawn image of a mathematical expression......

(c) Image - x^2+y^2=100 image
Question - describe the image mathematically and solve it
Answer -

The Web-LLM fp32 Phi-3.5 Vision model doesn't give the same results and doesn't solve the problems. Any ideas?

Attached images: sq, x y, xplusysq

@CharlieFRuan
Contributor

Thanks! Will take a look this week. Note, though, that your baseline appears to be an unquantized version of Phi-3.5-vision, while WebLLM uses a 4-bit quantized version (hence the code name q4).
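
To give a rough sense of what 4-bit quantization does to weights, here is a minimal sketch of symmetric round-to-nearest quantization applied per group. This is an assumption-laden illustration, not the scheme WebLLM or MLC actually uses; `quantize_4bit`, the group size, and the sample weights are all made up for the example.

```python
def quantize_4bit(weights, group_size=4):
    """Quantize to signed 4-bit integers per group, then dequantize.

    Illustrative only: a generic round-to-nearest scheme, not WebLLM's.
    """
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Signed 4-bit integers span [-8, 7]; dividing by 7 keeps the scale symmetric.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        for w in group:
            q = max(-8, min(7, round(w / scale)))  # 4-bit integer code
            restored.append(q * scale)             # value the model actually sees
    return restored


weights = [0.12, -0.53, 0.98, 0.07, -0.31, 0.44, -0.02, 0.25]
restored = quantize_4bit(weights)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_error:.4f}")
```

Each weight can move by up to half a quantization step, and a group with one large value gets a coarser step for all of its members. Whether errors of this kind are enough to explain the gap reported above is exactly what remains to be checked.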

@atlury
Author

atlury commented Sep 30, 2024

Yes thanks @CharlieFRuan.

I understand it's quantized, but would that cause such a large difference? I will wait for your results.

@TomLucidor

Seconding this. Are there established results or articles that favor either Q4 and Q5, or higher-precision quantization like Q6 and Q8?

@atlury
Author

atlury commented Oct 7, 2024

@TomLucidor I haven't seen many accuracy benchmarks for quantized vs unquantized models, especially for the above vision models. I plan to try out the OpenVINO version....

I have played with the temperature and top-p parameters, and there isn't much improvement in output quality.
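
One possible reason these two knobs make little difference: they only change how a token is sampled from the distribution the model has already produced, not the distribution itself. The sketch below shows the usual textbook definitions of temperature scaling and top-p (nucleus) filtering. It is not WebLLM's implementation, and the function names and logits are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


logits = [4.0, 2.0, 1.0, 0.5]  # made-up scores for four candidate tokens
for temperature in (0.2, 1.0):
    probs = apply_temperature(logits, temperature)
    print(temperature, top_p_filter(probs, 0.9))
```

In both settings the highest-scoring token stays the most likely one. If the model already ranks the wrong tokens highest, for example because it fails to read the image, adjusting temperature or top-p would not be expected to fix that.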
