Gemma 2 2B crashes on mobile phone #524

flatsiedatsie · 2024-08-04T21:13:12Z

Whenever I try to load it, it crashes Chrome.

This is on a Pixel 6a with 6 GB of RAM.

Context is set to 1K.
16-bit WebGPU is available.
Using the latest version of WebLLM from the CDN.

To make sure it wasn't simply too big, I tried running Gemma 2 2B via Wllama (1.63GB Q4 .gguf). That did run.

Additional tests

I also tried Phi 3 Mini. The same thing happened. It crashed the browser, while the .gguf version via Wllama did manage to run (or perhaps I should say "crawl").
I also tried Tiny Llama. That did run, and at high speed too.

CharlieFRuan · 2024-08-04T22:34:11Z

Do you happen to have the console log? Also, what maxStorageBufferBindingSize does webgpureport.org show for your device?
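For reference, the same limit can also be read from a page script instead of webgpureport.org. The following is a minimal sketch using the standard WebGPU API (`navigator.gpu.requestAdapter()` and `adapter.limits`); the function name `readStorageBufferLimit` and the optional `gpu` parameter are assumptions made for illustration, not part of WebLLM.

```javascript
// Reads the adapter limits that webgpureport.org displays.
// `gpu` defaults to the browser's navigator.gpu; accepting it as a parameter
// is only a convenience of this sketch so it can be exercised outside a browser.
async function readStorageBufferLimit(gpu = globalThis.navigator?.gpu) {
  if (!gpu) {
    return { supported: false, reason: "WebGPU is not available" };
  }
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "No WebGPU adapter found" };
  }
  return {
    supported: true,
    // Largest storage buffer binding a shader may use, in bytes.
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
    // Largest single buffer that can be created, in bytes.
    maxBufferSize: adapter.limits.maxBufferSize,
    // Whether 16-bit floats are usable in shaders.
    shaderF16: adapter.features.has("shader-f16"),
  };
}
```

In a browser, `readStorageBufferLimit().then(console.log)` prints the values for the current device. Note that these are adapter limits; a device requested with default limits may expose lower values.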

flatsiedatsie · 2024-08-05T07:22:20Z

It's 2GB.

// full screenshots:

CharlieFRuan · 2024-08-05T17:19:56Z

It may be that one of the WebGPU limits is being exceeded (not necessarily the buffer size; 2GB sounds like enough). Gemma requires larger sizes for certain buffers than other models because of its large vocabulary size of 256K, compared to 128K for models like Llama 3.1. I might have to look into this later.

Edit: actually, just saw that you mentioned Phi 3 Mini crashes as well. I will try to look into this. Meanwhile, if you have some sort of log, it would be very helpful, perhaps with remote debugging.

flatsiedatsie · 2024-08-05T20:25:58Z

I'm already using USB debugging, so I can help you there.

What kind of info would you like? Is there a debug logging mode I can activate?

// edit: I went through my recent error screenshots and got a few that belong to Web-LLM. Not sure to what degree these relate to this issue though.

CharlieFRuan · 2024-08-05T20:27:45Z

Ahh yes, there is a DEBUG mode here: #519 (comment)

Any log that may relate to the crash would be helpful, thanks!

flatsiedatsie · 2024-08-05T21:16:16Z

I'm using a slightly different UI, my own project :-)

Can I enable debug mode from Javascript?

CharlieFRuan · 2024-08-05T21:19:08Z

Ah yes! There is a logLevel option in EngineConfig. You can set it to INFO like here https://github.com/mlc-ai/web-llm/blob/main/examples/simple-chat-ts/src/simple_chat.ts#L345

flatsiedatsie · 2024-08-05T21:32:24Z

Already found it, thanks :-)

window.web_llm_worker = new Worker(
	new URL('./web_llm_worker.js', import.meta.url), { type: 'module' }
)

// Creating the WebLLM engine
window.web_llm_engine = await webllm.CreateWebWorkerMLCEngine(
	window.web_llm_worker,
	web_llm_model_id,
	{
		initProgressCallback: function (mes) {
			//console.log('WebLLM init progress message received: ', mes);
			window.handle_web_llm_init_progress(mes);
		},
		appConfig: window.web_llm_app_config,
		logLevel: "DEBUG"
	},
	chatOpts
);

flatsiedatsie · 2024-08-05T23:01:00Z

What the heck.. now that I've enabled debugging.. Gemma 2 2B suddenly works 0_0.

Phi 3 mini crashed, but retrying a few times I managed to get a response!

So strange.

// ..and then it crashed again. No interesting output in the debug though.

CharlieFRuan · 2024-08-06T01:23:05Z

I see... thanks for the info!

CharlieFRuan · 2024-08-06T01:48:44Z

There are various issues similar to this on mobile devices, probably related to WebGPU in Chrome on Android. Nothing specific comes to mind right now. I'm not sure whether updating Android and using the latest Chrome Canary would help.

flatsiedatsie · 2024-08-06T07:25:09Z

The phone went into standby, and then when I woke it up and tried running inference I saw this:

It seems to be related to 'losing the WebGPU'. Should I call MLCEngine.reload(model) before each inference? Or can I somehow detect that the OS has removed the model from memory? How can I hook into the "A valid external Instance reference no longer exist" error?

CharlieFRuan · 2024-08-06T18:54:51Z

Quick question, are you using WebWorker, ServiceWorker, or the plain MLCEngine? For ServiceWorker, my understanding is that this PR has fixed this: #471

flatsiedatsie · 2024-08-06T22:56:12Z

WebWorker.

I noticed I hadn't put a try-catch around WebLLM there (a testament to its quality), but I've added that now in the hope of catching the "GPU disappeared" event and then simply restarting the engine.

WebLLM says "please initialize again", but what about a setting to let WebLLM do this by itself? "Stay alive until told otherwise" could even be a default?
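The try-catch approach described above could look roughly like this. It is a minimal sketch, assuming the failure surfaces as a rejected promise on the frontend (a full browser crash cannot be caught this way); `completeWithReload` is a hypothetical helper name, while `engine.reload()` and `engine.chat.completions.create()` are the WebLLM engine methods.

```javascript
// Runs one chat completion. If it throws (for example because the GPU device
// or the worker was lost while the phone was in standby), the model is
// reloaded once and the request is retried. A second failure propagates.
async function completeWithReload(engine, modelId, messages) {
  try {
    return await engine.chat.completions.create({ messages });
  } catch (err) {
    console.warn("Inference failed, reloading model before retrying:", err);
    await engine.reload(modelId);
    return await engine.chat.completions.create({ messages });
  }
}
```

Usage would be along the lines of `await completeWithReload(window.web_llm_engine, web_llm_model_id, messages)`. Retrying only once keeps a persistent failure from turning into a reload loop.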

CharlieFRuan · 2024-08-10T03:07:49Z

This seems to be a case where the web worker is terminated when the phone goes into standby while your frontend's state is preserved, so the frontend sends a request directly, expecting the model to still be loaded. We had a similar issue with the service worker before: #471.

PR #533 applies the service worker fix to the web worker as well. You can test it locally, or try it out once the new npm package is published.

The main logic is that when the backend detects a mismatch between the model the frontend expects to be loaded and the model the backend actually has loaded, the backend calls reload() internally.
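To illustrate the idea, here is a minimal sketch of such a mismatch check. It is not the code from #533; `ensureModelLoaded`, `backend.loadedModelId`, and the shape of `backend` are hypothetical names chosen for this example.

```javascript
// Hypothetical worker-side guard. Each request carries the model ID the
// frontend believes is loaded. If the backend's own state disagrees (for
// example because the worker was restarted and has nothing loaded), the
// backend reloads before serving the request.
async function ensureModelLoaded(backend, expectedModelId) {
  if (backend.loadedModelId !== expectedModelId) {
    await backend.reload(expectedModelId);
    backend.loadedModelId = expectedModelId;
    return true; // a reload was needed
  }
  return false; // frontend and backend were already in sync
}
```

With a check like this in front of every request, the frontend does not need to know that the worker was killed; the first request after standby just takes longer because it includes the reload.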

CharlieFRuan · 2024-08-10T07:36:35Z

This should be included in npm version 0.2.56. Let me know if the issue is fixed!
