MAUI .NET 8.0 built with Full AOT Compilation reports a lot of Mono : AOT NOT FOUND #101135

Open
crui3er opened this issue Apr 15, 2024 · 96 comments

Comments

@crui3er

crui3er commented Apr 15, 2024

Description

When I build a MAUI application for Android with full AOT compilation, I see a lot of AOT NOT FOUND log messages from the Mono AOT logger.
This happens especially for generic classes and methods, even for the ones I would expect to be detected statically.
E.g.:
04-15 18:43:11.957 28518 28518 D Mono : AOT NOT FOUND: System.Collections.Concurrent.ConcurrentDictionary`2<Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceIdentifier, Microsoft.Extensions.DependencyInjection.ServiceProvider/ServiceAccessor>:.ctor ().

04-15 18:43:23.703 28518 28518 D Mono : AOT NOT FOUND: System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<System.Threading.Tasks.VoidTaskResult>:GetStateMachineBox<MauiAotTest.Components.Pages.Weather/<OnInitializedAsync>d__2> (MauiAotTest.Components.Pages.Weather/<OnInitializedAsync>d__2&,System.Threading.Tasks.Task`1<System.Threading.Tasks.VoidTaskResult>&).

See attached files: one with full log for the app and the one with aot not found only messages.

LogNote_47021f4c_20240415_18.43.07_tst2_aot_full_not_found.txt
LogNote_47021f4c_20240415_18.43.07_tst2_aot_full.txt

Steps to Reproduce

  1. Create a template MAUI Blazor app
  2. Build it with AndroidEnableProfiledAot=false and install it on an Android device
  3. Enable Mono AOT logging with the command adb shell setprop debug.mono.log default,timing=bare,assembly,mono_log_level=debug,mono_log_mask=aot
  4. Capture the log with logcat
  5. See a lot of AOT NOT FOUND log messages (excluding the ones for wrappers).
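
The filtering in step 5 could be scripted roughly as follows. This is a minimal sketch that assumes the logcat output was saved to a text file; the function names are made up for illustration, and the only log format it relies on is the `AOT NOT FOUND: (wrapper ...)` prefix visible in the excerpts in this thread.

```shell
#!/bin/sh
# Print the "AOT NOT FOUND" entries of a saved logcat capture,
# skipping the wrapper entries that are expected to be jitted.
# $1 = path to the saved log (hypothetical file name, e.g. logcat.txt)
aot_not_found() {
    grep -F 'AOT NOT FOUND: ' "$1" | grep -vF 'AOT NOT FOUND: (wrapper '
}

# Count the non-wrapper entries.
aot_not_found_count() {
    aot_not_found "$1" | wc -l | tr -d ' '
}

# Usage (file name is a placeholder):
#   aot_not_found_count logcat.txt
```

The count is only a rough indicator: the same method is normally reported once, but distinct generic instantiations are counted separately.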

Link to public reproduction project repository

https://github.com/crui3er/MauiAotTest

Version with bug

8.0.3 GA

Is this a regression from previous behavior?

Not sure, did not test other versions

Last version that worked well

Unknown/Other

Affected platforms

Android

Affected platform versions

No response

Did you find any workaround?

No

Relevant log output

No response

@crui3er
Author

crui3er commented Apr 16, 2024

It's also not clear whether the AOT compiler for Android should process generic types and methods when $(AndroidEnableProfiledAot)==false.
I found this confusing when reading https://devblogs.microsoft.com/dotnet/dotnet-8-performance-improvements-in-dotnet-maui/#androidstripilafteraot
It looks like it works only for iOS.
Can @jonathanpeppers please shed some light on the subject?
What can I read, or what would you recommend, to reduce JIT as much as possible during app startup?

@PureWeen PureWeen transferred this issue from dotnet/maui Apr 16, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 16, 2024
@jonathanpeppers
Member

I think this is a known issue, some info at the bottom of:

There are these tiny methods that are always JIT no matter what:

02-23 09:03:46.327 10401 10401 D Mono    : AOT NOT FOUND: (wrapper runtime-invoke) object:runtime_invoke_void (object,intptr,intptr,intptr).
02-23 09:03:46.334 10401 10401 D Mono    : AOT NOT FOUND: (wrapper managed-to-native) System.Diagnostics.Debugger:IsAttached_internal ().
02-23 09:03:46.367 10401 10401 D Mono    : AOT NOT FOUND: (wrapper native-to-managed) Android.Runtime.JNINativeWrapper:Wrap_JniMarshal_PPL_V (intptr,intptr,intptr).

But I don't think these are "actual methods", but tiny wrappers that enable generics or p/invoke. There is some JIT happening there currently on Android, which might be because we use MONO_AOT_MODE=Normal and allow JIT. I believe iOS/Catalyst would use MONO_AOT_MODE=Full, because those platforms require it.

@ivanpovazan
Member

@crui3er just for my understanding, is there a crash that happens with the application or the question is just about unexpected log messages?

Regarding full AOT compilation on Android, afaik you would have to set -p:AndroidAotMode=Full (@jonathanpeppers please confirm), which would force it (although I am not sure if we are officially supporting this mode). If it isn't set, some wrappers will always fall back to JIT at runtime (and AOT NOT FOUND in the logs is expected behaviour).

@crui3er
Author

crui3er commented Apr 17, 2024

@ivanpovazan No, there is no crash, but there are unexpected log messages.
I am aware that log messages for wrappers are ok (AOT NOT FOUND: (wrapper xxx)). @jonathanpeppers I do not take them into account.
In the attached sample (in fact, it's the template MAUI Blazor app) jitting is used for code in the MauiAotTest.Components.Pages.Weather.OnInitializedAsync method and during service provider usage (see my first comment).

In the MAUI app I am working on there are a lot of AOT NOT FOUND log messages. Even with a recorded AOT profile, methods that are in the recorded profile still get jitted.

So I am trying to figure out what is going wrong with this small sample app.

@ivanpovazan
Member

In the MAUI app I am working on there are a lot of AOT NOT FOUND log messages. Even with a recorded AOT profile, methods that are in the recorded profile still get jitted.

Could you share more information on that: like the aot profile that you are using and which methods you would not expect to see in the log?

@jonathanpeppers
Member

I don’t think Full will work on Android, does that mode prevent JIT?

@fanyang-mono
Member

Full AOT is what iOS uses, which does not JIT anything, since that's a requirement for iOS.

@crui3er
Author

crui3er commented Apr 18, 2024

I tested with -p:AndroidAotMode=Full and it fails on start with an ExecutionEngineException.

F mono-rt : [ERROR] FATAL UNHANDLED EXCEPTION: System.ExecutionEngineException: Attempting to JIT compile method '(wrapper other) void Java.Interop.JavaVMInterface:PtrToStructure (intptr,object)' while running in aot-only mode. See https://docs.microsoft.com/xamarin/ios/internals/limitations for more information.

@crui3er
Author

crui3er commented Apr 18, 2024

Could you share more information on that: like the aot profile that you are using and which methods you would not expect to see in the log?

I recorded a custom AOT profile https://github.com/crui3er/MauiAotTest/blob/master/custom.aprof and tested with it.
During recording and testing I performed the following actions: start the app, go to the 'Counter' page, click the 'Click me' button several times, then go to the 'Weather' page and finally go back to the 'Counter' page.

Then I compared the AOT NOT FOUND log records with the stat file generated from the custom profile https://github.com/crui3er/MauiAotTest/blob/master/custom.aprof.stat.txt

Here are a few examples of AOT NOT FOUND records in the log for methods mentioned in the profile.

stat:

void System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start (MauiAotTest.Components.Pages.Weather/<OnInitializedAsync>d__2&) 

log:

Mono    : AOT NOT FOUND: System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start<MauiAotTest.Components.Pages.Weather/<OnInitializedAsync>d__2> (MauiAotTest.Components.Pages.Weather/<OnInitializedAsync>d__2&).

stat:

void System.Collections.Concurrent.ConcurrentDictionary`2<Microsoft.AspNetCore.Components.Routing.RouteKey, Microsoft.AspNetCore.Components.Routing.RouteTable>:.ctor ()

log:

Mono    : AOT NOT FOUND: System.Collections.Concurrent.ConcurrentDictionary`2<Microsoft.AspNetCore.Components.Routing.RouteKey, Microsoft.AspNetCore.Components.Routing.RouteTable>:.ctor ().

stat:

System.Type System.Text.Json.Serialization.JsonConverter`1<System.Threading.Tasks.VoidTaskResult>:get_Type ()

log:

Mono    : AOT NOT FOUND: System.Text.Json.Serialization.JsonConverter`1<System.Threading.Tasks.VoidTaskResult>:get_Type ().

Here are the AOT NOT FOUND log records, excluding the ones for wrappers.
LogNote_47021f4c_20240418_16.32.26_tst2_custom_profile_filtered.txt

Sample app is updated. Now it uses recorded aot profile.
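
The comparison above can be sketched as a small script. It assumes the logcat capture and the stat file are saved locally (the file names in the usage line are placeholders), that each log entry ends with a trailing period as in the excerpts, and that a stat line contains the log signature verbatim when both refer to the same method. The function names are hypothetical.

```shell
#!/bin/sh
# Extract method signatures from "AOT NOT FOUND" log lines:
# keep the text after the marker, drop the trailing "." and wrapper entries.
# $1 = saved logcat capture
log_signatures() {
    sed -n 's/^.*AOT NOT FOUND: \(.*\)\.[[:space:]]*$/\1/p' "$1" \
        | grep -v '^(wrapper '
}

# Print every jitted signature that also appears verbatim in the stat file.
# $1 = saved logcat capture, $2 = profile stat text file
in_profile_but_jitted() {
    log_signatures "$1" | sort -u | while IFS= read -r sig; do
        # -F: treat the signature as a fixed string, not a regex
        if grep -qF -- "$sig" "$2"; then
            printf '%s\n' "$sig"
        fi
    done
}

# Usage (file names are placeholders):
#   in_profile_but_jitted logcat.txt custom.aprof.stat.txt
```

Entries whose text differs between the two files are not reported, so an empty result does not prove that the profile was honored.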

@fanyang-mono
Member

@crui3er Thanks for providing more information about this. I will take a look at this.

@fanyang-mono fanyang-mono self-assigned this Apr 18, 2024
@ivanpovazan ivanpovazan removed the untriaged New issue has not been triaged by the area owner label Apr 18, 2024
@alexyakunin

alexyakunin commented Apr 23, 2024

Hi there, just curious, are there any updates on that?

I want to add some context: the app we're working on shows ~20K methods in "AOT NOT FOUND" log entries - and that's just for the startup time. We were questioning whether AOT is even working at all - well, it does, coz there are also lots of "AOT FOUND" entries, but it's reasonable to assume that if you have even 50/50 split between these, you're probably getting just 50% of max. possible savings on JIT. And with AOT turned off, our startup time is almost 2x higher. In other words, if AOT issues would be fixed, our app's startup time would go from ~ 1.7s to probably just 0.3-0.5s or so, which is obviously a dramatic change.

Please let @crui3er know if you need anything else. We can provide logs from the actual app + give you instructions on how to reproduce the issue there, if a small sample won't be enough to identify the root cause(s).

And IMO it's super important to address this: compared to Native AOT, profile-based AOT (assuming it 100% works) is what most of mobile apps need. It allows to balance between the app size and the speed of the most crucial parts of the app, which is almost always the startup time for mobile apps. So nearly any MAUI and Blazor Hybrid app would benefit from this heavily.

@alexyakunin

alexyakunin commented Apr 23, 2024

Also, should we change the title of this issue? It's not about just full AOT, it's about both full and profiled AOT. And it's understandable why full AOT may miss or intentionally omit some methods (the # of possible generic instances explodes exponentially with the codebase size), but profiled AOT is expected to produce AOT code at least for every method from the AOT profile.

@fanyang-mono
Member

I was able to build and run the app provided here. After setting AndroidEnableProfiledAot=true and enabling logging, I could reproduce 2 of the 3 examples @crui3er provided earlier. For example 1, they are not the same method.

I also noticed that this profile was gathered using the legacy profiler. Next, I will try the new profiler to see if this issue still exists. If so, I will investigate further.

@crui3er
Author

crui3er commented Apr 30, 2024

I also noticed that this profile was gathered using the legacy profiler. Next, I will try the new profiler to see if this issue still exists. If so, I will investigate further.

What is the new profiler and how do I use it?

@fanyang-mono
Member

fanyang-mono commented Apr 30, 2024

Actually, the new profiler is not fully supported on Android yet. The profiler I was talking about is this: https://github.com/dotnet/runtime/blob/main/docs/design/mono/profiled-aot.md

@crui3er
Author

crui3er commented Apr 30, 2024

Actually, the new profiler is not fully supported on Android yet. The profiler I was talking about is this: https://github.com/dotnet/runtime/blob/main/docs/design/mono/profiled-aot.md

I think @jonathanpeppers added experimental support in https://github.com/jonathanpeppers/xamarin-android/blob/dotnet-pgo/src/Xamarin.Android.Build.Tasks/Microsoft.Android.Sdk/targets/Microsoft.Android.Sdk.Aot.targets but it's not included in .NET 8, is it?

@crui3er
Author

crui3er commented Apr 30, 2024

I was able to build and run the app provided here. After setting AndroidEnableProfiledAot=true and enabling logging, I could reproduce 2 of the 3 examples @crui3er provided earlier. For example 1, they are not the same method.

@fanyang-mono Why do you think that methods in example 1 are not the same?

@alexyakunin

alexyakunin commented Apr 30, 2024

@fanyang-mono funny enough, we were trying to use it as well. Not sure what the state of the .mibc format is in the long run, but it seems it's currently not supported by the release version of .NET 8. You can see that we were trying both .aprof and .mibc formats here: https://github.com/Actual-Chat/actual-chat/blob/dev/src/dotnet/App.Maui/App.Maui.csproj#L265 Collecting .mibc isn't an issue, and neither is merging these profiles. But it looks like Android build targets just don't use <AndroidMibcProfile> and fall back to full AOT (i.e. behave as if no profile was provided).

We created this issue because there are so many "AOT_NOT_FOUND" for regular .aprof output that it works much worse than full AOT for us (which still produces tons of "AOT_NOT_FOUND", but this is at least explainable). The fact profiled AOT doesn't really work is a huge issue affecting every MAUI Android app, and an existential issue for apps like ours, where the startup time is crucial. I wrote earlier that fully working profiled AOT is expected to drop our startup time to 0.5...1s on Android - vs current 2s. In other words, it's as different as night and day.

Just to illustrate how bad this is:

  • Startup time on iPhone 13 Pro in interpreter-only mode: 1.1s
  • Startup time on Galaxy S23 Ultra in full AOT mode: 1.7-1.8s
  • Startup time on Galaxy S23 Ultra with .aprof profile: 2.4s or more.

And unrelated, but: we spent a decent amount of time trying to enable AOT for iOS at least for some assemblies, but this inevitably leads to crashes. That's why for now the app works in interpreter mode there.

@alexyakunin

alexyakunin commented May 1, 2024

Another illustration of how bad this is. Below is a screenshot from DotTrace showing where most of the time is spent in release build with full AOT - as I said, due to the issue listed here it's our best option for now. The timeline is constrained by [0..1.2s] interval. The original .nettrace was recorded via:

adb reverse tcp:9000 tcp:9000
start dotnet-dsrouter client-server -tcps 127.0.0.1:9000 -ipcc /tmp/maui-app --verbose debug
start dotnet-trace collect --diagnostic-port /tmp/maui-app --output "_Profiling/android.nettrace" --providers Microsoft-Windows-DotNETRuntime:0x1F000080018:5

[image: DotTrace screenshot of the startup timeline]

As you see, JIT takes almost 74% of time there. So if profiled AOT would work, it could be just 0.3s or so.
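
The 0.3s figure follows from simple arithmetic on the numbers above: if 74% of a 1.2s interval is JIT, the remainder is 1.2 × (1 − 0.74) ≈ 0.31s. A sketch, assuming (optimistically) that all JIT time disappears and nothing else changes; the function name is made up:

```shell
#!/bin/sh
# Estimate the time left if all JIT time were eliminated.
# $1 = measured interval in seconds
# $2 = fraction of that interval spent in JIT (0..1)
time_without_jit() {
    awk -v total="$1" -v jit="$2" \
        'BEGIN { printf "%.2f\n", total * (1 - jit) }'
}

time_without_jit 1.2 0.74   # prints 0.31
```

This is a best-case bound rather than a prediction: precompiled code still has to be loaded, and any methods the AOT compiler skips would still be jitted.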

@jonathanpeppers
Member

There is a PR adding .mibc support, that still has some open questions (blockers):

However, I don't think using .mibc would be able to avoid any additional AOT NOT FOUND messages. It seems like it would just be using a more modern format for profiled AOT than what we use for the built-in profile today.

Regarding:

Startup time on Galaxy S23 Ultra in full AOT mode: 1.7-1.8s
Startup time on Galaxy S23 Ultra with .aprof profile: 2.4s or more.

Are you able to share a .nettrace or .speedscope file of startup? There might be something we can fix in .NET MAUI or recommend.

You could also record your own AOT profile, if the file size of "full AOT" (AOT everything) is too large. This would just be a tradeoff where it wouldn't AOT everything, but just the methods called during your recording.

@fanyang-mono
Member

fanyang-mono commented May 1, 2024

I was able to build and run the app provided here. After setting AndroidEnableProfiledAot=true and enabling logging, I could reproduce 2 of the 3 examples @crui3er provided earlier. For example 1, they are not the same method.

@fanyang-mono Why do you think that methods in example 1 are not the same?

One is a non-generic method -> System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start
The other is a generic method -> System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start<MauiAotTest.Components.Pages.Weather/<OnInitializedAsync>d__2>

The second one has a type argument and needs to be compiled differently than the first one.

However, I just checked the method System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start. It is a generic method. https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs#L21

It seems that custom.aprof.stat.txt contains some inaccurate information. Hopefully, custom.aprof contains the correct information.

@fanyang-mono
Member

@alexyakunin @crui3er Could you compare the profile text files that you got from both methods to see if they are the same? I am curious about that. They might be the same.

@fanyang-mono
Member

fanyang-mono commented May 1, 2024

I investigated the situation more and found out that

System.Type System.Text.Json.Serialization.JsonConverter`1<System.Threading.Tasks.VoidTaskResult>:get_Type ()

didn't pass the checks inside add_single_profile_method (https://github.com/dotnet/runtime/blob/main/src/mono/mono/mini/aot-compiler.c#L13822-L13870), which means it didn't meet the requirements to be AOT compiled. I believe the other two methods weren't compiled due to the same reason.

And the new profiler also uses add_single_profile_method to decide whether a given method can be AOT compiled. Thus, @jonathanpeppers was correct: using the new profiler won't help more methods get AOT compiled either.

@vyacheslav-volkov

I completely agree with the opinion that one of the most serious and long-standing issues with Xamarin.Android is the slow startup time for applications. If you search the internet for "Xamarin.Android slow startup," you'll find thousands of discussions on this topic. Even with all possible optimizations, including AOT compilation, the startup time remains slow. This problem is particularly noticeable with UI frameworks such as Avalonia, UNO, and MAUI. Developers simply don't have the ability to solve this problem on their own, as it is rooted in the fundamental aspects of the platform's operation, and a significant amount of time is spent on JIT compilation.

When .NET Native was introduced, I thought it would be the solution to the slow startup problem for Android. Starting with version .NET 8.0, it became stable for iOS, and I began actively using it. The results are impressive: a fairly large application on an iPhone X launches as quickly as any native application and even faster than a similar application on a Samsung Galaxy S22 Ultra, despite all possible optimizations for Android. The gap between the release of these devices is five years, and I dread to imagine the startup time on a five-year-old Android device.

However, observing the discussions about .NET Native and the activity around this topic, I get the impression that the team does not give this problem enough priority, and no specific timelines have been set for its resolution. For example, in one of the discussions on GitHub, the following is mentioned:

These will likely work under Mono, but will need to be fixed one day in .NET 10 or some future release that supports NativeAOT.
dotnet/android#8724

This gives the impression that allocating resources for NativeAOT on Android is not a priority, and instead, new releases include optimizations that only provide marginal improvements (e.g., -10% startup time for test cases). However, in real-world conditions, such improvements do not solve the problem. If an application takes 2000ms to start, even reducing it to 1800ms makes little difference, and at best, such optimizations are noticeable only under ideal conditions.

It seems to me that the team does not fully grasp the depth of this issue. Many of my colleagues have already switched to Flutter specifically because of the slow startup times on Android. When their clients or customers ask why the Android application launches so slowly, developers are forced to reply that it is a limitation of the technology they are using. They may also suggest switching to iOS, where there are no such problems, but this is not an option.

In my opinion, issues like this, and especially the implementation of NativeAOT support for Android, should be considered critically important. I would like to hear the team's thoughts on this matter: what should we expect? Will NativeAOT support for Android be added in the near future, or should we only hope for small, incremental performance improvements that don't solve anything, and wait for everyone to switch to Flutter?

@jonathanpeppers
Member

jonathanpeppers commented Aug 14, 2024

If an application takes 2000ms to start, even reducing it to 1800ms makes little difference

I've found simply profiling your app (with dotnet-trace) and improving things, you should be able to get this quite a bit shorter. If you've already recorded a custom AOT profile, or already "AOT everything", what is left is to profile your own code. In the customers I've helped in the past, we've always been able to achieve reasonable results on a mid-range device.

@vyacheslav-volkov

Hi @jonathanpeppers, yes, I do all of this in my applications, and yet the iPhone X with NativeAOT still performs faster than the Galaxy S22 Ultra. Additionally, to go through all these steps, you need to use various scripts and possess additional knowledge on how to use and set everything up, whereas, for example, Flutter offers a convenient profiler built into the environment that requires no extra knowledge.

In the end, to write a "fast application" for Android that still lags behind native applications in terms of startup speed, you need to perform a whole range of additional operations, which not every developer can manage, just to make their application work somewhat faster. I believe that this expectation is where the main problem lies. A developer expects that the release build will immediately work as it should, but instead, they encounter performance issues where they don't expect them.

NativeAOT could be the solution to this problem. Yes, there are still limitations on using dynamic code, but they are not that difficult to overcome, resulting in an application that performs as fast as a native one. Isn't that what we want for a cross-platform application? Moreover, I’m almost 100% sure that no one uses Android applications without ProfiledAOT or FullAOT because, in that case, you can forget about startup performance. This also means they are already using trimming, so transitioning to NativeAOT wouldn't require much additional effort. Over time, more libraries and frameworks will become fully compatible with NativeAOT, making integration seamless for developers without any issues.

By the way, the folks from the Avalonia team used NativeAOT for Android and compared performance here. The video speaks for itself—this is what every developer expects from their application's startup speed.

I’ve had an excellent experience using NativeAOT on iOS, and the result is that the application runs almost as if it were written in Swift/ObjectiveC. Because of this, I’m very interested in understanding the Android team's thoughts on NativeAOT and why it appears to have such a low priority. When developing cross-platform applications, we rely on a shared codebase, so for developers targeting both iOS and Android, it wouldn't make much difference if Android didn't support dynamic code—after all, iOS doesn’t support it either.

@fanyang-mono fanyang-mono removed their assignment Aug 14, 2024
@jonathanpeppers
Member

@vyacheslav-volkov can you share some .nettrace or .speedscope files of your application's startup? It doesn't sound like you've tried this?

NativeAOT would probably make your app faster, but so would some general performance investigations of your code.

@vyacheslav-volkov

@jonathanpeppers I can't upload .nettrace or .speedscope from work projects, but I've uploaded a sample app that only initializes my framework and shows one layout, and on a Galaxy Note 10 with FullAOT it takes 1 second. At the same time, on iOS this configuration with NativeAOT starts almost instantly on an iPhone X. I've added the .speedscope and .nettrace files directly to the repository.
If you can check and tell me how to fix it without rewriting everything into classes instead of structs, I will be glad, but otherwise my question about NativeAOT is still relevant.

https://github.com/vyacheslav-volkov/PerfAndroidTest

@alexyakunin

alexyakunin commented Aug 16, 2024

If an application takes 2000ms to start, even reducing it to 1800ms makes little difference

I've found simply profiling your app (with dotnet-trace) and improving things, you should be able to get this quite a bit shorter. If you've already recorded a custom AOT profile, or already "AOT everything", what is left is to profile your own code. In the customers I've helped in the past, we've always been able to achieve reasonable results on a mid-range device.

I am actually curious... What's the largest app on MAUI + Blazor that's built at Microsoft? Are there any at all?

Sorry for the somewhat angry response, but you guys kinda underestimate the amount of effort some of us have to invest to improve startup time, as well as our level of experience. I'll list just some of the things we already did:

  • All service registrations are replaced w/ delegate-based ones
  • So every client-side service is resolved either explicitly, or via an intermediate service caching hub like this one: https://github.com/Actual-Chat/actual-chat/blob/dev/src/dotnet/UI.Blazor.App/Services/ChatUIHub.cs
  • Custom service container that allows us to start some services (mostly related to web view) early while registering the rest (this saves 50-100ms on startup).
  • Use of MemoryPack everywhere (it's a serializer with Roslyn code generator, so no runtime codegen)
  • Similarly, all our proxies are also generated in compile time
  • Profiling & postponing everything that takes time @ startup or moving it to parallel threads.
  • Warming up whatever runs on later startup stages earlier in a concurrent thread - actually this doesn't help much, seemingly b/c JIT concurrency is limited, and it's all about these AOT_NOT_FOUND
  • Getting rid of key generic use cases. E.g. our proxies were heavily using ArgumentList<T0,T1,...> type to capture method call arguments with just one allocation (contrary to object[] used by nearly any other proxy generator), and I ended up adding an "array mode" for this abstraction to get rid of generics & make it more AOT-friendly: ActualLab/Fusion@1e12033
  • Identifying and creating this issue, which is a key bottleneck in our case now.

But in the end, we can do only so much - e.g. we can't really change:

  • The code involving AsyncTaskMethodBuilder<T> & similar types (i.e. async code)
  • ConcurrentDictionary.GetOrAdd(key, static (key, state) => ..., state) scenarios
  • Most of cases involving Fusion's Computed<T> abstraction.
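
To see which of these types dominate the misses in a given capture, the log can be grouped by declaring type. A rough sketch, assuming a saved logcat file and the `Type:Method (args)` signature layout seen in the excerpts earlier in this thread; the function name is hypothetical, and generic arguments are dropped so that all instantiations of one generic type are counted together.

```shell
#!/bin/sh
# Count non-wrapper "AOT NOT FOUND" entries per declaring type.
# $1 = saved logcat capture
aot_not_found_by_type() {
    sed -n 's/^.*AOT NOT FOUND: //p' "$1" \
        | grep -v '^(wrapper ' \
        | sed -e 's/:.*$//' -e 's/`.*$//' \
        | sort | uniq -c | sort -rn
}

# Usage (file name is a placeholder):
#   aot_not_found_by_type logcat.txt | head -n 20
```

The first `sed` expression after the filter cuts the signature at the first `:` (the type/method separator), and the second cuts at the first backtick, which removes the arity and type arguments of generic types.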

As for NativeAOT, I've made a simple test to see how far we're from being able to use it, and it just confirmed my worst thoughts:

All in all, I think you guys make a HUGE mistake:

  1. There are generics, Span<T> & tons of other types which makes .NET blazingly fast on the server side. I'm a big fan of .NET partially due to its perf.: https://itnext.io/geting-4x-speedup-with-net-core-3-0-simd-intrinsics-5c9c31c47991 (it's my post)
  2. You claim .NET code is 100% portable / runs everywhere - and it does. But features like generics and Span<T>, which are supposed to make it fast, are also mobile startup performance killers.
  3. And it's all due to the fact Mono AOT is incapable of generating generic method instances we really need.
  4. Native AOT w/o universal shared generics also doesn't look promising - unless you guys add some tooling allowing to e.g. programmatically enumerate possible generic parameters for certain types & methods right during the link stage. E.g. in our case we technically can do this, but it will require a decent amount of work. As you might guess, it's totally unacceptable for any real app to fail at a random point just because one of generic methods doesn't have an AOT version.

So IMO you need to address this issue. And @vyacheslav-volkov is absolutely right: there is a ton of criticism of MAUI itself, but MAUI app startup time is a key issue faced by people like us - i.e. the ones who made it through all the hoops. The ones who found fixes or workarounds for everything else people complain about.

@alexyakunin

alexyakunin commented Aug 16, 2024

iPhone X with NativeAOT still performs faster than the Galaxy S22 Ultra

Probably it's time to repeat here that the INTERPRETED version of our app for iPhone 13 starts faster (1.3s) than its AOT-compiled version on Galaxy S23 Ultra (~ 1.8s).

Here is how https://actual.chat startup times look across different platforms / devices:

  • 1.2s: Ryzen 7950X3D, Windows app (regular .NET 8 app)
  • 1.3s: iPhone 13, iOS app w/o any AOT code (i.e. interpreter mode build)
  • 1.6s: Ryzen 7950X3D, WASM app in Firefox (no AOT)
  • 1.8s: Galaxy S23 Ultra, Android app with profile-guided AOT - and ~ 2s on a top-tier device means you'll be seeing >5s startup times on mid- & entry-level devices, which almost certainly means elevated ANR rate & Google Play penalization
  • 2s: iPhone 13, WASM app in Safari/WebKit
  • 2.1s: Ryzen 7950X3D, WASM app in Chrome
  • 2.8s: Galaxy S23 Ultra, WASM app in Chrome

Notes:

  • We don't use AOT for our WASM app mainly b/c it doesn't improve its load time, but more than doubles the app size.
  • We don't use AOT for our iOS app due to a HUGE 200MB binary it produces. Probably my recent changes to our ArgumentList will fix this though.

@alexyakunin

alexyakunin commented Aug 16, 2024

One last note on Mono AOT: honestly, we'd do whatever it takes in terms of adding extra codegen to our app to get rid of JIT during the startup. The problem is: we can't. This issue literally blocks any options except getting rid of most of structs (or generic methods, which seems even worse option).

So the advice like "try profiling your code & eliminate hot paths" isn't helpful at all - it's slow because the hot path is in Mono JIT compiler, and we can't get rid of it without turning a huge amount of good code into a worse one (i.e. structs -> classes migration) + likely, breaking compatibility of any old client with our API, etc.

IDK how hard it is to address this issue, but... I also don't understand how you guys can claim that AOT is there (even a profile-guided one!) without a huge asterisk nearby.

@alexyakunin

alexyakunin commented Aug 16, 2024

By the way, the folks from the Avalonia team used NativeAOT for Android and compared performance here. The video speaks for itself—this is what every developer expects from their application's startup speed.

Yep... We'd try to migrate to NativeAOT as soon as it becomes available solely due to this. The startup time is crucial for apps like ours.

@vyacheslav-volkov

vyacheslav-volkov commented Aug 16, 2024

@alexyakunin I spent some time making my library NativeAOT compatible (I haven't pushed the latest version to github yet) but the result was worth it. This is how I solved the AOT generic problem: I added this method:

    public static void LinkerIncludeGenericType(
        [DynamicallyAccessedMembers(DynamicallyAccessedTypes.DynamicMemberInfo)]
        Type type)
    {
        if (!RuntimeFeature.IsDynamicCodeSupported && Default.FalseCondition)
            _ = type.BaseType;
    }

This method is only intended to tell the linker about the generic type, which will not affect the non-AOT version. I use reflection a lot because I write my own bindings, and a typical binding looks like this:

_testModel1.Bind(m => m.Value).To(_testModel1, m => m.Value1);

And under the hood I track all the generic instances that I need to create this binding:

    public static BindingResult<TTarget> To<TTarget, TFrom, TSource, TTo>(this BindingSyntax<TTarget, TFrom> syntax, TSource source,
        [RequireStaticDelegate]
        Expression<Func<TSource, TTo>> sourceMember,
        BindingParameterValue targetNullValue, BindingParameterValue fallback, [RequireStaticDelegate] Action<IBindingBuilderContext>? configurator = null,
        IReadOnlyMetadataContext? metadata = null)
        where TTarget : class
        where TSource : class
    {
        if (!RuntimeFeature.IsDynamicCodeSupported && Default.FalseCondition)
        {
            if (typeof(TFrom) != typeof(BindingSyntaxEvent) && (typeof(TFrom).IsValueType || typeof(TTo).IsValueType))
            {
                MugenExtensions.Convert<TFrom, TTo>(default);
                MugenExtensions.Convert<TTo, TFrom>(default);
            }

            if (typeof(TTo).IsValueType)
                PropertyAccessorMemberInfoBase.LinkerInclude<TTo>();
        }
        //other code

and the code from PropertyAccessorMemberInfoBase

    public static void LinkerInclude<TValue>()
    {
        ReflectionMugenExtensions.LinkerIncludeGenericType(typeof(PropertyAccessorMemberInfo<object, TValue>));
        ReflectionMugenExtensions.LinkerIncludeGenericType(typeof(StaticPropertyAccessorMemberInfo<TValue>));
    }

This approach works great and keeps track of all the required types on its own.

@alexyakunin

@vyacheslav-volkov thanks for sharing the example - yeah, that approach is nice. I thought about creating a dedicated piece of "logic" that similarly touches every type the linker doesn't notice (mostly to make sure that if the false condition gets evaluated, it happens ~ once), but this approach allows keeping that code right where it has to be.

One quick question - what is DynamicallyAccessedTypes.DynamicMemberInfo? I.e. is it a mistake or some mix of flags?

@alexyakunin

Also, LinkerIncludeGenericType is non-generic - it's to prevent AOT from generating useless method instances?

@vyacheslav-volkov

@alexyakunin

One quick question - what is DynamicallyAccessedTypes.DynamicMemberInfo? I.e. is it a mistake or some mix of flags?

Yes, you can define the flags you need for the type; in my case they are the constructor flags:

    public const DynamicallyAccessedMemberTypes DynamicMemberInfo = DynamicallyAccessedMemberTypes.PublicConstructors | DynamicallyAccessedMemberTypes.NonPublicConstructors;

Also, LinkerIncludeGenericType is non-generic - it's to prevent AOT from generating useless method instances?

Yes, there's no point in making it generic since you can just pass the type.
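
For readers following along, the helper and the flag constant can be combined into one self-contained sketch. The names `AotHints`, `FalseCondition`, and `ConstructorFlags` are hypothetical stand-ins for `ReflectionMugenExtensions`, `Default.FalseCondition`, and `DynamicallyAccessedTypes.DynamicMemberInfo`; how `Default.FalseCondition` is really defined isn't shown in this thread, so the definition below is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;

public static class AotHints
{
    // Custom flag combination: public and non-public constructors.
    public const DynamicallyAccessedMemberTypes ConstructorFlags =
        DynamicallyAccessedMemberTypes.PublicConstructors |
        DynamicallyAccessedMemberTypes.NonPublicConstructors;

    // Assumption: always false at run time, but computed at startup rather than
    // declared as a constant, so tooling is unlikely to fold the branch away.
    public static readonly bool FalseCondition = DateTime.UtcNow.Year < 2000;

    // Non-generic on purpose: the generic type travels as a Type argument,
    // so the helper itself needs no extra generic method instances.
    public static void IncludeGenericType(
        [DynamicallyAccessedMembers(ConstructorFlags)] Type type)
    {
        if (!RuntimeFeature.IsDynamicCodeSupported && FalseCondition)
            _ = type.BaseType; // never executed; exists only as a visible use of `type`
    }
}

public static class Example
{
    public static void Register()
    {
        // typeof(...) of a closed generic type is a static reference to that instantiation.
        AotHints.IncludeGenericType(typeof(Dictionary<int, Guid>));
    }
}
```

On a JIT-enabled runtime `RuntimeFeature.IsDynamicCodeSupported` is true, so the call reduces to a cheap check and does nothing else.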

@alexyakunin

alexyakunin commented Aug 16, 2024

@jonathanpeppers I have another question - maybe you can help:

Imagine you have a build-time proxy code generator, which generates generic proxies - e.g. if it's an interface proxy, its generated implementation passes the call to a general-purpose interceptor. And there is a variety of interceptors. In the end, they accept an object like this describing a call: https://github.com/ActualLab/Fusion/blob/master/src/ActualLab.Interception/Invocation.cs

Now, we want all of this to be fully compatible with NativeAOT, or with full Mono AOT, so that ILLink is able to identify all actual dependencies of these interceptors. What we can do is make the proxy generator emit something that lets us add per-interceptor extras, so that every type & method actually used by any of the interceptors is "touched".

For the sake of clarity, interceptors may use SomeType<T> or SomeMethod<T>, where T is either an argument or an "unwrapped" result of one of intercepted methods ("to unwrap" = pull T out of Task<T>, ValueTask<T>, or simply T), and:

  • SomeType<T> is a generic type that can be statically touched, if you can generate a fake call with T parameter (e.g. by a fake call like TouchSerializer<T>() => Use(Serializer<T>.Instance))
  • SomeMethod<T> is a generic method that can be similarly statically touched (e.g. by a fake call like TouchSerialize<T>() => Use(Serializer.Untyped.Serialize<T>(default!, default!)).

And I am thinking of making each proxy fake-call methods like .UseArgument<TUnwrapped>, UseResult<TUnwrapped>, etc. on IProxyCodeTouch, and "implementing" a few instances of this interface (~ on a per-interceptor basis) that "touch" the actual generic types and methods the interceptors use.

The assumption here is: if ILLink concludes the interface method is used (with certain arg types), it implies all of its implementations are used as well, and thus whatever depends on them is also used.

What are your thoughts on this? Will this approach work? Are there any better options?
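
To make the proposal concrete, here is a minimal sketch of the idea. Everything except the `IProxyCodeTouch`, `UseArgument`, and `UseResult` names is hypothetical (`Serializer<T>`, `SerializingInterceptorTouch`, `CounterServiceProxyTouch`, and the proxied method signature are made up for illustration). It only shows the shape of the generated code; whether ILLink and the AOT compiler then generate every implementation's instantiations is exactly the open question above.

```csharp
using System;
using System.Runtime.CompilerServices;

// One implementation of this interface per interceptor.
public interface IProxyCodeTouch
{
    void UseArgument<T>();
    void UseResult<TUnwrapped>();
}

// Hypothetical stand-in for a generic type an interceptor depends on.
public sealed class Serializer<T>
{
    public static readonly Serializer<T> Instance = new();
    public string Serialize(T value) => $"{value}";
}

// "Touches" what a serializing interceptor would need for each T.
public sealed class SerializingInterceptorTouch : IProxyCodeTouch
{
    public void UseArgument<T>() => Use(Serializer<T>.Instance);
    public void UseResult<TUnwrapped>() => Use(Serializer<TUnwrapped>.Instance);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Use(object o) => GC.KeepAlive(o);
}

// What the generator might emit next to a proxy for a method such as:
//     Task<int> CountAsync(DateTime since, Guid userId);
public static class CounterServiceProxyTouch
{
    public static void Touch(IProxyCodeTouch touch)
    {
        touch.UseArgument<DateTime>();
        touch.UseArgument<Guid>();
        touch.UseResult<int>(); // int is Task<int> "unwrapped"
    }
}
```

Note that `UseArgument<T>` and `UseResult<T>` are generic interface methods, so this sketch relies on the toolchain handling generic virtual method instantiations across all implementations.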

@alexyakunin

alexyakunin commented Aug 16, 2024

Also, are there any tricks that allow going from Type type to a generic instance of SomeType<T> parameterized with that type in such a way that the linker "sees" it? Assume type is just an argument of a fake method rather than a generic argument, since we don't want to generate a lot of useless AOT code for fake generic method instances.

In other words, are there any scenarios in which ILLink concludes that a method uses SomeType<T> parameterized by the value of passed Type type, when it sees a call to SomeMethod(Type type)?

@alexyakunin

alexyakunin commented Aug 20, 2024

@vyacheslav-volkov just want to say I fully agree with this:

It seems to me that the team does not fully grasp the depth of this issue. Many of my colleagues have already switched to Flutter specifically because of the slow startup times on Android. When their clients or customers ask why the Android application launches so slowly, developers are forced to reply that it is a limitation of the technology they are using, [...]

I can add that maybe up to 95% of developers simply won't dig that far to identify this specific issue. Most of them have no time for this, and only a fraction of the remaining ones has the experience needed to figure out what's going on.

That's why only a few people are complaining here. But the fact that others don't even bother to dig is actually a very negative thing for Microsoft, because their conclusion is much simpler: MAUI is intrinsically slow. We've invested a lot to build a product on it, and now the only thing we can do is to switch to Flutter or some other alternative. + Lesson learned: trusting Microsoft was a mistake.

And from the business perspective this is the only conclusion you can make. The explanation doesn't really matter while the underlying issue isn't fixed.

@charlesroddie

charlesroddie commented Aug 23, 2024

All these partial AOT modes, including profiled AOT, are just red herrings. We should ignore everything to do with partial AOT, profiled AOT, etc., and get full AOT working. Once that's done, this issue, and the need for the partial solutions, goes away. This looks like the issue to solve: #101135 (comment)

Now that StripILAfterAOT is working, a simple test of working AOT would be that, with StripILAfterAOT set, there is zero IL left after stripping.

@alexyakunin

alexyakunin commented Aug 24, 2024

We should ignore everything to do with partial AOT, profiled AOT, etc., and get full AOT working.

Politely disagree: it depends on your goals.

If there is an interpreter or JIT and your goal is just to speed up the app (e.g. on startup, which is the most frequent case) without making tons of changes, profiled AOT is the best way to do this. Yes, there is a chance it would miss some methods, but so what? It's still a fully working app that just starts (or does what it's supposed to do) 1% slower. And in most cases it doesn't really matter. + JIT means you may use things like DynamicMethod to speed up certain things at run time.

And if there is no interpreter and no JIT, full AOT is the only option that remains. But it's also the most painful one - especially for a larger app. And all because C# was never meant to be a statically compiled language. I'd also say that we need universal shared generics here, otherwise it's very hard to guarantee the app won't fail by attempting to access a generic (e.g. via reflection) that wasn't statically compiled.

I'd say that I'd rather have a fully working Mono-based profiled AOT on Android than NativeAOT:

  • If we got sub-0.4s app startup time with Mono AOT, I probably wouldn't bother to optimize it further
  • As for NativeAOT, it requires a decent amount of changes - probably a month of work or more. And it's all about different sacrifices + a lot of codegen via Roslyn code generators.

@agocke
Member

agocke commented Aug 30, 2024

I have no idea about any of the Mono stuff here, but let me just comment on the overall architectural picture:

If you are using generics and generic specialization, you are not improving performance, you are making a performance tradeoff. This is true for all programming language implementations. There is no way to share code with a different calling convention without introducing some performance cost. .NET users have been incorrectly obsessing over microbenchmarks for years to eke out a win in a single function call and neglected the tradeoffs in this space. Generic specialization of value types offers the highest possible throughput for those code paths, at the expense of generating more code.

For JIT runtimes this means more time JITing, more memory used for JITed code, and potentially more icache misses due to lower code density.

For AOT runtimes this means vastly more generated code, larger binary sizes, longer compile times, and potentially more icache misses. The code size penalty is even larger for AOT because it can't use runtime conditions to predict whether or not a specialization is actually used at runtime, and therefore it must generate all potential ones. This is particularly bad for generic virtual methods or generic interface methods, where both the implementation and substitution are unknown and the size of the generated code grows quadratically. It's not impossible for an AOT app using GVMs to have the GVM specialization code be a substantial portion of the entire app.

When using AOT with generics you need to strongly consider whether it's better to simply allocate a class or boxed interface rather than using specialization. You may gain a few microseconds due to specialized code, but lose on all other metrics.
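
As an illustration of the two options being weighed here, consider the following sketch (the `IShape`, `Square`, and `Totals` names are made up for this example). The first method is specialized per value type; the second uses one shared body at the cost of boxing.

```csharp
using System;

public interface IShape { double Area(); }

public readonly struct Square : IShape
{
    private readonly double _side;
    public Square(double side) => _side = side;
    public double Area() => _side * _side;
}

public static class Totals
{
    // Specialized: a runtime typically produces separate native code for every
    // value type T used here, with no boxing and direct calls to T.Area().
    public static double SumSpecialized<T>(T[] shapes) where T : struct, IShape
    {
        double total = 0;
        foreach (var s in shapes) total += s.Area();
        return total;
    }

    // Shared: a single method body for all shapes, but each struct is boxed
    // when stored as IShape, and Area() becomes an interface call.
    public static double SumShared(IShape[] shapes)
    {
        double total = 0;
        foreach (var s in shapes) total += s.Area();
        return total;
    }
}
```

Both `Totals.SumSpecialized(new[] { new Square(2), new Square(3) })` and `Totals.SumShared(new IShape[] { new Square(2), new Square(3) })` return 13; the difference is only in how much code is generated and how much is allocated.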

Native AOT w/o universal shared generics also doesn't look promising - unless you guys add some tooling allowing to e.g. programmatically enumerate possible generic parameters for certain types & methods right during the link stage.

Separately, this should only ever happen if you're using reflection, like MakeGenericType. If your application crashes due to missing generic code and Native AOT isn't giving you a warning about potential failures, then there is a bug in Native AOT.

If it is giving you a warning and your code is failing, you need to fix your code. AOT has some fundamental incompatibilities with reflection. Some of them are hard incompatibilities, like Assembly.Load being impossible, but others come with unacceptable performance tradeoffs. Universal shared generics also produce unfixable performance problems in other applications: #71210.

We may still implement USG for Native AOT in the future, but it will come with a different set of tradeoffs and be unusable by a different set of customers.
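
A small sketch of the distinction drawn above between statically visible generic instantiations and ones created through reflection (the `Point` and `GenericInstantiation` names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public struct Point { public int X; public int Y; }

public static class GenericInstantiation
{
    // Statically visible: List<Point> appears in the IL, so an AOT compiler
    // can see it and generate code for this instantiation ahead of time.
    public static object CreateStatic() => new List<Point>();

    // Reflection-based: the element type is only known at run time, so an
    // AOT compiler cannot tell from this method which instantiations it needs.
    public static object CreateDynamic(Type elementType)
    {
        Type listType = typeof(List<>).MakeGenericType(elementType);
        return Activator.CreateInstance(listType)!;
    }
}
```

On a JIT-based runtime both methods produce a `List<Point>` when `CreateDynamic(typeof(Point))` is called. When publishing for Native AOT, the `MakeGenericType` call is the kind of code the analysis is expected to flag with a warning, since the required instantiation may not have been compiled.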

@alexyakunin

alexyakunin commented Sep 2, 2024

@agocke ,

  1. We understand the tradeoffs for generic methods parameterized w/ ValueTypes. Nevertheless, we deliberately choose to use them - in particular, based on micro- and other kinds of benchmarks, such as live profiling.

  2. When you state things like "you are not improving performance, you are making a performance tradeoff"... Well, I instantly fall into "let me give you a demagogy masterclass" mood (or an eye-opener?). How about a way more generic statement: EVERY DECISION YOU MAKE IMPLIES TRADEOFFS. Long story short, it doesn't help when you state something obvious as a response to some angry person's comment.

  3. Maybe it's worth reiterating that no one writes code for AOT on .NET. No one. We painfully modify the code to work with AOT solely because it's implemented in such a way that it breaks almost everything. Wanna check how bad it is? Well, dotnet run doesn't even try to fully emulate AOT behavior. It's super easy to produce an AOT project that fails after dotnet publish, but runs smoothly with dotnet run.

And I don't get why you guys find this acceptable. IMHO it's deeply wrong to break a bunch of features instead of making them work in some way, even if it's much slower. You can explain the slowness - and moreover, we can address the slowness, because typically all we need here is to profile & optimize the hot path. But when you break literally everything, we have to change each and every broken thing. E.g. I would be fine with either universal generics or interpreter - whatever, just don't JIT it. Based on what we see w/ interpreter on iOS, this would still allow us to shave off 50% of startup time. But somehow JIT + broken AOT is all we have, and you're trying to convince us it's fine.

Moreover, AOT breaks specifically what helps JITted apps to run faster. And you can't know what's broken unless you run it.

  4. And finally... "Why do you see the speck that is in your brother's eye, but don't consider the beam that is in your own eye?"

Am I the author of any of these methods? Am I the one who concluded it makes sense to call AsyncTaskMethodBuilder.Start<TStateMachine>(ref TStateMachine) per every single async method call?

And if Microsoft can't author AOT-friendly code, why does it expect that others can easily jump through all the hoops to author it? Doesn't this indicate that whoever makes the decisions on how AOT is supposed to work made a bunch of wrong calls in this specific case (i.e. generic handling)?

Long story short, I don't see why it makes sense to look for excuses here, when the first step in solving a problem is at least recognizing it.

@alexyakunin

alexyakunin commented Sep 2, 2024

Separately, this should only ever happen if you're using reflection...

I don't understand why banning devs from using what's quite convenient is viewed as an acceptable trade-off here.

The way I would approach this is: if reflection is a mistake, we should try removing it from .NET. And if it's a genuinely useful thing (that's what 90+% of developers will tell you), we should stop pretending it's ok to break it in AOT builds only.

@alexyakunin

alexyakunin commented Sep 2, 2024

P.S. The conversation is getting a bit heated... Can we try to refocus it on how to solve this specific issue / why it's complex? I'd love to know why these specific constraints exist, and what specifically prevents Microsoft from making them much less restrictive.

This comment is also worth reading: #106748 (comment), along with a few more following it. A brief summary: if this fix is complex, and it's needed only for Android, how is it possible that the very same Mono AOT generates the code for all generic instances for iOS?

@alexyakunin

Hi, are there any updates on that?

@alberk8

alberk8 commented Oct 4, 2024

P.S. The conversation is getting a bit heated... Can we try to refocus it on how to solve this specific issue / why it's complex? I'd love to know why these specific constraints exist, and what specifically prevents Microsoft from making them much less restrictive.

This comment is also worth reading: #106748 (comment), along with a few more following it. A brief summary: if this fix is complex, and it's needed only for Android, how is it possible that the very same Mono AOT generates the code for all generic instances for iOS?

To be brutally honest, this will only change quickly if and when Google bans VMs from running on its platform. Just look at Apple: you either comply or take your game elsewhere.

@alexyakunin

alexyakunin commented Oct 10, 2024

To be brutally honest, this will only change quickly if and when Google bans VMs from running on its platform. Just look at Apple: you either comply or take your game elsewhere.

Well, if they had no other choice - of course.

I don't see why banning runtime codegen makes sense though (e.g. for Apple). And if I were working on a JIT for Android / similar platforms, I'd certainly implement it as a JIT with a file system cache. Maybe I'm missing something, but it seems obviously faster to link previously compiled method code than to generate it each and every time.

@winkmichael

Any update, Microsoft? This is a rather big issue: apps take many, many seconds to load on Android.

@alexyakunin

alexyakunin commented Oct 14, 2024

For MS folks: I'll be bumping up this topic on Reddit until we get a meaningful fix - for the sake of clarity, the bug was reported in April, so any patience has its limits.

And this topic is an amazing example of how to turn one of your advocates into, well, at least someone who's mad at you. How is it possible to get every single step wrong?

  • No assessment or confirmation of how severe the issue is
  • Absolutely vague answers on when / whether it's going to be addressed
  • Not a single person from MS here has taken responsibility
  • Some reasonable questions plainly ignored (e.g. why doesn't it work on Android, if it works on iOS?)
  • Etc.

@alexyakunin

alexyakunin commented Oct 18, 2024

A few updates - after some investigations today:

  • iOS uses -O=gsharedvt + a few other options to generate shared generics
  • Android targets don't use any of these options.

My attempt to use -O=gsharedvt for Android in our project didn't improve the startup time - on the contrary, it made it worse.

Besides that, I noticed that the AndroidAotMode argument in GetAotAssemblies and the Mode argument in the MonoAOTCompiler task accept different values - e.g. the latter can accept FullInterp, but the former accepts only Normal and Hybrid. In other words, it seems impossible to enable the interpreter with AOT on Android, even though technically it seems it should be possible (and it works on iOS).
