
link.Elf: eliminate an O(N^2) algorithm in flush() #21681

Merged
andrewrk merged 5 commits into master from reduce-flush on Oct 12, 2024

Conversation

andrewrk (Member)

Make shared_objects a StringArrayHashMap so that deduping does not need to happen in flush. That deduping code also was using an O(N^2) algorithm, which is not allowed in this codebase. There is another violation of this rule in resolveSymbols but this commit does not address it.

This required reworking shared object parsing, breaking it into independent components so that we could access soname earlier.
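
For illustration, a minimal sketch of the shape of that change (the names below are made up for this comment, not the actual fields in link.Elf): keying the table by soname makes deduplication an implicit side effect of insertion, so flush() no longer needs a dedup pass at all.

const std = @import("std");

const SharedObject = struct { index: u32 }; // hypothetical stand-in payload

fn addSharedObject(
    shared_objects: *std.StringArrayHashMap(SharedObject),
    soname: []const u8,
    so: SharedObject,
) !void {
    // getOrPut dedupes by key, so a repeated soname is simply skipped.
    // The caller must keep the soname string alive as long as the map.
    const gop = try shared_objects.getOrPut(soname);
    if (gop.found_existing) return;
    gop.value_ptr.* = so;
}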

Shared object parsing had a few problems that I noticed and fixed in this commit:

  • Many instances of incorrect use of align(1).
  • shnum * @sizeOf(elf.Elf64_Shdr) can overflow based on user data.
  • @divExact can cause illegal behavior based on user data. (See the sketch after this list for how both of these can be checked.)
  • Strange versyms logic that wasn't present in either mold or lld. The logic was not commented and there is no git blame information in either ziglang/zig or kubkon/zld. I changed it to match mold and lld instead.
  • Use of ArrayList for slices of memory that are never resized.
  • Finding DT_VERDEFNUM in a different loop than the one that finds DT_SONAME. Ultimately I think we should follow mold's lead and ignore this integer, relying on null termination instead.
  • Doing logic based on VER_FLG_BASE rather than ignoring it like mold and lld do, with no comment explaining why the behavior differs.
  • Mutating the original ELF symbols rather than only storing the mangled name on the new Symbol struct.
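
As a sketch of the overflow and @divExact points above (the function names are illustrative, not the real parser code): checked arithmetic turns hostile sizes into ordinary errors instead of illegal behavior.

const std = @import("std");
const elf = std.elf;

// Checked multiply: a hostile shnum must not be allowed to overflow usize.
fn sectionHeaderTableSize(shnum: usize) error{Overflow}!usize {
    return std.math.mul(usize, shnum, @sizeOf(elf.Elf64_Shdr));
}

// Checked division instead of @divExact, which is illegal behavior when the
// size is not an exact multiple (something untrusted input can arrange).
fn symbolCount(sh_size: usize) error{Malformed}!usize {
    if (sh_size % @sizeOf(elf.Elf64_Sym) != 0) return error.Malformed;
    return sh_size / @sizeOf(elf.Elf64_Sym);
}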

I noticed something that I didn't try to address in this commit: Symbol stores a lot of redundant information that is already present in the ELF symbols. I suspect that the codebase could benefit from reworking Symbol to not store redundant information.

Additionally:

  • Add some type safety to std.elf.
  • Eliminate 1-3 file system reads when determining the kind of input files, by taking advantage of the file name extension and handling error codes properly (see the sketch after this list).
  • Move more error handling methods to link.Diags and make them infallible and thread-safe.
  • Make the data dependencies obvious in the parameters of parseSharedObject. It's now clear that the first two steps (Header and Parsed) can be done during the main Compilation pipeline, rather than waiting for flush().
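
A sketch of the file-extension idea from the list above (the helper and the FileKind enum are invented for illustration; the real classification in the linker is more involved): checking the name first avoids opening and reading the file just to classify it in the common cases, falling back to magic bytes only when the extension is ambiguous.

const std = @import("std");

const FileKind = enum { shared_object, archive, object, unknown };

// Illustrative only: guess the input kind from its name before falling back
// to reading magic bytes, saving file system reads in the common case.
fn guessFileKind(path: []const u8) FileKind {
    if (std.mem.endsWith(u8, path, ".so")) return .shared_object;
    if (std.mem.indexOf(u8, path, ".so.") != null) return .shared_object; // e.g. libfoo.so.1
    if (std.mem.endsWith(u8, path, ".a")) return .archive;
    if (std.mem.endsWith(u8, path, ".o")) return .object;
    return .unknown;
}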

Comment on lines +1688 to +1690
// TODO This loop has 2 major flaws:
// 1. It is O(N^2) which is never allowed in the codebase.
// 2. It mutates shared_objects, which is a non-starter for incremental compilation.
Member

The reason we have this, as well as a similar loop for objects (well, relocatables), is that we parse all relocatables from all archives and all shared objects that appear on the linker line. This means we end up with an upper bound on all potentially required inputs. By this I mean that not all relocatables from archives or shared objects will necessarily be needed, and some may actually yield duplicate symbol errors or lead to an outright broken output image. Therefore, we do symbol resolution twice: first with all objects parsed and extracted, then we mark any unreferenced object as dead, prune the sets, and re-run the resolution again. This non-standard algorithm matches what mold does; however, it is not what lld does. lld, for instance, has the concept of lazy inputs which are analysed only when required. My understanding after studying the work of Rui/mold is that the lld approach, which is effectively the legacy approach, is slower than mold's if inputs are parsed in parallel.
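
In rough outline, the two-pass scheme described above looks like this (names are illustrative, not the actual link.Elf API):

const Input = struct {
    alive: bool = true,
    // symbols, references, etc. omitted
};

// Illustrative outline only; these are not the real linker entry points.
fn linkAll(inputs: []Input) void {
    resolveSymbols(inputs); // pass 1: over the full upper-bound set of inputs
    markLive(inputs); // walk references from the roots; unreferenced inputs become dead
    resolveSymbols(inputs); // pass 2: dead inputs no longer participate
}

fn resolveSymbols(inputs: []Input) void {
    for (inputs) |*input| {
        if (!input.alive) continue;
        // bind undefined symbols to definitions among the live inputs (omitted)
    }
}

fn markLive(inputs: []Input) void {
    // start from the roots (e.g. the main object) and mark everything they
    // reference as alive; omitted here, only the two-pass shape matters.
    _ = inputs;
}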

Member Author (andrewrk)

As for the O(N^2) problem, it can be addressed without any architectural changes; it just needs to not call orderedRemove inside the loop. You can implement this kind of thing in O(N) by doing a single loop and keeping track of the number of removals as you go along, like this:
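
A minimal sketch of that pattern, using a throwaway Entry type rather than the real shared_objects data:

const std = @import("std");

const Entry = struct { dead: bool };

// Single pass: copy each surviving element down over the gaps left by removed
// ones, then shrink once at the end. Each element moves at most once, so this
// is O(N), versus O(N^2) when calling orderedRemove inside the loop.
fn removeDead(list: *std.ArrayListUnmanaged(Entry)) void {
    var len: usize = 0;
    for (list.items) |entry| {
        if (!entry.dead) {
            list.items[len] = entry;
            len += 1;
        }
    }
    list.shrinkRetainingCapacity(len);
}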

Regarding garbage collection, subsequent updates might cause a different set of things to remain alive, so the data must not be mutated. Instead, just like the rest of the compiler pipeline, it should create read-only transformations of state from one thing to another, selectively invalidating and redoing parts of the pipeline depending on what changed since last time.

As a reminder, I want to focus first on single-threaded linking that happens concurrently with the rest of the Zig compiler, before exploring multi-threaded linking. You did this in the other order for MachO, but I think doing it in the order I suggested will yield a better outcome for ELF. I understand the urge to directly compare performance against mold or lld, but that is completely missing the point. We need to lean into our unique advantage in the Zig project, which is that the compiler and linker are tightly coupled, and we can therefore start linking and compiling at the same time. In this case a single-threaded linker may even yield the best performance since it has less overhead, as long as the compiler also has work to do in the meantime.

Member

Yeah, I like that plan, and looking forward to the outcome!

Comment on lines 1183 to 1187
// Original author of this next line had elf.SHN_UNDEF
// here which does not make sense given that this field
// is elf.VER_NDX
if (def.ndx != .LOCAL)
def.ndx = @enumFromInt(sections_update[src.sh_info].remap_idx);
Member Author (andrewrk)

@xxxbxxx can you shed some light on this? Does the VER_NDX type that I added to std.elf not make sense, or is using elf.SHN_UNDEF here incorrect?

Contributor: @xxxbxxx (Oct 12, 2024)

Yes, indeed, the type of the '0' value is wrong. Yay for strong typing!

And the test is actually incomplete, I think: '1' is also a special value that must not be re-indexed. According to:
https://refspecs.linuxfoundation.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/symversion.html

11.7.2. Symbol Version Table
[...] These values are identifiers that are provided by [...] the vd_ndx member of the
Elfxx_Verdef structure.
The values 0 and 1 are reserved.

0   The symbol is local, not available outside the object.
1   The symbol is defined in this object and is globally available.

-> so I guess the test should be: def.ndx != .LOCAL and def.ndx != .GLOBAL.
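
For illustration, a sketch of that corrected check (this VER_NDX definition just mirrors the reserved values quoted above; it is not necessarily identical to the one in std.elf):

// 0 and 1 are reserved version indices and must not be remapped.
const VER_NDX = enum(u16) {
    LOCAL = 0, // symbol is local, not available outside the object
    GLOBAL = 1, // symbol is defined in this object and globally available
    _, // everything else is a real version definition index
};

fn shouldRemap(ndx: VER_NDX) bool {
    return ndx != .LOCAL and ndx != .GLOBAL;
}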

Member Author (andrewrk)

Thanks!

andrewrk merged commit f7c5882 into master on Oct 12, 2024
10 checks passed
andrewrk deleted the reduce-flush branch on October 12, 2024 at 21:39