link.Elf: eliminate an O(N^2) algorithm in flush() #21681
Conversation
// TODO This loop has 2 major flaws:
// 1. It is O(N^2) which is never allowed in the codebase.
// 2. It mutates shared_objects, which is a non-starter for incremental compilation.
The reason we have this, as well as a similar loop for objects (well, relocatables), is that we parse all relocatables from all archives and all shared objects that appear on the linker line. This means we end up with an upper bound on all potentially required inputs. By this I mean that not all relocatables from archives or shared objects will necessarily be needed; some may yield duplicate-symbol errors or lead to an outright broken output image. Therefore, we do symbol resolution twice: first with all objects parsed and extracted, then we mark any unreferenced object as dead, prune the sets, and re-run the resolution. This non-standard algorithm matches what mold does; however, it is not what lld does. lld, for instance, has the concept of lazy inputs which are analysed only when required. My understanding after studying the work of Rui/mold is that the lld approach, which is effectively the legacy approach, is slower than mold's if inputs are parsed in parallel.
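The two-pass scheme above can be sketched roughly as follows. This is a hypothetical illustration in C (the actual codebase is Zig); the names `Input`, `referenced`, and `resolve_twice` are made up for the sketch and are not the real linker API.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool referenced; /* would be set by a real symbol-resolution pass */
} Input;

/* Pass 1 resolves symbols with every parsed input (the upper bound),
   unreferenced inputs are then pruned, and pass 2 re-runs resolution
   over the survivors only. Returns the pruned input count. */
static size_t resolve_twice(Input *inputs, size_t len)
{
    /* Pass 1: a real resolver would walk all inputs here and set
       `referenced` on each one that satisfies an undefined symbol.
       This sketch assumes the flags were already computed. */

    /* Prune: compact the live inputs in place, O(N). */
    size_t out = 0;
    for (size_t i = 0; i < len; i++)
        if (inputs[i].referenced)
            inputs[out++] = inputs[i];

    /* Pass 2: re-run resolution over inputs[0..out) only. */
    return out;
}
```

Note that the prune step itself is a single compaction loop, which is also how the O(N^2) removal problem discussed below can be avoided.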
As for the O(N^2) problem, it can be addressed without any architectural changes, it just needs to not call orderedRemove inside the loop. You can implement this kind of thing in O(N) by doing one loop only and keeping track of the number of removals as you go along.
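The single-loop removal idea can be sketched like this, in C for illustration (the codebase is Zig, where the per-element shifting would be `orderedRemove`). The function name `compact` and the `dead` sentinel are illustrative only.

```c
#include <stddef.h>

/* Remove all elements equal to `dead` in one pass, O(N) total:
   instead of shifting the tail once per removal (O(N) per hit,
   O(N^2) overall), keep a write cursor and compact the survivors
   in place, tracking removals implicitly as i - out. */
static size_t compact(int *items, size_t len, int dead)
{
    size_t out = 0; /* next free slot for a surviving element */
    for (size_t i = 0; i < len; i++) {
        if (items[i] != dead)
            items[out++] = items[i];
    }
    return out; /* new length; items[out..len) is now garbage */
}
```

The relative order of survivors is preserved, which is the property that makes repeated `orderedRemove` calls tempting in the first place.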
Regarding garbage collection, subsequent updates might cause a different set of things to remain alive, so the data must not be mutated. Instead, just like the rest of the compiler pipeline, it should create read-only transformations of state from one thing to another, selectively invalidating and redoing parts of the pipeline depending on what changed since last time.
As a reminder, I want to focus on single-threaded linking that happens concurrently with the rest of the zig compiler first, before exploring multi-threaded linking. You did this in the other order for MachO, but I think it will yield a better outcome to instead do it as I suggested for ELF. I understand the urge to directly compare performance against mold or lld, but that is completely missing the point. We need to lean into our unique advantage in the Zig project, which is that the compiler and linker are tightly coupled and we can therefore start linking and compiling at the same time. In this case a single-threaded linker may even yield the best performance, since it has less overhead, as long as the compiler also has work to do in the meantime.
Yeah, I like that plan, and looking forward to the outcome!
lib/compiler/objcopy.zig
Outdated
// Original author of this next line had elf.SHN_UNDEF
// here which does not make sense given that this field
// is elf.VER_NDX
if (def.ndx != .LOCAL)
    def.ndx = @enumFromInt(sections_update[src.sh_info].remap_idx);
@xxxbxxx can you shed some light on this? Does the VER_NDX type that I added to std.elf not make sense, or is using elf.SHN_UNDEF here incorrect?
Yes, indeed, the type of the '0' value is wrong. Yay for strong typing!
And the test is actually incomplete, I think: '1' is also a special value not to be re-indexed.
According to:
https://refspecs.linuxfoundation.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/symversion.html

11.7.2. Symbol Version Table
[...] These values are identifiers that are provided by [...] the vd_ndx member of the Elfxx_Verdef structure. The values 0 and 1 are reserved.
0: The symbol is local, not available outside the object.
1: The symbol is defined in this object and is globally available.

-> so I guess the test should be `def.ndx != .LOCAL and def.ndx != .GLOBAL`.
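The corrected check from the spec quote can be sketched like this, in C for illustration (the real code is the Zig `if (def.ndx != ...)` shown above). The enum values mirror the LSB-reserved indices; `remap` is a hypothetical old-index to new-index table standing in for `sections_update[...].remap_idx`.

```c
#include <stdint.h>

/* Reserved version indices per the LSB Symbol Version Table spec:
   0 = local symbol, 1 = globally available in this object. */
enum { VER_NDX_LOCAL = 0, VER_NDX_GLOBAL = 1 };

/* Remap a version-definition index, leaving both reserved values
   untouched since they are not indices into the Verdef table. */
static uint16_t remap_ver_ndx(uint16_t ndx, const uint16_t *remap)
{
    if (ndx == VER_NDX_LOCAL || ndx == VER_NDX_GLOBAL)
        return ndx; /* reserved: do not re-index */
    return remap[ndx];
}
```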
Thanks!
Make shared_objects a StringArrayHashMap so that deduping does not need to happen in flush. That deduping code also was using an O(N^2) algorithm, which is not allowed in this codebase. There is another violation of this rule in resolveSymbols but this commit does not address it.

This required reworking shared object parsing, breaking it into independent components so that we could access soname earlier.

Shared object parsing had a few problems that I noticed and fixed in this commit:

* Many instances of incorrect use of align(1).
* `shnum * @sizeOf(elf.Elf64_Shdr)` can overflow based on user data.
* `@divExact` can cause illegal behavior based on user data.
* Strange versyms logic that wasn't present in mold nor lld. The logic was not commented and there is no git blame information in ziglang/zig nor kubkon/zld. I changed it to match mold and lld instead.
* Use of ArrayList for slices of memory that are never resized.
* Finding DT_VERDEFNUM in a different loop than finding DT_SONAME. Ultimately I think we should follow mold's lead and ignore this integer, relying on null termination instead.
* Doing logic based on VER_FLG_BASE rather than ignoring it like mold and LLD do. No comment explaining why the behavior is different.
* Mutating the original ELF symbols rather than only storing the mangled name on the new Symbol struct.

I noticed something that I didn't try to address in this commit: Symbol stores a lot of redundant information that is already present in the ELF symbols. I suspect that the codebase could benefit from reworking Symbol to not store redundant information.

Additionally:

* Add some type safety to std.elf.
* Eliminate 1-3 file system reads for determining the kind of input files, by taking advantage of file name extension and handling error codes properly.
* Move more error handling methods to link.Diags and make them infallible and thread-safe.
* Make the data dependencies obvious in the parameters of parseSharedObject.
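The overflow hazard in `shnum * @sizeOf(elf.Elf64_Shdr)` can be guarded like this, sketched in C for illustration (Zig has checked arithmetic builtins for the same purpose). `shdr_table_size` is a hypothetical helper name; the point is only that `shnum` comes from untrusted file data, so the multiplication must be checked before it is used to size a read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute shnum * shdr_size with an explicit overflow check.
   Returns false (and leaves *out unset) if the product would
   wrap around a 64-bit size. */
static bool shdr_table_size(uint64_t shnum, uint64_t shdr_size, uint64_t *out)
{
    if (shdr_size != 0 && shnum > UINT64_MAX / shdr_size)
        return false; /* attacker-controlled shnum would overflow */
    *out = shnum * shdr_size;
    return true;
}
```

A failed check should be reported as a malformed-input diagnostic rather than proceeding with a wrapped size.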
It's now clear that the first two steps (Header and Parsed) can be done during the main Compilation pipeline, rather than waiting for flush().
the new types make this code seem a bit strange
Force-pushed from a992f33 to de04a8a.