
link.Elf: eliminate an O(N^2) algorithm in flush() #21681

Merged
andrewrk merged 5 commits into master from reduce-flush on Oct 12, 2024

Conversation

andrewrk (Member)

Make shared_objects a StringArrayHashMap so that deduping does not need to happen in flush. That deduping code also was using an O(N^2) algorithm, which is not allowed in this codebase. There is another violation of this rule in resolveSymbols but this commit does not address it.

This required reworking shared object parsing, breaking it into independent components so that we could access soname earlier.
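
For illustration, a minimal sketch of the shape of that change (the names below are made up for this comment, not the actual fields in link.Elf): keying the table by soname makes deduplication an implicit side effect of insertion, so flush() no longer needs a dedup pass at all.

const std = @import("std");

const SharedObject = struct { index: u32 }; // hypothetical stand-in payload

fn addSharedObject(
    shared_objects: *std.StringArrayHashMap(SharedObject),
    soname: []const u8,
    so: SharedObject,
) !void {
    // getOrPut dedupes by key, so a repeated soname is simply skipped.
    // The caller must keep the soname string alive as long as the map.
    const gop = try shared_objects.getOrPut(soname);
    if (gop.found_existing) return;
    gop.value_ptr.* = so;
}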

Shared object parsing had a few problems that I noticed and fixed in this commit:

  • Many instances of incorrect use of align(1).
  • shnum * @sizeOf(elf.Elf64_Shdr) can overflow based on user data.
  • @divExact can cause illegal behavior based on user data. (See the sketch after this list for how both of these can be checked.)
  • Strange versyms logic that wasn't present in either mold or lld. The logic was not commented and there is no git blame information in either ziglang/zig or kubkon/zld. I changed it to match mold and lld instead.
  • Use of ArrayList for slices of memory that are never resized.
  • Finding DT_VERDEFNUM in a different loop than the one that finds DT_SONAME. Ultimately I think we should follow mold's lead and ignore this integer, relying on null termination instead.
  • Doing logic based on VER_FLG_BASE rather than ignoring it like mold and lld do, with no comment explaining why the behavior differs.
  • Mutating the original ELF symbols rather than only storing the mangled name on the new Symbol struct.
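
As a sketch of the overflow and @divExact points above (the function names are illustrative, not the real parser code): checked arithmetic turns hostile sizes into ordinary errors instead of illegal behavior.

const std = @import("std");
const elf = std.elf;

// Checked multiply: a hostile shnum must not be allowed to overflow usize.
fn sectionHeaderTableSize(shnum: usize) error{Overflow}!usize {
    return std.math.mul(usize, shnum, @sizeOf(elf.Elf64_Shdr));
}

// Checked division instead of @divExact, which is illegal behavior when the
// size is not an exact multiple (something untrusted input can arrange).
fn symbolCount(sh_size: usize) error{Malformed}!usize {
    if (sh_size % @sizeOf(elf.Elf64_Sym) != 0) return error.Malformed;
    return sh_size / @sizeOf(elf.Elf64_Sym);
}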

I noticed something that I didn't try to address in this commit: Symbol stores a lot of redundant information that is already present in the ELF symbols. I suspect that the codebase could benefit from reworking Symbol to not store redundant information.

Additionally:

  • Add some type safety to std.elf.
  • Eliminate 1-3 file system reads when determining the kind of input files, by taking advantage of the file name extension and handling error codes properly (see the sketch after this list).
  • Move more error handling methods to link.Diags and make them infallible and thread-safe.
  • Make the data dependencies obvious in the parameters of parseSharedObject. It's now clear that the first two steps (Header and Parsed) can be done during the main Compilation pipeline, rather than waiting for flush().
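
A sketch of the file-extension idea from the list above (the helper and the FileKind enum are invented for illustration; the real classification in the linker is more involved): checking the name first avoids opening and reading the file just to classify it in the common cases, falling back to magic bytes only when the extension is ambiguous.

const std = @import("std");

const FileKind = enum { shared_object, archive, object, unknown };

// Illustrative only: guess the input kind from its name before falling back
// to reading magic bytes, saving file system reads in the common case.
fn guessFileKind(path: []const u8) FileKind {
    if (std.mem.endsWith(u8, path, ".so")) return .shared_object;
    if (std.mem.indexOf(u8, path, ".so.") != null) return .shared_object; // e.g. libfoo.so.1
    if (std.mem.endsWith(u8, path, ".a")) return .archive;
    if (std.mem.endsWith(u8, path, ".o")) return .object;
    return .unknown;
}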

Comment on lines +1688 to +1690
// TODO This loop has 2 major flaws:
// 1. It is O(N^2) which is never allowed in the codebase.
// 2. It mutates shared_objects, which is a non-starter for incremental compilation.
Member

The reason we have this, as well as a similar loop for objects (well, relocatables), is that we parse all relocatables from all archives and all shared objects that appear on the linker line. This means we end up with an upper bound on all potentially required inputs. By this I mean that not all relocatables from archives or shared objects will necessarily be needed, and some may actually yield duplicate symbol errors or lead to an outright broken output image. Therefore, we do symbol resolution twice: first with all objects parsed and extracted, then we mark any unreferenced object as dead, prune the sets, and re-run the resolution again. This non-standard algorithm matches what mold does; however, it is not what lld does. lld, for instance, has the concept of lazy inputs which are analysed only when required. My understanding after studying the work of Rui/mold is that the lld approach, which is effectively the legacy approach, is slower than mold's if inputs are parsed in parallel.
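
In rough outline, the two-pass scheme described above looks like this (names are illustrative, not the actual link.Elf API):

const Input = struct {
    alive: bool = true,
    // symbols, references, etc. omitted
};

// Illustrative outline only; these are not the real linker entry points.
fn linkAll(inputs: []Input) void {
    resolveSymbols(inputs); // pass 1: over the full upper-bound set of inputs
    markLive(inputs); // walk references from the roots; unreferenced inputs become dead
    resolveSymbols(inputs); // pass 2: dead inputs no longer participate
}

fn resolveSymbols(inputs: []Input) void {
    for (inputs) |*input| {
        if (!input.alive) continue;
        // bind undefined symbols to definitions among the live inputs (omitted)
    }
}

fn markLive(inputs: []Input) void {
    // start from the roots (e.g. the main object) and mark everything they
    // reference as alive; omitted here, only the two-pass shape matters.
    _ = inputs;
}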

Member Author (andrewrk)

As for the O(N^2) problem, it can be addressed without any architectural changes; it just needs to not call orderedRemove inside the loop. You can implement this kind of thing in O(N) by doing a single loop and keeping track of the number of removals as you go along, like this:
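
A minimal sketch of that pattern, using a throwaway Entry type rather than the real shared_objects data:

const std = @import("std");

const Entry = struct { dead: bool };

// Single pass: copy each surviving element down over the gaps left by removed
// ones, then shrink once at the end. Each element moves at most once, so this
// is O(N), versus O(N^2) when calling orderedRemove inside the loop.
fn removeDead(list: *std.ArrayListUnmanaged(Entry)) void {
    var len: usize = 0;
    for (list.items) |entry| {
        if (!entry.dead) {
            list.items[len] = entry;
            len += 1;
        }
    }
    list.shrinkRetainingCapacity(len);
}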

Regarding garbage collection, subsequent updates might cause a different set of things to remain alive, so the data must not be mutated. Instead, just like the rest of the compiler pipeline, it should create read-only transformations of state from one thing to another, selectively invalidating and redoing parts of the pipeline depending on what changed since last time.

As a reminder, I want to focus first on single-threaded linking that happens concurrently with the rest of the Zig compiler, before exploring multi-threaded linking. You did this in the other order for MachO, but I think doing it in the order I suggested will yield a better outcome for ELF. I understand the urge to directly compare performance against mold or lld, but that is completely missing the point. We need to lean into our unique advantage in the Zig project, which is that the compiler and linker are tightly coupled, and we can therefore start linking and compiling at the same time. In this case a single-threaded linker may even yield the best performance since it has less overhead, as long as the compiler also has work to do in the meantime.

Member

Yeah, I like that plan, and looking forward to the outcome!

Comment on lines 1183 to 1187
// Original author of this next line had elf.SHN_UNDEF
// here which does not make sense given that this field
// is elf.VER_NDX
if (def.ndx != .LOCAL)
def.ndx = @enumFromInt(sections_update[src.sh_info].remap_idx);
Member Author (andrewrk)

@xxxbxxx can you shed some light on this? Does the VER_NDX type that I added to std.elf not make sense, or is using elf.SHN_UNDEF here incorrect?

Contributor: @xxxbxxx (Oct 12, 2024)

Yes, indeed, the type of the '0' value is wrong. Yay for strong typing!

And the test is actually incomplete, I think: '1' is also a special value that must not be re-indexed. According to:
https://refspecs.linuxfoundation.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/symversion.html

11.7.2. Symbol Version Table
[...] These values are identifiers that are provided by [...] the vd_ndx member of the
Elfxx_Verdef structure.
The values 0 and 1 are reserved.

0   The symbol is local, not available outside the object.
1   The symbol is defined in this object and is globally available.

-> so I guess the test should be: def.ndx != .LOCAL and def.ndx != .GLOBAL.
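
For illustration, a sketch of that corrected check (this VER_NDX definition just mirrors the reserved values quoted above; it is not necessarily identical to the one in std.elf):

// 0 and 1 are reserved version indices and must not be remapped.
const VER_NDX = enum(u16) {
    LOCAL = 0, // symbol is local, not available outside the object
    GLOBAL = 1, // symbol is defined in this object and globally available
    _, // everything else is a real version definition index
};

fn shouldRemap(ndx: VER_NDX) bool {
    return ndx != .LOCAL and ndx != .GLOBAL;
}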

Member Author (andrewrk)

Thanks!

andrewrk merged commit f7c5882 into master on Oct 12, 2024
10 checks passed
andrewrk deleted the reduce-flush branch on October 12, 2024 at 21:39