
Fixes #25293 - Puma support for smart proxy #623

Closed
Wants to merge 1 commit.

Conversation

@lzap (Member) commented Nov 19, 2018:

Initial implementation of Puma done by Ido: #613

This works for me. If I need to make changes I will add additional commits on top of Ido's to keep the authorship information clear.

Resolved review threads: lib/launcher.rb (several, outdated), config/settings.yml.example
@lzap (Member, Author) commented Jan 15, 2019:

Rebased; this now adds entries to @servers, graceful start works, and so does logging.

Resolved review thread: lib/launcher.rb (outdated)
@lzap (Member, Author) commented Mar 1, 2019:

Added support for sd_notify. This is now ready for final review, @iNecas and @ekohl.

Resolved review thread: bundler.d/puma.rb (outdated)
@iNecas (Member) left a comment:

There is nothing like testing the changes in production.

Some findings:

First, when installing puma in production by gem, make sure you have openssl-devel present. If you want to give it a try with the latest nightlies, this might be handy:

```
yum install gcc rpm-build rpm-devel rpmlint make python bash coreutils diffutils patch rpmdevtools openssl-devel ruby-devel
```

1. In production, we set `:bind_host` to `::`, which caused puma to fail with `the scheme ssl does not accept registry part: ::`; changing the `:bind_host` setting to `'*'` fixed the issue. We might need to update the installer to use this value (and check that webrick is fine with it).
2. `request.env['https']` is set to `https` in puma (which is none of the values we are testing right now) and needs to be added.
3. `verify_mode` was not set, so the client certificates were not loaded.
4. Once `verify_mode` is set, the client cert is not available under the `SSL_CLIENT_CERT` env key but as `puma.peercert` instead: we need to change all occurrences of `SSL_CLIENT_CERT`.
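Finding 4 (the `puma.peercert` vs `SSL_CLIENT_CERT` difference) could be bridged with a small lookup helper. A minimal sketch, with the helper name and fallback behavior my own rather than code from this PR:

```ruby
require 'openssl'

# Return the client certificate regardless of the server in use:
# Puma exposes it under 'puma.peercert' (an OpenSSL::X509::Certificate,
# or a PEM string depending on version), while WEBrick-style setups
# place the PEM text under the SSL_CLIENT_CERT env key.
def client_certificate(env)
  raw = env['puma.peercert'] || env['SSL_CLIENT_CERT']
  return nil if raw.nil? || (raw.respond_to?(:empty?) && raw.empty?)
  raw.is_a?(OpenSSL::X509::Certificate) ? raw : OpenSSL::X509::Certificate.new(raw)
end
```

A helper like this would let the rest of the proxy stay agnostic about which web server produced the request.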

lib/launcher.rb (outdated excerpt):

```ruby
app_details[:SSLArgs] = {
  :ca => @settings.ssl_ca_file,
  :key => @settings.ssl_private_key,
  :cert => @settings.ssl_certificate
```
Review comment (Member):

Missing `:verify_mode => 'peer'`.

Reply from @lzap (Member, Author):

Incorporating this change.
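For clarity, the excerpt with the suggested key added would look roughly like this (the settings object is a stand-in and the paths are illustrative, not taken from the PR):

```ruby
require 'ostruct'

# Hypothetical stand-in for the proxy's @settings object.
settings = OpenStruct.new(
  ssl_ca_file: '/etc/foreman-proxy/ssl_ca.pem',
  ssl_private_key: '/etc/foreman-proxy/ssl_key.pem',
  ssl_certificate: '/etc/foreman-proxy/ssl_cert.pem'
)

ssl_args = {
  :ca => settings.ssl_ca_file,
  :key => settings.ssl_private_key,
  :cert => settings.ssl_certificate,
  :verify_mode => 'peer' # the key requested in review
}
```

Without `verify_mode`, the server never requests a client certificate, which matches finding 3 above.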

@ekohl (Member) commented Mar 2, 2019:

> 1. In production, we set `:bind_host` to `::`, which caused the puma to fail with `the scheme ssl does not accept registry part: ::`, changing `:bind_host` setting to '*' fixed the issue: we might need to update the installer to use this value (and check webrick is fine with it).

We did this because `*` only bound to IPv4 on EL7. On other platforms we do bind to `*`. theforeman/puppet-foreman_proxy@77ed56f has more details; theforeman/puppet-foreman_proxy@0940a05 narrowed this a bit.

@mvollmer commented Mar 4, 2019:

> 1. In production, we set `:bind_host` to `::`, which caused the puma to fail with `the scheme ssl does not accept registry part: ::`

This is likely just a parsing issue, fixed by using `[::]` as recommended in https://tools.ietf.org/html/rfc2732.
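The parse failure can be reproduced with Ruby's RFC 2396 URI parser (illustrative only; Puma's internal parsing may differ in detail):

```ruby
require 'uri'

parser = URI::RFC2396_Parser.new

# A bare IPv6 host falls into the URI "registry" part, which a generic
# scheme like ssl:// rejects -- this is the error reported above.
error = begin
  parser.parse('ssl://::')
  nil
rescue URI::InvalidURIError => e
  e.message
end

# Bracketed per RFC 2732, the same address parses cleanly.
uri = parser.parse('ssl://[::]:8443/')
# uri.host is "[::]" and uri.port is 8443
```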

lib/launcher.rb (outdated excerpt):

```ruby
query_list = app[:SSLArgs].to_a.map do |x|
  "#{CGI::escape(x[0].to_s)}=#{CGI::escape(x[1])}"
end
host = "ssl://#{address}/?#{query_list.join('&')}"
```
A reviewer commented:

As @iNecas has reported, if `address` contains `:` or `.` characters, this is misparsed. Using `ssl://[#{address}]/` is probably the right thing to use.

Reply from @lzap (Member, Author):

This is common code for both IPv4 and IPv6, and we feed it correctly (`['0.0.0.0', '[::1]']`), so I guess this will not fix the issue.
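For reference, the query-string construction under discussion behaves like this when fed an already-bracketed IPv6 address (a self-contained sketch with illustrative values, not the launcher's exact code):

```ruby
require 'cgi'

ssl_args = { :ca => '/etc/ssl/ca.pem', :verify_mode => 'peer' }
address  = '[::1]'   # IPv6 addresses arrive pre-bracketed
port     = 8443

# Same shape as the launcher excerpt: escape each key/value pair,
# then join them into Puma's ssl:// bind string.
query = ssl_args.map { |k, v| "#{CGI.escape(k.to_s)}=#{CGI.escape(v)}" }.join('&')
bind  = "ssl://#{address}:#{port}/?#{query}"
# => "ssl://[::1]:8443/?ca=%2Fetc%2Fssl%2Fca.pem&verify_mode=peer"
```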

@lzap (Member, Author) commented Mar 7, 2019:

Added two changes; not sure what to do with the other items. For the `::` problem we can either fix this in the installer or change the code: if it spots `::` it will convert it to `[::]`, which is a bit lame but it will do. Ping me on IRC for faster turnaround, I have a huge backlog on GH. Let me know.

Resolved review thread: lib/launcher.rb (outdated)
@lzap (Member, Author) commented Mar 19, 2019:

Changed the code slightly according to @ekohl's recommendation. To listen on all interfaces use `*`; to listen on IPv4 only use `0.0.0.0`. Unfortunately puma automatically listens on both IPv6 and IPv4 when `::` is provided (which is converted to the correct `[::]` syntax in the code). Therefore I have documented this misbehavior in a comment and in the settings example. There is a difference between webrick and puma in this case; I wonder if I should copy the same behavior to webrick to prevent confusion.
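For illustration, the `:bind_host` behavior described above might be documented like this in `config/settings.yml.example` (a hedged sketch; the actual wording in the PR may differ):

```yaml
# Host to bind the servers to.
# '*'       - listen on all interfaces (recommended)
# '0.0.0.0' - listen on IPv4 interfaces only
# '::'      - with Puma this listens on BOTH IPv4 and IPv6
#             (converted internally to the '[::]' URI syntax)
:bind_host: '*'
```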

@lzap (Member, Author) commented Aug 6, 2019:

@ekohl since we just branched off, let's merge it now as an opt-in?

@ekohl (Member) commented Apr 27, 2020:

@adamruzicka could you also have a look? I believe the launcher code might be useful in smart_proxy_dynflow and these APIs could allow dropping code there.

@ekohl (Member) commented Apr 28, 2020:

We had a discussion in the platform team and we feel that at this point in the release cycle we don't have time to support this. We're still very busy with EL8 support and Rails 6, as well as the regular release work. Nightly is very red and next week we're supposed to start stabilization week. We can't handle more instability, and the cipher change would require an installer change as well, to migrate from the current model (hardcoded + blacklist) to this model. While I think that's a very good step, at this point in the cycle it's too busy.

Our conclusion was that we want to merge this soon in the 2.2 development cycle to give it time to harden. I'd also like to see the follow-up plan, like how we would enable multiple threads and verify thread safety.

@mmoll (Contributor) commented Apr 28, 2020:

Rebased with current Rubocop rules 🤖.

@lzap (Member, Author) commented May 14, 2020:

Thanks for the info, I will poke you once we branch off.

@ares (Member) commented May 19, 2020:

2.1-stable exists ;-)

@lzap (Member, Author) commented Sep 17, 2020:

> Our conclusion was that we want to merge this soon in the 2.2 development cycle to give it time to harden. I'd also like to see the follow-up plan, like how we would enable multiple threads and verify thread safety.

@ekohl can you please bring this to attention at the meeting again? We have passed the suggested timeframe, but it's post-2.2 branching and there is still a lot of time in the 2.3 cycle, so it could be a good time to merge this.

@ekohl (Member) commented Sep 17, 2020:

I'm still not convinced it's a good idea. I can't speak for others, but I don't have time for this.

Perhaps you can start with splitting some things up. For example, right now it changes how SSL ciphers are configured. That might be a good part to extract and merge. That already includes a migration that will need changes in the installer and in itself is complicated enough.

@lzap (Member, Author) commented Sep 17, 2020:

I squashed everything into a single commit to make my life easier when dealing with those conflicts (keeping co-authorship). I rebased and implemented a util class SdNotifyAll which handles systemd notifications for multiple threads. This is ready to be merged.
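The class itself isn't shown in this view, but the idea of notifying systemd exactly once across several server threads could be sketched like this (a hypothetical shape; the real SdNotifyAll may differ):

```ruby
# Notify-once barrier: each server thread reports readiness, and the
# notification callback fires exactly once, when all expected threads
# are up.
class SdNotifyAll
  def initialize(expected)
    @expected = expected
    @mutex = Mutex.new
    @ready = 0
    @notified = false
  end

  # Called by each server thread once it is listening. The block
  # (standing in for the actual sd_notify READY=1 call) runs at most once.
  def ready!
    @mutex.synchronize do
      @ready += 1
      if @ready >= @expected && !@notified
        @notified = true
        yield if block_given?
      end
    end
  end
end
```

With `expected = 2` (an HTTP and an HTTPS server), the second `ready!` call would trigger the single notification.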

@lzap (Member, Author) commented Sep 24, 2020:

Fixed rubocop issues. I don't think this rule is all that valid here, so I disabled it:

```
Offenses:

bundler.d/Gemfile.local.rb:1:1: C: Naming/FileName: The name of this source file (Gemfile.local.rb) should use snake_case.
gem 'solargraph', group: :development
^
lib/puma-patch.rb:1:1: C: Naming/FileName: The name of this source file (puma-patch.rb) should use snake_case.
module Puma
^

353 files inspected, 2 offenses detected
```

Can we proceed with merging? I have spent a lot of time on this one, and we have the option to go back to webrick, so we should be safe if issues arise.
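The two `Naming/FileName` offenses above could also be silenced with a targeted exclude rather than disabling the cop globally; a sketch of what such a `.rubocop.yml` entry might look like (the PR's actual change may differ):

```yaml
Naming/FileName:
  Exclude:
    - 'bundler.d/Gemfile.local.rb'
    - 'lib/puma-patch.rb'
```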

@lzap (Member, Author) commented Sep 24, 2020:

The changes requested by @ekohl and @ShimShtein are in; please re-review, guys.

@lzap (Member, Author) commented Sep 30, 2020:

I'd like to add that Webrick has always been a multi-threaded server (this is enabled by default), so I don't expect problems with threads after the switch to Puma.

@ekohl (Member) commented Sep 30, 2020:

As I stated again and again: I still don't really see the benefit of using Puma. There is no performance benefit and it only complicates the code. Reading puma/puma#2260, I'm not sure it's wise to spawn multiple launchers from the same process, and things may break in the future. Nothing in this PR or the Redmine issue explains why this is a good idea, and I think it is a bad idea. Unless you convince me otherwise that there is a benefit, I don't want to merge this.

@lzap lzap closed this Sep 30, 2020
@lzap lzap deleted the puma branch September 30, 2020 12:41
@lzap (Member, Author) commented Sep 30, 2020:

For the record, I know I could do better in regard to description, but this was a follow-up PR. The purpose of this is that Webrick seems to be dying under load:

https://bugzilla.redhat.com/show_bug.cgi?id=1614087
https://projects.theforeman.org/issues/24634

Webrick 1.4 or newer however does fix the issue. The Puma migration started before we actually migrated Smart Proxy to SCL Ruby (which was blocking upgrading Webrick from 1.3).

I guess we don't really need it. I was driving this forward since we are changing core to Puma as well and I was hoping for a single stack. You can configure Puma in (multi-threaded) single-instance mode too, to follow the Webrick deployment strategy, but you don't want this to continue. Alright.

@ehelms (Member) commented Sep 30, 2020:

From my perspective, being a novice on the internals of the smart proxy, I like the idea of landing on the same technologies across our stack where possible. This means we get more out of our investment: more knowledge and, in theory, a better understanding of what might be happening under the hood in each. For example, we benefit from investing in and supporting systemd notify support in Puma if we use it in two places.

All technology shifts come with risks, and I admit I do not fully know the extent this may impact. Sounds like nobody truly does and we do not have the confidence in our testing to give a clear yes or no.

We are starting to push more workloads onto the smart proxy, which does concern me from a workload perspective, and I would wonder if Puma is a better forward-looking choice to handle those workloads. Examples are the global registration workflow, container registry support that Katello is looking into, using the smart proxy to support Foreman Proxy with Content moving to port 443, and more reverse-proxying use cases.

I apologize if I have asked this before (possibly even in this PR). What is it about the smart proxy that leads us to run it as we do instead of running it as a more traditional sinatra application + application server?

@lzap (Member, Author) commented Oct 1, 2020:

> Nothing in this PR or the Redmine issue explains why this is a good idea and I think this is a bad idea.

The original motivation was my own ticket https://projects.theforeman.org/issues/24634; then Ido (the original author of this PR) took over and created a new RM issue, for whatever reason, which has an empty description. At the time, I tested that webrick 1.3 suffers from slow connection termination and puma does not. That was the initial goal, which is no longer a must, as we migrated to SCL Ruby with Webrick 1.4+.

Now, it's still a useful move, not only for the reason I explained of having a single web stack for both Core and Proxy, but also for scaling reasons. Webrick has always been a multi-threaded server with a new-thread-per-request design. This is an old-school design which does not scale under heavy load: too many threads can be created per process and there is no way of controlling this behavior, as it is hardcoded in the codebase (server.rb).

Puma, on the other hand, uses a thread-pool technique with wait queues, which is a much better design in terms of reliability and robustness. There is a set limit on requests processed in parallel and this can be managed by operations. This does not necessarily mean better performance, as you noted, but it is definitely a better admin experience and more reliable.

On top of that, we can set up Puma as a multi-process server. There was no plan to do this in the initial phase, as I noted above. But it's good to have this option if we find that a single process is no longer able to handle the load. You can't do this with Webrick at all.
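To illustrate the knobs being discussed, a hypothetical Puma configuration file might look like this (values are illustrative only and not taken from this PR):

```ruby
# config/puma.rb (sketch) -- Puma's configuration DSL
threads 0, 16   # thread pool: min/max threads per worker
workers 2       # optional: fork multiple worker processes
                # (single-process was the plan for the initial phase)
bind 'ssl://[::]:8443?key=/etc/ssl/key.pem&cert=/etc/ssl/cert.pem&verify_mode=peer'
```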

> If I read puma/puma#2260 I'm not sure it's wise to spawn multiple launchers from the same process and things may break in the future.

I don't understand this point; in the initial phase we would only use a single-process deployment strategy, as we are not aware of any consequences multiple processes could bring. Then of course we could start experimenting with other strategies once the code was in. But even then, I don't understand what systemd-notify has to do with this. You haven't touched systemd through the whole review and now you see this as a stopper.

> All technology shifts come with risks, and I admit I do not fully know the extent this may impact. Sounds like nobody truly does and we do not have the confidence in our testing to give a clear yes or no.

That's why I designed this patch in a way that users can switch between webrick and puma. Also the plan was to start with single-process deployment. We could start small and learn in our scale-labs. The idea is that Puma is much more flexible and configurable than Webrick.

> What is it about the smart proxy that leads us to run it as we do instead of running it as a more traditional sinatra application + application server?

Nothing. We spawn some background threads in the app, which is a bit unusual, but Webrick has always been a multi-threaded server and we haven't experienced any problems so far. For multiple processes we would need to make sure that those background threads (inotify monitors of the ISC DHCP leases file) work correctly, but I don't see any problem with this and IMHO this would work without a change.

@ekohl (Member) commented Oct 1, 2020:

> For the record, I know I could do better in regard to description, but this was a follow-up PR. The purpose of this is that Webrick seems to be dying under load:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1614087
> https://projects.theforeman.org/issues/24634
>
> Webrick 1.4 or newer however does fix the issue. The Puma migration started before we actually migrated Smart Proxy to SCL Ruby (which was blocking upgrading Webrick from 1.3).

Thank you. That information was very helpful and I certainly didn't know that.

> I guess we don't really need it. I was driving this forward since we are changing core to Puma as well and I was hoping for a single stack. You can configure Puma in (multi-threaded) single-instance mode too, to follow the Webrick deployment strategy, but you don't want this to continue. Alright.

I think my major concern is that this dual-identity process doesn't appear to fit the way Puma is meant to be deployed.

Puma can bind to multiple endpoints (both HTTP and HTTPS). I wonder if we should instead embrace this model.

One change would be that we would drop http_rackup_path and https_rackup_path in favor of a generic rackup_path. In all built-in plugins they are the same. That would already simplify things. I just opened #773.

Then another layer is that a module in its entirety can be disabled, HTTP, HTTPS or both. Perhaps we can hook in the existing sinatra routing layer to look at the HTTP protocol. Maybe it could also be solved in middleware.

> I don't understand this point; in the initial phase we would only use a single-process deployment strategy, as we are not aware of any consequences multiple processes could bring. Then of course we could start experimenting with other strategies once the code was in. But even then, I don't understand what systemd-notify has to do with this. You haven't touched systemd through the whole review and now you see this as a stopper.

The problem that I see is that you start 2 puma launchers if both HTTP and HTTPS are enabled. However, that PR assumes there is only one. That means we would need to clean the environment as well. That tells me that we are probably swimming against the stream and may see more issues in the future.

As I am currently fully occupied with other matters, I don't want to pick up more. This PR also changes the SSL cipher configuration. That's technically a change that can live outside of this PR, since both webrick and Puma can accept the same format. However, that change also needs an installer change. There's a migration involved. Right now I don't have time for that. If I merged this, I would feel responsible for those changes as well. That's why I don't want to merge this: I don't have time for the resulting work. This doesn't mean others can't do it.

@ekohl (Member) commented Oct 2, 2020:

I opened puma/puma#2404 for some guidance from upstream on how to handle it properly.

@ekohl (Member) commented Oct 2, 2020:

quoting puma/puma#2404 (comment)

> And I would say multiple instances of anything Puma in the same process at the same time is not currently supported, but it may work, it may not.

I think that's the root concern I had and why I was very hesitant in merging this. I'm working on a proposal based on suggestions in that issue to see if that would work.

@ehelms (Member) commented Oct 2, 2020:

I think I know, but I would appreciate someone refreshing me on the architectural reasons behind having a single application trying to support HTTP- and HTTPS-based services over alternatives such as:

  1. 2 services running on two different ports
  2. 2 services running with a reverse proxy in front

@ekohl (Member) commented Oct 2, 2020:

#774 is my attempt at supporting Puma. If it works well, we should probably rewrite the launcher to start the Puma process with parameters from settings.yml.

> I think I know, but I would appreciate someone refreshing me on the architectural reasons behind having a single application trying to support HTTP- and HTTPS-based services over alternatives such as:
>
> 1. 2 services running on two different ports
> 2. 2 services running with a reverse proxy in front

The major reason this exists is that some things must happen over HTTP. For example, unattended installs such as kickstarts and preseeds typically don't support HTTPS at all or it's a pain to set up the trust chain. The Smart Proxy can reverse proxy these calls to the Foreman. In doing so, it also provides the URLs to the Foreman application. This allows Foreman to generate the correct callback URL (which happens after the build is done; by default this is something like http://foreman.example.com/unattended/built but should become http://smart-proxy.example.com:8000/unattended/built).

However, HTTP is insecure. There is no authentication and it's unencrypted. That's why we prefer to run everything over HTTPS when we can. To limit the attack surface, every module can be configured to listen only on HTTPS (the installer does this for us).

In practice we only need HTTP for the template and httpboot modules. If you disable those, you don't really need to listen on HTTP at all. That's why the Proxy also allows you to not listen on HTTP at all. Again, to reduce attack surface. There are also companies with policies that disallow HTTP and this suits them. HTTP is also easy in development.
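As a concrete illustration of the per-module listening described above, a module settings file could take roughly this shape (the key and its accepted values here are my understanding, worth verifying against the smart-proxy documentation):

```yaml
# config/settings.d/templates.yml (sketch)
:enabled: https   # serve this module on the HTTPS app only;
                  # 'true' would enable it on both HTTP and HTTPS
```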

So effectively the Smart Proxy has 2 virtual hosts where we distinguish on the protocol. That's also the approach I've taken in my PR, based on a suggestion by a Puma developer.

10 participants