
Fixes #25293 - Puma support for smart proxy #623

Closed
Wants to merge 1 commit.

Conversation

@lzap (Member) commented Nov 19, 2018:

Initial implementation of Puma done by Ido: #613

This works for me. If I need to make changes I will add additional commits on top of Ido's to keep the authorship information clear.

Resolved review threads: lib/launcher.rb (several, outdated), config/settings.yml.example
@lzap (Member, Author) commented Jan 15, 2019:

Rebased; this now adds entries to @servers, graceful start works, and so does logging.

Resolved review thread: lib/launcher.rb (outdated)
@lzap (Member, Author) commented Mar 1, 2019:

Added support for sd_notify. This is now ready for final review, @iNecas and @ekohl.

Resolved review thread: bundler.d/puma.rb (outdated)
@iNecas (Member) left a comment:

There is nothing like testing the changes in production.

Some findings:

First, when installing puma in production by gem, make sure you have openssl-devel present. If you want to give it a try with the latest nightlies, this might be handy:

```
yum install gcc rpm-build rpm-devel rpmlint make python bash coreutils diffutils patch rpmdevtools openssl-devel ruby-devel
```

1. In production, we set `:bind_host` to `::`, which caused puma to fail with `the scheme ssl does not accept registry part: ::`; changing the `:bind_host` setting to `'*'` fixed the issue. We might need to update the installer to use this value (and check that webrick is fine with it).
2. `request.env['https']` is set to `https` in puma (which is none of the values we are testing right now) and needs to be added.
3. `verify_mode` was not set, so the client certificates were not loaded.
4. Once `verify_mode` is set, the client cert is not available under the `SSL_CLIENT_CERT` env key but as `puma.peercert` instead: we need to change all occurrences of `SSL_CLIENT_CERT`.
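Finding 4 (the `puma.peercert` vs `SSL_CLIENT_CERT` difference) could be bridged with a small lookup helper. A minimal sketch, with the helper name and fallback behavior my own rather than code from this PR:

```ruby
require 'openssl'

# Return the client certificate regardless of the server in use:
# Puma exposes it under 'puma.peercert' (an OpenSSL::X509::Certificate,
# or a PEM string depending on version), while WEBrick-style setups
# place the PEM text under the SSL_CLIENT_CERT env key.
def client_certificate(env)
  raw = env['puma.peercert'] || env['SSL_CLIENT_CERT']
  return nil if raw.nil? || (raw.respond_to?(:empty?) && raw.empty?)
  raw.is_a?(OpenSSL::X509::Certificate) ? raw : OpenSSL::X509::Certificate.new(raw)
end
```

A helper like this would let the rest of the proxy stay agnostic about which web server produced the request.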

lib/launcher.rb (outdated excerpt):

```ruby
app_details[:SSLArgs] = {
  :ca => @settings.ssl_ca_file,
  :key => @settings.ssl_private_key,
  :cert => @settings.ssl_certificate
```
Review comment (Member):

Missing `:verify_mode => 'peer'`.

Reply from @lzap (Member, Author):

Incorporating this change.
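For clarity, the excerpt with the suggested key added would look roughly like this (the settings object is a stand-in and the paths are illustrative, not taken from the PR):

```ruby
require 'ostruct'

# Hypothetical stand-in for the proxy's @settings object.
settings = OpenStruct.new(
  ssl_ca_file: '/etc/foreman-proxy/ssl_ca.pem',
  ssl_private_key: '/etc/foreman-proxy/ssl_key.pem',
  ssl_certificate: '/etc/foreman-proxy/ssl_cert.pem'
)

ssl_args = {
  :ca => settings.ssl_ca_file,
  :key => settings.ssl_private_key,
  :cert => settings.ssl_certificate,
  :verify_mode => 'peer' # the key requested in review
}
```

Without `verify_mode`, the server never requests a client certificate, which matches finding 3 above.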

@ekohl (Member) commented Mar 2, 2019:

> 1. In production, we set `:bind_host` to `::`, which caused the puma to fail with `the scheme ssl does not accept registry part: ::`, changing `:bind_host` setting to '*' fixed the issue: we might need to update the installer to use this value (and check webrick is fine with it).

We did this because `*` only bound to IPv4 on EL7. On other platforms we do bind to `*`. theforeman/puppet-foreman_proxy@77ed56f has more details; theforeman/puppet-foreman_proxy@0940a05 narrowed this a bit.

@mvollmer commented Mar 4, 2019:

> 1. In production, we set `:bind_host` to `::`, which caused the puma to fail with `the scheme ssl does not accept registry part: ::`

This is likely just a parsing issue, fixed by using `[::]` as recommended in https://tools.ietf.org/html/rfc2732.
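The parse failure can be reproduced with Ruby's RFC 2396 URI parser (illustrative only; Puma's internal parsing may differ in detail):

```ruby
require 'uri'

parser = URI::RFC2396_Parser.new

# A bare IPv6 host falls into the URI "registry" part, which a generic
# scheme like ssl:// rejects -- this is the error reported above.
error = begin
  parser.parse('ssl://::')
  nil
rescue URI::InvalidURIError => e
  e.message
end

# Bracketed per RFC 2732, the same address parses cleanly.
uri = parser.parse('ssl://[::]:8443/')
# uri.host is "[::]" and uri.port is 8443
```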

lib/launcher.rb (outdated excerpt):

```ruby
query_list = app[:SSLArgs].to_a.map do |x|
  "#{CGI::escape(x[0].to_s)}=#{CGI::escape(x[1])}"
end
host = "ssl://#{address}/?#{query_list.join('&')}"
```
A reviewer commented:

As @iNecas has reported, if `address` contains `:` or `.` characters, this is misparsed. Using `ssl://[#{address}]/` is probably the right thing to use.

Reply from @lzap (Member, Author):

This is common code for both IPv4 and IPv6, and we feed it correctly (`['0.0.0.0', '[::1]']`), so I guess this will not fix the issue.
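For reference, the query-string construction under discussion behaves like this when fed an already-bracketed IPv6 address (a self-contained sketch with illustrative values, not the launcher's exact code):

```ruby
require 'cgi'

ssl_args = { :ca => '/etc/ssl/ca.pem', :verify_mode => 'peer' }
address  = '[::1]'   # IPv6 addresses arrive pre-bracketed
port     = 8443

# Same shape as the launcher excerpt: escape each key/value pair,
# then join them into Puma's ssl:// bind string.
query = ssl_args.map { |k, v| "#{CGI.escape(k.to_s)}=#{CGI.escape(v)}" }.join('&')
bind  = "ssl://#{address}:#{port}/?#{query}"
# => "ssl://[::1]:8443/?ca=%2Fetc%2Fssl%2Fca.pem&verify_mode=peer"
```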

@lzap (Member, Author) commented Mar 7, 2019:

Added two changes; not sure what to do with the other items. For the `::` problem we can either fix this in the installer or change the code: if it spots `::` it will convert it to `[::]`, which is a bit lame but it will do. Ping me on IRC for faster turnaround, I have a huge backlog on GH. Let me know.

Resolved review thread: lib/launcher.rb (outdated)
@lzap (Member, Author) commented Mar 19, 2019:

Changed the code slightly according to @ekohl's recommendation. To listen on all interfaces use `*`; to listen on IPv4 only use `0.0.0.0`. Unfortunately puma automatically listens on both IPv6 and IPv4 when `::` is provided (which is converted to the correct `[::]` syntax in the code). Therefore I have documented this misbehavior in a comment and in the settings example. There is a difference between webrick and puma in this case; I wonder if I should copy the same behavior to webrick to prevent confusion.
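For illustration, the `:bind_host` behavior described above might be documented like this in `config/settings.yml.example` (a hedged sketch; the actual wording in the PR may differ):

```yaml
# Host to bind the servers to.
# '*'       - listen on all interfaces (recommended)
# '0.0.0.0' - listen on IPv4 interfaces only
# '::'      - with Puma this listens on BOTH IPv4 and IPv6
#             (converted internally to the '[::]' URI syntax)
:bind_host: '*'
```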

@lzap (Member, Author) commented Aug 6, 2019:

@ekohl since we just branched off, let's merge it now as an opt-in?

@ekohl (Member) commented Apr 27, 2020:

@adamruzicka could you also have a look? I believe the launcher code might be useful in smart_proxy_dynflow and these APIs could allow dropping code there.

@ekohl (Member) commented Apr 28, 2020:

We had a discussion in the platform team and we feel that at this point in the release cycle we don't have time to support this. We're still very busy with EL8 support and Rails 6, as well as the regular release work. Nightly is very red and next week we're supposed to start stabilization week. We can't handle more instability, and the cipher change would require an installer change as well, to migrate from the current model (hardcoded + blacklist) to this model. While I think that's a very good step, at this point in the cycle it's too busy.

Our conclusion was that we want to merge this soon in the 2.2 development cycle to give it time to harden. I'd also like to see the follow-up plan, like how we would enable multiple threads and verify thread safety.

@mmoll (Contributor) commented Apr 28, 2020:

Rebased with current Rubocop rules 🤖.

@lzap (Member, Author) commented May 14, 2020:

Thanks for the info, I will poke you once we branch off.

@ares (Member) commented May 19, 2020:

2.1-stable exists ;-)

@lzap (Member, Author) commented Sep 17, 2020:

> Our conclusion was that we want to merge this soon in the 2.2 development cycle to give it time to harden. I'd also like to see the follow-up plan, like how we would enable multiple threads and verify thread safety.

@ekohl can you please bring this to attention at the meeting again? We have passed the suggested timeframe, but it's post-2.2 branching and there is still a lot of time in the 2.3 cycle, so it could be a good time to merge this.

@ekohl (Member) commented Sep 17, 2020:

I'm still not convinced it's a good idea. I can't speak for others, but I don't have time for this.

Perhaps you can start with splitting some things up. For example, right now it changes how SSL ciphers are configured. That might be a good part to extract and merge. That already includes a migration that will need changes in the installer and in itself is complicated enough.

@lzap (Member, Author) commented Sep 17, 2020:

I squashed everything into a single commit to make my life easier when dealing with those conflicts (keeping co-authorship). I rebased and implemented a util class SdNotifyAll which handles systemd notifications for multiple threads. This is ready to be merged.
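The class itself isn't shown in this view, but the idea of notifying systemd exactly once across several server threads could be sketched like this (a hypothetical shape; the real SdNotifyAll may differ):

```ruby
# Notify-once barrier: each server thread reports readiness, and the
# notification callback fires exactly once, when all expected threads
# are up.
class SdNotifyAll
  def initialize(expected)
    @expected = expected
    @mutex = Mutex.new
    @ready = 0
    @notified = false
  end

  # Called by each server thread once it is listening. The block
  # (standing in for the actual sd_notify READY=1 call) runs at most once.
  def ready!
    @mutex.synchronize do
      @ready += 1
      if @ready >= @expected && !@notified
        @notified = true
        yield if block_given?
      end
    end
  end
end
```

With `expected = 2` (an HTTP and an HTTPS server), the second `ready!` call would trigger the single notification.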

@lzap (Member, Author) commented Sep 24, 2020:

Fixed rubocop issues. I don't think this rule is all that valid here, so I disabled it:

```
Offenses:

bundler.d/Gemfile.local.rb:1:1: C: Naming/FileName: The name of this source file (Gemfile.local.rb) should use snake_case.
gem 'solargraph', group: :development
^
lib/puma-patch.rb:1:1: C: Naming/FileName: The name of this source file (puma-patch.rb) should use snake_case.
module Puma
^

353 files inspected, 2 offenses detected
```

Can we proceed with merging? I have spent a lot of time on this one, and we have the option to go back to webrick, so we should be safe if issues arise.
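The two `Naming/FileName` offenses above could also be silenced with a targeted exclude rather than disabling the cop globally; a sketch of what such a `.rubocop.yml` entry might look like (the PR's actual change may differ):

```yaml
Naming/FileName:
  Exclude:
    - 'bundler.d/Gemfile.local.rb'
    - 'lib/puma-patch.rb'
```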

@lzap (Member, Author) commented Sep 24, 2020:

The changes requested by @ekohl and @ShimShtein are in; please re-review, guys.

@lzap (Member, Author) commented Sep 30, 2020:

I'd like to add that Webrick has always been a multi-threaded server (this is enabled by default), so I don't expect problems with threads after the switch to Puma.

@ekohl (Member) commented Sep 30, 2020:

As I stated again and again: I still don't really see the benefit of using Puma. There is no performance benefit and it only complicates the code. Reading puma/puma#2260, I'm not sure it's wise to spawn multiple launchers from the same process, and things may break in the future. Nothing in this PR or the Redmine issue explains why this is a good idea, and I think it is a bad idea. Unless you convince me otherwise that there is a benefit, I don't want to merge this.

@lzap lzap closed this Sep 30, 2020
@lzap lzap deleted the puma branch September 30, 2020 12:41
@lzap (Member, Author) commented Sep 30, 2020:

For the record, I know I could do better in regard to description, but this was a follow-up PR. The purpose of this is that Webrick seems to be dying under load:

https://bugzilla.redhat.com/show_bug.cgi?id=1614087
https://projects.theforeman.org/issues/24634

Webrick 1.4 or newer however does fix the issue. The Puma migration started before we actually migrated Smart Proxy to SCL Ruby (which was blocking upgrading Webrick from 1.3).

I guess we don't really need it. I was driving this forward since we are changing core to Puma as well and I was hoping for a single stack. You can configure Puma in (multi-threaded) single-instance mode too, to follow the Webrick deployment strategy, but you don't want this to continue. Alright.

@ehelms (Member) commented Sep 30, 2020:

From my perspective, being a novice on the internals of the smart proxy, I like the idea of landing on the same technologies across our stack where possible. This means we get more out of our investment: more knowledge and, in theory, a better understanding of what might be happening under the hood in each. For example, we benefit from investing in and supporting systemd notify support in Puma if we use it in two places.

All technology shifts come with risks, and I admit I do not fully know the extent this may impact. Sounds like nobody truly does and we do not have the confidence in our testing to give a clear yes or no.

We are starting to push more workloads onto the smart proxy, which does concern me from a workload perspective, and I would wonder if Puma is a better forward-looking choice to handle those workloads. Examples are the global registration workflow, container registry support that Katello is looking into, using the smart proxy to support Foreman Proxy with Content moving to port 443, and more reverse-proxying use cases.

I apologize if I have asked this before (possibly even in this PR). What is it about the smart proxy that leads us to run it as we do instead of running it as a more traditional sinatra application + application server?

@lzap (Member, Author) commented Oct 1, 2020:

> Nothing in this PR or the Redmine issue explains why this is a good idea and I think this is a bad idea.

The original motivation was my own ticket https://projects.theforeman.org/issues/24634; then Ido (the original author of this PR) took over and created a new RM issue, for whatever reason, which has an empty description. At the time, I tested that webrick 1.3 suffers from slow connection termination and puma does not. That was the initial goal, which is no longer a must, as we migrated to SCL Ruby with Webrick 1.4+.

Now, it's still a useful move, not only for the reason I explained of having a single web stack for both Core and Proxy, but also for scaling reasons. Webrick has always been a multi-threaded server with a new-thread-per-request design. This is an old-school design which does not scale under heavy load: too many threads can be created per process and there is no way of controlling this behavior, as it is hardcoded in the codebase (server.rb).

Puma, on the other hand, uses a thread-pool technique with wait queues, which is a much better design in terms of reliability and robustness. There is a set limit on requests processed in parallel and this can be managed by operations. This does not necessarily mean better performance, as you noted, but it is definitely a better admin experience and more reliable.

On top of that, we can set up Puma as a multi-process server. There was no plan to do this in the initial phase, as I noted above. But it's good to have this option if we find that a single process is no longer able to handle the load. You can't do this with Webrick at all.
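To illustrate the knobs being discussed, a hypothetical Puma configuration file might look like this (values are illustrative only and not taken from this PR):

```ruby
# config/puma.rb (sketch) -- Puma's configuration DSL
threads 0, 16   # thread pool: min/max threads per worker
workers 2       # optional: fork multiple worker processes
                # (single-process was the plan for the initial phase)
bind 'ssl://[::]:8443?key=/etc/ssl/key.pem&cert=/etc/ssl/cert.pem&verify_mode=peer'
```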

> If I read puma/puma#2260 I'm not sure it's wise to spawn multiple launchers from the same process and things may break in the future.

I don't understand this point; in the initial phase we would only use a single-process deployment strategy, as we are not aware of any consequences multiple processes could bring. Then of course we could start experimenting with other strategies once the code was in. But even then, I don't understand what systemd-notify has to do with this. You haven't touched systemd through the whole review and now you see this as a stopper.

> All technology shifts come with risks, and I admit I do not fully know the extent this may impact. Sounds like nobody truly does and we do not have the confidence in our testing to give a clear yes or no.

That's why I designed this patch in a way that users can switch between webrick and puma. Also the plan was to start with single-process deployment. We could start small and learn in our scale-labs. The idea is that Puma is much more flexible and configurable than Webrick.

> What is it about the smart proxy that leads us to run it as we do instead of running it as a more traditional sinatra application + application server?

Nothing. We spawn some background threads in the app, which is a bit unusual, but Webrick has always been a multi-threaded server and we haven't experienced any problems so far. For multiple processes we would need to make sure that those background threads (inotify monitors of the ISC DHCP leases file) work correctly, but I don't see any problem with this and IMHO this would work without a change.

@ekohl (Member) commented Oct 1, 2020:

> For the record, I know I could do better in regard to description, but this was a follow-up PR. The purpose of this is that Webrick seems to be dying under load:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1614087
> https://projects.theforeman.org/issues/24634
>
> Webrick 1.4 or newer however does fix the issue. The Puma migration started before we actually migrated Smart Proxy to SCL Ruby (which was blocking upgrading Webrick from 1.3).

Thank you. That information was very helpful and I certainly didn't know that.

> I guess we don't really need it. I was driving this forward since we are changing core to Puma as well and I was hoping for a single stack. You can configure Puma in (multi-threaded) single-instance mode too, to follow the Webrick deployment strategy, but you don't want this to continue. Alright.

I think my major concern is that this dual-identity process doesn't appear to fit the way Puma is meant to be deployed.

Puma can bind to multiple endpoints (both HTTP and HTTPS). I wonder if we should instead embrace this model.

One change would be that we would drop http_rackup_path and https_rackup_path in favor of a generic rackup_path. In all built-in plugins they are the same. That would already simplify things. I just opened #773.

Then another layer is that a module in its entirety can be disabled, HTTP, HTTPS or both. Perhaps we can hook in the existing sinatra routing layer to look at the HTTP protocol. Maybe it could also be solved in middleware.

> I don't understand this point; in the initial phase we would only use a single-process deployment strategy, as we are not aware of any consequences multiple processes could bring. Then of course we could start experimenting with other strategies once the code was in. But even then, I don't understand what systemd-notify has to do with this. You haven't touched systemd through the whole review and now you see this as a stopper.

The problem that I see is that you start 2 puma launchers if both HTTP and HTTPS are enabled. However, that PR assumes there is only one. That means we would need to clean the environment as well. That tells me that we are probably swimming against the stream and may see more issues in the future.

As I am currently fully occupied with other matters, I don't want to pick up more. This PR also changes the SSL cipher configuration. That's technically a change that can live outside of this PR, since both webrick and Puma can accept the same format. However, that change also needs an installer change. There's a migration involved. Right now I don't have time for that. If I merged this, I would feel responsible for those changes as well. That's why I don't want to merge this: I don't have time for the resulting work. This doesn't mean others can't do it.

@ekohl (Member) commented Oct 2, 2020:

I opened puma/puma#2404 for some guidance from upstream on how to handle it properly.

@ekohl (Member) commented Oct 2, 2020:

quoting puma/puma#2404 (comment)

> And I would say multiple instances of anything Puma in the same process at the same time is not currently supported, but it may work, it may not.

I think that's the root concern I had and why I was very hesitant in merging this. I'm working on a proposal based on suggestions in that issue to see if that would work.

@ehelms (Member) commented Oct 2, 2020:

I think I know, but I would appreciate someone refreshing me on the architectural reasons behind having a single application trying to support HTTP- and HTTPS-based services over alternatives such as:

  1. 2 services running on two different ports
  2. 2 services running with a reverse proxy in front

@ekohl (Member) commented Oct 2, 2020:

#774 is my attempt at supporting Puma. If it works well, we should probably rewrite the launcher to start the Puma process with parameters from settings.yml.

> I think I know, but I would appreciate someone refreshing me on the architectural reasons behind having a single application trying to support HTTP- and HTTPS-based services over alternatives such as:
>
> 1. 2 services running on two different ports
> 2. 2 services running with a reverse proxy in front

The major reason this exists is that some things must happen over HTTP. For example, unattended installs such as kickstarts and preseeds typically don't support HTTPS at all or it's a pain to set up the trust chain. The Smart Proxy can reverse proxy these calls to the Foreman. In doing so, it also provides the URLs to the Foreman application. This allows Foreman to generate the correct callback URL (which happens after the build is done; by default this is something like http://foreman.example.com/unattended/built but should become http://smart-proxy.example.com:8000/unattended/built).

However, HTTP is insecure. There is no authentication and it's unencrypted. That's why we prefer to run everything over HTTPS when we can. To limit the attack surface, every module can be configured to listen only on HTTPS (the installer does this for us).

In practice we only need HTTP for the template and httpboot modules. If you disable those, you don't really need to listen on HTTP at all. That's why the Proxy also allows you to not listen on HTTP at all. Again, to reduce attack surface. There are also companies with policies that disallow HTTP and this suits them. HTTP is also easy in development.
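As a concrete illustration of the per-module listening described above, a module settings file could take roughly this shape (the key and its accepted values here are my understanding, worth verifying against the smart-proxy documentation):

```yaml
# config/settings.d/templates.yml (sketch)
:enabled: https   # serve this module on the HTTPS app only;
                  # 'true' would enable it on both HTTP and HTTPS
```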

So effectively the Smart Proxy has 2 virtual hosts where we distinguish on the protocol. That's also the approach I've taken in my PR, based on a suggestion by a Puma developer.

10 participants