Support for multi-branch deployment #223

Open
timperrett opened this issue Mar 18, 2019 · 2 comments
Labels
area:deployments pertains to the deployment subsystems area:workflow

Comments

@timperrett
Member

timperrett commented Mar 18, 2019

For the longest time, Nelson has followed the notion of an unstable master branch and encouraged users to ship code as fast as possible, leveraging supporting tools for experimentation and comparative analysis to see whether their changes were better or worse than previous revisions. This was the environment and vision that spawned Nelson. However, as adoption of Nelson has grown, it has become clear that this free-form, fast-moving environment is not present everywhere, and the notion of a so-called golden master exists in many organizations: changes merged to master should be functional and work "as expected". To continue to drive adoption, Nelson needs to provide improved support for this workflow, which means supporting what the author is terming “branch deployments”.

Revisioning

Nelson has embraced Semantic Versioning since the early days of that open specification, and its semantics are fundamental to how Nelson's garbage collection subsystems operate. Branch deployments have the potential to explode this complexity, as assumptions about increasing patch versions (as one example) would not hold if there are multiple streams of deployment into a given namespace: how do you lexicographically sort these arbitrary versions?

One approach would be to have Nelson support multiple revisioning schemes, for example, all branch deployments could be monotonically revisioned either with user-supplied input (dangerous) or some Nelson-provided epoch. The following revisioning strategies are most common in industry:

  1. Semantic versioning (version 1.x): Many ecosystems (for example, NodeJS and Java) use traditional semantic versioning, for example 1.20.3 or 2.5.1. Whilst these revisions sort such that Nelson knows their total order, semantic challenges abound when we consider non-master (trunk) releases: what would it mean to have different semantic revisions from different branches, which may or may not be intersecting? This could cause havoc for users.

  2. Monotonic versioning: Some users rely on their CI system to globally and monotonically increment the version number over time, regardless of package, compatibility or otherwise.

  3. SCM revisioning: While less common, some organizations use an identifier from their SCM system, for example a Git SHA or a revision count. The challenge with these kinds of revisioning schemes is that identifiers such as SHAs carry no ordering information: sorting them lexicographically says nothing about which revision came before another, leaving Nelson with a partial order, not a total one.

The impedance mismatch between these different revisioning strategies causes a fundamental problem: Nelson cannot support them all and retain cogent, accurate garbage collection. We need a solution that satisfies most users whilst keeping the system maintainable. With this frame, the author proposes that Nelson adopt Semantic Versioning 2.x, which would allow versions to be qualified however users wish. Consider the following examples:

1.0.1
1.0.1-newfeature.1
1.2.3+20130313144700

All of these versions are valid and sortable relative to one another. The author proposes that all versions going forward take a branch qualifier; whilst this makes the versioning a little noisier, it has the benefit that any given organization using Nelson can determine for themselves what their branching strategy is, and Nelson becomes less prescriptive overall.
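To make the ordering claim concrete, here is a minimal sketch of Semantic Versioning 2.0.0 precedence in Python. It is illustrative only and is not Nelson's or Slipway's implementation; the names `parse` and `compare` are hypothetical, and the pattern is simplified (it does not reject leading zeros, for instance). Build metadata after `+` is ignored for ordering, as the specification requires.

```python
import re
from functools import cmp_to_key

# Simplified SemVer 2.0.0 shape: MAJOR.MINOR.PATCH[-prerelease][+build]
SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"
    r"(?:-([0-9A-Za-z.-]+))?"
    r"(?:\+([0-9A-Za-z.-]+))?$"
)

def parse(version):
    """Return ((major, minor, patch), [prerelease identifiers])."""
    m = SEMVER.match(version)
    if m is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, pre, _build = m.groups()  # build metadata is dropped
    return (int(major), int(minor), int(patch)), (pre.split(".") if pre else [])

def _cmp_identifier(a, b):
    """Compare two pre-release identifiers per SemVer 2.0.0 rules."""
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))
    if a_num != b_num:
        return -1 if a_num else 1  # numeric identifiers rank below alphanumeric
    return (a > b) - (a < b)       # ASCII order for alphanumeric identifiers

def compare(x, y):
    """Return -1, 0 or 1 according to SemVer 2.0.0 precedence."""
    (core_x, pre_x), (core_y, pre_y) = parse(x), parse(y)
    if core_x != core_y:
        return -1 if core_x < core_y else 1
    if not pre_x or not pre_y:
        # A plain release outranks any pre-release of the same core version.
        return bool(pre_y) - bool(pre_x)
    for a, b in zip(pre_x, pre_y):
        c = _cmp_identifier(a, b)
        if c:
            return c
    # All shared identifiers equal: the longer pre-release list ranks higher.
    return (len(pre_x) > len(pre_y)) - (len(pre_x) < len(pre_y))

versions = ["1.2.3+20130313144700", "1.0.1", "1.0.1-newfeature.1"]
ordered = sorted(versions, key=cmp_to_key(compare))
```

Here `ordered` is `["1.0.1-newfeature.1", "1.0.1", "1.2.3+20130313144700"]`. Note that a branch-qualified version such as 1.0.1-newfeature.1 is a pre-release in SemVer terms, so it sorts below 1.0.1, and that two versions differing only in build metadata compare as equal.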

Default Repository Branch

With this change, we are considering adding the notion of a default repository branch for a given repo, such that when an end user enables Nelson on a given repository, they can optionally supply a default branch. This would allow Nelson to retain a notion of root namespaces, whilst doing any per-branch deployments to a subordinate branch-based namespace. For example:

  • master deployment: merge to master, Nelson deploys 1.2.3 to the default-namespace, as defined in the Nelson configuration
  • foo-feature branch: the branch has a changeset in an open PR (or is any branch, if your CI builds that). When engineers push to it, your CI solution builds the branch, pushes a container to your registry (versioned as, say, 1.0.1-branchname.12345), and then instructs Nelson to deploy that revision to a Nelson namespace like <default-namespace>/<branchname> by using Slipway to launch a GitHub deployment.
    • It is worth noting that users could, in theory, also launch things from their laptops if they had the appropriate credentials for building and publishing containers. This is generally not recommended; users should ensure that their CI system is set up to support branch deployments if that feature is desired.

Such a scheme would be useful, as it would potentially assist in cleaning up these ephemeral namespaces.
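As a rough illustration of the mapping described above, the sketch below derives a target namespace from a version string. It assumes, purely for illustration, that the branch name is the first pre-release identifier (as in 1.0.1-branchname.12345) and that unqualified versions go to the default namespace; `target_namespace` is a hypothetical helper, not an existing Nelson or Slipway function.

```python
def target_namespace(version, default_namespace):
    """Map a version to <default-namespace> or <default-namespace>/<branchname>.

    Assumption: the branch qualifier is the first pre-release identifier.
    """
    # Build metadata (after '+') plays no part in choosing the namespace.
    core_and_pre = version.split("+", 1)[0]
    _core, sep, prerelease = core_and_pre.partition("-")
    if not sep:
        # No pre-release qualifier: treat as a default-branch deployment.
        return default_namespace
    branch = prerelease.split(".", 1)[0]
    if not branch:
        raise ValueError(f"empty branch qualifier in {version!r}")
    return f"{default_namespace}/{branch}"

master_ns = target_namespace("1.2.3", "dev")
branch_ns = target_namespace("1.0.1-branchname.12345", "dev")
```

With these inputs `master_ns` is `"dev"` and `branch_ns` is `"dev/branchname"`. Branch names containing dots or slashes would not survive this encoding unchanged, so a real scheme would need a normalization rule for them.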

Affected System Components

Implementing this change would sweep throughout the whole codebase, and touch the following systems:

  • Revisioning: Any part of the code that deals with versioning would need adaptation. We presently make some assumptions about how we do revision parsing, and we’d have to be careful to check for any sections where we might be doing string interpolation or otherwise making assumptions about the “shape” of semantic versions.

  • Slipway: Slipway presently makes assumptions about the shape of versions, and would need adaptation accordingly. Consider using this Semantic Versioning library in Golang

  • NDLP protobuf: At present the protobuf definitions have a partial definition of a semantic version - the author proposes that we conduct a breaking change on this definition to correctly implement semantic versioning and remove the pluggable versioning schemes, such that we can go all-in on semantic versioning.

  • Garbage collector & reaper: The garbage collector will be coupled with the revisioning work, but I wanted to call it out as the algorithm used for cleanup relies on some of the semver 1.0 semantics, so would need some careful adaptation.

  • Downstream routing concerns: When Nelson writes out the lighthouse protocol, it does so using the revision for the given system (for example, in the Consul backend for lighthouse). Any control plane implementations that consume this data would need revision in order to correctly interpret and handle the versions in place (for example - to be branch aware).

  • Loadbalancers: Load balancers today are revisioned using major revisions only, within a given namespace. This is done so that the inbound edge LB (e.g. Envoy) has the opportunity to split traffic at the edge. This should continue to work, but is coupled to the routing subsystems and if we are going to allow more arbitrary namespaces, then we may end up with more LBs, so we may want to just consider any implications that might come with that.

Open Questions

  • How does namespace cleanup work? Can we effectively listen for branch deletions, and then delete branch-namespaces on our side? That would possibly work and is the same scheme used by Buildkite and other SaaS products, but I’m not sure our current query setup is optimal for this potential bloating of the namespaces table. We’d have to implement something and do some testing to see how robust some of these queries are. My suspicion is that it would be mostly fine, but we’d need to do some smarter querying on things like nelson datacenters list so that the user experience doesn’t suffer.

  • Is our H2 database going to be OK if we exponentially explode its size? It should be, but that’s worth a check.

  • During template linting on behalf of users, are we making any assumptions about master being the default branch?
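For the namespace cleanup question, one possible shape is a reconciliation step that compares the branch-namespaces Nelson knows about with the branches that still exist. The sketch below is a pure function under stated assumptions: namespaces follow the <default-namespace>/<branchname> convention, and the list of live branches is obtained elsewhere (by webhook or by polling, which is not shown). `stale_branch_namespaces` is a hypothetical name.

```python
def stale_branch_namespaces(default_namespace, namespaces, live_branches):
    """Return branch-namespaces whose branch no longer exists.

    Only namespaces directly under default_namespace are considered; the
    default namespace itself and unrelated namespaces are never returned.
    """
    prefix = default_namespace + "/"
    live = set(live_branches)
    return sorted(
        ns for ns in namespaces
        if ns.startswith(prefix) and ns[len(prefix):] not in live
    )

to_delete = stale_branch_namespaces(
    "dev",
    ["dev", "dev/foo-feature", "dev/old-spike", "qa"],
    ["master", "foo-feature"],
)
```

Here `to_delete` is `["dev/old-spike"]`. Keeping the decision separate from how branch state is fetched means the same logic would serve either an event-driven or a periodic-scan design.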

Nelson Administrator Considerations

  • Branch deployments will vastly increase the amount of storage needed on your Docker registry, and it is highly advisable to have a fully automated cleanup system in place so that you do not have to deal with the administrivia of freeing up space in the registry.
@okoye

okoye commented Mar 18, 2019

👀

@adelbertc
Member

Overall this looks good to me.

Revisioning: I like the adoption of Semantic Versioning 2.x since SemVer seems to be the predominant versioning scheme used. One concern that we brought up during our meeting was how we treat the master branch with respect to versioning. With multi-branch deploys we now expect multiple branches to be triggering deployments. However I assume, perhaps incorrectly, that we will have some sort of master branch which carries the quasi-stable deployments. We would then need a way to signal to Nelson/Slipway what the master branch for a repo is so that it can ensure branches are treated accordingly. I realize now as I'm typing this the next section is about this so..

Default branches: There are some ergonomics and usability questions here that we would need to figure out, like "are we going to require users to specify a default branch for every repo?" This seems like a reasonable requirement.. but what happens if users want to change the default branch? (This has happened on my team a couple of times.)

Open questions
Namespace cleanup: I think tying it to the lifetime of the branch makes sense. IDK if GitHub has a "branch deleted" event but if so maybe we could tie a webhook to it. Alternatively we can add another background job that periodically scans the repo branches we've deployed off of.. though for a large org that might get really large. Also agree regarding the ergonomics of nelson datacenters list.. time to rewrite in Rust 😛

H2: I'd imagine it depends on the org size.. I can see it bogging down if the org is large enough. Perhaps time to add a Postgres backend or something..

Template linting: I think this goes back to whether we make default branches a requirement or not.

I could've sworn I had a couple other questions but for the life of me cannot remember now. In any case this feature would be useful for us as we already have a couple situations where we've had to work around this problem like creating "dummy" repos which exist purely to cut releases.
