
Operator crashes when restoring cluster #1749

Open
sagargulabani opened this issue Jul 3, 2024 · 11 comments
I am trying to restore a backup to a new Percona cluster without specifying the backupName.
Since this is a new Kubernetes cluster, I don't have the backup name available.

More about the problem

2024-06-29T16:24:18.336Z        INFO    backup restore request  {"controller": "pxcrestore-controller", "namespace": "dev", "name": "restore1", "reconcileID": "63b467f2-c684-4227-ae52-8c93d4a005f1"}
2024-06-29T16:24:18.351Z        INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "pxcrestore-controller", "namespace": "dev", "name": "restore1", "reconcileID": "63b467f2-c684-4227-ae52-8c93d4a005f1"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x112d1b0]

goroutine 135 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1a4
panic({0x154a780?, 0x28ca670?})
        /usr/local/go/src/runtime/panic.go:914 +0x218
github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1.(*PXCBackupStatus).GetStorageType(...)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1/pxc_backup_types.go:140
github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup.RestoreJob(0x40017511e0, 0x40017ffd40, 0x4001ff9900, {0x40020c0000, 0x5a}, 0x0)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup/restore.go:140 +0x50
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*s3).Job(0x400039e0c0?)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restorer.go:38 +0x38
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).validate(0x152f160?, {0x1af1e68, 0x4001df3aa0}, 0x40017511e0, 0x3?, 0x4001ff9900?)
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restore.go:80 +0x4c
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).Reconcile(0x400039e0c0, {0x1af1e68, 0x4001df3aa0}, {{{0x4001da56a0?, 0x5?}, {0x4001da5698?, 0x4002dd9cf8?}}})
        /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/controller.go:190 +0xa90
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1af58a8?, {0x1af1e68?, 0x4001df3aa0?}, {{{0x4001da56a0?, 0xb?}, {0x4001da5698?, 0x0?}}})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x40004377c0, {0x1af1ea0, 0x4000408eb0}, {0x15f31c0?, 0x40021829c0?})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316 +0x2e8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x40004377c0, {0x1af1ea0, 0x4000408eb0})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x16c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 40
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x43c

My configuration

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
  namespace: dev
spec:
  pxcCluster: pxc-db-two
  # backupName: backup1
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1.5"
  backupSource:
    destination: s3://test-backup-bucket/percona-dev-backup/pxc-db-2024-06-29-15:45:32-full/
    s3:
      bucket: s3://test-backup-bucket/percona-dev-backup/
      credentialsSecret: aws-secret
      region: eu-west-1

Steps to reproduce

1. Create a PXC cluster.
2. Try to restore the cluster from S3 using the destination path, not the backup name.
3. Watch it crash.

Versions

Kubernetes - 1.30
Operator - 1.14
2024-06-29T16:21:47.452Z INFO setup Runs on {"platform": "kubernetes", "version": "v1.30.0-eks-036c24b"}
2024-06-29T16:21:47.452Z INFO setup Manager starting up {"gitCommit": "c85a021f2a21441500b02a2c0b3d17e8a8b25996", "gitBranch": "release-1-14-0", "buildTime": "2024-03-01T09:01:29Z", "goVersion": "go1.21.7", "os": "linux", "arch": "arm64"}

Anything else?

No response


sagargulabani commented Jul 3, 2024

@hors @cap1984 @tplavcic @nonemax Can you please check on this one? Thanks. This is a hard blocker for us.


inelpandzic commented Jul 4, 2024

Hey @sagargulabani, thanks for reporting. We'll check it.

@sagargulabani

Hi @inelpandzic, any update?


ydixken commented Jul 8, 2024

Hi @inelpandzic, we can also confirm this bug. This is the resource that was used; note that it works on some clusters, but not all the time.

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: bootstrap
spec:
  pxcCluster: percona-cluster
  backupSource:
    destination: s3://percona-xtrabackup-bootstrap/common/bootstrap
    s3:
      credentialsSecret: percona
      region: ""
      endpointUrl: https://minio.redacted.tld/

Logs:

-4aeb-a212-fb33ccf5e9c7"}
2024-07-08T12:11:21.373Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference    {"controller": "pxcrestore-controller",
"namespace": "default", "name": "bootstrap", "reconcileID": "4ff5700d-ec23-4aeb-a212-fb33ccf5e9c7"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x1634635]

goroutine 77 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1aa5fe0?, 0x2e707b0?})
    /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1.(*PXCBackupStatus).GetStorageType(...)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1/pxc_backup_types.go:140
github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup.RestoreJob(0xc000ee89c0, 0xc000e73b00, 0xc000bcd400, {0xc001441b00, 0x32}, 0x0)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup/restore.go:140 +0x75
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*s3).Job(0xc0006ffda0?)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restorer.go:38 +0x32
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).validate(0x1a8a960?, {0x204fa08, 0xc000d22750}, 0xc000ee89c
0, 0x204f998?, 0xc000bcd400?)
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restore.go:80 +0x4b
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).Reconcile(0xc0006ffda0, {0x204fa08, 0xc000d22750}, {{{0xc00
119fde0?, 0x5?}, {0xc00119fdd6?, 0xc000923d08?}}})
    /go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/controller.go:190 +0xf14
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x2053448?, {0x204fa08?, 0xc000d22750?}, {{{0xc00119fde0?, 0xb?}, {0xc00119fdd6?, 0x0?}}})
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0002890e0, {0x204fa40, 0xc00030cc80}, {0x1b4eb40?, 0xc000491f80?})
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002890e0, {0x204fa40, 0xc00030cc80})
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()


ydixken commented Jul 8, 2024

FYI: as we're under a Percona support contract, we've also raised this issue with the Percona support team.
Ticket ID: CS0048052


ydixken commented Jul 8, 2024

I've found the issue - you need to have:

    xtradb:
      backup:
        enabled: true
        storages:
          minio:
            type: $your_storage

The important part is that backup.storages is set.
In any case, this should not segfault; it should produce a log message instead.
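For illustration, the guarded lookup could look roughly like this. This is a minimal sketch with simplified, hypothetical stand-in types (the real PXCBackupStatus lives in pkg/apis/pxc/v1/pxc_backup_types.go and has more fields); the point is only the defensive pattern of returning an error instead of dereferencing a possibly-nil storage spec:

```go
package main

import "fmt"

// BackupStorageSpec is a simplified stand-in for the operator's storage spec.
type BackupStorageSpec struct {
	Type string
}

// PXCBackupStatus is a simplified stand-in for the operator's backup status.
type PXCBackupStatus struct {
	StorageType string
	S3          *BackupStorageSpec
}

// GetStorageType sketches the defensive pattern: when neither an explicit
// storage type nor a storage spec is present, return an error instead of
// dereferencing a nil pointer.
func (s *PXCBackupStatus) GetStorageType() (string, error) {
	switch {
	case s.StorageType != "":
		return s.StorageType, nil
	case s.S3 != nil:
		return s.S3.Type, nil
	default:
		// The panicking path dereferenced a nil pointer here; a guarded
		// version surfaces an error the reconciler can log instead.
		return "", fmt.Errorf("storage type is not set in the backup status")
	}
}

func main() {
	status := &PXCBackupStatus{} // no storages configured, as in the reports above
	if _, err := status.GetStorageType(); err != nil {
		fmt.Println("restore validation failed:", err)
	}
}
```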

Edit: Updated resolution advice with the right key.

cc @inelpandzic @sagargulabani


sagargulabani commented Jul 9, 2024

@ydixken Thank you for the update. Just to clarify, the above config belongs to the actual PXC database resource:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster

which looks like

...
spec:
  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    pitr:
      enabled: false
    schedule:
    - keep: 5
      name: hourly-backup
      schedule: 45 * * * *
      storageName: s3-subdir-eu-west-2
    storages:
      s3-subdir-eu-west-2:
        s3:
          bucket: test-bucket/test
          credentialsSecret: s3-backup-aws-creds
          region: eu-west-2
        schedulerName: default-scheduler
        type: s3


ydixken commented Jul 13, 2024

Thanks for the heads-up!

Just to clarify, we now have the following configured; before, the storage was missing and I encountered the behavior you've described:

      backup:
        storages:
          minio-bootstrap:
            type: s3
            s3:
              bucket: "percona-xtrabackup-bootstrap"
              endpointUrl: "https://minio.redacted.tld"
              credentialsSecret: percona

To trigger a restore, I'm using:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: bootstrap
spec:
  pxcCluster: percona-cluster
  backupSource:
    destination: s3://percona-xtrabackup-bootstrap/prod
    s3:
      credentialsSecret: percona
      region: ""
      endpointUrl: https://minio.redacted.tld/

Maybe this helps out?

@sagargulabani

Yes, after I added the storages section, it worked for me.


ydixken commented Jul 14, 2024

Glad to hear :-)

@inelpandzic inelpandzic self-assigned this Sep 24, 2024
@inelpandzic

@sagargulabani @ydixken PR for this fix is ready and will be included in our next 1.16.0 release.
#1828
