@jianzhangbjz
Member

@jianzhangbjz jianzhangbjz commented Dec 11, 2025

During cluster upgrades, the operator-lifecycle-manager-packageserver ClusterOperator incorrectly reports Available=False for ~16 seconds. This violates the OpenShift contract that a component must not report Available=False during the course of a normal upgrade.

PodAntiAffinity + Single-Node Cluster + Single-Replica Deployment
During rolling updates in single-node control plane environments:

  1. Old pod is running on the only node
  2. New pod attempts to schedule
  3. PodAntiAffinity prevents new pod from co-locating with old pod
  4. New pod becomes Unschedulable (waiting for old pod termination)
  5. APIService is temporarily unavailable
  6. BUG: Unschedulable was incorrectly treated as a real failure
  7. CSV enters Failed phase → ClusterOperator reports Available=False

This is especially problematic in OpenShift SNO (Single Node OpenShift) environments.
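
The key signal in steps 3 and 4 is the pod's scheduling condition. Below is a minimal sketch of how that state could be recognized, assuming the standard Kubernetes convention that a pending pod carries a PodScheduled condition with status False and reason Unschedulable. The podCondition type and the isUnschedulable helper are hypothetical stand-ins written for illustration; they are not the code in this PR.

```go
package main

import "fmt"

// podCondition is a simplified, hypothetical stand-in for the
// Kubernetes pod condition type (only the fields used here).
type podCondition struct {
	Type   string
	Status string
	Reason string
}

// isUnschedulable reports whether the scheduler has marked the pod as
// unschedulable: PodScheduled is False with reason Unschedulable.
func isUnschedulable(conds []podCondition) bool {
	for _, c := range conds {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return true
		}
	}
	return false
}

func main() {
	// New pod blocked by anti-affinity on the only node.
	pending := []podCondition{{Type: "PodScheduled", Status: "False", Reason: "Unschedulable"}}
	// Old pod that is already placed on the node.
	scheduled := []podCondition{{Type: "PodScheduled", Status: "True"}}

	fmt.Println("pending pod unschedulable:", isUnschedulable(pending))
	fmt.Println("scheduled pod unschedulable:", isUnschedulable(scheduled))
}
```

On its own this check cannot distinguish an expected, temporary anti-affinity conflict from a pod that can never be scheduled, which is why the fix only tolerates it while a rollout is in progress.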

Description of the change:
Enhanced pod disruption detection in pkg/controller/operators/olm/apiservices.go:

  1. Improved rollout detection: Added checks for Generation != ObservedGeneration and AvailableReplicas < desired to catch early-phase rollouts
  2. PodAntiAffinity awareness: Treat Unschedulable as expected disruption during single-replica rollouts instead of real failure
  3. Single-replica tolerance: For single-replica deployments during rollout with no real failures, return RetryableError instead of marking CSV as Failed
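
The three points above can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not the code in apiservices.go: the deploymentState type, the rolloutInProgress and classifyUnavailable helpers, and the two error values are all hypothetical names, and the real implementation works on Kubernetes Deployment and Pod objects.

```go
package main

import (
	"errors"
	"fmt"
)

// deploymentState is a hypothetical, simplified view of the Deployment
// fields the description mentions.
type deploymentState struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	DesiredReplicas    int32 // spec.replicas
	AvailableReplicas  int32 // status.availableReplicas
}

// Hypothetical error values standing in for "retry later" and "mark CSV Failed".
var (
	errRetryable = errors.New("retryable: expected disruption during rollout")
	errFailed    = errors.New("failed: APIService unavailable")
)

// rolloutInProgress catches early-phase rollouts: either the controller has
// not yet observed the latest spec, or fewer replicas are available than desired.
func rolloutInProgress(d deploymentState) bool {
	return d.Generation != d.ObservedGeneration || d.AvailableReplicas < d.DesiredReplicas
}

// classifyUnavailable decides how to treat an unavailable APIService.
// realFailure is true when pods show a failure other than the expected
// Unschedulable state (for example a crash loop).
func classifyUnavailable(d deploymentState, realFailure bool) error {
	if d.DesiredReplicas == 1 && rolloutInProgress(d) && !realFailure {
		return errRetryable
	}
	return errFailed
}

func main() {
	rolling := deploymentState{Generation: 2, ObservedGeneration: 1, DesiredReplicas: 1, AvailableReplicas: 0}
	settled := deploymentState{Generation: 2, ObservedGeneration: 2, DesiredReplicas: 1, AvailableReplicas: 1}

	fmt.Println("rolling, no real failure:", classifyUnavailable(rolling, false))
	fmt.Println("rolling, real failure:   ", classifyUnavailable(rolling, true))
	fmt.Println("settled, no rollout:     ", classifyUnavailable(settled, false))
}
```

Returning a retryable error keeps the CSV out of the Failed phase while the rollout finishes, so the ClusterOperator does not flip to Available=False for a disruption that resolves on its own.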

Motivation for the change:
To address https://issues.redhat.com/browse/OCPBUGS-67210

Architectural changes:

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

Assisted-by: Claude Code

@openshift-ci

openshift-ci bot commented Dec 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dtfranz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jianzhangbjz
Member Author

/test sanity

@openshift-ci

openshift-ci bot commented Dec 11, 2025

@jianzhangbjz: No presubmit jobs available for operator-framework/operator-lifecycle-manager@master

In response to this:

/test sanity

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jianzhangbjz
Member Author

/test verify

@openshift-ci

openshift-ci bot commented Dec 11, 2025

@jianzhangbjz: No presubmit jobs available for operator-framework/operator-lifecycle-manager@master

In response to this:

/test verify


@jianzhangbjz
Member Author

/test verify (pull_request)

@openshift-ci

openshift-ci bot commented Dec 11, 2025

@jianzhangbjz: No presubmit jobs available for operator-framework/operator-lifecycle-manager@master

In response to this:

/test verify (pull_request)


@jianzhangbjz
Member Author

jianzhangbjz commented Dec 11, 2025

This PR also addresses the following make verify failure:

Run make verify
./scripts/update_codegen.sh
Generating client code for 4 targets
Generating lister code for 4 targets
I1211 07:40:00.841698    4725 main.go:58] Completed successfully.
Generating informer code for 4 targets
I1211 07:40:02.735407    4806 main.go:58] Completed successfully.
Generating openapi code for 3 targets
I1211 07:40:06.508083    5035 openapi.go:759] [github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/apis/operators/v1.CSVDescription] Annotations map[string]string: tag listType on type Map; only allowed on type Slice
I1211 07:40:06.546684    5035 openapi.go:759] [k8s.io/apimachinery/pkg/apis/meta/v1.Status] Details *k8s.io/apimachinery/pkg/apis/meta/v1.StatusDetails: tag listType on type Pointer; only allowed on type Slice
I1211 07:40:06.626928    5035 api_linter.go:43] Assembling file "/tmp/update_codegen.sh.api_violations.MxNXSw"
I1211 07:40:13.553859    5453 main.go:58] Completed successfully.
I1211 07:40:15.957573    5588 main.go:58] Completed successfully.
make diff
make[1]: Entering directory '/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager'
git diff --exit-code
make[1]: Leaving directory '/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager'
# Generate mocks and silence the following warning:
# WARNING: Invoking counterfeiter multiple times from "go generate" is slow.
# Consider using counterfeiter:generate directives to speed things up.
# See https://github.com/maxbrunsfeld/counterfeiter#step-2b---add-counterfeitergenerate-directives for more information.
# Set the "COUNTERFEITER_NO_GENERATE_WARNING" environment variable to suppress this message.
# golang.org/x/tools/imports
Error: ../../../hack/overlays/goimports_vendorlesspath.go:6:6: VendorlessPath redeclared in this block
Error: 	../../../vendor/golang.org/x/tools/imports/forward.go:75:6: other declaration of VendorlessPath
pkg/api/wrappers/deployment_install_client.go:1: running "go": exit status 1
# golang.org/x/tools/imports
Error: ../../../hack/overlays/goimports_vendorlesspath.go:6:6: VendorlessPath redeclared in this block
Error: 	../../../vendor/golang.org/x/tools/imports/forward.go:75:6: other declaration of VendorlessPath
pkg/controller/bundle/bundle_unpacker.go:323: running "go": exit status 1
...
...

…se state running in SNO

Update generated mock: fix import alias for apps/v1
The hack/overlays/goimports_vendorlesspath.go overlay was causing
conflicts with counterfeiter mock generation because the vendored
golang.org/x/tools/imports now includes the VendorlessPath function.

This overlay is no longer needed as the function is now available
in the vendored dependency.

Fixes: make verify counterfeiter errors