Collaborator

@junzebao junzebao commented Sep 3, 2025

This PR refactors the reclaim action so that jobs, rather than pods, are the scheduling unit. The main changes are in the following areas:

  1. pkg/scheduler/api/resource_info.go: resource comparisons of any kind now ignore non-GPU resources.
  2. pkg/scheduler/plugins/capacity/capacity.go: reclaim is blocked when currentAllocation + pendingJobRequest exceeds deserved.
  3. pkg/scheduler/actions/reclaim/reclaim.go: the main reclaim logic, which has been refactored.
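
To make items 1 and 2 concrete, here is a minimal sketch of a GPU-only comparison feeding the reclaim gate. It is not the code in this PR: the `Resources` type, the `gpuResourceName` constant, and the helpers `gpuOnly`, `exceedsDeserved`, and `canReclaim` are all hypothetical names chosen for illustration, and it assumes that "goes beyond deserved" means strictly greater than.

```go
package main

import "fmt"

// Resources maps a resource name to a quantity.
// Hypothetical stand-in for the scheduler's real resource type.
type Resources map[string]int64

// gpuResourceName is an assumed resource name for this sketch.
const gpuResourceName = "nvidia.com/gpu"

// gpuOnly returns the GPU quantity and ignores every other resource
// (CPU, memory, ...), mirroring the "ignore non-GPU resources" rule.
func gpuOnly(r Resources) int64 {
	return r[gpuResourceName]
}

// exceedsDeserved reports whether currentAllocation + pendingJobRequest
// goes beyond deserved, comparing GPUs only.
func exceedsDeserved(current, pending, deserved Resources) bool {
	return gpuOnly(current)+gpuOnly(pending) > gpuOnly(deserved)
}

// canReclaim is the gate: reclaim is blocked when the pending job
// would push the allocation beyond deserved.
func canReclaim(current, pending, deserved Resources) bool {
	return !exceedsDeserved(current, pending, deserved)
}

func main() {
	current := Resources{gpuResourceName: 4, "cpu": 64}
	deserved := Resources{gpuResourceName: 8, "cpu": 32}

	// 4 + 2 GPUs stays within 8; the CPU overshoot is ignored.
	small := Resources{gpuResourceName: 2, "cpu": 128}
	fmt.Println(canReclaim(current, small, deserved)) // true

	// 4 + 6 GPUs goes beyond 8, so reclaim is blocked.
	large := Resources{gpuResourceName: 6}
	fmt.Println(canReclaim(current, large, deserved)) // false
}
```

Note that the whole pending job request is added at once, which is what treating the job (not the pod) as the scheduling unit implies for this check.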

junzebao and others added 24 commits September 3, 2025 11:59
Signed-off-by: Junze Bao <[email protected]>