
Request for improvement regarding the issue where Pod CIDR is exhausted #1624

@jaewonparkalexander

Description

Tell us about your request

In a CNI overlay + NAP-enabled cluster, the Pod CIDR can be exhausted if it is not large enough, since each node is always allocated 256 IPs (a /24 prefix). In my case the Pod CIDR was a /16 (65,536 IPs); the node count scaled out past 256, and the 257th node ended up in a NotReady state due to Pod CIDR exhaustion (256 × 256 = 65,536).
However, Karpenter performs no validation for this and simply creates the VM, so the node ends up NotReady. It also does not expose any Karpenter events related to Pod CIDR exhaustion; I could only find the root cause in a NodeNetworkConfig event saying "subnet is full". Since it is very hard for users to identify the root cause unless they check NodeNetworkConfig events (or all events), could we improve either the validation or the Karpenter events so that users can diagnose and resolve the issue themselves?
For comparison, Cluster Autoscaler (CA) raises an error right away during validation if you try to scale out beyond the Pod CIDR (for example, to a 257th node).
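
For context, here is a minimal sketch of the arithmetic (written in Go purely for illustration, not taken from Karpenter's code base): because each node receives a fixed /24 out of the Pod CIDR, a /16 Pod CIDR can serve at most 2^(24-16) = 256 nodes.

```go
package main

import "fmt"

// maxNodes returns how many per-node blocks of size /perNodePrefix fit into a
// Pod CIDR of size /podCIDRPrefix.
func maxNodes(podCIDRPrefix, perNodePrefix int) int {
	return 1 << (perNodePrefix - podCIDRPrefix)
}

func main() {
	// /16 Pod CIDR with a /24 per node: 2^(24-16) = 256 nodes; the 257th gets no prefix.
	fmt.Println(maxNodes(16, 24))
}
```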

The error in the NodeNetworkConfig event looks like this:

"PreciseTimeStamp": 2026-03-23T06:03:36.8916355Z,
"logPreciseTime": 2026-03-23T06:03:36.8916355Z,
"reason": NCUpdateFailed,
"reportingController": dnc-rc/nnc-reconciler,
"reportingInstance": ,
"namespace": kube-system,
"kind": NodeNetworkConfig,
"name": aks-xx-xxx-xxxxx,
"message": Failed to upsert NC xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx, err = JsonError:[Code:Unknown Status, Text:[dnctl] failed to Get NetworkContainer status with status 400: {"State":"Reserve","Status":"Failed","NICStatus":"N/A","FailureDetail":{"ErrorCode":8,"Text":"primaryIP is nil, failed to allocate address for request {SubnetName:routingdomain_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx_overlaysubnet SubnetType:AzOverlay IPConstraint: NodeConstraint: RequestType: Scope: SecondaryIPCount:0 PrimaryIPPrefixBits:24 IPFamilies:[]}. Subnet is full","HttpStatusCode":400}}
, HTTPStatus:400],
"level": Warning

Ideas for improvement
- One approach is to improve validation so that, before a node is created, Karpenter checks whether there are still IP addresses available to allocate from the Pod CIDR. If not, the new VM is not created and a Pod CIDR exhaustion error is surfaced in a Karpenter event (see the sketch after this list).
- Alternatively, if improving the validation is difficult, Karpenter events could explicitly indicate that Pod CIDR exhaustion has occurred, backed by documentation, so that users can expand the Pod CIDR and resolve the issue on their own.
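
To make the first idea concrete, here is a hypothetical sketch of such a pre-provisioning check in Go. It is not Karpenter's actual API; the function name, parameters, and example CIDR are invented for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// podCIDRHasRoom reports whether another /perNodePrefix block can still be
// carved out of podCIDR, given how many nodes already hold one.
func podCIDRHasRoom(podCIDR string, perNodePrefix, existingNodes int) (bool, error) {
	prefix, err := netip.ParsePrefix(podCIDR)
	if err != nil {
		return false, err
	}
	if perNodePrefix < prefix.Bits() {
		return false, fmt.Errorf("per-node prefix /%d does not fit in Pod CIDR %s", perNodePrefix, podCIDR)
	}
	maxNodes := 1 << (perNodePrefix - prefix.Bits())
	return existingNodes < maxNodes, nil
}

func main() {
	// /16 Pod CIDR with 256 nodes already holding a /24: no room for a 257th node.
	ok, err := podCIDRHasRoom("10.244.0.0/16", 24, 256)
	fmt.Println(ok, err)
}
```

If the check returns false, Karpenter could skip VM creation and instead emit an event that explicitly says the Pod CIDR is exhausted.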

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Improvement request.

Are you currently working around this issue?

Yes

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Metadata

Labels

area/networking (Issues or PRs related to networking)
