IPFS error: Resource limits were exceeded
For a while now, my IPFS node has complained about resource limits being exceeded. Sadly, I’ve not had any time to look into the issue. Speaking of, what kind of miserable bastard would spend their Christmas holiday troubleshooting an IPFS node? Ah yes, that would be me.
$ journalctl -u ipfs.service -f
ipfs[1093]: ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 165 times with error "system: cannot reserve inbound connection: resource limit exceeded".
ipfs[1093]: ERROR resourcemanager libp2p/rcmgr_logging.go:57 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md>
ipfs[1093]: ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 174 times with error "system: cannot reserve inbound connection: resource limit exceeded".
ipfs[1093]: ERROR resourcemanager libp2p/rcmgr_logging.go:57 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md>
ipfs[1093]: ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 51 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
ipfs[1093]: ERROR resourcemanager libp2p/rcmgr_logging.go:57 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md>
For some reason, the libp2p Network Resource Manager had started flooding my journal with “resource limit exceeded” errors. And it was not exaggerating. Attempting to pull any content from my IPFS node mostly resulted in timeout errors.
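If you want a rough count of how often the error is actually being logged, something like this does the trick (just a quick sketch; tweak the --since window to taste):
$ journalctl -u ipfs.service --since today | grep -c "Resource limits were exceeded"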
My initial theory was that the errors were caused by a lack of available memory. I’m running my combined IPFS node and gateway on a Scaleway Stardust instance. If you’re unfamiliar with the Stardust line, it’s perhaps the lowest of the low-end VPS money can rent. However, it has served me well for two years now, so unless Scaleway has made some infrastructure changes, I don’t see any reason why it should suddenly fail.
# Scaleway Stardust instance
Processor: AMD EPYC 7281 16-Core Processor
CPU cores: 1
Frequency: 2096.060 MHz
RAM: 981Mi
Swap: 1.0Gi
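(For the record, those figures come straight from the standard tools; something along these lines will give you the same numbers on your own box.)
$ lscpu
$ free -h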
Computed default limits are suddenly failing
Following the link provided in the error log, we find instructions on how to raise the resource manager limits. I’m not sure why the computed default limits were no longer sufficient to keep the node functioning, but overriding these defaults seemed like a simple fix.
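As I read the documentation, the computed defaults can be overridden through the Swarm.ResourceMgr.Limits section of the config, either by editing the file directly or with the ipfs config command. My first attempt looked something along these lines (the values here are placeholders for illustration, not the numbers I eventually landed on):
$ ipfs config --json Swarm.ResourceMgr.Limits.System '{"Conns": 1024, "ConnsInbound": 512, "ConnsOutbound": 512}'
$ sudo systemctl restart ipfs.service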
However, after restarting ipfs.service I still got the same “system: cannot reserve inbound connection: resource limit exceeded” errors.
Issuing the command ipfs swarm stats system to see the actual usage didn’t exactly explain why I was hitting the inbound connection limits:
{
  "System": {
    "Conns": 890,
    "ConnsInbound": 350,
    "ConnsOutbound": 540,
    "FD": 223,
    "Memory": 32575616,
    "Streams": 956,
    "StreamsInbound": 376,
    "StreamsOutbound": 580
  }
}
To hopefully get a better understanding of which resources were close to hitting their respective limits, I issued the command ipfs swarm stats --min-used-limit-perc=90 all, which I found after rushing through the libp2p Network Resource Manager documentation. However, it just showed peers on the network:
{
  "Peers": {
    "12D3KooWRDLZqDPF2YWG5wSYhfUAuYfrSMgFyASoARRzWnQTcuNf": {
      "Conns": 4,
      "ConnsInbound": 4,
      "ConnsOutbound": 0,
      "FD": 0,
      "Memory": 0,
      "Streams": 3,
      "StreamsInbound": 3,
      "StreamsOutbound": 0
    }
  }
}
Overriding the limits by trial and error
As I mentioned previously, we can see the actual usage for the system and transient scopes by issuing ipfs swarm stats system and ipfs swarm stats transient. Based on those values, I would simply increase the limits until I was in the clear. For the system scope, I added the following values:
Conns: 2048
ConnsInbound: 1024
ConnsOutbound: 1024
Streams: 4096
StreamsInbound: 2048
StreamsOutbound: 2048
The transient scope, which according to the live stats used next to nothing (but still threw error messages about hitting resource limits), ended up with the following values:
Conns: 512
ConnsInbound: 256
ConnsOutbound: 256
Streams: 1024
StreamsInbound: 512
StreamsOutbound: 512
After implementing these changes, I was finally able to get rid of all the “resource limits exceeded” errors for the system and transient scopes. I found the easiest approach to be to just edit ~/.ipfs/config directly and add the following configuration under the "Swarm" section:
"ResourceMgr": {
"Limits": {
"System": {
"Conns": 2048,
"ConnsInbound": 1024,
"ConnsOutbound": 1024,
"Streams": 4096,
"StreamsInbound": 2048,
"StreamsOutbound": 2048
},
"Transient": {
"Conns": 512,
"ConnsInbound": 256,
"ConnsOutbound": 256,
"Streams": 1024,
"StreamsInbound": 512,
"StreamsOutbound": 512
}
}
},
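Note that the daemon only picks up the new limits after a restart, so the last step is to bounce the service (assuming the same ipfs.service unit as above) and keep an eye on the journal to confirm the errors are actually gone:
$ sudo systemctl restart ipfs.service
$ journalctl -u ipfs.service -f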
Peer resource limits were exceeded
I just can’t catch a break, can I? So the errors have streamed down to the peer scope. Well, I am not raising the limits for individual peers, and my node seems to be functioning alright, so I guess this is as good as it gets for me.
ipfs[9983]: ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 93 times with error "peer:12D3KooWQE3CWA3MJ1YhrYNP8EE3JErGbrCtpKRkFrWgi45nYAMn: cannot reserve inbound stream: resource limit exceeded".
ipfs[9983]: ERROR resourcemanager libp2p/rcmgr_logging.go:57 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
ipfs[9983]: ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 71 times with error "peer:12D3KooWDLYiAdzUdM7iJHhWu5KjmCN62aWd7brQEQGRWbv8QcVb: cannot reserve inbound stream: resource limit exceeded".
ipfs[9983]: ERROR resourcemanager libp2p/rcmgr_logging.go:57 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
ipfs[9983]: ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 3 times with error "peer:12D3KooWGGyvEomXVi5YHqXdfGHx1GKHjVrUo313pWCs5uSfkoHK: cannot reserve inbound stream: resource limit exceeded".
ipfs[9983]: ERROR resourcemanager libp2p/rcmgr_logging.go:57 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
I’ll admit I found this exercise a bit confusing, especially as I couldn’t make heads or tails of the resource usage reports provided by ipfs swarm stats.
Not that I’m any kind of expert on IPFS, though. I was not even aware that go-ipfs is now called Kubo. Maybe something will be done about the computed defaults in the upcoming Kubo v0.18.0 release. To quote my users: this stuff used to just work, so someone must be to blame for my issues ;)
Affected system: IPFS version 0.17.0 on Ubuntu 22.04.1 LTS.
IPFS node: gateway.paranoidpenguin.net