About self-healing

Self-healing is a preconfigured response to specific types of service failures. When a failure occurs, SolarWinds N-central automatically restarts the service or executes a script configured by the administrator to try to resolve the issue. The system then verifies whether the problem has been resolved and sends the appropriate notifications.

When you create a self-healing action that runs a script and the script name contains spaces, enclose the script name in quotes.
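As a minimal sketch of the quoting rule above, the following Python helper wraps a script name in quotes only when it contains a space. The function name and the example file names are hypothetical, and the use of double quotes is an assumption; the product itself only requires that the name be quoted.

```python
def quote_script_name(script_name: str) -> str:
    """Return the script name, wrapped in double quotes if it contains spaces.

    Hypothetical helper for illustration; not part of N-central.
    """
    name = script_name.strip()
    already_quoted = len(name) >= 2 and name.startswith('"') and name.endswith('"')
    if " " in name and not already_quoted:
        return f'"{name}"'
    return name


# Example with made-up script names:
print(quote_script_name("Restart Spooler.bat"))  # "Restart Spooler.bat"
print(quote_script_name("restart.bat"))          # restart.bat
```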

You cannot use self-healing on services that are in service groups or services monitored by the SolarWinds N-central server.

Self-healing at the device level is only available under the following conditions:

  • The device where the service has been added is in Professional mode.
  • The operating system of the device is Microsoft Windows.
  • The device where the service has been added is being monitored by a Windows agent or probe.
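
The restrictions and conditions above can be read as a single yes/no check. The sketch below expresses that check in Python; the class, field names, and string values are all hypothetical and do not correspond to any N-central API.

```python
from dataclasses import dataclass


@dataclass
class MonitoredService:
    # All field names and values are illustrative assumptions.
    device_mode: str          # for example "Professional"
    device_os: str            # for example "Microsoft Windows"
    monitored_by: str         # "windows_agent", "windows_probe", or "central_server"
    in_service_group: bool = False


def self_healing_available(svc: MonitoredService) -> bool:
    """Apply the documented restrictions, then the three device-level conditions."""
    # Services in service groups or monitored by the central server are excluded.
    if svc.in_service_group or svc.monitored_by == "central_server":
        return False
    return (
        svc.device_mode == "Professional"
        and svc.device_os == "Microsoft Windows"
        and svc.monitored_by in {"windows_agent", "windows_probe"}
    )
```

For example, a service on a Professional-mode Windows device monitored by a Windows agent passes the check, while the same service placed in a service group does not.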

Self-healing does not run for the first 15 minutes after a Windows device starts up. This prevents a flood of self-healing tasks for services that are still in the process of starting. If you configured a self-healing task for a Windows service, the task still triggers during this window but, as part of this fail-safe, does not execute on the device. As a result, the service remains in a failed state. Increase the number of attempts per hour so that the task has another chance to run after the 15-minute window.

You can also modify the Execution Timeout to allow more attempts. This timeout is not the amount of time the agent waits for the self-healing action to run. It is the window in which the agent tries to self-heal before manual intervention is required. Setting the Execution Timeout to one hour and allowing only one attempt per hour means only one attempt will ever be made. If a service fails to start after a reboot, the task triggers during the first 15 minutes and does not try again afterward, because it has reached its limit of attempts and is not allowed to try again in the next hour.
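
The interaction between the 15-minute startup window and the hourly attempt limit can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name is hypothetical, times are minutes since the device started, and the hourly limit is modeled as a rolling 60-minute window, which may differ from how the agent actually counts attempts.

```python
STARTUP_SUPPRESSION_MIN = 15  # documented fail-safe window after startup


def simulate_attempts(trigger_minutes, max_per_hour):
    """Label each trigger as 'suppressed', 'executed', or 'limit reached'.

    A trigger inside the startup window still consumes an attempt but
    does not execute on the device (assumed rolling 60-minute limit).
    """
    results = []
    counted = []  # minutes at which an attempt was consumed
    for t in trigger_minutes:
        recent = [c for c in counted if t - c < 60]
        if len(recent) >= max_per_hour:
            results.append((t, "limit reached"))
            continue
        counted.append(t)
        if t < STARTUP_SUPPRESSION_MIN:
            results.append((t, "suppressed"))
        else:
            results.append((t, "executed"))
    return results


# One attempt per hour: the only attempt is spent inside the startup window.
print(simulate_attempts([5, 20, 35], max_per_hour=1))
# Two attempts per hour: the second trigger runs after the window closes.
print(simulate_attempts([5, 20, 35], max_per_hour=2))
```

With one attempt per hour, the trigger at minute 5 is suppressed and the later triggers are blocked by the limit, so the service stays failed. With two attempts per hour, the trigger at minute 20 executes.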

The second and subsequent self-healing attempts, up to the maximum per hour or day within the Execution Timeout, each wait for the same duration as the scan interval of the service. Ensure that the Execution Timeout and Scan Interval allow for additional attempts in conjunction with the number of attempts per hour or day.
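
To check whether a combination of settings leaves room for more than one attempt, the timing can be sketched as follows. The function and parameter names are hypothetical, and the model assumes that the first attempt happens at the moment of failure, that each later attempt follows one scan interval after the previous one, and that attempts must start before the Execution Timeout elapses. The product's exact boundary behavior may differ.

```python
def planned_attempt_offsets(scan_interval_min, execution_timeout_min, max_attempts):
    """Return the minute offsets (from the first failure) of each possible attempt."""
    if scan_interval_min <= 0:
        raise ValueError("scan interval must be positive")
    offsets = []
    t = 0
    # Stop at whichever limit is reached first: the timeout or the attempt cap.
    while t < execution_timeout_min and len(offsets) < max_attempts:
        offsets.append(t)
        t += scan_interval_min
    return offsets


print(planned_attempt_offsets(5, 60, 3))   # [0, 5, 10]
print(planned_attempt_offsets(30, 60, 5))  # [0, 30]
print(planned_attempt_offsets(60, 60, 5))  # [0]
```

In the last case, a scan interval equal to the Execution Timeout leaves room for only one attempt under this model, regardless of how many attempts per hour are allowed.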

