Configuring Local-Disk Failover for a Forest

Last Updated: April 17, 2026

MarkLogic Server
Version 12.0
Documentation

This section describes the procedure for configuring local-disk failover for a forest. For details about how failover works and the requirements for failover, see High Availability of Data Nodes With Failover. For details on configuring shared-disk failover, see Configuring Shared-Disk Failover for a Forest. This section covers the following topics:

Setting Up Local-Disk Failover for a Forest
Reverting a Failed Over Forest Back to the Primary Host

For other failover administrative procedures that apply to both local-disk and shared-disk failover, see Other Failover Configuration Tasks.

Setting Up Local-Disk Failover for a Forest

Setting up failover for a forest is a relatively simple administrative procedure with two basic parts:

Enabling Failover in a Group
Configuring Local-Disk Failover For a Forest

Enabling Failover in a Group

For each group in which you want to host a failover forest, perform the following steps:

1. Before setting up failover, ensure that you have met all the requirements for failover, as described in Requirements for Local-Disk Failover.
2. On the group configuration page for the group to which the failover host belongs, make sure the failover enable button is set to true.

This group-level failover enable button provides global control, at the group level, for enabling and disabling failover for all forests in that group.

You can enable or disable failover at the group level at any time. Disabling failover at the group level does not prevent you from configuring failover for forests; it only prevents failover from actually occurring.

Configuring Local-Disk Failover For a Forest

To set up local-disk failover on a forest and set up one or more replica forests, perform the following steps:

1. Before setting up failover, ensure that you have met all the requirements for failover, as described in Requirements for Local-Disk Failover, and that you have enabled failover for the group, as described in Enabling Failover in a Group.
2. Create one or more forests on one or more hosts to use as replicas (or use existing forests). If you add a replica forest to an existing primary forest, the primary forest goes offline for a period ranging from a few seconds to a few minutes.
3. Either create a new forest or use an existing forest on a different host from the replica forest(s). This forest will be the primary forest. If you are modifying an existing forest, skip to step 6. To create a new forest, first click the Forests link in the left tree menu, then click the Create tab. The Create Forest page appears. Note the failover enable button and the failover hosts section at the bottom.
4. Enter a name for the forest.
5. Specify a data directory.
6. Select true for failover enable. Note that failover enable must be set to true at both the forest and the group level for failover to be active.
7. Select the replica forest from the drop-down menu for forest replicas. You can set one or more replica forests.
8. Click OK to create or modify the forest.

The forest is now configured with the specified replica forests. You must attach the primary forest to a database before you can use the forest, but it is ready and set up for failover. You cannot attach the replica forests to a database; they will automatically be kept up-to-date as you update content in the primary forest.
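The constraints in this procedure can be expressed as a small pre-flight check. The following Python sketch is illustrative only: the `Forest` record and the `failover_problems` function are hypothetical names, not part of any MarkLogic API. It encodes only the rules stated above: failover must be enabled at both the group and the forest level, at least one replica must be configured, and each replica must be on a different host from the primary forest.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Forest:
    """Hypothetical record describing a forest; not a MarkLogic API type."""
    name: str
    host: str
    failover_enabled: bool = False
    replicas: list[Forest] = field(default_factory=list)


def failover_problems(primary: Forest, group_failover_enabled: bool) -> list[str]:
    """Return the reasons why local-disk failover would not be active."""
    problems = []
    # Failover must be enabled at both the group and the forest level.
    if not group_failover_enabled:
        problems.append("failover is disabled at the group level")
    if not primary.failover_enabled:
        problems.append(f"failover is disabled on forest {primary.name}")
    # At least one replica forest must be configured.
    if not primary.replicas:
        problems.append(f"forest {primary.name} has no replica forests")
    # Each replica must be on a different host from the primary forest.
    for replica in primary.replicas:
        if replica.host == primary.host:
            problems.append(
                f"replica {replica.name} is on the same host as {primary.name}"
            )
    return problems
```

An empty list means the configuration satisfies these particular rules; it does not verify the broader requirements listed in Requirements for Local-Disk Failover.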

Reverting a Failed Over Forest Back to the Primary Host

If a forest fails over to a failover host, causing a replica forest to take the role of the primary forest, the replica forest remains in the open state until the host unmounts the forest. To revert a failed-over forest back to the original primary host (that is, to unfailover the forest), you must restart either the forest that is open or the host on which the forest is open. Do this only if the original primary forest has a state of sync replicating, which indicates that it is up-to-date and ready to take over. After you restart the forest that is currently open, the forest automatically opens on the primary host, provided the original primary forest is in the sync replicating state.

Make sure the primary host is back online and the problem that caused the failover has been corrected before attempting to unfailover the forest. To check the status of the hosts in the cluster, see the Cluster Status Page in the Admin Interface. To check the status of the forest, see the Forest Status Pages in the Admin Interface.
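The sync replicating precondition can be checked mechanically before a restart. The Python sketch below assumes you have already collected each forest's state (for example, by reading the Forest Status pages) into a dictionary; the function names and the input format are hypothetical, and the only state string taken from this section is `sync replicating`.

```python
SYNC_REPLICATING = "sync replicating"


def forests_not_ready(forest_states: dict[str, str]) -> list[str]:
    """Return the names of forests whose state is not 'sync replicating'."""
    return sorted(
        name for name, state in forest_states.items()
        if state != SYNC_REPLICATING
    )


def safe_to_revert(forest_states: dict[str, str]) -> bool:
    """True only if every original primary forest is sync replicating."""
    return not forests_not_ready(forest_states)
```

For example, `safe_to_revert({"myFailoverForest": "sync replicating"})` returns `True`, while any other state for that forest makes it return `False`.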

To restart the forest, perform the following steps:

1. Navigate to the Status page for the forest that has failed over. For example, if the forest name is myFailoverForest, click Forests > myFailoverForest in the left tree menu, then click the Status tab.
2. On the Forest Status page, click the restart button.
3. Click OK on the Restart Forest confirmation page.
4. When the Forest Status page returns, if the Mount State is unmounted, the forest might not have completed mounting. Refresh the page and the Mount State should indicate that the forest is open.

The forest is restarted, and if the primary host is available, the primary host will mount the forest. If the primary host is not available, the first failover host will try to mount the forest, and so on until there are no more failover hosts to try. If you look in the ErrorLog.txt log file for the primary host, you will see messages similar to the following:

2010-09-13 20:16:47.751 Info: Mounted forest myFailoverForest locally on /space/marklogic/Forests/myFailoverForest
2010-09-13 20:16:47.751 Info: Forest replica accepts forest myFailoverForest as the master with timestamp 2564905551526239330

If you look at the ErrorLog.txt log file for any other host in the cluster, you will see messages similar to the following:

2010-09-13 20:16:47.751 Info: Forest failover1 accepts forest myFailover as the master with timestamp 2564905551526239330
2010-09-14 17:01:29.651 Info: Forest replica1 starting synchronization to forest failover1
2010-09-14 17:01:29.666 Info: Forest replica1 starting bulk replication to forest failover1
2010-09-14 17:01:29.776 Info: Forest replica1 needs to replicate 0 fragments to forest failover1
2010-09-14 17:01:29.776 Info: Forest replica1 finished bulk replicated 0 fragments to forest failover1
2010-09-14 17:01:29.807 Info: Forest replica1 finished bulk replication to forest failover1
2010-09-14 17:01:29.822 Info: Forest replica1 finished synchronizing to replica forest failover1
2010-09-14 17:09:26.638 Info: Forest replica1 accepts forest failover1 as the master with precise time 12845094147890000
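To find these role changes in a long ErrorLog.txt, you can filter for the "accepts forest ... as the master" messages. The following Python sketch uses a pattern inferred only from the sample messages above, so other message variants may exist and would not match; the `master_changes` function name is hypothetical.

```python
import re

# Pattern inferred from the sample log lines in this section.
ACCEPTS_MASTER = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) Info: "
    r"Forest (?P<replica>\S+) accepts forest (?P<master>\S+) as the master "
    r"with (?:timestamp|precise time) (?P<stamp>\d+)$"
)


def master_changes(log_lines):
    """Yield (time, replica, master) for each 'accepts ... as the master' line."""
    for line in log_lines:
        match = ACCEPTS_MASTER.match(line.strip())
        if match:
            yield match["time"], match["replica"], match["master"]
```

Passing an open file object for ErrorLog.txt as `log_lines` yields one tuple per matching message, in log order.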

Shutdown on Storage Failure Configuration

You can configure these database settings:

Setting	Description
shutdown-on-storage-failure	When set to true, if storage for any primary forest, including replica forests, for the database cannot be accessed for more than the configured timeout, then the MarkLogic Server process will shut itself down. This shutdown triggers a failover of any forests that are managed by the MarkLogic Server host. This setting is disabled by default. These admin functions can be used to enable or disable this behavior: `admin.databaseGetShutdownOnStorageFailure()` `admin.databaseSetShutdownOnStorageFailure()`
storage-failure-timeout	If `shutdown-on-storage-failure` is set to true, and storage cannot be accessed, then MarkLogic Server shuts down after this many seconds. This setting is 60 seconds by default. The following new admin functions can be used to control this setting: `admin.databaseGetStorageFailureTimeout()` `admin.databaseSetStorageFailureTimeout()`

To check for storage failure, hosts in a MarkLogic Server cluster write and read a file named DiskCheck every 5 seconds in each of these directories:

Default Data Directory: /var/opt/MarkLogic or the configured MARKLOGIC_DATA_DIR in your marklogic.conf file.
Fast Data Directory of all databases, if configured differently from Default Data Directory.
Large Data Directory of all databases, if configured differently from Default Data Directory.

If access to any of these file systems is delayed, Debug-level messages appear in your ErrorLog.txt:

Debug: Forest::<forest name> read file <forest data directory>/DiskCheck hang for <x> seconds

If this delay persists for more than 10 seconds, then the log level increases to Warning:

Warning: Forest::<forest name> read file <forest data directory>/DiskCheck hang for <x> seconds

If this delay persists for more than 60 seconds, then the log level increases to Error:

Error: Forest::<forest name> read file <forest data directory>/DiskCheck hang for <x> seconds

If this delay persists for more than 180 seconds, then the log level increases to Critical:

Critical: Forest::<forest name> read file <forest data directory>/DiskCheck hang for <x> seconds
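The escalation thresholds above can be summarized as a simple lookup. This Python sketch is only a restatement of the documented thresholds, not MarkLogic's implementation; the function name is hypothetical, and returning `None` when there is no delay is an assumption.

```python
def diskcheck_log_level(delay_seconds: float):
    """Map a DiskCheck delay, in seconds, to the log level described above."""
    if delay_seconds <= 0:
        return None  # assumption: nothing is logged when there is no delay
    if delay_seconds > 180:
        return "Critical"
    if delay_seconds > 60:
        return "Error"
    if delay_seconds > 10:
        return "Warning"
    return "Debug"
```

Each threshold is strict ("more than"), so a delay of exactly 10 seconds still maps to Debug in this sketch.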

If DiskCheck is not delayed, and instead the read or write action fails completely, then Error-level messages appear in ErrorLog.txt:

2024-10-16 14:04:03.714 Error: XDMP-DISKCHECKERROR: Disk Check Error in Forest forest-2M: Failed to write file /var/opt/MarkLogic/Forests/forest-2M/DiskCheck because of SVC-FILWRT (successBefore: true, sinceFirstError: 15 sec)
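For monitoring, it can help to extract the fields of this message. The Python sketch below derives its pattern from the single sample line above; the `DiskCheckError` tuple and `parse_diskcheck_error` function are hypothetical names, and the `read` alternative is an assumption, since only a write failure is shown here.

```python
import re
from typing import NamedTuple, Optional


class DiskCheckError(NamedTuple):
    forest: str
    action: str
    path: str
    cause: str
    success_before: bool
    since_first_error_sec: int


# Pattern inferred from the sample message; "read" is an assumed variant.
DISKCHECK_ERROR = re.compile(
    r"XDMP-DISKCHECKERROR: Disk Check Error in Forest (?P<forest>[^:\s]+): "
    r"Failed to (?P<action>read|write) file (?P<path>\S+) "
    r"because of (?P<cause>\S+) "
    r"\(successBefore: (?P<before>true|false), "
    r"sinceFirstError: (?P<since>\d+) sec\)"
)


def parse_diskcheck_error(line: str) -> Optional[DiskCheckError]:
    """Return the parsed fields, or None if the line is not a DiskCheck error."""
    m = DISKCHECK_ERROR.search(line)
    if m is None:
        return None
    return DiskCheckError(
        forest=m["forest"],
        action=m["action"],
        path=m["path"],
        cause=m["cause"],
        success_before=(m["before"] == "true"),
        since_first_error_sec=int(m["since"]),
    )
```

The `since_first_error_sec` field is the value to compare against the storage failure timeout when deciding how close a host is to shutting down.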

If the database has Shutdown on Storage Failure enabled, then failures or delays that persist past the Storage Failure Timeout cause MarkLogic Server to shut down on the affected host, which triggers its forests to fail over. To guarantee complete failover, MarkLogic Server on this host remains unresponsive for 2 minutes.

A failing database with Storage Failure Timeout set to 30 seconds produces these log entries:

2024-10-16 14:04:18.516 Error: XDMP-DISKCHECKERROR: Disk Check Error in Forest forest-2M: Failed to write file /var/opt/MarkLogic/Forests/forest-2M/DiskCheck because of SVC-FILWRT (successBefore: true, sinceFirstError: 30 sec)
2024-10-16 14:04:18.523 Critical: Forest::forest-2M disk check for /var/opt/MarkLogic/Forests/forest-2M/DiskCheck keep failing for 30 seconds. Shutting down...

To prevent the host from automatically shutting down again before the storage issue is addressed, MarkLogic evaluates DiskCheck on the failed host upon restart. This log shows the state of successBefore before and after shutdown:

2024-10-16 14:04:03.714 Error: XDMP-DISKCHECKERROR: ... SVC-FILWRT (successBefore: true, sinceFirstError: 15 sec)
...
2024-10-16 14:04:18.516 Error: XDMP-DISKCHECKERROR: ... SVC-FILWRT (successBefore: true, sinceFirstError: 30 sec)
2024-10-16 14:04:18.523 Critical: Forest::forest-2M disk check for /var/opt/MarkLogic/Forests/forest-2M/DiskCheck keep failing for 30 seconds. Shutting down...
2024-10-16 14:04:18.585 Info: Stopping XDQPServerConnection, ...
....
2024-10-16 14:04:19.618 Info: Starting domestic XDQPServerConnection, ...
...
2024-10-16 14:04:23.737 Notice: Starting MarkLogic Server 12.0 x86_64 in /opt/MarkLogic with data in /var/opt/MarkLogic
2024-10-16 14:04:23.737 Critical: MarkLogic was shutdown because of disk failure or hanging. Sleep for 120 seconds to allow failover.
...
2024-10-16 14:06:35.709 Error: XDMP-DISKCHECKERROR: ... SVC-FILWRT (successBefore: false, sinceFirstError: 5 sec)

successBefore: true (at lines 1 to 3) indicates that the directory has been working since MarkLogic Server started. In this state, subsequent storage failures are treated as new issues that can eventually lead to shutdown.

When MarkLogic Server restarts, it detects that the previous shutdown was caused by a storage failure (at line 10). To guarantee complete failover of forests, MarkLogic Server remains unresponsive for 2 minutes. DiskCheck then reports successBefore: false (at line 12). In this state, storage failures are not treated as a new issue, which prevents MarkLogic Server from shutting down again.
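The interaction between the timeout and successBefore can be sketched as a decision function. This Python example is a simplified model of the behavior described in this section, not MarkLogic's actual logic. The function and parameter names are hypothetical, and the `>=` comparison is an assumption based on the sample log, where shutdown occurs at sinceFirstError: 30 sec with a 30-second timeout.

```python
def would_shut_down(
    shutdown_on_storage_failure: bool,
    storage_failure_timeout: int,
    since_first_error_sec: int,
    success_before: bool,
) -> bool:
    """Model whether a DiskCheck failure would trigger a host shutdown."""
    # The database setting must be enabled for any shutdown to occur.
    if not shutdown_on_storage_failure:
        return False
    # After a storage-failure restart, successBefore is false, and the
    # ongoing failure is not treated as a new issue.
    if not success_before:
        return False
    # Assumption: shutdown occurs once the failure has lasted for the timeout.
    return since_first_error_sec >= storage_failure_timeout
```

With a 30-second timeout, the model returns `False` at 15 seconds and `True` at 30 seconds while successBefore is true, and `False` for any duration once successBefore is false.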

Once the storage issue is addressed, DiskCheck succeeds and sets successBefore to true. After all forests on this host have a status of sync replicating, revert these forests back to the primary host.

Note

If this feature is enabled in any database, then a single failing forest will cause the entire host to shut down. Ensure that all forests mounted on affected hosts have failover forests configured.
