Why disable DRBD in Pacemaker cluster



























The DRBD documentation (in the section "Integrating DRBD with Pacemaker clusters") recommends that the DRBD service be disabled in a Pacemaker cluster:




If you are employing the DRBD OCF resource agent, it is recommended
that you defer DRBD startup, shutdown, promotion, and demotion
exclusively to the OCF resource agent. That means that you should
disable the DRBD init script: chkconfig drbd off.




Under systemd this would amount to systemctl disable drbd.service.
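For illustration, here is a minimal sketch of what handing DRBD over to the OCF resource agent typically looks like on a pcs-managed cluster. The resource and device names (drbd_r0, r0) are hypothetical, and the exact pcs syntax varies between versions (newer pcs uses "pcs resource promotable" instead of "pcs resource master"):

    # Make sure the OS itself never starts DRBD
    systemctl disable --now drbd.service

    # Let Pacemaker drive DRBD through the ocf:linbit:drbd agent instead,
    # as a master/slave (promotable) clone: one Primary, one Secondary
    pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
        op monitor interval=30s
    pcs resource master drbd_r0_master drbd_r0 \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true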



Is there any harm in enabling DRBD despite this recommendation? The idea would be to enable DRBD but to disable Corosync and Pacemaker, so that after a cluster node fails and reboots it continues receiving DRBD-synced data but otherwise remains "passive". This should allow the failed node to be analyzed before it re-enters the cluster, while live data is still being saved on both cluster nodes in the meantime. What is the rationale behind the recommendation that outweighs this one?
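To make the proposal concrete, the per-node service state described above would amount to roughly the following (a sketch of the idea only; whether it is safe is exactly what the answers below discuss):

    # Proposed "passive after reboot" policy on each node
    systemctl enable drbd.service                         # DRBD starts at boot and resumes syncing
    systemctl disable corosync.service pacemaker.service  # node does not rejoin the cluster on its own

    # Later, after the failed node has been analyzed, bring it back in manually
    systemctl start corosync.service pacemaker.service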










drbd pacemaker corosync






asked Jan 15 at 12:26 by rookie099
  • "The idea would be to enable DRBD, but to disable Corosync and Pacemaker": what controls your cluster, then? What you are describing is usually handled via resource stickiness etc.

    – Lenniey
    Jan 15 at 12:32













  • @Lenniey Pacemaker controls the cluster. It is active in normal operation. But when a node fails it does not restart automatically. At least that's the configuration I've inherited :)

    – rookie099
    Jan 15 at 13:30








  • Well, the intention of disabling the DRBD service on the OS level is that everything is controlled by Pacemaker. If two services (PCMK and your OS, for example) are trying to start / stop / promote / demote etc., you are risking split-brain. For a controlled cluster environment, everything should be handled by your cluster resource manager, in this case Pacemaker, to avoid confusion between the cluster nodes. In a case of split-brain or similar, your CRM will either STONITH or fence or use the configured quorum on the other nodes to resolve it.

    – Lenniey
    Jan 15 at 13:34
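A note on the resource stickiness mentioned in the first comment: keeping resources where they are after a failed node returns is normally expressed inside Pacemaker rather than by leaving cluster services disabled. A minimal sketch with pcs, reusing the hypothetical resource name from above (the value 100 is an arbitrary example):

    # Prefer to leave running resources where they are instead of
    # moving them back automatically when a failed node rejoins
    pcs resource defaults resource-stickiness=100

    # Or pin a single resource via its meta attributes
    pcs resource meta drbd_r0_master resource-stickiness=INFINITY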














3 Answers






Well, the intention of disabling the DRBD service on the OS level is that everything is controlled by Pacemaker. If two services (PCMK and your OS, for example) are trying to start / stop / promote / demote etc., you are risking split-brain. For a controlled cluster environment, everything should be handled by your cluster resource manager, in this case Pacemaker, to avoid confusion between the cluster nodes. In a case of split-brain or similar, your CRM will either STONITH or fence or use the configured quorum on the other nodes to resolve it.

– Lenniey, answered Jan 15 at 15:25 (score 5)
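As a side note on the split-brain case: if DRBD does end up with divergent data, recovery is a manual drbdadm procedure on top of whatever the CRM does. A sketch for a DRBD resource named r0 (hypothetical), where the operator decides which node's changes are discarded:

    # On the node whose changes will be thrown away (the split-brain "victim")
    drbdadm secondary r0
    drbdadm connect --discard-my-data r0

    # On the surviving node (only needed if it dropped to StandAlone)
    drbdadm connect r0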






When using a cluster resource manager, any resource should be controlled by, well, the resource manager. Any resource enabled/disabled from outside the cluster resource manager is a potential source of confusion, both for the administrator and for the resource manager itself.

– shodanshok, answered Jan 15 at 15:12 (score 2)
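In practice this means that routine start/stop operations also go through the resource manager rather than systemctl. A sketch with pcs, reusing the hypothetical resource name from above (node1 is a hypothetical node name; older pcs versions spell the standby commands "pcs cluster standby"):

    # Take a resource down for maintenance through the CRM, not the OS
    pcs resource disable drbd_r0_master
    pcs resource enable drbd_r0_master

    # Or take a whole node out of service while keeping the cluster consistent
    pcs node standby node1
    pcs node unstandby node1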






      There are two answers already that detail clearly that this is a bad idea and why, but maybe some details on how it could go wrong for you, and on how you can use Pacemaker to address these problems, will help convince you and/or others not to do things this way.



      First, Pacemaker logs and accounts for resource failures. The default failure count for a resource before it gets "banned" from a node is three within the resource-failure-timeout window, which by default never times out. So if your DRBD resource (or any other resource for that matter) fails three times in a row, it is banned from its currently active node by using a strong (infinite) "negative location constraint", meaning that the resource can run anywhere BUT its currently active node. Once that rule is in place, the resource either moves elsewhere if it can, or it stops until its failures are addressed.
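      These thresholds are ordinary resource meta attributes and can be inspected and tuned from the CRM. A sketch with pcs, using the hypothetical drbd_r0_master resource name from earlier (the values are examples, not recommendations):

          # Ban the resource from its current node after 3 failures, and forget
          # failures after 10 minutes so the implicit ban can eventually expire
          pcs resource meta drbd_r0_master migration-threshold=3 failure-timeout=600

          # Inspect and clear the recorded failures
          pcs resource failcount show drbd_r0_master
          pcs resource cleanup drbd_r0_master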



      So you can see, Pacemaker can be made to handle these failures gracefully on its own.



      You need to understand what Pacemaker is and how it behaves to grok why managing resources whose state it enforces from outside of Pacemaker is bad. Pacemaker is a finite-state system. It depends on being in complete control of the resources that it manages so that it can gracefully recover from failures and ensure that resources are either stopped or started where they should be.



      Consider a simple resource that should only be run on one node at a time, lest it become "split-brain" and create a divergent dataset - just about the worst thing that could happen, as this will almost certainly cause either data loss or require large amounts of operator attention to prevent data loss.



      Pacemaker controls this resource, and starts an instance of the software on node "Able". A well-meaning administrator finds that the service is started on Able, but that its systemd unit file is "disabled". That admin enables the unit file so that the service will "come back" on reboot, unaware that Pacemaker is handling this already. The systemd unit file is configured to restart the resource on failure, as many are.
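      Whether a node has been set up this way is easy to check with plain systemd queries, nothing cluster-specific (the second command is the same check mentioned in the comments below):

          # Will the unit start at boot, behind Pacemaker's back?
          systemctl is-enabled drbd.service

          # Will systemd restart it on failure and fight the CRM for control?
          systemctl cat drbd.service | grep '^Restart='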



      Once Pacemaker tries to migrate this resource away from Able to the second node in the cluster "Baker", the resource encounters a stop failure, as the service was killed but somehow it's still alive and we're in the middle of a zombie apocalypse. Since the resource cannot be stopped, it cannot be started on Baker without causing a split-brain condition. The resource flaps between stopped and started as systemd and Pacemaker battle for control. Eventually, Pacemaker "gives up" on the resource and puts it in "unmanaged mode", meaning that no start or stop operations will be performed on that resource.
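      Getting out of that state means removing the competing manager (disable the unit) and then handing control back to Pacemaker. A sketch with pcs and the hypothetical resource name:

          # Stop systemd from interfering, then let Pacemaker manage the resource again
          systemctl disable --now drbd.service
          pcs resource manage drbd_r0_master
          pcs resource cleanup drbd_r0_master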



      So in that scenario, Systemd won because it was "stupider and more insistent" than Pacemaker. This is extremely difficult for an admin who's not familiar with the behavior of both Pacemaker and Systemd to understand, as it will simply look like Pacemaker is failing all over the place -- when in reality it's doing exactly what it's supposed to do given the conditions at hand.



      Also consider that the above scenario had the best possible ending for that condition. Given the slightest infrastructure failure, the cluster would have become split-brain with that resource active on both nodes.



      As an aside, fencing via STONITH would prevent the cluster from becoming split-brain in that scenario, but STONITH is a last resort for cluster stability, while the above condition would put it as almost a first resort. And as always, you NEED STONITH to make a cluster production-ready.

      – Spooler, answered Jan 15 at 16:20, edited Jan 15 at 18:05 (score 2)
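      If the goal from the question is "keep a fenced node from rejoining until a human has looked at it", that is usually expressed through the fencing action and the boot behaviour of the cluster stack, not by enabling drbd.service. A sketch with pcs, assuming a fence device is already configured:

          # Power a fenced node off instead of rebooting it,
          # so it stays down until an operator intervenes
          pcs property set stonith-action=off

          # And/or keep the cluster stack from starting automatically at boot
          systemctl disable corosync.service pacemaker.service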






      • This is very helpful, esp. the scenario where Systemd may restart a failed service (when it is enabled). BTW, STONITH is in place. If I understand correctly, the motivation was that a failed node would reboot after STONITH and that it should then not rejoin the cluster immediately/automatically (hence corosync and pacemaker disabled) but should immediately resume DRBD-syncing data (hence drbd enabled), the second as a precaution in case the remaining node also failed "somehow" in the meantime.

        – rookie099
        2 days ago













      • systemctl cat drbd.service | grep ^Restart= says Restart=no.

        – rookie099
        2 days ago













      • Yeah, that one shouldn't be set to restart. All the same, if you have the systemd unit enabled, it will be started on boot (which is not acceptable).

        – Spooler
        2 days ago






      • The default STONITH action is usually the most appropriate for your agent, unless you're solving some kind of issue by defining that action explicitly. As for the rest, yes.

        – Spooler
        2 days ago






      • @rookie099 it really depends on your use-case. For example: most of my PCMK clusters are set to STONITH = poweroff via IPMI, because then I have the time to investigate in a controlled manner before the node rejoins the cluster or starts. One other cluster is set to restart, but it controls far fewer resources than the others, so nothing should break because of STONITH.

        – Lenniey
        yesterday










