Monster VM Study: Moves 3 RAC Nodes Under 180 Seconds! Part 2

    By: George O'toole on Jan 21, 2014

    Continued from Part 1.

    Getting back to the Principled Technologies study, we tested the ability to use vMotion to move multiple VMs configured with Oracle Enterprise Linux (OEL), 156 GB of virtual RAM, and 16 vCPUs. For the Oracle DBA reading this article, The SGA was sized at 145 GB with 3,686 MB for the Process Global Area (PGA). The PGA is a memory area which stores user and control data for the exclusive use of a given Oracle server process. There is thus a PGA memory area for each server process. The sum of all the PGA memory is referred to as the total instance PGA. Several best practices techniques were used (e.g., huge pages) in the study to optimize database performance. In appendix A called, “System Configuration Information” the study provides great detail on most of the steps used to configure the tests, including the implementation of huge pages. The first of the three tests involved a single vMotion of one virtualized RAC node on server A to Server B.

    Figure 1: Diagram of our single vMotion test, where we migrated a single VM from host A to host B

    fig1.png

    As you can see from the Figure 1 above, the final state was server A with no virtual machines and Server B with two virtual machines. The results for this test included:

    Core Utilization for VMware vSphere host:

     

    fig2.png

     

    Looking at the above table we can see the vMotion of the RAC node took 130 seconds and added approximately 14% for Host A and 10% more CPU utilization for Host B. Quick definition: Bandwidth is the maximum throughput of a logical or physical communication path. During the 130 seconds of vMotion, Host A transmitted 10,618 Megabits per second (Mbps) over the network to Host B.  This test showed that a single very large virtualized RAC node supporting significant workload completed in just over 2 minutes with no data loss, and CPU utilization peaking at around 14%. There was an additional load on the server during the vMotion window, but this settled back down 90 seconds after the vMotion had completed.

    This test is a good example of using a rolling maintenance window to work on servers. The scenario involves moving a virtualized RAC node to a standby server so that maintenance can be performed on the production server. Once completed, the same process would be followed for the other two RAC nodes. Using the rolling maintenance window approach would mean approximately 260 seconds of vMotion, from the standby server and back to production server. Across all three servers, this totals to 780 seconds or 13 minutes of vMotion time without loss of availability of the RAC nodes to the business. Very impressive!

    The next test in the study vMotioned two of the three virtual machines as shown in the Figure below.

    Figure 2: Diagram of our double vMotion test, where we swapped two VMs from host B to host C

    fig3.png

    As shown in the above picture, the VM on server B was moved to server C and the VM on server C was moved to server B. Then final state was server B had the VM originally from server C and server C had the VM from server B. The results from the second test include:

    Core Utilization for VMware vSphere host:

     fig4.png

    The above table shows that the simultaneous vMotion of the two VMs took 155 seconds and added approximately additional CPU utilization of 22% for Host B and 21% for Host C. Let’s use a table to look at the network throughput in Mbps.

     

    fig5.png

    There seems to be a trend between test one and test two in that the throughput is peaking at approximately 10,500 Mbps. More bandwidth will increase throughput and hopefully reduce the amount of time for these vMotion tests. This test showed that moving two very large virtualized RAC nodes supporting significant OLTP workloads could be completed in just over 2.5 minutes with no data loss or unavailability. CPU utilization peaked at 22% additional load and settled back down 90 seconds after the vMotion had completed.

    Moving two virtualized Oracle RAC nodes would reduce a rolling window maintenance process from three steps down to two. In a two-step rolling window upgrade the following process is proposed:

    • Move two VMs simultaneous to the standby servers and back to product servers which would total approximately 310 seconds
    • Move the final VM to the standby server and back for 260 seconds

    Using this two-step rolling window approach reduces the total vMotion time to 570 seconds or 9.5 minutes. Compare this to the three step rolling window approach of moving each virtual machine which was estimated to take 13 minutes. Moving multiple virtual machines could thus offer substantial time savings to the business.

    The final test involves moving all three virtual machines as shown below.

    Figure 3: Diagram of our triple vMotion test, where we migrated the VMs from one host to another

    fig6.png

    As Figure 3 shows, The VM on server C moves to server A and the VM on server A moves to server B and the VM on server B moves to server C. We have nicknamed this the “merry-go-round” scenario. The results of this test are below:

    Core Utilization for VMware vSphere host:

     fig7.png

    The above table shows that simultaneous vMotion of the three VMs took 180 seconds, and added a maximum of 21% more CPU utilization across all the servers. Let’s use a table to look at the network throughput in Mbps.

     fig8.png

    The trend continues as the final triple vMotion test did not break the 10,500 Mbps ceiling. Moving all the VMs at once could apply to proactively avoiding a disaster or doing maintenance without deferring to the rolling window approach. In this case moving all three virtualized RAC nodes to three standby servers and back to production would require approximately 360 seconds of vMotion time or just 6 minutes. The advantage to the business is very little time is spent using vMotion and there is no loss of availability or data.

    Check back soon for the 3rd installment of this blog series, coming later this week. 

    Released: January 21, 2014, 12:47 pm | Updated: January 21, 2014, 12:56 pm
    Keywords: Announcements | Oracle Enterprise Linux | RAC | virtualizaion


    Independent Oracle Users Group
    330 N. Wabash Ave., Suite 2000, Chicago, IL 60611
    phone: 312-245-1579 | email: ioug@ioug.org

    Copyright © 1993-2017 by the Independent Oracle Users Group
    Terms of Use | Privacy Policy