Scaling Oracle SLOB to 7M IOPS and 55.4GB/s Throughput with Datrium – and the Latency FAKE NEWS!

My previous Oracle article (here) was focused on single-instance performance, and at the end of the post I alluded to the scale-up potential. How many Oracle instances can you run on the same platform?

In this article, I demonstrate how we can keep adding Oracle instances, continuously, without disrupting performance.

With Datrium, Oracle application datasets are stored on Flash devices on the host where the Oracle VM is running, and a secondary copy of the data is synchronously 3-way committed to nodes dedicated to persistently storing data, aka datanodes. That means all read IO is local to the host and only write IO traffic goes over the network, which reduces read latencies.

SLOB 80/20 – 3 instances and 3 servers
The picture below shows Oracle SLOB instances running with an 80% read, 20% write profile, scaling to three instances and three hosts. In this example, the system delivers 165K IOPS and 1.3GB/s throughput, with an average VM-level latency of 1.7ms. (I copied this test profile from a Pure Storage paper.)

[Update] I have been privately asked about the server hardware used for this test. I used three servers with Intel(R) Xeon(R) E5-2680 v4 CPUs @ 2.40GHz and two regular Intel SSDs each.

Oracle SLOB test with 3 instances on Datrium

Upper Limits
However, considering Datrium's upper system limit of 128 hosts, and assuming the exact same workload, we can estimate that Datrium can handle approximately 7M IOPS and 55.4GB/s throughput with the same VM-level latency of 1.7ms (roughly 55K IOPS and 433MB/s per host, multiplied by 128 hosts).

You: – Yeah! But, you are showing us the system handling only 3 servers.
Me: – You will need to take my word for that.

Just joking. While I don't have 128 servers to demonstrate this particular Oracle SLOB workload, we have previously partnered with IOMark, StorageReview, Dell, and Intel to demonstrate the ridiculously high-scale results Datrium can achieve.

Datrium system limits

Latency FAKE NEWS

I also want to address another important point: application storage latency, and how the storage industry continues to manipulate results and mislead customers.

First of all, as you can see, we openly publish the VM-level latency delivered by the system, because at the end of the day the latency that matters is the latency that applications and users experience.

The problem is that storage vendors, especially SAN vendors, continue to report latencies measured at the SAN array controller. That is why I decided to mimic this Pure Storage benchmark, where they claim to provide 0.21ms latency without qualifying where that latency is measured.

Pure Storage Oracle SLOB benchmark

However, in another public post where they talk about their Pure1 application, we can clearly see how a 1.08ms latency measured at the SAN array controller translates to 7.40ms at the VM disk level. Applying that same delta of roughly 6.3ms to their claimed 0.21ms, their Oracle SLOB benchmark latency is likely neighboring 6.53ms.

Pure1 demonstrates the latency increase across layers

I'm glad to see vendors starting to address application latency, rather than what is seen by SAN array controllers, and hopefully we will see more vendors moving in this direction.

It's very telling that even the most modern Flash arrays cannot match Datrium performance, scalability, and latency, and that is without even mentioning additional high-value features like integrated data protection and orchestrated disaster recovery.

Lastly, as always, Datrium runs with all data services turned on, including Checksumming, inline Erasure Coding, inline Compression, and inline Deduplication.

Reference paper from Pure Storage (here)

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Datrium AllFlash + Oracle measured with SLOB

Datrium is the perfect platform for running Oracle databases. Solutions can start with a single server and a single datanode and scale up to 10 datanodes and 128 servers, all as part of a single management domain and a single namespace.

For this test, I am using SLOB (Silly Little Oracle Benchmark), created by Kevin Closson, a tool focused on generating Oracle I/O workloads to stress the storage infrastructure.

Instead of pursuing complex VM and database configurations as I have seen other vendors do, I opted for a simpler approach that doesn't involve dozens of disks, LUNs, or the in-guest iSCSI protocol, which in production environments would be difficult to support. I even dropped Oracle ASM disk groups.

VM and OS Configuration

  • 32GB RAM and 12 vCPU
  • Oracle 12.2.0.1 Enterprise Edition – Single Instance
  • VMware ESXi, 6.7.0, 8941472
  • 1 x 100GB vDISK for Linux CentOS/7
  • 6 x 250GB vDISK PVSCSI w/ LVM aggregation w/ XFS for the Oracle database (see the LVM/XFS sketch after this list)
  • 1 Terabyte SLOB database
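As a reference, below is a minimal sketch of how the six PVSCSI vDisks could be aggregated with LVM and formatted with XFS. The device names, volume group name, stripe size, and mount point are my assumptions and not taken from the actual setup.

# Assumed device names for the six 250GB PVSCSI vDisks
pvcreate /dev/sd{b,c,d,e,f,g}
vgcreate oradata /dev/sd{b,c,d,e,f,g}
# Stripe the logical volume across all six disks (stripe size is an assumption)
lvcreate --name oralv --extents 100%FREE --stripes 6 --stripesize 512k oradata
# Format with XFS and mount (mount point is an assumption)
mkfs.xfs /dev/oradata/oralv
mkdir -p /u02/oradata
mount -o noatime /dev/oradata/oralv /u02/oradata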

Hardware Configuration

  • PowerEdge R930 – E7-8890 v4 @ 2.20GHz / 2016
  • 8 x SATA Samsung GC57 (MZ7LM240HMHQ0D3/2016)
  • 1 x Datrium Datanode F12X2 2x25G-23TB

The configuration above represents, in its totality, one server, one datanode, and one virtual machine. Datanodes are responsible for receiving write IOs and storing data permanently, while a copy of the data is kept, in a deduplicated and compressed form, on each server for future local data reads. Further, this is old 2016 hardware, similar to what most organizations already have in their data centers.

SLOB

SLOB provides a multitude of possible configurations, but I decided to use what seems to be the most common configuration used by vendors: a 70:30 read/write ratio, running for two consecutive hours.

slob.conf
 UPDATE_PCT=30
 RUN_TIME=7200
 SCAN_TABLE_SZ=1M
 WORK_UNIT=64
 REDO_STRESS=LITE
 THREADS_PER_SCHEMA=1

(Update) In the original post I forgot to mention the SLOB execution configuration. The stress test was run with 24 users, using the following command: sh ./runit.sh 24
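For anyone trying to reproduce the run, the typical SLOB flow is to load the schemas with setup.sh and then drive the workload with runit.sh. The tablespace name below is an assumption; the schema count simply matches the 24 sessions used for the run.

# Assumed: a dedicated tablespace named IOPS already exists for the SLOB schemas
sh ./setup.sh IOPS 24   # load 24 SLOB schemas, one per user session
sh ./runit.sh 24        # run 24 sessions against the 70:30 profile for RUN_TIME seconds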

Oracle

Here are the few Oracle initialization parameters that I changed for this test. All other settings remained the same as the default installation process.

filesystemio_options=setall
db_files=2000
processes=8000
shared_pool_size=4G
db_cache_size=1536M
parallel_max_servers=0
pga_aggregate_target=9G
db_block_checksum=false
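For completeness, here is a minimal sketch of how these parameters could be applied with SQL*Plus. The scope is my assumption; several of these parameters are static, so they are written to the spfile and only take effect after an instance restart.

sqlplus -s / as sysdba <<'EOF'
alter system set filesystemio_options=setall scope=spfile;
alter system set db_files=2000 scope=spfile;
alter system set processes=8000 scope=spfile;
alter system set shared_pool_size=4G scope=spfile;
alter system set db_cache_size=1536M scope=spfile;
alter system set parallel_max_servers=0 scope=spfile;
alter system set pga_aggregate_target=9G scope=spfile;
alter system set db_block_checksum=false scope=spfile;
shutdown immediate
startup
EOF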

Linux Kernel

Here are the Linux kernel parameters that I changed or added for this test.

fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
kernel.panic_on_oops = 1
net.ipv4.ip_local_port_range = 9000 65500
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
vm.swappiness = 0
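To make these settings persistent and apply them without a reboot, they can go into a sysctl drop-in file; the file name below is an assumption.

# Save the parameters above into a drop-in file and reload the sysctl configuration
vi /etc/sysctl.d/97-oracle-slob.conf   # paste the parameters listed above
sysctl --system                        # apply all sysctl configuration files
sysctl fs.aio-max-nr vm.swappiness     # spot-check a couple of values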

Results

Below are the VM-level metrics as seen by the hypervisor. At the peak, SLOB/Oracle produced 68,336 IOPS with an average latency of 0.99ms. We can also see that the number of IOPS is fairly constant over the two-hour runtime.

Towards the end of the workload, we also notice a couple of latency spikes, with the maximum average being only 3ms, likely when Oracle starts to pre-fetch data. I have noticed that some vendors like to disable this native and hidden pre-fetch function in their tests to obtain lower latency results, but I decided not to game the numbers, because leaving pre-fetching enabled is closer to the real-world scenario.

During the test, host CPU utilization remained at only 30%, and that includes both Oracle and Datrium software execution.

I am planning to run an extra SLOB test with newer hardware and better SSDs (or even NVMe) to understand how this workload would benefit – both Oracle and Datrium software are likely to improve with the added CPU and SSD performance.

Only the Beginning

To me, the most impressive part isn't the number of IOPS, which already beats the public SLOB/HCI benchmarks I have seen online, but rather the picture below illustrating that I could add another 61 similar servers with a similar workload to the DVX system, while maintaining the same performance, before I would need to add another datanode. Remember, up to 10 datanodes!

This picture demonstrates that keeping reads local to each server and sending only write IOs to datanodes, in a north-south fashion, yields excellent advantages.

Lastly, as always, Datrium runs with all data services turned on, including Checksumming, inline Erasure Coding, inline Compression, and inline Deduplication.

Here is the Oracle EM screenshot

(Update) Out of curiosity, I decided to run the workload with the Oracle initialization parameter db_block_checksum set to its default option (typical). db_block_checksum determines whether DBWn and the direct loader calculate a checksum (a number derived from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. The results were fairly similar: 67,589 IOPS with an average latency of 1.0ms.
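db_block_checksum is a dynamic parameter, so the comparison run doesn't require a restart. A minimal sketch, with everything other than the parameter value being an assumption:

sqlplus -s / as sysdba <<'EOF'
show parameter db_block_checksum
alter system set db_block_checksum=typical scope=both;
EOF
sh ./runit.sh 24   # repeat the same 24-session workload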

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Year in Review 2018

1) Long Thread – For me, this was a good year for blogging, especially for getting deeper into particular technologies. I use my blogging as a way to share, but most importantly as a way to educate myself.

2) Database benchmarking was a big part of my blogging, including breakthroughs with PostgreSQL and achieving 4.3 Million TPS with 1 GB RAM https://bit.ly/2BLqput and being able to show how SDS architectures provide the ability to improve and scale quickly.

3) I also dabbled in Healthcare once again around the @HIMSS timeframe and created the Maslow's Hierarchy Of Healthcare IT Infrastructure https://bit.ly/2VgT8zZ where data security lies at the bottom of the pyramid, followed by system resiliency.

4) I also set the framework to demonstrate that unless one can beat the speed of light, it’s not possible to defeat the data gravity hypothesis. I explained how Datrium is the data anti-gravity agent. https://bit.ly/2QS1zTS

5) Coming from the EUC world, as of now, I would say there's no better platform for VDI than Datrium. I wrote a 2018 review for my Hands Down the Ultimate VDI Platform https://bit.ly/2CE10EE and also worked with @SimonLong_ on a deep whitepaper.

6) Later in the year, I started working intensely with Docker and Kubernetes and wrote many articles on the subject, including running with VMware https://bit.ly/2GWOvbB and running bare-metal with Datrium.

7) In the competitive realm, I felt the need to address FUD posted by vendors, including Datrium's ability to scale to 138 nodes https://bit.ly/2SqaLeW and the Unique and Original Datrium Data Locality https://bit.ly/2ESDbuj, and this last one is my favorite of the year.

8) Some articles were spun up from customer interactions, like "Shocking datacenter reality?" https://bit.ly/2QXTSvF, which opened my eyes to the pain IT organizations are going through due to a lack of standards and integrations.

9) There were many more articles, but all in all, it was a great year! I’ll finish this thread with a quote from Mark Twain to incentivize others to learn and share.

The secret of getting ahead is getting started

– Mark Twain

10) Happy New Year !!!
(EOM)
