Feed aggregator

Too much disk IO on sda in RAID10 setup - part 2

blog.windfluechter.net - Tue, 2019-01-08 20:07

Some days ago I blogged about my issue with one of the disks in my server having high utilization and latency. There were several ideas and guesses about what the reason might be, but I think I found the root cause today:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     50113         60067424
# 2  Short offline       Completed without error       00%      3346   

After the RAID-Sync-Sunday last weekend I removed sda from my RAIDs today and started a "smartctl -t long /dev/sda". The test aborted quickly because it ran into a read error after just a few minutes. Currently I'm still running a "badblocks -w" test and this is the result so far:

# badblocks -s -c 65536 -w /dev/sda
Testing with pattern 0xaa: done
Reading and comparing: 42731372 done, 4:57:27 elapsed. (0/0/0 errors)
42731373 done, 4:57:30 elapsed. (1/0/0 errors)
42731374 done, 4:57:33 elapsed. (2/0/0 errors)
42731375 done, 4:57:36 elapsed. (3/0/0 errors)
 46.82% done, 6:44:41 elapsed. (4/0/0 errors)

Long story short: I already ordered a replacement disk!
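
Once the replacement disk is in, swapping it into the arrays should be the usual mdadm routine. A rough sketch (the md device name and partition layout are assumptions, the real setup may differ):

# fail and remove the old disk from the array (repeat per md device/partition)
mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md0 --remove /dev/sda1
# after the physical swap: copy the partition table FROM the healthy /dev/sdb TO the new /dev/sda (GPT),
# randomize the GUIDs on the new disk, re-add it and watch the resync
sgdisk /dev/sdb -R /dev/sda
sgdisk -G /dev/sda
mdadm /dev/md0 --add /dev/sda1
cat /proc/mdstat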

But what's also interesting is this:

I removed the disk today at approx. 12:00 and you can see the immediate effect on the other disks/LVs (the high blue graph from sda shows the badblocks test), even though the RAID10 is now running in degraded mode. It's interesting what effect (currently) 4 defective blocks can have on RAID10 performance without smartctl taking notice of it. Smartctl only reported an issue after I issued the self-test. It's also strange that the latency and the high utilization increased slowly over time, over roughly the last 6 months.
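
One lesson from this: smartd can run such self-tests on a schedule and send a mail as soon as one fails, which would probably have caught this earlier. A minimal sketch for /etc/smartd.conf (the schedule and mail address are just examples):

# monitor all SMART attributes (-a), mail root on failures,
# run a short self-test every night at 02:00 and a long self-test every Saturday at 03:00
/dev/sda -a -m root -s (S/../.././02|L/../../6/03)
/dev/sdb -a -m root -s (S/../.././02|L/../../6/03)
/dev/sdc -a -m root -s (S/../.././02|L/../../6/03)
/dev/sdd -a -m root -s (S/../.././02|L/../../6/03)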


Too much disk IO on sda in RAID10 setup

blog.windfluechter.net - Sat, 2019-01-05 17:37

I have a RAID10 setup with 4x 2 TB WD Red disks in my server. Although the setup works fairly well and has enough throughput, there is one strange issue: /dev/sda has more utilization/load than the other 3 disks. See the blue line in the following graph, which represents weekly utilization for sda:

As you can see from the graphs and from the numbers below, sda has a 2-3 times higher utilization than sdb, sdc or sdd, especially in the disk latency graph from Munin:

Although the graphs are a little confusing, you can easily spot the big difference in the values below. And it's not only Munin showing this strange behaviour of sda, but also atop:

Here you can see that sda is 94% busy although the writes to it are a little lower than on the other disks. The atop screenshot was taken before I moved MySQL/MariaDB to my NVMe disk 4 weeks ago. But you can also see that sda is slowing down the whole RAID10.
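
Besides Munin and atop, a quick way to compare the four disks directly on the console is iostat; the %util and await columns should show the same pattern:

# extended per-device statistics every 5 seconds: utilization and latency per disk
iostat -dxm 5 sda sdb sdc sdd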

So the big question is: why are the utilization and latency of sda that high? It's the same disk model as the other disks. All disks are connected to a Supermicro X9SRi-F mainboard. The first two SATA ports are 6 Gbit/s, the other 4 ports are 3 Gbit/s:

sda:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301414725
LU WWN Device Id: 5 0014ee 65887fe2c
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jan  5 17:24:58 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

sdb:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301414372
LU WWN Device Id: 5 0014ee 65887f374
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jan  5 17:27:01 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

sdc:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301411045
LU WWN Device Id: 5 0014ee 603329112
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jan  5 17:30:15 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

sdd:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301391344
LU WWN Device Id: 5 0014ee 60332a8aa
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jan  5 17:30:26 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The disks even run the same firmware version. I would have expected slower disk IO from the disks on the 3 Gbit/s ports (sdc & sdd), but not from sda. All disks are configured in the BIOS to use AHCI.
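
To double-check which disk sits on which port and at which negotiated speed, the kernel log and lsblk can help:

# negotiated SATA link speed per ata port
dmesg | grep -i 'SATA link up'
# map the ata/SCSI host (HCTL) to sdX, model and serial
lsblk -o NAME,HCTL,MODEL,SERIAL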

I cannot explain why sda has higher latency and more utilization than the other disks. Any ideas are welcome and appreciated. You can also reach me in the Fediverse (Friendica: https://nerdica.net/profile/ij & Mastodon: @ij on https://nerdculture.de/) or via XMPP at ij@nerdica.net.


Adding NVMe to Server

blog.windfluechter.net - Sat, 2018-12-15 18:45

My server runs on a RAID10 of 4x 2 TB WD Red disks. Basically those disks are fast enough to cope with the disk load of the virtual machines (VMs). But since many users moved away from Facebook and Google, my Friendica installation on Nerdica.Net has a growing user count, putting a large disk I/O load with many small reads & writes on the disks and slowing down general disk I/O for all the VMs and the server itself. On mdraid-sync-Sunday this month the server needed two full days to sync its RAID10.

So the idea was to move the high disk I/O load away from the rotational disks to something different. For that reason I bought a Samsung 970 Pro 512 GB NVMe disk and a matching PCIe 3.0 adapter card to be put into my server in the colocation. On Thursday the Samsung was installed by the rrbone staff in the colocation. I moved the PostgreSQL and MySQL databases from the RAID10 to the NVMe disk and restarted the services.
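
The move itself boils down to something along these lines for MariaDB (the NVMe mount point and the Debian config path are assumptions for illustration; PostgreSQL works the same way via data_directory in postgresql.conf):

# stop the database, copy the data directory to the NVMe mount, point the config at it
systemctl stop mariadb
rsync -a /var/lib/mysql/ /mnt/nvme/mysql/
# set "datadir = /mnt/nvme/mysql" in /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl start mariadb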

Here are some results from Munin monitoring: 

Disk Utilization

Here you can see how the disk utilization dropped after the NVMe installation. The red coloured bar marks the average utilization of the RAID10 disks before the move, the green bar the same RAID10 after the databases were moved to the NVMe disk. There's roughly 20% less utilization now, which is good.

Disk Latency

Here you can see the same coloured bars for disk latency. As you can see, the latency has dropped by 1/3.

CPU I/O wait

The most significant graph is maybe the CPU graph, where you could see a large portion of the CPU time spent in iowait. This is no longer the case: there is apparently no significant iowait anymore, thanks to the low latency and high IOPS of the SSD/NVMe disk.

Overall I cannot confirm that adding the NVMe disk results in significantly faster page loads in Friendica or Mastodon, maybe because other measures like Redis/Memcached or pgbouncer already helped a lot before the NVMe disk, but it helps a lot with the general disk I/O load and improves disk speeds inside the VMs, e.g. for my regular backups and such.

Ah, one thing to report: in a quick test pgbench reported >2200 tps on the NVMe disk now. That at least is a real speed improvement, maybe by an order of magnitude or so.
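
For context, a quick pgbench run of that kind looks roughly like this (scale factor and client count are just example values, not a tuned benchmark):

# create a throwaway benchmark database and run a short read/write test
createdb pgbenchtest
pgbench -i -s 100 pgbenchtest          # initialize with scale factor 100 (~1.5 GB of data)
pgbench -c 10 -j 2 -T 60 pgbenchtest   # 10 clients, 2 worker threads, 60 seconds
dropdb pgbenchtest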


Xen & Databases

blog.windfluechter.net - Sat, 2018-10-13 21:21

I'm running PostgreSQL and MySQL on my server; both serve different databases for WordPress, Drupal, Piwigo, Friendica, Mastodon, whatever...

In the past the databases were colocated in my mailserver VM, whereas the webserver was running in a different VM. At some point I moved the databases from domU to dom0, maybe because I thought that the databases would be faster with direct disk I/O in the dom0 environment, but I can't remember the exact reasons anymore.

However, in the meantime the size of the databases grew, and so did the number of VMs. MySQL and PostgreSQL are both configured/optimized to run with 16 GB of memory in dom0, but in the last months I experienced high disk I/O, especially from MySQL, and slow I/O performance in all the domU VMs because of that.
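
For reference, "configured for 16 GB" corresponds to settings roughly like the following; the numbers are only illustrative, not my actual values:

# my.cnf (illustrative values)
innodb_buffer_pool_size = 12G
innodb_log_file_size    = 1G

# postgresql.conf (illustrative values)
shared_buffers       = 4GB
effective_cache_size = 12GB
work_mem             = 32MB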

Currently iotop shows something like this:

Total DISK READ :     131.92 K/s | Total DISK WRITE :    1546.42 K/s
Actual DISK READ:     131.92 K/s | Actual DISK WRITE:       2.40 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 6424 be/4 mysql       0.00 B/s    0.00 B/s  0.00 % 60.90 % mysqld
18536 be/4 mysql      43.97 K/s   80.62 K/s  0.00 % 35.59 % mysqld
 6499 be/4 mysql       0.00 B/s   29.32 K/s  0.00 % 13.18 % mysqld
20117 be/4 mysql       0.00 B/s    3.66 K/s  0.00 % 12.30 % mysqld
 6482 be/4 mysql       0.00 B/s    0.00 B/s  0.00 % 10.04 % mysqld
 6495 be/4 mysql       0.00 B/s    3.66 K/s  0.00 % 10.02 % mysqld
20144 be/4 postgres    0.00 B/s   73.29 K/s  0.00 %  4.87 % postgres: hubzilla hubzi~
 2920 be/4 postgres    0.00 B/s 1209.28 K/s  0.00 %  3.52 % postgres: wal writer process
11759 be/4 mysql       0.00 B/s   25.65 K/s  0.00 %  0.83 % mysqld
18736 be/4 mysql       0.00 B/s   14.66 K/s  0.00 %  0.17 % mysqld
21768 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.02 % [kworker/1:0]
 2922 be/4 postgres    0.00 B/s   69.63 K/s  0.00 %  0.00 % postgres: stats collector process

The MySQL data size is below the configured maximum memory size for MySQL, so everything should more or less fit into memory. Yet there is still a large amount of disk I/O by MySQL, much more than by PostgreSQL. Of course a good part of that I/O comes from writes to the databases.
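
A quick way to check that assumption is to compare the on-disk InnoDB size with the configured buffer pool, roughly like this:

# sum of InnoDB data + indexes in GB vs. the configured buffer pool size
mysql -e "SELECT ROUND(SUM(data_length+index_length)/1024/1024/1024,2) AS innodb_gb
            FROM information_schema.tables WHERE engine='InnoDB';"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"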

However, I'm thinking of changing my setup back to a domU-based database setup, maybe one dedicated VM for both DBMS or even two dedicated VMs, one for each of them. I'm also not quite sure how Xen reacts to the current workload.

Back in the days when I did 3D computer graphics I did a lot of testing with different settings regarding priorities and such. Basically one would think that giving the renderer more CPU time would speed up the rendering, but this turned out to be wrong: the higher the render task's priority was, the slower the rendering got, because disk I/O (and the other tasks the render task depended on) got slowed down. When running the render task at the lowest priority, all the other necessary tasks could run at higher speed and hand the CPU back more quickly, which resulted in shorter render times.

So maybe I'm experiencing something similar with the databases on dom0 here as well: dom0 is busy doing database work, and this slows down all the other tasks (i.e. the domU VMs). If I moved the databases back to a domU, dom0 could again concentrate on its basic job of taking care of the domUs.

Of course this is also a somewhat philosophical question, but what is the recommended setup? Is it better to separate the databases into two different VMs or keep them in one? Or is running the databases on dom0 the best option?
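
For the dedicated-VM variant, the domU definition itself would be simple enough. A sketch of what such a config could look like (name, sizes and the LVM volume are made up for illustration):

# /etc/xen/db01.cfg -- hypothetical dedicated database domU (PV guest booted via pygrub)
name       = "db01"
bootloader = "pygrub"
memory     = 16384
vcpus      = 4
disk       = [ 'phy:/dev/vg0/db01-disk,xvda,w' ]
vif        = [ 'bridge=xenbr0' ]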

I'm interested in your feedback, so please comment! :-)

