RAID data scrubbing

First written on July 8, 2019
Last updated on October 8, 2022

Introduction #

RAID scrubbing reads through a whole RAID array to detect inconsistent or unreadable blocks and, where possible, corrects them.

Please note that the source code and the steps described here are included in the automated-tasks repository.

While reading an Arch Wiki page, I found an AUR package that claims to run periodic RAID scrubs on the hard drives. The original script and its configuration are quite confusing, so I decided to write my own. It lacks some features compared to the original, but it does the job.
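To make the idea concrete, here is a minimal sketch of how the state of a scrub can be inspected on Linux software RAID. It assumes the md sysfs files `sync_action` (the operation currently running) and `mismatch_cnt` (the mismatch counter from the last check). The function name `scrub_status` and the `sysfs_root` parameter are my own, added only so the example can be pointed at a test directory.

```python
from pathlib import Path


def scrub_status(array: str, sysfs_root: str = '/sys/block') -> dict:
    """Return the current sync action and mismatch count of an md array.

    'array' is the device name without the '/dev/' prefix, for example 'md1'.
    'sysfs_root' is a hypothetical parameter that makes the function testable.
    """
    md_dir = Path(sysfs_root) / array / 'md'
    return {
        # 'idle' when nothing is running, otherwise e.g. 'check' or 'repair'.
        'sync_action': (md_dir / 'sync_action').read_text().strip(),
        # Mismatch counter reported by the kernel after the last check.
        'mismatch_cnt': int((md_dir / 'mismatch_cnt').read_text().strip()),
    }
```

Calling `scrub_status('md1')` on a machine with an array called md1 should return a dictionary with those two keys. A non-zero `mismatch_cnt` after a check is usually the signal to investigate further.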

Script #

The only copyright notice I found is the one included in the script below. I took it from the CentOS package, at /usr/share/doc/mdadm-4.1/mdcheck. The original script that I translated into Python is at /usr/sbin/raid-check.

The license of the original scripts and programs is undoubtedly GPL2+. See also the COPYING file in the rpm package.

#!/usr/bin/env python3

# Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
#
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    Author: Neil Brown
#    Email: <neilb@suse.com>
#
# Copyright (C) 2019 Franco Masotti <franco.masotti@live.com>

import configparser
import sys
import time
import os
import multiprocessing
import pathlib
import collections

# Constants.
STATUS_CLEAN='clean'
STATUS_ACTIVE='active'
STATUS_IDLE='idle'

class UserNotRoot(Exception):
    """The user running the script is not root."""

class NoAvailableArrays(Exception):
    """No available arrays."""

class NoSelectedArraysPresent(Exception):
    """None of the arrays in the configuration file exists."""

def get_active_arrays():
    active_arrays=list()
    with open('/proc/mdstat', 'r') as f:
        line = f.readline()
        while line:
            if STATUS_ACTIVE in line:
                active_arrays.append(line.split()[0])
            line = f.readline()

    return active_arrays

def get_array_state(array: str):
    # Use a context manager so that the file is closed right after reading.
    with open('/sys/block/' + array + '/md/array_state', 'r') as f:
        return f.read().rstrip()

def get_sync_action(array: str):
    with open('/sys/block/' + array + '/md/sync_action', 'r') as f:
        return f.read().rstrip()

def run_action(array: str, action: str):
    with open('/sys/block/' + array + '/md/sync_action', 'w') as f:
        f.write(action)

def main_action(array:str):
    action=devices[array]
    go = True
    while go:
        if get_sync_action(array) == STATUS_IDLE:
            print ('running ' + action + ' on /dev/' + array + '. pid: ' + str(os.getpid()))
            run_action(array,action)
            print ('finished pid: ' + str(os.getpid()))
            go = False
        if go == True:
            print ('waiting ' + array + ' to be idle...')
            time.sleep(timeout_idle_check)

if __name__ == '__main__':
    if os.getuid() != 0:
        raise UserNotRoot

    configuration_file = sys.argv[1]
    config = configparser.ConfigParser()
    config.read(configuration_file)
    max_concurrent_checks = int(config['DEFAULT']['max concurrent checks'])
    timeout_idle_check = int(config['DEFAULT']['timeout idle check'])
    devices = dict()
    for dev in config['devices']:
        devices[dev]=config['devices'][dev]

    active_arrays=get_active_arrays()
    dev_queue=collections.deque()
    if len(active_arrays) > 0:
        for dev in active_arrays:
            if pathlib.Path('/sys/block/' + dev + '/md/sync_action').is_file():
                state = get_array_state(dev)
                if state == STATUS_CLEAN or state == STATUS_ACTIVE or state == STATUS_IDLE :
                    # Skip arrays that are absent from the configuration
                    # or explicitly marked as 'ignore'.
                    if devices.get(dev, 'ignore') != 'ignore':
                        dev_queue.append(dev)

    if len(active_arrays) == 0:
        raise NoAvailableArrays
    if len(dev_queue) == 0:
        raise NoSelectedArraysPresent

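    while len(dev_queue) > 0:
        # Start up to max_concurrent_checks processes, then wait for all of
        # them (not only the last one) before starting the next batch.
        batch = list()
        for i in range(0,max_concurrent_checks):
            if len(dev_queue) > 0:
                ready = dev_queue.popleft()
                p = multiprocessing.Process(target=main_action, args=(ready,))
                p.start()
                batch.append(p)
        for p in batch:
            p.join()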
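One caveat: as far as I understand the md sysfs interface, writing to sync_action only requests the operation and the kernel then carries it out in the background. The 'finished' message above therefore means that the request was submitted, not that the scrub is over. If you need to know when the scrub has ended, a polling loop like the following sketch could be used. The function `wait_until_idle` and its parameters `poll_seconds`, `max_polls` and `sysfs_root` are hypothetical names of mine and are not part of the script.

```python
import time
from pathlib import Path


def wait_until_idle(array: str, poll_seconds: float = 10.0,
                    max_polls=None, sysfs_root: str = '/sys/block') -> bool:
    """Poll sync_action until the array reports 'idle'.

    Returns True once the array is idle, or False if 'max_polls' reads
    were done without ever seeing 'idle'. With max_polls=None the
    function waits indefinitely.
    """
    sync_action = Path(sysfs_root) / array / 'md' / 'sync_action'
    polls = 0
    while sync_action.read_text().strip() != 'idle':
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(poll_seconds)
    return True
```

For example, `wait_until_idle('md1', poll_seconds=60)` would block until md1 has no operation in progress, reading the file once a minute.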

Configuration file #

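The configuration file has two sections: [DEFAULT] holds the global settings and [devices] maps each array to the action to run on it.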

[DEFAULT]
# The maximum number of concurrent processes.
max concurrent checks = 2

# In seconds.
timeout idle check = 10

# key = RAID device name without the '/dev/' prefix.
# value = 'check', 'repair', 'idle', 'ignore'.
# The special value of 'ignore' will make the script skip the device.
# Absent devices are ignored.
[devices]
md1 = check
md2 = ignore
md3 = check
md4 = check
md5 = check
md6 = check
md10 = check
md21 = ignore
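One detail worth knowing is that Python's configparser copies every key of [DEFAULT] into all the other sections, so iterating over [devices] also yields 'max concurrent checks' and 'timeout idle check'. This is harmless in the script because those names never match an array, but the sketch below shows one way to filter them out. The sample text is a shortened copy of the file above; `load_devices` and `selected_devices` are illustrative names of mine.

```python
import configparser

SAMPLE = """
[DEFAULT]
max concurrent checks = 2
timeout idle check = 10

[devices]
md1 = check
md2 = ignore
md3 = repair
"""


def load_devices(text: str) -> dict:
    """Parse the configuration text and return only the real device entries."""
    config = configparser.ConfigParser()
    config.read_string(text)
    defaults = config.defaults()
    # Drop the keys inherited from [DEFAULT].
    return {name: action
            for name, action in config['devices'].items()
            if name not in defaults}


def selected_devices(devices: dict) -> list:
    """Return the names of the devices that are not marked 'ignore'."""
    return [name for name, action in devices.items() if action != 'ignore']
```

With the sample above, `load_devices(SAMPLE)` contains only md1, md2 and md3, and `selected_devices` leaves out md2.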

Systemd #

The Python script must be run as root. Follow the instructions in the previous post. Save the Python script as /home/jobs/scripts/by-user/root/mdadm_check.py and its configuration file as /home/jobs/scripts/by-user/root/mdadm_check.conf.

Service unit file #

[Unit]
Description=mdadm check

[Service]
Type=simple
ExecStart=-/home/jobs/scripts/by-user/root/mdadm_check.py /home/jobs/scripts/by-user/root/mdadm_check.conf
User=root
Group=root

[Install]
WantedBy=multi-user.target

Timer unit file #

See the previous post.

[Unit]
Description=Once a month check mdadm arrays

[Timer]
OnCalendar=Monthly
Persistent=true

[Install]
WantedBy=timers.target

~

Have fun :)