Automatic synchronization of media files from SD cards

First written on October 2, 2018
Last updated on October 8, 2022

Backups, yet again #

Continuing on the subject of backups, I want to share a script I wrote for the sole purpose of copying the contents of mass storage devices that contain media files. It applies, for example, to SD cards used in digital cameras, but it can be used with any removable block device.

Automation #

Use of metadata #

The script, which I called auto_media_backup.sh, works on a simple principle: it uses the metadata of the original file to compute the destination directory of the copied file.

I opted for the ${DST_PATH}/${uuid}/${year}/${month}/${filename} scheme.
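
To make the scheme concrete, here is a minimal sketch of how such a path could be assembled. The helper name `build_dst_path` and the sample paths are hypothetical and not part of the script, which splits this work across `compute_media_date`, `compute_media_dst_top` and `compute_media_dst`. The sketch assumes the date is already in YYYY-MM-DD form.

```shell
#!/bin/bash

# Hypothetical helper: build ${DST_PATH}/${uuid}/${year}/${month}/${filename}
# from a date string in YYYY-MM-DD form.
build_dst_path()
{
    local dst_path="${1}"
    local uuid="${2}"
    local media_date="${3}"
    local media="${4}"

    local year="${media_date%%-*}"   # text before the first "-"
    local rest="${media_date#*-}"    # text after the first "-"
    local month="${rest%%-*}"        # text before the next "-"

    printf '%s/%s/%s/%s/%s\n' "${dst_path}" "${uuid}" "${year}" "${month}" \
        "$(basename "${media}")"
}

# Sample values only; the source path is made up.
build_dst_path "/home/user/media_backup" "0000-0001" "2018-10-02" \
    "/mnt/card/DCIM/100CANON/IMG_0001.JPG"
# prints: /home/user/media_backup/0000-0001/2018/10/IMG_0001.JPG
```

Grouping by UUID first keeps files from different cards apart even when two cameras produce the same file names.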

Block device UUID monitoring #

Another important aspect is that when I insert the card in the reader the synchronization starts automatically:

[...] udevadm monitor --udev -s block [...]
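
As a rough illustration of what the script does with that output, the sketch below splits one monitor line into its event and device path, the same way the `read` call in the full script does. The function name `parse_udev_line` is hypothetical, and the sample line only imitates the shape of the `udevadm monitor --udev -s block` output; its device path is made up.

```shell
#!/bin/bash

# Hypothetical helper: print the device path of a udev monitor line,
# but only when the event is "add".
parse_udev_line()
{
    local line="${1}"
    local event devpath

    # Fields: source tag, timestamp, event, device path, subsystem.
    read -r _ _ event devpath _ <<< "${line}"

    if [ "${event}" = "add" ]; then
        printf '%s\n' "${devpath}"
    fi
}

parse_udev_line 'UDEV  [1234.567890] add      /devices/pci0000:00/usb1/1-1/block/sdb/sdb1 (block)'
# prints: /devices/pci0000:00/usb1/1-1/block/sdb/sdb1
```

Lines that are not "add" events, including the banner that the monitor prints at startup, produce no output.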

To avoid synchronizing every removable block device, I opted for a whitelist based on the device UUID, the same one used in the destination path. A code extract should clarify the situation:

[...]
for uuid in ${WHITELIST_UUID}; do
    if [ "${uuid}" = "$(get_uuid "${devname}")" ]; then
        sync_started_message "${uuid}"
        [...]
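
The whitelist test on its own can be sketched as follows. The function name `is_whitelisted` is hypothetical; the script performs the same comparison inline in its `loop` function.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the candidate UUID appears in the
# space-separated whitelist.
is_whitelisted()
{
    local candidate="${1}"
    local whitelist="${2}"
    local uuid

    # Word splitting on ${whitelist} is intentional: one UUID per word.
    for uuid in ${whitelist}; do
        if [ "${uuid}" = "${candidate}" ]; then
            return 0
        fi
    done
    return 1
}

if is_whitelisted "0000-0002" "0000-0001 0000-0002"; then
    echo "sync"
fi
# prints: sync
```

To find the UUID of a card to put in the whitelist, `lsblk -o name,uuid` lists it next to the device name.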

Logging #

There are four ways to know whether the operation started and how it finished. Each logging system is independent of the others, so each one can be enabled or disabled individually. These systems are: beeps, standard output, a log file and email.

If you use the mail facility, make sure it is already configured and working. In my case I used the s-nail package along with msmtp as the SMTP client.

Listening for beeps is immediate and does not require being in front of the computer's monitor or checking email. Think about how microwaves, fridges, ovens, washing machines and other domestic appliances work.

The script logs when a synchronization starts, finishes or fails, but it does not report progress while the synchronization is running.

Loop #

The script keeps monitoring for new devices while idle and does not quit even if the previous synchronization failed. The configuration file is only read at startup, so if you edit it be sure to restart the script.

Synchronization #

Rsync was the obvious choice here because it provides fast, incremental and safe file copying with just a handful of options.
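
The options in question are `-a`, `-b` and `--backup-dir`. The sketch below wraps them in a hypothetical `copy_with_backup` function to show what they do; it is a simplified variant of the script's `rsync_media`, and the paths in the usage note are placeholders.

```shell
#!/bin/bash

# Hypothetical wrapper around the rsync options used by the script.
# -a            archive mode: preserves timestamps, permissions and so on
# -b            keep a backup of any destination file about to be replaced
# --backup-dir  directory where those backups are stored
copy_with_backup()
{
    local src="${1}"
    local dst_dir="${2}"

    mkdir -p "${dst_dir}"
    rsync -ab --backup-dir="${dst_dir}_backup" "${src}" "${dst_dir}/"
}
```

Calling `copy_with_backup /mnt/card/IMG_0001.JPG /home/user/media_backup/2018/10` twice with an unchanged source leaves the destination alone the second time. If the source changed in between, the old copy is moved to `/home/user/media_backup/2018/10_backup` before being replaced.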

The script #

I saved the following code listing as auto_media_backup.sh. Because a small part of the script was taken and modified from the Arch Wiki, this code is licensed under the GNU Free Documentation License 1.3 or later. Other small parts were taken from a GitHub gist which was put in the public domain.

#!/bin/bash

# auto_media_backup.sh
#
# Copyright (C)  2018  Franco Masotti <franco.masotti@live.com>.
# Permission is granted to copy, distribute and/or modify this document
# under the terms of the GNU Free Documentation License, Version 1.3
# or any later version published by the Free Software Foundation;
# with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
# A copy of the license is included in the section entitled "GNU
# Free Documentation License".

# Put the full path if you are calling the script from a different directory.
. ./auto_media_backup.conf

# Check that all the software is installed.
# All the tools used in this script should be pretty standard.
check_tools()
{
    # find and sed are used further down as well, so check for them too.
    which \
        bash \
        mail \
        lsblk \
        grep \
        awk \
        sed \
        find \
        udevadm \
        udisksctl \
        stat \
        stdbuf \
        rsync \
        rm \
        beep \
        exiftool
} 1>&2

get_uuid()
{
    local kernel_device_name="${1}"

    # Query the device directly instead of grepping the whole table, so that
    # for example sdb1 cannot also match sdb10.
    lsblk -n -d -o uuid "${kernel_device_name}"
}

pathtoname()
{
    local kernel_device_name="${1}"

    udevadm info -p /sys/"${kernel_device_name}" \
        | awk -v FS== '/DEVNAME/ {print $2}'
}

get_mountpoint()
{
    local kernel_device_name="${1}"

    udisksctl info --block-device "${kernel_device_name}" \
        | grep "MountPoints" | awk '{$1=""; print $0}' | sed -e 's/^[ \t]*//'
}

log()
{
    local message="${1}"
    local log_file_preamble="${2}"
    local log_email_subject="${3}"
    local log_beep_args="${4}"

    if [ "${LOG_TO_BEEP}" = "true" ]; then
        beep $log_beep_args
    fi
    if [ "${LOG_TO_STDOUT}" = "true" ]; then
        echo -e "${message}"
    fi
    if [ "${LOG_TO_FILE}" = "true" ]; then
        echo -e "${log_file_preamble} ${message}" >> "${LOG_FILE}"
    fi
    if [ "${LOG_TO_EMAIL}" = "true" ]; then
        echo "${message}" | mail -r "${EMAIL_SENDER}" -s "${log_email_subject}" "${EMAIL_RECIPIENT}"
    fi
}

sync_started_message()
{
    local uuid="${1}"

    local message="Starting sync for ${uuid}"
    log "${message}" \
        "[$(date '+%Y-%m-%d, %T') START auto_media_backup]" \
        "Media transfer starting" \
        "-f 1000 -l 400 -D 20"
}

sync_completed_message()
{
    local uuid="${1}"
    local start="${2}"
    local end="${3}"
    local number_of_files="${4}"

    local message="Sync successful for ${uuid}, in $((end-start))s. ${number_of_files} synced files"
    log "${message}" \
        "[$(date '+%Y-%m-%d, %T') OK auto_media_backup]" \
        "Media transfer complete" \
        "-r 3 -f 2000 -l 400 -D 20"
}

sync_failed_message()
{
    local uuid="${1}"
    local retval="${2}"

    local message="Sync for ${uuid} failed with ${retval}"
    log "${message}" \
        "[$(date '+%Y-%m-%d, %T') FAILED auto_media_backup]" \
        "Media transfer failed" \
        "-f 500 -r 4 -d 200"
}

# Since we are going to use the shoot date to organize the directories
# it needs to be computed using EXIF data.
# If the file does not have EXIF data with a coherent date field, use
# filesystem timestamps instead.
# See https://en.wikipedia.org/wiki/Comparison_of_file_systems#Metadata
# for a list of supported timestamps for the most common filesystems.
# 0. Use EXIF data DateTimeOriginal
# 1. Use EXIF data MediaCreateDate
# 2. Use last modification time
# 3. Use last access time
# 4. Bail out
compute_media_date()
{
    local media="${1}"

    computed_date="$(exiftool -quiet -s -s -s -tab -dateformat "%Y-%m-%d" \
        -DateTimeOriginal "${media}")"
    if [ -z "${computed_date}" ]; then
        computed_date="$(exiftool -quiet -s -s -s -tab -dateformat "%Y-%m-%d" \
        -MediaCreateDate "${media}")"
    fi
    if [ -z "${computed_date}" ]; then
        computed_date="$(stat -c%y "${media}" | awk '{print $1}')"
    fi
    if [ -z "${computed_date}" ]; then
        computed_date="$(stat -c%x "${media}" | awk '{print $1}')"
    fi
    if [ -z "${computed_date}" ]; then
        return 1
    fi

    echo "${computed_date}"
}

compute_media_dst_top()
{
    local computed_date="${1}"
    local dst_base_path="${2}"

    local month="$(date -d "${computed_date}" "+%m")"
    local year="$(date -d "${computed_date}" "+%Y")"

    echo "${dst_base_path}/${year}/${month}"
}

compute_media_dst()
{
    local media_base_dst="${1}"
    # The caller passes the media path as the second argument.
    local media="${2}"

    echo "${media_base_dst}/$(basename "${media}")"
}

rsync_media()
{
    local src="${1}"
    local dst="${2}"

    # dst_top is not an argument: it is the variable set by get_media.
    mkdir -p "${dst_top}"
    rsync -ab --backup-dir="${dst_top}"_backup "${src}" "${dst}"
}

# See https://en.wikipedia.org/wiki/Design_rule_for_Camera_File_system
get_media()
{
    local src="${1}"
    local dst_base_path="${2}"

    # See https://gist.github.com/jvhaarst/2343281
    # which was put in public domain.
    num_of_files=0
    # Read NUL-separated paths so that names containing spaces survive, and
    # group the extension tests so the "._*" exclusion applies to all of them.
    while IFS= read -r -d '' media; do
        dst_top="$(compute_media_dst_top "$(compute_media_date "${media}")" "${dst_base_path}")"
        dst="$(compute_media_dst "${dst_top}" "${media}")"
        rsync_media "${media}" "${dst}" || return 1
        num_of_files=$((num_of_files+1))
    done < <(find "${src}" -not -wholename "*._*" \( -iname "*.JPG" \
-or -iname "*.JPEG" -or -iname "*.CRW" -or -iname "*.THM" -or -iname "*.RW2" \
-or -iname "*.ARW" -or -iname "*AVI" -or -iname "*MOV" -or -iname "*MP4" \
-or -iname "*MTS" -or -iname "*PNG" \) -print0)
    echo "${num_of_files}"
}

# DANGEROUS. Disabled by default.
delete_media()
{
    local src="${1}"

    echo "NOOP. Source file deletion is disabled because the author does not \
want to be liable for any data loss"
    echo "Edit the \"delete_media\" function to enable it"

    # rm -rf "${src}"/*
}

loop()
{
    local event="${1}"
    local devpath="${2}"

    if [ "${event}" = add ]; then
        devname=$(pathtoname "${devpath}")
        for uuid in ${WHITELIST_UUID}; do
            if [ "${uuid}" = "$(get_uuid "${devname}")" ]; then
                sync_started_message "${uuid}"
                start=$(date +%s)
                udisksctl mount --block-device "${devname}" --no-user-interaction \
                    || { sync_failed_message "${uuid}" "$?"; return 1; }
                mountpoint="$(get_mountpoint "${devname}")"
                final_dst_path="${DST_PATH}/${uuid}"
                synced_files=$(get_media "${mountpoint}" "${final_dst_path}") \
                    || { sync_failed_message "${uuid}" "$?"; return 1; }

                # Just in case.
                sync

                if [ "${DELETE_SRC_MEDIA_ON_SYNC_SUCCESS}" = "true" ]; then
                    delete_media "${mountpoint}"
                fi

                udisksctl unmount --block-device "${devname}" --no-user-interaction
                end=$(date +%s)
                sync_completed_message "${uuid}" "${start}" "${end}" "${synced_files}"
            fi
        done
    fi
}

# Original source for the automount part:
# https://wiki.archlinux.org/index.php/Udisks#udevadm_monitor
check_tools || exit 1

printf "%s\n" "Monitoring for these UUIDs:"
printf "%s\n" "==========================="
for uuid in ${WHITELIST_UUID}; do
    printf "%s\n" "${uuid}"
done

# Note that this is a loop.
stdbuf -oL -- udevadm monitor --udev -s block | while read -r -- _ _ event devpath _; do
    loop "${event}" "${devpath}"
done

The configuration file #

Save the following as auto_media_backup.conf and edit it to suit your setup. The script sources ./auto_media_backup.conf, so the file must be in the directory you run the script from, unless you put its full path in the script. The variables should be self-explanatory.

Please note that the script does not validate these variables. You, as the user, must make sure they are correct.

# Put your UUIDs separated by a white space character
WHITELIST_UUID="0000-0001 0000-0002"

DST_PATH="/home/user/media_backup"

LOG_TO_STDOUT="true"

LOG_TO_FILE="true"
LOG_FILE="/var/log/auto_media_backup.log"

LOG_TO_EMAIL="true"
EMAIL_SENDER="Computer <your.email@address.com>"
# You can use aliases if your mailing system is set up correctly.
EMAIL_RECIPIENT="all"

LOG_TO_BEEP="true"

# If you want to "recycle" your devices set the following to true.
# See the script for a disclaimer and how to really enable it.
DELETE_SRC_MEDIA_ON_SYNC_SUCCESS="false"
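
If you would rather have the variables checked before monitoring starts, something along these lines could be called right after the configuration file is sourced. The function `validate_config` is hypothetical and not part of the original script; treat it as a sketch of the idea, not a complete check.

```shell
#!/bin/bash

# Hypothetical check: make sure the variables read from
# auto_media_backup.conf look sane. Returns 1 on the first problem found.
validate_config()
{
    local name value

    # These must not be empty.
    for name in WHITELIST_UUID DST_PATH; do
        if [ -z "${!name}" ]; then
            echo "${name} must not be empty" 1>&2
            return 1
        fi
    done

    # These must be exactly "true" or "false".
    for name in LOG_TO_STDOUT LOG_TO_FILE LOG_TO_EMAIL LOG_TO_BEEP \
        DELETE_SRC_MEDIA_ON_SYNC_SUCCESS; do
        value="${!name}"
        if [ "${value}" != "true" ] && [ "${value}" != "false" ]; then
            echo "${name} must be \"true\" or \"false\", got \"${value}\"" 1>&2
            return 1
        fi
    done

    # A log file is only needed when file logging is enabled.
    if [ "${LOG_TO_FILE}" = "true" ] && [ -z "${LOG_FILE}" ]; then
        echo "LOG_FILE must be set when LOG_TO_FILE is true" 1>&2
        return 1
    fi
}
```

In the script this would fit next to the existing tool check, for example as `validate_config || exit 1`.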

Running #

To run the script in the background, make it executable and start it:

$ chmod +x auto_media_backup.sh
$ ./auto_media_backup.sh &

Future steps #

~

Till next time.