March 27, 2013

A bashism a week: substrings (dynamic offset and/or length)

Last week I talked about the substring expansion bashism and left, as an exercise for the reader, writing a portable replacement for substring expansion with a dynamic offset and/or length.

What follows was originally part of that post, which had grown too long for a single entry. So here is one way to portably replace said code.

Let's consider that you have the file name foo_1.23-1.dsc of a given Debian source package; you could easily find its location under the pool/ directory with the following non-portable code:
file=foo_1.23-1.dsc
echo ${file:0:1}/${file%%_*}/$file

This can be rewritten with the following portable code:
file=foo_1.23-1.dsc
echo ${file%${file#?}}/${file%%_*}/$file
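
The nested expansion is easier to follow when it is split into two steps. The following is only an illustrative sketch; the variable names rest and first are not part of the original code.

```shell
file=foo_1.23-1.dsc

# Step 1: "?" matches exactly one character, so removing the shortest
# matching prefix leaves everything except the first character.
rest=${file#?}
echo "$rest"                      # oo_1.23-1.dsc

# Step 2: removing that remainder as a suffix leaves the first character.
# Quoting "$rest" makes the shell match it literally, not as a pattern.
first=${file%"$rest"}
echo "$first"                     # f

# ${file%%_*} removes the longest suffix that starts at an underscore,
# which leaves the source package name.
echo "$first/${file%%_*}/$file"   # f/foo/foo_1.23-1.dsc
```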

Now, in the Debian archive, source packages whose names start with the lib prefix are split further, so the code needs to take that into account if file is libbar_3.2-1.dsc.

Here's a non-portable way to do it:
file=libbar_3.2-1.dsc
if [ lib = "${file:0:3}" ]; then
    length=4
else
    length=1
fi

# Note the use of a dynamic length:
echo ${file:0:$length}/${file%%_*}/$file

While here's one portable way to do it:
file=libbar_3.2-1.dsc
case "$file" in
    lib*)
        length=4
    ;;
    *)
        length=1
    ;;
esac

length_pattern=
while [ 0 -lt $length ]; do
    length_pattern="${length_pattern}?"
    length=$(($length-1))
done

echo ${file%${file#$length_pattern}}/${file%%_*}/$file
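
To try the portable version on both file names, the same steps can be wrapped in a function. This is a sketch under assumptions: pool_path is a hypothetical name, and its variables are left global for brevity.

```shell
# pool_path is a hypothetical helper name, not part of the original post.
# Its variables are global; this is only meant as a quick demonstration.
pool_path() {
    file=$1
    case "$file" in
        lib*) length=4 ;;
        *)    length=1 ;;
    esac

    # Build a pattern made of $length question marks.
    length_pattern=
    while [ 0 -lt "$length" ]; do
        length_pattern="${length_pattern}?"
        length=$(($length - 1))
    done

    # Strip $length characters from the front, then strip what is left
    # from the end to keep only the leading characters.
    rest=${file#$length_pattern}
    echo "${file%"$rest"}/${file%%_*}/$file"
}

pool_path foo_1.23-1.dsc      # f/foo/foo_1.23-1.dsc
pool_path libbar_3.2-1.dsc    # libb/libbar/libbar_3.2-1.dsc
```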

The idea is to compute the number of question marks needed and use them as a pattern where needed. Here are two functions that can replace substring expansion as long as the offset and length are not negative (bash also accepts negative values, which these functions do not handle).

genpattern() {
    local pat=
    local i="${1:-0}"

    while [ 0 -lt $i ]; do
        pat="${pat}?"
        i=$(($i-1))
    done
    printf %s "$pat"
}

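substr() {
    local str="${1:-}"
    local offset="${2:-0}"
    local length="${3:-}"
    local rest

    # Skip the first $offset characters.
    if [ 0 -lt "$offset" ]; then
        str="${str#$(genpattern "$offset")}"
    fi

    # Without a length, return the remainder, like ${str:offset} in bash.
    if [ -z "$length" ]; then
        printf %s "$str"
        return
    fi

    # Keep the first $length characters; quoting $rest matches it literally.
    rest="${str#$(genpattern "$length")}"
    printf %s "${str%"$rest"}"
}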

Note that the functions use local variables to avoid polluting the global namespace. The local keyword is itself not required by POSIX:2001, although many shells, dash included, support it.
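
Where local is not available, one possible workaround is to give each function a subshell body, written with parentheses instead of braces, so that assignments never reach the caller. The following is a sketch of that variant rather than the original code. It assumes the output is always captured or printed, which is already how these functions are used.

```shell
# A "( ... )" body runs in a subshell, so pat, i, str, offset, length and
# rest stay private without relying on the non-POSIX "local" keyword.
genpattern() (
    pat=
    i="${1:-0}"
    while [ 0 -lt "$i" ]; do
        pat="${pat}?"
        i=$(($i - 1))
    done
    printf %s "$pat"
)

substr() (
    str="${1:-}"
    offset="${2:-0}"
    length="${3:-}"

    # Skip the first $offset characters.
    if [ 0 -lt "$offset" ]; then
        str="${str#$(genpattern "$offset")}"
    fi

    if [ -z "$length" ]; then
        # No length given: return the remainder.
        printf %s "$str"
    else
        # Keep the first $length characters.
        rest="${str#$(genpattern "$length")}"
        printf %s "${str%"$rest"}"
    fi
)

substr foo_1.23-1.dsc 0 1; echo     # f
substr libbar_3.2-1.dsc 0 4; echo   # libb
substr libbar_3.2-1.dsc 3 3; echo   # bar
substr libbar_3.2-1.dsc 6; echo     # _3.2-1.dsc
```

One difference from bash remains in this sketch: a length larger than the remaining string yields an empty string, whereas bash returns the remainder.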

Enough about substrings!

Remember, if you rely on a non-standard behaviour or feature, make sure you document it and, if feasible, check for it at run-time.
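
As an illustration of such a run-time check, the snippet below probes for substring expansion before relying on it. It is a minimal sketch: the variable names probe and have_substr_expansion are made up for this example.

```shell
# Probe for bash-style substring expansion. The eval runs in a subshell so
# that a "Bad substitution" error in a shell lacking the feature cannot
# abort the calling script; the error message itself is discarded.
if (eval 'probe=abc; [ "${probe:1:1}" = b ]') 2>/dev/null; then
    have_substr_expansion=yes
else
    have_substr_expansion=no
fi

echo "substring expansion: $have_substr_expansion"
```

A script could then pick the native expansion or a portable fallback depending on the result.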
