Correct Alphanumeric Sort

If you’re in a node based program like Nuke, Houdini, or Katana, you’ll notice when you create a new node the program tries to name the node with a number at the end, thus avoiding namespace issues. The more nodes of the same name that get created, the higher the number tacked on at the end becomes. So when developing tools for these programs, there is definitely going to be a point where you will have to list nodes in order.

On the surface, this doesn’t seem like that big of an issue. They are just alphanumeric strings, after all. If you do a normal sort with alphanumeric strings, the sort doesn’t behave the way you would expect.

lst = ["var1", "var2", "var3", "var10", "var11"]
lst.sort()
print(lst)
['var1', 'var10', 'var11', 'var2', 'var3']

We humans know 10 is greater than 2, but if that number is in an alphanumeric string, the sort doesn’t care; it’s going to compile all the strings one character at a time, regardless what those characters “mean”.

So when you’re in a situation that requires you to pay attention to concurrent numbers, you need to supply a key to the sort that takes numbers into account.

import re


def alphanum_key(string, case_sensitive=False):
    """Turn a string into a list of string and number chunks.
    Using this key for sorting will order alphanumeric strings properly.
    
    "ab123cd" -> ["ab", 123, "cd"]
    
    Args:
        string (str): Given string.
        case_sensitive (bool): If True, capital letters will go first.
    Returns:
        list[int|str]: Mixed list of strings and integers in the order they
            occurred in the given string.
    """
    def try_int(str_chunk, cs_snstv):
        """Convert string to integer if it's a number.
        In certain cases, ordering filenames takes case-sensitivity 
        into consideration.
        
        Windows default sorting behavior:
        ["abc1", "ABC2", "abc3"]
            
        Linux, Mac, PySide, & Python default sorting behavior:
        ["ABC2", "abc1", "abc3"]
            
        Case-sensitivity is off by default but is provided if the developer
        wishes to use it.
        
        Args:
            str_chunk (str): Given string chunk.
            cs_snstv (bool): If True, capital letters will go first.
        Returns:
            int|str: If the string represents a number, return the number.
                Otherwise, return the string.
        """
        try:
            return int(str_chunk)
        except ValueError:
            if cs_snstv:
                return str_chunk
            # Make it lowercase so the ordering is no longer case-sensitive.
            return str_chunk.lower()
            
    return [try_int(chunk, case_sensitive)
            for chunk in re.split('([0-9]+)', string)]

Now passing alphanum_key() as the sorting key gets us exactly what we were expecting. We’ll mix it up to make sure it’s working.

lst = ["var2", "var1", "var10", "var3", "var11"]
lst.sort(key=lambda s: alphanum_key(s))
print(lst)
['var1', 'var2', 'var3', 'var10', 'var11']