
Module ciocore.file_utils

Functions

separate_path

Source
def separate_path(path, no_extension=False):

    dirpath, filename = os.path.split(path)

    if no_extension:
        basename = filename
        extension = ""
    else:
        basename, extension = os.path.splitext(filename)
    return dirpath, basename, extension
separate_path(path, no_extension=False)

Separate the given path into three pieces:

1. The directory (if any)
2. The base filename (mandatory)
3. The file extension (if any)

For example, given the path "/animals/fuzzy_cat.jpg", return "/animals", "fuzzy_cat", ".jpg".

The path argument may be a full filepath (including the directory) or just the name of the file.

Note that there is no way to know with certainty whether the characters that follow a period in a file name are a file extension. By default, this function assumes that every file passed to it has an extension, identified by the last period in the file name. When a file has no extension, indicate this by setting the no_extension argument to True.

An example that illustrates the issue: a file named "2942_image.10312". The "10312" could represent either a frame number or an extension; there is no way to know for sure. By default, the function assumes that "10312" is an extension. Override this behavior by setting the no_extension argument to True.
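The ambiguity described above can be seen in a short sketch. It repeats the logic from the source listing so that it runs without ciocore installed; the file names are the ones used in this docstring.

```python
import os


def separate_path(path, no_extension=False):
    # Same logic as the source listing above, condensed slightly.
    dirpath, filename = os.path.split(path)
    if no_extension:
        return dirpath, filename, ""
    basename, extension = os.path.splitext(filename)
    return dirpath, basename, extension


# Default: everything after the last period is treated as the extension.
print(separate_path("/animals/fuzzy_cat.jpg"))
print(separate_path("2942_image.10312"))

# The caller knows "10312" is a frame number, so it opts out of splitting.
print(separate_path("2942_image.10312", no_extension=True))
```

The second and third calls differ only in where the ".10312" suffix ends up: in the extension slot by default, or kept as part of the base name when no_extension is True.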

process_dependencies

Source
def process_dependencies(paths):

    dependencies = {}
    for path in paths:

        try:
            process_upload_filepath(path)
            dependencies[path] = None
        except exceptions.InvalidPathException as e:
            logger.debug("%s", e)
            dependencies[path] = str(e)

    return dependencies
process_dependencies(paths)
For the given list of dependency paths, return a dictionary where the keys are the dependency filepaths and the values are strings describing what is wrong with each path (if anything). If a path is valid, its value is None.
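One way a caller might consume the returned dictionary is to split it into valid and invalid paths. The dictionary below is a hypothetical stand-in for a real return value; the paths and the error message in it are made up for illustration.

```python
# Hypothetical result, shaped like the return value described above:
# path -> None when valid, path -> error string when something is wrong.
dependencies = {
    "/shots/010/plate.exr": None,
    "/shots/010/missing.####.exr": "No files found for path: /shots/010/missing.####.exr",
}

# Paths with a None value passed validation.
valid_paths = [path for path, error in dependencies.items() if error is None]

# Everything else carries a message explaining the problem.
invalid_paths = {path: error for path, error in dependencies.items() if error is not None}

for path, error in invalid_paths.items():
    print("Skipping %s: %s" % (path, error))
```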

process_upload_filepaths

Source
def process_upload_filepaths(paths):

    processed_paths = []
    for path in paths:
        processed_paths.extend(process_upload_filepath(path))

    return processed_paths
process_upload_filepaths(paths)
Given a list of paths, process each one and return a flattened list of all processed paths.

process_upload_filepath

Source
def process_upload_filepath(path, strict=True):


    paths = []

    if path:

        # If the path is a file (and it exists)
        if os.path.isfile(path):
            # Condition the path to conform to Conductor's expectations
            filepath = conform_platform_filepath(path)

            # Validate that the path meets all expectations
            error_msg = validate_path(filepath)
            if error_msg:
                logger.warning(error_msg)
                if strict:
                    raise exceptions.InvalidPathException(error_msg)
            paths.append(filepath)

        # If the path is a directory
        elif os.path.isdir(path):
            for filepath in get_files(path, recurse=True):
                # when recursing a directory, don't be strict about whether
                # any of its enclosed files are "missing" (e.g. broken symlinks)
                paths.extend(process_upload_filepath(filepath, strict=False))

        # If the path is a symlink (which must be broken bc os.path.isfile or
        # os.path.isdir would have been True if it existed)
        elif os.path.islink(path):
            message = "No file(s) found for path: %s" % path
            # If we're being strict, then raise an exception
            if strict:
                raise exceptions.InvalidPathException(message)
            # otherwise warn
            logger.warning(message)

        # If we've gotten here, then we know that the path string is not a literal file or directory.
        # Therefore attempt to resolve the path string from any variables/expressions
        else:
            # First, try to resolve any environment variables found in the path.
            path = os.path.expandvars(path)
            if os.path.exists(path):
                paths.extend(process_upload_filepath(path))

            # Lastly, try to resolve any expressions found in the path
            else:
                filepaths = get_files_from_path_expression(path)

                if filepaths:
                    # if there are matching frames/files for the given path expression (e.g. image
                    # sequence), treat each frame as a dependency (adding it to the dependency
                    # dictionary and running it through validation)
                    for filepath in filepaths:
                        paths.extend(process_upload_filepath(filepath))
                else:
                    # if there are no matching frames/files found on disk for the given
                    # path expression (e.g. image sequence), log a warning, or raise an
                    # exception if we're being strict
                    message = "No files found for path: %s" % path
                    if strict:
                        raise exceptions.InvalidPathException(message)
                    logger.warning(message)

    return paths
process_upload_filepath(path, strict=True)

Process the given path to ensure that the path is valid (exists on disk), and return any/all files which the path may represent.

For example, if the path is a directory or an image sequence, then explicitly list and return all files that that path represents/contains.

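strict: bool. When True and the given path does not exist on disk, raise an exception. Note that when this function is given a directory path and it finds any broken symlinks within the directory, it logs a warning instead of raising an exception, because files found while recursing a directory are processed with strict=False.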

This function should be able to handle various types of paths:

  1. Directory path
  2. File path
  3. Image sequence path

Process the path by doing the following:

  1. If the path is an image sequence notation, "explode" it and return each frame's filepath. This relies on the files actually being on disk, as the underlying call is to glob.glob. Validate that there is at least one frame on disk for the image sequence. There is no 100% reliable way to know how many frames should actually be part of the image sequence, but we can at least validate that one frame exists.

  2. If the path is a directory then recursively add all file paths contained within it.

  3. If the path is a file then ensure that it exists on disk and that it conforms to Conductor's expectations.
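The order in which the source listing tests a path can be sketched as a small classifier. The helper name classify_path and its labels are hypothetical and not part of ciocore; the sketch only mirrors the sequence of os.path checks and does no validation or expansion.

```python
import os
import tempfile


def classify_path(path):
    """Hypothetical helper: report which branch of the listing a path would take."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    # isfile/isdir follow symlinks, so a link that reaches here must be broken.
    if os.path.islink(path):
        return "broken symlink"
    # Anything else may contain environment variables or a path expression.
    return "variable or expression"


with tempfile.TemporaryDirectory() as root:
    filepath = os.path.join(root, "plate.exr")
    with open(filepath, "w"):
        pass  # create an empty file to classify

    labels = [
        classify_path(filepath),
        classify_path(root),
        classify_path(os.path.join(root, "plate.####.exr")),
    ]

print(labels)
```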

get_common_dirpath

Source
def get_common_dirpath(paths):

    # Using os.path.commonprefix only gets us so far, as it merely matches as
    # many characters as possible, but doesn't ensure those characters clearly end
    # on a directory. For example, given these two paths
    #    r"c:\catfood\rats.txt"
    #    r"c:\catmood\happy.txt"
    # it will return "c:\\cat"  - which is not actually a directory
    #
    # Or worse, given these three directories:
    #    r"c:\catfood\rats.txt"
    #    r"c:\catmood\happy.txt"
    #    r"c:\cat\properties.txt"
    # it will return "c:\\cat", which is a directory, BUT it's not actually
    # common across the three paths! Misleading/dangerous!
    output_path = os.path.commonprefix(paths)

    if output_path:
        # if the output "path" ends with a slash, then we know it's actually a directory path, and
        # can return it
        if output_path.endswith(os.sep) and _is_valid_path(output_path):
            return output_path.rstrip(os.sep)  # strip off the trailing path separator

        # Otherwise ask for the directory of the output "path"
        dirpath = os.path.dirname(output_path)

        # If the directory is NOT considered a root directory (such as "/" or "G:\\"), then return it
        if _is_valid_path(dirpath):
            return dirpath
get_common_dirpath(paths)

Find the common directory between all of the filepaths (essentially find the lowest common denominator of all of the given paths). If there is no common directory shared between the paths, return None.

For example, given these three filepaths:

'/home/cat/names/fred.txt'
'/home/cat/names/sally.txt'
'/home/cat/games/chase.txt'

return '/home/cat'

Exclude the root symbol ("/" or a lettered drive in the case of Windows) as a valid common directory.
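The character-level behavior of commonprefix that the source comments warn about can be reproduced with a minimal sketch. It uses posixpath so the result is the same on every platform, and the helper name common_dir_sketch is hypothetical. The root-directory check done by _is_valid_path is not shown on this page, so the sketch simply returns None when nothing is left of the prefix.

```python
import posixpath


def common_dir_sketch(paths):
    """Hypothetical helper mirroring the trimming steps in the listing."""
    # commonprefix compares characters, not path components.
    prefix = posixpath.commonprefix(paths)
    if prefix.endswith("/"):
        # The prefix already ends on a directory boundary.
        return prefix.rstrip("/") or None
    # Otherwise back up to the enclosing directory.
    return posixpath.dirname(prefix) or None


docstring_paths = [
    "/home/cat/names/fred.txt",
    "/home/cat/names/sally.txt",
    "/home/cat/games/chase.txt",
]
misleading_paths = ["/data/catfood/rats.txt", "/data/catmood/happy.txt"]

print(common_dir_sketch(docstring_paths))
print(posixpath.commonprefix(misleading_paths))  # not a real directory
print(common_dir_sketch(misleading_paths))
```

For the second pair of paths the raw prefix stops in the middle of a directory name, which is why the extra dirname step is needed.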

get_files

Source
def get_files(dirpath, recurse=True):

    files = []

    if not os.path.isdir(dirpath):
        raise Exception("Directory does not exist: '%s'" % dirpath)

    # If operating recursively, use os.walk to grab sub files
    if recurse:
        for sub_dirpath, _, filenames in os.walk(dirpath):
            for filename in filenames:
                filepath = os.path.join(sub_dirpath, filename)
                files.append(filepath)
    else:
        files = []
        for filename in os.listdir(dirpath):
            if os.path.isfile(os.path.join(dirpath, filename)):
                files.append(os.path.join(dirpath, filename))

    return files
get_files(dirpath, recurse=True)

Return all files found in the given directory.

Optionally recurse the directory to include files that are located in subdirectories.
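The difference between the two modes can be sketched with a temporary directory. The file names are made up for illustration; the two comprehensions follow the os.walk and os.listdir branches of the source listing, but report paths relative to the temporary root so the output is readable.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: root/a.txt and root/sub/b.txt
    os.makedirs(os.path.join(root, "sub"))
    for relpath in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, relpath), "w"):
            pass

    # Equivalent of recurse=True: os.walk visits every subdirectory.
    recursive = sorted(
        os.path.relpath(os.path.join(dirpath, name), root)
        for dirpath, _, names in os.walk(root)
        for name in names
    )

    # Equivalent of recurse=False: only files directly inside the directory.
    top_level = sorted(
        name for name in os.listdir(root)
        if os.path.isfile(os.path.join(root, name))
    )

print(recursive)
print(top_level)
```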

conform_platform_filepath

Source
def conform_platform_filepath(filepath):

    platform = sys.platform

    # If the platform is windows, then run specific Windows rules
    if platform.startswith("win"):
        filepath = conform_win_path(filepath)

    return filepath
conform_platform_filepath(filepath)
For the given path, ensure that the path conforms to the standards that Conductor expects. Each platform may potentially have different rules that it follows in order to achieve this.

conform_win_path

Source
def conform_win_path(filepath):

    exp_file = os.path.abspath(os.path.expandvars(filepath))
    return os.path.normpath(exp_file).replace("\\", "/")
conform_win_path(filepath)
For the given filepath, resolve any environment variables in the path and convert all backslashes to forward slashes.
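The normalization and slash conversion can be tried on any platform with a minimal sketch. It uses ntpath (the Windows flavor of os.path) explicitly and skips the expandvars and abspath steps, so it covers only part of what the listing does; the example path is made up.

```python
import ntpath

# A Windows-style path with a redundant ".." segment (illustrative only).
raw = r"C:\projects\shots\..\textures\wood.jpg"

# normpath collapses the ".." segment; replace() then swaps the separators.
conformed = ntpath.normpath(raw).replace("\\", "/")

print(conformed)
```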

validate_path

Source
def validate_path(filepath):

    # Strip the lettered drive portion of the filepath (if there is one).
    # This is only going to affect a path with a lettered drive on Windows filesystem
    filepath = os.path.splitdrive(filepath)[-1]

    # Validate against any forbidden characters
    forbidden_chars = (":",)
    for char in forbidden_chars:
        if char in filepath:
            return "Forbidden character %r found in filepath: %r" % (char, filepath)

    # Ensure filepath begins with a slash
    if not filepath.startswith("/"):
        return "Filepath does not begin with expected %r. Got %r" % ("/", filepath)
validate_path(filepath)

Validate that the given filepath:

  1. Does not contain colons. This is a Docker path limitation.
  2. Starts with a "/". Otherwise the path cannot be mounted in a Linux filesystem.

If the filepath is valid, return None. Otherwise return a message that describes why the filepath is invalid.
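The two rules can be exercised with a sketch that follows the source listing. One assumption is made for portability: it calls ntpath.splitdrive directly so that a drive letter is stripped on every platform, whereas the listing uses os.path.splitdrive, which only recognizes drive letters on Windows. The helper name and the sample paths are hypothetical.

```python
import ntpath


def validate_path_sketch(filepath):
    """Hypothetical variant of validate_path that always applies Windows drive rules."""
    # Drop a leading drive such as "C:" before checking the rest of the path.
    filepath = ntpath.splitdrive(filepath)[-1]

    # Rule 1: no colons anywhere in the remaining path.
    if ":" in filepath:
        return "Forbidden character %r found in filepath: %r" % (":", filepath)

    # Rule 2: the path must be rooted at "/".
    if not filepath.startswith("/"):
        return "Filepath does not begin with expected %r. Got %r" % ("/", filepath)

    return None


for candidate in ("C:/textures/wood.jpg", "/mnt/a:b.txt", "textures/wood.jpg"):
    print(candidate, "->", validate_path_sketch(candidate))
```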

quote_path

Source
def quote_path(filepath):

    return '"%s"' % filepath.replace('"', '\\"')
quote_path(filepath)
Wrap the given filepath in double quotes, escaping any double quotes it contains with a backslash.
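A short sketch of the quoting behavior, repeating the one-line function from the listing so it runs on its own. The file name is made up. Note that only double quotes are escaped; existing backslashes are left as they are.

```python
def quote_path(filepath):
    # Same expression as the source listing above.
    return '"%s"' % filepath.replace('"', '\\"')


quoted = quote_path('/shots/my "final" comp.exr')
print(quoted)
```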

get_files_from_path_expression

Source
def get_files_from_path_expression(path_expression):


    logger.debug("Evaluating path expression: %s", path_expression)
    # Cycle through all regexes and replace each match with a * so that we can glob file system.
    # Note that a single path expression may contain more than one expression, as well as more than
    # one expression format (e.g. containing both  #### and %04d).
    rxs = get_rx_matches(path_expression, PATH_EXPRESSIONS)
    glob_path = path_expression
    for rx in rxs:
        logger.debug("Matched path regular expression: %s", rx)
        glob_path = re.sub(rx, "*", glob_path, flags=re.I)

    logger.debug("glob_path: %r", glob_path)
    return glob.glob(glob_path)
get_files_from_path_expression(path_expression)

Given a path expression (such as an image sequence path), seek out all files that are part of that path expression (e.g all of the files that are part of that image sequence) and return a list of their explicit paths. This function relies on what is actually on disk, so only files that are found on disk will be returned. If no files are found, return an empty list.

Supports a variety of path expressions. Here are a few examples:

"image.####.exr"   # Hash syntax
"image.####"       # no extension
"image.%04d.exr"   # printf format
"image<UDIM>.exr"  # UDIM
"image.$F.exr"     # Houdini

In addition to matching against the file name/root, this function also matches expressions found against the directory name/path, e.g.

/data/shot-###/image.exr
/data/shot-###/image.####.exr
/data/shot-###/image.%04d.exr
/data/shot-###/camera-$F/image.%04d.exr
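The replace-then-glob approach can be sketched as follows. The real PATH_EXPRESSIONS list is not shown on this page, so the two regexes below (one for hash runs, one for printf-style padding) are assumptions made for illustration, as are the helper name and the file names.

```python
import glob
import os
import re
import tempfile

# Assumed patterns; the library's actual PATH_EXPRESSIONS may differ.
SKETCH_EXPRESSIONS = [r"#+", r"%0\d+d"]


def expand_expression_sketch(path_expression):
    """Hypothetical helper: swap each expression for "*" and glob the result."""
    glob_path = path_expression
    for rx in SKETCH_EXPRESSIONS:
        glob_path = re.sub(rx, "*", glob_path, flags=re.I)
    # glob returns matches in arbitrary order, so sort for stable output.
    return sorted(glob.glob(glob_path))


with tempfile.TemporaryDirectory() as root:
    for name in ("image.0001.exr", "image.0002.exr", "notes.txt"):
        with open(os.path.join(root, name), "w"):
            pass

    hash_matches = [
        os.path.basename(p)
        for p in expand_expression_sketch(os.path.join(root, "image.####.exr"))
    ]
    printf_matches = [
        os.path.basename(p)
        for p in expand_expression_sketch(os.path.join(root, "image.%04d.exr"))
    ]

print(hash_matches)
print(printf_matches)
```

Because each expression becomes a bare "*", the glob is looser than the original notation: a file such as image.final.exr would also match in this sketch.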

get_rx_matches

Source
def get_rx_matches(path_expression, expressions, limit=0):

    matches = []
    for rx in expressions:
        if re.findall(rx, path_expression, flags=re.I):
            matches.append(rx)
            if limit and len(matches) == limit:
                break
    return matches
get_rx_matches(path_expression, expressions, limit=0)
Loop through the given list of expressions (regexes), and return those that match the given path_expression. If a limit is provided, return the first n expressions that match.
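A small sketch of the limit argument, repeating the function from the listing so it runs on its own. The pattern list and the path expression are made up for illustration and are not the library's own.

```python
import re


def get_rx_matches(path_expression, expressions, limit=0):
    # Same logic as the source listing above.
    matches = []
    for rx in expressions:
        if re.findall(rx, path_expression, flags=re.I):
            matches.append(rx)
            if limit and len(matches) == limit:
                break
    return matches


# Illustrative patterns: hash runs, printf padding, and a UDIM token.
patterns = [r"#+", r"%0\d+d", r"<udim>"]

all_matches = get_rx_matches("tex<UDIM>.####.exr", patterns)
first_match = get_rx_matches("tex<UDIM>.####.exr", patterns, limit=1)

print(all_matches)
print(first_match)
```

The returned items are the regexes themselves, in the order they appear in the input list, not the substrings they matched. Matching is case-insensitive, which is why the lowercase "<udim>" pattern matches "<UDIM>".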

create_file

Source
def create_file(filepath):

    umask_original = os.umask(0)
    try:
        handle = os.fdopen(os.open(filepath, os.O_WRONLY | os.O_CREAT, 0o666), "w")
    finally:
        os.umask(umask_original)
    handle.write("")
    handle.close()
create_file(filepath)
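Create an empty file at the given filepath with permissions 0o666 (octal). The process umask is temporarily set to 0 so that the permissions are applied unmodified, then restored.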
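The umask handling can be sketched with a temporary directory. The function body follows the source listing (minus the empty write); the file name is made up. The resulting mode is printed rather than asserted, since the permission bits reported can vary by platform and filesystem.

```python
import os
import stat
import tempfile


def create_file(filepath):
    # Clear the umask so the 0o666 mode passed to os.open is not masked.
    umask_original = os.umask(0)
    try:
        handle = os.fdopen(os.open(filepath, os.O_WRONLY | os.O_CREAT, 0o666), "w")
    finally:
        # Always restore the caller's umask, even if os.open fails.
        os.umask(umask_original)
    handle.close()


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "marker.txt")
    create_file(target)
    size = os.path.getsize(target)
    mode = stat.S_IMODE(os.stat(target).st_mode)

print(size, oct(mode))
```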

get_tx_paths

Source
def get_tx_paths(filepaths, existing_only=False):

    return [get_tx_path(path, existing_only=existing_only) for path in filepaths]
get_tx_paths(filepaths, existing_only=False)
Return the tx filepaths for the given filepaths

get_tx_path

Source
def get_tx_path(filepath, existing_only=False):

    filepath_base, _ = os.path.splitext(filepath)
    tx_filepath = filepath_base + ".tx"
    if existing_only and not os.path.isfile(tx_filepath):
        return ""
    return tx_filepath
get_tx_path(filepath, existing_only=False)
For the given filepath, construct a parallel *.tx filepath residing in the same directory (same name, different extension). If existing_only is True, only return the tx filepath if it exists on disk, otherwise return an empty string.
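The effect of existing_only can be sketched with a temporary directory, repeating the function from the listing so it runs on its own. The texture file names are made up for illustration.

```python
import os
import tempfile


def get_tx_path(filepath, existing_only=False):
    # Same logic as the source listing above.
    filepath_base, _ = os.path.splitext(filepath)
    tx_filepath = filepath_base + ".tx"
    if existing_only and not os.path.isfile(tx_filepath):
        return ""
    return tx_filepath


with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "wood.jpg")

    # No .tx file on disk yet: the plain call still builds the name,
    # while existing_only=True reports an empty string.
    expected_tx = get_tx_path(source)
    before = get_tx_path(source, existing_only=True)

    # Create the .tx file and ask again.
    with open(expected_tx, "w"):
        pass
    after = get_tx_path(source, existing_only=True)

print(os.path.basename(expected_tx), repr(before), os.path.basename(after))
```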

strip_drive_letter

Source
def strip_drive_letter(filepath):

    rx_drive = r"^[a-z]:"
    return re.sub(rx_drive, "", filepath, flags=re.I)
strip_drive_letter(filepath)

If the given filepath has a drive letter, remove it and return the rest of the path

C:\cat.txt         -->    \cat.txt
Z:\cat.txt         -->    \cat.txt
c:/cat.txt         -->    /cat.txt
z:/cat.txt         -->    /cat.txt
//cat.txt          -->    //cat.txt
\cat.txt           -->    \cat.txt
\cat\c:\dog.txt    -->    \cat\c:\dog.txt
/cat/c:/dog.txt    -->    /cat/c:/dog.txt
c:\cat\z:\dog.txt  -->    \cat\z:\dog.txt

Note that os.path.splitdrive should not be used (anymore), due to a change in behavior that was introduced somewhere between Python 2.7.6 and 2.7.11.
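Several rows of the table above can be checked with a short sketch that repeats the regex from the source listing. Only a drive letter at the very start of the string is removed, because the pattern is anchored with "^".

```python
import re


def strip_drive_letter(filepath):
    # Same regex as the source listing: a single letter plus colon at the start.
    return re.sub(r"^[a-z]:", "", filepath, flags=re.I)


# Input -> expected output, taken from the table above.
cases = {
    r"C:\cat.txt": r"\cat.txt",
    "c:/cat.txt": "/cat.txt",
    "//cat.txt": "//cat.txt",
    r"\cat\c:\dog.txt": r"\cat\c:\dog.txt",
    r"c:\cat\z:\dog.txt": r"\cat\z:\dog.txt",
}

results = {given: strip_drive_letter(given) for given in cases}
for given, result in results.items():
    print("%-20s --> %s" % (given, result))
```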
