Filesystem

class blocks.filesystem.DataFile

Bases: tuple

Attributes:
handle

Alias for field number 1

path

Alias for field number 0

Methods

count()
index() Raises ValueError if the value is not present.
handle

Alias for field number 1

path

Alias for field number 0
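
As a rough illustration of the field layout above, the following sketch builds a stand-in tuple with the same field order (path at index 0, handle at index 1). The namedtuple defined here is an assumption made so the example runs on its own, not the library class, and the file name and contents are made up.

```python
import io
from collections import namedtuple

# Stand-in with the documented field order: field 0 is `path`, field 1 is `handle`.
# This is not imported from blocks; it only mirrors the layout described above.
DataFile = namedtuple("DataFile", ["path", "handle"])

datafile = DataFile(path="data/part.0.csv", handle=io.BytesIO(b"a,b\n1,2\n"))

# Fields are reachable by name or by position.
assert datafile.path == datafile[0]
assert datafile.handle is datafile[1]

# Tuple unpacking works because the type is a tuple subclass.
path, handle = datafile
first_line = handle.readline()
```

Unpacking is convenient when iterating over the list that access returns, since each item carries its own path alongside the handle.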

class blocks.filesystem.FileSystem[source]

Bases: object

The required interface for any filesystem implementation

See GCSFileSystem for a full implementation. This interface is intended to be extended to support cloud file systems, encryption strategies, and so on.

Methods

access(self, paths) Access multiple paths as file-like objects
ls(self, path) List files corresponding to path, including glob wildcards
store(self, bucket, files) Store multiple data objects
access(self, paths)[source]

Access multiple paths as file-like objects

This allows for optimizations such as parallel downloads

Parameters:
paths: list of str

The paths of the files to access

Returns:
files: list of DataFile

A list of datafile instances, one for each input path
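
One way an implementation might take advantage of receiving all paths at once is to fetch them concurrently. The sketch below uses a thread pool for that; the fetch helper, the max_workers parameter, and the DataFile stand-in are hypothetical, and nothing here is taken from the library's own code.

```python
import io
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

DataFile = namedtuple("DataFile", ["path", "handle"])  # stand-in for the documented tuple


def fetch(path):
    """Hypothetical single-path fetch; a real one would download the object."""
    return io.BytesIO(("contents of " + path).encode("utf-8"))


def access(paths, max_workers=4):
    """Fetch every path concurrently and return DataFiles in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of its input iterable.
        handles = list(pool.map(fetch, paths))
    return [DataFile(path, handle) for path, handle in zip(paths, handles)]


files = access(["gs://bucket/a.csv", "gs://bucket/b.csv"])
```

Because the results keep the input order, the returned list lines up with the paths argument, which matches the "one for each input path" contract above.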

ls(self, path)[source]

List files corresponding to path, including glob wildcards

Parameters:
path : str

The path to the file or directory to list; supports wildcards

store(self, bucket, files)[source]

Store multiple data objects

This allows for optimizations when storing several files

Parameters:
bucket : str

The GCS bucket to use to store the files

files : list of str

The file names to store

Returns:
datafiles : contextmanager

A context manager that yields datafiles and places them on the filesystem when finished
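
To make the three required methods concrete, here is a minimal sketch of an implementation backed by the local disk. The FileSystem base class and DataFile tuple below are stand-ins defined only so the example is self-contained, and LocalFileSystem is a hypothetical name; a real implementation would subclass blocks.filesystem.FileSystem instead.

```python
import glob
import os
import tempfile
from collections import namedtuple
from contextlib import contextmanager

DataFile = namedtuple("DataFile", ["path", "handle"])  # stand-in tuple


class FileSystem(object):
    """Stand-in for the documented interface (not the library class)."""

    def ls(self, path):
        raise NotImplementedError

    def access(self, paths):
        raise NotImplementedError

    def store(self, bucket, files):
        raise NotImplementedError


class LocalFileSystem(FileSystem):
    """Hypothetical implementation that reads and writes local files."""

    def ls(self, path):
        # A directory lists its entries; anything else is treated as a glob.
        pattern = os.path.join(path, "*") if os.path.isdir(path) else path
        return sorted(glob.glob(pattern))

    def access(self, paths):
        # The caller is responsible for closing the returned handles.
        return [DataFile(p, open(p, "rb")) for p in paths]

    @contextmanager
    def store(self, bucket, files):
        targets = [os.path.join(bucket, name) for name in files]
        datafiles = [DataFile(t, open(t, "wb")) for t in targets]
        try:
            yield datafiles
        finally:
            # Closing the handles is what places the data on disk.
            for datafile in datafiles:
                datafile.handle.close()


fs = LocalFileSystem()
root = tempfile.mkdtemp()
with fs.store(root, ["a.txt", "b.txt"]) as datafiles:
    for datafile in datafiles:
        datafile.handle.write(b"hello")
listed = fs.ls(os.path.join(root, "*.txt"))
```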

class blocks.filesystem.GCSFileSystem(parallel=True, quiet=True)[source]

Bases: blocks.filesystem.FileSystem

File system interface that supports both local and GCS files

This implementation uses subprocess and gsutil, which has excellent performance. However, this can lead to problems in heavily multi-threaded applications and might not be as portable. For a Python-native implementation, use GCSNativeFileSystem.

Methods

access(self, paths) Access multiple paths as file-like objects
cp(self, sources, dest[, recursive]) Copy the files in sources to dest
local(self, path) Check if the path is available as a local file
ls(self, path) List files corresponding to path, including glob wildcards
open(*args, **kwds) Access path as a file-like object
rm(self, paths[, recursive]) Remove the files at paths
store(*args, **kwds) Create file stores that will be written to the filesystem on close
GCS = 'gs://'
access(self, paths)[source]

Access multiple paths as file-like objects

This allows for optimizations such as parallel downloads

Parameters:
paths: list of str

The paths of the files to access

Returns:
files: list of DataFile

A list of datafile instances, one for each input path

cp(self, sources, dest, recursive=False)[source]

Copy the files in sources to dest

Parameters:
sources : list of str

The list of paths to copy

dest : str

The destination for the copy of source(s)

recursive : bool

If true, recursively copy any directories
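
Since this class shells out to gsutil, a copy presumably comes down to assembling a gsutil cp command line. The sketch below shows one plausible way to build it. The function name is hypothetical, and mapping the constructor's parallel and quiet arguments to gsutil's -m and -q options is an assumption, not something this reference states.

```python
def build_cp_command(sources, dest, recursive=False, parallel=True, quiet=True):
    """Build a gsutil argument list that copies sources to dest."""
    command = ["gsutil"]
    if parallel:
        command.append("-m")  # top-level gsutil option: run operations in parallel
    if quiet:
        command.append("-q")  # top-level gsutil option: suppress progress output
    command.append("cp")
    if recursive:
        command.append("-r")  # cp option: copy directories recursively
    command.extend(sources)
    command.append(dest)
    return command


command = build_cp_command(["gs://bucket/a.csv", "gs://bucket/b.csv"], "local_dir/")
# Passing `command` to subprocess.check_call would run it; that needs gsutil on PATH.
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.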

local(self, path)[source]

Check if the path is available as a local file
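
A simple way to make this check is to look at the path prefix. The sketch below assumes that any path not starting with the gs:// prefix counts as local; the reference does not spell out the actual rule, and is_local is a hypothetical name.

```python
GCS = "gs://"  # same value as the documented GCS class attribute


def is_local(path):
    """Treat any path without the gs:// prefix as local (assumed rule)."""
    return not path.startswith(GCS)


local_result = is_local("/tmp/data/part.0.csv")
remote_result = is_local("gs://bucket/data/part.0.csv")
```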

ls(self, path)[source]

List files corresponding to path, including glob wildcards

Parameters:
path : str

The path to the file or directory to list; supports wildcards

open(*args, **kwds)[source]

Access path as a file-like object

Parameters:
path: str

The path of the file to access

mode: str

The file mode for the opened file

Returns:
file: file

A Python file object opened on the provided path (uses a local temporary copy that is removed)
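
The temporary-copy behaviour described above can be sketched as a context manager that copies the source to a temporary file, yields an open handle, and deletes the copy on exit. This covers reading only. The open_via_tempfile name and the fetch hook are hypothetical; a GCS-backed version would download the object where this sketch copies a local file.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def open_via_tempfile(path, mode="rb", fetch=shutil.copyfile):
    """Copy path to a temporary file, yield it opened, then delete the copy.

    `fetch(source, destination)` is a hypothetical hook standing in for a download.
    """
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)  # only the path is needed; the file is reopened below
    try:
        fetch(path, tmp_path)
        with open(tmp_path, mode) as handle:
            yield handle
    finally:
        os.remove(tmp_path)


# Usage with a local file standing in for a remote object.
source = tempfile.NamedTemporaryFile(delete=False)
source.write(b"payload")
source.close()

with open_via_tempfile(source.name) as handle:
    data = handle.read()
    tmp_name = handle.name

os.remove(source.name)
```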

rm(self, paths, recursive=False)[source]

Remove the files at paths

Parameters:
paths : list of str

The paths to remove

recursive : bool, default False

If true, recursively remove any directories

store(*args, **kwds)[source]

Create file stores that will be written to the filesystem on close

This allows for optimizations when storing several files

Parameters:
bucket : str

The path of the bucket (on GCS) or folder (local) to store the data in

files : list of str

The filenames to create

Returns:
datafiles : contextmanager

A context manager that yields datafiles; when the context is closed, they are written to the bucket or folder
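
The write-on-close pattern can be sketched as follows: handles point at staged temporary files, and the staged files are transferred only when the block exits cleanly. Staging through a temporary directory and the upload hook are assumptions made for illustration; the real method presumably hands the transfer to gsutil.

```python
import os
import shutil
import tempfile
from collections import namedtuple
from contextlib import contextmanager

DataFile = namedtuple("DataFile", ["path", "handle"])  # stand-in tuple


@contextmanager
def store(bucket, files, upload=shutil.copyfile):
    """Yield writable DataFiles; transfer them to bucket when the block ends.

    `upload(source, destination)` is a hypothetical hook standing in for the transfer.
    """
    staging = tempfile.mkdtemp()
    staged = [os.path.join(staging, name) for name in files]
    handles = [open(path, "wb") for path in staged]
    try:
        yield [DataFile(os.path.join(bucket, name), handle)
               for name, handle in zip(files, handles)]
        # Reached only when the block exits without raising.
        for handle in handles:
            handle.close()
        for name, path in zip(files, staged):
            upload(path, os.path.join(bucket, name))
    finally:
        for handle in handles:
            handle.close()  # closing an already closed file is harmless
        shutil.rmtree(staging)


destination = tempfile.mkdtemp()
with store(destination, ["x.bin"]) as datafiles:
    datafiles[0].handle.write(b"abc")
    present_during_block = os.path.exists(os.path.join(destination, "x.bin"))
```

Deferring the transfer until the block ends is what lets several files be sent in one batch instead of one call per file.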

class blocks.filesystem.GCSNativeFileSystem(*args, **kwargs)[source]

Bases: blocks.filesystem.GCSFileSystem

File system interface that supports GCS and local files

This uses the native Python cloud storage library for reads and writes, rather than gsutil. Performance is significantly slower for operations over several files (especially copy), but the implementation is thread-safe for applications that are already parallelized. It stores the files entirely in memory rather than using tempfiles.

Methods

access(self, paths) Access multiple paths as file-like objects
cp(self, sources, dest[, recursive]) Copy the files in sources (recursively) to dest
local(self, path) Check if the path is available as a local file
ls(self, path) List all files at the specified path; supports globbing
open(*args, **kwds) Access path as a file-like object
rm(self, paths[, recursive]) Remove the files at paths
store(*args, **kwds) Create file stores that will be written to the filesystem on close
client  
copy_single  
is_dir  
rm_single  
access(self, paths)[source]

Access multiple paths as file-like objects

This allows for optimizations such as parallel downloads. To help track which files came from which objects, this returns instances of DataFile

Parameters:
paths: list of str

The paths of the files to access

Returns:
files: list of DataFile

A list of datafile instances, one for each input path

client(self)[source]
copy_single(self, source, dest)[source]
cp(self, sources, dest, recursive=False)[source]

Copy the files in sources (recursively) to dest

Parameters:
sources : list of str

The list of paths to copy, which can be directories

dest : str

The destination for the copy of source(s)

recursive : bool, default False

If true, recursively copy directories

is_dir(self, path)[source]
ls(self, path)[source]

List all files at the specified path; supports globbing

open(*args, **kwds)[source]

Access path as a file-like object

Parameters:
path: str

The path of the file to access

mode: str

The file mode for the opened file

Returns:
file: BytesIO

A BytesIO handle for the specified path, works like a file object
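
To show what working with an in-memory handle looks like, the sketch below wraps an object's bytes in a BytesIO. The open_in_memory function and its read_bytes callable are hypothetical stand-ins for the cloud storage client, and the dictionary holds made-up data in place of a bucket.

```python
import io


def open_in_memory(path, read_bytes):
    """Return a BytesIO holding the whole object at path.

    `read_bytes(path)` is a hypothetical callable that returns the object's bytes.
    """
    return io.BytesIO(read_bytes(path))


# Made-up data standing in for objects stored in a bucket.
fake_objects = {"gs://bucket/a.csv": b"a,b\n1,2\n"}

handle = open_in_memory("gs://bucket/a.csv", fake_objects.__getitem__)
header = handle.readline()
handle.seek(0)  # BytesIO supports seeking like a regular file
everything = handle.read()
```

Because the whole object sits in memory, no temporary file needs cleaning up, at the cost of holding every opened file in RAM.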

rm(self, paths, recursive=False)[source]

Remove the files at paths

Parameters:
paths : list of str

The paths to remove

recursive : bool, default False

If true, recursively remove any directories

rm_single(self, path)[source]
store(*args, **kwds)[source]

Create file stores that will be written to the filesystem on close

This allows for optimizations when storing several files

Parameters:
bucket : str

The path of the bucket (on GCS) or folder (local) to store the data in

files : list of str

The filenames to create

Returns:
datafiles : contextmanager

A context manager that yields datafiles; when the context is closed, they are written to the bucket or folder