Filesystem
-
class blocks.filesystem.DataFile
Bases: tuple
Attributes: handle, path
Methods
count(): Return the number of occurrences of a value
index(): Return the first index of a value. Raises ValueError if the value is not present.
-
handle
Alias for field number 1
-
path
Alias for field number 0
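The field order above (path at index 0, handle at index 1) can be illustrated with a standard-library namedtuple. This is a minimal sketch that mirrors the documented layout; the definition below is a stand-in written for illustration, not the library's own code, and the example path is made up.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for blocks.filesystem.DataFile, mirroring the
# documented field order: path is field 0, handle is field 1.
DataFile = namedtuple("DataFile", ["path", "handle"])

datafile = DataFile(path="data/part.0.csv", handle=io.BytesIO(b"a,b\n1,2\n"))

# Fields are reachable by name or by position, and the tuple unpacks.
path, handle = datafile
first_line = handle.readline()
```

Because the object is a tuple, code that receives a list of these can unpack each one directly in a for loop.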
-
-
class blocks.filesystem.FileSystem
Bases: object
The required interface for any filesystem implementation
See GCSFileSystem for a full implementation. This FileSystem is intended to be extensible to support cloud file systems, encryption strategies, etc.
Methods
access(self, paths): Access multiple paths as file-like objects
ls(self, path): List files corresponding to path, including glob wildcards
store(self, bucket, files): Store multiple data objects
-
access(self, paths)
Access multiple paths as file-like objects
This allows for optimizations such as parallel downloads
Parameters:
- paths : list of str
The paths of the files to access
Returns:
- files : list of DataFile
A list of DataFile instances, one for each input path
-
ls(self, path)
List files corresponding to path, including glob wildcards
Parameters:
- path : str
The path to the file or directory to list; supports wildcards
-
store(self, bucket, files)
Store multiple data objects
This allows for optimizations when storing several files
Parameters:
- bucket : str
The GCS bucket to use to store the files
- files : list of str
The file names to store
Returns:
- datafiles : contextmanager
A context manager that yields DataFile instances and places them on the filesystem when the context exits
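To make the three-method interface concrete, here is a minimal sketch of a toy implementation backed by a dict. Everything in it is an assumption made for illustration: InMemoryFileSystem is a hypothetical class, the FileSystem base and DataFile are redefined locally so the example runs without the library, and fnmatch is used as a simple stand-in for glob matching (its `*` also matches `/`).

```python
import fnmatch
import io
from collections import namedtuple
from contextlib import contextmanager

# Hypothetical stand-ins: the real base class and DataFile live in
# blocks.filesystem; they are redefined here so the sketch runs alone.
DataFile = namedtuple("DataFile", ["path", "handle"])


class FileSystem:
    def ls(self, path):
        raise NotImplementedError

    def access(self, paths):
        raise NotImplementedError

    def store(self, bucket, files):
        raise NotImplementedError


class InMemoryFileSystem(FileSystem):
    """Toy implementation that keeps file contents in a dict."""

    def __init__(self):
        self.blobs = {}  # maps path -> bytes

    def ls(self, path):
        # fnmatch gives shell-style wildcard matching on the stored keys
        return sorted(p for p in self.blobs if fnmatch.fnmatch(p, path))

    def access(self, paths):
        # One DataFile per input path, in the same order
        return [DataFile(p, io.BytesIO(self.blobs[p])) for p in paths]

    @contextmanager
    def store(self, bucket, files):
        datafiles = [DataFile(f"{bucket}/{name}", io.BytesIO()) for name in files]
        yield datafiles
        # Contents are placed on the "filesystem" only when the block exits
        for d in datafiles:
            self.blobs[d.path] = d.handle.getvalue()


fs = InMemoryFileSystem()
with fs.store("bucket", ["a.csv", "b.csv"]) as datafiles:
    for d in datafiles:
        d.handle.write(d.path.encode())

listed = fs.ls("bucket/*.csv")
contents = [d.handle.read() for d in fs.access(listed)]
```

The same shape could wrap a cloud client or an encryption layer, since callers only rely on ls, access and store.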
-
-
class blocks.filesystem.GCSFileSystem(parallel=True, quiet=True)
Bases: blocks.filesystem.FileSystem
File system interface that supports both local and GCS files
This implementation uses subprocess and gsutil, which has excellent performance. However, this can lead to problems in heavily multi-threaded applications and might not be as portable. For a native Python implementation, use GCSNativeFileSystem.
Methods
access(self, paths): Access multiple paths as file-like objects
cp(self, sources, dest[, recursive]): Copy the files in sources to dest
local(self, path): Check if the path is available as a local file
ls(self, path): List files corresponding to path, including glob wildcards
open(*args, **kwds): Access path as a file-like object
rm(self, paths[, recursive]): Remove the files at paths
store(*args, **kwds): Create file stores that will be written to the filesystem on close
-
GCS = 'gs://'
-
access(self, paths)
Access multiple paths as file-like objects
This allows for optimizations such as parallel downloads
Parameters:
- paths : list of str
The paths of the files to access
Returns:
- files : list of DataFile
A list of DataFile instances, one for each input path
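The idea of accessing several paths at once, with one result per input path, can be sketched with a thread pool. This is not how GCSFileSystem works internally (the class description says it uses subprocess and gsutil); the thread pool is a swapped-in technique, the free function access and its max_workers parameter are hypothetical, and local files stand in for remote objects.

```python
import os
import tempfile
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

DataFile = namedtuple("DataFile", ["path", "handle"])  # hypothetical stand-in


def access(paths, max_workers=4):
    """Open every path concurrently and return DataFiles in input order."""

    def fetch(path):
        # A remote implementation would download here; this sketch only
        # opens a local file.
        return DataFile(path, open(path, "rb"))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of its input iterable
        return list(pool.map(fetch, paths))


# Usage with two throwaway local files
tmpdir = tempfile.mkdtemp()
paths = []
for name in ("a.txt", "b.txt"):
    path = os.path.join(tmpdir, name)
    with open(path, "wb") as f:
        f.write(name.encode())
    paths.append(path)

datafiles = access(paths)
contents = [d.handle.read() for d in datafiles]
for d in datafiles:
    d.handle.close()
```

Returning the path alongside each handle is what lets a caller tell which contents came from which input.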
-
cp(self, sources, dest, recursive=False)
Copy the files in sources to dest
Parameters:
- sources : list of str
The list of paths to copy
- dest : str
The destination for the copy of source(s)
- recursive : bool, default False
If true, recursively copy any directories
-
ls(self, path)
List files corresponding to path, including glob wildcards
Parameters:
- path : str
The path to the file or directory to list; supports wildcards
-
open(*args, **kwds)
Access path as a file-like object
Parameters:
- path : str
The path of the file to access
- mode : str
The file mode for the opened file
Returns:
- file : file
A Python file opened on the provided path (uses a local temporary copy that is removed)
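The "local temporary copy that is removed" behaviour can be sketched as a context manager. This is an assumption about the general pattern, not the library's code: open_via_temp_copy is a hypothetical name, shutil.copyfile stands in for the download step, and only read modes are covered.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def open_via_temp_copy(path, mode="rb"):
    """Yield a file opened on a temporary copy of path, then delete the copy."""
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Stand-in for a download: a remote backend would fetch the object here
        shutil.copyfile(path, tmp_path)
        with open(tmp_path, mode) as f:
            yield f
    finally:
        # The copy is removed whether or not the caller's block raised
        os.remove(tmp_path)


# Usage: create a source file, read it through the temporary copy
src_fd, src_path = tempfile.mkstemp()
with os.fdopen(src_fd, "wb") as src:
    src.write(b"hello")

with open_via_temp_copy(src_path) as f:
    copy_path = f.name
    data = f.read()

copy_removed = not os.path.exists(copy_path)
os.remove(src_path)
```

Cleaning up in a finally clause means the temporary file does not outlive the with block even when reading fails.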
-
rm(self, paths, recursive=False)
Remove the files at paths
Parameters:
- paths : list of str
The paths to remove
- recursive : bool, default False
If true, recursively remove any directories
-
store(*args, **kwds)
Create file stores that will be written to the filesystem on close
This allows for optimizations when storing several files
Parameters:
- bucket : str
The path of the bucket (on GCS) or folder (local) to store the data in
- files : list of str
The filenames to create
Returns:
- datafiles : contextmanager
A context manager that yields DataFile instances; when the context is closed, they are written to the filesystem
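The write-on-close behaviour described for store can be sketched for the local-folder case. This is a minimal illustration under stated assumptions: the free function store, the DataFile stand-in, and the use of temporary files copied into place on exit are all hypothetical choices, not the library's implementation.

```python
import os
import shutil
import tempfile
from collections import namedtuple
from contextlib import contextmanager

DataFile = namedtuple("DataFile", ["path", "handle"])  # hypothetical stand-in


@contextmanager
def store(bucket, files):
    """Yield writable DataFiles; copy them into bucket when the block exits."""
    datafiles = [
        DataFile(os.path.join(bucket, name), tempfile.NamedTemporaryFile(delete=False))
        for name in files
    ]
    try:
        yield datafiles
        os.makedirs(bucket, exist_ok=True)
        for d in datafiles:
            d.handle.close()  # flush buffered writes before copying
            shutil.copyfile(d.handle.name, d.path)
    finally:
        for d in datafiles:
            d.handle.close()  # closing twice is harmless
            os.remove(d.handle.name)


# Usage: nothing exists at the destination until the block exits
bucket = os.path.join(tempfile.mkdtemp(), "out")
with store(bucket, ["a.bin", "b.bin"]) as datafiles:
    exists_inside = os.path.exists(datafiles[0].path)
    for d in datafiles:
        d.handle.write(b"payload")

stored = sorted(os.listdir(bucket))
with open(os.path.join(bucket, "a.bin"), "rb") as f:
    stored_bytes = f.read()
```

Deferring the copy to the end of the block is what would allow a real backend to upload all files in one batch.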
-
-
class blocks.filesystem.GCSNativeFileSystem(*args, **kwargs)
Bases: blocks.filesystem.GCSFileSystem
File system interface that supports GCS and local files
This uses the native Python cloud storage library for reads and writes, rather than gsutil. It is significantly slower for operations over several files (especially copy), but it is thread-safe for applications that are already parallelized. It stores files entirely in memory rather than using temporary files.
Methods
access(self, paths): Access multiple paths as file-like objects
cp(self, sources, dest[, recursive]): Copy the files in sources (recursively) to dest
local(self, path): Check if the path is available as a local file
ls(self, path): List all files at the specified path; supports globbing
open(*args, **kwds): Access path as a file-like object
rm(self, paths[, recursive]): Remove the files at paths
store(*args, **kwds): Create file stores that will be written to the filesystem on close
client
copy_single
is_dir
rm_single
-
access(self, paths)
Access multiple paths as file-like objects
This allows for optimizations such as parallel downloads. To help track which files came from which objects, this returns instances of DataFile.
Parameters:
- paths : list of str
The paths of the files to access
Returns:
- files : list of DataFile
A list of DataFile instances, one for each input path
-
cp(self, sources, dest, recursive=False)
Copy the files in sources (recursively) to dest
Parameters:
- sources : list of str
The list of paths to copy, which can be directories
- dest : str
The destination for the copy of source(s)
- recursive : bool, default False
If true, recursively copy directories
-
open(*args, **kwds)
Access path as a file-like object
Parameters:
- path : str
The path of the file to access
- mode : str
The file mode for the opened file
Returns:
- file : BytesIO
A BytesIO handle for the specified path, which works like a file object
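Returning a BytesIO rather than a file on disk can be sketched as follows. This is an illustration only: open_in_memory is a hypothetical name, the objects dict stands in for a GCS bucket, the bucket and object names are made up, and writing the buffer back on exit is an assumed behaviour for write modes.

```python
import io
from contextlib import contextmanager

# Hypothetical in-memory object store standing in for a GCS bucket
objects = {"gs://bucket/data.bin": b"abc"}


@contextmanager
def open_in_memory(path, mode="rb"):
    """Yield a BytesIO for path; write modes save the buffer back on exit."""
    if "r" in mode:
        buffer = io.BytesIO(objects[path])
    else:
        buffer = io.BytesIO()
    yield buffer
    if "w" in mode:
        objects[path] = buffer.getvalue()


with open_in_memory("gs://bucket/data.bin") as f:
    read_back = f.read()

with open_in_memory("gs://bucket/new.bin", "wb") as f:
    f.write(b"xyz")
```

Keeping the whole object in memory avoids temporary files, at the cost of holding the full contents in RAM.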
-
rm(self, paths, recursive=False)
Remove the files at paths
Parameters:
- paths : list of str
The paths to remove
- recursive : bool, default False
If true, recursively remove any directories
-
store(*args, **kwds)
Create file stores that will be written to the filesystem on close
This allows for optimizations when storing several files
Parameters:
- bucket : str
The path of the bucket (on GCS) or folder (local) to store the data in
- files : list of str
The filenames to create
Returns:
- datafiles : contextmanager
A context manager that yields DataFile instances; when the context is closed, they are written to the filesystem
-