lephare.data_retrieval
This module provides functionality for downloading and managing data files using pooch.
Functions
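filter_files_by_prefix(file_path, target_prefixes) | Returns all lines in a file that contain any of the target prefixes.
download_registry_from_github(url, outfile) | Fetch the contents of a file from a GitHub repository.
read_list_file(list_file, prefix) | Reads file names from a list file and returns a list of file paths.
| Create a retriever with the default settings.
make_retriever(base_url, registry_file, data_path) | Create a retriever for downloading files.
download_file(retriever, file_name, ignore_registry, downloader) | Download a file using the retriever, optionally ignoring the registry.
download_all_files(retriever, file_names, ignore_registry, retry) | Download all files in the given list using the retriever.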
Module Contents
- filter_files_by_prefix(file_path, target_prefixes)
Returns all lines in a file that contain any of the target prefixes.
- Parameters:
file_path (str) – The path to the file.
target_prefixes (list) – A list of target prefixes to check for in each line.
- Returns:
A list of lines that contain one of the target prefixes.
- Return type:
list
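The filtering described above can be sketched in plain Python. This is a stand-alone illustration, not the library's implementation: the helper name `filter_lines_by_prefix`, the substring test, and the stripping of trailing newlines are all assumptions based on the wording "contain any of the target prefixes".

```python
import os
import tempfile


def filter_lines_by_prefix(file_path, target_prefixes):
    """Return the lines of file_path that contain any of target_prefixes.

    Hypothetical stand-in for illustration only.
    """
    with open(file_path, encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle]
    # Keep a line if at least one prefix occurs anywhere in it.
    return [line for line in lines if any(p in line for p in target_prefixes)]


# Build a small registry-like file to filter (made-up names and hashes).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("sed/STAR/a.sed 111\nfilt/hsc/g.pb 222\nopa/tau.out 333\n")
    demo_path = tmp.name

matches = filter_lines_by_prefix(demo_path, ["sed/", "filt/"])
os.remove(demo_path)
print(matches)
```

Passing directory-like prefixes such as "sed/" is one plausible way to select the registry entries for a subset of the data.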
- download_registry_from_github(url='', outfile='')
Fetch the contents of a file from a GitHub repository.
- Parameters:
url (str) – The URL of the registry file. Defaults to a “data_registry.txt” file at DEFAULT_BASE_DATA_URL.
outfile (str) – The path where the file will be saved. Defaults to DEFAULT_REGISTRY_FILE.
- Raises:
Exception – If there is any problem fetching the registry hash file or full registry file, including network issues, server errors, or other HTTP errors.
- read_list_file(list_file, prefix='')
Reads file names from a list file and returns a list of file paths.
- Parameters:
list_file (str) – The name of the file containing the list of filenames. Can be local or a URL.
prefix (str) –
Optional prefix to add to all file names. When downloaded, file paths must be relative to the “base url,” which is the top-level directory.
Prefixes will be inferred from list_file paths or URLs that contain “sed” or “filt”; otherwise, they should be specified manually.
- Returns:
A list of file paths read from the list file.
- Return type:
list of str
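The prefix handling described above might look roughly like the following sketch. It handles local files only, and the helper names `infer_prefix` and `read_names`, the exact inference rule, and the "/" join are assumptions made for illustration, not the library's code.

```python
import os
import tempfile


def infer_prefix(list_file):
    """Guess a prefix from the list file path (assumed rule, illustration only)."""
    if "sed" in list_file:
        return "sed"
    if "filt" in list_file:
        return "filt"
    return ""


def read_names(list_file, prefix=""):
    """Read one file name per line and prepend the prefix, if there is one."""
    # An explicit prefix wins; otherwise fall back to inference from the path.
    prefix = prefix or infer_prefix(list_file)
    with open(list_file, encoding="utf-8") as handle:
        names = [line.strip() for line in handle if line.strip()]
    return [f"{prefix}/{name}" if prefix else name for name in names]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Place the list under a "sed" directory so the prefix can be inferred.
    sed_dir = os.path.join(tmp_dir, "sed")
    os.makedirs(sed_dir)
    list_path = os.path.join(sed_dir, "demo.list")
    with open(list_path, "w", encoding="utf-8") as handle:
        handle.write("STAR/a.sed\nSTAR/b.sed\n")

    inferred = read_names(list_path)
    explicit = read_names(list_path, prefix="examples")

print(inferred)
print(explicit)
```

The resulting paths are relative to the top-level directory, which is the form needed when they are later resolved against the base URL.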
- make_retriever(base_url=DEFAULT_BASE_DATA_URL, registry_file=DEFAULT_REGISTRY_FILE, data_path=DEFAULT_LOCAL_DATA_PATH)
Create a retriever for downloading files.
- Parameters:
base_url (str, optional) – The base URL for the data files.
registry_file (str, optional) – The path to the registry file that lists the files and their hashes.
data_path (str, optional) – The local path where the files will be downloaded.
- Returns:
The retriever object for downloading files.
- Return type:
pooch.Pooch
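To make the role of the registry file concrete, the sketch below builds and parses a small registry in the plain-text layout that pooch reads: one file per line, with the file name followed by its hash. The file names and contents are invented for the example, and `make_registry_line` and `parse_registry` are hypothetical helpers written with the standard library only; they do not call pooch or this module.

```python
import hashlib


def make_registry_line(file_name, content):
    """Format one registry entry: '<file name> <sha256 hex digest>'."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{file_name} {digest}"


def parse_registry(text):
    """Map each file name to its hash, skipping blank and comment lines."""
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first two columns (name and hash) are used here.
        file_name, file_hash = line.split()[:2]
        registry[file_name] = file_hash
    return registry


registry_text = "\n".join(
    [
        "# demo registry",
        make_registry_line("filt/demo.pb", b"demo filter"),
        make_registry_line("sed/demo.sed", b"demo sed"),
    ]
)
registry = parse_registry(registry_text)
print(sorted(registry))
```

Each file name in the registry is relative to the base URL, so the same name identifies both the remote file and its location under the local data path.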
- download_file(retriever, file_name, ignore_registry=False, downloader=None)
Download a file using the retriever, optionally ignoring the registry.
- Parameters:
retriever (pooch.Pooch) – The retriever object for downloading files.
file_name (str) – The name of the file to download.
ignore_registry (bool) – If True, download the file without checking its hash against the registry.
downloader (pooch.HTTPDownloader) – The downloader to use. It is required to set the user when building on Read the Docs.
- Returns:
The path to the downloaded file.
- Return type:
str
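The effect of ignore_registry can be illustrated with a small hash check. This is a sketch of the general idea only, assuming SHA-256 hashes; `file_sha256` and `check_file` are hypothetical helpers, not functions of this module or of pooch.

```python
import hashlib
import os
import tempfile


def file_sha256(path):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path, expected_hash, ignore_registry=False):
    """Return True if the file is accepted under the given policy."""
    if ignore_registry:
        # Skip verification entirely, mirroring ignore_registry=True.
        return True
    return file_sha256(path) == expected_hash


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"payload")
    demo_path = tmp.name

good_hash = hashlib.sha256(b"payload").hexdigest()
accepted = check_file(demo_path, good_hash)
rejected = check_file(demo_path, "0" * 64)
forced = check_file(demo_path, "0" * 64, ignore_registry=True)
os.remove(demo_path)
print(accepted, rejected, forced)
```

Skipping the check is convenient when the registry is out of date, but it also removes the protection against corrupted or partial downloads.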
- download_all_files(retriever, file_names, ignore_registry=False, retry=MAX_RETRY_ATTEMPTS)
Download all files in the given list using the retriever.
- Parameters:
retriever (pooch.Pooch) – The retriever object for downloading files.
file_names (list of str) – List of file names to download.
ignore_registry (bool) – If True, download the files without checking their hashes against the registry.
retry (int) – Number of times to retry downloading a file if the first attempt fails.
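A retry loop of the kind the retry parameter suggests can be sketched as follows. The `fetch` callable stands in for the real download step, and the names `fetch_with_retry`, `fetch_all`, and `flaky_fetch` are hypothetical. The sketch assumes that retry counts additional attempts after the first one, which is one reading of the parameter description rather than a confirmed detail.

```python
def fetch_with_retry(fetch, file_name, retry=3):
    """Call fetch(file_name), retrying up to `retry` extra times on failure."""
    last_error = None
    for _attempt in range(retry + 1):
        try:
            return fetch(file_name)
        except Exception as error:
            # Remember the failure and try again if attempts remain.
            last_error = error
    raise RuntimeError(f"Could not fetch {file_name!r}") from last_error


def fetch_all(fetch, file_names, retry=3):
    """Fetch every file in order and return the list of results."""
    return [fetch_with_retry(fetch, name, retry=retry) for name in file_names]


calls = {}


def flaky_fetch(file_name):
    """Simulated download that fails on the first call for each file."""
    calls[file_name] = calls.get(file_name, 0) + 1
    if calls[file_name] == 1:
        raise ConnectionError("simulated network failure")
    return f"/tmp/cache/{file_name}"


paths = fetch_all(flaky_fetch, ["a.sed", "b.sed"], retry=2)
print(paths)
```

In this sketch a file that still fails after the last attempt raises an error and stops the loop; collecting failures and continuing with the remaining files would be an equally reasonable design.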