Included Databases

International Soundscape Database (ISD)

Module for handling the International Soundscape Database (ISD).

This module provides functions for loading, validating, and analyzing data from the International Soundscape Database. It includes utilities for data retrieval, quality checks, and basic analysis operations.

Notes

The ISD is a large-scale database of soundscape surveys and recordings collected across multiple cities. This module is designed to work with the specific structure and content of the ISD.

Examples:

>>> import pandas as pd
>>> import soundscapy.databases.isd as isd
>>> df = isd.load()
>>> isinstance(df, pd.DataFrame)
True
>>> 'PAQ1' in df.columns
True

load

load()

Load the example "ISD" CSV file into a DataFrame.

RETURNS DESCRIPTION
DataFrame

DataFrame containing ISD data.

Notes

This function loads the ISD data from a local CSV file included with the soundscapy package.
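
Loading a data file that ships inside a package is usually done through importlib.resources rather than a hard-coded path. The following is a minimal, self-contained sketch of that pattern. It builds a throwaway package called demo_pkg with a file example.csv (both names are hypothetical and exist only for this illustration) and reads the file with the standard csv module; soundscapy itself passes the resolved path to pd.read_csv.

```python
import csv
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a throwaway package ("demo_pkg" is a hypothetical name) holding one CSV file.
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "example.csv").write_text("PAQ1,PAQ2\n3,4\n5,1\n")

# Make the new package importable for this session.
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

# Locate the bundled file through the package instead of a fixed filesystem path.
csv_resource = resources.files("demo_pkg").joinpath("example.csv")

# as_file() yields a real filesystem path for the duration of the with-block.
with resources.as_file(csv_resource) as csv_path:
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))

print(rows)
```

Resolving the file through the package means the lookup does not depend on the current working directory or on where the package was installed.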

References

Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M., Lionello, M., & Kang, J. (2022). The International Soundscape Database: An integrated multimedia database of urban soundscape surveys -- questionnaires with acoustical and contextual information (0.2.4) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6331810

Examples:

>>> import pandas as pd
>>> from soundscapy.databases.isd import load
>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load()
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True
Source code in soundscapy/databases/isd.py
def load() -> pd.DataFrame:
    """
    Load the example "ISD" csv file to a DataFrame.

    Returns
    -------
    pd.DataFrame
        DataFrame containing ISD data.

    Notes
    -----
    This function loads the ISD data from a local CSV file included
    with the soundscapy package.

    References
    ----------
    Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M.,
    Lionello, M., & Kang, J. (2022). The International Soundscape Database:
    An integrated multimedia database of urban soundscape surveys --
    questionnaires with acoustical and contextual information (0.2.4) [Data set].
    Zenodo. https://doi.org/10.5281/zenodo.6331810

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS
    >>> df = load()
    >>> isinstance(df, pd.DataFrame)
    True
    >>> set(PAQ_IDS).issubset(df.columns)
    True
    """
    isd_resource = resources.files("soundscapy.data").joinpath("ISD v1.0 Data.csv")
    with resources.as_file(isd_resource) as f:
        data = pd.read_csv(f)
    data = rename_paqs(data, _PAQ_ALIASES)
    logger.info("Loaded ISD data from Soundscapy's included CSV file.")
    return data

validate

validate(df, paq_aliases=_PAQ_ALIASES, allow_paq_na=False, val_range=(1, 5))

Perform data quality checks and validate that the dataset fits the expected format.

PARAMETER DESCRIPTION
df

ISD style dataframe, including PAQ data.

TYPE: DataFrame

paq_aliases

List of PAQ names (in order) or dict of PAQ names with new names as values.

TYPE: Union[List, Dict] DEFAULT: _PAQ_ALIASES

allow_paq_na

If True, allow NaN values in PAQ data, by default False.

TYPE: bool DEFAULT: False

val_range

Min and max range of the PAQ response values, by default (1, 5).

TYPE: Tuple[int, int] DEFAULT: (1, 5)
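
The two accepted forms of paq_aliases can be illustrated with a small sketch. The helper rename_to_paq_ids below is hypothetical and is not soundscapy's rename_paqs; the alias names "pleasant" and "vibrant" and the assumption that the target names run from PAQ1 to PAQ8 are made up for the example.

```python
import pandas as pd

# Assumed target names for this sketch: PAQ1 through PAQ8.
PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]


def rename_to_paq_ids(df, aliases):
    """Hypothetical helper: rename alias columns to PAQ-style names."""
    if isinstance(aliases, dict):
        # Dict form: existing column name -> new column name.
        mapping = dict(aliases)
    else:
        # List form: position in the list decides which PAQ name is assigned.
        mapping = dict(zip(aliases, PAQ_IDS))
    return df.rename(columns=mapping)


survey = pd.DataFrame({"pleasant": [4], "vibrant": [2], "site": ["A"]})

by_list = rename_to_paq_ids(survey, ["pleasant", "vibrant"])
by_dict = rename_to_paq_ids(survey, {"pleasant": "PAQ1", "vibrant": "PAQ2"})

print(list(by_list.columns))
print(list(by_dict.columns))
```

In this sketch both calls produce the same column names; columns that are not listed in the aliases, such as "site", are left untouched.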

RETURNS DESCRIPTION
Tuple[DataFrame, Optional[DataFrame]]

Tuple containing the cleaned dataframe and a dataframe of the excluded samples. The second element is None when no rows were excluded.

Notes

This function renames PAQ columns, checks PAQ data quality, and removes any rows that the quality check flags as invalid. Whether missing PAQ values count as invalid is controlled by allow_paq_na.
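
The split into kept and excluded rows can be sketched as follows. The function find_invalid_rows is a hypothetical stand-in for the library's likert_data_quality: it only checks for missing values and for values outside val_range, and the real check may apply further rules. It also assumes every column in the frame is a PAQ column.

```python
import numpy as np
import pandas as pd


def find_invalid_rows(df, allow_na=False, val_range=(1, 5)):
    """Hypothetical quality check: return positional indices of invalid rows."""
    low, high = val_range
    invalid = []
    for position, (_, row) in enumerate(df.iterrows()):
        has_na = row.isna().any()
        answered = row.dropna()
        out_of_range = ((answered < low) | (answered > high)).any()
        if out_of_range or (has_na and not allow_na):
            invalid.append(position)
    return invalid


def split_valid(df, allow_na=False, val_range=(1, 5)):
    """Return (cleaned, excluded); excluded is None when every row passes."""
    invalid = find_invalid_rows(df, allow_na, val_range)
    if invalid:
        excluded = df.iloc[invalid]
        cleaned = df.drop(df.index[invalid])
    else:
        excluded = None
        cleaned = df
    return cleaned, excluded


responses = pd.DataFrame({"PAQ1": [np.nan, 2, 3], "PAQ2": [3, 2, 6]})

# With missing values allowed, only the out-of-range row is excluded.
cleaned, excluded = split_valid(responses, allow_na=True)

# With the default setting, the row containing NaN is excluded as well.
strict_cleaned, strict_excluded = split_valid(responses)
```

Because the second element can be None, callers should check for that before inspecting the excluded rows.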

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> from soundscapy.databases.isd import validate
>>> df = pd.DataFrame({
...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
... })
>>> clean_df, excl_df = validate(df, allow_paq_na=True)
>>> clean_df.shape[0]
2
>>> excl_df.shape[0]
2
Source code in soundscapy/databases/isd.py
def validate(
    df: pd.DataFrame,
    paq_aliases: List | Dict = _PAQ_ALIASES,
    allow_paq_na: bool = False,
    val_range: Tuple[int, int] = (1, 5),
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    """
    Perform data quality checks and validate that the dataset fits the expected format.

    Parameters
    ----------
    df : pd.DataFrame
        ISD style dataframe, including PAQ data.
    paq_aliases : Union[List, Dict], optional
        List of PAQ names (in order) or dict of PAQ names with new names as values.
    allow_paq_na : bool, optional
        If True, allow NaN values in PAQ data, by default False.
    val_range : Tuple[int, int], optional
        Min and max range of the PAQ response values, by default (1, 5).

    Returns
    -------
    Tuple[pd.DataFrame, Optional[pd.DataFrame]]
        Tuple containing the cleaned dataframe and optionally a dataframe of excluded samples.

    Notes
    -----
    This function renames PAQ columns, checks PAQ data quality, and optionally
    removes rows with invalid or missing PAQ values.

    Examples
    --------
    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({
    ...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
    ...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
    ...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
    ... })
    >>> clean_df, excl_df = validate(df, allow_paq_na=True)
    >>> clean_df.shape[0]
    2
    >>> excl_df.shape[0]
    2
    """
    logger.info("Validating ISD data")
    df = rename_paqs(df, paq_aliases)

    invalid_indices = likert_data_quality(
        df, allow_na=allow_paq_na, val_range=val_range
    )

    if invalid_indices:
        excl_df = df.iloc[invalid_indices]
        df = df.drop(df.index[invalid_indices])
        logger.info(f"Removed {len(invalid_indices)} rows with invalid PAQ data")
    else:
        excl_df = None
        logger.info("All PAQ data passed quality checks")

    return df, excl_df

Soundscape Attributes Translation Project (SATP)

Module for handling the Soundscape Attributes Translation Project (SATP) database.

This module provides functions for loading and processing data from the Soundscape Attributes Translation Project database. It includes utilities for data retrieval from Zenodo and basic data loading operations.

Examples:

>>> import pandas as pd
>>> import soundscapy.databases.satp as satp
>>> df = satp.load_zenodo()
>>> isinstance(df, pd.DataFrame)
True
>>> 'Language' in df.columns
True
>>> participants = satp.load_participants()
>>> isinstance(participants, pd.DataFrame)
True
>>> 'Country' in participants.columns
True

load_participants

load_participants(version='latest')

Load the SATP participants dataset from Zenodo.

PARAMETER DESCRIPTION
version

Version of the dataset to load. The default is "latest".

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing the SATP participants dataset.
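
The loaders in this module resolve the version argument to a download URL through a private helper, _url_fetch, whose body is not shown on this page. As a rough sketch of how such a lookup might work, the example below maps version labels to URLs and treats "latest" as an alias. The function name resolve_url, the version labels, and the example.org URLs are all placeholders invented for the illustration, not real Zenodo records.

```python
# Hypothetical version table: labels and URLs are placeholders only.
EXAMPLE_URLS = {
    "v1.0": "https://example.org/satp/v1.0/data.xlsx",
    "v1.1": "https://example.org/satp/v1.1/data.xlsx",
}
LATEST = "v1.1"


def resolve_url(version: str = "latest") -> str:
    """Map a version label to its download URL, treating 'latest' as an alias."""
    key = LATEST if version.lower() == "latest" else version
    try:
        return EXAMPLE_URLS[key]
    except KeyError:
        known = ", ".join(sorted(EXAMPLE_URLS))
        raise ValueError(
            f"Unknown version {version!r}; expected 'latest' or one of: {known}"
        ) from None


print(resolve_url())
print(resolve_url("v1.0"))
```

Failing early with a clear error for an unknown version avoids a less helpful download error later in the load.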

Source code in soundscapy/databases/satp.py
def load_participants(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP participants dataset from Zenodo.

    Parameters
    ----------
    version : str, optional
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    pd.DataFrame
        DataFrame containing the SATP participants dataset.
    """
    url = _url_fetch(version)
    df = pd.read_excel(url, engine="openpyxl", sheet_name="Participants")
    df = df.drop(columns=["Unnamed: 3", "Unnamed: 4"])
    logger.info(f"Loaded SATP participants dataset version {version} from Zenodo")
    return df

load_zenodo

load_zenodo(version='latest')

Load the SATP dataset from Zenodo.

PARAMETER DESCRIPTION
version

Version of the dataset to load. The default is "latest".

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing the SATP dataset.

Source code in soundscapy/databases/satp.py
def load_zenodo(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP dataset from Zenodo.

    Parameters
    ----------
    version : str, optional
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    pd.DataFrame
        DataFrame containing the SATP dataset.
    """
    url = _url_fetch(version)
    df = pd.read_excel(url, engine="openpyxl", sheet_name="Main Merge")
    logger.info(f"Loaded SATP dataset version {version} from Zenodo")
    return df