Included Databases

International Soundscape Database (ISD)

Module for handling the International Soundscape Database (ISD).

This module provides functions for loading, validating, and analyzing data from the International Soundscape Database. It includes utilities for data retrieval, quality checks, and basic analysis operations.

Notes

The ISD is a large-scale database of soundscape surveys and recordings collected across multiple cities. This module is designed to work with the specific structure and content of the ISD.

Examples:

>>> import soundscapy.databases.isd as isd
>>> df = isd.load()
>>> isinstance(df, pd.DataFrame)
True
>>> 'PAQ1' in df.columns
True
Functions

describe_location: Return a summary of the data for a specific location.
likert_categorical_from_data: Get the Likert labels for a specific column in the DataFrame.
load: Load the example "ISD" csv file to a DataFrame.
load_zenodo: Automatically fetch and load the ISD dataset from Zenodo.
match_col_to_likert_scale: Match a column in the DataFrame to the Likert scale.
select_group_ids: Filter the dataframe by GroupID.
select_location_ids: Filter the dataframe by LocationID.
select_record_ids: Filter the dataframe by RecordID.
select_session_ids: Filter the dataframe by SessionID.
soundscapy_describe: Return a summary of the data grouped by a specified column.
validate: Perform data quality checks and validate that the dataset fits the expected format.

describe_location

describe_location(data, location, calc_type='percent', pl_threshold=0, ev_threshold=0)

Return a summary of the data for a specific location.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

location

Location to describe.

TYPE: str

calc_type

Type of summary, either "percent" or "count", by default "percent".

TYPE: str DEFAULT: 'percent'

pl_threshold

Pleasantness threshold, by default 0.

TYPE: float DEFAULT: 0

ev_threshold

Eventfulness threshold, by default 0.

TYPE: float DEFAULT: 0

RETURNS DESCRIPTION
Dict[str, Union[int, float]]

Summary of the data for the specified location.

Examples:

>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'PAQ1': [4, 2, 3, 5],
...     'PAQ2': [3, 5, 2, 4],
...     'PAQ3': [2, 4, 1, 3],
...     'PAQ4': [1, 3, 4, 2],
...     'PAQ5': [5, 1, 5, 1],
...     'PAQ6': [4, 2, 3, 5],
...     'PAQ7': [3, 5, 2, 4],
...     'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = describe_location(df, 'L1')
>>> set(result.keys()) == {
...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
...     'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result['count']
2
Source code in soundscapy/databases/isd.py
def describe_location(
    data: pd.DataFrame,
    location: str,
    calc_type: str = "percent",
    pl_threshold: float = 0,
    ev_threshold: float = 0,
) -> dict[str, int | float]:
    """
    Return a summary of the data for a specific location.

    Parameters
    ----------
    data : pd.DataFrame
        ISD dataframe.
    location : str
        Location to describe.
    calc_type : str, optional
        Type of summary, either "percent" or "count", by default "percent".
    pl_threshold : float, optional
        Pleasantness threshold, by default 0.
    ev_threshold : float, optional
        Eventfulness threshold, by default 0.

    Returns
    -------
    Dict[str, Union[int, float]]
        Summary of the data for the specified location.

    Examples
    --------
    >>> from soundscapy.surveys.processing import add_iso_coords
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'PAQ1': [4, 2, 3, 5],
    ...     'PAQ2': [3, 5, 2, 4],
    ...     'PAQ3': [2, 4, 1, 3],
    ...     'PAQ4': [1, 3, 4, 2],
    ...     'PAQ5': [5, 1, 5, 1],
    ...     'PAQ6': [4, 2, 3, 5],
    ...     'PAQ7': [3, 5, 2, 4],
    ...     'PAQ8': [2, 4, 1, 3],
    ... })
    >>> df = add_iso_coords(df)
    >>> result = describe_location(df, 'L1')
    >>> set(result.keys()) == {
    ...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
    ...     'vibrant', 'chaotic', 'monotonous', 'calm'
    ... }
    True
    >>> result['count']
    2

    """
    loc_df = select_location_ids(data, location_ids=location)
    count = len(loc_df)

    if "ISOPleasant" not in loc_df.columns or "ISOEventful" not in loc_df.columns:
        iso_pleasant, iso_eventful = calculate_iso_coords(loc_df)
        loc_df = loc_df.assign(ISOPleasant=iso_pleasant, ISOEventful=iso_eventful)

    pl_count = (loc_df["ISOPleasant"] > pl_threshold).sum()
    ev_count = (loc_df["ISOEventful"] > ev_threshold).sum()
    vibrant_count = (
        (loc_df["ISOPleasant"] > pl_threshold) & (loc_df["ISOEventful"] > ev_threshold)
    ).sum()
    chaotic_count = (
        (loc_df["ISOPleasant"] < pl_threshold) & (loc_df["ISOEventful"] > ev_threshold)
    ).sum()
    mono_count = (
        (loc_df["ISOPleasant"] < pl_threshold) & (loc_df["ISOEventful"] < ev_threshold)
    ).sum()
    calm_count = (
        (loc_df["ISOPleasant"] > pl_threshold) & (loc_df["ISOEventful"] < ev_threshold)
    ).sum()

    res = {
        "count": count,
        "ISOPleasant": loc_df["ISOPleasant"].mean(),
        "ISOEventful": loc_df["ISOEventful"].mean(),
    }

    if calc_type == "percent":
        res.update(
            {
                "pleasant": pl_count / count,
                "eventful": ev_count / count,
                "vibrant": vibrant_count / count,
                "chaotic": chaotic_count / count,
                "monotonous": mono_count / count,
                "calm": calm_count / count,
            }
        )
    elif calc_type == "count":
        res.update(
            {
                "pleasant": pl_count,
                "eventful": ev_count,
                "vibrant": vibrant_count,
                "chaotic": chaotic_count,
                "monotonous": mono_count,
                "calm": calm_count,
            }
        )
    else:
        msg = "Type must be either 'percent' or 'count'"
        raise ValueError(msg)

    return {k: round(v, 3) if isinstance(v, float) else v for k, v in res.items()}
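The quadrant tallies in `describe_location` follow directly from the ISOPleasant/ISOEventful signs relative to the thresholds. A minimal pandas-only sketch of the same logic, assuming a dataframe that already has `ISOPleasant` and `ISOEventful` columns (`quadrant_shares` is a hypothetical helper, not part of soundscapy):

```python
import pandas as pd

def quadrant_shares(
    df: pd.DataFrame, pl_threshold: float = 0, ev_threshold: float = 0
) -> dict[str, float]:
    """Share of responses in each soundscape quadrant (hypothetical helper)."""
    n = len(df)
    pl = df["ISOPleasant"]
    ev = df["ISOEventful"]
    # Mirror describe_location's strict inequalities: values exactly at a
    # threshold are not counted in any quadrant.
    return {
        "vibrant": ((pl > pl_threshold) & (ev > ev_threshold)).sum() / n,
        "chaotic": ((pl < pl_threshold) & (ev > ev_threshold)).sum() / n,
        "monotonous": ((pl < pl_threshold) & (ev < ev_threshold)).sum() / n,
        "calm": ((pl > pl_threshold) & (ev < ev_threshold)).sum() / n,
    }

df = pd.DataFrame(
    {"ISOPleasant": [0.5, -0.2, 0.3, -0.4], "ISOEventful": [0.1, 0.4, -0.3, -0.1]}
)
shares = quadrant_shares(df)  # one response per quadrant -> 0.25 each
```

With `calc_type="count"` the division by `n` is simply dropped, as in the function above.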

likert_categorical_from_data

likert_categorical_from_data(data)

Get the Likert labels for a specific column in the DataFrame.

PARAMETER DESCRIPTION
data

Series containing the data.

TYPE: Series

RETURNS DESCRIPTION
Categorical

Ordered Categorical with Likert labels.

RAISES DESCRIPTION
ValueError

If the column does not match any known Likert scale.

Source code in soundscapy/databases/isd.py
def likert_categorical_from_data(
    data: pd.Series,
) -> pd.Categorical:
    """
    Get the Likert labels for a specific column in the DataFrame.

    Parameters
    ----------
    data : pd.Series
        Series containing the data.

    Returns
    -------
    pd.Categorical
        Ordered Categorical with Likert labels.

    Raises
    ------
    ValueError
        If the column does not match any known Likert scale.

    """
    likert_scale = match_col_to_likert_scale(str(data.name))
    if isinstance(data.dtype, pd.CategoricalDtype):
        # A Series is never an instance of pd.Categorical; check the dtype instead
        return data

    data = data.astype("int") - 1  # Convert to zero-based index
    codes = data.to_list()

    return pd.Categorical.from_codes(
        codes,
        dtype=CategoricalDtype(categories=likert_scale, ordered=True),
    )
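The conversion above is plain pandas: 1-based survey answers become 0-based codes, which `pd.Categorical.from_codes` maps onto ordered labels. A self-contained sketch with hypothetical 5-point agreement labels (the actual labels live in soundscapy's `LIKERT_SCALES` and may differ):

```python
import pandas as pd
from pandas import CategoricalDtype

# Hypothetical agreement labels; soundscapy's real scales may use other wording.
labels = [
    "Strongly disagree",
    "Somewhat disagree",
    "Neither agree nor disagree",
    "Somewhat agree",
    "Strongly agree",
]

responses = pd.Series([4, 2, 5, 1], name="PAQ1")
codes = (responses.astype("int") - 1).to_list()  # 1-based answers -> 0-based codes

likert = pd.Categorical.from_codes(
    codes, dtype=CategoricalDtype(categories=labels, ordered=True)
)
```

Because the dtype is ordered, the resulting values support comparisons and sorting by agreement level.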

load

load(locations=None)

Load the example "ISD" csv file to a DataFrame.

PARAMETER DESCRIPTION
locations

List of LocationID values to load. If None, all locations are loaded.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
DataFrame

DataFrame containing ISD data.

Notes

This function loads the ISD data from a local CSV file included with the soundscapy package.

References

Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M., Lionello, M., & Kang, J. (2022). The International Soundscape Database: An integrated multimedia database of urban soundscape surveys -- questionnaires with acoustical and contextual information (0.2.4) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6331810

Examples:

>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load()
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True
Source code in soundscapy/databases/isd.py
def load(locations: list[str] | None = None) -> pd.DataFrame:
    """
    Load the example "ISD" csv file to a DataFrame.

    Parameters
    ----------
    locations : list[str] | None, optional
        LocationID values to load. If None, all locations are loaded.

    Returns
    -------
    pd.DataFrame
        DataFrame containing ISD data.

    Notes
    -----
    This function loads the ISD data from a local CSV file included
    with the soundscapy package.

    References
    ----------
    Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M.,
    Lionello, M., & Kang, J. (2022). The International Soundscape Database:
    An integrated multimedia database of urban soundscape surveys --
    questionnaires with acoustical and contextual information (0.2.4) [Data set].
    Zenodo. https://doi.org/10.5281/zenodo.6331810

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS
    >>> df = load()
    >>> isinstance(df, pd.DataFrame)
    True
    >>> set(PAQ_IDS).issubset(df.columns)
    True

    """
    isd_resource = resources.files("soundscapy.data").joinpath("ISD v1.0 Data.csv")
    with resources.as_file(isd_resource) as f:
        data = pd.read_csv(f)
    data = rename_paqs(data, _PAQ_ALIASES)
    logger.info("Loaded ISD data from Soundscapy's included CSV file.")

    if locations is not None:
        data = select_location_ids(data, locations)

    return data

load_zenodo

load_zenodo(version='latest')

Automatically fetch and load the ISD dataset from Zenodo.

PARAMETER DESCRIPTION
version

Version number of the dataset to fetch, by default "latest".

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing ISD data.

RAISES DESCRIPTION
ValueError

If the specified version is not recognized.

Notes

This function fetches the ISD data directly from Zenodo, allowing access to different versions of the dataset.

Examples:

>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load_zenodo("v1.0.1")
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True
Source code in soundscapy/databases/isd.py
def load_zenodo(version: str = "latest") -> pd.DataFrame:
    """
    Automatically fetch and load the ISD dataset from Zenodo.

    Parameters
    ----------
    version : str, optional
        Version number of the dataset to fetch, by default "latest".

    Returns
    -------
    pd.DataFrame
        DataFrame containing ISD data.

    Raises
    ------
    ValueError
        If the specified version is not recognized.

    Notes
    -----
    This function fetches the ISD data directly from Zenodo, allowing
    access to different versions of the dataset.

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS  # doctest: +SKIP
    >>> df = load_zenodo("v1.0.1")  # doctest: +SKIP
    >>> isinstance(df, pd.DataFrame)  # doctest: +SKIP
    True
    >>> set(PAQ_IDS).issubset(df.columns)  # doctest: +SKIP
    True

    """
    version = version.lower()
    version = "v1.0.1" if version == "latest" else version

    url_mapping = {
        "v0.2.0": "https://zenodo.org/record/5578573/files/SSID%20Lockdown%20Database%20VL0.2.1.xlsx",
        "v0.2.1": "https://zenodo.org/record/5578573/files/SSID%20Lockdown%20Database%20VL0.2.1.xlsx",
        "v0.2.2": "https://zenodo.org/record/5705908/files/SSID%20Lockdown%20Database%20VL0.2.2.xlsx",
        "v0.2.3": "https://zenodo.org/record/5914762/files/SSID%20Lockdown%20Database%20VL0.2.2.xlsx",
        "v1.0.0": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
        "v1.0.1": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
    }

    if version not in url_mapping:
        msg = f"Version {version} not recognised."
        raise ValueError(msg)

    url = url_mapping[version]
    file_type = "csv" if version in ["v1.0.0", "v1.0.1"] else "excel"

    data = (
        pd.read_csv(url)
        if file_type == "csv"
        else pd.read_excel(url, engine="openpyxl")
    )
    data = rename_paqs(data, _PAQ_ALIASES)

    logger.info(f"Loaded ISD data version {version} from Zenodo")
    return data
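The reader choice in `load_zenodo` hinges on the file type behind each version's URL. The same dispatch can be sketched without touching the network (`pick_reader` is illustrative, not a soundscapy function):

```python
import pandas as pd

def pick_reader(url: str):
    """Return the pandas reader matching the URL's file extension (illustrative only)."""
    lowered = url.lower()
    if lowered.endswith(".csv"):
        return pd.read_csv
    if lowered.endswith((".xlsx", ".xls")):
        # Older ISD releases were distributed as Excel workbooks
        return lambda u: pd.read_excel(u, engine="openpyxl")
    msg = f"Unsupported file type for {url}"
    raise ValueError(msg)

reader = pick_reader(
    "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv"
)
```

Keying the decision on the URL rather than a hard-coded version list keeps the mapping table as the single source of truth when new releases are added.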

match_col_to_likert_scale

match_col_to_likert_scale(col)

Match a column in the DataFrame to the Likert scale.

PARAMETER DESCRIPTION
col

Column name to match.

TYPE: str

RETURNS DESCRIPTION
Scale

Likert scale object.

RAISES DESCRIPTION
ValueError

If the column does not match any known Likert scale.

Source code in soundscapy/databases/isd.py
def match_col_to_likert_scale(col: str | None) -> Scale:  # noqa: PLR0911
    """
    Match a column in the DataFrame to the Likert scale.

    Parameters
    ----------
    col : str
        Column name to match.

    Returns
    -------
    Scale
        Likert scale object.

    Raises
    ------
    ValueError
        If the column does not match any known Likert scale.

    """
    if col in PAQ_IDS or col in PAQ_LABELS:
        return LIKERT_SCALES.paq
    if col in ["traffic_noise", "other_noise", "human_sounds", "natural_sounds"]:
        return LIKERT_SCALES.source
    if col in ["overall_sound_environment"]:
        return LIKERT_SCALES.overall
    if col in ["appropriate"]:
        return LIKERT_SCALES.appropriate
    if col in ["perceived_loud"]:
        return LIKERT_SCALES.loud
    if col in ["visit_often"]:
        return LIKERT_SCALES.often
    if col in ["like_to_visit"]:
        return LIKERT_SCALES.visit

    msg = f"Column {col} does not match any known Likert scale."
    raise ValueError(msg)
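The if-chain above is a fixed lookup from column name to scale. The same dispatch can be sketched as a mapping table; the scale names below are stand-ins for the `Scale` objects soundscapy's `LIKERT_SCALES` actually returns, and the PAQ branch is omitted for brevity:

```python
# Stand-in scale names; the real function returns Scale objects and also
# matches PAQ_IDS / PAQ_LABELS columns to the PAQ scale.
SOURCE_COLS = {"traffic_noise", "other_noise", "human_sounds", "natural_sounds"}
SCALE_MAP = {
    "overall_sound_environment": "overall",
    "appropriate": "appropriate",
    "perceived_loud": "loud",
    "visit_often": "often",
    "like_to_visit": "visit",
}

def match_scale(col: str) -> str:
    """Map a survey column name to its Likert scale name (illustrative only)."""
    if col in SOURCE_COLS:
        return "source"
    try:
        return SCALE_MAP[col]
    except KeyError:
        msg = f"Column {col} does not match any known Likert scale."
        raise ValueError(msg) from None
```

A dict keeps the one-to-one cases in data rather than control flow, so adding a new column only touches the table.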

select_group_ids

select_group_ids(data, group_ids)

Filter the dataframe by GroupID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

group_ids

GroupID(s) to filter by.

TYPE: Union[str, int, List, Tuple]

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'GroupID': ['G1', 'G1', 'G2', 'G2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_group_ids(df, 'G1')
  GroupID  Value
0      G1      1
1      G1      2
Source code in soundscapy/databases/isd.py
def select_group_ids(
    data: pd.DataFrame, group_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by GroupID.

    Parameters
    ----------
    data : pd.DataFrame
        ISD dataframe.
    group_ids : Union[str, int, List, Tuple]
        GroupID(s) to filter by.

    Returns
    -------
    pd.DataFrame
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'GroupID': ['G1', 'G1', 'G2', 'G2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_group_ids(df, 'G1')
      GroupID  Value
    0      G1      1
    1      G1      2

    """
    return _isd_select(data, "GroupID", group_ids)

select_location_ids

select_location_ids(data, location_ids)

Filter the dataframe by LocationID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

location_ids

LocationID(s) to filter by.

TYPE: Union[str, int, List, Tuple]

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_location_ids(df, 'L2')
  LocationID  Value
2         L2      3
3         L2      4
Source code in soundscapy/databases/isd.py
def select_location_ids(
    data: pd.DataFrame, location_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by LocationID.

    Parameters
    ----------
    data : pd.DataFrame
        ISD dataframe.
    location_ids : Union[str, int, List, Tuple]
        LocationID(s) to filter by.

    Returns
    -------
    pd.DataFrame
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_location_ids(df, 'L2')
      LocationID  Value
    2         L2      3
    3         L2      4

    """
    return _isd_select(data, "LocationID", location_ids)

select_record_ids

select_record_ids(data, record_ids)

Filter the dataframe by RecordID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

record_ids

RecordID(s) to filter by.

TYPE: Union[str, int, List, Tuple]

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'RecordID': ['A', 'B', 'C', 'D'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_record_ids(df, ['A', 'C'])
  RecordID  Value
0        A      1
2        C      3
Source code in soundscapy/databases/isd.py
def select_record_ids(
    data: pd.DataFrame, record_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by RecordID.

    Parameters
    ----------
    data : pd.DataFrame
        ISD dataframe.
    record_ids : Union[str, int, List, Tuple]
        RecordID(s) to filter by.

    Returns
    -------
    pd.DataFrame
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'RecordID': ['A', 'B', 'C', 'D'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_record_ids(df, ['A', 'C'])
      RecordID  Value
    0        A      1
    2        C      3

    """
    return _isd_select(data, "RecordID", record_ids)

select_session_ids

select_session_ids(data, session_ids)

Filter the dataframe by SessionID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

session_ids

SessionID(s) to filter by.

TYPE: Union[str, int, List, Tuple]

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'SessionID': ['S1', 'S1', 'S2', 'S2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_session_ids(df, ['S1', 'S2'])
  SessionID  Value
0        S1      1
1        S1      2
2        S2      3
3        S2      4
Source code in soundscapy/databases/isd.py
def select_session_ids(
    data: pd.DataFrame, session_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by SessionID.

    Parameters
    ----------
    data : pd.DataFrame
        ISD dataframe.
    session_ids : Union[str, int, List, Tuple]
        SessionID(s) to filter by.

    Returns
    -------
    pd.DataFrame
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'SessionID': ['S1', 'S1', 'S2', 'S2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_session_ids(df, ['S1', 'S2'])
      SessionID  Value
    0        S1      1
    1        S1      2
    2        S2      3
    3        S2      4

    """
    return _isd_select(data, "SessionID", session_ids)

soundscapy_describe

soundscapy_describe(df, group_by='LocationID', calc_type='percent')

Return a summary of the data grouped by a specified column.

PARAMETER DESCRIPTION
df

ISD dataframe.

TYPE: DataFrame

group_by

Column to group by, by default "LocationID".

TYPE: str DEFAULT: 'LocationID'

calc_type

Type of summary, either "percent" or "count", by default "percent".

TYPE: str DEFAULT: 'percent'

RETURNS DESCRIPTION
DataFrame

Summary of the data.

Examples:

>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'PAQ1': [4, 2, 3, 5],
...     'PAQ2': [3, 5, 2, 4],
...     'PAQ3': [2, 4, 1, 3],
...     'PAQ4': [1, 3, 4, 2],
...     'PAQ5': [5, 1, 5, 1],
...     'PAQ6': [4, 2, 3, 5],
...     'PAQ7': [3, 5, 2, 4],
...     'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = soundscapy_describe(df)
>>> isinstance(result, pd.DataFrame)
True
>>> result.index.tolist()
['L1', 'L2']
>>> set(result.columns) == {
...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
...     'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result = soundscapy_describe(df, calc_type="count")
>>> result.loc['L1', 'count']
2
Source code in soundscapy/databases/isd.py
def soundscapy_describe(
    df: pd.DataFrame, group_by: str = "LocationID", calc_type: str = "percent"
) -> pd.DataFrame:
    """
    Return a summary of the data grouped by a specified column.

    Parameters
    ----------
    df : pd.DataFrame
        ISD dataframe.
    group_by : str, optional
        Column to group by, by default "LocationID".
    calc_type : str, optional
        Type of summary, either "percent" or "count", by default "percent".

    Returns
    -------
    pd.DataFrame
        Summary of the data.

    Examples
    --------
    >>> from soundscapy.surveys.processing import add_iso_coords
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'PAQ1': [4, 2, 3, 5],
    ...     'PAQ2': [3, 5, 2, 4],
    ...     'PAQ3': [2, 4, 1, 3],
    ...     'PAQ4': [1, 3, 4, 2],
    ...     'PAQ5': [5, 1, 5, 1],
    ...     'PAQ6': [4, 2, 3, 5],
    ...     'PAQ7': [3, 5, 2, 4],
    ...     'PAQ8': [2, 4, 1, 3],
    ... })
    >>> df = add_iso_coords(df)
    >>> result = soundscapy_describe(df)
    >>> isinstance(result, pd.DataFrame)
    True
    >>> result.index.tolist()
    ['L1', 'L2']
    >>> set(result.columns) == {
    ...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
    ...     'vibrant', 'chaotic', 'monotonous', 'calm'
    ... }
    True
    >>> result = soundscapy_describe(df, calc_type="count")
    >>> result.loc['L1', 'count']
    2

    """
    res = {
        location: describe_location(df, location, calc_type=calc_type)
        for location in df[group_by].unique()
    }
    return pd.DataFrame.from_dict(res, orient="index")

validate

validate(df, paq_aliases=_PAQ_ALIASES, val_range=(1, 5), *, allow_paq_na=False)

Perform data quality checks and validate that the dataset fits the expected format.

PARAMETER DESCRIPTION
df

ISD style dataframe, including PAQ data.

TYPE: DataFrame

paq_aliases

List of PAQ names (in order) or dict of PAQ names with new names as values.

TYPE: Union[List, Dict] DEFAULT: _PAQ_ALIASES

allow_paq_na

If True, allow NaN values in PAQ data, by default False.

TYPE: bool DEFAULT: False

val_range

Min and max range of the PAQ response values, by default (1, 5).

TYPE: Tuple[int, int] DEFAULT: (1, 5)

RETURNS DESCRIPTION
Tuple[DataFrame, DataFrame | None]

Tuple containing the cleaned dataframe and optionally a dataframe of excluded samples.

Notes

This function renames PAQ columns, checks PAQ data quality, and optionally removes rows with invalid or missing PAQ values.

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
... })
>>> clean_df, excl_df = validate(df, allow_paq_na=True)
>>> clean_df.shape[0]
2
>>> excl_df.shape[0]
2
Source code in soundscapy/databases/isd.py
def validate(
    df: pd.DataFrame,
    paq_aliases: list | dict = _PAQ_ALIASES,
    val_range: tuple[int, int] = (1, 5),
    *,
    allow_paq_na: bool = False,
) -> tuple[pd.DataFrame, pd.DataFrame | None]:
    """
    Perform data quality checks and validate that the dataset fits the expected format.

    Parameters
    ----------
    df : pd.DataFrame
        ISD style dataframe, including PAQ data.
    paq_aliases : Union[List, Dict], optional
        List of PAQ names (in order) or dict of PAQ names with new names as values.
    allow_paq_na : bool, optional
        If True, allow NaN values in PAQ data, by default False.
    val_range : Tuple[int, int], optional
        Min and max range of the PAQ response values, by default (1, 5).

    Returns
    -------
    Tuple[pd.DataFrame, pd.DataFrame | None]
        Tuple containing the cleaned dataframe
        and optionally a dataframe of excluded samples.

    Notes
    -----
    This function renames PAQ columns, checks PAQ data quality, and optionally
    removes rows with invalid or missing PAQ values.

    Examples
    --------
    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({
    ...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
    ...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
    ...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
    ... })
    >>> clean_df, excl_df = validate(df, allow_paq_na=True)
    >>> clean_df.shape[0]
    2
    >>> excl_df.shape[0]
    2

    """
    logger.info("Validating ISD data")
    data = rename_paqs(df, paq_aliases)

    invalid_indices = likert_data_quality(
        data, val_range=val_range, allow_na=allow_paq_na
    )

    if invalid_indices:
        excl_data = data.iloc[invalid_indices]
        data = data.drop(data.index[invalid_indices])
        logger.info(f"Removed {len(invalid_indices)} rows with invalid PAQ data")
    else:
        excl_data = None
        logger.info("All PAQ data passed quality checks")

    return data, excl_data
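The core of the quality check is flagging rows whose PAQ responses fall outside the valid range, optionally treating missing values as invalid. A simplified pandas-only sketch (the real `likert_data_quality` may apply additional criteria; `invalid_paq_rows` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def invalid_paq_rows(
    df: pd.DataFrame,
    paq_cols: list[str],
    val_range: tuple[int, int] = (1, 5),
    allow_na: bool = False,
) -> list[int]:
    """Positional indices of rows with out-of-range PAQ values
    (and, unless allow_na, rows with any missing PAQ value)."""
    lo, hi = val_range
    paqs = df[paq_cols]
    # NaN compares False against both bounds, so it is not flagged here
    bad = (paqs < lo) | (paqs > hi)
    if not allow_na:
        bad = bad | paqs.isna()
    return list(np.flatnonzero(bad.any(axis=1)))

df = pd.DataFrame({
    "PAQ1": [np.nan, 2, 3, 3],
    "PAQ2": [3, 2, 6, 3],
})
rows = invalid_paq_rows(df, ["PAQ1", "PAQ2"], allow_na=True)  # only the 6 is flagged
```

Positional indices (rather than labels) match how `validate` drops rows via `data.index[invalid_indices]`.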


Soundscape Attributes Translation Project (SATP)

Module for handling the Soundscape Attributes Translation Project (SATP) database.

This module provides functions for loading and processing data from the Soundscape Attributes Translation Project database. It includes utilities for data retrieval from Zenodo and basic data loading operations.

Examples:

>>> import soundscapy.databases.satp as satp
>>> df = satp.load_zenodo()
>>> isinstance(df, pd.DataFrame)
True
>>> 'Language' in df.columns
True
>>> participants = satp.load_participants()
>>> isinstance(participants, pd.DataFrame)
True
>>> 'Country' in participants.columns
True
Functions

load_participants: Load the SATP participants dataset from Zenodo.
load_zenodo: Load the SATP dataset from Zenodo.

load_participants

load_participants(version='latest')

Load the SATP participants dataset from Zenodo.

PARAMETER DESCRIPTION
version

Version of the dataset to load. The default is "latest".

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing the SATP participants dataset.

Source code in soundscapy/databases/satp.py
def load_participants(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP participants dataset from Zenodo.

    Parameters
    ----------
    version : str, optional
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    pd.DataFrame
        DataFrame containing the SATP participants dataset.

    """
    url = _url_fetch(version)
    data = pd.read_excel(url, engine="openpyxl", sheet_name="Participants")
    data = data.drop(columns=["Unnamed: 3", "Unnamed: 4"])
    logger.info(f"Loaded SATP participants dataset version {version} from Zenodo")
    return data

load_zenodo

load_zenodo(version='latest')

Load the SATP dataset from Zenodo.

PARAMETER DESCRIPTION
version

Version of the dataset to load. The default is "latest".

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing the SATP dataset.

Source code in soundscapy/databases/satp.py
def load_zenodo(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP dataset from Zenodo.

    Parameters
    ----------
    version : str, optional
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    pd.DataFrame
        DataFrame containing the SATP dataset.

    """
    url = _url_fetch(version)
    data = pd.read_excel(url, engine="openpyxl", sheet_name="Main Merge")
    logger.info(f"Loaded SATP dataset version {version} from Zenodo")
    return data