
Survey Analysis

This section provides an overview of the survey instruments used in soundscape research. It includes a brief description of each instrument, as well as information on how to access and use them.

Soundscape survey data processing module.

This module contains functions for processing and analyzing soundscape survey data, including ISO coordinate calculations, data quality checks, and SSM metrics.

Notes

The functions in this module are designed to be fairly general and can be used with any dataset in a similar format to the ISD. The key to this is using a simple dataframe/sheet with the following columns:

Index columns: e.g. LocationID, RecordID, GroupID, SessionID
Perceptual attributes: PAQ1, PAQ2, ..., PAQ8
Independent variables: e.g. Laeq, N5, Sharpness, etc.

The key functions of this module clean and validate datasets, calculate ISO coordinate values or SSM metrics, and filter on index columns. Functions and operations that are specific to a particular dataset are located in their own modules under soundscapy.databases.

ISOCoordinates dataclass

ISOCoordinates(pleasant, eventful)

Dataclass for storing ISO coordinates.

SSMMetrics dataclass

SSMMetrics(amplitude, angle, elevation, displacement, r_squared)

Dataclass for storing Structural Summary Method (SSM) metrics.
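
As a rough illustration of what these two containers hold, the sketch below re-declares them with the standard library. The field names come from the signatures above. The float annotations and the table() helper (returning the five SSM values in declaration order, as the metrics.table() calls in the examples further down suggest) are assumptions, not the library's code.

```python
from dataclasses import astuple, dataclass


@dataclass
class ISOCoordinates:
    """Projected coordinates of one response (field types are assumed)."""

    pleasant: float
    eventful: float


@dataclass
class SSMMetrics:
    """SSM summary of one response (field types are assumed)."""

    amplitude: float
    angle: float
    elevation: float
    displacement: float
    r_squared: float

    def table(self) -> list:
        # Hypothetical helper: the five values in declaration order.
        return list(astuple(self))


coords = ISOCoordinates(pleasant=-0.03, eventful=-0.28)
metrics = SSMMetrics(0.68, 263.82, 10.57, -7.57, 0.15)
print(coords)
print(metrics.table())
```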

add_iso_coords

add_iso_coords(data, val_range=(1, 5), names=('ISOPleasant', 'ISOEventful'), overwrite=False, angles=EQUAL_ANGLES)

Calculate and add ISO coordinates as new columns in the DataFrame.

PARAMETER DESCRIPTION
data

Input DataFrame containing PAQ data

TYPE: DataFrame

val_range

(min, max) range of original PAQ responses, by default (1, 5)

TYPE: Tuple[int, int] DEFAULT: (1, 5)

names

Names for new coordinate columns, by default ("ISOPleasant", "ISOEventful")

TYPE: Tuple[str, str] DEFAULT: ('ISOPleasant', 'ISOEventful')

overwrite

Whether to overwrite existing ISO coordinate columns, by default False

TYPE: bool DEFAULT: False

angles

Angles for each PAQ in degrees, by default EQUAL_ANGLES

TYPE: Tuple[int, ...] DEFAULT: EQUAL_ANGLES

RETURNS DESCRIPTION
DataFrame

DataFrame with new ISO coordinate columns added

RAISES DESCRIPTION
Warning

If ISO coordinate columns already exist and overwrite is False

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'PAQ1': [4, 2], 'PAQ2': [3, 5], 'PAQ3': [2, 4], 'PAQ4': [1, 3],
...     'PAQ5': [5, 1], 'PAQ6': [3, 2], 'PAQ7': [4, 3], 'PAQ8': [2, 5]
... })
>>> df_with_iso = add_iso_coords(df)
>>> df_with_iso[['ISOPleasant', 'ISOEventful']].round(2)
   ISOPleasant  ISOEventful
0        -0.03        -0.28
1         0.47         0.18
Source code in soundscapy/surveys/processing.py
def add_iso_coords(
    data: pd.DataFrame,
    val_range: Tuple[int, int] = (1, 5),
    names: Tuple[str, str] = ("ISOPleasant", "ISOEventful"),
    overwrite: bool = False,
    angles: Tuple[int, ...] = EQUAL_ANGLES,
) -> pd.DataFrame:
    """
    Calculate and add ISO coordinates as new columns in the DataFrame.

    Parameters
    ----------
    data : pd.DataFrame
        Input DataFrame containing PAQ data
    val_range : Tuple[int, int], optional
        (min, max) range of original PAQ responses, by default (1, 5)
    names : Tuple[str, str], optional
        Names for new coordinate columns, by default ("ISOPleasant", "ISOEventful")
    overwrite : bool, optional
        Whether to overwrite existing ISO coordinate columns, by default False
    angles : Tuple[int, ...], optional
        Angles for each PAQ in degrees, by default EQUAL_ANGLES

    Returns
    -------
    pd.DataFrame
        DataFrame with new ISO coordinate columns added

    Raises
    ------
    Warning
        If ISO coordinate columns already exist and overwrite is False

    Examples
    --------
    >>> import pandas as pd
    >>> df = pd.DataFrame({
    ...     'PAQ1': [4, 2], 'PAQ2': [3, 5], 'PAQ3': [2, 4], 'PAQ4': [1, 3],
    ...     'PAQ5': [5, 1], 'PAQ6': [3, 2], 'PAQ7': [4, 3], 'PAQ8': [2, 5]
    ... })
    >>> df_with_iso = add_iso_coords(df)
    >>> df_with_iso[['ISOPleasant', 'ISOEventful']].round(2)
       ISOPleasant  ISOEventful
    0        -0.03        -0.28
    1         0.47         0.18
    """
    for name in names:
        if name in data.columns:
            if overwrite:
                data = data.drop(name, axis=1)
            else:
                raise Warning(
                    f"{name} already in dataframe. Use `overwrite=True` to replace it."
                )

    iso_pleasant, iso_eventful = calculate_iso_coords(
        data, val_range=val_range, angles=angles
    )
    data = data.assign(**{names[0]: iso_pleasant, names[1]: iso_eventful})

    logger.info(f"Added ISO coordinates to DataFrame with column names: {names}")
    return data
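
The overwrite guard above raises Warning as an exception, so a caller has to catch it with try/except rather than with a warnings filter. The sketch below mimics that guard on a plain dict of columns. The helper name add_columns and the dict-based table are hypothetical stand-ins for the DataFrame logic, used only to show the control flow.

```python
def add_columns(table: dict, new_cols: dict, overwrite: bool = False) -> dict:
    """Return a copy of `table` with `new_cols` added (hypothetical helper)."""
    for name in new_cols:
        if name in table and not overwrite:
            # Mirrors the source: Warning is raised as an exception.
            raise Warning(
                f"{name} already in table. Use `overwrite=True` to replace it."
            )
    # Build a new dict so the caller's table is left untouched.
    return {**table, **new_cols}


table = {"PAQ1": [4, 2]}
first = add_columns(table, {"ISOPleasant": [-0.03, 0.47]})

try:
    add_columns(first, {"ISOPleasant": [0.0, 0.0]})
    blocked = False
except Warning:
    blocked = True

replaced = add_columns(first, {"ISOPleasant": [0.0, 0.0]}, overwrite=True)
print(blocked, replaced["ISOPleasant"])
```

As in the source, the input is not modified in place: the original table keeps its columns and a new object is returned.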

calculate_iso_coords

calculate_iso_coords(results_df, val_range=(5, 1), angles=EQUAL_ANGLES)

Calculate the projected ISOPleasant and ISOEventful coordinates.

PARAMETER DESCRIPTION
results_df

DataFrame containing PAQ data.

TYPE: DataFrame

val_range

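(max, min) range of original PAQ responses, by default (5, 1). Only the span max(val_range) - min(val_range) is used, so the order of the two values does not matter.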

TYPE: Tuple[int, int] DEFAULT: (5, 1)

angles

Angles for each PAQ in degrees, by default EQUAL_ANGLES

TYPE: Tuple[int, ...] DEFAULT: EQUAL_ANGLES

RETURNS DESCRIPTION
Tuple[Series, Series]

ISOPleasant and ISOEventful coordinate values

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'PAQ1': [4, 2], 'PAQ2': [3, 5], 'PAQ3': [2, 4], 'PAQ4': [1, 3],
...     'PAQ5': [5, 1], 'PAQ6': [3, 2], 'PAQ7': [4, 3], 'PAQ8': [2, 5]
... })
>>> iso_pleasant, iso_eventful = calculate_iso_coords(df)
>>> iso_pleasant.round(2)
0   -0.03
1    0.47
dtype: float64
>>> iso_eventful.round(2)
0   -0.28
1    0.18
dtype: float64
Source code in soundscapy/surveys/processing.py
def calculate_iso_coords(
    results_df: pd.DataFrame,
    val_range: Tuple[int, int] = (5, 1),
    angles: Tuple[int, ...] = EQUAL_ANGLES,
) -> Tuple[pd.Series, pd.Series]:
    """
    Calculate the projected ISOPleasant and ISOEventful coordinates.

    Parameters
    ----------
    results_df : pd.DataFrame
        DataFrame containing PAQ data.
    val_range : Tuple[int, int], optional
        (max, min) range of original PAQ responses, by default (5, 1).
        Only the span is used, so the order of the two values does not matter.
    angles : Tuple[int, ...], optional
        Angles for each PAQ in degrees, by default EQUAL_ANGLES

    Returns
    -------
    Tuple[pd.Series, pd.Series]
        ISOPleasant and ISOEventful coordinate values

    Examples
    --------
    >>> import pandas as pd
    >>> df = pd.DataFrame({
    ...     'PAQ1': [4, 2], 'PAQ2': [3, 5], 'PAQ3': [2, 4], 'PAQ4': [1, 3],
    ...     'PAQ5': [5, 1], 'PAQ6': [3, 2], 'PAQ7': [4, 3], 'PAQ8': [2, 5]
    ... })
    >>> iso_pleasant, iso_eventful = calculate_iso_coords(df)
    >>> iso_pleasant.round(2)
    0   -0.03
    1    0.47
    dtype: float64
    >>> iso_eventful.round(2)
    0   -0.28
    1    0.18
    dtype: float64
    """
    scale = max(val_range) - min(val_range)

    paq_df = return_paqs(results_df, incl_ids=False)

    iso_pleasant = paq_df.apply(lambda row: _adj_iso_pl(row, angles, scale), axis=1)
    iso_eventful = paq_df.apply(lambda row: _adj_iso_ev(row, angles, scale), axis=1)

    logger.info(f"Calculated ISO coordinates for {len(results_df)} samples")
    return iso_pleasant, iso_eventful
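
The helpers _adj_iso_pl and _adj_iso_ev are not shown on this page. The sketch below is one way the projection could work for a single response, assuming that EQUAL_ANGLES is (0, 45, ..., 315) degrees and that each coordinate is a cosine- or sine-weighted sum divided by half the response span times the sum of absolute weights. Both points are assumptions, although with them the sketch reproduces the rounded values in the example above.

```python
import math

# Assumed value of EQUAL_ANGLES: PAQ1..PAQ8 spaced 45 degrees apart.
EQUAL_ANGLES = (0, 45, 90, 135, 180, 225, 270, 315)


def iso_coords(row, val_range=(1, 5), angles=EQUAL_ANGLES):
    """Project one 8-item PAQ response onto (pleasant, eventful)."""
    span = max(val_range) - min(val_range)
    cos = [math.cos(math.radians(a)) for a in angles]
    sin = [math.sin(math.radians(a)) for a in angles]
    # Weighted sums of the responses along each axis.
    pleasant = sum(c * v for c, v in zip(cos, row))
    eventful = sum(s * v for s, v in zip(sin, row))
    # Normalise so that the extreme response patterns map to -1 and +1.
    pleasant /= span / 2 * sum(abs(c) for c in cos)
    eventful /= span / 2 * sum(abs(s) for s in sin)
    return pleasant, eventful


rows = [(4, 3, 2, 1, 5, 3, 4, 2), (2, 5, 4, 3, 1, 2, 3, 5)]
coords = [iso_coords(r) for r in rows]
for pleasant, eventful in coords:
    print(round(pleasant, 2), round(eventful, 2))
# prints:
# -0.03 -0.28
# 0.47 0.18
```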

likert_data_quality

likert_data_quality(df, allow_na=False, val_range=(1, 5))

Perform basic quality checks on PAQ (Likert scale) data.

PARAMETER DESCRIPTION
df

DataFrame containing PAQ data

TYPE: DataFrame

allow_na

Whether to allow NaN values in PAQ data, by default False

TYPE: bool DEFAULT: False

val_range

Valid range for PAQ values, by default (1, 5)

TYPE: Tuple[int, int] DEFAULT: (1, 5)

RETURNS DESCRIPTION
Optional[List[int]]

List of indices to be removed, or None if no issues found

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
... })
>>> likert_data_quality(df)
[0, 1, 2]
>>> likert_data_quality(df, allow_na=True)
[1, 2]
Source code in soundscapy/surveys/processing.py
def likert_data_quality(
    df: pd.DataFrame, allow_na: bool = False, val_range: Tuple[int, int] = (1, 5)
) -> Optional[List[int]]:
    """
    Perform basic quality checks on PAQ (Likert scale) data.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame containing PAQ data
    allow_na : bool, optional
        Whether to allow NaN values in PAQ data, by default False
    val_range : Tuple[int, int], optional
        Valid range for PAQ values, by default (1, 5)

    Returns
    -------
    Optional[List[int]]
        List of indices to be removed, or None if no issues found

    Examples
    --------
    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({
    ...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
    ...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
    ...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
    ... })
    >>> likert_data_quality(df)
    [0, 1, 2]
    >>> likert_data_quality(df, allow_na=True)
    [1, 2]
    """
    paqs = return_paqs(df, incl_ids=False)
    invalid_indices = []

    for i, row in paqs.iterrows():
        if not allow_na and row.isna().any():
            invalid_indices.append(i)
        elif row.notna().all():
            if row.min() < min(val_range) or row.max() > max(val_range):
                invalid_indices.append(i)
            elif row.nunique() == 1 and row.iloc[0] != np.mean(val_range):
                invalid_indices.append(i)

    if invalid_indices:
        logger.info(f"Found {len(invalid_indices)} samples with data quality issues")
        return invalid_indices

    logger.info("PAQ data quality check passed")
    return None
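
The three row-level checks in the loop above can be restated for a single response without pandas. This is a minimal sketch that follows the branch order of the source; the function name row_has_issue and the use of plain lists are illustrative assumptions.

```python
import math


def row_has_issue(row, allow_na=False, val_range=(1, 5)):
    """Apply the three row-level checks to one list of PAQ values."""
    lo, hi = min(val_range), max(val_range)
    has_na = any(math.isnan(v) for v in row)
    if has_na:
        # Rows with missing values are only flagged when NaN is not allowed;
        # otherwise they skip the remaining checks, as in the source.
        return not allow_na
    if min(row) < lo or max(row) > hi:
        return True  # value outside the valid range
    # Identical answers everywhere, unless they sit on the scale midpoint.
    return len(set(row)) == 1 and row[0] != (lo + hi) / 2


rows = [
    [float("nan"), 3, 2, 1, 5, 3, 4, 2],
    [2] * 8,
    [3, 6, 3, 3, 3, 3, 3, 3],
    [3] * 8,
]
flagged = [i for i, r in enumerate(rows) if row_has_issue(r)]
flagged_na_ok = [i for i, r in enumerate(rows) if row_has_issue(r, allow_na=True)]
print(flagged, flagged_na_ok)  # prints: [0, 1, 2] [1, 2]
```

Note that with allow_na=True a row containing NaN is not range-checked at all, which matches the elif branch in the source.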

simulation

simulation(n=3000, val_range=(1, 5), incl_iso_coords=False, **coord_kwargs)

Generate random PAQ responses for simulation purposes.

PARAMETER DESCRIPTION
n

Number of samples to simulate, by default 3000

TYPE: int DEFAULT: 3000

val_range

Range of values for PAQ responses, by default (1, 5)

TYPE: Tuple[int, int] DEFAULT: (1, 5)

incl_iso_coords

Whether to add calculated ISO coordinates, by default False

TYPE: bool DEFAULT: False

**coord_kwargs

Additional keyword arguments to pass to add_iso_coords function

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
DataFrame

DataFrame of randomly generated PAQ responses

Examples:

>>> df = simulation(n=5, incl_iso_coords=True)
>>> df.shape
(5, 10)
>>> list(df.columns)
['PAQ1', 'PAQ2', 'PAQ3', 'PAQ4', 'PAQ5', 'PAQ6', 'PAQ7', 'PAQ8', 'ISOPleasant', 'ISOEventful']
Source code in soundscapy/surveys/processing.py
def simulation(
    n: int = 3000,
    val_range: Tuple[int, int] = (1, 5),
    incl_iso_coords: bool = False,
    **coord_kwargs,
) -> pd.DataFrame:
    """
    Generate random PAQ responses for simulation purposes.

    Parameters
    ----------
    n : int, optional
        Number of samples to simulate, by default 3000
    val_range : Tuple[int, int], optional
        Range of values for PAQ responses, by default (1, 5)
    incl_iso_coords : bool, optional
        Whether to add calculated ISO coordinates, by default False
    **coord_kwargs : dict
        Additional keyword arguments to pass to add_iso_coords function

    Returns
    -------
    pd.DataFrame
        DataFrame of randomly generated PAQ responses

    Examples
    --------
    >>> df = simulation(n=5, incl_iso_coords=True)
    >>> df.shape
    (5, 10)
    >>> list(df.columns)
    ['PAQ1', 'PAQ2', 'PAQ3', 'PAQ4', 'PAQ5', 'PAQ6', 'PAQ7', 'PAQ8', 'ISOPleasant', 'ISOEventful']
    """
    np.random.seed(42)
    df = pd.DataFrame(
        np.random.randint(min(val_range), max(val_range) + 1, size=(n, 8)),
        columns=PAQ_IDS,
    )

    if incl_iso_coords:
        df = add_iso_coords(df, val_range=val_range, **coord_kwargs)

    logger.info(f"Generated simulated PAQ data with {n} samples")
    return df
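
Because the source seeds NumPy's global generator with 42 on every call, repeated calls with the same arguments return the same data. The sketch below shows the same idea with the standard library's random module instead of NumPy, so its values will differ from the library's. The function name simulate_paqs, the seed parameter, and the PAQ_IDS value are assumptions made for illustration.

```python
import random

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]  # assumed column names


def simulate_paqs(n=5, val_range=(1, 5), seed=42):
    """Return n rows of random PAQ responses as a list of dicts."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    lo, hi = min(val_range), max(val_range)
    # random.randint includes both end points, so no "+ 1" is needed here.
    return [{paq: rng.randint(lo, hi) for paq in PAQ_IDS} for _ in range(n)]


sample = simulate_paqs(n=5)
print(len(sample), list(sample[0]))
```

Using a local generator keeps the determinism while avoiding the side effect of reseeding shared random state.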

ssm_cosine_fit

ssm_cosine_fit(y, angles=EQUAL_ANGLES, bounds=([0, 0, 0, -np.inf], [np.inf, 360, np.inf, np.inf]))

Fit a cosine model to the PAQ data for SSM analysis.

PARAMETER DESCRIPTION
y

Series of PAQ values

TYPE: Series

angles

Angles for each PAQ in degrees, by default EQUAL_ANGLES

TYPE: Tuple[int, ...] DEFAULT: EQUAL_ANGLES

bounds

Bounds for the optimization parameters, by default ([0, 0, 0, -np.inf], [np.inf, 360, np.inf, np.inf])

TYPE: Tuple[List[float], List[float]] DEFAULT: ([0, 0, 0, -inf], [inf, 360, inf, inf])

RETURNS DESCRIPTION
SSMMetrics

Calculated SSM metrics

Examples:

>>> # xdoctest: +SKIP
>>> import pandas as pd
>>> y = pd.Series([4, 3, 2, 1, 5, 3, 4, 2])
>>> metrics = ssm_cosine_fit(y)
>>> [round(v, 2) if isinstance(v, float) else v for v in metrics.table()]
[0.68, 263.82, 10.57, -7.57, 0.15]
Source code in soundscapy/surveys/processing.py
def ssm_cosine_fit(
    y: pd.Series,
    angles: Tuple[int, ...] = EQUAL_ANGLES,
    bounds: Tuple[List[float], List[float]] = (
        [0, 0, 0, -np.inf],
        [np.inf, 360, np.inf, np.inf],
    ),
) -> SSMMetrics:
    """
    Fit a cosine model to the PAQ data for SSM analysis.

    Parameters
    ----------
    y : pd.Series
        Series of PAQ values
    angles : Tuple[int, ...], optional
        Angles for each PAQ in degrees, by default EQUAL_ANGLES
    bounds : Tuple[List[float], List[float]], optional
        Bounds for the optimization parameters, by default ([0, 0, 0, -np.inf], [np.inf, 360, np.inf, np.inf])

    Returns
    -------
    SSMMetrics
        Calculated SSM metrics

    Examples
    --------
    >>> # xdoctest: +SKIP
    >>> import pandas as pd
    >>> y = pd.Series([4, 3, 2, 1, 5, 3, 4, 2])
    >>> metrics = ssm_cosine_fit(y)
    >>> [round(v, 2) if isinstance(v, float) else v for v in metrics.table()]
    [0.68, 263.82, 10.57, -7.57, 0.15]
    """
    warnings.warn(
        "This function is not yet fully implemented. See https://github.com/MitchellAcoustics/circumplex for a more complete implementation.",
        PendingDeprecationWarning,
    )

    def cosine_model(theta, amp, delta, elev, dev):
        return elev + amp * np.cos(np.radians(theta - delta)) + dev

    param, _ = optimize.curve_fit(
        cosine_model,
        xdata=angles,
        ydata=y,
        bounds=bounds,
    )
    amp, delta, elev, dev = param
    r_squared = _r2_score(y, cosine_model(angles, *param))

    return SSMMetrics(
        amplitude=amp,
        angle=delta,
        elevation=elev,
        displacement=dev,
        r_squared=r_squared,
    )
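
In cosine_model both elev and dev act as a constant offset, so the data only determine their sum. In the documented example, 10.57 + (-7.57) = 3.00, which is the mean of y. The sketch below fits the simpler three-parameter model offset + amp * cos(theta - delta). It is not the library's method: instead of scipy's curve_fit it uses the closed-form first-harmonic solution, which is the least-squares fit only when the angles are equally spaced around the circle. The value of EQUAL_ANGLES is an assumption.

```python
import math

# Assumed value of EQUAL_ANGLES: PAQ1..PAQ8 spaced 45 degrees apart.
EQUAL_ANGLES = (0, 45, 90, 135, 180, 225, 270, 315)


def cosine_fit_equal_angles(y, angles=EQUAL_ANGLES):
    """Least-squares fit of offset + amp * cos(theta - delta), closed form."""
    n = len(y)
    offset = sum(y) / n
    # First-harmonic Fourier coefficients: a = amp*cos(delta), b = amp*sin(delta).
    a = 2 / n * sum(v * math.cos(math.radians(t)) for v, t in zip(y, angles))
    b = 2 / n * sum(v * math.sin(math.radians(t)) for v, t in zip(y, angles))
    amp = math.hypot(a, b)
    delta = math.degrees(math.atan2(b, a)) % 360  # keep the angle in [0, 360)
    return amp, delta, offset


amp, delta, offset = cosine_fit_equal_angles([4, 3, 2, 1, 5, 3, 4, 2])
print(round(amp, 2), round(delta, 1), offset)
```

For this input the amplitude and angle agree with the 0.68 and roughly 263.82 shown in the example above, and the offset is the single constant that the source splits into elevation and displacement.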

ssm_metrics

ssm_metrics(df, paq_cols=PAQ_IDS, method='cosine', val_range=(5, 1), angles=EQUAL_ANGLES)

Calculate the Structural Summary Method (SSM) metrics for each response.

PARAMETER DESCRIPTION
df

DataFrame containing PAQ data

TYPE: DataFrame

paq_cols

List of PAQ column names, by default PAQ_IDS

TYPE: List[str] DEFAULT: PAQ_IDS

method

Method to calculate SSM metrics, either "cosine" or "polar", by default "cosine"

TYPE: str DEFAULT: 'cosine'

val_range

Range of values for PAQ responses, by default (5, 1)

TYPE: Tuple[int, int] DEFAULT: (5, 1)

angles

Angles for each PAQ in degrees, by default EQUAL_ANGLES

TYPE: Tuple[int, ...] DEFAULT: EQUAL_ANGLES

RETURNS DESCRIPTION
DataFrame

DataFrame containing the SSM metrics

RAISES DESCRIPTION
ValueError

If PAQ columns are not present in the DataFrame or if an invalid method is specified

Examples:

>>> # xdoctest: +SKIP
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'PAQ1': [4, 2], 'PAQ2': [3, 5], 'PAQ3': [2, 4], 'PAQ4': [1, 3],
...     'PAQ5': [5, 1], 'PAQ6': [3, 2], 'PAQ7': [4, 3], 'PAQ8': [2, 5]
... })
>>> ssm_metrics(df).round(2)
   amplitude   angle  elevation  displacement  r_squared
0       0.68  263.82      10.57         -7.57       0.15
1       1.21   20.63       0.01          3.11       0.39
Source code in soundscapy/surveys/processing.py
def ssm_metrics(
    df: pd.DataFrame,
    paq_cols: List[str] = PAQ_IDS,
    method: str = "cosine",
    val_range: Tuple[int, int] = (5, 1),
    angles: Tuple[int, ...] = EQUAL_ANGLES,
) -> pd.DataFrame:
    """
    Calculate the Structural Summary Method (SSM) metrics for each response.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame containing PAQ data
    paq_cols : List[str], optional
        List of PAQ column names, by default PAQ_IDS
    method : str, optional
        Method to calculate SSM metrics, either "cosine" or "polar", by default "cosine"
    val_range : Tuple[int, int], optional
        Range of values for PAQ responses, by default (5, 1)
    angles : Tuple[int, ...], optional
        Angles for each PAQ in degrees, by default EQUAL_ANGLES

    Returns
    -------
    pd.DataFrame
        DataFrame containing the SSM metrics

    Raises
    ------
    ValueError
        If PAQ columns are not present in the DataFrame or if an invalid method is specified

    Examples
    --------
    >>> # xdoctest: +SKIP
    >>> import pandas as pd
    >>> df = pd.DataFrame({
    ...     'PAQ1': [4, 2], 'PAQ2': [3, 5], 'PAQ3': [2, 4], 'PAQ4': [1, 3],
    ...     'PAQ5': [5, 1], 'PAQ6': [3, 2], 'PAQ7': [4, 3], 'PAQ8': [2, 5]
    ... })
    >>> ssm_metrics(df).round(2)
       amplitude   angle  elevation  displacement  r_squared
    0       0.68  263.82      10.57         -7.57       0.15
    1       1.21   20.63       0.01          3.11       0.39
    """
    # TODO: Replace with a call to circumplex package
    warnings.warn(
        "This function is not yet fully implemented. See https://github.com/MitchellAcoustics/circumplex for a more complete implementation.",
        PendingDeprecationWarning,
    )

    if not set(paq_cols).issubset(df.columns):
        raise ValueError("PAQ columns are not present in the DataFrame")

    if method == "polar":
        iso_pleasant, iso_eventful = calculate_iso_coords(
            df[paq_cols], val_range, angles
        )
        r, theta = _convert_to_polar_coords(iso_pleasant, iso_eventful)
        mean = df[paq_cols].mean(axis=1)
        mean = mean / (max(val_range) - min(val_range)) if val_range != (0, 1) else mean

        return pd.DataFrame(
            {
                "amplitude": r,
                "angle": theta,
                "elevation": mean,
                "displacement": 0,  # Displacement is always 0 for polar method
                "r_squared": 1,  # R-squared is always 1 for polar method
            }
        )
    elif method == "cosine":
        return df[paq_cols].apply(
            lambda y: ssm_cosine_fit(y, angles).table(),
            axis=1,
            result_type="expand",
        )
    else:
        raise ValueError("Method must be either 'polar' or 'cosine'")
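
The polar branch relies on _convert_to_polar_coords, which is not shown on this page. A plausible reading is the usual Cartesian-to-polar conversion, sketched below for a single pair of coordinates. The function name to_polar and the choice to leave the angle in the (-180, 180] range are assumptions; the library may wrap angles differently.

```python
import math


def to_polar(pleasant: float, eventful: float) -> tuple:
    """Convert (x, y) coordinates to (radius, angle in degrees)."""
    r = math.hypot(pleasant, eventful)
    # atan2 returns an angle in (-180, 180]; the library may wrap differently.
    theta = math.degrees(math.atan2(eventful, pleasant))
    return r, theta


r, theta = to_polar(0.47, 0.18)
print(round(r, 2), round(theta, 1))
```

In the polar method the radius is reported as the amplitude and the angle as the SSM angle, while displacement and r_squared are fixed constants, as the source above shows.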

Core utility functions for processing soundscape survey data.

This module contains fundamental functions and constants used across the soundscapy package for handling and analyzing soundscape survey data.

PAQ

PAQ(label, id)

Bases: Enum

Enumeration of Perceptual Attribute Questions (PAQ) names and IDs.

Source code in soundscapy/surveys/survey_utils.py
def __init__(self, label: str, id: str):
    self.label = label
    self.id = id

mean_responses

mean_responses(df, group)

Calculate the mean responses for each PAQ group.

PARAMETER DESCRIPTION
df

Input DataFrame containing PAQ data.

TYPE: DataFrame

group

Column name to group by.

TYPE: str

RETURNS DESCRIPTION
DataFrame

DataFrame with mean responses for each PAQ group.

Source code in soundscapy/surveys/survey_utils.py
def mean_responses(df: pd.DataFrame, group: str) -> pd.DataFrame:
    """
    Calculate the mean responses for each PAQ group.

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame containing PAQ data.
    group : str
        Column name to group by.

    Returns
    -------
    pd.DataFrame
        DataFrame with mean responses for each PAQ group.

    """
    df = return_paqs(df, incl_ids=False, other_cols=[group])
    return df.groupby(group).mean().reset_index()

rename_paqs

rename_paqs(df, paq_aliases=None)

Rename the PAQ columns in a DataFrame to standard PAQ IDs.

PARAMETER DESCRIPTION
df

Input DataFrame containing PAQ data.

TYPE: DataFrame

paq_aliases

Specify which PAQs are to be renamed. If None, will check if the column names are in pre-defined options. If a tuple or list, the order must match PAQ_IDS. If a dict, keys are current names and values are desired PAQ IDs.

TYPE: Union[Tuple, Dict] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

DataFrame with renamed PAQ columns.

RAISES DESCRIPTION
ValueError

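If paq_aliases is not a tuple, list, or dictionary. This includes the case where paq_aliases is None and no column matches the pre-defined PAQ IDs or labels.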

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'pleasant': [4, 3],
...     'vibrant': [2, 5],
...     'other_col': [1, 2]
... })
>>> rename_paqs(df)
   PAQ1  PAQ2  other_col
0     4     2          1
1     3     5          2
>>> df_custom = pd.DataFrame({
...     'pl': [4, 3],
...     'vb': [2, 5],
... })
>>> rename_paqs(df_custom, paq_aliases={'pl': 'PAQ1', 'vb': 'PAQ2'})
   PAQ1  PAQ2
0     4     2
1     3     5
Source code in soundscapy/surveys/survey_utils.py
def rename_paqs(
    df: pd.DataFrame, paq_aliases: Union[Tuple, Dict] = None
) -> pd.DataFrame:
    """
    Rename the PAQ columns in a DataFrame to standard PAQ IDs.

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame containing PAQ data.
    paq_aliases : Union[Tuple, Dict], optional
        Specify which PAQs are to be renamed. If None, will check if the column names
        are in pre-defined options. If a tuple, the order must match PAQ_IDS.
        If a dict, keys are current names and values are desired PAQ IDs.

    Returns
    -------
    pd.DataFrame
        DataFrame with renamed PAQ columns.

    Raises
    ------
    ValueError
        If paq_aliases is not a tuple, list, or dictionary.

    Examples
    --------
    >>> import pandas as pd
    >>> df = pd.DataFrame({
    ...     'pleasant': [4, 3],
    ...     'vibrant': [2, 5],
    ...     'other_col': [1, 2]
    ... })
    >>> rename_paqs(df)
       PAQ1  PAQ2  other_col
    0     4     2          1
    1     3     5          2
    >>> df_custom = pd.DataFrame({
    ...     'pl': [4, 3],
    ...     'vb': [2, 5],
    ... })
    >>> rename_paqs(df_custom, paq_aliases={'pl': 'PAQ1', 'vb': 'PAQ2'})
       PAQ1  PAQ2
    0     4     2
    1     3     5
    """
    if paq_aliases is None:
        if any(paq_id in df.columns for paq_id in PAQ_IDS):
            logger.info("PAQs already correctly named.")
            return df
        if any(paq_name in df.columns for paq_name in PAQ_LABELS):
            paq_aliases = PAQ_LABELS

    if isinstance(paq_aliases, (list, tuple)):
        rename_dict = dict(zip(paq_aliases, PAQ_IDS))
    elif isinstance(paq_aliases, dict):
        rename_dict = paq_aliases
    else:
        raise ValueError("paq_aliases must be a tuple, list, or dictionary.")

    logger.debug(f"Renaming PAQs with the following mapping: {rename_dict}")
    return df.rename(columns=rename_dict)
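
The alias handling above boils down to building a {current_name: PAQ_ID} mapping before the rename. The sketch below isolates that step. The function name build_rename_map is hypothetical, and PAQ_IDS is assumed to be the list PAQ1 to PAQ8.

```python
PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]  # assumed value of the constant


def build_rename_map(paq_aliases) -> dict:
    """Turn a sequence or dict of aliases into a {old_name: PAQ_ID} mapping."""
    if isinstance(paq_aliases, (list, tuple)):
        # Positional: first alias -> PAQ1, second -> PAQ2, and so on.
        # zip stops at the shorter input, so extra IDs are simply unused.
        return dict(zip(paq_aliases, PAQ_IDS))
    if isinstance(paq_aliases, dict):
        return dict(paq_aliases)
    raise ValueError("paq_aliases must be a tuple, list, or dictionary.")


by_position = build_rename_map(("pl", "vb"))
by_name = build_rename_map({"vb": "PAQ2"})
print(by_position, by_name)
# prints: {'pl': 'PAQ1', 'vb': 'PAQ2'} {'vb': 'PAQ2'}
```

A dict is the safer choice when only some PAQ columns are present or when they are not in PAQ order, since the positional form always starts at PAQ1.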

return_paqs

return_paqs(df, incl_ids=True, other_cols=None)

Return only the PAQ columns from a DataFrame.

PARAMETER DESCRIPTION
df

Input DataFrame containing PAQ data.

TYPE: DataFrame

incl_ids

Whether to include ID columns (RecordID, GroupID, etc.), by default True.

TYPE: bool DEFAULT: True

other_cols

Other columns to include in the output, by default None.

TYPE: List[str] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

DataFrame containing only the PAQ columns and optionally ID and other specified columns.

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'RecordID': [1, 2],
...     'PAQ1': [4, 3],
...     'PAQ2': [2, 5],
...     'PAQ3': [1, 2],
...     'PAQ4': [3, 4],
...     'PAQ5': [5, 1],
...     'PAQ6': [2, 3],
...     'PAQ7': [4, 5],
...     'PAQ8': [1, 2],
...     'OtherCol': ['A', 'B']
... })
>>> return_paqs(df)
   RecordID  PAQ1  PAQ2  PAQ3  PAQ4  PAQ5  PAQ6  PAQ7  PAQ8
0         1     4     2     1     3     5     2     4     1
1         2     3     5     2     4     1     3     5     2
>>> return_paqs(df, incl_ids=False, other_cols=['OtherCol'])
   PAQ1  PAQ2  PAQ3  PAQ4  PAQ5  PAQ6  PAQ7  PAQ8 OtherCol
0     4     2     1     3     5     2     4     1        A
1     3     5     2     4     1     3     5     2        B
Source code in soundscapy/surveys/survey_utils.py
def return_paqs(
    df: pd.DataFrame, incl_ids: bool = True, other_cols: List[str] = None
) -> pd.DataFrame:
    """
    Return only the PAQ columns from a DataFrame.

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame containing PAQ data.
    incl_ids : bool, optional
        Whether to include ID columns (RecordID, GroupID, etc.), by default True.
    other_cols : List[str], optional
        Other columns to include in the output, by default None.

    Returns
    -------
    pd.DataFrame
        DataFrame containing only the PAQ columns and optionally ID and other specified columns.

    Examples
    --------
    >>> import pandas as pd
    >>> df = pd.DataFrame({
    ...     'RecordID': [1, 2],
    ...     'PAQ1': [4, 3],
    ...     'PAQ2': [2, 5],
    ...     'PAQ3': [1, 2],
    ...     'PAQ4': [3, 4],
    ...     'PAQ5': [5, 1],
    ...     'PAQ6': [2, 3],
    ...     'PAQ7': [4, 5],
    ...     'PAQ8': [1, 2],
    ...     'OtherCol': ['A', 'B']
    ... })
    >>> return_paqs(df)
       RecordID  PAQ1  PAQ2  PAQ3  PAQ4  PAQ5  PAQ6  PAQ7  PAQ8
    0         1     4     2     1     3     5     2     4     1
    1         2     3     5     2     4     1     3     5     2
    >>> return_paqs(df, incl_ids=False, other_cols=['OtherCol'])
       PAQ1  PAQ2  PAQ3  PAQ4  PAQ5  PAQ6  PAQ7  PAQ8 OtherCol
    0     4     2     1     3     5     2     4     1        A
    1     3     5     2     4     1     3     5     2        B
    """
    cols = PAQ_IDS.copy()

    if incl_ids:
        id_cols = [
            name
            for name in ["RecordID", "GroupID", "SessionID", "LocationID"]
            if name in df.columns
        ]
        cols = id_cols + cols

    if other_cols:
        cols.extend(other_cols)

    logger.debug(f"Returning PAQ columns: {cols}")
    return df[cols]
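
The column selection above can be followed without a DataFrame by working on column names only. In this sketch the function name select_columns is hypothetical and PAQ_IDS is assumed to be the list PAQ1 to PAQ8; the ID column names and their order are taken from the source.

```python
PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]  # assumed value of the constant
ID_COLS = ["RecordID", "GroupID", "SessionID", "LocationID"]


def select_columns(available, incl_ids=True, other_cols=None):
    """Return the column names return_paqs would keep, in output order."""
    cols = list(PAQ_IDS)
    if incl_ids:
        # Only ID columns that actually exist are kept, in this fixed order.
        cols = [name for name in ID_COLS if name in available] + cols
    if other_cols:
        cols.extend(other_cols)
    return cols


available = ["OtherCol", "RecordID"] + PAQ_IDS
print(select_columns(available))
print(select_columns(available, incl_ids=False, other_cols=["OtherCol"]))
```

Only the ID columns are filtered by presence. The PAQ columns and other_cols are requested unconditionally, so with a real DataFrame a missing one raises a KeyError at the final df[cols] lookup.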