Using Soundscapy for Binaural Recording Analysis
In v0.3.2, a large suite of (psycho)acoustic analyses was added to Soundscapy, optimised for batch processing of many recordings, ease of use, and reproducibility. The analysis functions are provided by three underlying packages:

- Python Acoustics (`acoustics`): a library aimed at acousticians. It brings two major benefits: first, all of its analysis functions are referenced directly to the relevant standard; second, Soundscapy's `Binaural` class subclasses its `Signal` class, so any method available on `Signal` is also available on `Binaural`.
- scikit-maad (`maad`): a modular toolbox for quantitative soundscape analysis, focused on ecological soundscapes and bioacoustic indices. It provides a large suite of ecosoundscape-focused indices, including the Acoustic Richness Index, Acoustic Complexity Index, Normalized Difference Soundscape Index (NDSI), and more.
- MoSQITo (`mosqito`): a modular framework of key sound quality metrics, providing the psychoacoustic metrics.
The metrics currently available are:

- Python Acoustics: $L_{Zeq}$, $L_{Aeq}$, $L_{Ceq}$, SEL, and all associated statistics ($L_5$ through $L_{95}$, $L_{max}$ and $L_{min}$, as well as kurtosis and skewness).
- scikit-maad: so far, only the combined `all_temporal_alpha_indices` and `all_spectral_alpha_indices` functions from `scikit-maad` have been implemented; calculating the indices individually is not yet supported. `all_temporal_alpha_indices` comprises 16 temporal-domain acoustic indices, such as the temporal signal-to-noise ratio, temporal entropy, and temporal events. `all_spectral_alpha_indices` comprises 19 spectral-domain acoustic indices, such as the Bioacoustic Index, Acoustic Diversity Index, NDSI, and Acoustic Evenness Index.
- MoSQITo:
  - Loudness (Zwicker, time varying)
  - Sharpness (time varying, from loudness, and per segment, with DIN, Aures, Bismarck, and Fastl weightings)
  - Roughness (Daniel and Weber, 1997)
Soundscapy combines all of these metrics and makes it easy and (relatively) fast to compute any or all of them for a binaural audio recording. The results have been preliminarily validated by comparison against results from the Head Acoustics ArtemiS suite on a set of real-world recordings.
Consistent Analysis Settings

A primary goal when developing this library was to make it easy to save and document the settings used for all analyses. This is done through a `settings.yaml` file and the `AnalysisSettings` class. Although the settings for each metric can be set at runtime, the `settings.yaml` file allows you to set everything at once, document exactly what settings were passed to each analysis function, and share those settings with collaborators or reviewers.
Batch processing

The other primary goal was to make it simple and fast to perform this analysis on many recordings. One aspect of this is unifying the outputs from the underlying libraries and presenting them in an easy-to-parse format. Soundscapy's analysis functions can return a MultiIndex pandas DataFrame with the recording filename and the Left and Right channels in the index and a column for each calculated metric. This DataFrame can then be saved to a .csv file or Excel spreadsheet; alternatively, a dictionary can be returned for further processing within Python. The key point is that after calculating 100+ metrics for 1,000+ recordings, you're left with a single tidy spreadsheet.
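As a sketch of what that output looks like (using a small hand-built stand-in DataFrame, not Soundscapy's actual results), exporting the MultiIndex DataFrame is a one-liner:

```python
import pandas as pd

# Stand-in for the results DataFrame Soundscapy returns:
# (Recording, Channel) in the index, one column per metric
idx = pd.MultiIndex.from_tuples(
    [("CT101", "Left"), ("CT101", "Right")], names=["Recording", "Channel"]
)
results = pd.DataFrame({"LAeq": [68.88, 69.95], "N_5": [28.83, 30.83]}, index=idx)

# With no path, to_csv returns the text; pass a path to write a file,
# or use results.to_excel("results.xlsx") for a spreadsheet
csv_text = results.to_csv()
```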
The Soundscape Indices (SSID) project for which this was developed has over 3,000 recordings, for which we needed to calculate a full suite of metrics for both channels. The MoSQITo functions in particular can be quite slow, so running recordings one at a time can be prohibitive and uses only a small portion of the available computing power. To help with this, a set of simple functions is provided to enable parallel processing, so that multiple recordings can be processed simultaneously on a multi-core CPU. In our initial tests on a 16-core AMD Ryzen 7 4800HS CPU (Fedora Linux 36), this sped up the processing of 20 recordings by at least 8 times.
In testing, the MoSQITo functions are particularly slow, taking up to 3 minutes to calculate the Loudness for a 30 s two-channel recording. When running only a single recording through, this has also been sped up by parallelizing the per-channel calculation, reducing the computation time to around 50 s.
Getting Started

All of our analysis starts with the binaural signal, so we begin by importing the `Binaural` class. We'll also load up and examine our analysis settings. Throughout Soundscapy, we use `pathlib.Path` for defining filepaths.
# Add soundscapy to the Python path
import sys
sys.path.append('../..')
# imports
from soundscapy import Binaural
from soundscapy import AnalysisSettings
from soundscapy.analysis.binaural import prep_multiindex_df, add_results, process_all_metrics
import json
from pathlib import Path
Set up where the data is located. In this case, we'll use the sample recordings located under the `test` folder.
# May need to adjust for your system
wav_folder = Path().cwd().parent.parent.joinpath("test", "data")
Ensuring that Soundscapy knows exactly how loud your recordings were onsite is crucial to getting correct results. If you used equipment such as the Head Acoustics SQobold and were careful about how the recordings were exported to .wav, then they may already be correctly adjusted (as ours are here). However, it's best to be safe and calibrate each signal to its real-world dB level. To do this, we load a .json file that contains the correct per-channel dB $L_{eq}$ level.
levels = wav_folder.joinpath("Levels.json")
with open(levels) as f:
levels = json.load(f)
# Look at the first five sets of levels
list(levels.items())[:5]
[('CT101', {'Left': 79.0, 'Right': 79.72}), ('CT102', {'Left': 79.35, 'Right': 79.88}), ('CT103', {'Left': 76.25, 'Right': 76.41}), ('CT104', {'Left': 79.9, 'Right': 79.93}), ('CT107', {'Left': 78.21, 'Right': 78.47})]
Prepping the results dataframe

The easiest way to organise and add the new data as it is processed is to prepare a dataframe ahead of time. We've provided a small function to convert a dictionary of calibration levels (`levels`) into a properly formatted dataframe.
df = prep_multiindex_df(levels, incl_metric=True)
df.head()
| Recording | Channel | Leq |
|---|---|---|
| CT101 | Left | 79.00 |
| | Right | 79.72 |
| CT102 | Left | 79.35 |
| | Right | 79.88 |
| CT103 | Left | 76.25 |
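Under the hood, this is roughly equivalent to flattening the nested levels dict into a two-level index yourself (a plain-pandas sketch, not Soundscapy's actual implementation):

```python
import pandas as pd

levels = {"CT101": {"Left": 79.0, "Right": 79.72},
          "CT102": {"Left": 79.35, "Right": 79.88}}

# Flatten {recording: {channel: level}} into (Recording, Channel) -> Leq rows
rows = {(rec, ch): lvl for rec, chans in levels.items() for ch, lvl in chans.items()}
df = pd.DataFrame.from_dict(rows, orient="index", columns=["Leq"])
df.index = pd.MultiIndex.from_tuples(df.index, names=["Recording", "Channel"])
```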
Load in a Binaural recording

Load in a binaural wav signal. We can use the `plot` function provided by the `acoustics.Signal` superclass.
binaural_wav = wav_folder.joinpath("CT101.wav")
b = Binaural.from_wav(binaural_wav)
b.plot();
To ensure that the dB level, and therefore every other metric, is correct, we start by calibrating the signal to the precalculated levels.
decibel = (levels[b.recording]["Left"], levels[b.recording]["Right"])
print(f"Calibration levels: {decibel}")
b.calibrate_to(decibel, inplace=True)
Calibration levels: (79.0, 79.72)
Signal([[-0.01080703 -0.0088621 -0.01002671 ... -0.0432281 -0.04708166 -0.04911411] [-0.00745821 0.00049144 0.00590268 ... -0.00493154 -0.01020068 -0.01875182]])
Now, check it by comparing it to what we already knew were the correct levels:
print(f"Predefined levels: {levels[b.recording]}")
print(f"Calculated Levels: {b.pyacoustics_metric('Leq', statistics=['avg'], as_df=False)}")
Predefined levels: {'Left': 79.0, 'Right': 79.72} Calculated Levels: {'Left': {'Leq': 79.00281378626991}, 'Right': {'Leq': 79.72243397536936}}
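Conceptually, calibration just rescales the waveform so that its RMS level matches the target. A simplified single-channel sketch (assuming levels in dB re 20 µPa; Soundscapy's `calibrate_to` handles both channels):

```python
import numpy as np

def calibrate(signal, target_db, p_ref=2e-5):
    # Current RMS level in dB, then the linear gain needed to hit target_db
    current_db = 20 * np.log10(np.sqrt(np.mean(signal ** 2)) / p_ref)
    return signal * 10 ** ((target_db - current_db) / 20)
```

After scaling, recomputing the level recovers the target, which is exactly the check performed above with `pyacoustics_metric('Leq', ...)`.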
Calculating Acoustic Metrics

This brings us to how to calculate any of the many available metrics. Let's start simple with the $L_{Aeq}$.

Python Acoustics

Since the $L_{Aeq}$ calculation is provided by the Python Acoustics library, we'll call `pyacoustics_metric()`. We need to tell it which metric we want, which statistics to calculate alongside it, what to label it, and what format to return the results in.
metric = "LAeq"
stats = ("avg", 10, 50, 90, 95, "max")
label = "LAeq"
b.pyacoustics_metric(metric, stats, label, as_df=False)
{'Left': {'LAeq': 68.87570321376752, 'LAeq_10': 71.15434218749783, 'LAeq_50': 68.11333930746, 'LAeq_90': 63.375090914754985, 'LAeq_95': 62.366533277575606, 'LAeq_max': 77.38265089009883}, 'Right': {'LAeq': 69.95333344074137, 'LAeq_10': 72.57815150741932, 'LAeq_50': 68.49539857979003, 'LAeq_90': 64.53305677178045, 'LAeq_95': 63.09765925624846, 'LAeq_max': 78.70878341389864}}
If we want, we can get the results back as a pandas DataFrame instead:
b.pyacoustics_metric(metric, stats, label, as_df=True)
| Recording | Channel | LAeq | LAeq_10 | LAeq_50 | LAeq_90 | LAeq_95 | LAeq_max |
|---|---|---|---|---|---|---|---|
| CT101 | Left | 68.875703 | 71.154342 | 68.113339 | 63.375091 | 62.366533 | 77.382651 |
| | Right | 69.953333 | 72.578152 | 68.495399 | 64.533057 | 63.097659 | 78.708783 |
And we can easily do the same for the C-weighting level:
b.pyacoustics_metric("LCeq", stats, as_df=True)
| Recording | Channel | LCeq | LCeq_10 | LCeq_50 | LCeq_90 | LCeq_95 | LCeq_max |
|---|---|---|---|---|---|---|---|
| CT101 | Left | 77.245094 | 80.008421 | 76.256223 | 71.467371 | 70.627245 | 84.307204 |
| | Right | 77.855473 | 80.633065 | 76.882287 | 72.463130 | 71.262000 | 84.806283 |
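The exceedance statistics follow the usual convention: $L_{10}$ is the level exceeded 10% of the time, i.e. the 90th percentile of the short-time levels. A minimal sketch (assuming a 1-D array of per-frame dB values, e.g. from 125 ms frames):

```python
import numpy as np

def exceedance_level(frame_levels_db, n):
    # L_n: the level exceeded n% of the time = the (100 - n)th percentile
    return np.percentile(frame_levels_db, 100 - n)
```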
MoSQITo

MoSQITo is very exciting, as it is one of the first completely free and open-source libraries for calculating psychoacoustic metrics. Let's try calculating the psychoacoustic Loudness.

We start by defining many of the same options, but with two new ones. The first is `func_args`, a dict of arguments to pass through to MoSQITo. Since our test recording was collected in a public park, we need to select the correct field type, free or diffuse, and pass that to MoSQITo.

The second new argument is `parallel`. This tells Soundscapy whether to calculate the Left and Right channels simultaneously to speed up processing.
metric = "loudness_zwtv"
stats = (5, 50, 'avg', 'max')
func_args = {
'field_type': 'free'
}
b.mosqito_metric(metric, statistics=stats, as_df=True, parallel=True, verbose=True, func_args=func_args)
- Calculating MoSQITo metric: loudness_zwtv
| Recording | Channel | N_5 | N_50 | N_avg | N_max |
|---|---|---|---|---|---|
| CT101 | Left | 28.834482 | 23.164299 | 22.669519 | 36.160815 |
| | Right | 30.834215 | 23.939352 | 23.774119 | 37.762344 |
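The `parallel` option dispatches the two channel computations concurrently. A toy sketch of the idea, using threads and a placeholder per-channel metric (not Soundscapy's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def channel_level(samples, p_ref=2e-5):
    # Placeholder per-channel metric: overall RMS level in dB
    return 20 * np.log10(np.sqrt(np.mean(samples ** 2)) / p_ref)

def analyse_channels(left, right):
    # Submit both channels at once instead of one after the other
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {"Left": pool.submit(channel_level, left),
                   "Right": pool.submit(channel_level, right)}
        return {ch: f.result() for ch, f in futures.items()}
```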
`sharpness_din_from_loudness` is a special case worth keeping in mind. It can drastically reduce processing time, since it calculates the Sharpness values from pre-calculated Loudness results. If you plan to run both analyses, I highly suggest using it. Soundscapy handles this behind the scenes, making sure it doesn't accidentally calculate the Loudness values twice if you've asked for both. Let's try it out.
b.mosqito_metric("sharpness_din_from_loudness", stats, as_df=True, parallel=True, verbose=True, func_args=func_args)
- Calculating MoSQITo metrics: `sharpness_din` from `loudness_zwtv`
| Recording | Channel | N_5 | N_50 | N_avg | N_max | S_5 | S_50 | S_avg | S_max |
|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 28.834482 | 23.164299 | 22.669519 | 36.160815 | 1.339030 | 1.168987 | 1.181871 | 19.298406 |
| | Right | 30.834215 | 23.939352 | 23.774119 | 37.762344 | 1.410216 | 1.183683 | 1.202704 | 18.598348 |
By default, the metrics are calculated for both channels, but you may want only a single channel. This can be set with the `channel` option.
b.pyacoustics_metric("LZeq", channel="Left")
| | LZeq | LZeq_5 | LZeq_10 | LZeq_50 | LZeq_90 | LZeq_95 | LZeq_max | LZeq_min | LZeq_kurt | LZeq_skew |
|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | 79.002814 | 83.065062 | 81.694714 | 77.969857 | 73.533225 | 72.283847 | 86.270916 | 70.194097 | -0.088419 | -0.056522 |
b.maad_metric("all_spectral_alpha_indices", verbose=True)
- Calculating scikit-maad all_spectral_alpha_indices
| Recording | Channel | MEANf | VARf | SKEWf | KURTf | NBPEAKS | LEQf | ENRf | BGNf | SNRf | Hf | ... | TFSD | H_Havrda | H_Renyi | H_pairedShannon | H_gamma | H_GiniSimpson | RAOQ | AGI | ROItotal | ROIcover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 0.001616 | 0.000050 | 9.903323 | 122.639052 | 48 | 77.288435 | -15.711565 | -19.025828 | 3.314263 | 0.33898 | ... | 0.596465 | 0.306220 | 1.254553 | 2.981413 | 1004.342457 | 0.767638 | 0.012133 | 1.502837 | 38 | 1.794776 |
| | Right | 0.001767 | 0.000057 | 9.775576 | 121.145848 | 25 | 77.862094 | -15.137906 | -18.647919 | 3.510013 | 0.34690 | ... | 0.596308 | 0.306565 | 1.260965 | 3.032816 | 1054.904505 | 0.771196 | 0.014212 | 1.505844 | 23 | 0.940919 |

2 rows × 44 columns
b.maad_metric("all_temporal_alpha_indices", verbose=True)
- Calculating scikit-maad all_temporal_alpha_indices
| Recording | Channel | ZCR | MEANt | VARt | SKEWt | KURTt | LEQt | BGNt | SNRt | MED | Ht | ACTtFraction | ACTtCount | ACTtMean | EVNtFraction | EVNtMean | EVNtCount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 791.173986 | -0.000040 | 0.031794 | -0.046533 | 4.070704 | 78.073263 | -10.055829 | 1.367418 | 0.314202 | 0.960724 | 0.209797 | 621 | 5.108619 | 0.158108 | 0.044177 | 3.580179 |
| | Right | 837.922297 | -0.000043 | 0.037524 | -0.060364 | 4.023849 | 78.793755 | -9.282139 | 1.339588 | 0.343473 | 0.961327 | 0.206419 | 611 | 5.074564 | 0.153041 | 0.041655 | 3.675228 |
Defining Analysis Settings

Soundscapy lets you predefine your analysis settings. These are defined in a separate .yaml file and are managed by Soundscapy using the `AnalysisSettings` class. These settings can then be passed to any of the analysis functions, rather than defining your settings separately as we did above. This is particularly useful when batch processing an entire folder of wav recordings later.

Soundscapy provides a set of default settings which can be easily loaded:
analysis_settings = AnalysisSettings.default()
However, in your own analysis you'll probably want to define your own options and load those in. We'll show how this is done using the `example_settings.yaml` file. First, let's look at how it's laid out:
# Settings file for batch acoustic analysis.
# Split up according to which library performs which analysis.

# Python Acoustics
# Supported metrics: LAeq, LZeq, LCeq, SEL
# Supported stats: avg/mean, max, min, kurt, skew, any integer from 1-99
PythonAcoustics:
  LAeq:
    run: true
    main: 'avg'
    statistics: [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew']
    channel: ["Left", "Right"]
    label: 'LAeq'
    func_args:
      time: 0.125
      method: "average"
  LZeq:
    run: true
    main: 'avg'
    statistics: [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew']
    channel: ["Left", "Right"]
    label: 'LZeq'
    func_args:
      time: 0.125
      method: "average"

# MoSQITo
# supported metrics: loudness_zwtv, sharpness_din_from_loudness, roughness_dw
# supported stats: avg/mean, max, min, kurt, skew, any integer from 1-99
MoSQITo:
  loudness_zwtv:
    run: true
    main: 5
    statistics: [10, 50, 90, 95, 'min', 'max', 'kurt', 'skew', 'avg']
    channel: ["Left", "Right"]
    label: "N"
    parallel: true
    func_args:
      field_type: "free"
The settings file is broken up according to the three libraries. Within these, we define separate options for each metric to calculate. The name of each metric should correspond exactly to what Soundscapy expects (in the case of PythonAcoustics) or to what the underlying library calls its function.
Within each metric, we then have a collection of settings that Soundscapy uses:

* `run`: whether or not to actually run this metric. This allows you to define and save the options you use for each metric without needing to run it.
* `main` and `statistics`: the sub-statistics to calculate (e.g. $L_{5}$, $L_{90}$, etc.). `main` operates just like any of the others, except you also have the option to return only the main statistic, simplifying your results while leaving your other preferences intact.
* `label`: the label to assign to that metric. For instance, Loudness is typically `'N'`. When calculated, the statistics will be appended to it, like so: N_5, N_10, N_avg, ..., N_{stat}, and this becomes the column name for that metric. If you pass nothing here, Soundscapy falls back to the labels defined in `sq_metrics.DEFAULT_LABELS`. Warning: some functions share a label (e.g. `sharpness_din_tv` and `sharpness_din_perseg` are both `'S'`); if you run both and don't define different labels, one will overwrite the other.

Finally, there is an opportunity to define arguments to pass to the underlying function itself. This is perhaps the most important part for consistency and reproducibility: it is where you define which standard is being used, which time or frequency weighting, or which spectrogram bins to use. These options are defined by the three analysis libraries and are not documented fully in Soundscapy. When the `AnalysisSettings` is parsed, `func_args` is returned as a `dict` with an entry for each option you'd like to pass, and is then passed as `**kwargs` to e.g. the `mosqito.sq_metrics.loudness_zwtv()` function. If `func_args` contains an option the function doesn't recognise, it will throw an error, so be careful when defining these arguments.
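To illustrate the `func_args` mechanism with a stand-in function (not MoSQITo's real signature): the parsed dict simply unpacks into keyword arguments, and an unrecognised key raises a `TypeError`.

```python
def loudness_stub(signal, fs, field_type="free"):
    # Stand-in with one keyword argument, mimicking how func_args reaches the metric
    return f"loudness computed with field_type={field_type}"

func_args = {"field_type": "diffuse"}
print(loudness_stub([0.0], 48000, **func_args))

try:
    loudness_stub([0.0], 48000, **{"bogus_option": 1})  # unrecognised key
except TypeError as err:
    print(f"rejected: {err}")
```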
Let's try loading the `example_settings.yaml` file and see how `AnalysisSettings` handles it.
ex_settings = AnalysisSettings.from_yaml(Path("example_settings.yaml"))
ex_settings['scikit-maad']['all_temporal_alpha_indices']['run'] = True
ex_settings['scikit-maad']['all_spectral_alpha_indices']['run'] = True
ex_settings
{'PythonAcoustics': {'LAeq': {'run': True, 'main': 'avg', 'statistics': [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew'], 'channel': ['Left', 'Right'], 'label': 'LAeq', 'func_args': {'time': 0.125, 'method': 'average'}}, 'LZeq': {'run': True, 'main': 'avg', 'statistics': [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew'], 'channel': ['Left', 'Right'], 'label': 'LZeq', 'func_args': {'time': 0.125, 'method': 'average'}}, 'LCeq': {'run': True, 'main': 'avg', 'statistics': [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew'], 'channel': ['Left', 'Right'], 'label': 'LCeq', 'func_args': {'time': 0.125, 'method': 'average'}}, 'SEL': {'run': True, 'main': [], 'statistics': [], 'channel': ['Left', 'Right'], 'label': 'SEL'}}, 'MoSQITo': {'loudness_zwtv': {'run': False, 'main': 5, 'statistics': [10, 50, 90, 95, 'min', 'max', 'kurt', 'skew', 'avg'], 'channel': ['Left', 'Right'], 'label': 'N', 'parallel': True, 'func_args': {'field_type': 'free'}}, 'sharpness_din_from_loudness': {'run': False, 'main': 'avg', 'statistics': [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew'], 'channel': ['Left', 'Right'], 'label': 'S', 'parallel': True, 'func_args': {'weighting': 'din', 'skip': 0.5, 'field_type': 'free'}}, 'sharpness_din_perseg': {'run': False, 'main': 'avg', 'statistics': [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew'], 'channel': ['Left', 'Right'], 'label': 'S_perseg', 'parallel': False, 'func_args': {'weighting': 'din', 'nperseg': 4096, 'field_type': 'free'}}, 'sharpness_din_tv': {'run': False, 'main': 'avg', 'statistics': [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew'], 'channel': ['Left', 'Right'], 'label': 'S_din_tv', 'parallel': True, 'func_args': {'weighting': 'din', 'field_type': 'free', 'skip': 0.5}}, 'roughness_dw': {'run': False, 'main': 'avg', 'statistics': [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew'], 'channel': ['Left', 'Right'], 'label': 'R', 'parallel': True}}, 'scikit-maad': {'all_temporal_alpha_indices': {'run': True, 'channel': ['Left', 'Right']}, 
'all_spectral_alpha_indices': {'run': True, 'channel': ['Left', 'Right']}}, 'runtime': '2023-08-12 21:20:44'}
`ex_settings` is just a Python `dict` with some class methods added on. One of these parses the settings object for a specific library:
ex_settings.parse_pyacoustics(metric="LAeq")
(True, ['Left', 'Right'], [5, 10, 50, 90, 95, 'min', 'max', 'kurt', 'skew', 'avg'], 'LAeq', {'time': 0.125, 'method': 'average'})
This returns the values for `run`, `channel`, `statistics`, `label`, and `func_args`, which are then used by the `pyacoustics_metric()` function to calculate the $L_{Aeq}$ and its statistics.
When you pass your settings to an analysis function, it starts by automatically parsing and applying the settings for that particular metric. These override any other settings passed to the function, so if you're using a settings file and want to change anything, you should change it in the .yaml file and reload the settings. This also ensures you keep a record of the settings from the last time you ran the analysis.
You can easily reload the settings .yaml after changing it:
# ex_settings = ex_settings.reload()
Running a single metric with predefined settings

Now, with our settings loaded, we can process a recording using those settings:
b.pyacoustics_metric("LAeq", analysis_settings=ex_settings)
| Recording | Channel | LAeq | LAeq_5 | LAeq_10 | LAeq_50 | LAeq_90 | LAeq_95 | LAeq_min | LAeq_max | LAeq_kurt | LAeq_skew |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 68.875703 | 72.257301 | 71.154342 | 68.113339 | 63.375091 | 62.366533 | 60.560166 | 77.382651 | 0.272011 | -0.013877 |
| | Right | 69.953333 | 73.623236 | 72.578152 | 68.495399 | 64.533057 | 63.097659 | 60.520566 | 78.708783 | 0.473515 | 0.140450 |
But this is just the start of what makes the analysis settings so useful.
Processing all the metrics at once, using predefined analysis settings.

Since we can define the settings for all the metrics, and specify which metrics we want to run, we can process all of our desired metrics at once. `process_all_metrics()` lets us do this with a single line of code.
b.process_all_metrics(ex_settings, verbose=True)
Processing CT101 - Calculating Python Acoustics metrics: LAeq - Calculating Python Acoustics metrics: LZeq - Calculating Python Acoustics metrics: LCeq - Calculating Python Acoustics metrics: SEL - Calculating scikit-maad all_temporal_alpha_indices - Calculating scikit-maad all_spectral_alpha_indices
| Recording | Channel | LAeq | LAeq_5 | LAeq_10 | LAeq_50 | LAeq_90 | LAeq_95 | LAeq_min | LAeq_max | LAeq_kurt | LAeq_skew | ... | TFSD | H_Havrda | H_Renyi | H_pairedShannon | H_gamma | H_GiniSimpson | RAOQ | AGI | ROItotal | ROIcover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 68.875703 | 72.257301 | 71.154342 | 68.113339 | 63.375091 | 62.366533 | 60.560166 | 77.382651 | 0.272011 | -0.013877 | ... | 0.596465 | 0.306220 | 1.254553 | 2.981413 | 1004.342457 | 0.767638 | 0.012133 | 1.502837 | 38 | 1.794776 |
| | Right | 69.953333 | 73.623236 | 72.578152 | 68.495399 | 64.533057 | 63.097659 | 60.520566 | 78.708783 | 0.473515 | 0.140450 | ... | 0.596308 | 0.306565 | 1.260965 | 3.032816 | 1054.904505 | 0.771196 | 0.014212 | 1.505844 | 23 | 0.940919 |

2 rows × 91 columns
In this case, we left out the MoSQITo metrics since they take a while, but let's say you do want to run those. You could either edit the `example_settings.yaml` file and use `ex_settings.reload()`, or, as we'll do within this notebook, directly edit the underlying dict.
ex_settings['MoSQITo']["loudness_zwtv"]["run"] = True
b.process_all_metrics(ex_settings, verbose=True)
Processing CT101 - Calculating Python Acoustics metrics: LAeq - Calculating Python Acoustics metrics: LZeq - Calculating Python Acoustics metrics: LCeq - Calculating Python Acoustics metrics: SEL - Calculating MoSQITo metric: loudness_zwtv - Calculating scikit-maad all_temporal_alpha_indices - Calculating scikit-maad all_spectral_alpha_indices
| Recording | Channel | LAeq | LAeq_5 | LAeq_10 | LAeq_50 | LAeq_90 | LAeq_95 | LAeq_min | LAeq_max | LAeq_kurt | LAeq_skew | ... | TFSD | H_Havrda | H_Renyi | H_pairedShannon | H_gamma | H_GiniSimpson | RAOQ | AGI | ROItotal | ROIcover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 68.875703 | 72.257301 | 71.154342 | 68.113339 | 63.375091 | 62.366533 | 60.560166 | 77.382651 | 0.272011 | -0.013877 | ... | 0.596465 | 0.306220 | 1.254553 | 2.981413 | 1004.342457 | 0.767638 | 0.012133 | 1.502837 | 38 | 1.794776 |
| | Right | 69.953333 | 73.623236 | 72.578152 | 68.495399 | 64.533057 | 63.097659 | 60.520566 | 78.708783 | 0.473515 | 0.140450 | ... | 0.596308 | 0.306565 | 1.260965 | 3.032816 | 1054.904505 | 0.771196 | 0.014212 | 1.505844 | 23 | 0.940919 |

2 rows × 101 columns
And now we have all of the same metrics from before, along with the psychoacoustic Loudness.
Batch processing a bunch of recordings.

The final step is to run all of these metrics on a whole folder of wav recordings at once. In the Soundscape Indices (SSID) project for which this package was developed, we use this data to train machine learning models, so it's necessary to process a large number of recordings. Using the predefined analysis settings, we can loop through an entire folder, add everything to the results dataframe, and come away with a single spreadsheet of all the metrics needed.
To start, we'll demonstrate this using a normal `for` loop, which processes one recording at a time and adds it to our results dataframe. For the demonstration, we'll reload all of the variables needed, use the default analysis settings to run all of the metrics, and time it.
import time  # Just for timing
from tqdm import tqdm

wav_folder = Path().cwd().parent.parent.joinpath("test", "data")
levels = wav_folder.joinpath("Levels.json")
with open(levels) as f:
    levels = json.load(f)
df = prep_multiindex_df(levels, incl_metric=False)

analysis_settings = AnalysisSettings.default()
analysis_settings['scikit-maad']['all_temporal_alpha_indices']['run'] = True
analysis_settings['scikit-maad']['all_spectral_alpha_indices']['run'] = True
# analysis_settings["MoSQITo"]["sharpness_din_from_loudness"]["run"] = False
# analysis_settings["MoSQITo"]["sharpness_din_perseg"]["run"] = False
# analysis_settings["MoSQITo"]["roughness_dw"]["run"] = False

begin = time.perf_counter()  # Start timer
# Loop through each wav file in the folder
for wav in (pbar := tqdm(list(wav_folder.glob("*.wav")))):
    recording = wav.stem
    pbar.set_description(f"Processing {recording}")
    decibel = tuple(levels[recording].values())
    b = Binaural.from_wav(wav, calibrate_to=decibel)
    # Process all metrics and add to the results df
    df = add_results(df, b.process_all_metrics(analysis_settings, verbose=False, parallel=True))
end = time.perf_counter()  # Stop timer
Processing CT110: 100%|██████████| 8/8 [11:08<00:00, 83.61s/it]
print("Run time on 16 core AMD Ryzen 7 4800HS (Windows 11): 1781.15 seconds (29.69 minutes)")
print("Run on M2 Max:")
print(f"Time taken (using for loop): {end-begin: .2f} seconds ({(end-begin)/60: .2f} minutes)")
df
Run time on 16 core AMD Ryzen 7 4800HS (Windows 11): 1781.15 seconds (29.69 minutes) Run on M2 Max: Time taken (using for loop): 668.88 seconds ( 11.15 minutes)
| Recording | Channel | LAeq | LAeq_5 | LAeq_10 | LAeq_50 | LAeq_90 | LAeq_95 | LAeq_min | LAeq_max | LAeq_kurt | LAeq_skew | ... | R_5 | R_10 | R_50 | R_90 | R_95 | R_min | R_max | R_kurt | R_skew | R_avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 68.875703 | 72.257301 | 71.154342 | 68.113339 | 63.375091 | 62.366533 | 60.560166 | 77.382651 | 0.272011 | -0.013877 | ... | 0.090225 | 0.077212 | 0.043472 | 0.024918 | 0.021059 | 0.013380 | 0.250412 | 14.534464 | 2.680298 | 0.048825 |
| | Right | 69.953333 | 73.623236 | 72.578152 | 68.495399 | 64.533057 | 63.097659 | 60.520566 | 78.708783 | 0.473515 | 0.140450 | ... | 0.100709 | 0.084377 | 0.049555 | 0.029657 | 0.024672 | 0.014081 | 0.777253 | 101.135194 | 8.879461 | 0.057800 |
| CT102 | Left | 70.613447 | 74.541317 | 73.322597 | 69.297264 | 65.074481 | 64.561974 | 63.337182 | 78.922344 | 0.270694 | 0.530142 | ... | 0.095503 | 0.079576 | 0.045079 | 0.025084 | 0.020123 | 0.007751 | 0.647052 | 110.975758 | 8.886921 | 0.051998 |
| | Right | 70.491840 | 75.681356 | 73.055797 | 69.131814 | 64.975452 | 64.209771 | 63.114758 | 81.634943 | 0.648596 | 0.601576 | ... | 0.121427 | 0.085234 | 0.045587 | 0.026302 | 0.021685 | 0.014702 | 0.497556 | 28.406228 | 4.749014 | 0.058000 |
| CT103 | Left | 66.330006 | 69.010487 | 68.241612 | 65.579475 | 63.680847 | 63.194765 | 62.103353 | 74.234566 | 2.236365 | 1.127518 | ... | 0.099553 | 0.082312 | 0.043925 | 0.023160 | 0.020682 | 0.011923 | 0.729705 | 85.064634 | 8.251089 | 0.055371 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| SM518 | Right | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SM519 | Left | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | Right | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SM520 | Left | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | Right | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

2904 rows × 71 columns
As we can see, this took quite a long time. The problem is that the MoSQITo metrics can take up to a minute per channel per recording, and the for loop processes only one recording at a time. Since most modern computers are multicore, this leaves a lot of processing power unused; on my 16-core machine, only two cores run at 100% at any one time during this process.
To take full advantage of the other cores, we need to process multiple recordings at once and combine the results afterwards. This is called parallel processing, and it could theoretically speed up the analysis by 8x (in practice we don't get the full speedup). The reason two cores are already running in parallel above is that both channels are processed simultaneously for the MoSQITo metrics.
To do this, we provide a function called `parallel_process()` which takes the path to your wav folder as an argument, performs the full processing on multiple recordings in parallel, and returns the full results dataframe at the end.
Note: don't worry about the status updates looking a bit jumbled - that's what happens with parallel processing.
import time
from soundscapy.analysis.parallel_processing import parallel_process

# Redefine path etc. just for the example
wav_folder = Path().cwd().parent.parent.joinpath("test", "data")
levels = wav_folder.joinpath("Levels.json")
with open(levels) as f:
    levels = json.load(f)
df = prep_multiindex_df(levels, incl_metric=False)

analysis_settings = AnalysisSettings.default()
analysis_settings['scikit-maad']['all_temporal_alpha_indices']['run'] = True
analysis_settings['scikit-maad']['all_spectral_alpha_indices']['run'] = True
# analysis_settings["MoSQITo"]["sharpness_din_from_loudness"]["run"] = False
# analysis_settings["MoSQITo"]["sharpness_din_perseg"]["run"] = False
# analysis_settings["MoSQITo"]["roughness_dw"]["run"] = False

start = time.perf_counter()  # Start timer
df = parallel_process(
    wav_folder.glob("*.wav"), df, levels, analysis_settings, verbose=False
)
stop = time.perf_counter()
Processing files: 0%| | 0/8 [00:00<?, ?it/s]
Processing CT108 Processing CT107 Processing CT101 Processing CT110 Processing CT109 Processing CT102 Processing CT104 Processing CT103
print("Previous run time (16 core AMD Ryzen 7 4800HS, Windows 11): 681.61 seconds (11.36 minutes)")
print("Run on M2 Max:")
print(f"Time taken: {stop-start:.2f} seconds ({(stop-start)/60: .2f} minutes)")
df
Previous run time (16 core AMD Ryzen 7 4800HS, Windows 11): 681.61 seconds (11.36 minutes) Run on M2 Max: Time taken: 186.33 seconds ( 3.11 minutes)
| Recording | Channel | LAeq | LAeq_5 | LAeq_10 | LAeq_50 | LAeq_90 | LAeq_95 | LAeq_min | LAeq_max | LAeq_kurt | LAeq_skew | ... | TFSD | H_Havrda | H_Renyi | H_pairedShannon | H_gamma | H_GiniSimpson | RAOQ | AGI | ROItotal | ROIcover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT101 | Left | 68.875703 | 72.257301 | 71.154342 | 68.113339 | 63.375091 | 62.366533 | 60.560166 | 77.382651 | 0.272011 | -0.013877 | ... | 0.596465 | 0.306220 | 1.254553 | 2.981413 | 1004.342457 | 0.767638 | 0.012133 | 1.502837 | 38.0 | 1.794776 |
| | Right | 69.953333 | 73.623236 | 72.578152 | 68.495399 | 64.533057 | 63.097659 | 60.520566 | 78.708783 | 0.473515 | 0.140450 | ... | 0.596308 | 0.306565 | 1.260965 | 3.032816 | 1054.904505 | 0.771196 | 0.014212 | 1.505844 | 23.0 | 0.940919 |
| CT102 | Left | 70.613447 | 74.541317 | 73.322597 | 69.297264 | 65.074481 | 64.561974 | 63.337182 | 78.922344 | 0.270694 | 0.530142 | ... | 0.601337 | 0.315584 | 1.466408 | 3.338958 | 1570.553167 | 0.817882 | 0.018555 | 1.649930 | 2.0 | 3.491107 |
| | Right | 70.491840 | 75.681356 | 73.055797 | 69.131814 | 64.975452 | 64.209771 | 63.114758 | 81.634943 | 0.648596 | 0.601576 | ... | 0.594765 | 0.314666 | 1.441181 | 3.266859 | 1171.085166 | 0.813338 | 0.012127 | 1.497617 | 50.0 | 2.569614 |
| CT103 | Left | 66.330006 | 69.010487 | 68.241612 | 65.579475 | 63.680847 | 63.194765 | 62.103353 | 74.234566 | 2.236365 | 1.127518 | ... | 0.591105 | 0.309768 | 1.324691 | 3.109321 | 1062.343059 | 0.785795 | 0.013687 | 1.484989 | 117.0 | 3.370993 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| SM518 | Right | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SM519 | Left | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | Right | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SM520 | Left | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| | Right | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

2904 rows × 131 columns
That's more than three times as fast!
Then, save the results if you want to:
# from datetime import datetime
# df.to_excel(wav_folder.joinpath("test", f"ParallelTest_{datetime.today().strftime('%Y-%m-%d')}.xlsx"))