Introduction
This work discusses the importance of wildlife conservation in maintaining ecosystem balance and supporting human survival. Despite efforts such as wildlife sanctuaries and legal protections, activities like poaching continue to threaten endangered species. In India, organizations such as the Wildlife Protection Society have documented cases of tiger and leopard poaching. Industrialization and deforestation further endanger wildlife, prompting predictions of a significant disparity between human and animal populations by 2050. Modern technology, specifically machine learning, is proposed as a tool to monitor and protect wildlife more effectively. Systems that combine sound sensors with machine learning algorithms aim to differentiate between animal and human activity in wildlife habitats, offering real-time monitoring and data analysis for conservation purposes.
Research Summary
SOME FACTS ABOUT ELEPHANT RUMBLES:
1. Infrasonic Calls: elephant rumbles have fundamental frequencies of roughly 14-35 Hz, so much of their energy lies in the infrasonic range (below 20 Hz) and can carry over distances of several kilometres.
2. Identified Infrasonic Calls at Amboseli National Park: long-term research at Amboseli has catalogued distinct rumble types associated with specific social contexts, such as greeting, contact, and oestrus rumbles.
THE EXISTING TECHNOLOGY FOR SPEECH-TO-TEXT:
Speech-to-text (STT) technology, also known as automatic speech recognition (ASR), converts spoken language into written text, and machine learning plays a crucial role in the process. At a high level, an STT pipeline has three stages: a front end that converts raw audio into acoustic features (such as spectrograms or MFCCs), an acoustic model, today usually a neural network, that maps those features to phonemes or characters, and a decoder, often aided by a language model, that assembles the final transcript.
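As a minimal illustration of the front-end (feature extraction) stage only, the sketch below computes MFCC features with librosa; the file name speech.wav and the 16 kHz sampling rate are placeholder assumptions, not part of this project.
import librosa
# Front end of a typical ASR pipeline: raw waveform -> acoustic features.
# 'speech.wav' is a placeholder path; any mono speech recording works.
y_speech, sr_speech = librosa.load('speech.wav', sr=16000)  # 16 kHz is common for speech
# 13 MFCCs per frame is a conventional baseline feature set
mfccs = librosa.feature.mfcc(y=y_speech, sr=sr_speech, n_mfcc=13)
print(mfccs.shape)  # (13, n_frames); an acoustic model then maps these frames to text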
Point to be noted:
A modified speech-to-text approach is used here to decode the rumbles made by elephants.
Many of the techniques used in speech-to-text (STT) can be adapted for other audio recognition tasks, including the analysis of low-frequency elephant rumbles. However, some adaptations are needed: rumble energy is concentrated far below the band that speech front ends target, so the sampling rate, FFT window length, and mel filter range must be chosen for low-frequency resolution, and labelled training data is far scarcer than it is for speech. An illustrative parameter sketch follows.
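In the sketch below, the 4000 Hz sampling rate, band limits, and window sizes are assumptions chosen for demonstration, not the project's final settings; the idea is to refocus the mel front end on the low band where rumble energy lives.
import librosa
# Rumbles carry little energy above a few hundred Hz, so a low sampling rate suffices
y_low, sr_low = librosa.load('/content/begging1.mp3', sr=4000)
# A long window gives ~0.5 Hz resolution (sr/n_fft = 4000/8192), which matters when
# the fundamental frequency is only ~15-35 Hz
low_band_mel = librosa.feature.melspectrogram(
    y=y_low, sr=sr_low,
    n_fft=8192, hop_length=1024,
    n_mels=32,
    fmin=5.0, fmax=250.0)  # focus the filter bank on the infrasonic/low band
print("Low-band mel spectrogram shape:", low_band_mel.shape)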
Traditional MFCCs, Long-Term Spectral Features, and Modulation Spectra
import librosa
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import librosa.display
from IPython.display import Audio
import pandas as pd
import os
from sklearn.model_selection import train_test_split
import skimage.io
y, sr = librosa.load('/content/begging1.mp3', sr=32000)
# y, sr = librosa.load('/content/Elephant calls for companions.wav', sr=32000)
librosa.display.waveshow(y, sr=sr, axis='s')
print("The sampled audio is returned as a numpy array (time series) with shape ", y.shape)
print("Ten consecutive samples of the audio (indices 3000-3009) are: ", y[3000:3010])
The sampled audio is returned as a numpy array (time series) with shape (164676,) Ten consecutive samples of the audio (indices 3000-3009) are: [-0.00036051 0.0018779 -0.00192935 0.00093846 -0.00352445 -0.01220394 -0.01800036 -0.02024757 -0.0188582 -0.01652359]
To grasp the concept of a spectrogram, it's essential to comprehend what a spectrum entails. The spectrum refers to the collection of frequencies present in a specific signal, with the fundamental frequency being the lowest. Harmonics, which are frequencies that are integer multiples of the fundamental frequency, are also part of the spectrum. As signals, especially non-periodic ones, exhibit changes in their spectrum over time, a common approach is to analyze small fixed sections of the signal sequentially. This process, known as Short Time Fourier Transform (STFT), involves dividing the sampled signal into equal segments and performing Fourier Transform on each segment individually. These spectra are then stacked together to form the spectrogram, represented as a matrix.
It's important to note that the Fourier Transform is employed to determine the spectrum of a signal in the time domain. In STFT, the signal is divided into equal parts, and the Fourier Transform is applied to each part separately. Consequently, when conducting STFT on a signal, the window size, or the number of samples considered at a time, needs to be specified.
For further understanding, the provided sources delve into essential concepts such as spectrum, windowing, and Short Time Fourier Transform (STFT).
• https://www.phon.ucl.ac.uk/courses/spsci/acoustics/week1-10.pdf
• https://download.ni.com/evaluation/pxi/Understanding%20FFTs%20and%20Windowing.pdf
• https://towardsdatascience.com/audio-deep-learning-made-simple-part-1-state-of-the-art-techniques-da1d3dff2504
# Size of the Fast Fourier Transform (FFT), which will also be used as the window length
n_fft = 1024
# Step or stride between windows. If the step is smaller than the window length, the windows will overlap
hop_length = 320
# Specify the window type for FFT/STFT
window_type = 'hann'
# Calculate the spectrogram as the square of the complex magnitude of the STFT
spectrogram_librosa = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length, win_length=n_fft, window=window_type)) ** 2
Next, transform the spectrogram to a logarithmic scale: convert the amplitude to decibels and map the frequency axis to the mel scale.
Mel Spectrogram
The Mel spectrogram utilizes a non-linear transformation of the frequency scale known as the mel scale, which is based on human perception of pitch. This scale ensures that two pairs of frequencies separated by a constant delta in the mel scale are perceived as equally distant by humans.
In machine learning applications involving speech and audio analysis, it is common to represent the power spectrogram using the mel scale. This is achieved by employing a bank of overlapping triangular filters, known as the mel filter bank, which calculates the energy of the spectrum within each frequency band.
The mel spectrogram's shape is determined by the number of mel bands and the number of STFT frames: its dimensions are [n_mels x number of frames]. The underlying mel filter bank, by contrast, has dimensions [n_mels x (n_fft/2 + 1)], since each triangular filter spans the (n_fft/2 + 1) FFT frequency bins.
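For reference, librosa exposes the Hz-to-mel conversion directly; the quick check below (illustrative only, not part of the pipeline) shows equal 1000 Hz steps shrinking on the mel scale as frequency rises.
import librosa
# Equal 1000 Hz steps occupy fewer and fewer mels at higher frequencies
for f in [500, 1500, 2500, 3500]:
    print(f, "Hz ->", round(float(librosa.hz_to_mel(f)), 2), "mel")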
mel_bins = 64  # Number of mel bands
fmin = 0
fmax = None
Mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, win_length=n_fft, window=window_type, n_mels=mel_bins, power=2.0)
print("The shape of mel spectrogram is: ", Mel_spectrogram.shape)
librosa.display.specshow(Mel_spectrogram, sr=sr, x_axis='time', y_axis='mel', hop_length=hop_length)
plt.colorbar()  # raw power values; converted to decibels in the next step
plt.title('Mel spectrogram')
plt.tight_layout()
plt.show()
Next, move from the power (mel) spectrum to a log scale by converting the amplitude to decibels. While doing so we also normalize the spectrogram so that its maximum represents the 0 dB point.
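The plotting code below references mel_spectrogram_db, so the conversion step is made explicit here; using ref=np.max in librosa.power_to_db pins the spectrogram's maximum at 0 dB, which is exactly the normalization described above.
# Convert the power mel spectrogram to decibels; ref=np.max makes the loudest
# point the 0 dB reference, so all other values are negative
mel_spectrogram_db = librosa.power_to_db(Mel_spectrogram, ref=np.max)
print("The shape of the log mel spectrogram is: ", mel_spectrogram_db.shape)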
• https://stackoverflow.com/questions/52432731/store-the-spectrogram-as-image-in-python/52683474 - how to save the figure in the working directory.
• https://stackoverflow.com/questions/56719138/how-can-i-save-a-librosa-spectrogram-plot-as-a-specific-sized-image/57204349#57204349 - if the desire is to save the data in the spectrogram (not the image itself).
import matplotlib.pyplot as plt
import numpy as np
# Aspect ratio of the spectrogram (time frames vs. mel bands)
aspect_ratio = mel_spectrogram_db.shape[1] / mel_spectrogram_db.shape[0]
# Limit the maximum size of the figure
max_fig_width = 20  # Maximum width in inches
max_fig_height = 8  # Maximum height in inches
max_aspect_ratio = max_fig_width / max_fig_height
# Clamp the aspect ratio so the figure never exceeds the maximum width
aspect_ratio = min(aspect_ratio, max_aspect_ratio)
# Create a new figure whose width follows the (clamped) aspect ratio
fig, ax = plt.subplots(figsize=(max_fig_height * aspect_ratio, max_fig_height))
# Plot the Mel spectrogram
img = ax.imshow(mel_spectrogram_db, origin='lower', aspect='auto', cmap='viridis', extent=[0, mel_spectrogram_db.shape[1]*hop_length/sr, 0, mel_spectrogram_db.shape[0]])
# Create colorbar using the image object
plt.colorbar(img, ax=ax, format='%+2.0f dB')
plt.title('Log Mel spectrogram')
plt.xlabel('Time (s)')
plt.ylabel('Mel band')
plt.show()
mel_filter_bank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=mel_bins, fmin=0.0, fmax=None, htk=False, norm='slaney')
print("The shape of the mel filter bank is: ", mel_filter_bank.shape)
librosa.display.specshow(mel_filter_bank, sr=sr, x_axis='linear')
plt.colorbar()  # filter weights, not decibels
plt.title('Mel filter bank')
plt.tight_layout()
plt.show()
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
def extract_long_term_spectral_features(audio_path, n_fft=2048, hop_length=320):
    """
    Extracts long-term spectral features from an audio file.
    Args:
    - audio_path (str): Path to the audio file.
    - n_fft (int): Number of samples used for each Fourier Transform.
    - hop_length (int): Hop length (in samples) for the STFT. Controls the time resolution of the spectrogram.
    Returns:
    - magnitude_spectrogram (ndarray): Magnitude spectrogram transposed to (frames x frequency bins), ready for frame-wise modelling.
    - sr (int): Sample rate the audio was loaded at (librosa's 22050 Hz default).
    """
    # Load audio file at librosa's default sample rate
    y, sr = librosa.load(audio_path)
    # Compute short-time Fourier transform (STFT)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    # Compute magnitude spectrogram
    magnitude_spectrogram = np.abs(stft)
    # Transpose so each row is one time frame
    magnitude_spectrogram = np.transpose(magnitude_spectrogram)
    # Return the sample rate too, so downstream plotting uses the right time axis
    return magnitude_spectrogram, sr
# Usage:
audio_path = "/content/begging1.mp3"
long_term_spectral_features, sr = extract_long_term_spectral_features(audio_path)
print("Long-term spectral features shape:", long_term_spectral_features.shape)
# Normalize the spectrogram
normalized_spectrogram = librosa.util.normalize(long_term_spectral_features)
print("normalized_spectrogram features shape:", normalized_spectrogram.shape)
# Plot spectrogram (specshow expects frequency x time, so transpose back)
plt.figure(figsize=(10, 5))
librosa.display.specshow(normalized_spectrogram.T, sr=sr, hop_length=hop_length, x_axis='time', y_axis='linear')
plt.colorbar()  # normalized magnitude, not decibels
plt.title('Long-term spectral features')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.tight_layout()
plt.show()
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
def extract_modulation_spectra(audio_path, n_fft=2048, hop_length=512, n_mels=64):
    """
    Extracts modulation spectra from an audio file. The modulation spectrum is
    obtained by taking a second Fourier transform along the time axis of each
    log-mel band envelope, showing how quickly the energy in each band fluctuates.
    Args:
    - audio_path (str): Path to the audio file.
    - n_fft (int): Number of samples used for each Fourier Transform.
    - hop_length (int): Hop length (in samples) for the STFT. Controls the time resolution of the spectrogram.
    - n_mels (int): Number of mel bands.
    Returns:
    - modulation_spectra (ndarray): Modulation spectra of shape (n_mels, modulation-frequency bins).
    - sr (int): Sample rate the audio was loaded at.
    """
    # Load audio file
    y, sr = librosa.load(audio_path)
    # Mel power spectrogram: band energies over time
    mel_spec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    # Log energies make large and small fluctuations comparable
    log_mel = librosa.power_to_db(mel_spec)
    # Second FFT along the time axis: each band envelope's magnitude spectrum
    modulation_spectra = np.abs(np.fft.rfft(log_mel, axis=1))
    return modulation_spectra, sr
# Example usage:
audio_path = "/content/begging1.mp3"
modulation_spectra, sr = extract_modulation_spectra(audio_path)
print("modulation spectra shape:", modulation_spectra.shape)
# Normalize the modulation spectra differently for better contrast
normalized_modulation_spectra = librosa.util.normalize(modulation_spectra, axis=1)
print("normalized_modulation_spectra shape:", normalized_modulation_spectra.shape)
# Plot modulation spectra: x axis is modulation frequency, y axis is mel band
plt.figure(figsize=(10, 5))
plt.imshow(normalized_modulation_spectra, origin='lower', aspect='auto', cmap='viridis')
plt.colorbar()  # normalized magnitude, not decibels
plt.title('Modulation Spectra')
plt.xlabel('Modulation frequency bin')
plt.ylabel('Mel band')
plt.tight_layout()
plt.show()
After obtaining the desired features, the next step is to feed them into a machine learning model for further analysis: classification, clustering, regression, or whatever task the application requires. The model leverages the extracted features to learn patterns and relationships within the data, enabling it to make predictions or perform tasks useful to the application. This phase typically involves training on labelled data, followed by evaluation on unseen data to assess performance and generalization. Fine-tuning of model parameters and feature selection techniques may also be employed to optimize performance and enhance interpretability. Overall, extracting features and leveraging them in a machine learning model is crucial for deriving meaningful insights and making informed decisions.
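As a minimal sketch of this step (the summary statistics, random placeholder data, labels, and choice of a random forest below are illustrative assumptions, not the project's actual setup), variable-length spectrograms can be reduced to fixed-length vectors and fed to a standard scikit-learn classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def summarize(spec):
    # Collapse a variable-length spectrogram (bands x frames) into a fixed-length
    # vector using per-band mean and standard deviation
    return np.concatenate([spec.mean(axis=1), spec.std(axis=1)])

# Placeholder data: in practice each row would summarize one labelled recording
X = np.stack([summarize(np.random.rand(64, 200)) for _ in range(40)])
labels = np.random.randint(0, 2, size=40)  # e.g. 0 = rumble, 1 = other sound

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))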