Fig1: Example PPSD results for a bad channel
Fig2: Example PPSD results for a good channel
Earthquake data before bad trace removal
Earthquake data after bad trace removal

Inspiration

The Distributed Acoustic Sensing (DAS) system at the San Andreas Fault is intended to provide high-quality, densely sampled data for earthquake monitoring and source location. The utility of seismic data increases greatly when the noise level is reduced. Quantifying and understanding the noise level is a first step toward reducing it in seismic data.

What it does

In this study, we have three main objectives. First, we employ a standard method (the probabilistic power spectral density [PPSD] method; McNamara and Buland, 2004) to calculate the DAS ambient noise level for direct comparison with the standard seismological models (new low noise model [NLNM] and new high noise model [NHNM]; Peterson, 1993). Second, we use the PPSD results to differentiate good channels from bad channels with a supervised learning method (support vector machine [SVM]). Third, we show the effect of bad channel removal using an earthquake recording as an example. In addition, if time allows, we compare the earthquake PPSD spectrum with the PPSD noise level to assess the potential for event detection in the PPSD domain.

How I built it

PPSD Method

The standard method for quantifying seismic background noise is to calculate the noise power spectral density (PSD). To use seismometer results as a benchmark for understanding the DAS noise level, we adopted the PSD method described in Peterson (1993).

DAS records continuous data most of the time. For preprocessing, we split the continuous time series of each DAS channel into 10-minute segments that overlap by 50% and are distributed continuously throughout the day. Overlapping the segments reduces the variance of the PSD estimate (Cooley and Tukey, 1965). To minimize long-period contamination, each segment is transformed to zero mean and any long-period linear trend is removed by the average slope method. If trends are left in the data, large distortions can occur in spectral processing, which can nullify the estimation of low-frequency spectral quantities. To suppress side-lobe leakage in the resulting FFT, a 10% cosine taper is applied to the ends of each truncated and detrended segment. Tapering smooths the FFT and minimizes the effect of the discontinuity between the beginning and end of the time series. The resulting reduction in time series variance is quantified by the ratio of the total power in the raw FFT to the total power in the smoothed FFT (1.142857), and this ratio is used to correct the absolute power in the final spectrum (Bendat and Piersol, 1973).
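The preprocessing steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the project's actual code: the function names (segment_starts, cosine_taper, segment_psd_db) are hypothetical, a least-squares linear fit stands in for the average slope method, and the taper is assumed to ramp over 10% of the segment at each end, which is the reading that reproduces the 1.142857 correction factor.

```python
import numpy as np

def segment_starts(n_samples, seg_len, overlap=0.5):
    """Start indices of fixed-length segments with the given fractional overlap."""
    step = int(seg_len * (1.0 - overlap))
    return list(range(0, n_samples - seg_len + 1, step))

def cosine_taper(n, p=0.1):
    """Window equal to 1 in the middle, with a cosine ramp over fraction p at each end."""
    w = np.ones(n)
    m = int(round(p * n))
    if m == 0:
        return w
    ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(m) + 0.5) / m))
    w[:m] = ramp
    w[n - m:] = ramp[::-1]
    return w

def segment_psd_db(x, fs, p=0.1):
    """One-sided PSD of a single segment in dB, with the DC bin dropped."""
    n = len(x)
    t = np.arange(n)
    # Remove mean and linear trend (assumption: least-squares fit
    # used here in place of the average slope method).
    slope, intercept = np.polyfit(t, x, 1)
    x = x - (slope * t + intercept)
    w = cosine_taper(n, p)
    # Power lost to the taper; about 1.142857 for p = 0.1.
    correction = 1.0 / np.mean(w ** 2)
    spec = np.fft.rfft(x * w)
    # Factor 2 folds negative frequencies into the one-sided spectrum
    # (applied to every bin for simplicity).
    psd = 2.0 * np.abs(spec) ** 2 / (fs * n) * correction
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs[1:], 10.0 * np.log10(psd[1:])
```

For 10-minute segments at a sampling rate fs (in Hz), seg_len would be int(600 * fs), and segment_psd_db would be called on x[s:s + seg_len] for each start index s returned by segment_starts.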

To estimate the true variation of noise at a given DAS channel, we calculate the probability distribution of the dB values in each period bin. This gives PPSD results for all DAS channels. Our PPSD implementation follows the same standard that McNamara and Buland (2004) use for processing seismometer data.
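As a rough illustration of the probability step, the sketch below bins the dB values of all segments at each period and normalizes the counts. It assumes the per-segment PSDs are already stacked into an array of shape (n_segments, n_periods); the function name ppsd_histogram and the choice of bin edges are hypothetical, and the period smoothing used by McNamara and Buland (2004) is omitted.

```python
import numpy as np

def ppsd_histogram(psd_db, db_edges):
    """Probability of each dB bin at each period.

    psd_db   : array (n_segments, n_periods) of PSD values in dB
    db_edges : 1D array of dB bin edges
    Returns an array (n_periods, n_db_bins). Each row sums to 1 when
    every value falls inside the range covered by db_edges.
    """
    n_seg, n_per = psd_db.shape
    hist = np.empty((n_per, len(db_edges) - 1))
    for j in range(n_per):
        counts, _ = np.histogram(psd_db[:, j], bins=db_edges)
        hist[j] = counts / n_seg
    return hist
```

The resulting 2D array (period by dB bin) is the kind of per-channel PPSD result that the figures display.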

Picking good channels with SVM method

After obtaining the PPSD noise level results, we found that good channels and bad channels tend to behave very differently in the PPSD domain (see Fig1 for a bad channel's PPSD results and Fig2 for a good channel's). This suggested using the difference to pick good channels.

We flatten the 2D PPSD result of each channel into a 1D feature vector. We manually labeled 900 channels as good or bad, then used 750 channels as the training set and the remaining 150 as the testing set for the SVM algorithm. The resulting accuracy is high: 1.00 on the training set, 0.99 in cross-validation, and 0.9621 on the testing set.
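The classification workflow described above might look roughly like the following with scikit-learn. This is a sketch on synthetic stand-in data, not the project's data or code: the array sizes other than the 900/750/150 split, the linear kernel, the 5-fold cross-validation, and the way the fake "good" channels are generated are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_channels, n_periods, n_db_bins = 900, 20, 30  # PPSD grid size is hypothetical

# Synthetic stand-in for manually labeled PPSD results (1 = good, 0 = bad).
labels = rng.integers(0, 2, size=n_channels)
ppsd = rng.random((n_channels, n_periods, n_db_bins))
ppsd[labels == 1, :, :10] += 1.0  # fake "good" channels differ in the low dB bins

# Flatten each channel's 2D PPSD into a 1D feature vector.
features = ppsd.reshape(n_channels, -1)

# 750 training channels, 150 testing channels.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=150, random_state=0, stratify=labels
)

clf = SVC(kernel="linear")  # kernel choice is an assumption
cv_acc = cross_val_score(clf, X_train, y_train, cv=5).mean()
clf.fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train={train_acc:.4f} cv={cv_acc:.4f} test={test_acc:.4f}")
```

Reporting training, cross-validation, and testing accuracy separately, as in the text, helps show whether the classifier generalizes beyond the labeled training channels.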

Bad channel removal in earthquake data

To see how well the SVM method picks good channels, we use an earthquake recording as a test example. Comparing the earthquake data before and after bad trace removal, we can see that the SVM does a decent job of removing the bad channels, especially at the two ends.
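Once each channel has a predicted label, removing the bad traces amounts to a boolean mask over the channel axis. The sketch below assumes the recording is stored as a 2D array with shape (n_channels, n_samples) and that a per-channel boolean is_good is available, for example from the classifier's predictions; the function name remove_bad_channels is hypothetical.

```python
import numpy as np

def remove_bad_channels(data, is_good):
    """Keep only the channels flagged as good.

    data    : array (n_channels, n_samples)
    is_good : boolean sequence of length n_channels
    Returns the retained traces and their original channel indices,
    so the kept traces can still be placed along the array.
    """
    is_good = np.asarray(is_good, dtype=bool)
    return data[is_good], np.flatnonzero(is_good)
```

Returning the original channel indices keeps the spatial position of each surviving trace available for plotting.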

Challenges I ran into

First, the I/O was very slow. Second, our code still needs optimization in places. The biggest challenge, however, was the limited time available for the project.

Accomplishments that I'm proud of

We gained a better understanding of the DAS noise level by comparing it with the seismometer noise models using the PPSD method. We can quantitatively differentiate good channels from bad channels with high accuracy by applying supervised learning to the PPSD results. The PPSD results also show potential for detecting earthquakes.