In our method, we first build a wavelet group that reflects the major correlation energy of a template signal. The fuzzy features of the target signals are then obtained by convolving them with an optimum collection of wavelets. Finally, fuzzy features are used to compute matching features between the template signal and the target signal. This work extracted self-matching features in the time domain, self-matching features in the frequency domain, and mutual matching features in the frequency domain.

The dataset we used in this study is from reference [8], which include five categories of PCG: normal heart sound (NHS), mitral stenosis (MS), aortic regurgitation (AR), mitral regurgitation (MR) and mitral valve prolapse (MVP). Typical representative of each category is shown in Fig. 1. They were gathered from a variety of sources, including books (Auscultation skills CD, Heart sound made easy) and websites (48 different websites provided the data including Washington, Texas, 3 M, and Michigan and so on). After excluding files with excessive noise, heart sound was sampled at 8000 Hz frequency rate and converted to mono channel, 3 period heart sound signal, data sampling, conversion, and editing were completed. The duration of each sample lasts 2 to 3 s containing 3 cardiac circles. Finally, there are 1000 samples in total, and each category holds 200 samples respectively. There are several benefits to using this dataset. The first comprises a large enough amount of data (1000 samples) and 200 samples for each analogy. Second, because it satisfies the sample balance property, it will not lead the algorithm to form preferences when the machine learning model is trained. Third, each item of data is labeled clearly. Fourth, this dataset has been used in several researches with positive experimental outcomes.

Procedures of FMFE are shown in Fig. 2. We got the template PCG signal from training PCG set and test signal from test PCG set. Then, a group of Gaussian wavelets were optimized from originally selected wavelets based on the template signal. Subsequently, template signal and test signal were convolved with these optimized wavelets, and fuzzy convolved features of template PCG signal and test PCG signal can be computed. Finally, based on these convolved features, fuzzy matching degree and fuzzy matching energy are obtained by matching computation.

### Acquisition of template feature

First of all, we need to clarify the following mathematic definition. For ease of more details in the proposed method and the ways of expression, the following formula representations will be used.

Given \(x_{{\varvec{1}}}\) to \(x_{{\varvec{n}}}\) are a set of vectors, so:

$$(x_{1} {\mathbf{;}}x_{2} {\mathbf{;}} \ldots {\mathbf{;}}x_{n} ) = \left( {\begin{array}{*{20}c} {x_{11} } & {{\varvec{x}}_{12} } & {\ldots } & {{\varvec{x}}_{1n} } \\ {{\varvec{x}}_{21} } & {{\varvec{x}}_{22} } & {\ldots } & {{\varvec{x}}_{2n} } \\ \vdots & \vdots & {} & \vdots \\ {{\varvec{x}}_{n1} } & {{\varvec{x}}_{n2} } & {\ldots } & {{\varvec{x}}_{n3} } \\ \end{array} } \right)$$

(1)

$$(x_{1} ,x_{2} ,\ldots ,x_{n} ) = \left( {\begin{array}{*{20}c} {x_{11} } & {{\varvec{x}}_{21} } & {\ldots } & {{\varvec{x}}_{n1} } \\ {{\varvec{x}}_{12} } & {{\varvec{x}}_{22} } & {\ldots } & {{\varvec{x}}_{n2} } \\ \vdots & \vdots & {} & \vdots \\ {{\varvec{x}}_{1n} } & {{\varvec{x}}_{2n} } & {\ldots } & {{\varvec{x}}_{nn} } \\ \end{array} } \right)$$

(2)

Given \(x\) is a vector, then:

$$\left\| x \right\|^{2} = {\varvec{x}}_{1}^{2} + {\varvec{x}}_{2}^{2} + \cdots + {\varvec{x}}_{n}^{2}$$

(3)

$$Sqrt(x) = (\sqrt {{\varvec{x}}_{1} } ,\sqrt {{\varvec{x}}_{2} } ,\ldots ,\sqrt {{\varvec{x}}_{n} } )$$

(4)

As for \(\odot\), given \(x\) and \(y\) are two vectors, then:

$$x \odot y = ({\varvec{x}}_{1} \cdot {\varvec{y}}_{1} ,{\varvec{x}}_{2} \cdot {\varvec{y}}_{2} ,\ldots ,{\varvec{x}}_{n} \cdot {\varvec{y}}_{n} )$$

(5)

Given \(X\) and \(Y\) are two matrixes, then:

$$X \odot Y = \left( {\begin{array}{*{20}c} {{\varvec{x}}_{11} \cdot {\varvec{y}}_{11} } & {{\varvec{x}}_{12} \cdot {\varvec{y}}_{12} } & {\ldots } & {{\varvec{x}}_{1n} \cdot {\varvec{y}}_{1n} } \\ {{\varvec{x}}_{21} \cdot {\varvec{y}}_{21} } & {{\varvec{x}}_{22} \cdot {\varvec{y}}_{22} } & {\ldots } & {{\varvec{x}}_{2n} \cdot {\varvec{y}}_{2n} } \\ \vdots & \vdots & {} & \vdots \\ {{\varvec{x}}_{n1} \cdot {\varvec{y}}_{n1} } & {{\varvec{x}}_{n2} \cdot {\varvec{y}}_{n2} } & {\ldots } & {{\varvec{x}}_{nn} \cdot {\varvec{y}}_{nn} } \\ \end{array} } \right)$$

(6)

Given \(X\) is a matrix, *a* is a scalar, then:

$$X \odot \user2{a = }\left( {\begin{array}{*{20}c} {{\varvec{a}}x_{11} } & {{\varvec{a}}x_{12} } & \cdots & {{\varvec{a}}x_{1n} } \\ {{\varvec{a}}x_{21} } & {{\varvec{a}}x_{22} } & \cdots & {{\varvec{a}}x_{2n} } \\ \vdots & \vdots & \cdots & \vdots \\ {{\varvec{a}}x_{n1} } & {{\varvec{a}}x_{n2} } & \cdots & {{\varvec{a}}x_{nn} } \\ \end{array} } \right)$$

(7)

\(FFT(x)\) represents the Fast Fourier transform for vector \(x\).

As mentioned above, we used FMFE in 3 dimensions, which are self-matching in time domain, self-matching in frequency domain and mutual matching in frequency domain respectively. Thus, 3 matching templates are needed. In the time domain self-matching, the source of template is come from the PCG signal itself. Because the complete PCG signal are composed of 3 cycles in our dataset. In the dimension of self-matching in time domain, the template signal (*m*) is simply calculated by averaging all 3 cycles (*h*_{1}*, h*_{2}*, h*_{3}) of one signal:

$${\varvec{m}} = \frac{{{\varvec{h}}_{1} + {\varvec{h}}_{2} + {\varvec{h}}_{3} }}{3}$$

(8)

In the dimension of self-matching in frequency domain and mutual matching in frequency domain, the template signal (*m*) is calculated by averaging Fast Fourier Transform (FFT) of all 3 cycles (*h*_{1}, *h*_{2}, *h*_{3}) of one signal:

$$\user2{m = }\frac{{\left\| {FFT(h_{1} )} \right\|\user2{ + }\left\| {FFT(h_{2} )} \right\|\user2{ + }\left\| {FFT(h_{3} )} \right\|}}{3}$$

(9)

What is worth to be mentioned is that in mutual matching in frequency domain, *h*_{1}, *h*_{2}, *h*_{3} are the 3 parts of one specific normal PCG (the shortest PCG in normal PCG dataset). In conclusion, all sample signals used the same template signal in mutual matching, but every sample signal has its own template signal in self-matching. After we have the template signal, we need to obtain the template features. *W* is an initial filter, it is a matrix constructed by N wavelets, which is described as \({(}{\varvec{w}}_{1} {;}{\varvec{w}}_{2} ;\ldots ;{\varvec{w}}_{N} )\). In this study, Gaussian wavelets were used. They are Gaussian 1th-8th high-derivative filters wavelets, N wavelets in total, and the length of each wavelet is L. N, and L are hyperparameters in FMFE. Then, the template features can be obtained by convolving the template signal with the initial filter matrix (*X* is the template feature matrix, \(\otimes\) is convolution operation):

$${\varvec{X}} = {\varvec{m}} \otimes \user2{W = }({\varvec{m}} \otimes {\varvec{w}}_{1} ;{\varvec{m}} \otimes {\varvec{w}}_{2} ;\ldots ;{\varvec{m}} \otimes {\varvec{w}}_{N} )$$

(10)

### Acquisition of correlation energy feature *X*
_{m} and *X*
_{s}

However, not all the template features are usually needed to be considered. Because the same type of heart sound signals (for example, the same type of heart sound signals in normal people or the same type of heart sound signals in people with certain heart disease) come from different samples of individuals, and its features are not completely consistent. To reduce overfitting, an idea of fuzzy matching is proposed here. We consider that the same type of heart sound signal has the highest correlation energy. Therefore, a matching filter matrix based on the maximum correlation energy was constructed to extract the correlation energy features of the signals. We use a mask *U* to optimize initial filter *W*:

$${\varvec{U}} = ({\varvec{\beta}}_{1} ,{\varvec{\beta}}_{2} ,\ldots ,{\varvec{\beta}}_{O} )$$

(11)

The \({\varvec{\beta}}_{1} ,{\varvec{\beta}}_{2} ,\ldots {\varvec{\beta}}_{O}\) are the eigenvectors corresponding to the top O eigenvalues of *XX*^{T}. Then, \({\varvec{W}}^{^{\prime}}\) can be optimized. They can be described as follows:

$${\varvec{W}}^{^{\prime}} = {\varvec{U}}^{T} {\varvec{W}} = ({\varvec{w}}_{{\varvec{1}}}^{\user2{^{\prime}}} ;{\varvec{w}}_{{\varvec{2}}}^{\user2{^{\prime}}} ;\ldots ;{\varvec{w}}_{O}^{\user2{^{\prime}}} )$$

(12)

where \({\varvec{w}}_{{\varvec{1}}}^{\user2{^{\prime}}} ;{\varvec{w}}_{{\varvec{2}}}^{\user2{^{\prime}}} ;\ldots ;{\varvec{w}}_{O}^{\user2{^{\prime}}}\) represent the optimized O filters (Because we only selected O eigenvectors to optimize the filter, the number of optimized filters becomes one of the hyperparameters O of the proposed method). Correspondingly, the fuzzy feature of template signal (*X*_{m}) and of target signal (*X*_{s}) can be obtained as follows:

$${\varvec{X}}_{m} = {\varvec{m}} \otimes {\varvec{W}}^{\user2{^{\prime}}} = ({\varvec{m}} \otimes {\varvec{w}}_{{\varvec{1}}}^{\user2{^{\prime}}} ;\,{\varvec{m}} \otimes {\varvec{w}}_{{\varvec{2}}}^{\user2{^{\prime}}} ;\,\ldots ;{\varvec{m}} \otimes {\varvec{w}}_{O}^{\user2{^{\prime}}} ) = ({\varvec{x}}_{{m{\varvec{1}}}} ;\,{\varvec{x}}_{{m{\varvec{2}}}} ;\,\ldots ;\,{\varvec{x}}_{mO} )$$

(13)

$${\varvec{X}}_{s} = {\varvec{s}} \otimes {\varvec{W}}^{\user2{^{\prime}}} = ({\varvec{s}} \otimes {\varvec{w}}_{{\varvec{1}}}^{\user2{^{\prime}}} ;{\varvec{s}} \otimes {\varvec{w}}_{{\varvec{2}}}^{\user2{^{\prime}}} ;\,\ldots ;\,{\varvec{s}} \otimes {\varvec{w}}_{O}^{\user2{^{\prime}}} ) = ({\varvec{x}}_{{s{\varvec{1}}}} ;\,{\varvec{x}}_{{s{\varvec{2}}}} ;\,\ldots ;\,{\varvec{x}}_{sO} )$$

(14)

### Acquisition of matching degree feature

Here we design a convolution to continuously compute matching degree between the template signal and target signal. To avoid the endpoint effect from convolution, the endpoints of \({\varvec{x}}_{mi}\) and \({\varvec{x}}_{si}\) were removed and then renamed as \({\varvec{x}}_{mi}^{\prime }\) and \({\varvec{x}}_{si}^{^{\prime}}\) respectively. Equations (15) and 16 described how we obtained the matching degree *d*. \({\varvec{x}}_{si - norm}^{^{\prime}}\) is the norm of \({\varvec{x}}_{si}^{^{\prime}}\) in our convolutional computing, which is described in Eq. (17). Vector *a* in Eq. (17) are composed of the only element 1, which has the same size with \({\varvec{x}}_{mi}^{^{\prime}}\). \(x_{mi - norm}^{^{\prime}}\) is the norm of template signal \({\varvec{x}}_{mi}^{^{\prime}}\), which is expressed in Eq. (18).

$${\varvec{y}}_{i} = \frac{{{\varvec{x}}_{si}^{^{\prime}} \otimes \overleftarrow {{{\varvec{x}}_{mi}^{^{\prime}} }} }}{{{\varvec{x}}_{si - norm}^{^{\prime}} \odot x^{^{\prime}}_{mi - norm} }},(i = 1,2,\ldots ,O)$$

(15)

$$\user2{d = y}_{1} \odot {\varvec{y}}_{2} ,\ldots , \odot {\varvec{y}}_{o}$$

(16)

$${\varvec{x}}_{si - norm}^{\prime } = Sqrt(({\varvec{x}}_{si}^{\prime } \odot {\varvec{x}}_{si}^{\prime } ) \otimes {\varvec{a}})$$

(17)

$$x_{mi - norm}^{^{\prime}} { = }\left\| {{\varvec{x}}_{mi}^{^{\prime}} } \right\|^{{2}}$$

(18)

### Acquisition of other matching features

We use max matching degree as one extracted feature in this study, we record the maximum value of matching degree *d* vector, which is recorded as \(mmd_{{}}\). And record the point where the maximum value corresponds as the index of *mmd**.* Based on index of the \(mmd_{{}}\), corresponding energy features in \({\varvec{X}}_{s}\) can be easily found by taking the values according to the index position of *mmd*. These values are referred as \(e_{s1} ,e_{s2} ,\ldots ,e_{sO}\) in a vector \({\varvec{ME}}\):

$${\varvec{ME}} = (e_{s1} ,e_{s2} ,\ldots ,e_{sO} )$$

(19)

Each optimized wavelet gives a correlation energy value, so there are totally \(O\) energy features in one cardiac cycle.

Because a template signal represents information of one cycle of each PCG signal, all 3 cycles of PCG in our study should have 3-folds of matching features. For each cycle, matching degree features and matching energy features are extracted according to the method above. This means that in one dimension of matching computation, each heart sound signal extracted \(3 \times \left( {{\text{O}} + 1} \right)\) matching features. In the time domain, the self-matching degree features (TD.S.MD) and self-matching energy features (TD.S.ME) of one complete PCG signal can be expressed as follows:

$${\varvec{TD}}\user2{.S}\user2{.MD} = (mmd_{1}^{TD.S} , \,mmd_{2}^{TD.S} , \,mmd_{3}^{TD.S} )$$

(20)

$${\varvec{TD}}\user2{.S}\user2{.ME} = ({\varvec{ME}}_{1}^{TD.S} , \,{\varvec{ME}}_{2}^{TD.S} , \,{\varvec{ME}}_{3}^{TD.S} )$$

(21)

The extraction method of self-matching degree features and self-matching energy features in frequency domain is similar to the method above. The difference is that the heart sound signals in time domain of one heart beat cycle are transformed to the frequency domain using Fast Fourier Transform (FFT) and the template signal can be given in Eq. (9). Using the same method, we can have the following feature expression:

$${\varvec{FD}}\user2{.S}\user2{.MD} = (mmd_{{1}}^{FD.S} ,\,mmd_{2}^{FD.S} ,\,mmd_{3}^{FD.S} )$$

(22)

$${\varvec{FD}}\user2{.M}\user2{.MD} = (mmd_{1}^{FD.M} ,\,mmd_{2}^{FD.M} ,\,mmd_{3}^{FD.M} )$$

(23)

$${\varvec{FD}}\user2{.S}\user2{.ME} = ({\varvec{ME}}_{1}^{FD.S} ,\,{\varvec{ME}}_{2}^{FD.S} ,\,{\varvec{ME}}_{3}^{FD.S} )$$

(24)

$${\varvec{FD}}\user2{.M}\user2{.ME} = ({\varvec{ME}}_{1}^{FD.M} ,\,{\varvec{ME}}_{2}^{FD.M} ,\,{\varvec{ME}}_{3}^{FD.M} )$$

(25)

\({\varvec{FD}}\user2{.S}\user2{.MD}\) is frequency domain self-matching degree feature of one complete heart sound signal, and \({\varvec{FD}}\user2{.S}\user2{.ME}\) is frequency domain self-matching energy features. \({\varvec{FD}}\user2{.M}\user2{.MD}\) and \({\varvec{FD}}\user2{.M}\user2{.ME}\) are mutual matching degree features and mutual matching energy features respectively.

### Classifiers

4 classifiers are used to evaluate obtained features in this study. Support vector machine (SVM) is a kind of generalized linear classifier. The decision boundary of SVM is the maximum margin hyperplane for learning samples. It utilizes hinge loss function to calculate empirical risk and adds regularization term to the solution system to optimize structural risk [27]. K-nearest neighbor (KNN) classification algorithm is one of the simplest methods in machine learning. The K nearest neighbors refers K nearest samples, which means that each category can be represented by its closest k neighbor’s category [28]. Random forest is a classifier in machine learning that contains multiple decision trees, which normally has well performance in machine learning task [29]. Multilayer Perceptron (MLP) is a classifier that follows the principle of human nervous system to learning and prediction. It uses the weight to store data, and uses the algorithm to adjust the weight and reduce the deviation in the training process [30]. Parameters of each classifier were optimized using grid search and the best ones are given in the Additional file1.

### Evaluation

In this paper, we used macro-recall (macro-R), macro-precision (macro-P) and accuracy to evaluate the performance of the method we proposed. Macro-R and macro-P are the assessment parameters often used in multi-classification. They are the average of the recall rate(R) and precision(P) obtained from each confusion matrix in our tenfold cross validation. These indicators are computed according to the following equation:

$$Recall = \frac{TP}{{TP + FN}}$$

(26)

$$P = \frac{TP}{{TP + FP}}$$

(27)

$$Sensitivity { = }\frac{TP}{{TP + FN}}$$

(28)

$$Specificity = \frac{TN}{{FP + TN}}$$

(29)

$${\text{macro - }}R = \frac{1}{n}\sum\limits_{i = 1}^{n} {R_{i} }$$

(30)

$${\text{macro - }}P = \frac{1}{n}\sum\limits_{i = 1}^{n} {P_{i} }$$

(31)

In equations above, TP, FP, FN and TN indicate true positive, false positive, false negative, true negative in confusion matrix respectively.