In addition to its large size, the VAERS data consist of nominal variables such as vaccines and events or symptoms; in particular, the symptom is a nominal variable of very high dimension. We therefore use data visualization methods in our study.
For an initial data visualization, we consider all n=7,368 distinct events or symptoms reported in the processed VAERS dataset (1) and arrange them in alphabetical order: E1,E2,⋯,En. We denote the 72 reported vaccines in the following order:
$$ V_{1}, V_{2}, \cdots, V_{72} $$
(2)
where V1,⋯,V24 are the 24 bacteria vaccines in alphabetical order, V25,⋯,V62 are the 38 virus vaccines in alphabetical order, V63,⋯,V71 are the 9 bacteria/virus combined vaccines in alphabetical order, and V72 represents the vaccine listed as unknown. For each vaccine Vk, we obtain the frequency vector Xk=(Xk1,Xk2,⋯,Xkn), where n=7,368 and Xki is the total number of times that event Ei was reported for vaccine Vk. Based on these 72 vectors Xk, we compute the rotated 7368×7368 matrix of sample correlation coefficients:
$$ \begin{aligned} \hat{\rho}_{ij} &= \frac{\sum^{72}_{k=1}\left(X_{ki} - \bar{X}_{i}\right)\left(X_{kj} - \bar{X}_{j}\right)} {\sqrt{\sum^{72}_{k=1}\left(X_{ki} - \bar{X}_{i}\right)^{2}}\, \sqrt{\sum^{72}_{k=1}\left(X_{kj} - \bar{X}_{j}\right)^{2}}},\\ &\qquad i, j = 1, 2, \cdots, 7368 \end{aligned} $$
(3)
where \(\bar {X}_{i}\) is the sample mean of X1,i,⋯,X72,i, and \(\hat {\rho }_{ij}\) is the sample correlation coefficient of symptoms Ei and Ej. This matrix is displayed in Fig. 1a, where red dots represent \(\hat {\rho }_{ij} > 0.01\), white dots represent \(|\hat {\rho }_{ij}|\le 0.01\), and blue dots represent \(\hat {\rho }_{ij} < -0.01\). Throughout this article, all matrices are displayed as the rotated version of the conventional matrix, i.e., with the bottom row of the conventional matrix as the top row here. Fig. 1a shows no informative pattern in the dataset.
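The computation in (3), the three-color coding, and the rotated display can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names (`symptom_correlations`, `color_codes`, `rotate_for_display`) and the small count table are hypothetical and are not taken from the VAERS data or from the original analysis code.

```python
import numpy as np

def symptom_correlations(X):
    """Sample correlation between the columns of X, as in formula (3).

    Rows of X are vaccines (k), columns are symptoms (i).
    Entries are NaN where a symptom has identical counts for every vaccine.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                   # X_ki - Xbar_i
    norms = np.sqrt((centered ** 2).sum(axis=0))    # sqrt of sum of squares
    with np.errstate(invalid="ignore", divide="ignore"):
        return (centered.T @ centered) / np.outer(norms, norms)

def color_codes(rho, tol=0.01):
    """+1 where rho > tol (red), -1 where rho < -tol (blue), 0 otherwise (white)."""
    codes = np.zeros(rho.shape, dtype=int)
    codes[rho > tol] = 1
    codes[rho < -tol] = -1
    return codes

def rotate_for_display(matrix):
    """Show the bottom row of the conventional matrix as the top row."""
    return np.flipud(matrix)

# Hypothetical toy table: 5 vaccines (rows) x 4 symptoms (columns).
X = np.array([
    [3, 1, 0, 2],
    [6, 2, 1, 2],
    [9, 3, 0, 5],
    [12, 4, 2, 1],
    [0, 0, 1, 4],
])
rho = symptom_correlations(X)
codes = color_codes(rho)
display = rotate_for_display(codes)
```

In this toy table the first symptom's counts are exactly three times the second's, so their sample correlation is 1 and the corresponding dot would be red.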
Next, we denote all reported symptoms or events in the VAERS data (1) by \(\mathbb {E}_{1}, \mathbb {E}_{2}, \cdots, \mathbb {E}_{n}\), where \(\mathbb {E}_{1}\) is the symptom or event with the highest occurrence frequency in the dataset, \(\mathbb {E}_{2}\) is the one with the second highest occurrence frequency, and so forth. For each vaccine Vk in (2), we obtain the frequency vector Yk=(Yk1,Yk2,⋯,Ykn), where Yki is the total number of times that event \(\mathbb {E}_{i}\) was reported for vaccine Vk. Based on these 72 vectors Yk, we compute the rotated matrix of sample correlation coefficients \(\hat {\rho }_{ij}^{Y}\) by applying formula (3) to the Yki, where \(\hat {\rho }_{ij}^{Y}\) is the sample correlation coefficient of symptoms \(\mathbb {E}_{i}\) and \(\mathbb {E}_{j}\). This matrix is displayed in Fig. 1b, where the colored dots have the same meaning for \(\hat {\rho }^{Y}_{ij}\) as in Fig. 1a. In addition, Fig. 1c displays the matrix of Fig. 1b with 20 different colors to illustrate the values of \(\hat {\rho }^{Y}_{ij}\): green corresponds to values around 0, colors from green to red correspond to \(\hat {\rho }^{Y}_{ij} > 0\), and colors from green to blue correspond to \(\hat {\rho }^{Y}_{ij} < 0\). Interestingly, this method of data visualization clearly indicates cross-board patterns.
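The reordering step can be illustrated with a short sketch. It assumes that the "occurrence frequency" of a symptom is its total count summed over all vaccines; the function name `reorder_by_frequency` and the toy counts are hypothetical. The sketch also shows that reordering the symptoms only permutes the rows and columns of the correlation matrix, so Fig. 1a and Fig. 1b would contain the same values in a different arrangement.

```python
import numpy as np

def reorder_by_frequency(X):
    """Reorder symptom columns from most to least frequently reported.

    Assumption: frequency of a symptom = its column total over all vaccines.
    Returns the reordered table and the column permutation that was applied.
    """
    X = np.asarray(X)
    totals = X.sum(axis=0)
    order = np.argsort(-totals, kind="stable")  # descending, ties keep original order
    return X[:, order], order

# Hypothetical toy table: 3 vaccines (rows) x 3 symptoms (columns).
X = np.array([
    [1, 7, 3],
    [0, 9, 4],
    [2, 5, 6],
])
Y, order = reorder_by_frequency(X)

# Correlations before and after reordering differ only by a permutation.
rho_X = np.corrcoef(X, rowvar=False)
rho_Y = np.corrcoef(Y, rowvar=False)
same_up_to_permutation = np.allclose(rho_Y, rho_X[np.ix_(order, order)])
```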
To study the cross-board patterns in the relationship between the vaccines and the adverse events or symptoms, we consider the top 100 adverse symptoms Z1,⋯,Z100 listed in Table 1 and the vaccines V1,⋯,V71 listed in (2); that is, in our analysis hereafter we exclude those vectors in the processed VAERS dataset (1) that list the vaccine as “unknown”. For each year, we obtain the frequency vector Fk=(Fk,1,1,⋯,Fk,1,100,Fk,2,1,⋯,Fk,2,100,⋯,Fk,71,100), where k=1,⋯,24 indexes the 24 years from 1990 to 2013, and Fk,i,j is the total number of times that symptom Zj was reported for vaccine Vi during year k. Based on these 24 vectors Fk, we compute the rotated 7100×7100 matrix of sample correlation coefficients \(\hat {\rho }_{ij,lq}\) by applying formula (3) to the Fk,i,j, where \(\hat {\rho }_{ij,lq}\) is the sample correlation coefficient of symptom Zj under vaccine Vi and symptom Zq under vaccine Vl; thus \(\hat {\rho }_{ij,iq}\) is the sample correlation coefficient of symptoms Zj and Zq under vaccine Vi. This matrix is displayed in Fig. 2, where the colored dots have the same meaning for \(\hat {\rho }_{ij,lq}\) as in Fig. 1c.
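The layout of the yearly vectors Fk can be made concrete with a small sketch. It assumes the counts are held in a three-way array indexed by (year, vaccine, symptom); the helper names `flatten_year_counts` and `column_index`, the reduced toy dimensions, and the random counts are all hypothetical.

```python
import numpy as np

def flatten_year_counts(F):
    """Turn F[k, i, j] into rows F_k = (F_k11, ..., F_k1J, F_k21, ..., F_kIJ).

    NumPy's default (row-major) reshape makes the symptom index j vary
    fastest, which matches the ordering of the entries of F_k in the text.
    """
    F = np.asarray(F, dtype=float)
    return F.reshape(F.shape[0], -1)

def column_index(i, j, n_symptoms):
    """0-based column of (vaccine V_i, symptom Z_j), with i and j 1-based."""
    return (i - 1) * n_symptoms + (j - 1)

# Toy dimensions standing in for 24 years, 71 vaccines, 100 symptoms.
n_years, n_vaccines, n_symptoms = 6, 3, 4
rng = np.random.default_rng(0)
F = rng.poisson(lam=5.0, size=(n_years, n_vaccines, n_symptoms))  # made-up counts

flat = flatten_year_counts(F)            # shape (n_years, n_vaccines * n_symptoms)
with np.errstate(invalid="ignore", divide="ignore"):
    rho = np.corrcoef(flat, rowvar=False)  # NaN for any column constant over years
```

With the dimensions used in the text, `column_index(71, 100, 100)` gives 7099, the last of the 7100 columns.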
As indicated by solid lines, the matrix in Fig. 2 consists of \(71^{2}=5041\) block matrices Mij, each of which is of dimension 100×100 and is the matrix of sample correlation coefficients of the top 100 adverse symptoms under vaccines Vi and Vj. For i≠j, the block matrices Mij and Mji satisfy \(\boldsymbol {M}_{ij}^{\top }=\boldsymbol {M}_{ji}\), while Mii is the matrix of sample correlation coefficients of the top 100 adverse symptoms under vaccine Vi and is a block matrix located on the diagonal line of the matrix in the direction from bottom left to top right.
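The block structure and the transpose relation follow directly from the symmetry of a correlation matrix, as the following sketch illustrates. The helper `block` and the small symmetric stand-in matrix are hypothetical; the sketch uses the conventional (unrotated) orientation.

```python
import numpy as np

def block(rho, i, j, size):
    """Return block M_ij (1-based i, j) of a matrix made of size x size blocks."""
    rows = slice((i - 1) * size, i * size)
    cols = slice((j - 1) * size, j * size)
    return rho[rows, cols]

# Stand-in for a correlation matrix: any symmetric matrix shows the relation.
size, n_vaccines = 4, 3                  # toy values in place of 100 and 71
B = np.arange((size * n_vaccines) ** 2).reshape(size * n_vaccines, -1)
S = B + B.T                              # symmetric by construction

M_13 = block(S, 1, 3, size)
M_31 = block(S, 3, 1, size)
transpose_holds = np.array_equal(M_13.T, M_31)   # M_ij^T = M_ji

n_blocks = (S.shape[0] // size) ** 2     # 3**2 here; 71**2 = 5041 in the text
```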
Due to the order of the vaccines Vi in (2), the bold dashed lines separate the matrix of Fig. 2 into 9 big block matrices, among which the square block matrix in the bottom left, displayed separately in Fig. 3, is the matrix of sample correlation coefficients of the top 100 adverse symptoms under all 24 different bacteria vaccines; and the square block matrix in the middle, displayed separately in Fig. 5, is the matrix of sample correlation coefficients of the top 100 adverse symptoms under all 38 different virus vaccines.
In Fig. 4, the top are block matrices M16,22 and M22,16 in Fig. 3, and the bottom are block matrices M16,21 and M21,16 in Fig. 3. Owing to their higher picture resolution, these block matrices clearly show that the equation \(\boldsymbol {M}_{ij}^{\top }=\boldsymbol {M}_{ji}\) holds. The two block matrices on the top of Fig. 4 are among the mostly green-blue colored block matrices in Fig. 3, while the two block matrices on the bottom are among the very few non-diagonal block matrices in Fig. 3 that are mostly red colored.
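The partition into 9 big blocks can be sketched by slicing the full matrix according to the vaccine ranges in (2). The names `GROUPS`, `group_slice`, and `group_block` are hypothetical, and the example uses 2 symptoms per vaccine instead of 100 so that the stand-in matrix stays small.

```python
import numpy as np

# Vaccine index ranges (1-based, inclusive) from the ordering in (2).
GROUPS = {"bacteria": (1, 24), "virus": (25, 62), "combined": (63, 71)}

def group_slice(name, n_symptoms):
    """Rows (or columns) of the full matrix that belong to one vaccine group."""
    first, last = GROUPS[name]
    return slice((first - 1) * n_symptoms, last * n_symptoms)

def group_block(rho, row_group, col_group, n_symptoms):
    """One of the 9 big blocks, in the conventional (unrotated) orientation."""
    return rho[group_slice(row_group, n_symptoms), group_slice(col_group, n_symptoms)]

n_symptoms = 2                                  # toy value in place of 100
n = 71 * n_symptoms
rho = np.arange(n * n).reshape(n, n)            # placeholder values only

bacteria = group_block(rho, "bacteria", "bacteria", n_symptoms)
virus = group_block(rho, "virus", "virus", n_symptoms)
combined = group_block(rho, "combined", "combined", n_symptoms)
```

With 100 symptoms per vaccine, the same slices would give square blocks of side 2400, 3800, and 900, which together span the 7100 rows of the full matrix.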
Figure 6 contains the block matrices Mij of Fig. 5 for i,j=3,4,5,6, which are the correlation matrices for the top 100 adverse symptoms under 4 different flu vaccines: FLU, FLU(H1N1), FLUN and FLUN(H1N1).
For the study of the relations between vaccine-adverse events and attributes of vaccines, such as live attenuated vaccine vs. killed inactivated vaccine, Fig. 7 displays the matrix of sample correlation coefficients of top 100 adverse symptoms under all 23 different live vaccines in processed VAERS dataset (1), while Fig. 8 displays the matrix of sample correlation coefficients of top 100 adverse symptoms under all 47 different inactive vaccines.