Continual learning framework for a multicenter study with an application to electrocardiogram

Table 4 Test performance of all methods on a single domain (PTB-XL). Values are the mean ± standard deviation of the AUROC across five random seeds. The overall performance is the average AUROC across sites, weighted by the number of samples at each site. Bold marks the best-performing method and underline marks the second best

	Site 1	Site 2	Site 3	Site 4	Overall
Supervised (baseline)
Single data	0.874 ± 0.010	0.902 ± 0.013	0.890 ± 0.003	0.894 ± 0.009
Merged data	0.903 ± 0.013	0.935 ± 0.010	0.910 ± 0.009	0.911 ± 0.008	0.914 ± 0.010
Federated
FedAvg	0.901 ± 0.003	0.917 ± 0.007	0.909 ± 0.006	0.915 ± 0.006	0.910 ± 0.005
FedProx	0.902 ± 0.003	0.925 ± 0.004	0.906 ± 0.008	0.906 ± 0.004	0.909 ± 0.003
Finetuning
Small to Large	0.899 ± 0.006	0.918 ± 0.003	0.903 ± 0.005	0.905 ± 0.004	0.906 ± 0.003
Large to Small	0.889 ± 0.005	0.926 ± 0.005	0.898 ± 0.006	0.903 ± 0.004	0.903 ± 0.004
Continual
Small to Large	0.894 ± 0.010	0.923 ± 0.009	0.907 ± 0.010	0.904 ± 0.010	0.906 ± 0.009
Large to Small	0.903 ± 0.006	0.929 ± 0.011	0.916 ± 0.007	0.909 ± 0.006	0.914 ± 0.007
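
The "Overall" column is described in the caption as an average of the per-site AUROCs weighted by site size. A minimal sketch of that calculation follows. The per-site sample counts are not reported in this table, so the sizes below are hypothetical placeholders, and the function name is illustrative rather than taken from the study's code. With equal placeholder sizes the result reduces to a plain mean and is not expected to reproduce the reported Overall values.

```python
from typing import Sequence


def weighted_overall_auroc(site_aurocs: Sequence[float],
                           site_sizes: Sequence[int]) -> float:
    """Average per-site AUROCs, weighting each site by its sample count."""
    if len(site_aurocs) != len(site_sizes):
        raise ValueError("need exactly one size per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return sum(a * n for a, n in zip(site_aurocs, site_sizes)) / total


# Per-site mean AUROCs from the "Merged data" row (Sites 1 to 4).
merged_row = [0.903, 0.935, 0.910, 0.911]

# Hypothetical site sizes: the real counts are not given in this table.
hypothetical_sizes = [1000, 1000, 1000, 1000]

overall = weighted_overall_auroc(merged_row, hypothetical_sizes)
print(f"Overall AUROC (equal hypothetical sizes): {overall:.3f}")
```

Substituting the actual number of test samples at each site for `hypothetical_sizes` would give the weighting the caption describes.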

ISSN: 1472-6947