Open access
Technical Papers
Feb 28, 2020

Data Collaboration Analysis Framework Using Centralization of Individual Intermediate Representations for Distributed Data Sets

Publication: ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering
Volume 6, Issue 2

Abstract

This paper proposes a data collaboration analysis framework for distributed data sets. The proposed framework involves centralized machine learning while the original data sets and models remain distributed over a number of institutions. Recently, data sets have become larger and more distributed as the costs of data collection have decreased. Centralizing distributed data sets and analyzing them as one data set can yield novel insights and higher prediction performance than analyzing the distributed data sets individually. However, it is generally difficult to centralize the original data sets because of their large size or privacy concerns. To circumvent these difficulties, this paper proposes a data collaboration analysis framework that does not share the original data sets. The proposed framework centralizes only intermediate representations constructed individually, rather than the original data sets, and uses neither privacy-preserving computations nor model centralization. In addition, this paper proposes a practical algorithm within the framework. Numerical experiments reveal that the proposed method achieves higher recognition performance for artificial and real-world problems than individual analysis.

Introduction

Dimensionality reduction methods that project high-dimensional data to a low-dimensional space have been successfully applied in several application areas to improve prediction performance and accelerate machine learning algorithms, including gene expression data analysis (Tarca et al. 2006), chemical sensor data analysis (Jurs et al. 2000), social network analysis (Tichy et al. 1979), and infrastructure analysis (Lasisi and Attoh-Okine 2018, 2020). Recently, data have become larger and more distributed, and the costs of data collection have decreased. Centralizing distributed data sets and analyzing them as one data set, which we refer to as centralized analysis, can enable us to obtain novel insights and achieve higher prediction performance than individual analysis on each distributed data set. However, it is generally difficult to centralize the original data sets because of large data size or privacy concerns.
For example, in the case of medical data analysis, the data sets in each medical institution may not be sufficient for generating a high-quality prediction result because of the insufficiency and imbalance of the data samples. However, it is difficult to centralize the original medical data samples with those from other institutions because of privacy concerns. If the original data are transformed to another (e.g., low-dimensional) space by an appropriate mapping, however, the mapped data, referred to as an intermediate representation, can be centralized fairly easily because each feature of the intermediate representation lacks any physical interpretation.
Examples of overcoming the difficulties of centralized analysis include the use of privacy-preserving computation based on cryptography (Jha et al. 2005; Kerschbaum 2012; Cho et al. 2018; Gilad-Bachrach et al. 2016) and differential privacy (Abadi et al. 2016; Ji et al. 2014; Dwork 2006). Federated learning (Konečný et al. 2016; McMahan et al. 2016), in which a model is centralized while the original data sets remain distributed, has also been studied in this context.
In contrast to these existing methods, this paper proposes a data collaboration analysis framework for distributed data sets that centralizes only individually constructed intermediate representations. The proposed framework assumes that each institution uses a different mapping function for constructing intermediate representations. The framework does not centralize the mapping functions to avoid a risk of approximating the original data samples from their intermediate representations by using the (approximate) inverse of the mapping functions. The proposed data collaboration analysis framework also does not use privacy-preserving computation. Instead, using sharable data such as public data and randomly constructed dummy data, the proposed framework achieves a data collaboration analysis by mapping individual intermediate representations to incorporable representations referred to as collaboration representations.
This paper additionally proposes a practical algorithm and a practical operation strategy regarding the problem of privacy preservation. Using numerical experiments on artificial and real-world data sets, the recognition performance of the proposed method is evaluated and compared with centralized and individual analyses.
The main contributions of this paper are as follows:
We propose a data collaboration analysis framework for distributed data sets that centralizes the individual intermediate representations instead of the original data sets.
The proposed framework differs from existing approaches as it does not use privacy-preserving computations or model centralization.
The proposed data collaboration analysis achieves higher recognition performance than that produced by individual analysis.

Data Collaboration Analysis Framework

In this section, we consider the case in which there are multiple institutions, each holding an individual data set. We propose a data collaboration analysis framework for distributed data sets that does not centralize the original data. The proposed method can be considered a dimensionality reduction method for distributed data sets. The distributed original data sets are transformed into the collaboration representations via the intermediate representations. Therefore, after constructing the collaboration representations, we can use any machine learning algorithm, including unsupervised, supervised, and semi-supervised learning.
Let $d$ be the number of institutions, $m$ the number of features, and $n_i$ the number of training data samples of the $i$th institution, with $n = \sum_{i=1}^{d} n_i$ the total number of training data samples. In addition, let $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ be the training data set of the $i$th institution. For supervised learning, we additionally let $L_i = [l_{i,1}, l_{i,2}, \ldots, l_{i,n_i}] \in \mathbb{R}^{l \times n_i}$ be the ground truth for the training data. Also let $s_i$ be the number of test data samples of the $i$th institution, $s = \sum_{i=1}^{d} s_i$, and $Y_i = [y_{i,1}, y_{i,2}, \ldots, y_{i,s_i}] \in \mathbb{R}^{m \times s_i}$ the test data set of the $i$th institution.
We do not centralize the original data set $X_i$ (and $Y_i$ in supervised learning). Instead, we centralize the intermediate representations constructed individually from $X_i$. We also do not centralize the mapping function for the intermediate representation, to reduce the risk of approximating the original data.
In the remainder of this section, we introduce a fundamental concept of the data collaboration analysis framework and propose a practical algorithm. In addition, we consider a practical operation strategy regarding privacy concerns.

Fundamental Concept and Framework

Instead of centralizing the original data set $X_i$, we consider centralizing the intermediate representation
$\widetilde{X}_i = [\widetilde{x}_{i,1}, \widetilde{x}_{i,2}, \ldots, \widetilde{x}_{i,n_i}] = f_i(X_i) \in \mathbb{R}^{\widetilde{m}_i \times n_i}$
(1)
constructed individually in each institution, where $f_i$ is a linear or nonlinear column-wise mapping function. Because each mapping function $f_i$ is constructed from $X_i$ individually, $f_i$ and its dimensionality $\widetilde{m}_i$ depend on $i$.
Examples of the mapping function include unsupervised dimensionality reductions, such as principal component analysis (PCA) (Pearson 1901; Jolliffe 1986) and locality preserving projections (LPP) (He and Niyogi 2004), and supervised dimensionality reductions, such as Fisher discriminant analysis (FDA) (Fisher 1936), local FDA (LFDA) (Sugiyama 2007), semi-supervised LFDA (SELF) (Sugiyama et al. 2010), locality adaptive discriminant analysis (LADA) (Li et al. 2017), and complex moment-based supervised eigenmap (CMSE) (Imakura et al. 2019). One can also use a partial structure of a deep neural network. The proposed framework aims to avoid the difficulties of centralized analysis by achieving collaboration analysis while the original data set $X_i$ and the mapping function $f_i$ remain distributed in each institution.
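As a concrete illustration, a PCA-based construction of $f_i$ might look like the following Python sketch. This is our own minimal example, not code from the paper (whose experiments use K-LPP in MATLAB); `build_pca_map` and its interface are assumptions for illustration.

```python
import numpy as np

def build_pca_map(X, dim):
    """Construct a PCA-based mapping f_i from a local data set.

    X   : (m, n) array, columns are data samples (as in the paper's notation).
    dim : target dimensionality of the intermediate representation.
    Returns a column-wise mapping function f_i.
    """
    mean = X.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data give the principal axes.
    U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
    P = U[:, :dim]                      # (m, dim) projection basis

    def f_i(A):
        # Map the columns of A into the dim-dimensional PCA subspace.
        return P.T @ (A - mean)         # (dim, n_cols)

    return f_i

# Each institution builds its own f_i from its own X_i, so the maps
# (and even their dimensionalities) differ across institutions.
```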
Because $f_i$ depends on the institution $i$, even when two institutions have an identical data sample $x$, the intermediate representations of that sample differ; that is
$f_i(x) \neq f_j(x) \quad (i \neq j)$
(2)
In addition, the relationship between the original data samples $x$ and $y$ is generally not preserved across different institutions; that is
$\mathcal{D}(f_i(x), f_j(y)) \neq \mathcal{D}(x, y) \quad (i \neq j)$
(3)
where $\mathcal{D}(\cdot, \cdot)$ denotes a relationship between data samples, such as distance or similarity. Therefore, one cannot analyze the intermediate representations as one data set, even if their dimensionalities are identical, $\widetilde{m}_i = \widetilde{m}_j$.
To overcome this difficulty, we transform the individual intermediate representations once more into incorporable representations as follows:
$\widehat{X}_i = [\widehat{x}_{i,1}, \widehat{x}_{i,2}, \ldots, \widehat{x}_{i,n_i}] = g_i(\widetilde{X}_i) \in \mathbb{R}^{\widehat{m} \times n_i}$
(4)
Here, $g_i$ is a column-wise mapping function such that
$g_i(f_i(x)) \approx g_j(f_j(x))$
(5)
$\mathcal{D}(g_i(f_i(x)), g_j(f_j(y))) \approx \mathcal{D}(x, y) \quad (i \neq j)$
(6)
Because these mappings preserve the relationships of the original data, one can analyze the obtained data $\widehat{X}_i$ ($i = 1, 2, \ldots, d$) as one data set:
$\widehat{X} = [\widehat{X}_1, \widehat{X}_2, \ldots, \widehat{X}_d] \in \mathbb{R}^{\widehat{m} \times n}$
(7)
Because the mapping function $f_i$ for the intermediate representation is not centralized, the function $g_i$ cannot be constructed only from the centralized intermediate representations $\widetilde{X}_i$. To construct the mapping function $g_i$, we introduce sharable data, referred to as an anchor data set, consisting of public data or randomly constructed dummy data:
$X^{\mathrm{anc}} = [x^{\mathrm{anc}}_1, x^{\mathrm{anc}}_2, \ldots, x^{\mathrm{anc}}_r] \in \mathbb{R}^{m \times r}$
(8)
where $r \geq \widetilde{m}_i$. Applying each mapping function $f_i$ to the anchor data, we have the $i$th intermediate representation of the anchor data set
$\widetilde{X}^{\mathrm{anc}}_i = [\widetilde{x}^{\mathrm{anc}}_{i,1}, \widetilde{x}^{\mathrm{anc}}_{i,2}, \ldots, \widetilde{x}^{\mathrm{anc}}_{i,r}] = f_i(X^{\mathrm{anc}}) \in \mathbb{R}^{\widetilde{m}_i \times r}$
(9)
Then, we centralize $\widetilde{X}^{\mathrm{anc}}_i$ and construct $g_i$ such that
$\widehat{X}^{\mathrm{anc}}_i = [\widehat{x}^{\mathrm{anc}}_{i,1}, \widehat{x}^{\mathrm{anc}}_{i,2}, \ldots, \widehat{x}^{\mathrm{anc}}_{i,r}] = g_i(\widetilde{X}^{\mathrm{anc}}_i) \in \mathbb{R}^{\widehat{m} \times r}$
(10)
satisfies
$\widehat{X}^{\mathrm{anc}}_i \approx \widehat{X}^{\mathrm{anc}}_j, \quad \mathcal{D}(\widehat{x}^{\mathrm{anc}}_{i,k}, \widehat{x}^{\mathrm{anc}}_{j,l}) \approx \mathcal{D}(x^{\mathrm{anc}}_k, x^{\mathrm{anc}}_l) \quad (i \neq j)$
(11)
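For instance, a randomly constructed anchor data set and its per-institution intermediate representations could be produced as in the short sketch below; the random-matrix choice follows the paper, while the variable names and sizes are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 10, 200                           # features, number of anchor samples

# Sharable anchor data: here simply a random matrix (Eq. (8)).
X_anc = rng.uniform(-1.0, 1.0, size=(m, r))

# Each institution applies its own mapping f_i to the same anchor data
# (Eq. (9)) and centralizes only the result:
# X_anc_tildes = [f(X_anc) for f in institution_maps]
```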
The fundamental procedure in the proposed data collaboration analysis framework is as follows:
1.
Construction of intermediate representations
Each institution constructs intermediate representations individually and centralizes them.
2.
Construction of collaboration representations
From the centralized intermediate representations, the collaboration representations are constructed.
3.
Collaboration analysis
Collaboration representations obtained from individual original data sets are analyzed as one data set.

Proposal for Practical Algorithm

A fundamental component of the proposed framework is the construction of the collaboration representations using the anchor data (Phase 2). The mapping function $g_i$ can be constructed in the following two steps.
1.
Target setting
We set a target $Z = [z_1, z_2, \ldots, z_r] \in \mathbb{R}^{\widehat{m} \times r}$ for the collaboration representations $\widehat{X}^{\mathrm{anc}}_i$ of the anchor data satisfying
$Z \approx \widehat{X}^{\mathrm{anc}}_i \quad (i = 1, 2, \ldots, d)$
(12)
or
$\mathcal{D}(z_k, z_l) \approx \mathcal{D}(x^{\mathrm{anc}}_k, x^{\mathrm{anc}}_l) \quad (k, l = 1, 2, \ldots, r)$
(13)
2.
Mapping function construction
We construct the mapping function $g_i$ such that
$Z \approx g_i(\widetilde{X}^{\mathrm{anc}}_i) \quad (i = 1, 2, \ldots, d)$
(14)
There may be several ways to perform Steps 1 and 2. This paper assumes $g_i$ to be a linear map and considers only Eq. (12) for Step 1; on this basis, we propose a practical algorithm.
Because the mapping function $g_i$ is a linear map, using a matrix $G_i \in \mathbb{R}^{\widehat{m} \times \widetilde{m}_i}$, we have
$\widehat{X}_i = g_i(\widetilde{X}_i) = G_i \widetilde{X}_i, \quad \widehat{X}^{\mathrm{anc}}_i = g_i(\widetilde{X}^{\mathrm{anc}}_i) = G_i \widetilde{X}^{\mathrm{anc}}_i$
(15)
Then, to achieve Eq. (12), we address the following minimization problem:
$\min_{G_1, G_2, \ldots, G_d, Z} \sum_{i=1}^{d} \| Z - G_i \widetilde{X}^{\mathrm{anc}}_i \|_F^2$
(16)
This problem is difficult to solve directly. Instead, we consider solving the following minimal perturbation problem:
$\min_{E_i, G_i \, (i = 1, 2, \ldots, d), \, Z} \sum_{i=1}^{d} \| E_i \|_F^2 \quad \text{s.t.} \quad G_i (\widetilde{X}^{\mathrm{anc}}_i + E_i) = Z$
(17)
The minimal perturbation problem Eq. (17) with $d = 2$ is called the total least squares problem and is solved by singular value decomposition (SVD) (Ito and Murota 2016). In the same manner, one can solve Eq. (17) with $d > 2$ using SVD. Let
$[(\widetilde{X}^{\mathrm{anc}}_1)^{\mathsf{T}}, (\widetilde{X}^{\mathrm{anc}}_2)^{\mathsf{T}}, \ldots, (\widetilde{X}^{\mathrm{anc}}_d)^{\mathsf{T}}] = [U_1, U_2] \begin{bmatrix} \Sigma_1 & \\ & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_{11}^{\mathsf{T}} & V_{21}^{\mathsf{T}} & \cdots & V_{d1}^{\mathsf{T}} \\ V_{12}^{\mathsf{T}} & V_{22}^{\mathsf{T}} & \cdots & V_{d2}^{\mathsf{T}} \end{bmatrix}$
(18)
be the SVD of the matrix combining the $\widetilde{X}^{\mathrm{anc}}_i$, where
$U_1 \in \mathbb{R}^{r \times \widehat{m}}, \quad \Sigma_1 \in \mathbb{R}^{\widehat{m} \times \widehat{m}}, \quad V_{i1} \in \mathbb{R}^{\widetilde{m}_i \times \widehat{m}}$
(19)
and $\Sigma_1$ contains the larger singular values. Then, we have
$Z = C U_1^{\mathsf{T}}$
(20)
where $C \in \mathbb{R}^{\widehat{m} \times \widehat{m}}$ is a nonsingular matrix.
Next, setting $Z = U_1^{\mathsf{T}}$, we compute $G_i$ from Eq. (14). The matrix $G_i$ can be computed individually by solving the following linear least squares problem:
$G_i = \arg\min_{G} \| Z - G \widetilde{X}^{\mathrm{anc}}_i \|_F^2 = U_1^{\mathsf{T}} (\widetilde{X}^{\mathrm{anc}}_i)^{\dagger}$
(21)
where $(\widetilde{X}^{\mathrm{anc}}_i)^{\dagger}$ denotes the Moore-Penrose pseudoinverse of the matrix $\widetilde{X}^{\mathrm{anc}}_i$.
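In NumPy terms, Eqs. (18)-(21) could be implemented as the following sketch; the helper name `collaboration_maps` and its interface are our assumptions, not part of the paper.

```python
import numpy as np

def collaboration_maps(X_anc_tildes, dim):
    """Compute the target Z and the matrices G_i from Eqs. (18)-(21).

    X_anc_tildes : list of (m_i, r) arrays, f_i applied to the anchor data.
    dim          : common dimensionality m_hat of the collaboration space.
    Returns (Z, [G_1, ..., G_d]).
    """
    # Eq. (18): SVD of the concatenation [X1_anc.T, ..., Xd_anc.T].
    M = np.hstack([Xt.T for Xt in X_anc_tildes])      # (r, sum m_i)
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    Z = U[:, :dim].T                                   # Eq. (20) with C = I

    # Eq. (21): G_i = Z @ pinv(X_anc_tilde_i), solved per institution.
    Gs = [Z @ np.linalg.pinv(Xt) for Xt in X_anc_tildes]
    return Z, Gs
```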
Algorithm 1 summarizes the proposed method for supervised learning.
One of the main computational costs of the proposed method is the SVD in Eq. (18), which depends on the number of anchor data samples $r$ and the dimensionality of the intermediate representations $\widetilde{m}_i$. Approximation algorithms, including randomized SVD (Halko et al. 2011), can be used to reduce this cost. The anchor data $X^{\mathrm{anc}}$ also strongly affects the recognition performance of the proposed method. A simple choice is to set $X^{\mathrm{anc}}$ as a random matrix. If the anchor data have statistics similar to those of the original data sets, the recognition performance may improve. We intend to investigate practical techniques for constructing suitable anchor data in the future.
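For large concatenated matrices, the truncated factors could be computed, for example, with scikit-learn's randomized SVD instead of a dense SVD; the paper cites Halko et al. (2011) without prescribing an implementation, so this is one possible drop-in under our assumptions.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# X_anc_tildes: the centralized anchor representations from the sketch above.
M = np.hstack([Xt.T for Xt in X_anc_tildes])   # (r, sum m_i), as in Eq. (18)
U1, sigma1, _ = randomized_svd(M, n_components=dim, random_state=0)
Z = U1.T   # truncated factors only; avoids a full dense SVD for large M
```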
Algorithm 1. Proposed method
Input: $X_i \in \mathbb{R}^{m \times n_i}$, $L_i \in \mathbb{R}^{l \times n_i}$, $Y_i \in \mathbb{R}^{m \times s_i}$ ($i = 1, 2, \ldots, d$) individually.
Output: $L^{Y}_i \in \mathbb{R}^{l \times s_i}$ ($i = 1, 2, \ldots, d$) individually.
  {Phase 0. Preparation}
 1: Centralize $X^{\mathrm{anc}} \in \mathbb{R}^{m \times r}$
  {Phase 1. Construction of intermediate representations}
 2: Construct $\widetilde{X}_i = f_i(X_i)$ and $\widetilde{X}^{\mathrm{anc}}_i = f_i(X^{\mathrm{anc}})$ for each $i$ individually
 3: Centralize $\widetilde{X}_i$, $\widetilde{X}^{\mathrm{anc}}_i$, and $L_i$ for all $i$
  {Phase 2. Construction of collaboration representations}
 4: Compute the left singular vectors $U_1$ of the SVD in Eq. (18)
 5: Compute $G_i = U_1^{\mathsf{T}} (\widetilde{X}^{\mathrm{anc}}_i)^{\dagger}$
 6: Compute $\widehat{X}_i = G_i \widetilde{X}_i$
 7: Set $\widehat{X} = [\widehat{X}_1, \widehat{X}_2, \ldots, \widehat{X}_d]$ and $L = [L_1, L_2, \ldots, L_d]$
  {Phase 3. Collaboration analysis}
 8: Construct a model $h$ by a machine learning algorithm using $\widehat{X}$ as the training data and $L$ as the ground truth, i.e., $L \approx h(\widehat{X})$
 9: Predict each test data set $Y_i$ using the model $h$ and obtain $L^{Y}_i = h(G_i f_i(Y_i))$
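Putting the pieces together, a hypothetical end-to-end run of Algorithm 1 on synthetic data might look as follows. It reuses `build_pca_map` and `collaboration_maps` from the sketches above and substitutes plain ridge regression for the paper's kernel ridge regression; all data shapes and names are illustrative assumptions.

```python
import numpy as np

# Hypothetical distributed data: d institutions, columns are samples.
rng = np.random.default_rng(1)
d, m, n_i, r, dim = 3, 10, 40, 200, 5
Xs = [rng.normal(size=(m, n_i)) for _ in range(d)]
Ls = [np.eye(2)[:, rng.integers(0, 2, n_i)] for _ in range(d)]  # one-hot (l, n_i)

X_anc = rng.uniform(-1, 1, size=(m, r))                 # Phase 0

fs = [build_pca_map(X, dim) for X in Xs]                # Phase 1
X_tildes = [f(X) for f, X in zip(fs, Xs)]
X_anc_tildes = [f(X_anc) for f in fs]

Z, Gs = collaboration_maps(X_anc_tildes, dim)           # Phase 2
X_hat = np.hstack([G @ Xt for G, Xt in zip(Gs, X_tildes)])
L = np.hstack(Ls)

# Phase 3: ridge regression L ~ W @ X_hat (a stand-in for kernel ridge regression).
lam = 1e-2
W = L @ X_hat.T @ np.linalg.inv(X_hat @ X_hat.T + lam * np.eye(dim))

def predict(i, Y):
    # A new sample of user i passes through f_i, then G_i, then the model.
    scores = W @ (Gs[i] @ fs[i](Y))
    return scores.argmax(axis=0)                        # predicted class indices
```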

Practical Operation Strategy Regarding Privacy Concerns

Here, we consider a practical operation strategy regarding privacy concerns based on the proposed framework for supervised learning. This paper says that privacy is preserved when each entry of the corresponding data cannot be (approximately) obtained by others. This paper does not consider the privacy of data set statistics.
Based on this definition, one can assert that regarding the original data Xi in each institution, privacy is preserved if the data collaboration analysis satisfies the following operation strategies:
1.
There are two roles: users who have training and test data sets individually and an analyst who centralizes the intermediate representations and analyzes them.
a.
The users and the analyst possess the data as illustrated in Tables 1 and 2.
b.
Each step of Algorithm 1 is executed by the corresponding role, as demonstrated in Fig. 1.
2.
Each mapping function $f_i$ is constructed with the following requirements:
a.
The original data can be approximated only by using both an intermediate representation and the mapping function $f_i$ (or its approximation).
b.
The mapping function $f_i$ can be approximated only by using both the input and output data of $f_i$.
3.
The analyst does not collude with user(s) to obtain the original data of other users.
Table 1. Practical operation strategy: data for each role
Role | Data
User $i$ | $X_i$, $\widetilde{X}_i$, $L_i$, $X^{\mathrm{anc}}$, $\widetilde{X}^{\mathrm{anc}}_i$, $Y_i$, $L^{Y}_i$, $f_i$, $g_i$, $h$
Analyst | $\widetilde{X}_i$, $L_i$, $\widetilde{X}^{\mathrm{anc}}_i$, $g_i$ ($i = 1, 2, \ldots, d$), $h$
Table 2. Practical operation strategy: role for each data
Data | Role
$X_i$, $Y_i$, $L^{Y}_i$, $f_i$ | User $i$
$\widetilde{X}_i$, $L_i$, $\widetilde{X}^{\mathrm{anc}}_i$, $g_i$ | User $i$ and analyst
$X^{\mathrm{anc}}$ | All users
$h$ | All users and analyst
Fig. 1. Practical operation strategy: algorithm flow.
In this operation strategy, each user does not possess the intermediate representations of the other users, and the analyst does not possess the original anchor data $X^{\mathrm{anc}}$. Therefore, the original data set $X_i$ cannot be (approximately) obtained by others, which proves that the privacy of the original data $X_i$ is preserved under our definition.

Related Works

One possibility for achieving a high-quality analysis while avoiding the difficulties of centralized analysis involves the usage of privacy-preserving computation. There are two types of typical privacy-preserving computation techniques based on cryptography (Jha et al. 2005; Kerschbaum 2012; Cho et al. 2018; Gilad-Bachrach et al. 2016) and differential privacy (Abadi et al. 2016; Ji et al. 2014; Dwork 2006).
Cryptographic privacy-preserving (or secure multi-party) computations can compute a function over distributed data while retaining the privacy of the data. Fully homomorphic encryption (FHE) (Gentry 2009) can compute any given function; however, it is impractical for large data sets with respect to computational cost even using the latest implementations (Chillotti et al. 2016). Differential privacy is another type of privacy-preserving computation that protects the privacy of the original data sets by randomization. In terms of computational cost, these computations are more efficient than cryptographic computations; however, they may have low prediction accuracy because of the noise added for protecting privacy.
Federated learning, which centralizes a model, has also been studied in this context (Konečný et al. 2016; McMahan et al. 2016). Federated learning achieves a high-quality analysis while avoiding the difficulties of centralized analysis by centralizing a model function instead of using cryptography or randomization. However, it may carry a risk of exposing the original data sets as a result of centralizing a model from each institution. Therefore, in practice, federated learning is used in conjunction with privacy-preserving computations (Yang 2019).
Our proposed framework differs from these existing approaches in that it does not use privacy-preserving computations or model centralization.

Numerical Experiments

This section presents an evaluation of the recognition performance of the proposed data collaboration analysis method and compares it with that of centralized and individual analyses for classification problems. Note that in our target situation, centralized analysis is only an ideal reference because one cannot share the original data sets $X_i$. The proposed data collaboration analysis should achieve recognition performance higher than that of individual analysis and lower than, but close to, that of centralized analysis.
We used kernel ridge regression (Saunders et al. 1998) for the individual and centralized analyses and for Step 8 of the proposed method (Algorithm 1). In the proposed method, each intermediate representation is constructed from $X_i$ by kernel LPP (K-LPP) (He and Niyogi 2004). Although K-LPP is an unsupervised dimensionality reduction method, the constructed map $f_i$ depends on $i$ because it depends on the data set $X_i$. The anchor data set is constructed as a random matrix.
In the training phase, we use the ground truth $L$ as a binary matrix whose $(i, j)$ entry is 1 if training sample $x_j$ belongs to class $i$ and 0 otherwise. This type of ground truth is used in several classification algorithms, including ridge regression and deep neural networks (Bishop 2006). All numerical experiments were performed using MATLAB 2018b.
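As an aside, such a binary ground-truth matrix can be built from integer class labels as in the following small sketch (in Python for consistency with the earlier sketches, although the paper's experiments were run in MATLAB):

```python
import numpy as np

def one_hot_ground_truth(labels, num_classes):
    """Binary matrix L whose (i, j) entry is 1 iff sample j is in class i."""
    L = np.zeros((num_classes, len(labels)))
    L[labels, np.arange(len(labels))] = 1.0
    return L

# one_hot_ground_truth([0, 2, 1], 3) has columns e_0, e_2, e_1.
```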

Artificial Data

In this experiment, we used a three-class classification of 10-dimensional artificial data. Fig. 2(a) illustrates the first two dimensions of the ground truth. Figs. 2(b-d) illustrate the first two dimensions of the 40 training data points of each user with the corresponding labels: ∘, •, and +. For the test data set, we used 201 × 201 data points whose first two dimensions were square grid points in $[-1, 1] \times [-1, 1]$. The remaining eight dimensions of the training and test data sets were random values in $[-0.1, 0.1]$ generated by the Mersenne Twister. The Gaussian kernel was used for all methods.
Fig. 2. Training data set and ground truth for artificial data: (a) training data set; (b) training data set in user 1; (c) training data set in user 2; and (d) training data set in user 3.
The accuracy (ACC) of centralized analysis and the average ACCs over the three users for individual analysis and the proposed data collaboration analysis are 92.3, 79.8, and 91.3, respectively. Fig. 3 presents the recognition results. In each subfigure, the white markers ∘, •, and + denote training data points. Comparing the results of centralized and individual analyses, we observed that the recognition results of individual analysis are significantly poorer than those of centralized analysis because of the insufficiency of data samples. In contrast, the proposed data collaboration analysis achieves results comparable to those of centralized analysis.
Fig. 3. Recognition results for artificial data: (a) centralized analysis; (b) individual analysis in user 1; (c) individual analysis in user 2; (d) individual analysis in user 3; (e) data collaboration analysis (user 1 has test data set); (f) data collaboration analysis (user 2 has test data set); and (g) data collaboration analysis (user 3 has test data set).

Handwritten Digits Data (MNIST)

In this experiment, we used a 10-class classification of handwritten digits (MNIST) (LeCun 1998), where the number of features was $m = 784$. Here, each user holds 100 data samples, and we evaluated the recognition performance in terms of normalized mutual information (NMI) (Strehl and Ghosh 2002), accuracy (ACC), and Rand index (RI) (Rand 1971) on 1,000 test data samples, increasing the number of users from 1 to 50. We used the Gaussian kernel for all methods.
Fig. 4 presents the average and standard error of the recognition performance over 20 trials for each method. The recognition performance of the proposed data collaboration analysis increases with the number of users and is significantly higher than that of individual analysis.
Fig. 4. Recognition performance for MNIST: (a) NMI; (b) ACC; and (c) RI.

Gene Expression Data

In this numerical experiment, we used a three-class classification problem for cancer data from a previous study (Golub et al. 1999). The data set has 38 training and 34 test data samples with $m = 7{,}129$ features. Here, we considered the case of two users, allocated 19 training data samples to each user, and evaluated the recognition performance over 20 trials. A linear kernel was used for all methods.
Fig. 5 presents a three-dimensional visualization of the training (+) and test (∘) data samples for each method. Table 3 summarizes the recognition performance (average ± standard error). In the three-dimensional visualization, the three classes are well separated in the low-dimensional spaces constructed by the proposed data collaboration analysis as well as by centralized analysis. We observed that the proposed data collaboration analysis achieved higher recognition performance than individual analysis on this real-world problem.
Fig. 5. Three-dimensional visualization for gene expression data: (a) centralized analysis; (b) individual analysis; and (c) data collaboration analysis.
Table 3. Recognition performance for gene expression data
Method | NMI | ACC | RI
Centralized | 0.92 ± 0.00 | 97.1 ± 0.00 | 96.6 ± 0.00
Individual | 0.48 ± 0.04 | 83.7 ± 1.52 | 74.2 ± 2.29
Data collaboration | 0.76 ± 0.03 | 93.1 ± 0.91 | 89.3 ± 1.68

Remarks on Numerical Results

The results of the numerical experiments reveal that the proposed data collaboration analysis achieves higher recognition performance than individual analysis for both artificial and real-world data sets. It should be noted that because centralized analysis is an ideal reference, the recognition performance of the proposed data collaboration analysis is not required to exceed that of centralized analysis.

Conclusions

This paper has proposed a data collaboration analysis framework for distributed data sets based on centralizing individual intermediate representations while the original data sets and mapping functions remain distributed. This paper has also proposed a practical algorithm within the framework and a practical operation strategy regarding privacy concerns. The proposed framework differs from existing approaches in that it does not use privacy-preserving computations and does not centralize mapping functions. Numerical experiments demonstrate that the proposed method achieves higher recognition performance than individual analysis for artificial and real-world data sets.
In future work, we will investigate the use of a nonlinear mapping function $g_i$ and methods for setting the anchor data to improve recognition performance on large real-world problems.

Data Availability Statement

Some or all data, models, or code generated or used during the study are available from the corresponding author by request (program codes and data sets used in the numerical experiments).

Acknowledgments

The present study is supported in part by the Japan Science and Technology Agency (JST), ACT-I (No. JPMJPR16U6), the New Energy and Industrial Technology Development Organization (NEDO) and the Japan Society for the Promotion of Science (JSPS), Grants-in-Aid for Scientific Research (Nos. 17K12690 and 18H03250).

References

Abadi, M., A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. 2016. “Deep learning with differential privacy.” In Proc., 2016 ACM SIGSAC Conf. on Computer and Communications Security, 308–318. New York: Association for Computing Machinery.
Bishop, C. M. 2006. Pattern recognition and machine learning (Information science and statistics). Berlin: Springer.
Chillotti, I., N. Gama, M. Georgieva, and M. Izabachene. 2016. “Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds.” In Proc., Int. Conf. on the Theory and Application of Cryptology and Information Security, 3–33. Berlin: Springer.
Cho, H., D. J. Wu, and B. Berger. 2018. “Secure genome-wide association analysis using multiparty computation.” Nat. Biotechnol. 36 (6): 547. https://doi.org/10.1038/nbt.4108.
Dwork, C. 2006. “Differential privacy.” In Vol. 4052 of Automata, Languages and Programming. ICALP 2006. Lecture Notes in Computer Science, edited by M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener. Berlin: Springer.
Fisher, R. A. 1936. “The use of multiple measurements in taxonomic problems.” Ann. Hum. Genet. 7 (2): 179–188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x.
Gentry, C. 2009. “Fully homomorphic encryption using ideal lattices.” In Vol. 9 of Proc., 41 Annual ACM Symp. on Theory of Computing, 169–178. New York: Association for Computing Machinery.
Gilad-Bachrach, R., N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing. 2016. “Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy.” In Proc., Int. Conf. on Machine Learning, 201–210. Washington, DC: American Association for the Advancement of Science.
Golub, T. R., et al. 1999. “Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring.” Science 286 (5439): 531–537. https://doi.org/10.1126/science.286.5439.531.
Halko, N., P. G. Martinsson, and J. A. Tropp. 2011. “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions.” SIAM Rev. 53 (2): 217–288. https://doi.org/10.1137/090771806.
He, X., and P. Niyogi. 2004. “Locality preserving projections.” In Proc., Advances in Neural Information Processing Systems, 153–160. London: MIT Press.
Imakura, A., M. Matsuda, X. Ye, and T. Sakurai. 2019. “Complex moment-based supervised eigenmap for dimensionality reduction.” In Vol. 33 of Proc., 33rd AAAI Conf. on Artificial Intelligence (AAAI-19), 3910–3918. Palo Alto, CA: AAAI Press.
Ito, S., and K. Murota. 2016. “An algorithm for the generalized eigenvalue problem for nonsquare matrix pencils by minimal perturbation approach.” SIAM J. Matrix Anal. Appl. 37 (1): 409–419. https://doi.org/10.1137/14099231X.
Jha, S., L. Kruger, and P. McDaniel. 2005. “Privacy preserving clustering.” In European Symp. on Research in Computer Security, 397–417. Berlin: Springer.
Ji, Z., Z. C. Lipton, and C. Elkan. 2014. “Differential privacy and machine learning: A survey and review.” Preprint, submitted December 24, 2014. https://arxiv.org/abs/1412.7584.
Jolliffe, I. T. 1986. “Principal component analysis and factor analysis.” In Principal component analysis, 115–128. New York: Springer.
Jurs, P. C., G. A. Bakken, and H. E. McClelland. 2000. “Computational methods for the analysis of chemical sensor array data from volatile analytes.” Chem. Rev. 100 (7): 2649–2678. https://doi.org/10.1021/cr9800964.
Kerschbaum, F. 2012. “Privacy-preserving computation.” In Annual Privacy Forum, 41–54. Berlin: Springer.
Konečný, J., H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, and D. Bacon. 2016. “Federated learning: Strategies for improving communication efficiency.” Preprint, submitted October 18, 2016. http://arxiv.org/abs/1610.05492.
Lasisi, A., and N. Attoh-Okine. 2018. “Principal components analysis and track quality index: A machine learning approach.” Transp. Res. Part C: Emerg. Technol. 91 (Jun): 230–248. https://doi.org/10.1016/j.trc.2018.04.001.
Lasisi, A., and N. Attoh-Okine. 2020. “An unsupervised learning framework for track quality index and safety.” Transp. Infrastruct. Geotechnol. 7 (1): 1–12. https://doi.org/10.1007/s40515-019-00087-6.
LeCun, Y. 1998. “The MNIST database of handwritten digits.” Accessed January 15, 2019. http://yann.lecun.com/exdb/mnist/.
Li, X., M. Chen, F. Nie, and Q. Wang. 2017. “Locality adaptive discriminant analysis.” In Proc., 26th Int. Joint Conf. on Artificial Intelligence, 2201–2207. Palo Alto, CA: AAAI Press.
McMahan, H. B., E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas. 2016. “Communication-efficient learning of deep networks from decentralized data.” Preprint, submitted February 17, 2016. https://arxiv.org/abs/1602.05629.
Pearson, K. 1901. “LIII. On lines and planes of closest fit to systems of points in space.” London, Edinburgh, Dublin Philos. Mag. J. Sci. 2 (11): 559–572. https://doi.org/10.1080/14786440109462720.
Rand, W. M. 1971. “Objective criteria for the evaluation of clustering methods.” J. Am. Stat. Assoc. 66 (336): 846–850. https://doi.org/10.1080/01621459.1971.10482356.
Saunders, C., A. Gammerman, and V. Vovk. 1998. “Ridge regression learning algorithm in dual variables.” In Proc., 15th Int. Conf. on Machine Learning (ICML’98), 515–521. Burlington, MA: Morgan Kaufmann Publishers.
Strehl, A., and J. Ghosh. 2002. “Cluster ensembles—A knowledge reuse framework for combining multiple partitions.” J. Mach. Learn. Res. 3: 583–617. https://doi.org/10.1162/153244303321897735.
Sugiyama, M. 2007. “Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis.” J. Mach. Learn. Res. 8: 1027–1061.
Sugiyama, M., T. Idé, S. Nakajima, and J. Sese. 2010. “Semi-supervised local Fisher discriminant analysis for dimensionality reduction.” Mach. Learn. 78 (1–2): 35. https://doi.org/10.1007/s10994-009-5125-7.
Tarca, A. L., R. Romero, and S. Draghici. 2006. “Analysis of microarray experiments of gene expression profiling.” Am. J. Obstetrics Gynecol. 195 (2): 373–388. https://doi.org/10.1016/j.ajog.2006.07.001.
Tichy, N. M., M. L. Tushman, and C. Fombrun. 1979. “Social network analysis for organizations.” Acad. Manage. Rev. 4 (4): 507–519. https://doi.org/10.5465/amr.1979.4498309.
Yang, Q. 2019. “GDPR, data shortage and AI.” In Invited Talk of the 33rd AAAI Conf. on Artificial Intelligence (AAAI-19). Palo Alto, CA: AAAI Press.

Information & Authors

Published In

ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering
Volume 6, Issue 2, June 2020

History

Received: Jul 3, 2019
Accepted: Nov 20, 2019
Published online: Feb 28, 2020
Published in print: Jun 1, 2020
Discussion open until: Jul 28, 2020

Authors

Affiliations

Akihiro Imakura [email protected]
Associate Professor, Dept. of Computer Science, Univ. of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan (corresponding author). ORCID: https://orcid.org/0000-0003-4994-2499. Email: [email protected]
Tetsuya Sakurai [email protected]
Professor, Dept. of Computer Science, Univ. of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan. Email: [email protected]
