Parallel processing with treeClust
McKechnie, I. Taylor
Buttrey, Samuel E.
Whitaker, Lyn R.
Clustering is one of the most common statistical and machine learning techniques for analyzing big data. Clustering can be particularly difficult when the data sets include categorical, missing, or noise variables. The tree clustering algorithm developed by Samuel Buttrey and Lyn Whitaker, as described in the December 2015 issue of The R Journal, seems to provide a solution to these problems, but it requires a large set of overhead computations. This issue is intensified when working with high-dimensional data because the extent of treeClust’s overhead computations is based on the dimensions of the data. High performance computing (HPC) and parallel processing present a solution to this overhead computation burden, but treeClust’s existing parallel processing method does not work on the Naval Postgraduate School’s HPC system, the Hamming Supercomputer (HSC). Furthermore, correctly determining which HPC resources to request can be a difficult task. In this thesis, we present a new HSC-specific method for parallel processing data using the treeClust R package developed by Buttrey and Whitaker. Based on the results of our experiments, our method approximates the optimal HPC resource request, so that users realize the best run time when using treeClust on the HSC.
Approved for public release; distribution is unlimited