Introduction

Multidimensional Scaling (MDS) is an algorithm for constructing mappings of the given points in a target dimensional space based on pairwise proximity information. It is a compute and data intensive application, which requires O(N2) computation as well as memory. For the purpose of large-scale data visualization via MDS algorithm, In addition to parallelization of MDS as shown in HP-MDS, we also develop interpolation method (a.k.a. out-of-sample method) of MDS, which reduce computational complexity and memeory requirement of MDS dramatically.

As shown in the above figure, the idea of MDS interpolation method is two-step approach. As a preprocessing, we divide the given data set (total N points) into In-Sample data set, which has n points, and the remaining Out-of-Sample data set, which has N - n points. First, we run full MDS with In-Sample data set, and the points in the Out-of-Sample data set are mapped in the target dimension based on the mapping result of In-Sample data set via full MDS method. The interpolation method (we called MI-MDS) reduces the computational complexity from O(N2) to O(n(N-n)). It also reduces the memory requirement from O(N2) to O(n), since the proposed interpolation method finds a mapping for an interpolated point based on the mappings of In-Sample data points and the distances between the interpolated point and the points in In-Sample data set.

For the details of the proposed MDS interpolation method, please refer to our HPDC 2010 paper.

Download

Source code is available from here.

Building and Dependency

MI-MDS implementation is developed under .NET framework 3.5 and you can recompile or build through MS Visual Studio 2008 or higher version. As shown in the paper, we implemented MI-MDS in hybrid parallel model, which combines MPI and threading, so you need to configure the following runtimes appropriately:


The DLL file for CCR library could be found in bin/Debug/ directory.

User Guide

You can run the MI-MDS application by using command as follows:

[USAGE]: > mpiexec -n [#Proc] hybrid_MIMDS.exe [sampleMappingFile] [sampleDataFile] [outOfSampleDataFile] [labelFile("NoLabelFile")] [outputFile] [threshold] [origDim] [targetDim] [#NN] [numSample] [#thread] [ignore1?] [ignore2?] [stress?]

The following is the parameter list which should be included to run MI-MDS:

arguments:
   args[0]: sampleMappingFile - prior MDS mappings of the in-sample data.
   args[1]: sampleDataFile - in-sample data.
   args[2]: outOfSampleDataFile - out-of-sample data to each process.
   args[3]: labelFile - label file (or "NoLabelFile")
   args[4]: outputFile - output file name.
   args[5]: threshold - threshold value for the stop condition.
   args[6]: origDim - the original dimension number
   args[7]: targetDim - the target dimension number
   args[8]: #NN - the number of nearest neighbors for interpolation.
   args[9]: numSample - the number of points of in-sample data.
   args[10]: numThread - the number of threads in each process.
   args[11]: ignore1? - 1: ignore first line of sampleData, 0: otherwise
   args[12]: ignore2? - 1: ignore first line of outSampleData, 0: otherwise
   args[13]: stress? - 1: skip STRESS calculation, 0: not skip