
Introduction

The purpose of MDS (multidimensional scaling) is dimension reduction: mapping high-dimensional data into a low-dimensional space, normally two or three dimensions, for visualization, using only the pairwise dissimilarities between data points. We demonstrate a highly scalable parallel implementation of SMACOF, an MDS algorithm based on an EM-like iterative method, written in C# with MPI. Using the MPI_SMACOF implementation, we visualize two real application data sets, a chemical compound data set in 155-dimensional space and a biological sequence data set in 957-dimensional space, in 3-dimensional space.
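As a rough illustration of the EM-like iteration behind SMACOF, here is a serial sketch in Python/NumPy (not the parallel C# implementation; the function and parameter names are ours). Each step applies the Guttman transform, which monotonically decreases the STRESS objective measuring the mismatch between the given dissimilarities and the distances of the current low-dimensional configuration:

```python
import numpy as np

def smacof(delta, dim=3, n_iter=100, eps=1e-6, seed=0):
    """Minimal serial SMACOF sketch.

    delta : (n, n) symmetric dissimilarity matrix, zero diagonal.
    Returns a (n, dim) configuration X and its final STRESS,
    STRESS = sum_{i<j} (delta_ij - d_ij(X))^2.
    """
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))       # random initial configuration
    prev = np.inf
    for _ in range(n_iter):
        # Guttman transform: X <- (1/n) B(X) X  (unweighted case)
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = np.where(D > 0, -delta / D, 0.0)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1)) # rows of B sum to zero
        X = B @ X / n
        # STRESS of the updated configuration
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        stress = np.sum((delta - D)[np.triu_indices(n, 1)] ** 2)
        if prev - stress < eps:             # converged: improvement is tiny
            break
        prev = stress
    return X, stress
```

The dominant cost per iteration is the dense N x N distance and B-matrix work, which is what the parallel versions below distribute across threads or MPI processes.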

 
  • Biological sequence data (distance [dissimilarity] matrix data): mapping result with 3000 data points in 957-dimensional space.
  • Chemical informatics data (vector-based or distance-matrix data): mapping result with 333 data points in 155-dimensional space.
  • Randomly generated Gaussian-distribution data (with 8 different centers): mapping result with 4096 data points in 4D space mapped into 3D space.
  • Performance results for the benchmark data (4D Gaussian-distribution data):
    • Single multicore machine (shared-memory machine)
    • Multicore clusters (distributed-memory machines)
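Benchmark data of the kind described above can be generated along these lines (a sketch only; the center placement, cluster spread, and all names are our assumptions, not the original generator):

```python
import numpy as np

def gaussian_benchmark(n_points=4096, n_centers=8, dim=4, sigma=0.1, seed=0):
    """Synthetic 4D benchmark data: n_points assigned at random to
    n_centers Gaussian clusters with randomly placed centers."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-1, 1, size=(n_centers, dim))          # cluster centers
    labels = rng.integers(0, n_centers, size=n_points)           # cluster of each point
    X = centers[labels] + sigma * rng.standard_normal((n_points, dim))
    return X, labels
```

Pairwise Euclidean distances of `X` then serve as the dissimilarity matrix fed to the MDS runs.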


Parallel Implementation

  • Shared-memory parallelism: on a single multicore machine, using threads managed by CCR.
    • The Concurrency and Coordination Runtime (CCR) is a lightweight, port-based concurrency library for C# 2.0 developed by Microsoft.
    • Less communication and high efficiency.
    • Limited by the number of cores and the memory size of a single machine.
  • Distributed-memory parallelism: on (multicore) clusters, using MPI.
    • The program can scale to as many cores as the multicore cluster provides, until the parallel overhead becomes more significant than the computation gain.
    • It can also handle larger data than the shared-memory version, since it divides the distance matrix into sub-matrices and distributes them to the corresponding processes.
    • More communication than in shared-memory parallelism.
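The sub-matrix partitioning described above can be sketched as a contiguous row-block decomposition of the N x N distance matrix (a simplified illustration; the actual implementation is MPI-based C#, and the function name is ours):

```python
def row_block(n_points, n_procs, rank):
    """Half-open row range [start, stop) of the N x N distance matrix
    owned by process `rank` under a contiguous row-block decomposition;
    any remainder rows go to the lowest-ranked processes."""
    base, rem = divmod(n_points, n_procs)
    start = rank * base + min(rank, rem)
    stop = start + base + (1 if rank < rem else 0)
    return start, stop
```

Each process then stores only its (stop - start) x N slice of the distance matrix, so per-process memory shrinks roughly linearly with the number of processes, which is why the distributed-memory version can handle larger data sets.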