Every structure should divide the information that they store across processors, considering that in our context multiple distributed structures can interact with each other and dynamically change, our particular focus is in mapping structures into common domain and decomposition and distribution strategies over this domain. Once this domain is defined two main approach are considered inside OpenFPM.
- Domain decomposition
- Generate graph model
- Find optimal decomposition
- Post process the decomposition
- Domain decomposition
Data driven decomposition
- Divide the total information in equal chunks
The Model decomposition approach try to model the computation and comunication across processor given a certain decomposition and try to find the decomposition that minimize the communication balancing the computation.
The Fixed decomposition just divide the domain as the user specified and assign 1 sub-domain to each processor
The data driven decomposition instead consider the volume of the information the structure store and divide the volume equally them equally regardless of the communication.
In general the concept of common domain is not defined in general, can go from different way to re-index the information so from multi-index concept to mapping the information into common spaces discrete or continuos, In the following we will consider structures that map over N-dimensional spaces because at the moment is our main but not only interest
In a distributed-memory setting, where data are scattered across processors, two factors are important: equal division of work across processors and reduction of the communication overhead. A typical approach, is to formulate the problem as a graph-partitioning problem: the domain is divided into sub-sub-domains (Domain decomposition) (vertices of the graph), each of them carrying a weight modelling the computational cost. The communication pattern between sub-domains is represented as links between the sub-sub-domains (edges of the graph) with weights formalizing the communication cost (Generate graph model). The requirement of balanced computation with minimal communication then translates to an optimization problem of finding a graph partitioning where each group contains the same sum of weights, and the sum of the cut edges is minimal (Find optimal decomposition). As final step the decomposition can be post-process further more to be more optimal and based on factors not considered by the optimization process, like merging vertex (sub-sub-domain) to create bigger sub-domains. Even if a model decomposition it is not bind to a graph model it is true that until now seem to be the best approach in scientific computation
The decomposition is chosen by the user and kept fixed.
Data Driven decomposition
Data is divided across processors, It is the default decomposition when a decomposition is not available yet, it basically divide the data consistently across processors regardless of geometrical meaning or communication
Every sub-domain has an extended part called ghost
With the dashed red box we indicate the local domain while with the othe color boxes we indicate the external Ghost boxes that cover this extended part, these boxes are indicated by G5_0 G8_0 G9_0 and G9_1. These boxes can be calculated as the intersection of the extended domain with the adjacent sub-domains. Note also that each external ghost box of a sub-domain is an internal ghost box on another processor sub-domain
The following image show the internal ghost boxes of the same local processor indicated with B5_0 B8_0 B9_0 and B9_1. The production of these ghost boxes can be produced as the intersection of the local sub-domain with the expanded adjacent processor sub-domains.
Each distributed structure in general define a function to synchronize the ghost part