Skip to content

[Distributed] Actor Hierarchy

murmex edited this page Jun 2, 2011 · 3 revisions

Actor Hierarchy

The graph processing uses an internal topology of actors to manage the communication between the vertices and the graph. Two cases depending on the configuration of the graph are possible: the parallel processing case, specific to when the processing occurs on only one node, and the distributed processing case, when the processing is distributed between several nodes.

In both cases, we assume a mapping of one worker per core on each node.

Parallel Processing

In parallel processing, we have three types of actors:

The role of the graph master is to instantiate several workers and split the vertices between those workers. It only lives in the setup phase/

The roleof the graph is to handle the synchronization of the processing steps.

The role of a worker is to execute the processing of each of its vertices as well as the messaging between its vertices and the workers of other vertices. Its parent in the actor topology is directly the graph so vertical messages are sent/received directly to/from the graph.

Distributed Processing

The distributed processing is a generality of the parallel processing. We have four types of actors:

  • A graph
  • A graph master
  • One cluster service per node (for all the graphs)
  • One foreman per node (for each graph)
  • Several workers on each node

The graph master is in charge of contacting the cluster services of the nodes given in the configuration in order to create a foreman specific to the graph on each node. The foreman instantiates the workers on the node and sends their ActorRef to the graph master so that it can handle the partitioning of the vertices.

The graph controls the synchronization of the whole processing. It only needs to be linked to its children to pass along the synchronization messages.

The foreman merely act as the post-office for all the communication occurring between the nodes. It is also used to limit redundancy in the messaging between the graph and the workers. The parent of the foremen is the graph.

The role of the workers in the distributed processing is the same as in the parallel processing, with the exception that their parent is the foreman running on the same node.

Possible Alternative

This section describe a different topology for the actors based on some supposition made about the current implementation. See also the similar section concerning the Message Routing that could be applied to inter-vertices communication instead of inter-worker communication.

One issue with this design is that the processing of the vertices managed by the same worker is still sequential. If the workload of one worker is more important than another, it means that some cores of the node might not be fully used during the extra work.

The proposed alternative would be to make the vertices themselves actors and integrate the different roles of the worker in them. Instead of mapping one worker/actor per thread, we would use an Akka Dispatcher that would manage the processing of the vertices through a thread pool.

A new type of actor could be introduced for the crushing step. There would be one per core and they would perform the reduction on a partition of the local vertices.

The foreman would still be used for communicating between graph and the nodes to avoid the duplication of messages.