During the 1980s, many parallel architectures were developed to meet commercial needs, but software had to be adapted to the architectural properties of each machine. None of these architectures achieved for parallel computing what the von Neumann [1] model achieved for sequential computing. Hardware designers focused on meeting the requirements of the von Neumann machine without much concern for the software, while the software industry created programs for this model without explicitly worrying about the hardware layer. The model thus acted as a bridge between software and hardware.

The Bulk Synchronous Parallel (BSP) programming model is an analogous bridge for parallel computation, introduced by Leslie G. Valiant [2]. A BSP computer can be defined as p processors, each with its own local memory, connected via some means of point-to-point communication [3]. A BSP solution runs on a BSP computer as a series of parallel computation units, which can be termed processes. The overall computation of a BSP-based solution is divided into multiple supersteps, each consisting of the following three phases in sequential order. The figure below shows a single superstep.

  • Concurrent Local Computation: each process performs computation on the data stored in its local memory.
  • Global Communication: after the local computation, processes may send data to other processes, whether on the same machine or on a different one in a parallel computer.
  • Barrier Synchronization: once all processes that want to send data have finished sending, the BSP model enters barrier synchronization. During this phase it is ensured that all communication completes successfully and that the transmitted data becomes available in the local memory of the receiving processes for the next superstep.
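The three phases above can be sketched in a minimal shared-memory simulation, using Python threads as BSP "processes" and `threading.Barrier` for the synchronization phase. The ring topology (each process sends to its right neighbour) and the per-process inboxes/outboxes are illustrative assumptions, not part of the model itself.

```python
import threading

P = 4  # number of processes
S = 3  # number of supersteps

inboxes = [[] for _ in range(P)]                       # "local memory", per process
outboxes = [[[] for _ in range(P)] for _ in range(P)]  # outboxes[src][dst]
results = [0] * P

def deliver():
    # Barrier action: runs exactly once per superstep, after every process
    # has arrived at the barrier. Moves all outbox contents into the inboxes,
    # so the data is available at the start of the next superstep.
    for dst in range(P):
        inboxes[dst] = [m for src in range(P) for m in outboxes[src][dst]]
    for src in range(P):
        outboxes[src] = [[] for _ in range(P)]

barrier = threading.Barrier(P, action=deliver)

def worker(pid):
    value = pid
    for _ in range(S):
        # 1. Concurrent local computation on data in local memory.
        value += sum(inboxes[pid])
        # 2. Global communication: send the new value to the right neighbour.
        outboxes[pid][(pid + 1) % P].append(value)
        # 3. Barrier synchronization: no process proceeds until all have
        #    arrived and the messages have been delivered.
        barrier.wait()
    results[pid] = value

threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Note that worker code never touches another process's inbox directly; all data movement happens inside the barrier, which is exactly the abstraction the superstep provides.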


The figure shows a single superstep: all processes first perform local computation, then communicate with other processes, and finally, during barrier synchronization, all data is transferred into the local memory of the receiving processes. After barrier synchronization the next superstep starts.

When a process enters the barrier synchronization, it has implicitly completed its local computation and sent out all of its messages, and it waits for the other processes to finish. Conversely, when a process leaves the barrier, it starts a new superstep, all other processes have also moved on to the new superstep, and all messages directed to this process by others are available as input to the new superstep [4]. Supersteps abstract away the complexities of managing local memory, communication between machines, and synchronization. The BSP model therefore makes parallel applications simple to write, portable (they run on any hardware architecture without change), and predictable in performance [2].

Apart from the properties mentioned above, estimating the running time of a BSP-based algorithm is also simple. Let ws be the maximum time any process spends on local computation in superstep s, let hs be the communication cost of superstep s, and let l be the average time required for a barrier synchronization. An algorithm running S supersteps then has a total computation cost of

W = w1 + w2 + ... + wS,

a total communication cost of

H = h1 + h2 + ... + hS,

and a total synchronization cost of S.l. So, an algorithm based on the BSP model has the following running time:

W + H.g + S.l

Here g measures the ability of the communication network to deliver data, defined as the time it takes to deliver a message of size 1; a larger message takes proportionally longer to deliver. In practice, the values of g and l are determined empirically. Since g and l depend on the underlying system, algorithm design should focus on reducing W, H and S.
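As a worked example of the cost formula above, the sketch below evaluates W + H.g + S.l for a hypothetical algorithm; the per-superstep costs and the values of g and l are made up for illustration and would be measured empirically on a real system.

```python
def bsp_cost(w, h, g, l):
    """Running time W + H*g + S*l of a BSP algorithm, where w[s] is the
    local computation cost and h[s] the communication cost of superstep s."""
    S = len(w)   # number of supersteps
    W = sum(w)   # total computation cost
    H = sum(h)   # total communication cost
    return W + H * g + S * l

# Three supersteps with illustrative costs: W = 300, H = 35,
# so the total is 300 + 35*4 + 3*50 = 590.
print(bsp_cost(w=[100, 80, 120], h=[10, 5, 20], g=4, l=50))
```

Reducing the number of supersteps S lowers the fixed S.l term, which is why BSP algorithm design often trades a little extra local work for fewer synchronization rounds.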

[1] – https://en.wikipedia.org/wiki/Von_Neumann_architecture

[2] – L. G. Valiant. “A bridging model for parallel computation.” In: Communications of the ACM 33.8 (1990), pp. 103–111.

[3] – M. Felice Pace. “BSP vs MapReduce.” In: arXiv preprint arXiv:1203.2081 (2012).

[4] – https://people.apache.org/~tjungblut/downloads/hamadocs/ApacheHamaBSPProgrammingmodel_06.pdf