Spark is an open-source, in-memory, iterative computing engine that can run on any Hadoop cluster, just like Hama. It has recently attracted a lot of attention because it can outperform Hadoop on iterative algorithms. Spark provides a clean programming interface, and users write distributed programs much as they would write a serial implementation. This clean design makes Spark more appropriate than Hama from a usability perspective. In Hama, however, the developer has fine-grained control over synchronization and communication between nodes, whereas in Spark all of this is handled internally. Furthermore, Hama is based on the Bulk Synchronous Parallel (BSP) model and so has a strong theoretical background, whereas Spark is still evolving in this respect. From a performance perspective, the two are roughly comparable for batch and iterative algorithms, and which one is faster depends on the problem at hand. For example, in [1] Spark and Hama perform equally well on range joins, whereas Hama is better at computing top-k joins on a large dataset.
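
To make the difference in programming style concrete, here is a toy sketch in plain Python. It uses neither the Hama nor the Spark API; the names `bsp_sum`, `dataflow_sum` and `inboxes` are made up for illustration, and threads stand in for cluster nodes. The first function mimics the BSP style, where the program itself sends messages and waits at a barrier. The second mimics the serial-looking style, where no communication or synchronization is visible in user code.

```python
import threading
from functools import reduce


def bsp_sum(partitions):
    """BSP-style sum: explicit messages and an explicit barrier between supersteps."""
    n = len(partitions)
    inboxes = [[] for _ in range(n)]   # one message queue per peer (hypothetical)
    lock = threading.Lock()            # protects the shared inboxes
    barrier = threading.Barrier(n)     # the synchronization point between supersteps
    result = {}

    def peer(pid):
        # Superstep 1: local computation, then an explicit "send" to peer 0.
        local = sum(partitions[pid])
        with lock:
            inboxes[0].append(local)
        # Explicit synchronization: nobody proceeds until every peer has sent.
        barrier.wait()
        # Superstep 2: peer 0 combines the messages it received.
        if pid == 0:
            result["total"] = sum(inboxes[0])

    threads = [threading.Thread(target=peer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


def dataflow_sum(partitions):
    """Serial-looking sum: a map followed by a reduce, with no visible messaging."""
    return reduce(lambda a, b: a + b, map(sum, partitions), 0)
```

Both functions return the same total for the same input. The point is only where the coordination lives: in `bsp_sum` it is spelled out by the programmer, while in `dataflow_sum` an engine would be free to decide how the work is distributed and synchronized.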

Old Spark vs Hama benchmark:


[1] J. Huang, R. Zhang, R. Buyya, J. Chen, and Y. Wu. “HEADS-JOIN: Efficient Earth Mover’s Distance Similarity Joins on Hadoop.”