This master's thesis gives an introduction to Apache Hadoop and Apache Hama. Hadoop and Hama are free, open-source cluster frameworks that support data-intensive distributed applications. The thesis compares the MapReduce framework of Hadoop with the Bulk Synchronous Parallel (BSP) framework of Hama. Both programming models help users develop applications that process very large datasets.
The main focus of this thesis is the support of Graphics Processing Units (GPUs) in Hadoop and Hama. To this end, it presents the Rootbeer GPU compiler, which converts Java code into CUDA and provides a (de)serialization framework. The thesis shows how Rootbeer can be seamlessly integrated into Hadoop and Hama, and it introduces multiple Rootbeer extensions that support the hybrid execution of CPU and GPU tasks on Hama. The communication between CPU and GPU in Hama is realized by Hama Pipes. Hama Pipes was also developed within the scope of this master's thesis and connects C++ and Java through a socket connection. It is adapted from Hadoop Pipes and adds extensions such as support for C++ function templates and generic type parameters.
Finally, this thesis presents four scientific applications: Monte Carlo Pi Estimation, Matrix Multiplication, K-Means clustering and Collaborative Filtering for Recommendation Systems. The CPU and GPU performance of these applications is evaluated in terms of the achievable speedup and efficiency.
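To illustrate the simplest of these applications, the following is a minimal sequential CPU sketch of Monte Carlo Pi Estimation in Java. It is not the Hama or Rootbeer implementation evaluated in the thesis. The class `MonteCarloPi`, the methods `estimatePi` and `estimatePiPartitioned`, and the fixed seeds are hypothetical names chosen for illustration. The partitioned variant only assumes one plausible decomposition, in which each peer estimates independently and the partial results are averaged.

```java
import java.util.SplittableRandom;

/** Sequential CPU sketch of Monte Carlo Pi estimation (hypothetical, not the thesis code). */
public class MonteCarloPi {

    /**
     * Samples points uniformly in the unit square and counts those inside the
     * quarter circle x^2 + y^2 <= 1. The hit ratio approximates pi / 4.
     */
    static double estimatePi(long samples, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                hits++;
            }
        }
        return 4.0 * hits / samples;
    }

    /**
     * Assumed decomposition: each "peer" estimates pi from its own share of
     * samples with a distinct seed, and the partial estimates are averaged.
     * With equal shares this equals the estimate over all samples combined.
     */
    static double estimatePiPartitioned(int peers, long samplesPerPeer) {
        double sum = 0.0;
        for (int p = 0; p < peers; p++) {
            sum += estimatePi(samplesPerPeer, 1000L + p); // distinct seed per peer
        }
        return sum / peers;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000L, 42L));
        System.out.println(estimatePiPartitioned(4, 250_000L));
    }
}
```

Because every sample is independent, the work splits cleanly across peers or GPU threads, and only a single count per partition needs to be communicated.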