Today, with the growth of highly parallel and heterogeneous architectures, systems that combine multicore CPUs, GPUs, and accelerators are becoming more common in HPC. Although heterogeneous architectures bring considerable performance and energy benefits, they also make application development very challenging, since they require the use of different parallel programming paradigms.
Recently, in order to fully harness the computational capabilities of such architectures, researchers have focused their attention on software development tools that simplify the daunting programming task. Following a similar line of investigation, this dissertation tackles the optimization and simplification of programs for heterogeneous computing systems. In the context of low-power architectures, we analyze the performance and energy advantages of embedded GPUs, showing the benefits of this architecture for HPC workloads. To maximize the performance of heterogeneous compute nodes, we investigate a new compiler/runtime approach that generates programs which concurrently use all the heterogeneous resources, and we propose two low-complexity heuristics for the problem of scheduling independent tasks. Finally, to simplify the development of heterogeneous distributed applications, we present libWater, a library-based extension of the OpenCL programming model that, with a simple interface, abstracts the underlying distributed architecture without losing control over performance.