High Efficiency Video Coding (HEVC), the latest video compression standard, will play an important role in many multimedia applications in the foreseeable future. Its superior compression performance makes HEVC particularly suitable for high-definition video in consumer electronics environments; however, this comes at the price of substantial computational complexity. The HEVC encoding process, especially Motion Estimation (ME), is very time consuming, which currently makes HEVC impractical for real-time applications. In this work, a hybrid encoding architecture with a set of algorithms is proposed, exploiting a Graphics Processing Unit (GPU) to perform both uni-predictive and bi-predictive ME in a highly parallel manner. This reduces the complexity of uni-predictive and bi-predictive ME on the Central Processing Unit (CPU) by up to 99% and 95%, respectively, and brings significant overall time savings of up to 57.65% and 54.16% for the low delay P and random access coding configurations, respectively. The Rate-Distortion (RD) performance is only marginally affected in both cases.
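
To make the operation being offloaded concrete, the following is a minimal CPU-side sketch of integer-pel block matching, not the GPU algorithms proposed in this work. It assumes a Sum of Absolute Differences (SAD) cost and an exhaustive search window, and it models bi-prediction as a simple rounded average of two reference blocks, which is a simplification of what HEVC actually does. The names `sad`, `crop`, `full_search`, and `bi_pred_cost` are hypothetical and chosen only for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(frame, top, left, height, width):
    """Extract a height x width block whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def full_search(cur_block, ref, top, left, search_range):
    """Uni-predictive ME: test every integer displacement in the window.

    Returns (best_mv, best_cost) with best_mv = (dy, dx), or (None, None)
    if no candidate lies fully inside the reference frame.
    """
    height, width = len(cur_block), len(cur_block[0])
    rows, cols = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + height > rows or x + width > cols:
                continue
            cost = sad(cur_block, crop(ref, y, x, height, width))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


def bi_pred_cost(cur_block, ref0, mv0, ref1, mv1, top, left):
    """Cost of one bi-predictive candidate (assumed: rounded average)."""
    height, width = len(cur_block), len(cur_block[0])
    p0 = crop(ref0, top + mv0[0], left + mv0[1], height, width)
    p1 = crop(ref1, top + mv1[0], left + mv1[1], height, width)
    pred = [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]
    return sad(cur_block, pred)
```

In this sketch, the cost of each candidate displacement is computed independently of all others, which is the property that makes this kind of search a natural fit for highly parallel execution.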