This paper presents an improved FPGA implementation of a 16-bit ARX (Add, Rotate, and Xor) engine for Pi-Cipher, one of the second-round candidates in the CAESAR competition. Pi-Cipher is a nonce-based authenticated encryption cipher with associated data. Its security relies on an ARX-based permutation function, denoted the Pi-function. The proposed ARX engine occupies just 266 slices, including the input and output buffers, and can be clocked at 347 MHz. We also introduce a message processor based on the proposed ARX engine; it occupies 1114 slices and can be clocked at 250 MHz. The functionality of the proposed ARX engine was verified on a Xilinx Virtex-7 FPGA. The new ARX engine design achieves an almost fourfold speedup while consuming only 17% more area than previously published work. Finally, we extend our message processor implementation with a parametrized reconfiguration technique, which reduces its area by 27 slices.
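For readers unfamiliar with the ARX style, the three primitive operations on 16-bit words can be sketched in software as follows. This is only an illustrative sketch: the function names (`add16`, `rotl16`, `arx_step`), the way the operations are combined, and the rotation amount are assumptions chosen for the example, and they are not the actual Pi-function or the hardware datapath described in this paper.

```python
MASK16 = 0xFFFF  # 16-bit word mask


def add16(a: int, b: int) -> int:
    """Addition modulo 2^16 (the 'A' in ARX)."""
    return (a + b) & MASK16


def rotl16(x: int, r: int) -> int:
    """Left rotation of a 16-bit word by r bits (the 'R' in ARX)."""
    x &= MASK16
    r %= 16
    return ((x << r) | (x >> (16 - r))) & MASK16


def arx_step(a: int, b: int, r: int) -> int:
    """Hypothetical ARX combination: add, rotate, then xor with b.

    Illustrative only; not the operation sequence of Pi-Cipher.
    """
    return rotl16(add16(a, b), r) ^ b


if __name__ == "__main__":
    # Example inputs are arbitrary.
    print(hex(arx_step(0x1234, 0x0001, 4)))
```

Each of the three operations maps to inexpensive hardware (an adder, fixed wiring for the rotation, and XOR gates), which is one reason ARX designs are attractive for compact FPGA implementations.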