This work focuses on price prediction and concurrent strategy building. The modelling approach is deep reinforcement learning of the actor-critic class. Specifically, the proximal policy optimization (PPO) algorithm is applied individually to each stock's market history to address the price prediction problem. A custom RL environment was built to run the proposed experimental sequence and to determine suitable values for the learning rate, discount factor, feature space, action space, and look-back length. These values were subsequently used in experiments on different datasets to explore the portability of the model, the effect of transfer learning, and the portability of the parameter configuration. The results show that the experimental sequence can be used effectively for the price prediction problem and, in some instances, outperforms a practical buy-and-hold (B&H) strategy.
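To make the roles of the look-back length, the action space, and the B&H baseline concrete, the following is a minimal sketch of a single-stock trading environment. It is not the environment used in this work: the class name `ToyTradingEnv`, the three-action encoding (hold, buy, sell), the return-based observation, and the reward definition are all assumptions chosen for illustration. The PPO agent itself, along with its learning rate and discount factor, is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical action encoding; the action space of the actual study may differ.
HOLD, BUY, SELL = 0, 1, 2


@dataclass
class ToyTradingEnv:
    """Illustrative single-stock environment (assumed design, not the authors')."""
    prices: List[float]
    look_back: int = 5                             # number of past returns in an observation
    t: int = field(init=False, default=0)          # index of the current price
    position: int = field(init=False, default=0)   # 0 = flat, 1 = long

    def reset(self) -> List[float]:
        self.t = self.look_back
        self.position = 0
        return self._observation()

    def _observation(self) -> List[float]:
        # The last `look_back` one-step simple returns up to the current price.
        window = self.prices[self.t - self.look_back : self.t + 1]
        return [b / a - 1.0 for a, b in zip(window, window[1:])]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        if action == BUY:
            self.position = 1
        elif action == SELL:
            self.position = 0
        # Assumed reward: next-step simple return, earned only while long.
        reward = self.position * (self.prices[self.t + 1] / self.prices[self.t] - 1.0)
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._observation(), reward, done


def compounded_return(env: ToyTradingEnv, policy: Callable[[List[float]], int]) -> float:
    """Run one episode and compound the per-step rewards into a total return."""
    obs = env.reset()
    wealth, done = 1.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        wealth *= 1.0 + reward
    return wealth - 1.0


def buy_and_hold_return(prices: List[float], look_back: int) -> float:
    """B&H baseline over the same span the agent can trade in."""
    return prices[-1] / prices[look_back] - 1.0
```

Under these assumptions, a policy that always returns `BUY` reproduces the B&H return exactly, which gives a simple sanity check before a learned policy is compared against the baseline.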