CNNs are the most common branch of Deep Neural Networks (DNNs), and they are structures with a strong capability for feature extraction. By using CNNs, a nonlinear model is trained to map an input space to a corresponding output space. These high-performance CNNs come with a high computational cost and the need for huge memory storage due to the chains of many Convolutional Layers (usually more than 50 layers). To address these issues, a variety of algorithms have been proposed in recent years. In this research, we present a solution that is a combination of several different approaches. and based on matrix optimization, parameters binary quantization, and data parallelism programming techniques. We show that our method significantly outperforms the current conventional PyTorch convolution operation with less memory usage and better computational budget when tested in different scenarios.