© 2020 by the authors.

Accurate object classification and position estimation are crucial to executing autonomous pick-and-place operations with a robot and can be realized using RGB-D sensors, which are becoming increasingly available for industrial applications. In this paper, we present a novel unified framework for object detection and classification that combines point cloud processing and deep learning techniques. The proposed model uses two streams that recognize objects in RGB and depth data separately and fuses the two in later stages to classify objects. Experimental evaluation of the proposed model, including classification accuracy compared with previous works, demonstrates its effectiveness and efficiency, making it suitable for real-time applications. In particular, experiments on the Washington RGB-D object dataset show that the proposed framework has 97.5% and 95% fewer parameters than the previous state-of-the-art multimodal neural networks Fus-CNN, CNN Features, and VGG3D, respectively, at the cost of an approximately 5% drop in classification accuracy. Moreover, inference with the proposed framework takes 66.11%, 32.65%, and 28.77% less time on GPU and 86.91%, 51.12%, and 50.15% less time on CPU compared to VGG3D, Fus-CNN, and CNN Features, respectively. The potential applicability of the developed object classification and position estimation framework was then demonstrated on an experimental robot-manipulation setup realizing a simplified pick-and-place scenario. In approximately 95% of test trials, the system accurately positioned the robot over the detected objects of interest in automatic mode, ensuring stable cyclic execution with no time delays.
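The two-stream design described above can be illustrated with a minimal late-fusion sketch in plain Python. This is not the paper's implementation (which uses CNN streams over RGB and depth images); every function and parameter name here is hypothetical, and block averaging merely stands in for a learned feature extractor.

```python
def extract_features(image, size=4):
    """Stand-in for one CNN stream: reduce an image (a flat list of
    floats) to a fixed-size feature vector by block averaging.
    Hypothetical helper, not from the paper."""
    block = max(1, len(image) // size)
    return [sum(image[i:i + block]) / block
            for i in range(0, block * size, block)]

def late_fusion(rgb_feats, depth_feats):
    """Combine the two modality streams by concatenating their
    feature vectors, i.e., fusion in a later stage."""
    return rgb_feats + depth_feats

def classify(fused, weights, bias=0.0):
    """Toy linear classifier over the fused feature vector;
    a real system would use a trained classification head."""
    score = sum(f * w for f, w in zip(fused, weights)) + bias
    return 1 if score > 0 else 0

# Dummy 8-pixel "images" for the RGB and depth streams.
rgb = [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8]
depth = [0.1, 0.3, 0.5, 0.7, 0.1, 0.3, 0.5, 0.7]

fused = late_fusion(extract_features(rgb), extract_features(depth))
label = classify(fused, weights=[0.5] * len(fused))
```

The key design point sketched here is that each modality is processed independently before fusion, so the fused vector carries complementary appearance (RGB) and geometry (depth) cues into a single classification stage.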