CPU Performance Challenges, Parallel Computing and Computer Vision

This post provides some background on deep learning in edge computing, computer vision, AI, and related areas. It covers the industry standards that have emerged to meet the high performance demands of advanced applications.

First of all, let's try to understand the challenges.

Processor Performance Challenges

As you know, any application program consists of a number of instructions. The primary goal of improving CPU performance is to reduce application execution time, which depends on instruction execution time. Instruction execution time in turn depends on the CPU clock cycle.

However, CPU clock frequencies have largely stalled and have not even reached 10 GHz.

What is stopping CPU frequency scaling?

Let's look at the power consumption formula of a silicon device: P = C x V^2 x F

Here P is the power consumption, C is the dynamic capacitance being switched per clock cycle, V is the voltage, and F is the processor frequency (cycles per second).
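To make the formula concrete, here is a minimal sketch that plugs numbers into P = C x V^2 x F. The function name and the sample values (1 nF of switched capacitance, 1.2 V, 3 GHz) are my own assumptions for illustration, not measurements of any real chip.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Return dynamic switching power in watts using P = C * V^2 * F."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical values: 1 nF switched per cycle, 1.2 V supply, 3 GHz clock.
power_w = dynamic_power(1e-9, 1.2, 3e9)
print(f"{power_w:.2f} W")  # prints "4.32 W"
```

Note how the voltage enters squared: halving V alone would cut this figure to a quarter.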

Interestingly, a transistor switches by moving charge, and how quickly it can do so depends on the available current. Current is roughly proportional to voltage, so the achievable switching speed (frequency) is roughly proportional to voltage. Running at a higher frequency therefore needs a higher voltage, and since power grows with V^2 x F, power consumption rises roughly with the cube of the frequency. This brought CPU frequency scaling to a halt.

As a result, the industry shifted to parallel scaling, or parallel computing, in the form of multi-core processors.
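The trade-off behind this shift can be sketched with a little arithmetic. The snippet below assumes that voltage scales linearly with frequency (so per-core power goes with the cube of the frequency) and that the workload parallelizes perfectly across cores. Both are simplifying assumptions, and the function names are hypothetical.

```python
def relative_power(freq_ratio, cores=1):
    """Power relative to one baseline core, assuming V is proportional to F.

    With P = C * V^2 * F and V proportional to F, each core's power
    scales with freq_ratio ** 3.
    """
    return cores * freq_ratio ** 3


def relative_throughput(freq_ratio, cores=1):
    """Idealized throughput, assuming the work splits perfectly across cores."""
    return cores * freq_ratio


# Option A: one core at double the clock.
print(relative_throughput(2.0), relative_power(2.0))  # prints "2.0 8.0"

# Option B: two cores at the baseline clock.
print(relative_throughput(1.0, cores=2), relative_power(1.0, cores=2))  # prints "2.0 2.0"
```

Under these assumptions both options double the throughput, but the faster single core costs about eight times the power while the dual-core option costs about two times. Real workloads rarely parallelize perfectly, so the gap is smaller in practice.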

Another challenge for the silicon industry is the fading of Moore's law.

What is Moore's Law?

Moore's law is the prediction, made by Gordon Moore (co-founder of Intel), that the number of transistors per square inch on a silicon device will double every two years while the cost is reduced by half. Initially, in 1965, he predicted that the number of transistors would double every year; he revised this to every two years a decade later.

Gordon Moore; Electronics Magazine Vol. 38, No. 8 (April 19, 1965)

The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.
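The 65,000 figure in the quote follows from simple doubling arithmetic. The sketch below assumes a starting point of about 64 components per chip in 1965; that starting value is my assumption for illustration and is not stated in the quote above.

```python
def project_components(start_count, start_year, end_year, doubling_period_years):
    """Project a component count that doubles every `doubling_period_years`."""
    doublings = (end_year - start_year) / doubling_period_years
    return start_count * 2 ** doublings


# Assumed starting point: roughly 64 components per chip in 1965.
print(round(project_components(64, 1965, 1975, 1)))  # doubling yearly: 65536
print(round(project_components(64, 1965, 1975, 2)))  # doubling every two years: 2048
```

Ten yearly doublings give 64 x 2^10 = 65,536, which matches the "65,000" in the quote. The same decade at the later two-year rate gives only five doublings, which shows how much the revised prediction changes the outcome.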

Many experts believe that Moore's law has slowed due to physical limitations.

The industry shift to lower core and IO voltages is helping to reduce power consumption. This is based on the fact that the energy required to switch a transistor from 0 V (logic 0) to 3.3 V or 1.8 V (logic 1) is less than the energy required to switch it to 5.0 V (logic 1). The trend shows up across process generations: nominal supply voltages have fallen from 5 V on older geometries to 3.3 V, 1.8 V, and around 1.2 V or lower on newer ones.

As discussed above, the industry has shifted from frequency scaling to parallel computing, so let's try to understand the trends in parallel computing.
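To see how much the logic-level voltage matters, here is a rough sketch based on the same C x V^2 term from the power formula, treated as the energy per switching event. The capacitance value is an arbitrary assumption and cancels out in the ratio.

```python
def switching_energy(capacitance_f, voltage_v):
    """Energy per switching event in joules, taken as C * V^2."""
    return capacitance_f * voltage_v ** 2


CAPACITANCE_F = 1e-12  # hypothetical 1 pF load; it cancels out in the ratio
BASELINE_V = 5.0

for voltage in (5.0, 3.3, 1.8, 1.2):
    ratio = switching_energy(CAPACITANCE_F, voltage) / switching_energy(CAPACITANCE_F, BASELINE_V)
    print(f"{voltage:.1f} V -> {ratio:.1%} of the 5.0 V switching energy")
```

Because the energy goes with the square of the voltage, moving from 5.0 V to 3.3 V already cuts the switching energy to under half, and 1.8 V brings it down to roughly an eighth.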

Parallel Computing

With significant developments in technologies such as high memory bandwidth, 5G, and media processing in artificial intelligence, the importance of computational power has increased. The following are the relevant topics, based on Wikipedia:

  1. GPU (Graphics Processing Unit) -1970s is a parallel multi-core computing device for image processing. It can be part of a CPU, a video card, or a PC motherboard. It is used in embedded systems, mobile phones, PCs, gaming consoles, etc. A GPU contains specialized hardware for vector graphics.
  2. GPGPU or GPGP (General Purpose GPU) is the use of a GPU for computation that is traditionally handled by the CPU.
  3. OpenGL (Open Graphics Library) -1992 is a cross-language, cross-platform API for 2D/3D vector graphics. It is generally used to interact with a GPU to achieve hardware-accelerated performance.
  4. Microsoft DirectX -1995 is a collection of APIs for handling tasks related to multimedia.
  5. OpenMP (Open Multi-Processing) -1997 is an API that supports multi-platform shared-memory multi-processing programming in C, C++ and Fortran. It consists of compiler directives, run-time library routines and environment variables. It has been implemented on many major platforms such as Linux, Windows, macOS, etc.
  6. VPU (Video Processing Unit) is used for video encoding and decoding.
  7. VPU (Vision Processing Unit) is similar to a video processing unit, with the added capability to run machine vision algorithms such as CNN (convolutional neural networks), SIFT (scale-invariant feature transform), etc.
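The shared-memory idea behind OpenMP can be hinted at with a fork-join sketch: split the data, let several workers process their chunks, then combine the partial results. OpenMP itself targets C, C++ and Fortran, so the Python below is only an analogy using the standard library's thread pool. The function name is hypothetical, and because of Python's global interpreter lock this shows the structure of the pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Sum a list with a fork-join pattern: split, sum chunks, combine."""
    # Ceiling division so every element lands in some chunk (at least 1 wide).
    chunk_size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

    # Fork: each worker thread sums one chunk of the shared list.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))

    # Join: combine the partial results (a "reduction" in OpenMP terms).
    return sum(partial_sums)


print(parallel_sum(list(range(1, 101))))  # prints 5050
```

In C with OpenMP the same pattern is usually a single loop annotated with a compiler directive, which is what makes that API attractive for existing code.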

Current Trends in Parallel Computing

  1. CUDA -2007 was introduced by Nvidia, originally as an acronym for Compute Unified Device Architecture. It is a parallel computing architecture that provides a programming model for GPU utilization. It supports programming languages like C, C++ and Fortran, and works with frameworks such as OpenCL and DirectCompute.
  2. OpenCL (Open Computing Language) -2008 is an open standard for writing programs that execute on heterogeneous platforms consisting of CPUs, GPUs, DSPs and FPGAs. It supports C and C++. Python support is available through PyOpenCL.
  3. OpenVX -2014 is an open, royalty-free standard for cross-platform acceleration of computer vision applications. It is aimed at embedded and real-time programs within computer vision and related scenarios. It uses a connected graph representation of operations.
  4. Intel OpenVINO (Open Visual Inference & Neural Network Optimization) -2018 is a toolkit designed to fast-track the development of computer vision and deep learning at the edge. It was earlier known as the Intel Computer Vision SDK. It includes optimized calls for OpenCL and OpenVX.
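The "connected graph representation of operations" mentioned for OpenVX can be illustrated with a toy example: each node is an operation, edges say whose output feeds whom, and the runtime executes nodes in dependency order. This is not the OpenVX API; every name below is hypothetical, and the sketch assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_graph(nodes, inputs):
    """Execute a graph of operations in dependency order.

    nodes:  name -> (function, [names of the nodes it depends on])
    inputs: name -> value for graph inputs that no node produces
    """
    dependencies = {name: deps for name, (_, deps) in nodes.items()}
    results = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:  # graph input, already has a value
            continue
        func, deps = nodes[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


# A tiny vision-style pipeline: RGB pixels -> grayscale -> threshold -> count.
pipeline = {
    "gray": (lambda image: [sum(pixel) / 3 for pixel in image], ["image"]),
    "mask": (lambda gray: [1 if value > 128 else 0 for value in gray], ["gray"]),
    "bright_count": (lambda mask: sum(mask), ["mask"]),
}
image = [(255, 255, 255), (0, 0, 0), (200, 100, 150)]

results = run_graph(pipeline, {"image": image})
print(results["bright_count"])  # prints 2
```

Describing the whole pipeline up front, instead of calling functions one by one, is what gives a graph-based runtime room to reorder, fuse, or offload operations to suitable hardware.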

Based on the above, one thing I can conclude is that there is still scope for programming languages like C, C++ and Fortran in implementing optimized software to achieve high performance.

References:

  1. CUDA Developer Zone
  2. Moore's Law

Thanks for reading till the end. I am trying to improve the usability of my site. Did you find this discussion helpful? If so, please subscribe to the YouTube channel Embedkari as well for additional embedded-related content. This post is also discussed in the Parallel Computing video.
