Tuesday, July 12, 2011

Performance Enhancement with Core Technologies

Introduction
When I was a 10-year-old boy, I made a shooting game that ran on a Z-80A processor. It was written mostly in BASIC, with a little bit of assembly. I designed the whole game scenario, the bitmap pixels for the game characters displayed on the graphics unit (a V9938D), and the sound effects played through the PSG sound device. It was my first programming project.

As I remember, the Z-80A processor had a clock rate of 4.77 MHz. Since then, processor speeds have increased enormously. In 2002, an Intel Pentium 4 model was introduced as the first CPU with a clock rate of 3 GHz. Recently, however, the clock rate seems to have almost reached its limit: it is hard to find a CPU faster than 3.5 GHz today, even though the highest-clocked microprocessor ever sold commercially to date is found inside IBM's zEnterprise 196 mainframe, introduced in July 2010. The z196's cores run continuously at 5.2 GHz.

To get around the clock rate limit, CPU manufacturers are turning to SIMD and multi-core technologies, and now we even have dual-core smartphones.


SIMD Extension

"Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism." (from Wikipedia)
Flynn's taxonomy of a SIMD design: single instruction, multiple data. Each "PU" (processing unit) does not necessarily correspond to a processor, just some functional unit that can perform processing. The PUs are indicated as such to show the relationship between instructions, data, and the processing of the data.
SIMD extensions are very useful when applied to multimedia processing. I have done some multimedia-related software development and have experienced the advantage of SIMD extensions first-hand. You can see an example of my SIMD-related work here.
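
To make the "same operation on multiple data" idea concrete, here is a minimal sketch in C that adds two float arrays four elements at a time. It assumes an x86 compiler that defines __SSE__ (GCC and Clang do on x86-64) and uses the SSE intrinsics from xmmintrin.h; on other setups it simply falls back to the plain loop. The function name add_floats is my own invention for this illustration and is not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Adds two float arrays element by element: out[i] = a[i] + b[i].
   add_floats is a hypothetical helper name used only for this sketch. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    /* One SSE add instruction processes four packed floats at once. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the leftover tail, or the whole array
       when SSE is not available at compile time. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Calling add_floats(a, b, out, 6) on six-element arrays would process the first four elements with a single packed add and the last two with the scalar loop. Real multimedia code usually works on aligned buffers and wider or integer vector types, so treat this only as a starting point.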


Parallel Processing
As I mentioned above, the clock rate has almost reached its limit, so microprocessor manufacturers are looking to multi-core technology for a solution.
OpenMP is one of the best-known parallel programming APIs, and Microsoft recently introduced the Parallel Patterns Library (PPL) in Visual Studio 2010. I introduce both here.

OpenMP
The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix platforms and Windows NT platforms. Jointly defined by a group of major computer hardware and software vendors, OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer.

#include <stdio.h>

int main(int argc, char *argv[])
 {
   /* Every thread in the parallel team runs the next statement once. */
   #pragma omp parallel
   printf("Hello, world.\n");
   return 0;
 }
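
A slightly more practical pattern is splitting a loop across cores. The sketch below is only an illustration under my own assumptions: sum_below is a made-up function name, and the code needs OpenMP enabled in the compiler (for example -fopenmp with GCC or /openmp with Visual C++). Without that switch the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Sums 0 + 1 + ... + (n - 1) with an OpenMP parallel loop.
   sum_below is a hypothetical name used only for this sketch. */
long sum_below(long n)
{
    long total = 0;
    long i;
    assert(n >= 0);
    /* The loop iterations are divided among the threads.
       reduction(+:total) gives each thread a private copy of total
       and adds the copies together when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}
```

The reduction clause is what keeps the threads from racing on total; dropping it while keeping the parallel for would make the result unreliable.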

You can find more details about OpenMP, including the specifications, at http://openmp.org/

Microsoft Parallel Patterns Library
The PPL introduces a set of task-oriented parallelism constructs as well as a number of parallel algorithms similar to what is available with OpenMP today. The PPL algorithms are, however, written using C++ templates rather than pragma directives and, as a result, are far more expressive and flexible. The PPL is, however, fundamentally different from OpenMP in the sense that the PPL promotes a set of primitives and algorithms that are more composable and reusable as a set of patterns. Meanwhile, OpenMP is inherently more declarative and explicit in matters such as scheduling and ultimately is not part of C++ proper. The PPL is also built on top of the Concurrency Runtime, allowing for greater potential interoperability with other libraries based on the same runtime. Let's look at the PPL algorithms and then see how you can use the underlying functionality directly for task-oriented parallelism.


 - writing in progress.... to be finished soon...
