iCosts of optimising
- hard to write nice software, its what universities teach against
- opposite of clean code 

Choosing the optimal platform
2.1 Choice of hardware platform
- Less important than it used to be
The distinctinos between RISC and CISC processors, between PCS mainframes 

Graphics Accelerators
- the cjoice of platform is influenced by the requiremens of the task in quwestion

2.2 Choice of Microprocessor
The benchmark performance of comppeting brachs of microprocessors are very similar 
Processors with multiple cores are advantageous for applications that can be divided into multiple threads that run in parallel.
Small lightweight processors with low power consumption are actually quite powerful and may be sufficient for less intensive applicainots

x86 microprocessors can run in both 16-bit, 32-bit and 64-bit mode


2.5 Choice of C++ compiler
c++ comopilers are becoming moe and more complicated
new features in the c++ language have also added to the complexity 
1) Microsoft Visual Studio Comiler
2) Gnu or Gcc
3) Clang

2.6 Choice of function libraries
Some applications spend most of their execution time on executing library functions
time consuimng library functions are usually:
1) File input/output
2) Graphics and sound processing
3) Memoruy and string manipulation
4) Mathematical functinos
5) Encryption and data compression

Standard libraries are not always fully optimised


Drawbacks of C++

Portability
C++ is fully portable in the sense that the syntax is fully standardixed and supported on all major platforms
C++ also allows direct access to hardware interfaces and system calls, and these are of course system specific.
In order to facilitate porting between platforms, it is recommened to place the user interface and ogher system specific parts of the code in a separate module, and out the task specific part of the code, which supposedly is system-indepnedent in another module

Development Time
Use reusable classes and consistent modularity to develop quicker

Security
The most serious problem with C++ is security.
C++ implementsations have no checking for array bound violations and invalid pointers
It is necessary to adhere to certain practices to prevent such matters

- user references instead of pointers, initalise pointers to zero, setting pointers to zero when objects they point to become invalid, avoid pointer arithmetic and pointer type casting
Linkedlists and other data structures tupically user pointers may be replaced and more efficient with container class templates
A missing check for a buffer overflow of input data is an error often exploited, overflowing a buffer means you can view information in other parts of the system
To prevent such errors, use the STL library, however, the STL library often allocates memory in an inefficient way. It does it by dynamic memory allocation


3 Finding the biggest time consumers
3.1 How much is a clock cycle
CPU is not quite related to time, but rather the speend of the CPU
CPU is the reciprocal of the clock frequency

3.2 User a profiler to find hotspots
Before you optimse anything, you have to use a profiler on your system
99% of the time is spend on innermost loop doing, or in reading input or output, or mathematical functinos
optimise the parts of the code that matter the most

Profiling methods:
1) Instrumentation: The compiler inserts extra code at each functino call to count how many times the function is called and how much time it takes
2) Debugging: The profiler inserts temporary debug breakpoints at every funciton or every code line
3) Time based sapling, event based sampling 

Dynamic linking and position independent code: dynamic is slower


5 Choosing the optimal algorithm
sorting, searching, mathematical calculations

7 The Efficiency of different C++ constructs
You gotta know how program code is translated into machine code and how the microprocessor handles this code

Data Caching is important: data caching is poor if data is scattered randomly around in the memory. It is therefore important to understand how variables are stored. the storage principles are the same for variables, arrays, and objects

Storage on the stack
- variables and obejcts declared inside a function are stored on the stack
- stack is first in, last out fashion
- it is used for storing function return addresses , function paramaters, local variables, and for sainv registers that have to be restored before the functionr eturns 
- everytime a function is called, it allocates the required amount of space on the stack for all these purposes
- this memory space is freed twhen the function returns 
- The net time a function is called, it can use the same space for the parameters of the new functino
- The stack is the most efficient memroy space to store data in, becuse the same range of memory addresses is reused again and again
This part of the memory is mirrored in the level01 data cache if there are no big arrays
- The lession we can learn from this is that all variables and objects hould preferably be declared inside the funciton in which thye are used

Global or Static storage
Variables that are declared outside of  any function are called global variables
They can be accessed from any function 
Glibal variables are stored in the static part of the memory
The static memory is also used for variables declared with the static keyword, for floating point constatncs, string constancs, array lists, switch statement ump tables, and virtual functio tables
The static data area is usually divided into three parts:
	1) Constatns that are never modified by the program, one for initialised variables that my be modified by the program, and one for unitialised variables which could be modified
The advantage of static data is that it can be initalised to desired vales before the program starts
The disadvantage is that the memory space is occupided theought the whole pgoram execution, even if the variable is aonly used in a small part of the program
This makes data caching less efficient because the memory space cannot be reused for another purpose

Do not make variables global if you can avoid it
Gloable variables may be needed for communication ebtween different threads, but that is the only situation where they are unavoidable

It is often preferable t declare a lookup table costatn
The advantage of using const here is that the list oes not need to be initalised everytime the funciton is ccalled, the compiler likely put the values in static memory instead
A static declaraion inside a function means that the variable or array has to be initalised for the first time the function is called, but not on later calls. This is inefficient, as the function then needs to check whether it has been called before
all identical constants will be joined together in order to minimise the amount of cache space used for constatnts

Register Storage
A limited number of variables can be sotred in registers instad of the main memroy
A register is a small pieve of memory inside the CPI used for temporary storage.
Variables that are sotred in registers are accessed ver fast, all optimising compilers will automatically choose the most often used variables in a function for refister storage
The same register can be used for multiple variables as long as their uses do not overlap
Local variables are particulary suited for register storage, this is another reason for preferring local variables
registers are limited, there are approx 6 integer registers avaiable for general purpose 32-bit x86, but 14 in 64-bit systems
Floating point variables use a different kind of register, there are 8 floating point registers avaiable in 32 bit operating systems 

Volatile
The volatile keyword specifes that a variable can be changed by anothe thread
This prevents the compiler from making optimisations that reply on the assymption that the variable always has the value it was assigned to previously in the code
The effect of volatile makes sure that the variable is stored in memory rather than in a register and prevents optimisations on the variable, 
Volatile does not mean atomic

Thread local storage:
most compilers can make thread local storage of static and global variables by using the keyword thread_local, __thread

Dynamic Memory Allocation
Dynamic memory allocation is does with the operations new and delete
These operators and function consume a significant amount of time
A part of memory called the heap is reserved for dynamic allocation
The heap can easiily ebcfore fragmented
The heap manager can spend a lot of time cleanin gup spaces that are no longer used ans earching for vacant spaces <-this is called garbage collection
Objects that are allocated in sequence are not necessarilt stored in sequenti memroy, they can be scatted around at different plaecs when the heap has become gragmented, this makes data caching inefficient
DMA alos tends to make the code more compilated and error-prone, the program has to keep pinters to all allocated objeccts, and keep track of when they are no longer used.
If you don't deallocate objects on the heap, you get a memeoru leak
Java uses dynami memory allocation for all objects, which is very inefficient 
Car car(1); <- this makes an object on the stack. Car *car = new Car(1) <- would make an object on the heap

Variables Declared Inside a Class
- Variables decalred inside a class are stored in the order in which they appear in the class declaration
- The type of storage is determined when the object of the class i decalred
- An obect of a class, structure, or unioin can use any of the storage methods metioned above
An object cannot be stored in a register, except for the simplest of cases, but its data memebrs can be copied into registers
A calls member with the static modifier will be stored in static memeory and will have one and only one instance, non-static memebers of the same class will be stored with each instance of the class
Storing variables in a class or a structure is a good way of making sure that varirables are used int ehs same part of the pgram, but more imporatnly, they are stored near earch other


7.2 Integers variables and operators

Integer Sizes
Integeres can be different sizes, and they can be signed or unsigned
The followeing table summarizes the different integer tpes avaiable
char:8, short int: 16, int 32, long:64

Signed vs Unsigned Ints
No real speed diff, unless
In division by a constatn, the unsigned is faster than signed
Conversion to sloating point is faster with signed than with unsigned integers 
overflow behaves differently on signed and unsigned variables
conversion between the two is costless, it is simpley a matter of interpreting the same bits differently 

Integer division speeds vary depending on the processsor used

Speed difference between ++i and i++
-> no diff in speed unless used in an expression
array[++i] is slower than array[i++]
this is because the availability of i is delayed for approx two clock cycles

Enums are as efficient as integers, as they are integers


Boolean Operands
- if a condition in && / || is most likely, then put it last for && and first for ||
- This is because if it is most likely first in an &&, then it will more likley have to be evaulated, and the second evaluated, so bear in mind predictibility

Booleans are over determined
All operators that have booleans as output can produce no other alue that 1 or 0, still check for other values
its faster to use 0 or 1 instead, so that you can just use bitwise operators 

Pointers vs References
Pointers and references are equally ineffeicnet as they are effectiely doing the same thing
The difference only comes in programng style

Advantages of Pointers:
pointers are more obivous than references
pointers have more funcitonlaity that references, you can change where a pointer points to , and you can do arithmetic opertations with a pointer
Advantages of References
syntax is easier
references are safer to yser than pinters becase in most cases they are ysure to void to a valid address

the 'this' keyword is an implicit pointer
all (nonstatic) variables and objects declared inside a function are stored on the stack and are infact addressed relative to the tack pointer

Smart Pointers: behaves like a pointer, has the special feature that the object it points to is deleted when the pointer is deleted

7.10 Arrays
An array is impleted simply by storing the elements consecutiley in memeoy
no information about the dimensions of the array is stored, this makes the use of arrays in C and C++ faster than in other pgoramming lanuages, but also less sade
This sadety problem can be overcome by defined a container class that behaves like an array with bounds checking

An array using the above template class is declared by specifying the type and size as template parameters

7.11 Type conversions
implicit
c style 
constructor style
c++ casting operator

Branches and Switch Statements
The highspeed of modern microprocessors is obtained by using a pipline where instructions are fetched and decoded in several stages before they are executed
-> This has one big problem. Whevenever the code has a branch, the microprocessor does not know in advance whihc of the two branches to feed into the pipline.
If the wrong branch is fed into the pipeline, then the error is not detected untile a while later
Microprocessors use branch prediciton to reduce this problem.
They use advanced algorithms to solve this problem

Switch statements are a type of branching
They are most efficient when the case statements increment sequentially, as it can be implemented as a table of ump targets
A switch statement with many labels that has values far from each toehr is inefficient because the compiler must convert it to a branch tree
New processors are somtimes able to prefict a switch stateemnt if it follows a simple periodic pattern, or if it is correlated with preceding branches and the number of different targets is small
The number off branches and switch statements shuold preferable be kept small in the critical part of a program, especially if the branches are poorly predictable 

The target of branches and functino calls are saved in a special cache called the branch target buffer
Contention in the branch target buffer can occur if a program 

In some cases it is possible to replace a poorly predictable branch by a table lookup, for example
a = b ? 1.5 : 2.6 is a branch and is slow, however, if you use a table
const lokup[2] = { 2.6, 1.5}
a = lookup[b]

In some cases the comiler can automaticlaly replace a branch by a conditional move


Loops

The efficientcy of a loop depends on how well the microprocessor can predict the lool control branch
A loop with a small and fixedd repeat count and no branches inside can be predicted perfectly
The maximum loop count can b predicted depends on the proceessor
nested loops are harder

use loop unrolling (basically incremennt by different numbers, if your loop goes straight into a load of if stateemnts)
it does take up more space in the code cache 


Copying or clearing arrays
it may not be optimal to use a loop for trivial tasks such as copying an array or setting an array to all zeroes
copilers ofter automaticlaly do this


7.14 Functions
Function calls may slow down a program for the following reasons
- The function call makes the microprocessor jump to a different code address and back again, this can take 4 clock cycles
- The code cache works less efficiently if the code is fragmanted and in scattered memeory
- Function paramteres are sotred on the stack, and it takes ime reading them again,
- Extra time is needed for setting up a stack frame, saving and restoring registers, and possible save exception handling information
- Each function call occupies space in the branch target buffer which can cause contentions after a while

TOFIX
- aoiv uneccesary functions
- use inline function (basically expands a funciton body wherethe inline funciton is) 
- Avoid nested funciton calls in the inner most loop
- use macros instead of functions, thes are certian to be inlined (beware macro parameters are evaluated each time they are used) 

page 49