
Why are Python's NumPy vectorized statements faster than for loops?

Editor: Python

Let's first look at what extra costs a for loop carries in a language like Python, compared with other languages.

As we know, Python is interpreted.

For example, take x = 1234+5678. In a compiled language, this means reading two short ints from memory into registers, fetching the add instruction, triggering the adder inside the CPU, and finally storing the adder's output in the memory location that corresponds to x. (In practice, that last step is almost always optimized to keep the adder output in a register rather than in memory, because a memory access often costs dozens of times as much as a register access.) That is 2 to 4 instructions in total, depending on the CPU's instruction set.

With an interpreted language, it is a different story.

The interpreter has to treat "x = 1234+5678" as a string and compare it character by character to work out its grammatical structure. Even ignoring spaces, that is 11 characters, so at least 11 loop iterations. The minimum work in each iteration is: fetch the data (for example, read the character 'x'), compare it, jump according to the comparison result (possibly jumping back), increment the loop counter, check whether the loop counter has reached its final value, and jump according to that comparison. That is at least 6 instructions per iteration, including one memory read and at least two branches (modern CPUs have branch prediction, so a branch costs nothing extra on a hit, but a miss is expensive). The total is 66 instructions, roughly 17 times the 4 instructions of the compiled version, assuming every instruction takes the same time. In reality, memory-access and jump instructions often take ten or even a hundred times as long as an add instruction.
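
The counting argument above can be made concrete with a toy scanner. This is only a sketch: the function name scan and its tokenizing rules are invented for illustration and have nothing to do with how a real Python tokenizer is written. It walks the text one character at a time and counts the iterations.

```python
def scan(source):
    """Split source into tokens one character at a time, counting iterations."""
    tokens, current, iterations = [], "", 0
    for ch in source:
        if ch == " ":
            continue                 # spaces are ignored, as in the text
        iterations += 1
        if ch.isalnum():
            current += ch            # extend the current name or number
        else:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)        # an operator is a token by itself
    if current:
        tokens.append(current)
    return tokens, iterations


tokens, iterations = scan("x = 1234+5678")
print(tokens)                # ['x', '=', '1234', '+', '5678']
print(iterations)            # 11
print(iterations * 6)        # 66 instructions at 6 per iteration
print(iterations * 6 / 4)    # 16.5, i.e. roughly 17 times the 4 compiled instructions
```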

And this is only the cost of reading the source text; it does not include parsing, which is the bulk of the work. Once parsing is counted, the instruction count grows by at least a few hundred times more (and the time cost, I would guess, by at least thousands of times).

Python, however, does much better than interpreters that work directly on source text. It can compile the text into "bytecode" (cached in files with the .pyc extension), so that the interpreter works directly on integer "instruction codes" and no longer has to analyze text from scratch.
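
To see the bytecode step for yourself, the standard library's compile and dis can be used as in the sketch below. The file name "<example>" is just a placeholder label, and the exact opcodes printed by dis differ between Python versions, so no listing is shown here.

```python
import dis

source = "x = 1234+5678"

# Text is parsed once and turned into a code object that holds bytecode.
code = compile(source, "<example>", "exec")

print(code.co_code)   # the raw bytecode, as a bytes object
dis.dis(code)         # human-readable listing; opcodes vary by Python version

# From here on the interpreter executes the bytecode, not the text.
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 6912
```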

Even so, the step from bytecode to code for the real CPU cannot be skipped.

This cost can be thought of as using a virtual machine to run a program written for a different CPU. In practice, even when optimized to the extreme, this kind of emulation still costs on the order of 10 times in performance.

There are ways to reduce this cost. One of them is JIT (just-in-time) compilation.

Put simply, JIT means that before a piece of code is executed for the first time, it is compiled, and the compiled code is what then runs.

If the code contains no loops, the time spent compiling is simply wasted. But if it contains loops above a certain size, JIT may save some time.
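
The "compile once, then run many times" trade-off can be felt even without a real JIT. The sketch below is only an analogy: it compiles an expression to bytecode rather than to machine code, and the names expression, scope, reparse_every_time and compile_once are made up for the example. It compares re-processing the text on every iteration with compiling it once up front.

```python
import timeit

expression = "a * a + 1"
scope = {"a": 7}


def reparse_every_time(repeats):
    # eval on a string has to parse and compile the text on every call
    total = 0
    for _ in range(repeats):
        total += eval(expression, scope)
    return total


# Pay the compilation cost once, before the loop starts.
compiled = compile(expression, "<expr>", "eval")


def compile_once(repeats):
    # eval on a code object skips parsing and only executes
    total = 0
    for _ in range(repeats):
        total += eval(compiled, scope)
    return total


print("re-parse each time:", timeit.timeit(lambda: reparse_every_time(1000), number=10))
print("compile once:      ", timeit.timeit(lambda: compile_once(1000), number=10))
```

With a single iteration the up-front compile buys nothing; the more often the loop body runs, the more the one-time cost is amortized.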

The best example is Java. Its JIT can even collect a profile while the program runs, then aggressively optimize the hot code based on that profile, and in some cases ends up executing faster than C.

However, reality falls short of the ideal. Although local hot spots may run faster, Java's overall efficiency is still considerably worse than that of C/C++, for reasons that are rather complicated.

Unlike the C/C++/Java compilers, which have had enormous resources invested in them and have been refined over many years, Python's JIT support can fairly be called poor. (CPython, the standard implementation, has historically shipped without a JIT at all.)

Adding it all up, it is normal for a single loop to run ten-plus or even dozens of times slower.
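
A rough way to measure the loop overhead on your own machine is to time a hand-written for loop against the built-in sum, which runs its loop inside the interpreter's C code. This is a sketch, not a rigorous benchmark; the list size and repeat count are arbitrary, and the ratio will vary with the Python version and hardware.

```python
import timeit

data = list(range(100_000))


def loop_sum(values):
    # every iteration goes through the bytecode interpreter
    total = 0
    for v in values:
        total += v
    return total


loop_time = timeit.timeit(lambda: loop_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"for loop:     {loop_time:.4f} s")
print(f"built-in sum: {builtin_time:.4f} s")
```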

The discussion above only considers the control structure of the for loop itself. In fact, the slowness tends to be across the board.

For example, to compute on a set of vectors, you first have to store them.

How?

In C/C++, they go into an "array", and a C/C++ array is a bare contiguous region of memory, with one numeric value stored every few bytes.

This layout is the most convenient and fastest for the CPU to process, and it is cache friendly (cache-unfriendly access can be several times or even dozens of times slower).
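
Python's standard library happens to offer a container with this C-style layout, the array module, which makes the idea easy to inspect. The sketch below assumes the usual 8-byte C double; the variable names are illustrative.

```python
import array

n = 1000
# 'd' means C double: the values are stored back to back as raw numbers.
packed = array.array("d", (float(i) for i in range(n)))

view = memoryview(packed)
print(packed.itemsize)   # bytes per element
print(view.nbytes)       # total size of the buffer: itemsize * n
print(view.contiguous)   # True: one unbroken block of memory
```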

Java and similar languages do slightly worse, because their arrays are "real arrays": unlike a bare contiguous memory region, a real array has to check on every access whether the subscript is out of bounds. The check is not expensive, but it is not negligible either.

Of course, this has its benefits. At least you do not have to worry about buffer overflows all day, as you do in C/C++.

And then there are Python and similar languages...

To accommodate beginners, Python does away with variable declarations and data types, so its users neither need to nor can write int xxx. Any data can be stored however you like. Voilà!

But what if I told you that a variable able to hold any type, written in C/C++, looks like this (the definition shown is the VARIANT type from Windows COM, used here as a representative example):

typedef struct tagVARIANT {
    union {
        struct __tagVARIANT {
            VARTYPE vt;
            WORD wReserved1;
            WORD wReserved2;
            WORD wReserved3;
            union {
                LONGLONG llVal;
                LONG lVal;
                BYTE bVal;
                SHORT iVal;
                FLOAT fltVal;
                DOUBLE dblVal;
                VARIANT_BOOL boolVal;
                _VARIANT_BOOL bool;
                SCODE scode;
                CY cyVal;
                DATE date;
                BSTR bstrVal;
                IUnknown *punkVal;
                IDispatch *pdispVal;
                SAFEARRAY *parray;
                BYTE *pbVal;
                SHORT *piVal;
                LONG *plVal;
                LONGLONG *pllVal;
                FLOAT *pfltVal;
                DOUBLE *pdblVal;
                VARIANT_BOOL *pboolVal;
                _VARIANT_BOOL *pbool;
                SCODE *pscode;
                CY *pcyVal;
                DATE *pdate;
                BSTR *pbstrVal;
                IUnknown **ppunkVal;
                IDispatch **ppdispVal;
                SAFEARRAY **pparray;
                VARIANT *pvarVal;
                PVOID byref;
                CHAR cVal;
                USHORT uiVal;
                ULONG ulVal;
                ULONGLONG ullVal;
                INT intVal;
                UINT uintVal;
                DECIMAL *pdecVal;
                CHAR *pcVal;
                USHORT *puiVal;
                ULONG *pulVal;
                ULONGLONG *pullVal;
                INT *pintVal;
                UINT *puintVal;
                struct __tagBRECORD {
                    PVOID pvRecord;
                    IRecordInfo *pRecInfo;
                } __VARIANT_NAME_4;
            } __VARIANT_NAME_3;
        } __VARIANT_NAME_2;
        DECIMAL decVal;
    } __VARIANT_NAME_1;
} VARIANT, *LPVARIANT, VARIANTARG, *LPVARIANTARG;

In short, the idea is to use a tag to indicate the data type, with the real data stored in the union that follows; on access, the value is converted to or returned as the appropriate type according to the tag.
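
The tag-plus-union idea can be reproduced in miniature with the standard ctypes module. This is a deliberately tiny sketch with only two payload types; the names Payload, Variant, TAG_INT, TAG_FLOAT, make_variant and read_variant are invented for the example and are not part of any real API.

```python
import ctypes

TAG_INT, TAG_FLOAT = 0, 1


class Payload(ctypes.Union):
    # all members share the same bytes; only one is meaningful at a time
    _fields_ = [("int_val", ctypes.c_longlong),
                ("float_val", ctypes.c_double)]


class Variant(ctypes.Structure):
    # the tag says which member of the union currently holds the data
    _fields_ = [("tag", ctypes.c_int),
                ("payload", Payload)]


def make_variant(value):
    v = Variant()
    if isinstance(value, int):
        v.tag = TAG_INT
        v.payload.int_val = value
    elif isinstance(value, float):
        v.tag = TAG_FLOAT
        v.payload.float_val = value
    else:
        raise TypeError("this sketch only handles int and float")
    return v


def read_variant(v):
    # every access has to branch on the tag before touching the data
    if v.tag == TAG_INT:
        return v.payload.int_val
    if v.tag == TAG_FLOAT:
        return v.payload.float_val
    raise ValueError("unknown tag")


print(read_variant(make_variant(42)))    # 42
print(read_variant(make_variant(2.5)))   # 2.5
print(ctypes.sizeof(Variant), "bytes per value, versus",
      ctypes.sizeof(ctypes.c_double), "for a bare double")
```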

Obviously, to a C/C++/Java programmer, this thing is a disaster in both time and space.

It is also extremely cache unfriendly. Data that could have been stored contiguously is now wrapped in a structure, and for some types of data you have to follow a pointer to another region to reach the value (storing everything in place would waste a terrible amount of space).
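
The same effect is visible from inside Python. The sketch below assumes CPython, where sys.getsizeof reports the size of a single object; the exact number is implementation-specific, so none is quoted here.

```python
import sys

x = 1.5
pair = [x, x]

# The list stores references to the float object, not the 8 raw bytes.
print(pair[0] is pair[1])   # True: both slots point at the same object

# Each value carries its type along with the data.
print(type(x))              # <class 'float'>

# One boxed float occupies more than the 8 bytes of a raw C double.
print(sys.getsizeof(x))
```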

So, you see, talking about efficiency on top of a structure like this is a bit of a stretch...

Even knowing only this much, the picture is already startling. Interpreted execution plus bytecode optimization costs at least 10 times and up to tens or hundreds of times. The beginner-friendly basic data representation is several times to dozens of times slower. Access through containers (as opposed to the bare memory region that passes for an array, which has a fixed size and does not even check subscripts) is another several times to dozens of times slower. Even leaving aside the overhead of other mechanisms for now, just combine these factors (and in some circumstances these different slow points interact and multiply one another's delay)...

On top of that, there is the question of how Python internally manages, indexes, and accesses the global and local variables in a script (it tends to use dicts), the cache misses caused by the serious mismatch between the layout of user data and the physical machine's memory, and the bookkeeping for Python's internal state machine and execution context. For compiled languages none of this exists, since the CPU and memory handle it themselves; for interpreted languages, these become the culprits that multiply the slowdown.
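
The variable-lookup point can be checked with dis. In the sketch below, the function measure is made up for the example; the name len is resolved at run time through dictionary-based namespaces, while the function's own locals use faster indexed slots in CPython. Opcode names differ between versions, so the code only looks for the general families.

```python
import dis


def measure(items):
    count = len(items)   # 'len' is looked up by name; 'items' and 'count' are locals
    return count


opnames = [ins.opname for ins in dis.get_instructions(measure)]
print(opnames)

# Global and built-in names are fetched with LOAD_GLOBAL, backed by dicts.
print("LOAD_GLOBAL" in opnames)
print(type(measure.__globals__))   # <class 'dict'>

# Locals are read with an opcode from the LOAD_FAST family.
print(any(name.startswith("LOAD_FAST") for name in opnames))
```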

The interaction of these factors is extremely complex and subtle, and almost nobody understands it fully.

So, having understood the causes, can we simply say "Python is actually optimized really well, it is only 200,000 times slower"? (laughs)

Of course, if you are not doing such heavy processing and are just handling routine procedural work, this kind of language is fast enough. At least the people interacting with it will not notice the slightest delay.

Moreover, even when heavy processing is required, this kind of language can call on other languages for help. With NumPy around, who would dare say Python cannot do vector operations?
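
This is where the question in the title gets its answer, and a small comparison makes it tangible. The sketch below assumes NumPy is installed; the array size and repeat count are arbitrary, and the timings depend on the machine. The explicit loop runs every addition through the interpreter on boxed floats, while a_arr + b_arr hands the whole job to NumPy's compiled code working on contiguous buffers.

```python
import timeit

import numpy as np

n = 200_000
a_list = [float(i) for i in range(n)]
b_list = [float(2 * i) for i in range(n)]
a_arr = np.array(a_list)
b_arr = np.array(b_list)


def add_with_loop():
    # one interpreted iteration per element
    result = []
    for x, y in zip(a_list, b_list):
        result.append(x + y)
    return result


def add_vectorized():
    # a single call; the per-element loop runs in compiled code
    return a_arr + b_arr


print("for loop:  ", timeit.timeit(add_with_loop, number=10))
print("vectorized:", timeit.timeit(add_vectorized, number=10))
```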

Of course, when talking to experts, you have to understand that this is bringing in a language like C as reinforcements. To insist with a straight face that this is a capability of the Python language itself would be a little embarrassing. But if you only move in Python circles, it does no harm.

————————————————————————————

If one wanted to pile on, professional programmers would also list drawbacks such as the lack of data types making interfaces ambiguous, which makes more complex programs hard to write. But that is an unnecessary digression.

After all, Python is just a glue language. Being friendly to beginners and handling common, simple application scenarios is enough.

It is like making Office foolproof: that is the professional programmers' job. As long as users find it easy to use and are willing to pay, why should they care that the money sunk into building Office would be enough to build N Burj Khalifas?

Of course, if you want to develop further, remember the saying "use the right tool in the right place", and then find a way to understand the limitations of each tool.

After all, even C/C++, when doing matrix operations and the like, turns to SIMD instructions such as MMX, to hyper-threading and multi-core CPUs, and even to GPUs, to make up for what it lacks in parallel processing capability.

