An update to this thread: I found that even intpow() above is not as well optimized as repeated multiplies typed directly at the command line, even when those multiplies are grouped identically to the ones done in intpow. In the script output below, I compare four approaches: the .^ operator, intpow, a series of ungrouped multiplies, and a series of multiplies grouped to match intpow. Grouping the multiplications does change the result (X7a and X7b differ, while X7a and X7c match exactly), but it does not seem to change the run time. Repeated multiplies are much faster than intpow (about 0.043 s versus 0.122 s for the seventh power), even though intpow appears to perform the same number of multiplies or fewer. I've also shown peak and rms differences between the various methods, to give some idea of how the errors grow with the exponent. From this sample, the ungrouped multiplies seem to agree slightly more closely with the .^ operator than intpow() does (rms difference of 4.8e-14 versus 6.6e-14 for the seventh power).
The timings below were measured on both R2012b and R2013a_pre on 64-bit Linux with a dual-core processor. I also repeated the test with X=randn(1e8,1) and got similar results, to rule out overhead issues.
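For reference, here is a minimal sketch of the square-and-multiply scheme that the grouped expressions in the transcript (X7c, X8c) are written to mimic. The name intpow_sketch is hypothetical and this is not the intpow posted earlier in the thread; it only assumes that intpow uses binary exponentiation, which the exact agreement between X7a and X7c is consistent with.

```matlab
% intpow_sketch.m  (hypothetical file/function name, for illustration only)
function Y = intpow_sketch(X, p)
% Elementwise X.^p for a nonnegative integer scalar p, using
% binary exponentiation (square-and-multiply).
    assert(isscalar(p) && p >= 0 && p == floor(p), ...
           'p must be a nonnegative integer scalar');
    if p == 0
        Y = ones(size(X));          % X.^0 is all ones
        return
    end
    Y = [];                         % no factor accumulated yet
    base = X;                       % holds X.^(2^k) on the k-th pass
    while true
        if mod(p, 2) == 1           % current low bit of p is set
            if isempty(Y)
                Y = base;           % first factor: copy, no multiply needed
            else
                Y = Y .* base;      % fold this power of two into the result
            end
        end
        p = floor(p / 2);           % shift to the next bit
        if p == 0
            break
        end
        base = base .* base;        % square for the next bit
    end
end
```

Under these assumptions, intpow_sketch(X,7) evaluates (X.*(X.*X)).*((X.*X).*(X.*X)), which is the grouping used for X7c, and intpow_sketch(X,8) squares three times, as in X8c. That is 4 and 3 multiplies respectively, against 6 and 7 for the ungrouped versions.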
X=randn(1e7,1);
tic; X2=X.^2; toc
Elapsed time is 0.036954 seconds.
tic; X3=X.^3; toc
Elapsed time is 0.619918 seconds.
tic; X7=X.^7; toc
Elapsed time is 0.621204 seconds.
tic; X8=X.^8; toc
Elapsed time is 0.614439 seconds.
tic; X2a=intpow(X,2);toc
Elapsed time is 0.072780 seconds.
tic; X3a=intpow(X,3);toc
Elapsed time is 0.080734 seconds.
tic; X7a=intpow(X,7);toc
Elapsed time is 0.122126 seconds.
tic; X8a=intpow(X,8);toc
Elapsed time is 0.106735 seconds.
tic; X2b=X.*X;toc
Elapsed time is 0.037896 seconds.
tic; X3b=X.*X.*X;toc
Elapsed time is 0.037768 seconds.
tic; X7b=X.*X.*X.*X.*X.*X.*X;toc
Elapsed time is 0.043093 seconds.
tic; X8b=X.*X.*X.*X.*X.*X.*X.*X;toc
Elapsed time is 0.046745 seconds.
tic; X2c=X.*X;toc
Elapsed time is 0.037059 seconds.
tic; X3c=X.*(X.*X);toc
Elapsed time is 0.036769 seconds.
tic; X7c=X.*(X.*X).*((X.*X).*(X.*X));toc
Elapsed time is 0.044292 seconds.
tic; X8c=((X.*X).*(X.*X)).*((X.*X).*(X.*X));toc
Elapsed time is 0.048338 seconds.
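Single tic/toc readings like the ones above can be noisy, so anyone repeating the comparison may want to take the best of several runs. The following is only a sketch of that idea: nRuns, the cell arrays, and the variable names are my own choices, and wrapping each expression in an anonymous function adds a little call overhead that the command-line timings above do not have.

```matlab
X = randn(1e7, 1);
nRuns = 5;                                   % arbitrary, for illustration

% Expressions to compare, wrapped so they can be timed in a loop.
f = {@(x) x.^7, ...
     @(x) x.*x.*x.*x.*x.*x.*x, ...
     @(x) x.*(x.*x).*((x.*x).*(x.*x))};
names = {'X.^7', 'ungrouped', 'grouped'};

best = inf(1, numel(f));                     % best (smallest) time per method
for k = 1:numel(f)
    for r = 1:nRuns
        t0 = tic;                            % timer handle, avoids nested-tic issues
        Y = f{k}(X);                         %#ok<NASGU> result is discarded
        best(k) = min(best(k), toc(t0));
    end
    fprintf('%-10s best of %d runs: %.6f s\n', names{k}, nRuns, best(k));
end
```

Taking the minimum rather than the mean is a design choice: it discards runs disturbed by other processes and keeps the one closest to the undisturbed cost.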
max(abs(X7-X7a))
ans = 2.9104e-11
max(abs(X7-X7b))
ans = 2.9104e-11
max(abs(X7a-X7b))
ans = 1.4552e-11
max(abs(X7a-X7c))
ans = 0
rms(X7-X7a)
ans = 6.6114e-14
rms(X7-X7b)
ans = 4.8226e-14
rms(X7a-X7b)
ans = 5.4687e-14
max(abs(X8-X8a))
ans = 2.3283e-10
max(abs(X8-X8b))
ans = 1.1642e-10
max(abs(X8a-X8b))
ans = 1.1642e-10
max(abs(X8a-X8c))
ans = 0
rms(X8-X8a)
ans = 3.3424e-13
rms(X8-X8b)
ans = 1.9447e-13
rms(X8a-X8b)
ans = 2.9954e-13
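The peak differences printed above are exact powers of two (2.9104e-11 is 2^-35, 1.4552e-11 is 2^-36, 2.3283e-10 is 2^-32, 1.1642e-10 is 2^-33), which suggests they are one or two units in the last place (ULP) of the largest-magnitude results rather than anything larger. A sketch of how to check this by scaling each difference by eps of the reference value follows. It takes X.^7 as the reference, as the comparisons above do, even though that result is itself rounded; the helper name ulpErr is my own.

```matlab
X   = randn(1e6, 1);
ref = X.^7;                                   % reference, as in the post
Yb  = X.*X.*X.*X.*X.*X.*X;                    % ungrouped multiplies
Yc  = X.*(X.*X).*((X.*X).*(X.*X));            % grouped like X7c

% Difference from the reference, measured in ULPs of the reference value.
% eps(x) returns the spacing of doubles at abs(x), so this is elementwise.
ulpErr = @(Y) abs(Y - ref) ./ eps(ref);

fprintf('ungrouped: max %.2f ULP, mean %.3f ULP\n', ...
        max(ulpErr(Yb)), mean(ulpErr(Yb)));
fprintf('grouped:   max %.2f ULP, mean %.3f ULP\n', ...
        max(ulpErr(Yc)), mean(ulpErr(Yc)));
```

Measured this way, the errors no longer depend on the magnitude of the samples, so results for different exponents or different random draws can be compared directly.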