How to compute the sum of squares of outer products of two matrices minus a common matrix in Matlab?
Suppose there are three n * n matrices X, Y, S. How can I quickly compute the following scalar b?
b = 0;
for i = 1:n
b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end
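In matrix notation, this is (just restating the loop above, with the Frobenius norm):

$$
b \;=\; \sum_{i=1}^{n} \bigl\| X(i,:)^{\mathsf{T}}\, Y(i,:) \;-\; S \bigr\|_F^{2}.
$$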
The computational cost is O(n^3). There is a fast way to compute the sum of the outer products of the rows of two matrices. Specifically, the matrix C computed by
C = zeros(n);
for i = 1:n
C = C + X(i,:)' * Y(i,:);
end
can be calculated without a for loop as C = X.' * Y, which replaces the n rank-1 updates with a single matrix product and is much faster in practice. Is there a similarly fast way to compute b?
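For reference, here is a quick numeric check of the C = X.' * Y identity above (a small self-contained sketch):

n = 4;                        % small test size
X = rand(n);
Y = rand(n);
C = zeros(n);
for i = 1:n
C = C + X(i,:)' * Y(i,:);
end
max(max(abs(C - X.' * Y)))    % on the order of machine epsilon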
matlab matrix-multiplication
asked Nov 12 at 11:30 (edited Nov 12 at 11:55) – messcode
did you mean X(i,:)'*Y(i,:)? otherwise there is no mention of Y
– bla, Nov 12 at 11:52
yes, thanks for pointing out the error. I have corrected it.
– messcode, Nov 12 at 11:57
2 Answers
You can use:
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
Given your example
b=0;
for i = 1:n
b = b + sum(sum((X(i,:).' * Y(i,:) - S).^2));
end
We can first bring the summation out of the loop:
b=0;
for i = 1:n
b = b + (X(i,:).' * Y(i,:) - S).^2;
end
b=sum(b(:))
Knowing that we can write (a - b)^2 as a^2 - 2*a*b + b^2:
b=0;
for i = 1:n
b = b + (X(i,:).' * Y(i,:)).^2 - 2.* (X(i,:).' * Y(i,:)) .*S + S.^2;
end
b=sum(b(:))
And we know that (a * b)^2 is the same as a^2 * b^2 (applied here elementwise to each entry of the rank-1 product):
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b=0;
for i = 1:n
b = b + (X2(i,:).' * Y2(i,:)) - 2.* (X(i,:).' * Y(i,:)) .*S + S2;
end
b=sum(b(:))
Now we can compute each term separately:
b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
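The reason each term collapses is the following set of identities (index form, with i the loop index and j, k the matrix indices; X2 = X.^2, Y2 = Y.^2, S2 = S.^2):

$$
\sum_{i=1}^{n} \bigl(X_{ij}\,Y_{ik}\bigr)^{2} \;=\; \sum_{i=1}^{n} X_{ij}^{2}\,Y_{ik}^{2} \;=\; \bigl(X2^{\mathsf{T}}\,Y2\bigr)_{jk},
\qquad
\sum_{i=1}^{n} X_{ij}\,Y_{ik} \;=\; \bigl(X^{\mathsf{T}}\,Y\bigr)_{jk},
\qquad
\sum_{i=1}^{n} S_{jk}^{2} \;=\; n\,S_{jk}^{2}.
$$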
Here is the result of a test in Octave comparing my method, the two methods provided by @AndrasDeak, and the original loop-based solution, for inputs of size 500*500:
===rahnema1 (B)===
Elapsed time is 0.0984299 seconds.
===Andras Deak (B2)===
Elapsed time is 7.86407 seconds.
===Andras Deak (B3)===
Elapsed time is 2.99158 seconds.
===Loop solution===
Elapsed time is 2.20357 seconds.
n=500;
X= rand(n);
Y= rand(n);
S= rand(n);
disp('===rahnema1 (B)===')
tic
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b=sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
toc
disp('===Andras Deak (B2)===')
tic
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));
toc
disp('===Andras Deak (B3)===')
tic
b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));
toc
disp('===Loop solution===')
tic
b=0;
for i = 1:n
b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end
toc
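As a side note (the numbers above are from Octave's tic/toc), in MATLAB a more stable way to time the closed-form expression is timeit; a minimal sketch, reusing the variables defined above:

f = @() sum(sum((X.^2).' * (Y.^2) - 2 * (X.' * Y) .* S + n * (S.^2)));
t = timeit(f)   % robust timing of the zero-argument handle f; MATLAB (not in core Octave)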
Very nice! -- I guess sum(reshape(...)) is more expensive than sum(sum(...))? Is this because of the overhead in the reshape function? -- To avoid the double sum, in Octave you can do sum((...)(:)), and in MATLAB R2018b you can now do sum(...,'all'), which both are quite a bit more elegant than the double sum, and should be faster.
– Cris Luengo, Nov 12 at 16:17
Thanks. I don't know exactly which of sum-reshape or sum-sum performs better. Here I used sum-sum to reflect the original code, and it may be more readable. sum((...)(:)) may complicate readability and sum(...,'all') uses the ugly string parameter passing. Comparing them, I think the difference in speed for the current problem is negligible. Maybe we need a more elegant function/operator?
– rahnema1, Nov 12 at 16:40
The double sum requires an intermediate array, so in principle it does more work. But yes, I'm sure it's not a measurable difference in this case. I agree with your readability comments. A new operator would be awesome. Something like sumall=@(x)sum(x(:)). But that does imply 20 or so new operators (meanall, stdall, maxall, etc. etc.), you can't do one and not all the others. :)
– Cris Luengo, Nov 12 at 16:44
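To make the idioms discussed in the comments above concrete, here is a small sketch of the three equivalent ways to total a matrix (A is just an arbitrary example matrix):

A = magic(4);
t1 = sum(sum(A));      % nested sums, works in any MATLAB/Octave version
t2 = sum(A(:));        % flatten to a column first, then one sum
t3 = sum(A, 'all');    % MATLAB R2018b and newer
isequal(t1, t2, t3)    % all three give the same total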
You probably can't reduce the time complexity, but you can use vectorization to get rid of the loop and make the most of low-level code and caching. Whether it's actually faster depends on your dimensions, so you should run timing tests to see if this is worth it:
% dummy data
n = 3;
X = rand(n);
Y = rand(n);
S = rand(n);
% vectorize
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));
% check
b - b2 % close to machine epsilon i.e. zero
What happens is that we insert a new singleton dimension into one of the arrays, ending up with an array of size [n, 1, n] against one of size [n, n], the latter being implicitly the same as [n, n, 1]. The overlapping first index corresponds to the i in your loop, and the remaining two indices correspond to the matrix indices of the dyadic product you have for each i. Then we permute the indices to put the "i" index last, so that we can again broadcast the result against S of (implicit) size [n, n, 1]. What we then have is an array of size [n, n, n] where the first two indices are the matrix indices of your original and the last one corresponds to i. We then just have to take the square and sum each term (instead of summing twice I reshaped the array to a row and summed once).
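A small dimension-bookkeeping sketch of the steps just described (assumes n-by-n inputs and implicit expansion, i.e. MATLAB R2016b+ or Octave):

n = 3; X = rand(n); Y = rand(n); S = rand(n);
A = reshape(X, [n, 1, n]);      % n x 1 x n: A(i,1,j) = X(i,j)
B = A .* Y;                     % n x n x n via implicit expansion: B(i,k,j) = X(i,j)*Y(i,k)
P = permute(B, [3, 2, 1]);      % n x n x n with the loop index last: P(:,:,i) = X(i,:)'*Y(i,:)
D = P - S;                      % S (n x n) broadcasts along the third dimension
size(D)                         % [n n n]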
A slight variation of the above transposes S instead of the 3d array, which might be faster (again, you should time it):
b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));
In terms of performance, reshape is free (it only reinterprets data, it doesn't copy), but permute/transpose will often lead to a performance hit when data gets copied.
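For completeness, on MATLAB releases older than R2016b (no implicit expansion) the same idea can be written with bsxfun; a sketch, not benchmarked on those releases:

n = 3; X = rand(n); Y = rand(n); S = rand(n);       % same dummy data as above
T = bsxfun(@times, reshape(X, [n, 1, n]), Y);       % n x n x n, T(i,k,j) = X(i,j)*Y(i,k)
P = permute(T, [3, 2, 1]);                          % P(:,:,i) equals X(i,:)'*Y(i,:)
D = bsxfun(@minus, P, S);                           % subtract S from every slice
b2_old = sum(reshape(D.^2, 1, []))                  % same value as b2/b3 above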