How to compute the sum of squares of outer products of two matrices minus a common matrix in Matlab?










Suppose there are three n * n matrices X, Y, S. How can the following scalar b be computed quickly?

b = 0;
for i = 1:n
    b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end

The computational cost is O(n^3). There is a fast way to compute the sum of the outer products of the rows of two matrices. Specifically, the matrix C

C = zeros(n);
for i = 1:n
    C = C + X(i,:)' * Y(i,:);
end

can be calculated without a for loop as C = X.'*Y, which is a single vectorized matrix multiplication and much faster in practice. Is there a faster way to compute b?
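As a minimal sanity check of that identity (small random test matrices; the size n = 4 is arbitrary and only for illustration):

n = 4;
X = rand(n);
Y = rand(n);

C_loop = zeros(n);
for i = 1:n
    C_loop = C_loop + X(i,:)' * Y(i,:);   % accumulate rank-one outer products
end

C_fast = X.' * Y;                         % single matrix multiplication
max(abs(C_loop(:) - C_fast(:)))           % should be on the order of eps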










  • did you mean X(i,:)'*Y(i,:)? Otherwise there is no mention of Y.
    – bla
    Nov 12 at 11:52

  • Yes, thanks for pointing out the error. I have corrected it.
    – messcode
    Nov 12 at 11:57















Tags: matlab matrix-multiplication






asked Nov 12 at 11:30 by messcode (edited Nov 12 at 11:55)











2 Answers

Answer by rahnema1 (score 4), answered Nov 12 at 12:56, edited Nov 12 at 13:20

You can use:



X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));


Given your example



b=0;
for i = 1:n
b = b + sum(sum((X(i,:).' * Y(i,:) - S).^2));
end


We can first bring the summation out of the loop:



b=0;
for i = 1:n
b = b + (X(i,:).' * Y(i,:) - S).^2;
end
b=sum(b(:))


Knowing that we can write (a - b)^2 as a^2 - 2*a*b + b^2



b=0;
for i = 1:n
b = b + (X(i,:).' * Y(i,:)).^2 - 2.* (X(i,:).' * Y(i,:)) .*S + S.^2;
end
b=sum(b(:))


And we know that (a * b) ^ 2 is the same as a^2 * b^2:



X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b=0;
for i = 1:n
b = b + (X2(i,:).' * Y2(i,:)) - 2.* (X(i,:).' * Y(i,:)) .*S + S2;
end
b=sum(b(:))


Now the sum over i can be taken term by term: the first term becomes X2.' * Y2, the middle term becomes -2 * (X.' * Y) .* S, and S2 is added n times, giving n * S2:



 b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
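As a quick correctness check (not part of the benchmark below; the small size n = 5 is only for illustration), the closed-form expression can be compared against the original loop:

n = 5;
X = rand(n); Y = rand(n); S = rand(n);

b_loop = 0;
for i = 1:n
    b_loop = b_loop + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end

X2 = X.^2; Y2 = Y.^2; S2 = S.^2;
b_vec = sum(sum(X2.' * Y2 - 2 * (X.' * Y) .* S + n * S2));

abs(b_loop - b_vec)   % should be on the order of eps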


Here is the result of a test in Octave that compares my method, the two methods provided by @AndrasDeak, and the original loop-based solution for inputs of size 500*500:



===rahnema1 (B)===
Elapsed time is 0.0984299 seconds.

===Andras Deak (B2)===
Elapsed time is 7.86407 seconds.

===Andras Deak (B3)===
Elapsed time is 2.99158 seconds.

===Loop solution===
Elapsed time is 2.20357 seconds


n=500;
X= rand(n);
Y= rand(n);
S= rand(n);

disp('===rahnema1 (B)===')
tic
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b=sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
toc
disp('===Andras Deak (B2)===')
tic
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));
toc
disp('===Andras Deak (B3)===')
tic
b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));
toc
disp('===Loop solution===')
tic
b=0;
for i = 1:n
b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end
toc





  • Very nice! -- I guess sum(reshape(...)) is more expensive than sum(sum(...))? Is this because of the overhead in the reshape function? -- To avoid the double sum, in Octave you can do sum((...)(:)), and in MATLAB R2018b you can now do sum(...,'all'), both of which are quite a bit more elegant than the double sum, and should be faster.
    – Cris Luengo
    Nov 12 at 16:17

  • Thanks. I don't know exactly which of sum-reshape or sum-sum performs better. Here I used sum-sum to reflect the original code, and it may be more readable. sum((...)(:)) may complicate readability and sum(...,'all') uses the ugly string parameter passing. Comparing them, I think the difference in speed for the current problem is negligible. Maybe we need a more elegant function/operator?
    – rahnema1
    Nov 12 at 16:40

  • The double sum requires an intermediate array, so in principle it does more work. But yes, I'm sure it's not a measurable difference in this case. I agree with your readability comments. A new operator would be awesome. Something like sumall=@(x)sum(x(:)). But that does imply 20 or so new operators (meanall, stdall, maxall, etc.), and you can't do one and not all the others. :)
    – Cris Luengo
    Nov 12 at 16:44
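For reference, the whole-array summation variants discussed in these comments all give the same result; here is a small illustrative sketch (sumall is just the helper suggested above, not a built-in):

A = magic(4);

s1 = sum(sum(A));          % nested sums, works in any MATLAB/Octave version
s2 = sum(A(:));            % flatten first (needs a named variable in MATLAB)
s3 = sum(A, 'all');        % MATLAB R2018b or newer
sumall = @(x) sum(x(:));   % helper suggested in the comment above
s4 = sumall(A);

isequal(s1, s2, s3, s4)    % 1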



















Answer by Andras Deak (score 3), answered Nov 12 at 11:59, edited Nov 12 at 12:16

You probably can't reduce the time complexity, but you can use vectorization to get rid of the loop and make use of low-level code and caching as much as possible. Whether it's actually faster depends on your dimensions, so you need to do some timing tests to see if this is worth it:



% dummy data
n = 3;
X = rand(n);
Y = rand(n);
S = rand(n);

% vectorize
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));

% check
b - b2 % close to machine epsilon i.e. zero


What happens is that we insert a new singleton dimension in one of the arrays, ending up with an array of size [n, 1, n] against one of size [n, n], the latter being implicitly the same as [n, n, 1]. The overlapping first index corresponds to the i in your loop; the remaining two indices correspond to the matrix indices of the dyadic product you have for each i. Then we permute the indices in order to put the "i" index last, so that we can again broadcast the result with S of (implicit) size [n, n, 1]. What we then have is an array of size [n, n, n] where the first two indices are matrix indices in your original and the last one corresponds to i. We then just have to take the square and sum each term (instead of summing twice, I reshaped the array to a row and summed once).
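To make the shape bookkeeping concrete, here is a small illustrative sketch (the intermediate names A, B, D are only for exposition; implicit expansion needs MATLAB R2016b+ or Octave):

n = 3;
X = rand(n); Y = rand(n); S = rand(n);

A = reshape(X, [n, 1, n]) .* Y;    % [n,1,n] .* [n,n,1] -> [n,n,n] by implicit expansion
size(A)                            % n  n  n
B = permute(A, [3, 2, 1]);         % put the "i" index in the trailing dimension
D = (B - S).^2;                    % S is [n,n,1], broadcast along the third dimension
size(D)                            % n  n  n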



A slight variation of the above transposes S instead of the 3d array, which might be faster (again, you should time it):



b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));


In terms of performance, reshape is free (it only reinterprets data, it doesn't copy) but permute/transpose will often lead to a performance hit when data gets copied.
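A rough way to see that difference (timings are machine-dependent; this is only a sketch):

A = rand(2000);

tic; B = reshape(A, 4000, 1000); toc   % essentially free: same data, new shape
tic; C = permute(A, [2, 1]); toc       % copies the data, noticeably slower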





