What is the difference between rowsBetween and rangeBetween?


From the PySpark docs, rangeBetween:

rangeBetween(start, end)

Defines the frame boundaries, from start (inclusive) to end (inclusive).

Both start and end are relative from the current row. For example, “0” means “current row”, while “-1” means one off before the current row, and “5” means the five off after the current row.

Parameters:

  • start – boundary start, inclusive. The frame is unbounded if this is -sys.maxsize (or lower).
  • end – boundary end, inclusive. The frame is unbounded if this is sys.maxsize (or higher).

New in version 1.4.


while rowsBetween:

rowsBetween(start, end)

Defines the frame boundaries, from start (inclusive) to end (inclusive).

Both start and end are relative positions from the current row. For example, “0” means “current row”, while “-1” means the row before the current row, and “5” means the fifth row after the current row.

Parameters:

  • start – boundary start, inclusive. The frame is unbounded if this is -sys.maxsize (or lower).
  • end – boundary end, inclusive. The frame is unbounded if this is sys.maxsize (or higher).

New in version 1.4.
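Both docstrings describe the same boundary convention: offsets are inclusive, and a start at or below -sys.maxsize (or an end at or above sys.maxsize) makes the frame unbounded on that side. A minimal plain-Python sketch of that documented rule, for illustration only; the helper name normalize_bounds is hypothetical and not part of PySpark, which (from version 2.1 on) exposes Window.unboundedPreceding, Window.unboundedFollowing and Window.currentRow for these cases:

```python
import sys


# Hypothetical helper, not a PySpark function. It applies the documented rule:
# start <= -sys.maxsize means "unbounded preceding",
# end >= sys.maxsize means "unbounded following".
# None stands for "unbounded" on that side.
def normalize_bounds(start, end):
    lower = None if start <= -sys.maxsize else start
    upper = None if end >= sys.maxsize else end
    return lower, upper


print(normalize_bounds(-1, 5))               # (-1, 5): one before to five after
print(normalize_bounds(-sys.maxsize, 0))     # (None, 0): unbounded preceding to current row
print(normalize_bounds(0, sys.maxsize + 1))  # (0, None): current row to unbounded following
```

Note that this only models how the numbers are read; whether an offset counts rows or values depends on which method (rowsBetween or rangeBetween) receives it.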



For rangeBetween, how is "1 off" different from "1 row", for example?


Tags: sql apache-spark pyspark apache-spark-sql window-functions

asked Oct 14 '16 at 17:32 by Evan Zamir (edited Sep 28 '18 at 4:32 by Community)


1 Answer


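It is simple:

• ROWS BETWEEN doesn't care about the exact values. It cares only about the order of rows and takes a fixed number of preceding and following rows when computing the frame.

• RANGE BETWEEN considers the values of the ORDER BY expression when computing the frame: a row belongs to the frame if its value lies within the given offsets of the current row's value.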
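To make the distinction concrete, here is a small plain-Python model of the two rules (a sketch, not Spark's implementation). It assumes the partition is already sorted ascending by a numeric ORDER BY column with no nulls and that both offsets are bounded; the function names rows_frame and range_frame are hypothetical. The second pair of calls is exactly the "1 row" vs "1 off" case from the question.

```python
def rows_frame(values, i, start, end):
    """Indices in a ROWS frame: fixed positional offsets from row i."""
    lo = max(0, i + start)
    hi = min(len(values) - 1, i + end)
    return list(range(lo, hi + 1))


def range_frame(values, i, start, end):
    """Indices in a RANGE frame: rows whose value lies in
    [values[i] + start, values[i] + end], regardless of position."""
    low, high = values[i] + start, values[i] + end
    return [j for j, v in enumerate(values) if low <= v <= high]


values = [1, 2, 2, 5]  # sorted by the ORDER BY column; note the tie on 2

# CURRENT ROW to CURRENT ROW (start=0, end=0) for the row at index 1:
print(rows_frame(values, 1, 0, 0))    # [1]    only that row
print(range_frame(values, 1, 0, 0))   # [1, 2] every row with value 2 (its peers)

# One before to current (start=-1, end=0) for the row at index 3 (value 5):
print(rows_frame(values, 3, -1, 0))   # [2, 3] the previous row, whatever its value
print(range_frame(values, 3, -1, 0))  # [3]    no other row has a value in [4, 5]
```

The duplicate values show why the distinction matters even for CURRENT ROW: in RANGE mode, rows with equal ORDER BY values are indistinguishable, so they always enter or leave the frame together.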

Let's illustrate this with two window definitions:

          • ORDER BY x ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

          • ORDER BY x RANGE BETWEEN 2 PRECEDING AND CURRENT ROW

and the following data:

          +---+
|  x|
          +---+
          | 10|
          | 20|
          | 30|
          | 31|
          +---+


Assuming the current row is the one with value 31, the first window includes the following rows (the current one and the two preceding ones):

+---+---------------------------------------------------+
|  x|ORDER BY x ROWS BETWEEN 2 PRECEDING AND CURRENT ROW|
+---+---------------------------------------------------+
| 10|                                              false|
| 20|                                               true|
| 30|                                               true|
| 31|                                               true|
+---+---------------------------------------------------+
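The table above shows the frame for the row with value 31 only. As a plain-Python sketch (not Spark code), the following computes the frame of every row under ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, plus the sum over each frame, which is what a sum aggregate over this window should return per row. In PySpark the window would be written as Window.orderBy("x").rowsBetween(-2, Window.currentRow); the data is assumed to be a single partition already sorted by x.

```python
xs = [10, 20, 30, 31]  # already sorted by x

# ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: positions i-2 .. i (clipped at 0)
rows_frames = [xs[max(0, i - 2): i + 1] for i in range(len(xs))]
rows_sums = [sum(frame) for frame in rows_frames]

print(rows_frames)  # [[10], [10, 20], [10, 20, 30], [20, 30, 31]]
print(rows_sums)    # [10, 30, 60, 81]
```

The gap between 10 and 20 (or 20 and 30) plays no role here: each frame always holds up to three rows.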


and the second window includes the following rows (the current one and all preceding rows where x >= 31 - 2):

+---+----------------------------------------------------+
|  x|ORDER BY x RANGE BETWEEN 2 PRECEDING AND CURRENT ROW|
+---+----------------------------------------------------+
| 10|                                               false|
| 20|                                               false|
| 30|                                                true|
| 31|                                                true|
+---+----------------------------------------------------+
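The same per-row view for the RANGE window, again as a plain-Python sketch rather than Spark code: each frame holds the rows whose x lies in [current x - 2, current x]. In PySpark this window would be Window.orderBy("x").rangeBetween(-2, Window.currentRow); the sketch assumes a numeric x in a single, already sorted partition and scans the whole list for every row purely for clarity.

```python
xs = [10, 20, 30, 31]  # already sorted by x

# RANGE BETWEEN 2 PRECEDING AND CURRENT ROW: rows whose x is in [cur - 2, cur]
range_frames = [[v for v in xs if cur - 2 <= v <= cur] for cur in xs]
range_sums = [sum(frame) for frame in range_frames]

print(range_frames)  # [[10], [20], [30], [30, 31]]
print(range_sums)    # [10, 20, 30, 61]
```

Compared with the ROWS version, the sum for the row with value 31 drops from 81 to 61, because 20 is two rows back but more than 2 units away.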

answered Oct 14 '16 at 18:02 by user6910411 (edited Nov 14 '18 at 18:22)