Count-Min Sketch and Heavy-Hitters problem










0














I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.



An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.



My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"










share|improve this question




























    0














    I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
    For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.



    An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.



    My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"










    share|improve this question


























      0












      0








      0







      I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
      For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.



      An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.



      My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"










      share|improve this question















      I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
      For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.



      An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.



      My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"







      algorithm data-structures bloom-filter






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 13 '18 at 8:41







      user3508140

















      asked Nov 13 '18 at 8:18









      user3508140user3508140

      1488




      1488






















          1 Answer
          1






          active

          oldest

          votes


















          0














          From Wikipedia:




          In the data stream model, the frequent elements problem is to output a
          set of elements that constitute more than some fixed fraction of the
          stream. A special case is the majority problem, which is to determine
          whether or not any value constitutes a majority of the stream.



          More formally, fix some positive constant c > 1, let the length of the
          stream be m, and let fi denote the frequency of value i in the stream.
          The frequent elements problem is to output the set i .



          Some notable algorithms are:



          • Boyer–Moore majority vote algorithm

          • Karp-Papadimitriou-Shenker algorithm

          • Count-Min sketch

          • Sticky sampling

          • Lossy counting

          • Sample and Hold

          • Multi-stage Bloom filters

          • Count-sketch

          • Sketch-guided sampling

          Event detection Detecting events in data streams is often done using a
          heavy hitters algorithm as listed above: the most frequent items and
          their frequency are determined using one of these algorithms, then the
          largest increase over the previous time point is reported as trend.
          This approach can be refined by using exponentially weighted moving
          averages and variance for normalization.




          So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.






          share|improve this answer




















            Your Answer






            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "1"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53276616%2fcount-min-sketch-and-heavy-hitters-problem%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            From Wikipedia:




            In the data stream model, the frequent elements problem is to output a
            set of elements that constitute more than some fixed fraction of the
            stream. A special case is the majority problem, which is to determine
            whether or not any value constitutes a majority of the stream.



            More formally, fix some positive constant c > 1, let the length of the
            stream be m, and let fi denote the frequency of value i in the stream.
            The frequent elements problem is to output the set i .



            Some notable algorithms are:



            • Boyer–Moore majority vote algorithm

            • Karp-Papadimitriou-Shenker algorithm

            • Count-Min sketch

            • Sticky sampling

            • Lossy counting

            • Sample and Hold

            • Multi-stage Bloom filters

            • Count-sketch

            • Sketch-guided sampling

            Event detection Detecting events in data streams is often done using a
            heavy hitters algorithm as listed above: the most frequent items and
            their frequency are determined using one of these algorithms, then the
            largest increase over the previous time point is reported as trend.
            This approach can be refined by using exponentially weighted moving
            averages and variance for normalization.




            So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.






            share|improve this answer

























              0














              From Wikipedia:




              In the data stream model, the frequent elements problem is to output a
              set of elements that constitute more than some fixed fraction of the
              stream. A special case is the majority problem, which is to determine
              whether or not any value constitutes a majority of the stream.



              More formally, fix some positive constant c > 1, let the length of the
              stream be m, and let fi denote the frequency of value i in the stream.
              The frequent elements problem is to output the set i .



              Some notable algorithms are:



              • Boyer–Moore majority vote algorithm

              • Karp-Papadimitriou-Shenker algorithm

              • Count-Min sketch

              • Sticky sampling

              • Lossy counting

              • Sample and Hold

              • Multi-stage Bloom filters

              • Count-sketch

              • Sketch-guided sampling

              Event detection Detecting events in data streams is often done using a
              heavy hitters algorithm as listed above: the most frequent items and
              their frequency are determined using one of these algorithms, then the
              largest increase over the previous time point is reported as trend.
              This approach can be refined by using exponentially weighted moving
              averages and variance for normalization.




              So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.






              share|improve this answer























                0












                0








                0






                From Wikipedia:




                In the data stream model, the frequent elements problem is to output a
                set of elements that constitute more than some fixed fraction of the
                stream. A special case is the majority problem, which is to determine
                whether or not any value constitutes a majority of the stream.



                More formally, fix some positive constant c > 1, let the length of the
                stream be m, and let fi denote the frequency of value i in the stream.
                The frequent elements problem is to output the set i .



                Some notable algorithms are:



                • Boyer–Moore majority vote algorithm

                • Karp-Papadimitriou-Shenker algorithm

                • Count-Min sketch

                • Sticky sampling

                • Lossy counting

                • Sample and Hold

                • Multi-stage Bloom filters

                • Count-sketch

                • Sketch-guided sampling

                Event detection Detecting events in data streams is often done using a
                heavy hitters algorithm as listed above: the most frequent items and
                their frequency are determined using one of these algorithms, then the
                largest increase over the previous time point is reported as trend.
                This approach can be refined by using exponentially weighted moving
                averages and variance for normalization.




                So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.






                share|improve this answer












                From Wikipedia:




                In the data stream model, the frequent elements problem is to output a
                set of elements that constitute more than some fixed fraction of the
                stream. A special case is the majority problem, which is to determine
                whether or not any value constitutes a majority of the stream.



                More formally, fix some positive constant c > 1, let the length of the
                stream be m, and let fi denote the frequency of value i in the stream.
                The frequent elements problem is to output the set i .



                Some notable algorithms are:



                • Boyer–Moore majority vote algorithm

                • Karp-Papadimitriou-Shenker algorithm

                • Count-Min sketch

                • Sticky sampling

                • Lossy counting

                • Sample and Hold

                • Multi-stage Bloom filters

                • Count-sketch

                • Sketch-guided sampling

                Event detection Detecting events in data streams is often done using a
                heavy hitters algorithm as listed above: the most frequent items and
                their frequency are determined using one of these algorithms, then the
                largest increase over the previous time point is reported as trend.
                This approach can be refined by using exponentially weighted moving
                averages and variance for normalization.




                So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 13 '18 at 9:28









                Joris SchellekensJoris Schellekens

                6,03611142




                6,03611142



























                    draft saved

                    draft discarded
















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53276616%2fcount-min-sketch-and-heavy-hitters-problem%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Top Tejano songwriter Luis Silva dead of heart attack at 64

                    政党

                    天津地下鉄3号線