Count-Min Sketch and Heavy-Hitters problem
I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.
An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.
My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"
algorithm data-structures bloom-filter
add a comment |
I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.
An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.
My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"
algorithm data-structures bloom-filter
add a comment |
I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.
An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.
My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"
algorithm data-structures bloom-filter
I am reading about Count-Min Sketch data structure which gives a probabilistic answer to point and range queries, based on error probability parameter and the tolerance parameter.
For example, the question "how many times with probability of 10% did item x appear in the stream of data" could be answered by CM.
An associated problem of heavy hitters has also come up. While implementing a min heap for the HH problem, I have noticed various research papers specifying that only if the minimum count of an item in the sketch is greater than a threshold, do we insert into the heap.
My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be "with probability of 10%, which item was the second most frequent in the stream of data?"
algorithm data-structures bloom-filter
algorithm data-structures bloom-filter
edited Nov 13 '18 at 8:41
user3508140
asked Nov 13 '18 at 8:18
user3508140user3508140
1488
1488
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
From Wikipedia:
In the data stream model, the frequent elements problem is to output a
set of elements that constitute more than some fixed fraction of the
stream. A special case is the majority problem, which is to determine
whether or not any value constitutes a majority of the stream.
More formally, fix some positive constant c > 1, let the length of the
stream be m, and let fi denote the frequency of value i in the stream.
The frequent elements problem is to output the set i .
Some notable algorithms are:
- Boyer–Moore majority vote algorithm
- Karp-Papadimitriou-Shenker algorithm
- Count-Min sketch
- Sticky sampling
- Lossy counting
- Sample and Hold
- Multi-stage Bloom filters
- Count-sketch
- Sketch-guided sampling
Event detection Detecting events in data streams is often done using a
heavy hitters algorithm as listed above: the most frequent items and
their frequency are determined using one of these algorithms, then the
largest increase over the previous time point is reported as trend.
This approach can be refined by using exponentially weighted moving
averages and variance for normalization.
So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53276616%2fcount-min-sketch-and-heavy-hitters-problem%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
From Wikipedia:
In the data stream model, the frequent elements problem is to output a
set of elements that constitute more than some fixed fraction of the
stream. A special case is the majority problem, which is to determine
whether or not any value constitutes a majority of the stream.
More formally, fix some positive constant c > 1, let the length of the
stream be m, and let fi denote the frequency of value i in the stream.
The frequent elements problem is to output the set i .
Some notable algorithms are:
- Boyer–Moore majority vote algorithm
- Karp-Papadimitriou-Shenker algorithm
- Count-Min sketch
- Sticky sampling
- Lossy counting
- Sample and Hold
- Multi-stage Bloom filters
- Count-sketch
- Sketch-guided sampling
Event detection Detecting events in data streams is often done using a
heavy hitters algorithm as listed above: the most frequent items and
their frequency are determined using one of these algorithms, then the
largest increase over the previous time point is reported as trend.
This approach can be refined by using exponentially weighted moving
averages and variance for normalization.
So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.
add a comment |
From Wikipedia:
In the data stream model, the frequent elements problem is to output a
set of elements that constitute more than some fixed fraction of the
stream. A special case is the majority problem, which is to determine
whether or not any value constitutes a majority of the stream.
More formally, fix some positive constant c > 1, let the length of the
stream be m, and let fi denote the frequency of value i in the stream.
The frequent elements problem is to output the set i .
Some notable algorithms are:
- Boyer–Moore majority vote algorithm
- Karp-Papadimitriou-Shenker algorithm
- Count-Min sketch
- Sticky sampling
- Lossy counting
- Sample and Hold
- Multi-stage Bloom filters
- Count-sketch
- Sketch-guided sampling
Event detection Detecting events in data streams is often done using a
heavy hitters algorithm as listed above: the most frequent items and
their frequency are determined using one of these algorithms, then the
largest increase over the previous time point is reported as trend.
This approach can be refined by using exponentially weighted moving
averages and variance for normalization.
So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.
add a comment |
From Wikipedia:
In the data stream model, the frequent elements problem is to output a
set of elements that constitute more than some fixed fraction of the
stream. A special case is the majority problem, which is to determine
whether or not any value constitutes a majority of the stream.
More formally, fix some positive constant c > 1, let the length of the
stream be m, and let fi denote the frequency of value i in the stream.
The frequent elements problem is to output the set i .
Some notable algorithms are:
- Boyer–Moore majority vote algorithm
- Karp-Papadimitriou-Shenker algorithm
- Count-Min sketch
- Sticky sampling
- Lossy counting
- Sample and Hold
- Multi-stage Bloom filters
- Count-sketch
- Sketch-guided sampling
Event detection Detecting events in data streams is often done using a
heavy hitters algorithm as listed above: the most frequent items and
their frequency are determined using one of these algorithms, then the
largest increase over the previous time point is reported as trend.
This approach can be refined by using exponentially weighted moving
averages and variance for normalization.
So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.
From Wikipedia:
In the data stream model, the frequent elements problem is to output a
set of elements that constitute more than some fixed fraction of the
stream. A special case is the majority problem, which is to determine
whether or not any value constitutes a majority of the stream.
More formally, fix some positive constant c > 1, let the length of the
stream be m, and let fi denote the frequency of value i in the stream.
The frequent elements problem is to output the set i .
Some notable algorithms are:
- Boyer–Moore majority vote algorithm
- Karp-Papadimitriou-Shenker algorithm
- Count-Min sketch
- Sticky sampling
- Lossy counting
- Sample and Hold
- Multi-stage Bloom filters
- Count-sketch
- Sketch-guided sampling
Event detection Detecting events in data streams is often done using a
heavy hitters algorithm as listed above: the most frequent items and
their frequency are determined using one of these algorithms, then the
largest increase over the previous time point is reported as trend.
This approach can be refined by using exponentially weighted moving
averages and variance for normalization.
So, yes. CMS can be used to determine frequency (in an approximative manner), which can be used to answer the HH question.
answered Nov 13 '18 at 9:28
Joris SchellekensJoris Schellekens
6,03611142
6,03611142
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53276616%2fcount-min-sketch-and-heavy-hitters-problem%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown