Operating on histogram bins Python

I am trying to find the median of the values that fall within each bin range generated by the np.histogram function. How would I select only the values within a bin range and operate on those specific values? Below is an example of my data and what I am trying to do:

x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

y values can have any sort of x value associated with them, for example:

hist, bins = np.histogram(x)
hist = [129, 126, 94, 133, 179, 206, 142, 147, 90, 185] 
bins = [0., 0.09999926, 0.19999853, 0.29999779, 0.39999706,
 0.49999632, 0.59999559, 0.69999485, 0.79999412, 0.8999933,
 0.99999265]

So I am trying to find the median y value of the 129 values that fall in the first bin, and likewise for each of the other bins.

edited Nov 14 '18 at 4:43

Mad Physicist

asked Nov 14 '18 at 3:14

hlku2334

I'm having a bit of trouble believing your histogram, but I understand your point.

– Mad Physicist
Nov 14 '18 at 4:25

python numpy histogram median

3 Answers

One way is with pandas.cut():

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(444)

>>> x = np.random.randint(0, 25, size=100)
>>> _, bins = np.histogram(x)
>>> pd.Series(x).groupby(pd.cut(x, bins)).median()
(0.0, 2.4] 2.0
(2.4, 4.8] 3.0
(4.8, 7.2] 6.0
(7.2, 9.6] 8.5
(9.6, 12.0] 10.5
(12.0, 14.4] 13.0
(14.4, 16.8] 15.5
(16.8, 19.2] 18.0
(19.2, 21.6] 20.5
(21.6, 24.0] 23.0
dtype: float64

If you want to stay in NumPy, you might want to check out np.digitize().
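
As a rough illustration of the np.digitize() suggestion, here is a minimal NumPy-only sketch. The seed and the names labels and medians are assumptions made for this example, not part of the answer above. Note that pd.cut() uses right-closed intervals by default, while np.histogram uses bins that are closed on the left (with the last bin also closed on the right), so the two groupings can differ for values that sit exactly on an edge. The sketch follows the np.histogram convention.

```python
import numpy as np

rng = np.random.default_rng(444)      # arbitrary seed, for reproducibility only
x = rng.integers(0, 25, size=100)     # integers 0..24, similar to the data above
counts, bins = np.histogram(x)

# np.digitize returns 1-based labels: label i means bins[i-1] <= value < bins[i].
labels = np.digitize(x, bins)
# The maximum equals the last edge and would get label len(bins); np.histogram
# counts it in the last bin, so fold it back into that bin.
labels = np.clip(labels, 1, len(bins) - 1)

# Median per bin, keyed by (left edge, right edge); empty bins are skipped.
medians = {
    (float(bins[i - 1]), float(bins[i])): float(np.median(x[labels == i]))
    for i in range(1, len(bins))
    if np.any(labels == i)
}
```

Comparing np.bincount(labels) against counts is a quick way to confirm that the labels reproduce the histogram.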

answered Nov 14 '18 at 3:25

Brad Solomon

You can do this by slicing a sorted version of your data using the counts as indices:

import numpy as np

x = np.random.rand(1000)
hist, bins = np.histogram(x)

# if you don't mind sorting your original data, use x.sort() instead
xsorted = np.sort(x)
# start/stop indices of each bin within the sorted data
ix = [0] + hist.cumsum().tolist()
[np.median(xsorted[i:j]) for i, j in zip(ix[:-1], ix[1:])]

which will output the medians as a standard Python list.

answered Nov 14 '18 at 4:13

tel

Take a look at np.split

– Mad Physicist
Nov 14 '18 at 4:45

np.digitize and np.searchsorted will match your data with bins. The latter is preferable in this situation because it does fewer unnecessary checks (your bins can safely be assumed to be sorted).

If you look at the documentation of np.histogram (Notes section), you will notice that the bins are all half-open on the right (except the last one). This means that you can do the following:

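import numpy as np

x = np.abs(np.random.normal(loc=0.75, scale=0.75, size=10000))
h, b = np.histogram(x)
# subtract 1 for zero-based labels; clip so the maximum lands in the last (closed) bin
ind = np.clip(np.searchsorted(b, x, side='right') - 1, 0, b.size - 2)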

Now ind contains a label for each number indicating which bin it belongs to. You can compute medians:

m = [np.median(x[ind == label]) for label in range(b.size - 1)]
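
The question asks for the median of the y values grouped by the bin their x value falls into. A minimal sketch of that case, assuming x and y are equal-length arrays: the y data and the seed here are made up for illustration, and the labels are made zero-based so they line up with the histogram counts.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, illustration only
x = rng.random(1000)                      # hypothetical x values
y = 2.0 * x + rng.normal(size=x.size)     # hypothetical y values paired with x

h, b = np.histogram(x)
# Zero-based bin label for each point; the clip keeps the maximum in the last bin.
ind = np.clip(np.searchsorted(b, x, side='right') - 1, 0, b.size - 2)

# Median of the y values whose x falls in each bin (NaN for an empty bin).
y_medians = np.array([
    np.median(y[ind == label]) if h[label] else np.nan
    for label in range(b.size - 1)
])
```

Each boolean mask scans the whole array, so this costs one pass per bin. That is usually fine for a handful of bins.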

If you are able to sort the input data, your job becomes easier because you can use views instead of extracting the data for each bin using masking. np.split is a good choice in this case:

x.sort()
sections = np.split(x, np.cumsum(h[:-1]))
m = [np.median(arr) for arr in sections]
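
The same idea can be carried over to paired data, where the medians are wanted for y but the bins come from x. This is a sketch under the assumption that x and y have equal length; the data, the seed, and the names order and y_by_x are hypothetical. np.argsort is used in place of an in-place sort so that x and y keep their original order.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed, illustration only
x = rng.random(1000)                      # hypothetical x values
y = x ** 2 + rng.normal(scale=0.1, size=x.size)   # hypothetical paired y values

h, b = np.histogram(x)
order = np.argsort(x)                     # permutation that sorts x
y_by_x = y[order]                         # y rearranged to follow sorted x

# Consecutive runs of length h[0], h[1], ... belong to consecutive bins.
sections = np.split(y_by_x, np.cumsum(h[:-1]))
y_medians = [np.median(s) if s.size else np.nan for s in sections]
```

The sort is done once, after which each bin is a contiguous slice of y_by_x.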

answered Nov 14 '18 at 4:41

Mad Physicist
