Histograms and kernel density estimation KDE 2
You can download this whole post as a Jupyter notebook here
Why histograms¶
As we all know, histograms are an extremely common way to make sense of discrete data. Whether we mean to or not, when we're using histograms, we're usually doing some form of density estimation. That is, although we only have a few discrete data points, we'd really like to pretend that we have some sort of continuous distribution, and we'd really like to know what that distribution is. For instance, I was recently grading an exam and trying to figure out what the underlying distribution of grades looked like, whether I should curve the exam, and, if so, how I should curve it.
I'll poke at this in an IPython Notebook; if you're doing this in a different environment, you may wish to uncomment the commented-out import lines so that your namespace is properly polluted.
from __future__ import division
%pylab inline
grades = array((93.5,93,60.8,94.5,82,87.5,91.5,99.5,86,93.5,92.5,78,76,69,94.5,89.5,92.8,78,65.5,98,98.5,92.3,95.5,76,91,95,61.4,96,90))
junk = hist(grades)
Why not histograms?¶
We can play around with the number of bins, but it's not totally clear what's going on with the left half of the grades.
junk = hist(grades,5)
junk = hist(grades,15)
So, maybe the histogram isn't the perfect tool for the job at hand. In fact, there are quite a few well-known problems with histograms. Shodor has a really nice histogram activity that lets you play around with data interactively. Rather than using Java or JavaScript directly, Jake Vanderplas has a great package called JSAnimation that lets us animate things directly in IPython Notebooks. I'll cheat a bit: since all I really need for this is a single slider, I can use JSAnimation to let us interact with data very similarly to the Shodor pages.
from JSAnimation.IPython_display import display_animation, anim_to_html
Before we start, I'll load in a few data sets. If you're interested, you can rerun this notebook with a different data set to see how it affects things. data_shodor is the "My Data" set from Shodor's histogram activity page, data_sat is the average SAT Math data from the same page, data_tarn is from Tarn Duong's fantastic KDE explanation (we'll get there), and simple_data is just a very simple data set.
data_tarn = array((2.1,2.2,2.3,2.25,2.4,2.61,2.62,3.3,3.4,3.41,3.6,3.8))
data_shodor = array((49,49,45,45,41,38,38,38,40,37,37,34,35,36,35,38,38,32,32,32,37,31,32,31,32,30,30,32,30,30,29,28,29,29,29,30,28,27,29,30,28,27,28,27,27,29,29,29,26,27,25,25,25,25,25,25,25,26,26,27))
data_sat = array((490,499,459,575,575,513,382,525,510,542,368,564,509,530,485,521,495,526,474,500,441,750,495,476,456,440,547,479,501,476,457,444,444,467,482,449,464,501,670,740,590,700,590,450,452,468,472,447,520,506,570,474,532,472,585,466,549,736,654,585,574,621,542,616,547,554,514,592,531,550,507,441,551,450,548,589,549,485,480,545,451,448,487,480,540,470,529,445,460,457,560,495,480,430,644,489,506,660,444,551,583,457,440,470,486,413,470,408,440,596,442,544,528,559,505,450,477,557,446,553,370,533,496,513,403,496,543,533,471,404,439,459,393,470,650,512,445,446,474,449,529,538,450,570,499,375,515,445,571,442,492,456,428,536,515,450,537,490,446,410,526,560,560,540,502,642,590,480,557,468,524,445,479))
simple_data = array((0,5,10))
data = grades
Two of the main problems with histograms are that (1) you need to define a bin size and (2) you need to decide where the left edge of the first bin is.
Histogram bin size¶
Let's look at the effects of bin size on histograms.
Caveat: the code below is certainly not optimized. Ditto for all of the code in this notebook. I wrote it quickly, while I was still learning what FuncAnimation does. In order to make this read more easily, I've included most of the code at the end. If you're running this interactively, run the cell at the end now!
Let's start with getHistBinNumAni. What does that do? Given a data set, it'll give us an interactive plot. By dragging the slider around, we can make a histogram with anywhere from 1 bin to some max number of bins (default: one fewer than the number of data points, capped at 100). No matter how many bins we have, the actual data is shown in blue dots near the bottom. Here's what it looks like for the grades:
ani = getHistBinNumAni(data)
display_animation(ani, default_mode='once')
So, obviously, choosing the number of bins makes a huge difference in how we'd interpret the data.
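If you'd rather see numbers than drag a slider, here's a minimal sketch that bins the grades at a few different bin counts and prints the raw counts. It only assumes NumPy; the helper name counts_for_bins is mine and isn't part of the notebook's code.

```python
import numpy as np

grades = np.array([93.5, 93, 60.8, 94.5, 82, 87.5, 91.5, 99.5, 86, 93.5,
                   92.5, 78, 76, 69, 94.5, 89.5, 92.8, 78, 65.5, 98, 98.5,
                   92.3, 95.5, 76, 91, 95, 61.4, 96, 90])

def counts_for_bins(data, nbins):
    """Hypothetical helper: counts and edges for nbins equal-width bins."""
    counts, edges = np.histogram(data, bins=nbins)
    return counts, edges

# Same data, three different stories depending on the bin count.
for nbins in (5, 10, 15):
    counts, edges = counts_for_bins(grades, nbins)
    print(nbins, "bins:", counts.tolist())
```

Every row sums to the same 29 grades, but the shape you'd read off the counts changes quite a bit as the bins shrink.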
Where do the histogram bins start?¶
One of the other big problems with histograms, especially for relatively small data sets, is that you have to choose where the left edge of the first bin goes. Do you center the bin around the first group of points? Do you make the left edge match up with the left-most data point? Let's make some plots to see how that can affect things, because it's a bit easier to understand what I'm going on about that way. We'll make a similar animation with getHistBinOffsetAni. As with the previous animation, drag the slider around. This time, we have the same number of bins, but the slider drags around the data relative to the bins (or vice versa, depending on how you think of it).
ani = getHistBinOffsetAni(data)
display_animation(ani, default_mode='once')
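The same offset effect shows up in plain numbers. Below is a rough sketch, assuming only NumPy, that keeps the bin width fixed and slides the bin edges by a fraction of that width. The function shifted_counts and its width and offset parameters are my own illustration, not something from the notebook's helper code.

```python
import numpy as np

def shifted_counts(data, width, offset):
    """Hypothetical helper: counts for fixed-width bins whose edges are
    shifted right by offset * width, with offset in [0, 1)."""
    data = np.asarray(data, dtype=float)
    # First edge sits at or to the left of the smallest data point.
    start = data.min() - width + offset * width
    # Enough bins that the last edge lies beyond the largest data point.
    nbins = int(np.ceil((data.max() - start) / width)) + 1
    edges = start + width * np.arange(nbins + 1)
    counts, _ = np.histogram(data, bins=edges)
    return counts, edges

data_tarn = np.array([2.1, 2.2, 2.3, 2.25, 2.4, 2.61, 2.62,
                      3.3, 3.4, 3.41, 3.6, 3.8])

# Same data, same bin width, different left edges.
for offset in (0.0, 0.25, 0.5, 0.75):
    counts, edges = shifted_counts(data_tarn, width=0.5, offset=offset)
    print(offset, counts.tolist())
```

Nothing about the data changes between rows; only the placement of the bin edges does.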
KDE (Kernel Density Estimation) to the rescue!¶
Kernel density estimation is my favorite alternative to histograms. Tarn Duong has a fantastic KDE explanation, which is well worth reading. The basic idea is that, if you're looking at our simple dataset (simple_data = array((0,5,10))), you might choose to represent each point as a rectangle:
bar(simple_data,ones_like(simple_data)*0.5,width=0.5,facecolor='grey',alpha=0.5)
junk = ylim(0, 2.0)
Not so interesting so far, but what do we do when the rectangles get wide enough that they start to overlap? Instead of just letting them run over each other like
bar(simple_data,ones_like(simple_data)*0.5,width=6,facecolor='grey',alpha=0.5)
junk = ylim(0, 2.0)
and instead of coloring the overlap regions darker grey, we add the rectangles together. So, since each of the rectangles has height 0.5 in the above example, the dark grey regions should really have height 1.0. This idea is called "kernel density estimation" (KDE), and the rectangle that we're using is called the "kernel". If we wanted to draw a different shape at each point, we'd do so by specifying a different kernel (perhaps a bell curve, or a triangle).
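Here's a small sketch of that "add the rectangles together" step, assuming only NumPy. The function rect_kde is my own name for it, and I've assumed each rectangle is centered on its data point (depending on your matplotlib version, bar may instead draw from the left edge). The height of 0.5 matches the bars above, so this curve is not normalized to unit area.

```python
import numpy as np

def rect_kde(data, x, width, height=0.5):
    """Hypothetical helper: sum one rectangle per data point on the grid x.
    Each rectangle is centered on its point (an assumption of this sketch)."""
    data = np.asarray(data, dtype=float)
    # inside[i, j] is True when grid point x[j] is within width/2 of data[i].
    inside = np.abs(x[None, :] - data[:, None]) <= width / 2.0
    return height * inside.sum(axis=0)

simple_data = np.array([0, 5, 10])
x = np.linspace(-5, 15, 2001)
y = rect_kde(simple_data, x, width=6)

# Where two rectangles overlap, the heights add instead of just shading darker.
print("max height:", y.max())
```

Swapping in a different kernel just means replacing the boolean "inside" test with some other bump function evaluated at the same distances.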
KDE, rectangular kernel¶
Now let's try KDE with a rectangular kernel. This time, using getKdeRectAni, you get a slider that controls the width of the kernel.
ani = getKdeRectAni(simple_data)
display_animation(ani, default_mode='once')
Play with the slider, and note what happens when you make it big enough that the rectangles start to overlap. By tuning the width of the rectangles, we can tune how finely or coarsely we're looking at the data. It's not so powerful with three data points, but check it out with the grades from above:
ani = getKdeRectAni(data)
display_animation(ani, default_mode='once')
In my view, there's a sweet spot right around 1/8 or 1/9 of the way across the slider where there are three distinct peaks. It looks very much like a trimodal distribution to me. So far, this isn't totally automatic; we have to pick the width of our kernel, but it's obvious that KDE can give us a much better view of the underlying data than histograms!
Next, let's do the same thing with a Gaussian kernel, using getKdeGaussianAni. Again, the slider controls kernel width.
ani = getKdeGaussianAni(data)
display_animation(ani, default_mode='once')
This gives us a really nice picture of the data. Play around with the slider and see what you think.
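For a non-interactive version of the same idea, here's a minimal Gaussian KDE sketch, assuming only NumPy. It averages unit-area Gaussians directly rather than stamping a sampled kernel onto a grid the way getKdeGaussianAni does, so treat it as an illustration and not a drop-in replacement. The names gaussian_kde_1d and count_peaks are mine.

```python
import numpy as np

def gaussian_kde_1d(data, x, sigma):
    """Hypothetical helper: average of unit-area Gaussians (std sigma),
    one centered on each data point, evaluated on the grid x."""
    data = np.asarray(data, dtype=float)
    z = (x[None, :] - data[:, None]) / sigma
    bumps = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))
    return bumps.mean(axis=0)

def count_peaks(y):
    """Count strict interior local maxima of a sampled curve."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

simple_data = np.array([0, 5, 10])
x = np.linspace(-40, 50, 9001)

# A narrow kernel keeps every point separate; a wide one blurs them together.
for sigma in (1.0, 10.0):
    y = gaussian_kde_1d(simple_data, x, sigma)
    area = y.sum() * (x[1] - x[0])
    print("sigma", sigma, "peaks", count_peaks(y), "area", round(area, 3))
```

Counting peaks like this is a crude way to put a number on what the slider shows: the kernel width decides how many bumps you see.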
Kernel width¶
Not only does KDE give us a better picture than histograms, but there turn out to be actual answers to the question of "how wide should my kernel be?" You can see, for instance, that making the kernel too narrow doesn't provide much more information than the raw data, while making it too large oversmooths the data, making it mostly look like a single kernel with some bits on the sides.
Daniel Smith has a really nice KDE module that chooses an optimal bandwidth and can be used with SciPy (SciPy does have its own KDE module, but I've found Daniel's to be quite robust).
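To give a flavor of what an automatic answer can look like, here's a sketch of Silverman's rule of thumb for a Gaussian kernel, assuming only NumPy. To be clear, this is a swapped-in textbook rule, not Daniel Smith's module and not necessarily what that module computes; the function name silverman_bandwidth is mine. The rule assumes roughly bell-shaped data, so it tends to oversmooth something as lumpy as the grades.

```python
import numpy as np

def silverman_bandwidth(data):
    """Hypothetical helper implementing Silverman's rule of thumb:
    h = 0.9 * min(std, IQR / 1.34) * n ** (-1/5)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    std = data.std(ddof=1)                    # sample standard deviation
    q75, q25 = np.percentile(data, [75, 25])  # interquartile range endpoints
    spread = min(std, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-0.2)

grades = np.array([93.5, 93, 60.8, 94.5, 82, 87.5, 91.5, 99.5, 86, 93.5,
                   92.5, 78, 76, 69, 94.5, 89.5, 92.8, 78, 65.5, 98, 98.5,
                   92.3, 95.5, 76, 91, 95, 61.4, 96, 90])

print("rule-of-thumb bandwidth for the grades:", silverman_bandwidth(grades))
```

The result is a standard deviation for the Gaussian kernel, in the same units as the data, so you can compare it against wherever your own sweet spot on the slider was.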
Other data sets¶
I highly recommend just playing around with other data sets using the above code. I was interested in playing around with income data, so I show how to grab that data from the IRS website below and play around a bit without comment. Enjoy!
Income data¶
Let's grab the income data from the IRS and make some plots.
from urllib.request import urlopen # Python 3; on Python 2 this was urllib.urlopen
f = urlopen("http://www.irs.gov/file_source/pub/irs-soi/09incicsv.csv")
#"State_Code","County_Code","State_Abbrv","County_Name", "Return_Num","Exmpt_Num","AGI","Wages_Salaries","Dividends","Interest"
irs2009 = loadtxt(f,delimiter=',',skiprows=1,usecols=(4,5,6,7,8,9))
agi2009 = irs2009[:,2]
Now try things like
ani = getHistBinNumAni(agi2009)
display_animation(ani, default_mode='once')
Whoops, that's hard to make sense of. Let's use logs.
la2009 = log(agi2009)
la2009 = la2009[isfinite(la2009)] # drop nan (negative AGI) and -inf (zero AGI)
ani = getHistBinNumAni(la2009)
display_animation(ani, default_mode='once')
ani = getKdeRectAni(la2009)
display_animation(ani, default_mode='once')
ani = getKdeGaussianAni(la2009)
display_animation(ani,default_mode='once')
In order to make this read more easily, I've put the bulk of the code below. You'll have to run it before the previous cells.
#!/usr/bin/env python
from numpy import histogram as nphistogram
#from numpy import array, linspace, zeros, ones, ones_like
#import numpy as np
#import matplotlib.pyplot as plt
#from matplotlib.pyplot import figure, hist, plot, ion, axes, title
from JSAnimation.IPython_display import display_animation, anim_to_html
from matplotlib import animation as animation
def getHistBinNumAni(data,totalframes=None,showpts=True):
    #ion()
    if totalframes is None:
        totalframes = min(len(data)-1,100)
    fig = figure()
    ax = fig.gca()
    # Draw invisible bars once; animate() reshapes and reveals them.
    n, bins, patches = hist(data, totalframes, density=True, facecolor='green', alpha=0.0)
    if showpts:
        junk = plot(data,0.2*ones_like(data),'bo')
    def animate(i):
        # Frame i shows a histogram with i+1 bins.
        n, bins = nphistogram(data, i+1)
        #print n
        ax.set_ylim(0,1.1*n.max())
        for j in range(len(n)):
            rect,h = patches[j],n[j]
            #print h.max()
            x = bins[j]
            w = bins[j+1] - x
            rect.set_height(h)
            rect.set_x(x)
            rect.set_width(w)
            rect.set_alpha(0.75)
        #fig.canvas.draw()
    ani = animation.FuncAnimation(fig, animate, totalframes, repeat=False)
    return ani
def getHistBinOffsetAni(data,nbins=20,showpts=True):
    offsets = linspace(-0.5,0.5,50)
    totalframes = len(offsets)
    fig = figure()
    ax = fig.gca()
    n, _bins, patches = hist(data, nbins, density=True, facecolor='green', alpha=0.0)
    if showpts:
        junk = plot(data,0.2*ones_like(data),'bo')
    dx = (data.max() - data.min())/nbins
    def getBins(i):
        # bins go from min - dx to max + dx, then offset.
        # nbins + 1 edges give nbins bins, one per patch.
        return linspace(data.min() - dx + offsets[i]*dx, data.max()+dx + offsets[i]*dx,nbins+1)
    # Obnoxious: find max number in a bin ever
    nmax = 1
    for i in range(totalframes):
        n, bins = nphistogram(data, bins=getBins(i))
        nmax = max(nmax,n.max())
    def animate(i):
        n, bins = nphistogram(data, bins=getBins(i))
        ax.set_ylim(0,1.1*nmax)
        #ax.set_xlim(data.min()-dx,data.max()+dx)
        binwidth = bins[1] - bins[0]
        ax.set_xlim(bins[0]-binwidth,bins[-1] + binwidth)
        for j in range(len(n)):
            #continue
            rect,h = patches[j],n[j]
            #print h.max()
            x = bins[j]
            w = bins[j+1] - x
            rect.set_height(h)
            rect.set_x(x)
            rect.set_width(w)
            rect.set_alpha(0.75)
        fig.canvas.draw()
    ani = animation.FuncAnimation(fig, animate, totalframes, repeat=False)
    return ani
#!/usr/bin/env python
from numpy import sqrt, pi, exp
def getKdeGaussianAni(data,totalframes=100, showpts=True):
    fig = figure()
    # Let's say 10000 points for the whole thing
    width = data.max() - data.min()
    left, right = data.min(), data.min() + (width)
    left, right = left - (totalframes/100)*width, right + (totalframes/100)*width
    ax = axes(xlim=(left,right),ylim=(-0.1,2))
    line, = ax.plot([], [], lw=2)
    if showpts:
        junk = plot(data,ones_like(data)*0.1,'go')
    numpts = 10000
    x = linspace(left,right,numpts)
    dx = (right-left)/(numpts-1)
    def init():
        line.set_data([], [])
        return line,
    def gaussian(x,sigma,mu):
        # Why isn't this defined somewhere?! It must be!
        return (1/sqrt(2*pi*sigma**2)) * exp(-((x-mu)**2)/(2*sigma**2))
    def animate(i):
        y = zeros(numpts)
        # Kernel width grows linearly with the frame number.
        kernelwidth = .02*width*(i+1)
        kernelpts = int(kernelwidth/dx)
        # Sample a unit Gaussian from -3 to 3 sigma across the kernel window.
        kernel = gaussian(linspace(-3,3,kernelpts),1,0)
        #kernel = ones(kernelpts)
        for d in data:
            # Stamp one kernel onto the grid, centered on this data point.
            center = d - left
            centerpts = int(center/dx)
            bottom = centerpts - int(kernelpts/2)
            top = centerpts+int(kernelpts/2)
            if top - bottom < kernelpts: top = top + 1
            if top - bottom > kernelpts: top = top - 1
            y[bottom:top] += kernel
        ax.set_xlim(x[where(y>0)[0][0]],x[where(y>0)[0][-1]])
        line.set_data(x,y)
        ax.set_ylim(min(0,y.min()),1.1*y.max())
        #title('ymin %s ymax %s'%(y.min(),y.max()))
        #sleep(0.1)
        return line,
    ani = animation.FuncAnimation(fig, animate, init_func=init,
                                  frames=totalframes, repeat=False)
    return ani
#FACTOR ME for rect and gaussian
def getKdeRectAni(data,totalframes=100,showpts=True):
    #ion()
    fig = figure()
    # Let's say 10000 points for the whole thing
    width = data.max() - data.min()
    left, right = data.min(), data.min() + (width)
    left, right = left - (totalframes/100)*width, right + (totalframes/100)*width
    ax = axes(xlim=(left,right),ylim=(-0.1,2))
    line, = ax.plot([], [], lw=2)
    numpts = 10000
    x = linspace(left,right,numpts)
    dx = (right-left)/(numpts-1)
    def init():
        line.set_data([], [])
        return line,
    if showpts:
        junk = plot(data,0.2*ones_like(data),'bo')
    def animate(i):
        y = zeros(numpts)
        # Kernel width grows linearly with the frame number.
        kernelwidth = .02*width*(i+1)
        kernelpts = int(kernelwidth/dx)
        kernel = ones(kernelpts)
        for d in data:
            # Stamp one rectangle onto the grid, centered on this data point.
            center = d - left
            centerpts = int(center/dx)
            bottom = centerpts - int(kernelpts/2)
            top = centerpts+int(kernelpts/2)
            if top - bottom < kernelpts: top = top + 1
            if top - bottom > kernelpts: top = top - 1
            y[bottom:top] += kernel
        line.set_data(x,y)
        ax.set_ylim(0,1.1*y.max())
        ax.set_xlim(x[where(y>0)[0][0]],x[where(y>0)[0][-1]])
        #sleep(0.1)
        return line,
    ani = animation.FuncAnimation(fig, animate, init_func=init,
                                  frames=totalframes, repeat=False)
    return ani
And that's it. Cheers!
Comments from old blog¶
13 Responses to Histograms and Kernel Density Estimation (KDE)
Arindam Paul says (December 14, 2013 at 10:04 pm):
Excellent !! One of the best explanations of KDE I have ever seen.
This post has generated enough interest to read your other blogs. great job.
Nils Wagner says (February 19, 2014 at 8:45 am):
Assume that we have a spatial energy distribution given at discrete points in 3-D, i.e.
E_i(x_i,y_i,z_i)
where E_i denotes the energy and x_i,y_i,z_i are the corresponding coordinates.
Is it possible to extract the local hot spots using scipy ?
A small example is appreciated.
Thanks in advance.
domain says (October 12, 2014 at 9:36 pm):
It's really a cool and helpful piece of info. I'm happy that you simply shared this helpful information with us.
Please stay us up to date like this. Thanks for sharing.
Andreas says (February 1, 2015 at 9:13 am):
Thanks for sharing your knowledge and interpretation of kernel density estimation with us. Very enlighting.
gmas says (February 17, 2015 at 8:23 am):
If I try to run your notebook, I get this name error:
NameError Traceback (most recent call last)
in ()
----> 1 ani = getHistBinNumAni(data)
2 display_animation(ani, default_mode='once')
NameError: name 'getHistBinNumAni' is not defined
gmas says, in reply (February 17, 2015 at 8:25 am):
Ops.. I have just read the last part that asks to run the code before the other cells! Maybe you can add a note at the begin of the post..
X says (June 23, 2015 at 4:54 pm):
Is there a way to fit data to an exponential distribution such that it maximizes the entropy H(p_i) = - sum p_i*log(p_i) where p_i is the probability of a given event?
mglerner says, in reply (June 25, 2015 at 4:00 pm):
I don't know, but I've been wondering about similar things for a while. If I do learn the answer, I'll update.
Pingback: Histogram to PDF (https://www.physicsforums.com/threads/histogram-to-pdf.835833/#post-5247985)
Pingback: Histogram to PDF (https://www.physicsforums.com/threads/histogram-to-pdf.835833/#post-5252532)
Koushik Khan says (October 14, 2015 at 3:09 am):
Awesome presentation !
Ben says (July 13, 2016 at 7:09 am):
Fantastic explanation!
Best KDE description I've found so far!
Keep up the good work!
Milissa Washam says (December 11, 2016 at 12:20 am):
Just wanted to say this website is extremely good. I always want to hear new things about this because I’ve the similar blog during my Country with this subject which means this help´s me a lot. I did so a search around the issue and located a large amount of blogs but nothing beats this. Many thanks for sharing so much inside your blog..