# Multi-modality: Is this a good measure for it?

Discussion in 'Scientific Statistics Math' started by Kerry, Dec 17, 2011.

1. ### Kerry (Guest)

Hi,

I have 68 sample neurons for which I am plotting length distributed
across distance from the neuron origin, where distance is binned into
tenths. Looking at the shape of these plots, many of the neurons look
bimodal or trimodal to me. I want to see if anyone has feedback on the
following method for quantifying multimodality:
For each sample neuron, if I connect the length value of each bin to
the next bin with a line, I get a spatial profile I'll call the 'value
line'. I can then measure the area above and beneath the value line and
take the ratio between those areas. Beneath the line is obvious. By
above the line, I mean that wherever there are two local peaks with a
trough between them (i.e., wherever the slope goes from positive to
negative and back), I would measure the area between a straight line
connecting those peaks and the lower boundary provided by the value
line (i.e., the area of the trough). The area would be measured as a
polygon whose number of sides depends on the number of bins between the
peaks. Then I'd sum all of these local areas to get the total 'above
the value line' area and divide by the area underneath the value line
to get a continuous value for multi-modality that I can compare across
neurons. Here, larger values mean more multi-modal. I like this idea
because it doesn't assume normality. Does this method make sense? Is
there a term for it? Does anyone know of any references that use this
method (I don't care whether the subject matter is related)?
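
In case code helps, here is a rough R sketch of the ratio I have in mind, assuming equally spaced bins of unit width; the function name and the example numbers are just made up for illustration:

```r
## Rough sketch of the proposed 'trough area / area under the value line'
## ratio, assuming equally spaced bins of unit width.
trough_ratio <- function(y) {
  n <- length(y)
  ## local peaks: bins at least as large as both neighbours
  is_peak <- sapply(seq_len(n), function(i) {
    left  <- if (i == 1) -Inf else y[i - 1]
    right <- if (i == n) -Inf else y[i + 1]
    y[i] > left && y[i] >= right
  })
  peaks <- which(is_peak)

  ## area beneath the value line (trapezoid rule)
  area_below <- sum((y[-1] + y[-n]) / 2)

  ## for each pair of adjacent peaks, area between the chord joining them
  ## and the value line (the 'trough' area), wherever the chord lies above
  area_above <- 0
  if (length(peaks) >= 2) {
    for (k in seq_len(length(peaks) - 1)) {
      i <- peaks[k]; j <- peaks[k + 1]
      xs    <- i:j
      chord <- y[i] + (y[j] - y[i]) * (xs - i) / (j - i)
      gap   <- pmax(chord - y[xs], 0)
      area_above <- area_above + sum((gap[-1] + gap[-length(gap)]) / 2)
    }
  }
  area_above / area_below
}

trough_ratio(c(1, 4, 8, 5, 2, 3, 7, 9, 4, 1))   # bimodal-looking profile, ~0.4
trough_ratio(c(1, 2, 4, 7, 9, 8, 6, 4, 2, 1))   # unimodal profile, 0
```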

Thanks,
kbrownk

Kerry, Dec 17, 2011

2. ### Rich Ulrich (Guest)

We might have some ideas, but I think I speak for most of us in saying
that we need more explanation of the problem.

I can say what I am guessing, and you might work from there, if
it is not too far off.

I think you are saying that you have 68 neurons. Each neuron has
dozens or hundreds of "arms" which you are measuring. The measurement
question concerns assessing multimodality of the measures for a
single neuron.

You are measuring the lengths and "binning" the results "into tenths".
- Does that mean that you take the minimum and maximum for a neuron, and
divide the range into 10 intervals? (How many measures are there?)
- Does the overall pattern of lengths look like it is Normally
distributed around a mean, or uniformly distributed, or (perhaps)
distributed such that the log of the lengths is uniform or normal? I
mention this because it seems to me that if you are going to have
any statistical test, it will be based on randomizing numbers, and
you will need to consider an *appropriate* distribution for
randomizing.

- Is it possible or feasible or a reasonable alternative to use the
exact lengths, instead of bins? I'm wondering if the distances are
a better starting point for figuring multi-modality, since using them
would not throw out exact information. This might suggest some sort of
one-dimensional cluster analysis, and may lend itself to testing
with asymptotic results, instead of Monte Carlo for each new
neuron based on its number of arms; a sketch of what I mean follows below.
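
Something like the following is what I mean by one-dimensional clustering. This is only a sketch; it assumes the mclust package and uses made-up distance values:

```r
## One-dimensional clustering of exact distances (sketch).
## Assumes the 'mclust' package, which fits Gaussian mixtures and chooses
## the number of components by BIC; the data below are made up.
library(mclust)

set.seed(1)
distances <- c(rnorm(150, mean = 40, sd = 10),
               rnorm(120, mean = 120, sd = 15))

fit <- Mclust(distances)     # tries 1 to 9 components by default
summary(fit)                 # chosen number of components and BIC
fit$parameters$mean          # estimated component means
```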

Hope this helps.

Rich Ulrich, Dec 17, 2011

3. ### Herman Rubin (Guest)

I am assuming from what you have posted that the distribution
is continuous, and that you have discretized it.

Without making considerable assumptions about the smoothness
of the distribution, it is quite difficult to come to
conclusions about the number of modes of a distribution.
This is because there are even infinitely-modal distributions,
or worse, in every neighborhood of a distribution.

Even with those assumptions, it is difficult. Does it matter?
Also, unless the modality is essentially obvious, it takes a
large sample to get a good grasp of it.

Herman Rubin, Dec 17, 2011
4. ### Kerry (Guest)

Yes. Arms would be like branches of a tree and the origin of the
neuron would be like where the tree trunk meets the ground.
Kurtosis and skew vary widely for the total distribution, except
that they all appear platykurtic, with maximum kurtosis values around 1.5.

I don't think there is a meaningful way to extract length information
from the exact values since it is the sum of them that I'm looking
for.

Thanks,
kbrownk

Kerry, Dec 17, 2011
5. ### Kerry (Guest)

In a sense this is true: I could have used smaller bins, but in the
case of summing overall length, it isn't necessarily more
informative to do so. With small enough bins, each bin would either have
a value of 0 or the size of the bin itself (say, 0.0001). Then the
density of the values this creates provides a sense of length across
distance.

I think in my case, I can test how much 'multi-modality' changes with
bin sizes, but I need a measure for multi-modality to start with.

But in my case, I'm really just trying to understand how multi-modal
the values appear. If I were to draw a random line w/ multiple curves
in MS Paint, how multi-modal would it be? I'm trying to characterize
the line, not some underlying population(s). Perhaps multi-modal is an
incorrect term for it?

Thanks,
kbrownk

Kerry, Dec 17, 2011
6. ### Kerry (Guest)

I'm sorry, I realize I'm being annoyingly confusing because I'm having
trouble clarifying my own thoughts. If I had 2 populations of
neurons from different regions of the brain and they looked different,
I could plot, say, their total lengths for all neurons in a histogram.
Here the idea of modality is obvious and the tests I've seen for
modality would apply. In my case, I'm not sure it makes sense to
consider the profile of a single neuron to be composed of 'hidden'
populations. It is exactly as multimodal as it appears, in a sense. At
the same time, I think calling the measure modality still applies, because
some bins contain more length than others and thus are places where the
majority of the neuron exists. So I think it makes sense to
characterize multi-modality by some continuous value rather than to
test for a number of underlying populations.

kbrownk

Kerry, Dec 17, 2011
7. ### Kerry (Guest)

Yes, just like a histogram.
The way I was thinking of it was: if there are 3 peaks, then there
are at most 3 modes; however, I do see how the number could potentially be
infinite. It would be especially nice not to have to assume anything
about the underlying shape of those 3 potential modes, but it seems I'd
have to assume something. So, let's say I want to test for normal
distributions where one would expect some level of separation between
modes. It'd be nice to test against other distributions too, but at
least testing with the assumption of normal modes gets me started.
What is considered a sample here? If I'm trying to characterize single
distributions then the sample is N=1 neuron. Are the data points
within a distribution what you mean by the sample?

Thanks,
kbrownk

Kerry, Dec 18, 2011
8. ### Kerry (Guest)

If I were to use a typical mixture model method, at least to learn
about them while exploring the data I have, is an Expectation-
Maximization (EM) algorithm a good place to start? To keep it simple,
I'd like to test if there are 2 components (sub-populations) within my
overall distribution for each neuron's length across distance-from-
origin bins. As for sample size, which I assume refers to my number of
bins, I can easily increase the number of bins, perhaps up to 100.
Would that be adequate? More than 500 would be pushing it, and between
100 and 500 there would be trade-offs in data accuracy.

So, is this a good starting place to explore my data and multi-
modality analyses together?

If so, how do I interpret a log-likelihood? I can get my head around p
values, and <0.05 provides a standard for practical reasons (for better
or worse). Does anything like this exist for the log-likelihood measured
using the EM algorithm?
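
For concreteness, here is roughly what I think the mixtools call would look like; the data are made up, and I'm only assuming normalmixEM is the right entry point:

```r
## Sketch of a 2-component Gaussian mixture fit with mixtools::normalmixEM.
## 'x' stands in for one neuron's expanded distance values (made up here).
library(mixtools)

set.seed(1)
x <- c(rnorm(200, 40, 10), rnorm(150, 120, 15))

fit2 <- normalmixEM(x, k = 2)
fit2$mu        # component means
fit2$sigma     # component standard deviations
fit2$lambda    # mixing proportions
fit2$loglik    # log-likelihood of the fitted model

## The log-likelihood has no fixed cutoff like p < 0.05; it is only useful
## relative to another model fit to the same data, e.g. via
## BIC = -2*loglik + p*log(n), where p is the number of free parameters.
bic2 <- -2 * fit2$loglik + 5 * log(length(x))   # 2 means + 2 sds + 1 weight
bic2
```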

Thanks,
kbrownk

Kerry, Dec 18, 2011
9. ### Rich Ulrich (Guest)

[snip, all but the end of Kerry citing Kerry]

Well, this is something new, and leaves me puzzled again.
Are you interested in the sum of the arms, or their profile
for a neuron? - You skipped my question about how many
arms there are.
I thought that was what you were proposing, and your other
comments did not disabuse me of that. Now I really don't know
what you have in mind.

If one neuron sends its arms to two separate regions, that would
be a reason to have two additional "distances." I assume that a
neuron also is located somewhere that lets it link to nearby neurons
that have related functionality. That would create three modes, if
the two distant ones were not at the same range.

Rich Ulrich, Dec 19, 2011
10. ### Kerry (Guest)

I really appreciate your help. When I originally posted I thought I
had it all figured out as to what my issue was. I've written responses
below, but I feel like when someone is as confused as I now am, I need
to figure it out internally before wasting someone else's time and
effort, since I must be missing some very basic point that's keeping
me from providing clear, concise, and specific explanations of my
issue. Basically, I've had several people at work look at my data
(each neuron's length across distance), see that many of them look
strongly or weakly bimodal and suggest I find a way to test for
bimodality. I just cannot think of what such a test would tell me that
isn't already evident in the neuron's profile. Can I not see exactly
how bimodal they are visually? Any assumption about the distribution
below the peaks seems arbitrary. This is the question I need to figure
out. I greatly appreciate any continuing advice, but also understand
if it makes more sense for me to come back when I'm better equipped to
understand the purpose of modality testing and how it applies to my
own data.

Thanks again,
kbrownk
There are 125 to 675 arms per neuron, with a mean of 270.

Each arm can vary in length from 1 to 100 micrometers. The total length
of the neuron is of course the sum of the arm lengths, but I'm not
treating arms as the functional unit when summing length, since arms
can cross distance bins. The neurons have been digitally traced so
that each arm is divided into a number of compartments of varying
lengths. A distance bin then sums all compartments, pooled across all
arms, that fall within that bin. If the neuron were 2-dimensional, imagine
drawing a circle with a 50 micrometer radius around the neuron origin
and then another circle with a 100 micrometer radius with the same
center, and so on until the entire neuron is covered by the largest
circle. I then sum the length between each pair of circles (i.e. each bin).
Summed length across bins gives a density profile of the entire neuron
collapsed to one dimension. This measures where most of the neuron is
located distance-wise. So, I'm not sure what 'exact values' would be
here. Compartment lengths (i.e. the compartments that make up each
arm) are not related meaningfully to the data. For instance, if the
human who created the compartments made them larger in one spot and
smaller (i.e. higher resolution) in another, it doesn't tell us
anything about the neuron, just what the tracer did.
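
To make the binning concrete, here is a small sketch of how such a profile could be built from traced compartments; the compartment data and the 50-micrometer shell width are purely hypothetical:

```r
## Sketch: sum compartment lengths within concentric distance shells.
## Each compartment has a distance from the origin and a length (made up).
set.seed(1)
comp_dist   <- runif(500, 0, 300)     # distance of each compartment (um)
comp_length <- runif(500, 0.5, 5)     # length of each compartment (um)

shells  <- cut(comp_dist, breaks = seq(0, 300, by = 50), include.lowest = TRUE)
profile <- tapply(comp_length, shells, sum)
profile   # total length per 50-um shell, i.e. the density profile
```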

I'm measuring the total length across distance for one neuron at a
time. Initially I was hoping to find a test that could test if, say,
half of the neurons were bimodal, one was trimodal, and the rest were
unimodal, likely with some confidence level. I just don't know what
type of test can provide results like this. Now I'm even wondering if
such a test would be more meaningful than just characterizing the
profile, such as by measuring the peak sizes (or more exactly,
measuring the trough sizes between them). Herman Rubin wrote in his
post that there is theoretically an infinite number of modes that
could sum to the overall distribution I see. Why would assuming one be
more informative than another? What more does it tell me than I can
already tell by looking at the profile?

My confusing example was about taking just the total length as a scalar
for each neuron and then plotting the scalar lengths of all 68 neurons in
a histogram (i.e., just total neuron lengths). But please ignore that
example, since it doesn't reflect what I'm actually analyzing.

All in all, I'm saying it feels more informative to characterize the
profiles rather than prove some hypothesis about underlying
distributions. That said, they're not exclusive and I'm very
interested in multi-modality testing methods. I'm just having trouble
understanding what they'd say about the data that I can't tell already
by looking at a given neuron's overall distribution. It seems it
would be more informative to just measure how big each lump is (the
actual measure would be how big each trough is relative to the area
under the overall distribution line).

Yes, this is a very good example for neurons. It so happens that the
neuron type I work on is unique in primarily contacting only one other
neuron. But it links to this neuron about 1500 times, climbing over it
like a vine. So one could still ask if there are separate modes at
different distances along the neuron it contacts (i.e. high-density
areas of contacts). But can't I tell this just by looking at each
neuron's length-across-distance profile? I could just measure the size
of those lumps by measuring the troughs between them. What information
is missing in this profile that some test could provide?

Thanks,
kbrownk

Kerry, Dec 19, 2011
11. ### Kerry (Guest)

OK, I totally see what you mean by exact values now. If I have a length
of 30 micrometers in distance bin 1 and 100 in bin 2, I could create a
vector with 30 1s and 100 2s, etc. Is this right?
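
In code it would just be something like this (a tiny sketch, assuming length is rounded to whole micrometers per bin):

```r
## Expand binned lengths back into 'exact values': repeat each bin index
## in proportion to the length it contains (30 um in bin 1, 100 um in bin 2).
bin_lengths <- c(30, 100)
expanded <- rep(seq_along(bin_lengths), times = bin_lengths)
table(expanded)
```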

kbrownk

Kerry, Dec 19, 2011
12. ### Kerry (Guest)

And now I think I know why I'm so troubled by the idea of testing for
underlying distributions. In the typical scenario, you have only a
sample of the true population. The true population may be 2 separate
populations, and testing this in some way that uses sampled data makes
sense. In my case, I have the true population: I have the complete
neuron for each case. So this is why I keep saying that it feels to me
like the modality is exactly what is shown by the length vs. distance
plot for a given neuron. But this isn't to say there are separate
modes within that plot. It just seems like I can measure them directly
from the plot rather than try to extrapolate some arbitrary
underlying distributions. Does this make sense?

Thanks,
kbrownk

Kerry, Dec 19, 2011
13. ### Kerry (Guest)

Dear Mr. Ulrich,

I'm not sure if you're still checking my posts in this thread, but
just in case I wanted to update where I'm at with the analysis plans
and continuing issues. I have come to the conclusion that it makes
sense to test for multi-modality in my distributions of length across
distances (one sample distribution for each of my 68 neurons). The
only test I've seen used in my field to test for multi-modality is
Hartigan's Dip Test. Uni-modality was rejected for each of my 68 neuron
distributions at p < 0.04 using the Dip Test (68/68 being significant
makes me somewhat concerned; Kolmogorov-Smirnov results lead to the same
conclusion, but that is limited confirmation). As far as I can tell,
the Dip Test does not tell you what the underlying distributions are
that would best fit the data, so I wanted to follow up with some test
to find the number of modes and their parameters (mean, standard
deviation, height, etc.). Each neuron has 2 types of branches based on
their appearance, and since they are mostly separable by distance, our
hypothesis is that we will typically find 2 modes with means, heights,
and standard deviations similar to the distributions for each of the
branch types when separated.
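
For reference, the dip-test call looks roughly like this (diptest package; the data here are made up rather than one of my actual neurons):

```r
## Hartigan's dip test for unimodality via the 'diptest' package.
library(diptest)

set.seed(1)
x <- c(rnorm(300, 40, 10), rnorm(250, 120, 15))   # made-up bimodal data
dip.test(x)    # a small p-value rejects unimodality
```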

Of course, such an analysis requires that I choose an appropriate test
to find the modes and define the distribution type I think will
provide the best fit. For the distribution type, I plan to check at
least Gaussian and Gamma distributions. For the analysis, I'm looking
into the Expectation-Maximization algorithm. R has a package called
mixtools that provides the analysis, which I am still struggling to
understand. I'm hoping to check each neuron distribution for 1 to k
modes, and base the chosen number of modes on the point where adding
modes provides diminishing returns in improving the fit (based on
log-likelihood, though I'm still trying to understand how this value is
determined and what it means).
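
What I have in mind for the diminishing-returns check is something like the sketch below, again with mixtools and made-up data; I'm using BIC here as one common way to penalize the extra parameters, rather than comparing raw log-likelihoods:

```r
## Compare 1 to 4 components by BIC (sketch; 'x' is made-up data).
library(mixtools)

set.seed(1)
x <- c(rnorm(300, 40, 10), rnorm(250, 120, 15))
n <- length(x)

## k = 1 is just a single Gaussian fit by maximum likelihood
loglik1 <- sum(dnorm(x, mean(x), sd(x), log = TRUE))
bic <- c("1" = -2 * loglik1 + 2 * log(n))

for (k in 2:4) {
  fit <- normalmixEM(x, k = k, maxit = 2000)
  p   <- 3 * k - 1                    # k means, k sds, k - 1 weights
  bic <- c(bic, setNames(-2 * fit$loglik + p * log(n), k))
}
bic    # the smallest BIC suggests how many components are worth keeping
```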

I can't help but feel I'm in over my head here, but I haven't found
a simpler solution for characterizing multi-modality. Does the plan I
have laid out seem appropriate? Any advice is appreciated, as always.

Sincerely,
kbrownk

Kerry, Dec 22, 2011
14. ### Rich Ulrich (Guest)

I would say, if you have two "types" by appearance, it makes
sense to me to test whether that appearance is characterized by
different means. Ta-Da! - a systematic difference shows that
there are, indeed, differences.

Next: how smooth are the distributions for each type?

I can't say much about the mixture-fitting itself, because I never had
data that even suggested that approach, and those problems never
grabbed my curiosity.

I don't see what you are aiming at, in particular. But I have
noticed that Herman and others, over the years, have never led
me to think that these problems easily offer firm answers, especially
if you don't have prior knowledge of the number of components.

There are still regular readers of the sci.stat.* groups, and
maybe some of the others will have something to add.

Rich Ulrich, Dec 23, 2011
15. ### Kerry (Guest)

I'm not sure how you would know the means without assuming something
about the underlying distributions. So I'd have to fit the data with
separate distributions based on which combination of distributions
minimizes error. This would be relatively simple (I assume) if I were
just fitting one distribution, and I believe this is what Maximum
Likelihood Estimation is for, which is another process I'm trying to
figure out. Fitting more than one distribution, I'd guess, requires some
heuristic process, given the number of possible combinations of
distributions. My sense is that this is what Expectation-Maximization
(EM) is used for, but I'm very open to less complex methods that will
do a similar job.
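
For the single-distribution case, this is the kind of maximum-likelihood fit I mean (MASS::fitdistr; the data are made up):

```r
## Maximum-likelihood fit of a single distribution with MASS::fitdistr.
library(MASS)

set.seed(1)
x <- rgamma(300, shape = 2, rate = 0.05)   # made-up positive lengths
fitdistr(x, "gamma")     # ML estimates of shape and rate
fitdistr(x, "normal")    # ML estimates of mean and sd, for comparison
```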

By appearance, they all have at least two local peaks and some have 3
or 4. Basically, any time the slope goes from negative to positive to
negative across distance, I have a local peak. I could characterize
the data by fitting each peak with its own distribution. But if I find
that the increase in error from reducing the number of distributions is
minimal, then explanatory power increases with little cost. I believe
most of these neurons can be approximated as having only 2 modes
without much increase in cost. If so, I have a neat explanation, which is
based on the two different branch (arm) types I mentioned. Those
branch types were defined on completely different properties unrelated
to distance, so this would be an informative result. But I need a way
to measure cost here. The log-likelihood value returned by the EM
algorithm appears to serve this purpose, but I still have to try to
comprehend how it is evaluated.

What I'm after is not just how smooth they are but how much each
contributes to the separate modes found for the neuron as a whole. We
are trying to simplify the characterization of the neuron. So, if we
find that the two branch types can generally be recovered just by looking
at the distribution of the total neuron across distance and fitting it
with 2 separate (e.g., Gaussian) distributions, then future researchers
who don't have the means to separate each neuron by branch type can
estimate their locations. This is why I need to find the best fitting
distributions.

Do you know any methods besides mixture models that might accomplish
this?

Thanks,
kbrownk

Kerry, Dec 23, 2011
16. ### Rich Ulrich (Guest)

I don't know what you are trying to accomplish, but I'm getting
the impression that you are more eager to play with some E-M
algorithm than figure out what you can actually say about your
neurons.

If there are two types, visually, not related to distance, it surely
seems to me that the starting point is to characterize the two
visual types. I don't understand what keeps you from starting
there.

You have only 10 bins, and you think you see 4 modes?

I believe that you must be looking at a lot of random variation,
up and down by 20 or 30% *randomly*, in the middle of a
uniform spread. If there are 50 neurons counted in a bin, the
CI for the variation is about 30% in either direction.
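
A rough way to see that figure, treating a count of about 50 in a bin as approximately Poisson:

```r
## Back-of-the-envelope for the ~30% figure: two standard errors of a
## Poisson-like count of 50, relative to the count itself.
n_in_bin <- 50
2 * sqrt(n_in_bin) / n_in_bin   # about 0.28
```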

Ten bins sounded rather few to me, from the start, for distinguishing
even 3 modes, but it is surely too few for four. And if the distances
tend to have Gaussian distributions around whatever means may
exist, if there are true modes, then the means need to be more than
one SD apart in order for any "dip" to be expected at all.

Let me change my question there. If you separate the two types, do the
distributions for the two types, separately, look less multi-modal?
Is it as easy to distinguish "type" as to measure distances?

Rich Ulrich, Dec 23, 2011
17. ### Kerry (Guest)

What I have in mind seems so simple and yet how to get it is escaping
me. What it comes down to is that I need a way to characterize the
overall distribution because some people simply won't care about the
branch types. So forget the branch types. Here's a scenario: I say the
overall distribution (distribution = length vs path distance for a
specific individual neuron) is bimodal, and someone else is going to
want that quantified. If I go to a conference and someone asks "How
are these neurons distributed?" I want to be able to answer with the
number of modes and their parameters. You mention that seeing 4 modes
within 10 bins seems suspect. I agree. I need a way to quantify which
modes are non-negligible, in a sense, and which can be ignored. The more
modes I assume exist, the better fit I will get, since I can fit each
mode with its own distribution. But some extra modes may be so
negligible that the extra distribution doesn't change the goodness of
fit by much. This seems like a good way to choose the number of modes.

I have played and can continue to play with bin sizes. Ultimately I will
test whatever measures I use against a range of bin sizes. But solving
binning issues leaves the question of how to quantify modes unanswered.
Quantifying bi-modality must be fraught with issues, since I
can't find a common algorithm for doing so. I'd like to say something
like: "If the reader is looking for an easy way to capture what the
distribution is, one can imagine two Gaussian distributions with
different means and standard deviations, where the 1st Gaussian
captures X amount of the data while the 2nd Gaussian captures 1-X
amount. These parameters provide a better fit than any other
Gaussian parameters. We also tried Gamma distributions, but the fit was
not as good. This is determined by looking at the goodness-of-fit
value Z." The issue is that I don't have a way to test the required
number of possible distribution combinations, nor do I have a way to
quantify 'goodness of fit'.

If later on I find that branch types relate to the modes, then that's
a bonus.

That, I think, summarizes the central issue.

No, I'd actually love to avoid the E-M algorithm or anything else
having to do w/ likelihood. Least sum of squares seems more obvious at
least to me for quantifying distribution fits. I'm trying to simplify
the story as much as possible. I started with plots of length across
distance. These plots have bumps and dips, with some being less severe
than others. My hypothesis is that in most cases, only 2 of these
bumps are severe, but 2 severe modes means that it would be
inappropriate to treat the data as unimodal. I need a way to quantify
all this. My starting assumption was that I'd need to assume some
distribution type (Gaussian, etc.), assume some number of modal
distributions to fit (1 for unimodal, 2 for bimodal), and optimize the
parameters of those distributions to minimize the error between the
predicted values and the observations. So I'm actually looking for the
simplest method that can do that optimization and return some error
value. I'm absolutely ready to scrap the notion that E-M is the
simplest method for this, especially since I don't understand it anyway.

I deleted an explanation I wrote as to why I don't think this is
appropriate b/c it was detail heavy and I don't want to burden you
with an explanation which has to do with the field and the subject
matter more than anything statistical. I can elaborate if you want,
but it may be easier to just forget the branch types. Simply put, I
need to start w/ a simple overall description of the neuron, and then I
can work my way down to details like the different branch types. I need a
way to explain how the entire neuron distributes its length across
distance. I see the possibility of modes and I want to quantify that.

I think 'mode' here is subjective. At most, there are as many modes
as there are peaks, regardless of how severe. I'm looking for a way to
quantify how many modes can be ignored when fitting the data w/ the
best distribution or combination of sub-distributions. My hypothesis is
that there will be a huge difference in residual error b/t fitting w/ one
distribution vs. fitting with 2, and very little decrease in
error when fitting with more than 2 distributions. If this turns out
to be the case, it would be a nice additional find that the 2 branch-type
fits have similar parameters to the 2 modes found in the overall
distribution, but that is a bonus story.

I'm open to other ways to tell the story, but this approach makes a
lot of sense to me. I think it's the complexity of finding a way to
fit the data by minimizing the error that is making this confusing. I
know exactly what I'm looking for. If I had enough time, I would
manually try out a million different distribution combinations to
reduce residual error in the overall observed distribution. This is
impractical, so I need an automated way to do this. Nonlinear
regression works for fitting one distribution. Is there a name for a
method that can try and fit two or more additive sub-distributions
together, rather than assuming a single distribution of some type best
fits the data? That's what I need!
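
Just to make the target concrete, here is a sketch of the kind of fit I mean, using base R's optim to minimize the residual sum of squares for a weighted sum of two Gaussian curves; all the numbers, starting values, and bounds are hypothetical:

```r
## Least-squares fit of a two-Gaussian mixture to a binned profile (sketch).
y <- c(5, 20, 45, 30, 10, 8, 25, 40, 22, 6)   # made-up length per bin
x <- seq_along(y)                             # bin midpoints, in bin units

two_gauss <- function(p, x) {
  ## p = (mean1, sd1, mean2, sd2, mixing weight, total scale)
  p[6] * (p[5] * dnorm(x, p[1], p[2]) + (1 - p[5]) * dnorm(x, p[3], p[4]))
}
sse <- function(p) sum((y - two_gauss(p, x))^2)

start <- c(3, 1, 8, 1, 0.5, sum(y))
fit <- optim(start, sse, method = "L-BFGS-B",
             lower = c(1, 0.1, 1, 0.1, 0.01, 1),
             upper = c(10, 10, 10, 10, 0.99, 10 * sum(y)))
fit$par      # fitted means, sds, mixing proportion, overall scale
fit$value    # residual sum of squares
```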

Recall that I only have one neuron per distribution. So if I only had
data from 1 neuron, I would still have the same length vs. distance
distribution for it. I have 68 neurons, so I have 68 distributions
that have nothing to do with each other. I am not pooling their values
together; the reasons for not doing so are complicated.

Yes, but how much less multimodal is left unquantified until I have a
method to fit the overall distribution.

There is overlap, but the mean distance b/t the two branch types is
significantly different.

Thanks,
kbrownk

Kerry, Dec 23, 2011
18. ### Kerry (Guest)

An update on my final decisions: I ended up skipping any fancy
algorithms and am just using Excel Solver to reduce the least squares for
two mixed Gaussians (i.e. Solver manipulates two means, two std devs,
and a mixing parameter that sets the proportion of the 2 Gaussian
functions in the whole). I then convert the least squares to a
nonlinear R-squared value to give a sense of fit quality. I also
checked other distribution types to see if they fit better, but they
did not. I then separate the 2 branch types I have and fit them each
with a Gaussian. Then I compare the mean and std dev of each branch-type
Gaussian to the 2 modes created in the mixed function. Each branch-type
distribution is very similar to one of the 2 respective modes in
the mixed distribution, suggesting they are capturing the same thing.
So this is my story. I am comfortable with this result because it
emphasizes using one of the 'standard' distribution types as a tool to
bring out some non-obvious findings.
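
For anyone curious, the conversion from least squares to a nonlinear R-squared is just the usual 1 - SS_res/SS_tot; a sketch with made-up observed and fitted bin values:

```r
## Nonlinear R^2 from a least-squares fit (sketch with made-up numbers).
obs <- c(5, 20, 45, 30, 10, 8, 25, 40, 22, 6)   # binned length values
fit <- c(6, 22, 42, 31, 12, 9, 23, 41, 20, 7)   # values predicted by the fit

ss_res <- sum((obs - fit)^2)
ss_tot <- sum((obs - mean(obs))^2)
1 - ss_res / ss_tot
```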

Thanks for bearing with me through my mess of logic; I learned a lot!
kbrownk

Kerry, Dec 31, 2011