Multi-modality: Is this a good measure for it?

Discussion in 'Scientific Statistics Math' started by Kerry, Dec 17, 2011.

  1. Kerry

    Kerry Guest


    I have 68 samples of neurons where I am plotting their length
    distributed across distance from the neuron origin, where distance is
    binned into tenths. Looking at the shape of these plots, many of the
    neurons look bimodal or trimodal to me. I want to see if anyone has
    some feedback for the following method for quantifying multimodality:
    For each sample neuron, if I connect the length value for each bin to
    the next bin with a line I get a spatial profile I'll call the 'value
    line'. So I can measure the area above and beneath the value line and
    take the ratio between those areas. Beneath the line is obvious. By
    above the line, I mean wherever there are 2 local peaks with a trough
    between them (i.e. whenever the slope goes from positive to
    negative), I would measure between a line connecting those peaks and
    the lower boundary provided by the value line (i.e. the area of
    trough). The area would be measured here as a polygon where the number
    of sides depends on the number of bins between the peaks. Then I'd sum
    all of these local areas to get the total 'above the value line' area
    and divide by the area underneath the value line to get a continuous
    value for multi-modality that I can compare across neurons. Here,
    larger values mean more multi-modal. I like this idea because it
    doesn't assume normality. Does this method make sense? Is there a term
    used for it? Does anyone know of any references that use this method (I
    don't care if the subject matter is related)?
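    For concreteness, the proposed measure can be prototyped in a few
    lines of Python/NumPy. This is only a sketch under stated assumptions:
    unit-width bins, and a 'peak' defined as any positive-to-negative
    slope change. The function names are made up.

```python
import numpy as np

def _trapezoid(y):
    # area under a piecewise-linear curve sampled at unit spacing
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[:-1] + y[1:]) / 2.0))

def multimodality_ratio(y):
    """Proposed measure: total trough area between consecutive local
    peaks, divided by the area under the 'value line'."""
    y = np.asarray(y, dtype=float)
    # a local peak is where the slope goes from positive to negative
    peaks = [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]
    trough = 0.0
    for a, b in zip(peaks, peaks[1:]):
        # straight chord connecting the two peak tops
        chord = np.interp(np.arange(a, b + 1), [a, b], [y[a], y[b]])
        # polygon between chord and value line, clipped where they cross
        trough += _trapezoid(np.clip(chord - y[a:b + 1], 0.0, None))
    under = _trapezoid(y)
    return trough / under if under > 0 else 0.0
```

    A profile with two equal peaks and a deep trough scores high; a
    single-humped profile scores exactly zero, since there is no peak pair.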

    Kerry, Dec 17, 2011

  2. Rich Ulrich

    Rich Ulrich Guest

    We might have some ideas, but I think that I speak for most of
    your readers in saying that we don't know much about neurons,
    and we need more explanation of the problem.

    I can say what I am guessing, and you might work from there, if
    it is not too far off.

    I think you are saying that you have 68 neurons. Each neuron has
    dozens or hundreds of "arms" which you are measuring. The measurement
    question is addressed to assessing multimodality for measures for a
    single neuron.

    You are measuring the lengths and "binning" the results "into tenths".
    - Does that mean that take the minimum and maximum for a neuron, and
    divide the range into 10 intervals? (How many measures are there?)
    - Does the overall pattern of lengths look like it is Normally
    distributed around a mean, or uniformly distributed, or (perhaps)
    distributed where the log of the lengths is uniform or normal? - I
    mention this because it seems to me that if you are going to have
    any statistical test, it will be based on randomizing numbers, and
    you will need to consider an *appropriate* distribution for the
    randomization.

    - Is it possible or feasible or a reasonable alternative to use the
    exact lengths, instead of bins? I'm wondering if the distances
    computed between adjacent or near-adjacent lengths might be
    a better starting point for figuring multi-modality since it would not
    throw out exact information. This might suggest some sort of
    one-dimensional cluster analysis and may lend itself to testing
    with asymptotic results, instead of Monte Carlo for each new
    neuron based on its number of arms.
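    One very simple way to start such a one-dimensional cluster analysis
    is to sort the exact values and cut at the widest gaps between
    adjacent sorted values. A sketch (the function name and the gap
    heuristic are illustrative, not a standard method):

```python
import numpy as np

def split_by_gaps(values, k=2):
    """Split 1-D data into k clusters at the (k-1) largest gaps
    between adjacent sorted values."""
    v = np.sort(np.asarray(values, dtype=float))
    if k < 2:
        return [v]
    gaps = np.diff(v)
    # indices of the k-1 widest gaps, in ascending order
    cuts = np.sort(np.argsort(gaps)[-(k - 1):])
    return np.split(v, cuts + 1)
```

    Two well-separated groups come back as two clusters; how well this
    works on real data depends on how distinct the modes actually are.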

    Hope this helps.
    Rich Ulrich, Dec 17, 2011

  3. Herman Rubin

    Herman Rubin Guest

    I am assuming from what you have posted that the distribution
    is continuous, and that you have discretized it.

    Without making considerable assumptions about the smoothness
    of the distribution, it is quite difficult to come to
    conclusions about the number of modes of a distribution.
    This is because there are even infinitely-modal distributions,
    or worse, in every neighborhood of a distribution.

    Even with those assumptions, it is difficult. Does it matter?
    Also, unless the modality is essentially obvious, it takes a
    large sample to get a good grasp of it.
    Herman Rubin, Dec 17, 2011
  4. Kerry

    Kerry Guest

    Yes. Arms would be like branches of a tree and the origin of the
    neuron would be like where the tree trunk meets the ground.
    Kurtosis and skew vary widely for the total distribution, except
    that they all appear platykurtic, with max kurtosis values around 1.5.

    I don't think there is a meaningful way to extract length information
    from the exact values, since it is the sum of them that I'm looking
    for.

    Kerry, Dec 17, 2011
  5. Kerry

    Kerry Guest

    In a sense this is true; I could have used smaller bins, but in the
    case of summing overall length, it isn't necessarily more informative
    to do so. With small enough bins, each bin would either have a value
    of 0 or the size of the bin itself (say, 0.0001). Then the density of
    the values this creates provides a sense of length across distance.

    I think in my case, I can test how much 'multi-modality' changes with
    bin sizes, but I need a measure for multi-modality to start with.

    But in my case, I'm really just trying to understand how multi-modal
    the values appear. If I were to draw a random line w/ multiple curves
    in MS Paint, how multi-modal would it be? I'm trying to characterize
    the line, not some underlying population(s). Perhaps multi-modal is an
    incorrect term for it?

    Kerry, Dec 17, 2011
  6. Kerry

    Kerry Guest

    I'm sorry, I realize I'm being annoyingly confusing because I'm having
    trouble clarifying my own thoughts. If I had 2 populations of neurons
    from different regions in the brain and they looked different, I could
    plot, say, their total length for all neurons in a histogram. Here,
    the idea of modality is obvious and the tests I've seen for modality
    would apply. In my case, I'm not sure it makes sense to consider the
    profile of a single neuron to be composed of 'hidden' populations. It
    is, in a sense, exactly as multimodal as it appears. At the same time,
    I think the term modality applies because some bins contain more
    length than others and thus are places where the majority of the
    neuron exists. So I think it makes sense to characterize
    multi-modality by some continuous value range rather than test for a
    number of underlying populations.

    Kerry, Dec 17, 2011
  7. Kerry

    Kerry Guest

    Yes, just like a histogram.
    The way I was thinking of it was: if there are 3 peaks, then there
    are at most 3 modes, though I do see how it could be potentially
    infinite. It would be especially nice to not have to assume anything
    about the underlying shape of those 3 potential modes but it seems I'd
    have to assume something. So, let's say I want to test for normal
    distributions where one would expect some level of separation between
    modes. It'd be nice to test against other distributions too, but at
    least testing with the assumption of normal modes gets me started.
    What is considered a sample here? If I'm trying to characterize single
    distributions then the sample is N=1 neuron. Are the data points
    within a distribution what you mean by the sample?

    Kerry, Dec 18, 2011
  8. Kerry

    Kerry Guest

    If I were to use a typical mixture model method, at least to learn
    about them while exploring the data I have, is an Expectation-
    Maximization (EM) algorithm a good place to start? To keep it simple,
    I'd like to test if there are 2 components (sub-populations) within my
    overall distribution for each neuron's length across distance-from-
    origin bins. As for sample size, which I assume refers to my number of
    bins, I can easily increase number of bins, perhaps up to 100 bins.
    Would this be adequate? More than 500 would be pushing it, and in
    between 100 and 500 would be trade-offs in data accuracy.

    So, is this a good starting place to explore my data and multi-
    modality analyses together?

    If so, how do I interpret a log-likelihood? I can get my head around p
    values and <0.05 provides a standard for practical reasons (for better
    or worse). Anything like this exist for the log-likelihood measured
    using the EM Algorithm?

    Kerry, Dec 18, 2011
  9. Rich Ulrich

    Rich Ulrich Guest

    [snip, all but the end of Kerry citing Kerry]

    Well, this is something new, and leaves me puzzled again.
    Are you interested in the sum of the arms, or their profile
    for a neuron? - You skipped my question about how many
    arms there are.
    I thought that was what you were proposing, and your other
    comments did not disabuse me of that. Now I really don't know
    what you have in mind.

    If one neuron sends its arms to two separate regions, that would
    be a reason to have two additional "distances." I assume that a
    neuron also is located somewhere that lets it link to nearby neurons
    that have related functionality. That would create three modes, if
    the two distant ones were not at the same range.
    Rich Ulrich, Dec 19, 2011
  10. Kerry

    Kerry Guest

    I really appreciate your help. When I originally posted I thought I
    had it all figured out as to what my issue was. I've written responses
    below, but I feel like when someone is as confused as I now am, I need
    to figure it out internally before wasting someone else's time and
    effort, since I must be missing some very basic point that's keeping
    me from providing clear, concise, and specific explanations of my
    issue. Basically, I've had several people at work look at my data
    (each neuron's length across distance), see that many of them look
    strongly or weakly bimodal and suggest I find a way to test for
    bimodality. I just cannot think of what such a test would tell me that
    isn't already evident in the neuron's profile. Can I not see exactly
    how bimodal they are visually? Any assumption about the distribution
    below the peaks seems arbitrary. This is the question I need to figure
    out. I greatly appreciate any continuing advice, but also understand
    if it makes more sense for me to come back when I'm better equipped to
    understand the purpose of modality testing and how it applies to my
    own data.

    Thanks again,
    There are 125 to 675 arms per neuron, mean = 270.

    Each arm can vary in length from 1 to 100 micrometers. The total length
    of the neuron is of course the sum of the arm lengths, but I'm not
    treating arms as the functional unit when summing length, since arms
    can cross distance bins. The neurons have been digitally traced so
    that each arm is divided into a number of compartments of varying
    lengths. So, a distance bin sums all compartments pooled from all arms
    within each distance bin. If the neuron was 2-dimensional, imagine
    drawing a circle with a 50 micrometer radius around the neuron origin
    and then another circle with a 100 micrometer radius with the same
    center, and so on until the entire neuron is covered by the largest
    circle. I then sum the length in between each circle (i.e. bin).
    Summed length across bins gives a density profile of the entire neuron
    collapsed to one dimension. This measures where most of the neuron is
    located distance-wise. So, I'm not sure what 'exact values' would be
    here. Compartment lengths (i.e. the compartments that make up each
    arm) are not related meaningfully to the data. For instance, if the
    human who created the compartments made them larger in one spot and
    smaller (i.e. higher resolution) in another, it doesn't tell us
    anything about the neuron, just what the tracer did.

    I'm measuring the total length across distance for one neuron at a
    time. Initially I was hoping to find a test that could tell me if,
    say, half of the neurons were bimodal, one was trimodal, and the rest
    were unimodal, likely with some confidence level. I just don't know what
    type of test can provide results like this. Now I'm even wondering if
    such a test would be more meaningful than just characterizing the
    profile, such as by measuring the peak sizes (or more exactly,
    measuring the trough sizes between them). Herman Rubin wrote in his
    post that there is theoretically an infinite number of modes that
    could sum to the overall distribution I see. Why would assuming one be
    more informative than another? What more does it tell me than I can
    already tell by looking at the profile?

    My confusing example was if I got just total length as a scalar for
    each neuron and then plotted the scalar lengths of all 68 neurons
    together to create a histogram (I'm asking nothing about distance
    here, just total neuron lengths). But please ignore the example since
    it wasn't helpful.

    All in all, I'm saying it feels more informative to characterize the
    profiles rather than prove some hypothesis about underlying
    distributions. That said, they're not exclusive and I'm very
    interested in multi-modality testing methods. I'm just having trouble
    understanding what they'd say about the data that I can't tell already
    by looking at the given neuron's overall distribution. It seems it
    would be more informative to just measure how big each lump is (the
    actual measure would be how big each trough is relative to the area
    under the overall distribution line).

    Yes, this is a very good example for neurons. It so happens that the
    neuron type I work on is unique in primarily contacting only one other
    neuron. But it links to this neuron about 1500 times, climbing over it
    like a vine. So one could still ask if there are separate modes at
    different distances along the neuron it contacts (i.e. high-density
    areas of contacts). But can't I tell this just by looking at each
    neuron's length-across-distance profile? I could just measure the size
    of those lumps by measuring the troughs between them. What information
    is missing in this profile that some test could provide?

    Kerry, Dec 19, 2011
  11. Kerry

    Kerry Guest

    OK, I totally see what you mean by exact values now. If I have length
    of 30 micrometers in distance bin1 and 100 in bin2, I could create a
    vector with 30 1s and 100 2s etc. Is this right?
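    If that is the idea, NumPy's repeat does it in one line (the bin
    totals below are made up for illustration):

```python
import numpy as np

# bin totals: 30 micrometers of summed length in bin 1, 100 in bin 2, ...
bin_lengths = [30, 100, 60]
bin_ids = np.arange(1, len(bin_lengths) + 1)

# expand to a pseudo-sample: 30 ones, 100 twos, 60 threes
sample = np.repeat(bin_ids, bin_lengths)
```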

    Kerry, Dec 19, 2011
  12. Kerry

    Kerry Guest

    And now I think I know why I'm so troubled by the idea of testing for
    underlying distributions. In the typical scenario, you have only a
    sample of the true population. The true population may be 2 separate
    populations, and testing this in some way that uses sampled data makes
    sense. In my case, I have the true population. I have the complete
    neuron for each case. So this is why I keep saying that it feels to me
    like the modality is exactly what is shown by the length vs distance
    plot for a given neuron. But this isn't to say there are separate
    modes within that plot. It just seems like I can measure them directly
    from the plot rather than try and extrapolate some arbitrary
    underlying distributions. Does this make sense?

    Kerry, Dec 19, 2011
  13. Kerry

    Kerry Guest

    Dear Mr. Ulrich,

    I'm not sure if you're still checking my posts in this thread, but
    just in case I wanted to update where I'm at with the analysis plans
    and continuing issues. I have come to the conclusion that it makes
    sense to test for multi-modality in my distributions of length across
    distances (one sample distribution for each of my 68 neurons). The
    only test I've seen used in my field to test for multi-modality is
    Hartigan's Dip Test. Each of my 68 neuron distributions were rejected
    for uni-modality at p<0.04 using the Dip Test (68/68 being significant
    makes me somewhat concerned; Kolmogorov-Smirnov results in the same
    conclusions, but that is limited confirmation). As far as I can tell,
    the Dip Test does not tell you what the underlying distributions are
    that would best fit the distribution, so I wanted to follow up by
    using some test to find the number of modes and their parameters
    (mean, standard deviation, height etc.). Each neuron has 2 types of
    branches based on their appearance, and since they are mostly
    separable by distance, our hypothesis is that we will typically find 2
    modes with means, heights, and standard deviations similar to the
    distributions for each of the branch types when separated.

    Of course, such an analysis requires that I choose an appropriate test
    to find the modes and define the distribution type I think will
    provide the best fit. For the distribution type, I plan to check at
    least Gaussian and Gamma distributions. For the analysis, I'm looking
    into the Expectation-Maximization algorithm. R has a library package
    called mixtools that provides the analysis, which I am still
    struggling to understand. I'm hoping to check each neuron distribution
    for 1 to k modes and base the chosen number of modes on where
    increasing the number of modes provides diminishing returns in
    improving the fit (based on log likelihood, though I'm still trying to
    understand how this value is determined and what it means).
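    As a rough illustration of what an EM fit (e.g. mixtools'
    normalmixEM) is doing internally, here is a bare-bones two-component
    Gaussian EM in Python/NumPy. On interpretation: the log-likelihood is
    just the log probability of the data under the fitted mixture. It
    always increases as components are added, so there is no p<0.05-style
    cutoff; models are usually compared with a penalized criterion such
    as AIC or BIC instead. This sketch returns only the raw value.

```python
import numpy as np

def em_two_gaussians(x, n_iter=200):
    """Minimal EM for a two-component 1-D Gaussian mixture.
    Returns (weights, means, stds, log_likelihood)."""
    x = np.asarray(x, dtype=float)
    # crude initialization: place the components at the quartiles
    w = np.array([0.5, 0.5])
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    sd = np.array([x.std(), x.std()]) + 1e-6
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) \
                 / (sd * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        n_k = resp.sum(axis=0)
        w = n_k / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k) + 1e-9
    # log-likelihood of the data under the final parameters
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) \
             / (sd * np.sqrt(2 * np.pi))
    loglik = float(np.log(dens.sum(axis=1)).sum())
    return w, mu, sd, loglik
```

    On two well-separated clusters this recovers the component means; on
    weakly separated data it is sensitive to initialization, which is one
    reason packaged implementations restart from several starting points.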

    I can't help but to feel I'm in over my head here, but I haven't found
    a simpler solution for characterizing multi-modality. Does the plan I
    have seem appropriate? Any advice is appreciated as always.

    Thank you for your consideration and advice.

    Kerry, Dec 22, 2011
  14. Rich Ulrich

    Rich Ulrich Guest

    I would say, if you have two "types" by appearance, it makes
    sense to me to test whether that appearance is characterized by
    different means. Ta-Da! - a systematic difference shows that
    there are, indeed, differences.

    Next - How smooth are the distributions for each Type?

    I've never had much to say about "mixture models" for densities,
    because I never had data that even suggested that approach.
    And those problems never grabbed my curiosity.

    I don't see what you are aiming at, in particular. But I have
    noticed that Herman and others, over the years, have never led
    me to think that they easily offer firm answers, especially if you
    don't have prior knowledge of the number of components.

    There are still regular readers of the sci.stat.* groups, and
    maybe some of the others will have something to add.
    Rich Ulrich, Dec 23, 2011
  15. Kerry

    Kerry Guest

    I'm not sure how you know the means without assuming something about
    the underlying distributions. So, I'd have to fit the data with
    separate distributions based on what combination of distributions
    minimize error. This would be relatively simple (I assume) if I were
    just fitting one distribution, and I believe this is what Maximum
    Likelihood Estimation is for, which is another process I'm trying to
    figure out. Fitting more than one distribution, I'd guess, requires
    some heuristic process given the number of possible combinations of
    distributions. My sense is that this is what Expectation-Maximization
    (EM) is used for, but I'm very open to less complex methods that will
    do a similar job.

    By appearance, they all have at least two local peaks and some have 3
    or 4. Basically, any time the slope goes from negative to positive to
    negative across distance, I have a local peak. I could characterize
    the data by fitting each peak with its own distribution. But if I find
    that the increase in error from reducing the number of distributions is
    minimal, then explanatory power increases with little cost. I believe
    most of these neurons can be approximated to having only 2 modes
    without increases in cost. If so, I have a neat explanation which is
    based on the two different branch (arm) types I mentioned. Those
    branch types were defined on completely different properties unrelated
    to distance, so this would be an informative result. But I need a way
    to measure cost here. The log-likelihood value returned by the EM
    algorithm appears to serve this purpose, but I still have to try and
    comprehend how it is evaluated.
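    Counting local peaks as described (slope going from positive to
    negative) takes only a few lines. This sketch drops zero slopes so
    that flat-topped peaks are not missed:

```python
import numpy as np

def count_peaks(y):
    """Count local peaks: places where the slope goes from
    positive to negative across the profile."""
    s = np.sign(np.diff(np.asarray(y, dtype=float)))
    # drop zero slopes so plateaus don't hide a sign change
    s = s[s != 0]
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))
```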
    What I'm after is not just how smooth they are but how much each
    contributes to the separate modes found for the neuron as a whole. We
    are trying to simplify the characterization of the neuron. So, if we
    find that the two branch types can generally be found just by looking
    at the distribution of the total neuron across distance and fitting it
    with 2 separate e.g., Gaussian distributions, then future researchers
    who don't have the means to separate each neuron by branch type can
    estimate their locations. This is why I need to find the best fitting

    Do you know any methods besides mixture models that might accomplish

    Kerry, Dec 23, 2011
  16. Rich Ulrich

    Rich Ulrich Guest

    I don't know what you are trying to accomplish, but I'm getting
    the impression that you are more eager to play with some E-M
    algorithm than figure out what you can actually say about your data.

    If there are two types, visually, not related to distance, it surely
    seems to me that the starting point is to characterize the two
    visual types. I don't understand what keeps you from starting there.

    You have only 10 bins, and you think you see 4 modes?

    I believe that you must be looking at a lot of random variation,
    up and down by 20 or 30% *randomly*, in the middle of a
    uniform spread. If there are 50 neurons counted in a bin, the
    CI for the variation is about 30% in either direction.
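    That figure follows from treating a bin count as approximately
    Poisson, where the standard error of a count n is sqrt(n):

```python
import math

# If a bin's count is roughly Poisson, its standard error is sqrt(n),
# so an approximate 95% interval spans about +/- 2*sqrt(n)/n of the
# count itself.
n = 50
relative_halfwidth = 2 * math.sqrt(n) / n
print(f"+/- {relative_halfwidth:.0%}")  # roughly +/- 28% for n = 50
```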

    Ten bins sounded rather few to me, from the start, for distinguishing
    even 3 modes, but it is surely too few for four. And if the distances
    tend to have Gaussian distributions around whatever means may
    exist, if there are true modes, then the means need to be more than
    one SD apart in order to have any "dip" expected at all.

    Change my question there. If you separate the two types, do the
    distributions for the two types, separately, look less multi-modal?
    Is it as easy to distinguish "type" as to measure distances?
    Rich Ulrich, Dec 23, 2011
  17. Kerry

    Kerry Guest

    What I have in mind seems so simple and yet how to get it is escaping
    me. What it comes down to is that I need a way to characterize the
    overall distribution because some people simply won't care about the
    branch types. So forget the branch types. Here's a scenario: I say the
    overall distribution (distribution = length vs path distance for a
    specific individual neuron) is bimodal, and someone else is going to
    want that quantified. If I go to a conference and someone asks "How
    are these neurons distributed?" I want to be able to answer with the
    number of modes and their parameters. You mention that seeing 4 modes
    within 10 bins seems suspect. I agree. I need a way to quantify which
    modes are non-negligible in a sense and which can be ignored. The more
    modes I assume exist, the better fit I will get, since I can fit each
    mode with its own distribution. But some extra modes may be so
    negligible that the extra distribution doesn't change the goodness of
    fit by much. This seems like a good way to choose the number of modes.

    I have and can continue to play with bin sizes. Ultimately I will test
    whatever measures I use against a range of bin sizes. But solving
    binning issues leaves the question of how to quantify modes
    unanswered. Quantifying bi-modality must be fraught with issues, since
    I can't find a common algorithm for doing so. I'd like to say something
    like "If the reader is looking for an easy way to capture what the
    distribution is, one can imagine two Gaussian distributions of
    different means and standard deviations, where the 1st Gaussian
    captures X amount of the data while the 2nd Gaussian captures 1-X
    amount. These parameters provide a better fit than using any other
    Gaussian parameters. We also tried Gamma distributions but the fit was
    not as good. This is determined by looking at the goodness of fit
    value Z." The issue is I don't have a way to test the required number
    possible of distribution combinations, nor do I have a way to quantify
    'goodness of fit'.

    If later on I find that branch types relate to the modes, then that's
    a bonus.

    I answer some of your points specifically below, but the above
    summarizes, I think, the central issue.

    No, I'd actually love to avoid the E-M algorithm or anything else
    having to do w/ likelihood. Least sum of squares seems more obvious at
    least to me for quantifying distribution fits. I'm trying to simplify
    the story as much as possible. I started with plots of length across
    distance. These plots have bumps and dips, with some being less severe
    than others. My hypothesis is that in most cases, only 2 of these
    bumps are severe, but 2 severe modes means that it would be
    inappropriate to treat the data as unimodal. I need a way to quantify
    all this. My starting assumption was that I'd need to assume some
    distribution type (Gaussian, etc.), assume some # of modal
    distributions to fit (1 for unimodal, 2 for bimodal), and maximize the
    parameters of those distributions to minimize error between the
    predicted values and the observations. So I'm actually looking for the
    simplest method that can optimize and return some error value. I'm
    absolutely ready to scrap the notion that E-M is the simplest method
    for this, especially since I don't understand it anyway.

    I deleted an explanation I wrote as to why I don't think this is
    appropriate b/c it was detail heavy and I don't want to burden you
    with an explanation which has to do with the field and the subject
    matter more than anything statistical. I can elaborate if you want,
    but it may be easier to just to forget the branch types. Simply put, I
    need to start w/ a simple overall description of the neuron and then I
    can work my way down to details like different branch types. I need a
    way to explain how entire neuron distributes length across distance. I
    see the possibility of modes and I want to quantify that.

    I think 'mode' here is subjective. At most, there are as many modes
    as there are peaks, regardless of how severe. I'm looking for a way to
    quantify how many modes can be ignored when fitting the data w/ the
    best distribution or combo of sub-distributions. My hypothesis is that
    there will be a huge difference in residual error b/t fitting w/ one
    distribution vs. fitting with 2. There will be very little decrease in
    error when fitting with more than 2 distributions. If this turns out
    to be the case, it would be a nice additional find that the 2 branch
    types fits have similar parameters to the 2 modes found in the overall
    distribution but that is a bonus story.

    I'm open to other ways to tell the story, but this approach makes a
    lot of sense to me. I think it's the complexity of finding a way to
    fit the data by minimizing the error that is making this confusing. I
    know exactly what I'm looking for. If I had enough time, I would
    manually try out a million different distribution combinations to
    reduce residual error in the overall observed distribution. This is
    impractical, so I need an automated way to do this. Nonlinear
    regression works for fitting one distribution. Is there a name for a
    method that can try and fit two or more additive sub-distributions
    together, rather than assuming a single distribution of some type best
    fits the data? That's what I need!

    Recall that I only have one neuron per distribution. So if I only had
    data from 1 neuron, I would still have the same length vs distance
    distribution for it. I have 68 neurons, so I have 68 distributions
    that have nothing to do with each other. I am not pooling their values
    together. The reasons for not doing so are complicated.
    Yes, but how much less multimodal is left un-quantified until I have a
    method to fit the overall distribution.

    There is overlap, but the mean distance between the two branch types
    is significantly different.

    Kerry, Dec 23, 2011
  18. Kerry

    Kerry Guest

    An update on my final decisions: I ended up skipping any fancy
    algorithms and am just using Excel Solver to minimize least squares
    for two mixed Gaussians (i.e. Solver manipulates two means, two
    standard deviations, and a mixing parameter that changes the
    proportion of the 2 Gaussian functions in the whole). I then convert
    the least squares to a nonlinear R squared value to give a sense of
    fit quality. I also checked other distribution types to see if they
    fit better, but they did not. I then separate the 2 branch types I
    have and fit them each with a Gaussian. Then I compare the mean and
    standard deviation of each branch type Gaussian to the 2 modes created
    in the mixed function. Each branch type distribution is very similar
    to one of the 2 respective modes in the mixed distribution, suggesting
    they are capturing the same thing. So this is my story. I am
    comfortable with this result because it emphasizes using one of the
    'standard' distribution types as a tool to bring out some non-obvious
    findings.
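    For anyone wanting to reproduce this outside Excel, the same
    least-squares fit can be sketched with scipy.optimize.curve_fit. The
    data below are synthetic and the parameter names are mine; a real
    profile would replace x and y with the bin centers and summed
    lengths:

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussian_mix(x, p, m1, s1, m2, s2, scale):
    """Mixture of two Gaussian-shaped curves: p and (1 - p) are the
    mixing proportions, scale sets the overall height."""
    g1 = np.exp(-0.5 * ((x - m1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))
    g2 = np.exp(-0.5 * ((x - m2) / s2) ** 2) / (s2 * np.sqrt(2 * np.pi))
    return scale * (p * g1 + (1 - p) * g2)

# synthetic bimodal profile: bin centers and summed length per bin
x = np.arange(10)
y = two_gaussian_mix(x, 0.4, 2.0, 1.0, 7.0, 1.2, 100.0)

# least-squares fit, starting from a rough initial guess
popt, _ = curve_fit(two_gaussian_mix, x, y,
                    p0=[0.5, 1.0, 1.0, 8.0, 1.0, y.sum()],
                    maxfev=20000)

# nonlinear R squared as a sense of fit quality
resid = y - two_gaussian_mix(x, *popt)
r_squared = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
```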

    Thanks for bearing with me through my mess of logic, I learned a lot!
    Kerry, Dec 31, 2011
