Correlation: In Memory of My Former Academic Colleagues

Discussion in 'Scientific Statistics Math' started by Reef Fish, Jun 15, 2005.

  1. Reef Fish

    Reef Fish Guest

    William Kruskal, an authority on theoretical statistics who helped
    the U.S. government bring statistical methods to bear on public
    policy issues, died of pneumonia on Thursday, April 21, at Bernard
    Mitchell Hospital in Chicago. He was 85.

    I learned of this news from the current (June 2005) issue of AMSTAT
    News I just received. The full text of the Obituary in AMSTAT News
    is contained in the 5/5/2005 University of Chicago Press Release:

    Bill was a former colleague of mine at the University of Chicago during
    1970-1975, when I taught at the Grad School of Business while Bill was
    the Chairman of the Statistics Department, and during 1982-83, when I
    was a Visiting Full Professor at UC. Two of my graduate courses at the
    GSB, Data Analysis and Applied Multivariate Analysis, were jointly listed
    in both departments and drew some of the more applications-oriented
    students from the Statistics Department.

    I knew Bill's brother Joe Kruskal (Fellow of the ASA, former President
    of the Psychometric Society, best known for his pioneering work on
    Multidimensional Scaling) much better than I knew Bill, but I had many
    memorable encounters with this remarkable Giant of the Statistical
    Profession. Bill was not a giant in physical stature, and for some
    unknown reason, every time I saw the shorter of the two stars in
    "My Dinner with Andre" (there were only two actors in the movie,
    sitting across a table talking for the entire length of the film),
    he reminded me of Bill Kruskal. I think Bill's male-pattern baldness
    and his manner of mild speech had more to do with my association of
    the two than any actual physical resemblance in a "celebrity
    look-alike" context.

    Among many of Bill's attributes that accounted for his enormous
    successes (many of which are listed in his Obituary), Bill was a
    "politician" (used in the complimentary rather than a derogatory sense)
    -- the extreme direct opposite of my own style, as readers well know.
    See excerpt from my book review below.

    It was in this respect that I had one of my memorable academic
    encounters, on the subject of CORRELATION, in 1982, when my book-review
    article appeared (JASA, 1982, 489-491) about a book on correlation and
    causation, written by a social scientist inadequately trained in the
    subject of Statistics, advancing the quackery that, by merely drawing
    "path diagrams" for causation, one can draw causal inferences from
    correlational data without any controlled or designed experiments.

    After documenting the technical (statistical) flaws in the book,
    and debunking the path-diagram fallacy, I concluded my review with:

    *> "I am less perturbed by the poor substantive quality of this
    *> book than by the fact that we are witnessing the emergence of
    *> a subculture of economists and social scientists, who are no
    *> more qualified or equipped to practice statistics than law
    *> or medicine, yet who nonetheless do practice it among
    *> their circles of nonstatisticians, without much visible sign
    *> of protest from the community of statisticians. I feel
    *> obliged to register my strongest protest against this type of
    *> malpractice, fostered by the title and content of this book."

    This was after I toned down my wording CONSIDERABLY when the editor
    of JASA was getting a bit nervous, from the possibility of lawsuits
    in our litigious society, because I had also used these terms to
    describe those mal-practitioners -- "quacks" and "black magic" --
    and those terms DID stay in the review. :)

    Readers may or may not be surprised that I had EXACTLY ONE protest
    (amongst a large number of supporting statisticians who had seen
    the malpractice among their social scientist colleagues and applied
    statisticians) from a very prominent member of the "social scientist"
    group, who shall remain nameless.

    He made his protest known to Bill Kruskal, probably because I was at
    the University of Chicago at the time and the review indicated my UC
    affiliation. The protester didn't like my categorical use of "a
    subculture of economists and social scientists", and the derogatory
    implication of the word "subculture".

    Bill was inadvertently dragged into this cross-fire as an involuntary
    go-between. But through Bill's supreme diplomatic and political
    skills, he persuaded the protester that I was actually on HIS (the
    protester's) side, trying to expose and reduce some ongoing abuses
    of which the protester was aware and sympathetic to my stand. So,
    it was a case of POTENTIAL political storm settled without a ripple.

    Unfortunately, 23 years later, we are STILL witnessing some quacks
    and black-magicians in sci.stat.*, to name but one NOISY ONE,
    Richard Ulrich, who is still embracing the malpractice of drawing
    causal inference from correlational data -- as recently as a few
    days ago, and he is not even a social scientist!

    Another one of my former colleagues at the University of Chicago,
    Harry Roberts, passed away last year. The Obituary began with:

    Harry was one of a few academicians I have respect for. He published
    little, but what little he published was NOT because he had to
    publish, but because he thought it important to statistics and
    statistical education.

    I learned more about Bayesian Statistics (both theory and practice)
    from Harry than from my mentor L. J. (Jimmie) Savage -- sometimes
    labeled the Founder of Bayesian Statistics in the USA -- from whom
    I learned MUCH. I learned it from Harry by TEACHING from his
    unpublished Lecture Notes, at the Grad School of Business in Chicago,
    in advanced courses populated primarily by advanced MBA students.
    Savage was marking Harry's manuscript black and blue (as Savage did
    ALL the manuscripts he read) at the same time he was doing the same
    to my dissertation draft. :) That was before 1970. Harry was
    revising and revising the same manuscript, with the help of other
    colleagues, even into the 1980s, long after he stopped teaching
    courses in Bayesian Statistics. That manuscript was never published.

    Harry was the theoretician at the UC GSB when I started there as
    an Assistant Professor in 1970. He was soon converted into a
    dedicated Data Analyst and was bitten by IDA -- the interactive
    data analysis software package I developed at the GSB, in the
    summer of 1972, to teach my course in Data Analysis.

    Our collaboration began almost immediately thereafter. Harry no
    longer taught any Bayesian Statistics from his Lecture Notes or
    manuscript for his book, and dedicated himself to the promotion
    of non-Bayesian APPLIED statistics, using the package IDA as a
    vehicle in so doing.

    IDA was DESIGNED to be used successfully WITHOUT any manual -- a
    concept well before its time, in 1972, and it WAS used, by students
    in numerous universities WITHOUT any manual for 7 years until there
    was sufficient demand for me to compile and organize what was
    internally documented in the "conversational system", together with
    illustrations of HOW to do certain statistical procedures and
    calculations WITHOUT a special Proc or Command for doing so, such as
    One-Way MANOVA, Mann-Whitney U, Kruskal-Wallis ANOVA, and Two-Stage
    Least Squares.

    Our joint work resulted in the co-authorship of two books:
    "Conversational Statistics with IDA" (1982), written mostly by
    Harry; and "A User's Guide to IDA: Interactive Data Analysis
    and Forecasting System" (also 1982), written mostly by me. Earlier
    versions of those books appeared as early as 1980.

    On the topic of Correlation and Causation, Harry wrote
    (pages 17-21), explaining the interpretation of R-square in a
    regression (where R is exactly the absolute value |r| of the
    correlation coefficient between X and Y in a simple regression):

    1. The word "explained" is sometimes erroneously thought to
    connote causation whereas it refers only to deviations
    of fitted values from the overall mean, without any
    implication that the regression model that produced
    these fitted values has captured any causal scheme
    underlying the data.

    In other words, it was a strong CAUTION that a regression does NOT
    "explain" anything. It merely FITS a model to data, and above all,
    one is NOT to draw any unwarranted "causation" inference from it.
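
    Harry's caution can be seen numerically. Here is a minimal sketch
    (modern Python, not from either book) showing that in a simple
    regression the "explained" proportion, R-square, is exactly the square
    of the correlation coefficient r -- so it carries no more causal
    content than the correlation itself:

```python
import statistics

def fit_and_r2(x, y):
    """Least-squares fit of y = a + b*x; return (slope, R^2, r)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    # "explained" = deviations of fitted values from the overall mean
    ss_fit = sum((fi - my) ** 2 for fi in fitted)
    r = sxy / (sxx * syy) ** 0.5          # Pearson correlation
    return b, ss_fit / syy, r

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
_, r2, r = fit_and_r2(x, y)
print(abs(r2 - r * r) < 1e-12)  # R^2 equals r^2, term for term
```

    The identity R² = r² holds for any simple regression; nothing about
    causation enters the computation at any step.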

    John Tukey was another former academic colleague who passed away
    since my "cold turkey" (not cold Tukey) retirement from the
    university and the statistical profession (out of disgust with the
    ongoing ills of our educational system and the statistical
    profession, some of which are actively discussed in threads in
    this newsgroup).

    John was more of my statistical mentor than he was a professional
    colleague. He passed away on July 26, 2000. My "cold turkey"
    retirement from the profession was so COLD that I didn't realize
    John had died until 2003, when I stopped at the JSM Annual Meeting
    in San Francisco because four of my former doctoral students were
    presenting papers there, and I saw the "Tukey Memorial Lecture"
    on the Program. Dave Hoaglin must have been amused when I revealed
    that I didn't know Tukey had died until then, more than THREE YEARS
    after the fact.

    I first met John when I gave an Invited Talk at the Princeton
    Department of Statistics in 1973, deliberately bringing coals to
    Newcastle by selecting "Interactive Data Analysis" as the title
    of my talk. :) Of course, by then I had already been tremendously
    influenced by John's philosophy of Data Analysis -- for enlightenment,
    rather than as a drunk using a lamp post for support. I was later
    selected by NSF to participate in John's special one-week Seminar
    for professors, and it was there that I received two of the five
    statistics books I kept after my 1999 retirement: his Data
    Analysis book, and the specially bound Mosteller and Tukey book with
    my initials on it.

    When I contributed my "Interactive Data Analysis" article to the
    Encyclopedia of Statistical Sciences, one of the referees thought
    I was identifying "Data Analysis" too closely with Tukey's brand of
    Data Analysis.

    I am sure Tukey had much influence on my style, but MY brand of
    Data Analysis actually identifies closely with Tukey-Anscombe-
    Mosteller-Box-Schatzoff, as well as many of my contemporaries,
    such as Hoaglin, Velleman, Cook, Weisberg, Cleveland, and a few
    others. It's Tukey and Box's brand of Exploratory Data Analysis,
    fully integrated with the Confirmatory Data Analysis of Interval
    Estimation and Hypothesis testing, as long as those are done in
    an ENLIGHTENED manner, not as a drunk seeking support.

    Where does Tukey come in on the CORRELATION topic?

    Perhaps some of you know the quotes and the sources of the quotes
    that I have long remembered (and understood WHY), but had forgotten
    the sources.

    One of them associated a certain kind of confirmatory data
    analysis with "sweeping dirt under a rug".

    And the use of CORRELATION is "sweeping dirt under a rug with a

    There is MUCH Truth in the last statement that you have to ponder
    over all the "useless" uses of correlation as well as abuses of it.

    On this occasion of my Memorial tribute to three of my former
    statistical colleagues, I am going to take a Three Day Moratorium
    in posting to this group, to ponder over my own mortality, and let
    readers have a chance to ponder over the state of ignorance,
    quackery, malpractice, and occasional enlightenments in statistics
    and the statistical profession, which can be seen in this
    newsgroup, a microcosm of the statistics world at large.

    During the Moratorium, I'll also have the opportunity to enjoy a
    couple of days of my retirement, and catch up on some reading
    and listening to "noise", while ever keeping my eyes open for
    the silver-lining and the needle in the haystack of statistical
    practice and discussions. :)

    -- Bob.
    Reef Fish, Jun 15, 2005

  2. I must be honored to be mentioned in Bob's Memorial.

    But net-copping duties never end.

    On 15 Jun 2005 12:11:42 -0700, "Reef Fish"
    - Never from the correlational data alone. Bob would stop
    trashing me with misrepresentation, if he were intellectually
    honest. But Bob reads badly, and then he sticks with it.

    I do plead guilty to embracing the "intelligent model-building"
    that he also seems to oppose.

    I wonder if this is the topic --
    I mentioned that Bob's conclusions about what works in
    the classroom must be based on observational correlations
    (and they are social science, no less).

    - That was intended as a cheap, joking disproof of Bob's
    position by reductio ad absurdum, not as a private claim
    of a good result. Is Bob a good observer?

    I think I count myself as a "social scientist".

    [snip, much more]
    Richard Ulrich, Jun 16, 2005

  3. Reef Fish

    clemenr Guest

    May I ask what "intelligent model-building" is? As an AI dweeb I'm on
    the alert for cross-overs between AI and stats. Not that I plan to do
    anything, but because they are interesting. Going through the articles
    in the _Intelligent Data Analysis_ journal, I find that these are
    mainly the application of technology that overlaps with AI, such as
    machine learning techniques, as data analysis techniques. There are
    some Statistics expert systems to guide novices in the choice and
    application of statistical hypothesis tests to their data, but I'm not
    aware of much else.


    clemenr, Jun 16, 2005
  4. Sorry to disappoint. I just then pulled the term out of
    the air, and I failed to avoid the AI hot-word.

    Maybe "social science model-building" is sufficiently
    descriptive, though that includes some stuff that I
    don't "embrace". See recent posts in the thread, for
    several outlines of how models are built. I don't see
    any differences between what any of us recently posted.
    And I don't think anything is new to you.

    Social scientists want regression models where the
    variables make sense, and -- I think this is fair -- Bob
    believes it can't be done. However, I have to *infer*
    what he believes, because he seems to me to contradict
    his own posts and lack support from his own citations;
    and he likes almost nothing that I post on it.
    Richard Ulrich, Jun 16, 2005
  5. Reef Fish

    Anon. Guest

    Perhaps "thoughtful model-building" would be a better term? If my
    understanding of your idea is correct, then I find myself using the same
    approaches in ecology as well, so it's not just social scientists who do
    this. I guess your main point is not to rely on automatic procedures,
    but to apply thought to the process, and ask things like "does this make
    sense?"

    Anon., Jun 17, 2005
  6. Reef Fish

    Reef Fish Guest

    I am building up too much of a backlog of posts to wait for the
    weekend to follow up on, because weekends are not usually for
    newsgroup posting. The trouble with model-building fantasizers of
    the "does this make sense" kind is that, while that is what ALL
    "intelligent" and "thoughtful" model builders do, the majority of
    the malpractitioners use regression (and correlation) to ascertain
    cause WITHOUT going through the necessary designed experiments or
    controls.

    The end result is that "what makes sense to them" is far too often
    just their own prejudices, and they are using regression as the drunk
    uses a lamp post: for SUPPORT, rather than light.

    The "thoughtful" builder asks the SAME question, but without
    presuming the "causal mechanism" or what "explains" the observed
    phenomenon. They use regression results in those cases to POINT
    them to insights they had not seen before, AND follow up with
    designed experiments to ascertain the "causal", "explanatory", or
    "control" mechanisms -- NONE OF WHICH can be validly established by
    regression analysis based on "uncontrolled" observational data, used
    by social scientists and economists to draw their (most often)
    invalid and incorrect conclusions.
    "Lower Maximum Speed Saves Lives" was one of the myths that began
    in the Nixon years (1974); its repeal nearly 25 years later revealed
    how wrong it had been. The 55 MPH National Farce was passed, without
    a controlled or designed experiment, on the strength of politicians
    who knew nothing about statistics and causal inference from data.

    In the same Nixon years, a fiscal policy was implemented, based
    on one theory developed by an economist on conclusions drawn
    WITHOUT a controlled experiment. The OPPOSITE effect took place,
    and 6 months later, Nixon reversed his course 180 degrees!

    Too many examples of this kind are firmly planted in the history
    of the mis-use of regression and statistical methods.

    -- Bob.
    Reef Fish, Jun 17, 2005
  7. Reef Fish

    Reef Fish Guest

    Ross-c, I'll give you an overview of the subject you questioned.
    But first, I'll give my response to Richard Ulrich's pollution post
    of vacuous substance in this thread, as my generic as well as
    COMPLETE response to ALL of his gratuitous follow-ups.

    For Richard Ulrich, "intelligent model-building" is an oxymoron.

    When I pointed out Richard's error and malpractice in correlation
    and model-building methods, he accused me of knowing nothing about
    "model building". And after I cited him my Encyclopedia article
    in which I discussed model building, he didn't even bother to read
    it before he repeated his allegation that I don't know how to
    build models.

    Between my tribute to Harry Roberts and John Tukey, and my thread on
    "Science and Statistics" discussing George Box's ideas about
    model building (which nearly coincide with mine), there should be
    plenty of indication that I do know something about model building
    of the multiple regression type.

    The point about Richard Ulrich is that he continues to MISREPRESENT
    what I said, and MISREPRESENT himself, sweeping his errors under
    the rug and claiming that he was doing something else, while in
    ALL instances I told the READERS to read the archives or old
    posts and decide for THEMSELVES, instead of listening to Richard's
    version.
    The preceding paragraph will be my standard statement regarding
    ANYTHING Richard Ulrich posts about me -- read what *I* said in my
    posts, whether I said it to others or to him, rather than listen to
    the revision of history by Richard Ulrich, who insists on being the
    self-serving interpreter, misrepresenter, and LIAR.

    Now I'll respond to YOUR question, about which I do have some
    first-hand history, in what *I* did since the early 1970s.
    I'll use Regression (interactive type) packages as a concrete example
    of the kind of AI (Artificial Intelligence) ideas that have been used,
    and of much that has NOT found its way into current packages but
    would be highly desirable.

    Many of my AI ideas were implemented in my IDA software package in
    1972 <see my tribute to Harry about his role> and published in
    "General Considerations on the Design of an Interactive System for
    Data Analysis," Comm of the ACM (1980), 147-154.

    The AI features range from the "User Interface" ("user friendly"
    features and a "well-behaved" system) and "Error Detection and
    Recovery", to the automatic analysis of residuals and other
    regression diagnostics to WARN users of potential problems in
    the model.

    Let me start with a trivial AI implementation I did in the home-
    grown version of IDA in its early years. It had error detection
    and help features for COMMANDS incorrectly spelled, or help needed;
    but to add some HUMAN interest, to keep students who were
    frustrated or bored, I put in a section of code (in the command-
    interpreting section) to detect some choice "four letter words"
    in common usage, of the kind used by George Carlin, or by Richard
    Nixon in his expletives.

    When one of those words was detected, the program chose at random
    one of several responses to that particular word, as the system's
    response to the student's "command" word.

    Students were first surprised that the software would recognize
    such words and gave unpredictable but appropriate responses.
    Then they would DELIBERATELY try some expletives to see how many
    the system would recognize. The system kept a TALLY of how
    many times those four-letter words had been invoked, and when the
    number exceeded a certain threshold in a session, the student would
    be advised to quit playing and get back to their data analysis
    work. :)
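
    A hypothetical sketch of that gag in modern Python (IDA was not
    written in Python, and the blocklist, responses, and threshold below
    are all invented stand-ins for the originals):

```python
import random

BLOCKLIST = {"darn", "heck"}          # stand-ins for the actual words
RESPONSES = {
    "darn": ["Such language!", "IDA is shocked."],
    "heck": ["Watch it.", "Back to the data, please."],
}
THRESHOLD = 3                          # nag after this many offenses

class Interpreter:
    """Toy command interpreter with the expletive-detection gag."""
    def __init__(self):
        self.tally = 0
    def handle(self, command):
        word = command.strip().lower()
        if word in BLOCKLIST:
            self.tally += 1                       # keep the running TALLY
            reply = random.choice(RESPONSES[word])  # unpredictable response
            if self.tally > THRESHOLD:
                reply += " Quit playing and get back to your data analysis."
            return reply
        return f"Unknown command: {command!r}"

terp = Interpreter()
for cmd in ["darn", "heck", "darn", "darn", "regress"]:
    print(terp.handle(cmd))
```

    The random choice among canned replies is what made the responses
    seem "unpredictable but appropriate" to the students.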

    Harry (Roberts) didn't know I had those in the system, and he
    must have found out about them from his students, who were giggling
    and laughing about some of my built-in responses, wondering how
    statistical data analysis could be so "interesting"!

    That's artificial intelligence!

    In the IDA Manual (1982) I had these subsections on IDA:
    2.5 Error Checks on User Input
    5.1 Syntactic Error
    5.2 Logical Error
    5.3 Probable Error

    These are more or less standard features now in browsers, PCs,
    and online software, but not nearly as prevalent in statistical
    packages.
    2.5.6 Automatic Diagnostic Checks and Warnings

    "IDA automatically checks the residuals of each regression
    model for any gross violation of the standard specifications
    or assumptions".

    2.5.7 Automatic Updating

    This involves (the program) remembering what a user has done
    in a regression problem, so that certain results, such as the
    ANOVA table associated with the regression, can be viewed by
    merely typing one command word. If the user, in the exploratory
    or regression-diagnostic style, wants to find out how the deletion
    of a particular SINGLE observation might have affected the analysis,
    then as soon as the observation is deleted, the user is able to ask
    ANYTHING about the NEW fitted model (with the observation deleted)
    without having to re-specify any of the previous tasks, because
    the results would have been automatically updated.

    Also, a single-word command, RECOUP, would restore whatever
    had been temporarily deleted or changed, returning all
    results back to the original.
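
    A minimal modern-Python sketch of the "automatic updating" and RECOUP
    ideas (not IDA's actual code; the class and method names are
    invented): deleting an observation refits the model and refreshes
    every derived result at once, and RECOUP restores the original data
    and results.

```python
import statistics

class Session:
    """Toy regression session with automatic updating and RECOUP."""
    def __init__(self, x, y):
        self._orig = (list(x), list(y))   # snapshot for RECOUP
        self.x, self.y = list(x), list(y)
        self._refit()
    def _refit(self):
        mx, my = statistics.fmean(self.x), statistics.fmean(self.y)
        sxx = sum((xi - mx) ** 2 for xi in self.x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(self.x, self.y))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        resid = [yi - (self.intercept + self.slope * xi)
                 for xi, yi in zip(self.x, self.y)]
        self.sse = sum(e * e for e in resid)  # kept current automatically
    def delete(self, i):
        del self.x[i]; del self.y[i]
        self._refit()                         # every result updates at once
    def recoup(self):
        self.x, self.y = list(self._orig[0]), list(self._orig[1])
        self._refit()

s = Session([1, 2, 3, 4, 10], [1.1, 1.9, 3.2, 4.1, 2.0])
b_all = s.slope
s.delete(4)        # drop the influential point; slope and SSE refresh
b_loo = s.slope
s.recoup()         # restore everything to the original fit
```

    The user asks for the slope (or SSE, or anything derived) after the
    deletion without re-specifying the model, which is the whole point of
    the automatic-updating design.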

    It automatically checked for gross outliers (against the normality
    assumption) in the residuals; checked RUNS and autocorrelations
    of the residuals for departures from the INDEPENDENCE assumption;
    and checked for other anomalies that a user might need to be warned
    about.
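
    A rough sketch, in modern Python, of the kind of automatic residual
    checks described (the thresholds here are illustrative, not IDA's):

```python
import statistics

def residual_warnings(residuals):
    """Flag gross outliers, too few sign runs, and lag-1 autocorrelation."""
    warnings = []
    sd = statistics.pstdev(residuals)
    if any(abs(e) > 3 * sd for e in residuals):
        warnings.append("gross outlier among residuals (>3 SD)")
    signs = [e > 0 for e in residuals if e != 0]
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    if runs <= len(signs) // 3:
        warnings.append("too few sign runs: residuals may be dependent")
    m = statistics.fmean(residuals)
    num = sum((a - m) * (b - m) for a, b in zip(residuals, residuals[1:]))
    den = sum((e - m) ** 2 for e in residuals)
    r1 = num / den                      # lag-1 autocorrelation
    if abs(r1) > 0.5:
        warnings.append(f"large lag-1 autocorrelation ({r1:.2f})")
    return warnings

# Strongly trending residuals should trip both dependence checks.
trend = [i - 5.0 for i in range(11)]
print(residual_warnings(trend))
```

    A well-specified model should produce an empty warning list; the
    point is that the checks run automatically after every fit, so the
    user is warned without having to ask.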

    This was before the days of the "regression diagnostics" of the
    leverage and influence type -- if IDA still existed, those would
    have been the natural candidates to add as AI in automatic
    diagnostics.
    The existing statistical software packages have not even begun to
    tap or tackle the many areas of Artificial Intelligence that are well
    within the technical capability of today's computing technology, down
    to even the PC level.

    The whole idea of AI is to find ways of mimicking what an
    intelligent HUMAN data analyst might do in any situation that the
    program may detect that could help the data analyst in tackling
    the problem, online or in batch.

    The potential use of AI in the application of statistics is
    virtually limitless. But it requires a developer who is well versed
    in both STATISTICS and SOFTWARE design/coding/engineering.

    That is one area where statistical software developers today have
    explored only the tip of the iceberg.

    Some of them have not even gone as far as where I had been
    30 years ago.

    Ross-c, I hope the above gives you an old bird's-eye view of
    some areas in statistics where AI has been, is possible, and is
    wanting; as well as some idea of what "model building" is all
    about -- a thoughtful and careful analysis of data AIDED by
    software with AI capabilities to make the task of CAREFUL
    exploratory analyses much easier.

    It's NOTHING like letting a program make ANY actual model-building
    decision for the user -- in fact, the direct opposite: it points
    users to the correct direction(s) in which to look and decide.

    The UNINTELLIGENT model-builders are those encountered in the
    social sciences every day: untrained to deal with statistical
    data, they stuff garbage IN to SPSS, SAS, or whatever program
    they have at their disposal, and get garbage OUT.

    -- Bob.
    Reef Fish, Jun 17, 2005
  8. Reef Fish

    Data Matter Guest

    Some balance is required, in my view. Substantively I agree with what
    Bob is saying, that not enough attention is paid to whether regression
    assumptions are violated, and to designing experiments to establish
    some idea of causality (if they are feasible).

    On the other hand, I'm more comfortable with results that are
    "understandable". I recall being interviewed by a statistician (in
    industry) a while ago. I was asked if I had found any unexpected
    insights from my data mining work. To me, "unexpected" insights are
    of the "there are more burglaries when the moon is full" kind, i.e.,
    spurious correlations that do not generalize. I feel much better when I
    find correlations which I can interpret. It is wrong to put undue
    emphasis on "unexpected" insights; ex-post "obvious" insights are often
    more solid.
    Data Matter, Jun 17, 2005
  9. Reef Fish

    Jerry Dallal Guest

    While there may be things to criticize, one point gives me pause:
    how does one proceed when controlled experiments are unethical? I agree
    that it is often worse to act on bad data than not to act at all.
    Jerry Dallal, Jun 17, 2005
  10. Reef Fish

    Reef Fish Guest

    But that's exactly where the "expected sign fallacy" arose.
    How could the coefficient of SAT_Math, in a model for predicting
    GPA in college, have a NEGATIVE sign? It's not "understandable"
    to them. By the same token, what seems understandable may in fact
    be the WRONG "explanation" and the WRONG cause.

    The government and other gullible non-thinkers had been fooled
    for nearly 30 years by the false causal "assumption" that a lower
    maximum speed on the Interstates saves lives, as if that were the
    DIRECT cause, and that RAISING the speed back from 55 to 70
    (as the law was finally repealed) would CAUSE many lives to be lost.

    Empirical data and statistics proved otherwise! They should
    have done the CONTROL study 30 years ago!!!

    *> In California, where interstate speed limits are set at 70 mph,
    *> the fatality rate declined 4 percent between 1995 and 1996 -

    That's just a drop in the bucket of the flood of examples that the
    legislators and politicians were full of bunk!

    Here's another: a-346es.html

    Here's a pdf version of the detailed report (long download time): 346.pdf

    Here's another one on the same theme: obestory.html

    The wrong model was an "understandable" model. It was so
    understandable that even the brain-dead politicians in
    Washington and laymen could understand it -- and were fooled
    alike, for nearly 25 years, until the speed limit was repealed
    by Congress, and the limit was raised from 55 MPH to 70 MPH.
    Statistical quacks began estimating the expected INCREASE in
    fatalities, until data showed that fatalities actually DECREASED. :)

    What is "understandable" is not only "not necessarily correct",
    but is OFTEN wrong, because such models are built by statistical
    quacks who don't understand anything about "partial correlation",
    or about how causal inference CANNOT be validly drawn from purely
    observational data, no matter how many "path-diagram" arrows one
    draws on paper, without a designed or controlled experiment.

    You would have LOVED to use the NEGATIVE correlation between
    smoking and STOMACH cancer as having found a miracle cure, as Salk
    did with his vaccine, wouldn't you? This was pointed out at the
    time when the POSITIVE correlation between smoking and LUNG cancer
    was used as the CAUSAL inference for "smoking causes lung cancer",
    until some joker had the gall to ask, "how about the same kind
    of data for stomach cancer during the same period the lung cancer
    data was observed?"

    Half a century later, there STILL has not been a single valid
    controlled experiment on that issue -- on HUMANS. :) And the
    evidence remains slim to highly questionable, notwithstanding
    all the warning labels the Surgeon General has put on
    cigarette boxes.

    You are placing the WRONG emphasis and interpretation on what I
    said about "unexpected findings". What I said was that a model
    builder should leave the door OPEN for unexpected results, and not
    BAR them simply because some sign seems wrong to THEM.

    Whether a result is EXPECTED or UNEXPECTED, the enlightened data
    analyst places the emphasis on HOW it was derived and interpreted.
    That is Science and Statistics, and the PROPER use of model-
    building methodology in regression analysis!

    -- Bob.
    Reef Fish, Jun 17, 2005
  11. Reef Fish

    Jerry Dallal Guest


    You're an anecdote, no? You might argue the Finnish study is an
    anecdote, too, but it is a very powerful anecdote. What about all of
    the Olympic gold medals you might have won had you not smoked? :)

    David Salsburg has a very nice chapter about causation in his book "The
    Lady Tasting Tea" and what causation means when not every smoker
    develops lung cancer while some nonsmokers do. He states, following
    Russell, that there is no such thing as cause-and-effect and discusses
    the concept of "material implication". He describes the 1959 paper by
    Cornfield et al. on smoking and cancer as "a classic example of how
    cause is proved in epidemiological studies. Although each study is
    flawed, the evidence keeps mounting, as one study after another
    reinforces the same conclusions." (p 191)

    What I like about the Finnish study is its response to Fisher's
    challenge that the observed associations between cancer and smoking
    might be due to confounding with genetics. According to Salsburg,
    Fisher "assembled data on identical twins and showed that there was a
    strong familial tendency for both twins to be either smokers or
    nonsmokers. He challenged the others to show that lung cancer was not
    similarly genetically influenced." (p 194) The Finnish study comes back
    at him as a great big, in-your-face "So, there!" Kind of hard to argue
    genetics when dealing with monozygotic twins.

    I'm an old-fashioned Bradford Hill kind of guy, myself.
    Jerry Dallal, Jun 17, 2005
  12. Reef Fish

    Reef Fish Guest

    Welcome back, Jerry.

    Excellent question, and excellent comment.

    My answer would be, if you CAN'T do a control experiment on the
    subjects, then DON'T draw strong conclusions from them AS IF
    it had been done.

    I am A counterexample to all the unproven Myths regarding smoking:
    that it's addictive, and that it causes cancer.

    I smoked HEAVILY for 30 years (about a pack and a half a day, from
    regular Marlboros to King Size Marlboros to the longest Marlboros I
    could buy). :) I quit "cold turkey" (I do that often, because
    I like cold turkeys, I suppose) in 1992, NOT for health reasons,
    but for the SOCIAL stigma and inconvenience attached to the
    habit. When I had to go down SEVEN floors of the Harvard Science
    Building to smoke a cigarette outside the building, and then
    go back up seven floors (not to mention all the inconveniences in
    restaurants, airports, and airplanes), that was when I decided to
    QUIT. End of story. :) I had no ill effects from tar, nicotine,
    or anything else associated with my long-term smoking. I don't
    mind people smoking in my presence, or even blowing smoke in my
    face, if they don't do it too often and too deliberately. :)

    Never missed a cigarette a single day -- surprised even MYSELF,
    having listened to the same propaganda as you have, I am sure.

    There was never any "controlled experiments" on humans (a controlled
    experiment is the NECESSARY prerequisite to ascertain a statistical
    causal link). All the surgeon general's flunkies did was to
    subject RATS to smoking the equivalent of 10,000 (or more) cigarettes
    a day, and concluded that there was a tendency for those RATS to have
    cancer.

    If I WERE one of those rats, subject to the (ooops I can't say inhumane
    because they are rats) cigarette smoking torture, I would have wished
    I could have cancer and die SOONER! :)

    Okay, so I opened TWO cans of worms today: The Smoking-Lung Cancer
    link Myth, and the Highway-fatality and Maximum Speed Myth. The
    former is a thread that will be recurrent and will never end until
    scientists find the TRUTH -- that it's all related to the
    individual's GENES. You have good genes, you live. Bad genes,
    you die -- from one illness or another.

    -- Bob.
    Reef Fish, Jun 18, 2005
  13. Reef Fish

    Reef Fish Guest

    My anecdote is a COUNTEREXAMPLE to at least the addiction theory of
    cigarette smoking. It only takes one counterexample to disprove
    that theory, and I've heard similar anecdotes from others.

    I had NO withdrawal symptoms. Not even any discomfort when I stopped
    cold turkey style. Never missed it.

    If there was ADDICTION, such as in the use of hard drugs, none of
    that would have been remotely possible.

    If smoking were that harmful, the government could have made a law
    to make it ILLEGAL to smoke cigarettes. There's a law against
    smoking marijuana, and it's not even addictive nor harmful, and is
    even beneficial to certain patients.

    So, if the Surgeon General REALLY thought smoking was so harmful, why
    is everyone allowed to smoke as much as they wish?

    It's all POLITICS.

    BTW, I don't smoke weed either. :) I don't even drink. A clear
    and working MIND that my genes gave me is too beautiful a thing to
    mess with.
    Yeah, and how many of these studies are CONTROLLED studies on HUMANS?
    How many on rats? How many merely correlational? :)
    That's more like it.
    Just like the highway fatality vs max speed kind of statistics. TOO
    simplistic and IMPROPERLY controlled in the Finnish study. The valid
    control would have been to take the MZ twin pairs (before either
    started smoking) and randomly assign one to smoke (as rats are
    randomly assigned) and the other not to. Then at least you have
    a better handle on the SMOKING factor. There are still dozens of
    other contributing factors to cancer (as there were in the highway
    fatality effect besides MAX Interstate speed limit) completely
    uncontrolled and NOT included in the Finnish study.

    So, there!

    Genetics is not EVERYTHING. The cited study certainly lacked depth
    and control or analysis of OTHER factors that undoubtedly contributed
    to the result cited, having mentioned that it wasn't strictly a
    controlled study because the "control" group was self-selected. :)

    Besides, why only the post-hoc risk cited for MALES and not females?
    What about all the OTHER contributing causes to cancer besides smoking?
    Any of them even considered or mentioned?
    Is Bradford-Hill the control or the experimental group? Jerry, you're
    just not a CRITICAL enough (I could, but won't, say gullible) kind of
    guy, by not asking enough questions and challenging the improperly
    conducted and reported study, as I did.

    -- Bob.
    Reef Fish, Jun 18, 2005
  14. Reef Fish

    Jerry Dallal Guest

    I was going to go to bed, but I'll bite. You're looking for something,
    but I'm not sure what.

    If your ability to kick the habit disproves the theory that cigarette
    smoking is addictive, then doesn't my mother's *in*ability to kick it
    (and she tried everything, hypnosis, the works) disprove the theory that
    smoking is nonaddictive?

    But, a theory that smoking is or isn't addictive isn't meant to mean
    that it has to apply in every case. You're asking what "cause and
    effect" means. As Salsburg writes, "There is, in fact, no such thing as
    cause and effect. It is a popular chimera, a vague notion that will not
    withstand the batterings of pure reason. It contains little or no value
    in scientific discourse." (p 185-186, The Lady Tasting Tea). Your
    arguments illustrate what Salsburg is saying.

    I don't take "Smoking is addictive" to be an absolute. I take it to
    mean that the probability that a randomly chosen smoker is addicted is
    some value greater than 0. What else can it mean if it is to "withstand
    the batterings of pure reason"?
    Depending on how you define "politics". It has a lot to do with human
    behavior and the extent to which it can be regulated, as illustrated by
    the 18th Amendment to the US Constitution. And money.
    None that I know of.
    That's why it's so difficult to demonstrate beyond a reasonable doubt.
    With my students, I like to contrast the ease with which the Salk
    vaccine trials demonstrated the vaccine's effectiveness in one summer,
    while there are some who will still try to argue that smoking may not be
    harmful.

    Demonstrating that smoking is harmful in the sense that P(bad
    stuff|"things") increases when smoking is added to "things" is difficult
    in the absence of controlled trials, but not impossible.

    I'm not arguing about the driving speed.

    The point here is that just as Fisher studied twins to show a genetic
    predisposition to smoke, the Finnish study looked at MZ twins ("Just like
    you, Abbott! Just like you!") to show an association between smoking and
    cancer, free of *genetics* as a contributing factor.
    You are touching once again on what we mean by "cause and effect". When
    controlled studies are not possible, one has to be very careful about
    how one assembles the appropriate observational studies. One study will
    not do it. It may take scores of studies over dozens of years. That's
    why Salsburg is so high on Cornfield et al and the care with which they
    addressed such issues. I'm convinced smoking is bad in the
    probabilistic sense I gave earlier, but as you demonstrate this feeling
    is not universal.

    The example you want is hormone replacement therapy (HRT). The
    prospective studies are showing HRT to be...not good. If this is true,
    it represents (or so I have been told by those who do this for a living)
    the greatest failing in the history of modern epidemiology. The epi
    studies are overwhelmingly in favor of a benefit. It would be like
    doing a controlled smoking study and finding the nonsmokers dropping
    like flies. The models and expectations generated by the HRT epi
    studies don't seem to apply to the prospective studies and there are a
    lot of people trying to understand why.

    Sir Austin
    I would say that you are being too critical if you are arguing that
    causes cannot be demonstrated in the absence of controlled experiments.
    *Very* difficult, not impossible.
    Jerry Dallal, Jun 18, 2005
  15. Reef Fish

    Data Matter Guest

    I totally agree with you on what you're saying. I think there needs to
    be a balance between "practical mindedness" and "trust in scientific
    procedures" (for lack of better words).

    When I wrote what I wrote before, I was just pointing out that at the
    other extreme of what you were pointing out, there are scientists who
    question their own existence unless their models come up with something.

    Since you are an expert in cluster analysis, how do you reconcile this
    discussion with the fact that there is no one correct cluster
    structure, that without the class variable, you can't directly measure
    the quality of fit, and that by most accounts, "interpretability"
    should drive the choice between different clusters, and that
    experimentation with many methodologies is encouraged?

    My position is summarized by my disgust at seeing cluster analysis
    described in a popular (non-scientific) text as "automatic cluster
    detection". There is nothing automatic about this procedure!
    Data Matter, Jun 18, 2005
  16. On Sat, 18 Jun 2005 00:58:29 -0300, Jerry Dallal
    I'd be disappointed in my friends in epi if they thought
    it was any "great failing" at all. The articles about the
    controlled study -- in the *newspapers* that I read -- made
    it clear that the controlled study was considered vital
    because the observational studies could not be considered
    definitive. (a) They were observational; (b) there were
    known (or strongly suspected) selection effects for those
    samples; and [the newspapers did not say this one nearly
    as clearly] (c) the effect sizes were small enough that
    they could be reasonably explained by a confounding
    variable. IIRC, effect sizes not more than the 150% Odds
    Ratio for mortality that *I* consider moderately easy for
    confounders to account for. The small effect size (in both
    absolute and relative terms) is why the sample had to be
    10 thousand or so, which is why it wasn't done earlier.

    By the way, in the controversy on *smoking*, (c), small
    effects, was never at issue for lung cancer; and (b) selection
    effects and hypothesized confounds were systematically
    eliminated. Also, biological arguments were strengthened
    over time. Heart disease is considered the bigger killer from
    smoking, but it was the tougher argument to make because
    the OR for deaths is only 2.0 rather than 5.0 or more.
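As a side note for readers following the odds-ratio figures in this post, an OR is computed from a 2x2 exposure-by-outcome table. The counts below are made up solely to show the arithmetic; they are not from any study discussed in this thread.

```python
# Hypothetical 2x2 table (invented counts, not from any cited study):
#
#               died    survived
#   exposed       30        970
#   unexposed     15        985
a, b = 30, 970   # exposed group: deaths, survivors
c, d = 15, 985   # unexposed group: deaths, survivors

odds_exposed = a / b             # odds of death given exposure
odds_unexposed = c / d           # odds of death given no exposure
odds_ratio = odds_exposed / odds_unexposed

print(f"OR = {odds_ratio:.2f}")  # roughly 2 with these invented counts
```

An OR around 2.0 (the figure quoted above for heart disease) corresponds to a much smaller absolute difference in risk than an OR of 5.0 or more, which is why modest ORs are easier for an unmeasured confounder to explain away.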

    HRT is, for many women, a physically harsh treatment. Women
    who "stayed the course" -- in the earlier, misleading,
    observational studies -- were the ones who were particularly
    (conventionally) knowledgeable and careful about their health.
    I thought that was sufficient explanation, but of course they
    should be trying to document *exactly* what mattered.
    [classical 1965 essay on "The Environment and Disease:
    Association or Causation".]
    So, Bob has finally weighed in, to answer my oft-repeated
    question. He does not credit much of epidemiology.
    That says more about him than about epidemiology.

    More -
    The 70 mph speed limit was adopted for saving gas, during
    the energy crisis. Arguments about "lives saved by lower
    speeds" were always considered doubtful by rational
    analysts who pointed to disparity of speeds as being the
    bigger hazard on highways. [The counter to that was,
    for a time, "Get stricter enforcement."]

    I do not understand the grounds on which Bob seems to
    place credence in studies exonerating 70 mph. The
    ones that I have seen, which were, indeed, credible,
    amassed inferences across circumstances. In their totality,
    they are still far less convincing than the comprehension
    and scope of the evidence about smoking.
    Richard Ulrich, Jun 18, 2005
  17. Reef Fish

    Jerry Dallal Guest

    [Disclaimer: As I stated in my original post, my discussion of HRT is
    based on what my epidemiologist colleagues tell me. I don't do research
    in HRT, myself.]


    I *have* seen some NIH submissions that would curl your hair and I don't
    know your epidemiologist friends, but in my neck of the woods we don't
    put thousands of people into a study where the active treatment is
    expected to be harmful.

    Before the RCTs that made the news in 2002, it was generally felt that
    HRT had a favorable impact on CHD...or maybe not,...but not harmful with
    respect to CHD. The big debate was whether the expected decrease in
    risk of CHD outweighed what was then considered a small increase of the
    risk of breast cancer. The fact that the controlled studies showed
    increased risk of CHD came like a bolt from the blue.

    What might seem like a small digression regarding beta carotene: In the
    CARET study, it was found that those taking beta carotene had, contrary
    to expectation, an increased risk of lung cancer. There, the risk was
    for heavy smokers. It is now generally felt that beta carotene is not
    harmful for nonsmokers, but is not a supplement to be taken by smokers.
    So, there's a ready explanation for the inconsistency between the
    CARET data and observational data suggesting that beta carotene might be
    protective. The "smoker" hypothesis seemed to be playing out in other
    studies that were ongoing at the time, last time I looked.

    To get back to HRT, there is yet to be a similar convincing explanation
    of the reason for the discrepancy between the observational and
    controlled HRT data.
    While Bob is fully capable of defending himself, Bob has always been one
    of those who wants things demonstrated through controlled experiments.

    Epidemiology works by looking at a possibly causal relationship in a
    wide variety of settings. If the same thing keeps manifesting itself,
    you start to expect it, like Kafka's Leopards at the Temple ("Leopards
    break into the temple and drink to the dregs what is in the sacrificial
    pitchers; this is repeated over and over again; finally it can be
    calculated in advance, and it becomes a part of the ceremony.")

    Epidemiology suggested beta carotene might protect against lung cancer.
    The CARET study is undertaken. Ooops! The epidemiology doesn't apply
    to heavy smokers.

    Epidemiology suggests that HRT may be protective against CHD. Many
    intervention trials are begun. Ooops! But, here, the reason is
    unclear, so much so that the "estrogen only" arms of some trials were
    allowed to continue under the theory that it might have been the
    combination with progestin that was harmful. This is where the problem
    lies. If this paradigm for establishing causality is to continue being
    useful, we have to be able to identify the reasons why it fails.

    To my knowledge, there is no instance where RCTs have led to *dramatic*
    failures, that is, establishing as effective therapies that became
    widely accepted only to be shown to be harmful later on.

    Jerry Dallal, Jun 18, 2005
  18. On 17 Jun 2005 06:59:44 -0700, "Reef Fish"
    Well, Bob failed to give a useful reference, earlier, so I
    couldn't find a damn thing when I looked.

    Here is what I posted, which Bob disparages above.
    Bob had quoted a few lines from his Encyclopedia article,
    which sounded very reasonable, but not useful. --
    ==== Jun 1, 7:00 pm, from my post
    RU >
    "I have not yet seen the article, but the quote sounds like it
    might leave intact my distinction between developing a
    prediction equation numerically, versus building a model
    while making use of the sense of the variables. "
    ==== end
    I had drawn a distinction between creating "prediction equations"
    and "models" of the sort that assign meaning to variables; and,
    Yes, I offered the speculation that Bob was doing the former.
    Bob never offered any re-write of the idea.

    Bob summarizes badly, totally ignoring the specific context
    (above) for the word "model." Summarizes carelessly,
    disingenuously, dishonestly.

    RF >
    Bob won't clarify what I am mis-representing, so far as I can tell.
    He has been saying this for a while.
    ==== May 30, 7:58, from Bob's post
    RU > ... And, my
    RF >
    Your LIE was, and IS, misdirecting the attention to prediction
    equations and model building, rather than facing your ABUSE
    and IGNORANCE about the "expected SIGNS" of coefficients.

    How many times will that have to be
    repeated before it sinks in?
    I would prefer a little explication, rather than repetition.
    Bob repeats. Dull. Unappetizing. Unpleasant.

    Is it misdirection to distinguish the modeling?
    It seems to me to be the core of our difference.
    My model building leads to "expected SIGNS".
    If it is not the core, why isn't it?

    This is what led me to conclude that *I* did not know
    what Bob was talking about, and it was hardly worth
    pursuing more.

    It does cross my mind that my earliest posts on the subject
    were less focused than now; I had never generated these
    arguments before, so there was value for me in working
    out the expressions, such as "epidemiology as model
    for social sciences." I now wonder if Bob attached
    himself to some particular sentence of mine, with his own
    peculiar reading of it, and I wonder if he is now angry
    that I don't say it again or defend it. Well, the re-formulation
    *is* the defense, at least until Bob cites the problem.

    It is my impression that Bob has come closer to making
    relevant remarks in the last few days, as he stumbles
    all over the content of real studies.

    [snip, rest, including a nice description of Bob's fine
    interactive analysis program.]
    Richard Ulrich, Jun 18, 2005
  19. Reef Fish

    Jerry Dallal Guest

    [Disclaimer: As I stated in my earlier posts, my discussion of HRT is
    based on what my epidemiologist colleagues tell me. I don't do research
    in HRT, myself.]

    I don't have a problem. I have a point. The point is that if the
    epidemiology had suggested the kinds of increased risks uncovered in the
    RCTs, the RCTs would never have been done on such a grand scale. The
    RCTs contradicted the observational studies by showing increased risk of
    CHD where observational studies suggested a benefit. To my knowledge,
    the reason for the discrepancy has not been resolved. [FWIW, it is my
    understanding that the RCTs pretty much supported the other claims for
    HRT based on observational data.]

    This is important because up to the end of the 20th century, the
    conventional wisdom was that HRT was likely to be beneficial with
    respect to CHD. You can verify this for yourself through a simple
    search in Google Scholar. A benefit had been suggested by so many
    disparate studies that it is my understanding, from those working in
    HRT, that the benefit was approaching the levels required to establish
    "cause and effect" by those who work with observational data. The RCTs
    were to be the proverbial icing on the cake.

    I'm happy to discuss this further, but unless this is your area of
    application/expertise, in which case I'd yield to your expertise, I'd
    ask that first you speak with your colleagues who work in the area, as I
    have done.

    Jerry Dallal, Jun 18, 2005
  20. Here's the ethical dilemma, which is the flip side of
    randomizing folks to "smoker/ non-smoker": If EVERYONE
    becomes convinced that a treatment is good, based on
    observational data, it becomes, arguably, "unethical" to
    deprive subjects of the treatment. You can't be an ethical
    scientist and still participate in the trials, if you are *too*
    easily convinced. I try to keep a healthy skepticism about
    all the "observational" results that have small effect sizes
    (many of the huge ones) or marginal p-values (think of
    data dredging).

    A lot of treatments are tested where there might be a
    negative outcome. Is your problem with the "thousands"?
    What was expected was a positive effect, *small* in
    absolute or relative terms, thus requiring a large sample.
    The result was "small, negative." The risk of doing no-study
    was that an ongoing treatment *might* have zero or
    negative effects.

    JD >
    I have said before that I credit Bob's 1982 JASA review with
    helping to sustain a healthy distrust of observational studies.
    HRT did have a speculative biological explanation, I think,
    but that was not convincing, either.

    I didn't think the change in outcome was unusually large.
    I thought that the reaction was chagrin, not shock -- The
    public health media does oversell every outcome, whether
    most scientists are well on-board or not; and the scientists
    get blamed whenever there is revision.

    [snip, beta carotene/ CARET; a little more.]

    RU > >
    Maybe that was an awkward transition by me. Bob's problem
    is that he "wants things demonstrated through controlled
    experiments" and nothing else will do. Certainly, Randomized
    Control (RC) is the better way. - My own epidemiology example
    had to do with detecting what food was responsible for food
    poisoning at a picnic, where the data are post-hoc and observational.
    Smoking is another example.

    Certainly, RC is not the only way. Bob seems to reject the
    smoking evidence, which was never controlled. Confusing
    to me, now, he seems to accept the 70 mph studies, where
    RC is missing, and hard to even conceive of. Bob mentions
    "control" but that must be statistical control, which is what
    so much model building and testing is about.

    [snip, beta carotene; more on HRT ... ]
    Richard Ulrich, Jun 19, 2005