Complex(?) Data restructuring

Discussion in 'SPSS' started by Simon, Oct 18, 2011.

  1. Simon

    Simon Guest


    I was hoping to get some help restructuring some data into a format
    that I can use for further analysis. It might involve SPSS creating a
    new dataset for each run? Currently my data looks like this...

    Season Year Month Day Stanton
    Summer 1958 1 1 0
    Summer 1958 1 2 0
    Summer 1958 1 3 3

    ...but I need it to look like this...

    1958 1959 1960
    0 1 0
    0 0 0
    3 0 3

    Where each new dataset represents a single season (summer, autumn,
    winter, spring), and under each year are the 90ish days that make up
    that season (each value represents rainfall). Possible?

    I hope that makes sense.

    Simon, Oct 18, 2011

  2. Simon

    David Guest

    Your example input does not map to your example output in any obvious
    way. Please repost your query with a more illustrative example.
    David, Oct 18, 2011

  3. Simon

    Rich Ulrich Guest

    I agree with David, that making sense of this is hard.
    But I *think* that I figured it out.

    Your present record has Season Year Month Day Rain,
    as you show, but changing the last var-name.

    What you need to create, as a first step, is a file with
    records that have Season Month Day Rain58 Rain59 Rain60 .
    Your example drops the month-day identifiers, but that's
    not a good idea, whether you intended it or not.

    You can do this step by using Cases-to-vars, after
    sorting the file by Season Month Day Year.

    Then you can write out four files by using DO IF
    to test for seasons, with the four XSAVE commands.
    (You may drop the Season variable, if you want to, but I
    would keep it around to help confirm identification).
    Rich Ulrich, Oct 18, 2011
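
    A minimal sketch of the two steps Rich describes, assuming the rainfall variable is named Stanton (as in the original post) and that Season is a string variable holding "Summer", "Autumn", "Winter" and "Spring". The output file names are hypothetical.

    ```
    * Step 1: one row per Season, Month and Day with one rainfall column per year.
    SORT CASES BY Season Month Day Year.
    CASESTOVARS
      /ID=Season Month Day
      /INDEX=Year
      /SEPARATOR="_".

    * Step 2: write each season to its own file (file names are placeholders).
    DO IF (Season EQ "Summer").
    XSAVE OUTFILE="summer.sav".
    ELSE IF (Season EQ "Autumn").
    XSAVE OUTFILE="autumn.sav".
    ELSE IF (Season EQ "Winter").
    XSAVE OUTFILE="winter.sav".
    ELSE IF (Season EQ "Spring").
    XSAVE OUTFILE="spring.sav".
    END IF.
    EXECUTE.
    ```

    With /INDEX=Year the new columns should be named from the original variable plus the year value (Stanton_1958, Stanton_1959 and so on). XSAVE is a transformation, so nothing is written until the EXECUTE (or another procedure) runs.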
  4. Simon

    Simon Guest

    Thank you both. But perhaps I am going about this the wrong way...

    Let me explain: the headers (1st row) contain day, month, year,
    season, climate type, gauge station 1, gauge station 2 etc., rainfall
    station 1, rainfall station 2 etc.

    What I want to look at is trends in rainfall and river flow on both a
    seasonal and an annual basis for each climate type (12 of these) (this
    is easy) BUT the clincher is I want to look at certain percentiles for
    flow (essentially extreme events) and rainfall.

    So for example, flow trends for summers ranging from 1958-2002 -
    climate type 1, gauging station 1 then 1958-2002 - climate type 2,
    gauging station 1 etc

    Does that make sense? Cheers, appreciate your help! :)
    Simon, Oct 21, 2011
  5. Simon

    David Guest

    First thought (sans coffee) is that the data are currently in the
    appropriate format and you *DO NOT* want to go long to wide.
    I would first deal with any missing time points (are these daily
    Modelling: Seasonal ARIMA models come to mind (separately for each
    climate type). Steep learning curve if you are not familiar with the
    HTH, David
    David, Oct 21, 2011
  6. Simon

    Simon Guest

    Thanks mate, sorry for my late reply. Big weekend in New Zealand.

    I am familiar with ARIMA models but all I want to do is a Kendall's
    tau test - just keeping it simple at this stage. Cheers
    Simon, Oct 24, 2011
  7. Simon

    Bruce Weaver Guest

    Bruce Weaver, Oct 24, 2011
  8. Simon

    Simon Guest

    So long to wide, any advice? unless I can filter the dataset by
    season, kst and percentile all at once?
    Simon, Oct 26, 2011
  9. Simon

    David Guest

    COMPUTE FILTER=((season EQ ? ) AND (kst EQ ?) AND (percentile EQ ?)).
    do whatever.
    Rarely advisable to take data from long to wide.
    If you insist see CASESTOVARS.
    David, Oct 27, 2011
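
    A minimal sketch of the filtering approach, assuming season is a string variable and kst is numeric. The variable name keep and the values "Summer" and 1 are placeholders, not taken from the thread. A percentile condition can be added in the same way once such a variable exists.

    ```
    * Flag the cases of interest (1 = keep, 0 = drop).
    COMPUTE keep = ((season EQ "Summer") AND (kst EQ 1)).
    * Procedures run after this point use only the flagged cases.
    FILTER BY keep.
    * Run the analysis of interest here.
    * Restore the full data set afterwards.
    FILTER OFF.
    ```

    FILTER only hides cases temporarily; nothing is deleted, so the long-format file stays intact for the next season or climate type.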
  10. Simon

    Simon Guest

    I know how to filter the data; what I really need to do is calculate
    the 1, 5, 10, 90, 95 and 99 percentiles for each season by year. So for
    summer 1959 I would have the 1, 5, 10, 90, 95 and 99, then for 1960
    etc., then the same again but swapping summer for KST type by
    year...does that make sense? I have no problem calculating and doing
    all these steps individually but I can't recycle the output for further
    analysis. It is a real shame SPSS doesn't (or does?) have a 'by group'
    function like Statistica - which again I am also using, but the same
    problem occurs, that is not being able to recycle data for further use.
    Simon, Oct 27, 2011
  11. Simon

    David Guest

    "It is a real shame SPSS doesn't (or does?) have a 'by group' function
    like Statistica -"
    It is a real shame that you don't RTFM a little more carefully?
    See SPLIT FILE ;-)
    See RANK ... BY ... / NTILES (100) / INTO ....
    HTH, David
    David, Oct 28, 2011
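
    A minimal sketch of both suggestions, assuming the rainfall variable is named Stanton as in the original post and that the grouping variables are Season and Year. The new variable name pctile is hypothetical.

    ```
    * Option 1: percentile values per group, shown in the Viewer.
    SORT CASES BY Season Year.
    SPLIT FILE LAYERED BY Season Year.
    FREQUENCIES VARIABLES=Stanton
      /FORMAT=NOTABLE
      /PERCENTILES=1 5 10 90 95 99.
    SPLIT FILE OFF.

    * Option 2: assign every case to one of 100 percentile groups within each Season and Year.
    RANK VARIABLES=Stanton BY Season Year
      /NTILES(100) INTO pctile
      /PRINT=NO
      /TIES=MEAN.
    ```

    Option 1 reports the percentile cut-offs; option 2 adds a variable to the active data set that can be used directly in later selections (for example, cases with pctile of 95 or above).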
  12. Simon

    Simon Guest

    I actually figured that out myself :p OK, final question: is it
    possible to recycle the output for further analysis? Cheers.
    Simon, Oct 31, 2011
  13. Simon

    Jon Peck Guest

    Any output table can be captured as an SPSS dataset or in many other formats by using OMS (the Output Management System). Some procedures also provide direct ways of saving computed results as datasets.
    Jon Peck, Oct 31, 2011
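
    A minimal sketch of capturing percentile output with OMS, assuming the variables Stanton, Season and Year from earlier in the thread. The dataset name pctiles is hypothetical, and the layout of the captured table may need some tidying before further analysis.

    ```
    * Route the FREQUENCIES Statistics table into a new dataset.
    DATASET DECLARE pctiles.
    OMS
      /SELECT TABLES
      /IF COMMANDS=['Frequencies'] SUBTYPES=['Statistics']
      /DESTINATION FORMAT=SAV OUTFILE='pctiles' VIEWER=NO.

    SORT CASES BY Season Year.
    SPLIT FILE LAYERED BY Season Year.
    FREQUENCIES VARIABLES=Stanton
      /FORMAT=NOTABLE
      /PERCENTILES=1 5 10 90 95 99.
    SPLIT FILE OFF.

    * Close the OMS request, then switch to the captured results.
    OMSEND.
    DATASET ACTIVATE pctiles.
    ```

    The captured dataset is only complete after OMSEND has run, so it should not be activated before that point.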
  14. Simon

    David Guest

    Not sure what you mean by "recycle the output" in this context?
    RANK doesn't really create much in terms of output. It creates new
    variable(s) in the active data set. If you specify the BY keyword,
    the "ranks" are calculated independently for each stratum implied by
    the combined categories of the "BY" variable(s).
    So, what specifically are you attempting to achieve, if you would be so
    kind as to elaborate?
    David, Nov 1, 2011