# computing a within-subject variable

Discussion in 'SPSS' started by mattiaslarsen, Oct 13, 2011.

1. ### mattiaslarsen (Guest)

Dear Group,

In our study we investigate the prevalence of sex-selective abortions
in a context of son preference and are looking at births as outcomes
and repeated measures for each mother. We therefore have our data in
long format with multiple births per mother where mother is our
subject. One effect we are particularly interested in is whether a
birth has been preceded by the birth of a son since we think that this
will affect the odds of a birth being a son. If mothers have already
had a son at the time of a birth, they will be less likely to sex-
select through aborting girl fetuses.

I am having difficulty constructing a variable that we would
call ‘NoPrevSon’, and I am wondering if someone could help. We want to
compute a dichotomous variable with the value 1 if the mother has not
previously had a son, and the value 0 if she already has a son. We
have three other variables from which this could be computed: Subject
ID, which is the mother’s ID and lets us restrict the comparison to
births by the same mother (i.e. we are actually looking at a
‘within-subject’ effect); child’s age; and child’s gender.
For a woman with four children, of whom the first two were girls and
the third and fourth were boys, we would like the value of
‘NoPrevSon’ to be 1 for the first, second and third births, and 0 for
the fourth birth. How can we compute this?
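
To make the target concrete, the four-child example can be written out as a small dataset. This is only an illustration of the desired result, not a way of computing it: the variable birth_order and the coding 0 = girl, 1 = boy are assumptions introduced for the example.

```spss
* Hypothetical illustration only; birth_order and the sex coding are assumptions.
* kid_sex is taken to be 0 for a girl and 1 for a boy.
data list free / momid birth_order kid_sex NoPrevSon.
begin data
1 1 0 1
1 2 0 1
1 3 1 1
1 4 1 0
end data.
formats momid birth_order kid_sex NoPrevSon (f1.0).
list.
```

Note that the third birth still gets 1: the child is a boy, but no son had been born before him.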

Best
Mattias

mattiaslarsen, Oct 13, 2011

2. ### Andy W (Guest)

Although there are likely many ways of doing this, I am going to
suggest you aggregate the oldest son's age to the mother id. Then it
is as simple as using an if statement to check whether the child in
the current record is younger than the oldest son, in which case they
have an older brother. What follows is an example in syntax.

*******************************************.
*Making similar data.
data list free / momid kid_age kid_sex (3F2.0).
begin data
1 5 0
1 7 1
1 9 1
1 12 0
2 2 1
2 3 0
3 5 0
3 5 1
4 7 0
4 10 1
end data.

*Let us say that for the kid_sex variable 0 is a female and 1 is a male.

*First get the age of the OLDEST SON.

*I just need to pick an age for girls that is smaller than any
possible age for boys.
compute sonONLYage = kid_age.
if kid_sex = 0 sonONLYage = -1.
execute.

*Here I aggregate the maximum value within momid, so it ends up being
the age of the oldest son.
AGGREGATE
/BREAK=momid
/sonONLYage_max=MAX(sonONLYage).

*Now it is as simple as checking whether the current child is at least
as old as the oldest son.
if kid_age >= sonONLYage_max NoPrevSon = 1.
if kid_age < sonONLYage_max NoPrevSon = 0.
execute.
******************************************.
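
The same logic could also be written with the AGGREGATE target spelled out and the two IF statements folded into one COMPUTE. The following is a minimal sketch, assuming a version of SPSS that accepts MODE=ADDVARIABLES on AGGREGATE; the names son_age, oldest_son_age and NoPrevSon_alt are hypothetical and chosen so the block can run after the syntax above without clashing with its variables.

```spss
* Sketch only; the names son_age, oldest_son_age and NoPrevSon_alt are hypothetical.
* Assumes kid_sex is coded 0 = female and 1 = male, as in the example data.
compute son_age = kid_age.
if (kid_sex = 0) son_age = -1.
aggregate
  /outfile=* mode=addvariables
  /break=momid
  /oldest_son_age=max(son_age).
* A logical expression returns 1 or 0, so one COMPUTE can stand in for the two IFs.
compute NoPrevSon_alt = (kid_age >= oldest_son_age).
execute.
```

Because both forms test the same condition, NoPrevSon_alt should match NoPrevSon on the example data. A mother with no sons ends up with oldest_son_age = -1, so all of her children get 1.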

If you have any other questions, it would help if you provided an
example dataset with fake data that resembles your variables &
variable names.

Andy W

ps - Thank you for writing a well-formulated (and appropriately
edited) question.

Andy W, Oct 13, 2011

3. ### Bruce Weaver (Guest)

I think this produces the same result without the AGGREGATE.

sort cases by momid (a) kid_age(d).
compute NoPrevSon = ($casenum EQ 1) or momid NE lag(momid).
if (NoPrevSon EQ 0) NoPrevSon = lag(kid_sex) EQ 0.
list.

OUTPUT from LIST:

momid kid_age kid_sex NoPrevSon

1 12 0 1
1 9 1 1
1 7 1 0 - second kid was male
1 5 0 0

2 3 0 1
2 2 1 1

3 5 0 1
3 5 1 1

4 10 1 1
4 7 0 0 - first kid was male

Number of cases read: 10 Number of cases listed: 10

Bruce Weaver, Oct 13, 2011

4. ### Bruce Weaver (Guest)

Oops...my code worked for the sample data set Andy generated, but not
for a case like MOMID=5 in the following dataset, where a child follows
a sister but already has an older brother. Here is some revised
code that *does* appear to work properly.

data list free / momid kid_age kid_sex (3F2.0).
begin data
1 5 0
1 7 1
1 9 1
1 12 0
2 2 1
2 3 0
3 5 0
3 5 1
4 7 0
4 10 1
5 10 0
5 9 1
5 8 0
5 7 1
5 6 0
5 5 1
end data.

sort cases by momid (a) kid_age(d).
compute NoPrevSon = ($casenum EQ 1) or momid NE lag(momid).
do if (momid EQ lag(momid)).
- do if lag(NoPrevSon) EQ 0.
- compute NoPrevSon = 0.
- else.
- compute NoPrevSon = lag(kid_sex) EQ 0.
- end if.
end if.
formats NoPrevSon(f1.0).
list.

OUTPUT:
momid kid_age kid_sex NoPrevSon

1 12 0 1
1 9 1 1
1 7 1 0
1 5 0 0

2 3 0 1
2 2 1 1

3 5 0 1
3 5 1 1

4 10 1 1
4 7 0 0

5 10 0 1
5 9 1 1
5 8 0 0
5 7 1 0
5 6 0 0
5 5 1 0

Number of cases read: 16 Number of cases listed: 16
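
To see where the earlier version goes wrong, it can be recomputed under a different name and compared with the revised result. This is a sketch only: NoPrevSon_v1 and differs are hypothetical names, and it assumes the 16-case dataset above is active, still sorted, and already contains NoPrevSon from the revised code.

```spss
* Sketch only; NoPrevSon_v1 and differs are hypothetical names.
* Assumes the data are still sorted by momid (a) kid_age (d).
compute NoPrevSon_v1 = ($casenum EQ 1) or (momid NE lag(momid)).
* The first version looks only at the immediately preceding sibling.
if (NoPrevSon_v1 EQ 0) NoPrevSon_v1 = (lag(kid_sex) EQ 0).
* Flag the cases where the two versions disagree.
compute differs = (NoPrevSon_v1 NE NoPrevSon).
formats NoPrevSon_v1 differs (f1.0).
list variables=momid kid_age kid_sex NoPrevSon NoPrevSon_v1 differs.
```

Working through the logic by hand, the two versions should disagree only for the children aged 7 and 5 of MOMID=5. Each of them follows a sister, so the first version resets the value to 1 even though an older brother already exists; the revised code carries the 0 forward instead.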

Bruce Weaver, Oct 13, 2011

5. ### Bruce Weaver (Guest)

Ray Koopman pointed out to me off-line that the nested DO-IF above could
be reduced to a single COMPUTE, as follows:

compute noprevson2 =
  ($casenum EQ 1) OR
  (momid NE LAG(momid)) OR
  (LAG(noprevson2)*(1-LAG(kid_sex))).

OUTPUT:
momid kid_age kid_sex NoPrevSon noprevson2
1 12 0 1 1
1 9 1 1 1
1 7 1 0 0
1 5 0 0 0
2 3 0 1 1
2 2 1 1 1
3 5 0 1 1
3 5 1 1 1
4 10 1 1 1
4 7 0 0 0
5 10 0 1 1
5 9 1 1 1
5 8 0 0 0
5 7 1 0 0
5 6 0 0 0
5 5 1 0 0

Number of cases read: 16 Number of cases listed: 16
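
One way to read the single COMPUTE is as a recurrence within each mother. The notation below is introduced here for illustration and is not from the thread: $N_i$ stands for noprevson2 on case $i$ and $s_i$ for kid_sex on case $i$ (0 = female, 1 = male), with cases sorted from oldest to youngest child.

```latex
N_i =
\begin{cases}
1, & \text{if case } i \text{ is the mother's first listed child},\\[4pt]
N_{i-1}\,(1 - s_{i-1}), & \text{otherwise.}
\end{cases}
\qquad\Longrightarrow\qquad
N_i = \prod_{j < i} (1 - s_j)
```

The product runs over the same mother's earlier children, so $N_i$ stays at 1 only while every earlier sibling is a girl, and once it drops to 0 it stays at 0 for all of her later children.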

Thanks Ray.

Bruce Weaver, Oct 14, 2011

6. ### Mattias (Guest)

Thanks a lot! This really helps.

/Mattias Larsen

Mattias, Oct 17, 2011