PROC SQL Missing values in left join

Discussion in 'SAS (Statistical Analysis Software)' started by Talbot Michael Katz, Oct 29, 2003.

  1. Hi, all.

    I am performing a left join like the following...

    proc sql;
    create table lj as select l.key, l.v1, r.v2
    from l left join r
    on l.key = r.key;
    quit;

    Since not every key value in l is represented in r, the values of v2 for
    those keys show up as missing. I want to replace the missing values with
    zeroes. I can do that by setting

    options missing = 0;

    before running the sql step. I wanted to find a way to do this explicitly
    inside the sql step with a single select. I tried the following, but it
    came out the same as the original version:

    proc sql;
    create table lj as select l.key, l.v1,
    (0 * (r.v2 is missing)) + (r.v2 * (r.v2 is not missing)) as v2
    from l left join r
    on l.key = r.key;
    quit;

    Here is what did work:

    proc sql;
    create table lj as select l.key, l.v1,
    case when r.v2 = . then 0 else r.v2 end as v2
    from l left join r
    on l.key = r.key;
    quit;

    Why doesn't the first substitution work in PROC SQL? It works inside a
    DATA STEP. Thanks!

    -- TMK --
     
    Talbot Michael Katz, Oct 29, 2003
    #1
    1. Advertisements

  2. if r.v2 is missing then (r.v2 * (r.v2 is not missing)) is missing [ missing
    * non-missing = missing ], and 0 + missing = missing.

    Case is much better. It's clear what you are doing and fewer multiplications
    should run faster ?

    BTW, the Base SAS function SUM ignores missings. Try not to confuse or mix
    with SQL group SUM function (which also ignores missings). SUM only returns
    missing if all contributing values are missing; tack on a ,0 to getting zero
    when inputs are all missing.... use with prudence.

    Compare and contrast a confusing mix of SUM in SQL.

    data myData;
    x=.; y=.; output; output;
    x=1; output;
    run;

    proc sql;
    select sum (x), sum(y) from myData;
    select sum (x,y) from myData;
    select sum (sum(x),sum(y)) from myData;
    quit;


    OPTIONS MISSING=0 is only for the treatment of a missing value when rendered
    for output. The value is still missing. (In you assignment expressions,
    you won't be getting nothing from nothing :)
     
    Richard A. DeVenezia, Oct 29, 2003
    #2
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.