Channel: Active questions tagged r - Stack Overflow

How to perform operations in Stata with different groups for each row

Say that I have these data:

clear all
input n str6 G1 str6 G2 v desired computed
1 "B" "A" 1 5 .
2 "A" "A" 2 5.5 .
3 "C" "A" 3 4.5 .
4 "A" "B" 4 2 .
5 "B" "B" 5 2.5 .
6 "C" "B" 6 1.5 .
end

n is observation number, G1 is group 1, G2 is group 2 (say class 1 and class 2), and v is value. desired is the desired output, and computed will be the attempt at the desired output.

My goal is to perform an operation in Stata, in this example an average, over all observations that had no contact with a given observation, i.e., that share neither its G1 nor its G2 (this also excludes the observation itself). For example, desired for observation 1 is the average of v over observations 4 and 6. (1, 2, and 3 are excluded because they share the same G2 as 1; 5 is also excluded because it shares the same G1 as 1.) So we sum the v of observations 4 and 6 to get 4+6=10 and divide by the number of observations, 2, to get 5.
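To make the rule concrete, here is a minimal check for observation 1 alone. It assumes the example data above are loaded in their original order, so that row 1 is n == 1:

```stata
* observations that share neither G1 nor G2 with observation 1
list n G1 G2 v if G1 != G1[1] & G2 != G2[1]

* mean of v over that subset, to compare with desired[1]
summarize v if G1 != G1[1] & G2 != G2[1], meanonly
display r(mean)
```

With the example data, the list should contain only observations 4 and 6.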

I think I can get what I want with the following code:

local N = _N
forvalues i = 1/`N' {
    preserve
    *create temp, which, when equal to 1, indicates the observations to make the calculation on
    gen temp = 1
    *save locals equal to the first and second group of `i'
    local temp_G1 = G1[`i']
    local temp_G2 = G2[`i'] 
    *make temp = 0 for observations that were in first and/or second group as `i'
    replace temp = 0 if G1=="`temp_G1'"
    replace temp = 0 if G2=="`temp_G2'"
    *compute sum on observations that have a temp equal to 1
    egen sum = total(v) if temp==1
    *fill in the sum for all obs
    egen sum_all = max(sum)
    *compute number in group
    egen num = total(temp) if temp==1
    egen num_all = max(num)
    *show the count (num_all is a variable, not a local macro, so index it)
    display num_all[`i']
    *save the average (sum divided by count) in a local
    local calc = sum_all[`i']/num_all[`i']
    restore
    *fill in the value from the local for row `i'
    replace computed = `calc' in `i'
}

However, this approach seems very long and inelegant. Is there a better way to go about this in Stata? I thought about using bys, but I couldn't figure it out. If it were only G1 or only G2, I think it would be easier, but both together seem problematic because of double counting: bys might include the same observation both in the G1 count and in the G2 count.
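One possible way around the double counting, offered only as a sketch, is inclusion-exclusion: start from the overall total, subtract the totals of the observation's G1 group and G2 group, and add back the G1 x G2 cell, which was subtracted twice. The same is done for the counts. The variable names tot_*, n_* and computed_ie are hypothetical helpers, not part of the original data:

```stata
* totals of v: overall, by G1, by G2, and by G1 x G2 cell
egen double tot_all = total(v)
egen double tot_G1  = total(v), by(G1)
egen double tot_G2  = total(v), by(G2)
egen double tot_G12 = total(v), by(G1 G2)

* matching counts of nonmissing v
egen n_all = count(v)
egen n_G1  = count(v), by(G1)
egen n_G2  = count(v), by(G2)
egen n_G12 = count(v), by(G1 G2)

* remove both groups, then add back the cell that was removed twice
gen double computed_ie = (tot_all - tot_G1 - tot_G2 + tot_G12) / (n_all - n_G1 - n_G2 + n_G12)
```

For observation 1 this gives (21 - 6 - 6 + 1) / (6 - 2 - 3 + 1) = 10 / 2 = 5, which matches desired. If no observation qualifies, the denominator is zero and the result is missing.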

I guess another way to ask the question is whether there is a way to apply a function to each observation/row, like R's apply family, or whether I need to use the clumsy loop approach I use here.
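For comparison, the per-row loop can at least be shortened by putting the condition in an if qualifier, which avoids preserve/restore and the temporary variables. This is a sketch of that idea, and computed_loop is a hypothetical variable name used so that computed is left untouched:

```stata
gen double computed_loop = .
forvalues i = 1/`=_N' {
    * mean of v over observations sharing neither G1 nor G2 with row `i'
    quietly summarize v if G1 != G1[`i'] & G2 != G2[`i'], meanonly
    * r(mean) is missing when no observation qualifies
    quietly replace computed_loop = r(mean) in `i'
}
```

This is still one pass over the data per observation, so it is shorter rather than fundamentally faster.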

