Is there a better way to write the for loop for my case in R?
Problem statement: the data set has about 600,000 rows, and the two columns of interest are mstr_program_list and loc_cat. The loc_cat column holds both missing and non-missing cells; the other columns have no NAs. For each program in mstr_program_list, I need to find the total number of loc_cat rows associated with the program, the percentage of non-missing rows, and, among the non-missing rows, the number of categories they are divided into.
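Example: for the unknown program, the total number of rows is 3 and the number of missing rows in loc_cat is 1, so the percentage of non-missing rows is (2/3)*100 = 66.7, and the non-missing rows are divided into 2 categories (rests: full and rests: lim).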
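The arithmetic in this example can be checked with a short base R sketch. It assumes the loc_cat values of the unknown program are held in a character vector with empty cells read in as NA (which is what `na.strings = ""` in the question's read.csv() call does); the variable names are illustrative only.

```r
# loc_cat values for the "unknown" program, taken from the sample rows;
# NA marks a missing cell.
loc_cat_unknown <- c(NA, "rests: full", "rests: lim")

total_count <- length(loc_cat_unknown)           # all rows for the program
good_count  <- sum(!is.na(loc_cat_unknown))      # non-missing rows
pct_good    <- round(good_count / total_count * 100, 1)

# distinct categories among the non-missing rows only
n_loc_cat <- length(unique(loc_cat_unknown[!is.na(loc_cat_unknown)]))

print(c(total_count = total_count, pct_good = pct_good, n_loc_cat = n_loc_cat))
```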
```
> head(data)
          l.name mstr_program_list     loc_cat
1     6 j'sgroup           unknown        <NA>
2    bj's- maine     roasted tomat rests: full
3    bj's- maine           unknown rests: full
4     brad's q q           unknown  rests: lim
```
Expected output:

```
mstr_prog  total_count  %good (non-missing rows)  number of loc_cat
unknown    3            66.7                      2
```
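One way to produce the expected output without an explicit loop is base R's tapply(), which groups loc_cat by program in one pass per statistic instead of scanning the whole column once per program. This is a sketch, not a tested solution for the full file: the data frame is rebuilt by hand from the head(data) rows shown above (l.name dropped), assuming loc_cat is character with missing cells as NA, and the output column names (mstr_prog, total_count, pct_good, n_loc_cat) are hypothetical.

```r
# Sample rows rebuilt from head(data); l.name is omitted.
data <- data.frame(
  mstr_program_list = c("unknown", "roasted tomat", "unknown", "unknown"),
  loc_cat           = c(NA, "rests: full", "rests: full", "rests: lim"),
  stringsAsFactors  = FALSE
)

prog    <- data$mstr_program_list
cat_col <- data$loc_cat

# Each tapply() call groups loc_cat by program and applies the function
# to every group; all three use the same grouping, so results line up.
total_count <- tapply(cat_col, prog, length)
pct_good    <- tapply(cat_col, prog, function(x) mean(!is.na(x)) * 100)
n_loc_cat   <- tapply(cat_col, prog, function(x) length(unique(x[!is.na(x)])))

seg_analysis <- data.frame(
  mstr_prog   = names(total_count),
  total_count = as.vector(total_count),
  pct_good    = round(as.vector(pct_good), 1),
  n_loc_cat   = as.vector(n_loc_cat),
  row.names   = NULL
)
print(seg_analysis)
```

On the real data, the hand-built `data` would be replaced by the read.csv() result from the question.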
The code below is taking a lot of time; in fact, the results never show up. Can someone help me improve it? In my view, the problem is the way the code adds values to vectors.
On researching, I came to know that to add values to a vector I should not use append() but go with c():

```r
v <- c(v, 'y')  # adding elements to a vector
```
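On the vector-growing point: append() is itself a thin wrapper around c(), so switching between them does not change the cost. With either one, each call builds a new vector and copies all existing elements, which is what makes growing a vector inside a long loop slow. This sketch compares growing, preallocating, and a fully vectorised expression; n and the values are arbitrary and only for illustration.

```r
n <- 10000

# Growing: every c() call allocates a new vector and copies all
# existing elements, so total work grows roughly quadratically with n.
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i * 2)
}

# Preallocating: the vector is allocated once and filled by index.
prealloc <- numeric(n)
for (i in seq_len(n)) {
  prealloc[i] <- i * 2
}

# Vectorised: no explicit loop at all.
vectorised <- seq_len(n) * 2

print(identical(grown, prealloc))
print(identical(prealloc, vectorised))
```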
Code:

```r
## import data; `classes` (the column classes vector) is defined elsewhere
data <- read.csv("mgdata.csv", header = TRUE, na.strings = "", colClasses = classes,
                 nrows = 600338, comment.char = "")
data_nonull <- na.omit(data)
mpl_unique <- unique(data$mstr_program_list)
mas_prog_list <- as.character()
loc_count <- as.numeric()
per_seg <- as.numeric()
num_seg <- as.numeric()
for (i in seq_along(mpl_unique)) {
  prog <- as.character(mpl_unique[i])  # compare against the program name, not the index i
  l_t <- length(data$mstr_program_list[data$mstr_program_list == prog])  # all rows for this program
  l_g <- length(data_nonull$mstr_program_list[data_nonull$mstr_program_list == prog])  # non-missing rows only
  s <- subset(data_nonull, mstr_program_list == prog, select = c(loc_cat))
  if (any(prog == mas_prog_list) == FALSE) {
    no_seg <- nrow(unique(s))  # distinct loc_cat categories
    mas_prog_list <- c(mas_prog_list, prog)  # adding values to the vector
    loc_count <- c(loc_count, l_t)
    perct_seg <- (l_g / l_t) * 100
    per_seg <- c(per_seg, perct_seg)
    num_seg <- c(num_seg, no_seg)
  }
}
seg_analysis <- data.frame(mas_prog_list, loc_count, per_seg, num_seg)
```
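If you prefer to keep the loop, a sketch of a faster version splits loc_cat by program once with split() and fills preallocated result vectors, so each iteration only touches that program's rows. The loop above instead compares the full column against the current program three times per program (for l_t, l_g and the subset() call). This reuses the question's result names (mas_prog_list, loc_count, per_seg, num_seg); the sample data frame is rebuilt by hand from head(data), assuming loc_cat is character with NA for missing cells.

```r
# Sample rows rebuilt from head(data); l.name is omitted.
data <- data.frame(
  mstr_program_list = c("unknown", "roasted tomat", "unknown", "unknown"),
  loc_cat           = c(NA, "rests: full", "rests: full", "rests: lim"),
  stringsAsFactors  = FALSE
)

# One pass over the data: a named list with the loc_cat values of each program.
by_prog <- split(data$loc_cat, data$mstr_program_list)

# Preallocate the result vectors once instead of growing them with c().
n_prog    <- length(by_prog)
loc_count <- integer(n_prog)
per_seg   <- numeric(n_prog)
num_seg   <- integer(n_prog)

for (k in seq_along(by_prog)) {
  x <- by_prog[[k]]                                # loc_cat for one program
  loc_count[k] <- length(x)                        # total rows
  per_seg[k]   <- sum(!is.na(x)) / length(x) * 100 # % non-missing
  num_seg[k]   <- length(unique(x[!is.na(x)]))     # distinct categories
}

seg_analysis <- data.frame(
  mas_prog_list = names(by_prog),
  loc_count     = loc_count,
  per_seg       = round(per_seg, 1),
  num_seg       = num_seg
)
print(seg_analysis)
```

split() silently drops rows whose grouping value is NA, so this relies on mstr_program_list having no missing values, which the question states.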
I am new to R, so please correct me on the changes needed in the code, and on the naming conventions and terminology used.