R - preparing data for analysis
I'm new here, new to R, and new to statistics in general. I've got a simple dataset of 1 million rows in CSV format. There are 4 columns:
col1 - location
col2 - someone's name
col3 - date of visit
col4 - time of visit
When importing into R, the data is read into a data frame, and all the columns are character (I used str() to find the structure of the imported data, and class(), which is how I know it's a data.frame).
As you can see, none of them are numeric. I want to be able to do aggregations, e.g. count the number of visits per person, per day, per time, per location, or vice versa.
Do I need to manipulate the data outside R, e.g. import it into SQL and aggregate there, or can I do it in R?
I hope you can guide me in the right direction... Many thanks, Peddie
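Everything the question asks about can be done in R itself. As a minimal sketch of the import step, the snippet below reads a small inline sample (hypothetical rows and column names, standing in for the real file) and converts the date and time columns. It assumes dates look like 2012-05-01 and times like 09:30; the question does not state the formats, so adjust the `format` strings to match. With the real file, pass its path to `read.csv()` instead of using `text =`.

```r
# Hypothetical sample standing in for the real 1-million-row CSV;
# the column names and values are assumptions for illustration only
csv_text <- "location,name,date,time
ny,sally,2012-05-01,09:30
mi,jake,2012-05-02,14:15
ny,sue,2012-05-04,11:00"

# stringsAsFactors = FALSE keeps the text columns as character vectors
visits <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Assumed formats: date as YYYY-MM-DD, time as HH:MM
visits$date <- as.Date(visits$date, format = "%Y-%m-%d")
visits$datetime <- as.POSIXct(paste(visits$date, visits$time),
                              format = "%Y-%m-%d %H:%M", tz = "UTC")

str(visits)

# Counting does not need numeric columns: table() works on character data
visits_per_person <- table(visits$name)
print(visits_per_person)
```

Converting the date and time is only needed for date arithmetic or time-based grouping; plain counts work on the character columns as they are.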
I suggest getting familiar with the plyr package.
install.packages("plyr")
It will ask you to choose a CRAN mirror (a place to download from); choose the one closest to you. Then load the library:
library(plyr)
OK, let's say we have a data frame that looks like this:
> df
    name  day location
52  jake  wed       mi
25 sally  tue       ny
38   sue  fri       ny
45 sally  tue       mi
42   sue  mon       mi
17 sally  fri       ca
28  jake  tue       ny
14   sue thur       ca
47   jim  tue       mi
67   jim  tue       al
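To reproduce the example, the data frame can be rebuilt as below. The row names are copied from the printed output. Making `day` and `location` factors with these particular level orders is an assumption: it is inferred from the order of the grouped results that follow, since with plain character columns ddply would typically list the groups alphabetically.

```r
# Rebuild the example data frame; row names copied from the printed output.
# Factor level orders are an assumption inferred from the ddply results.
df <- data.frame(
  name = c("jake", "sally", "sue", "sally", "sue",
           "sally", "jake", "sue", "jim", "jim"),
  day = factor(c("wed", "tue", "fri", "tue", "mon",
                 "fri", "tue", "thur", "tue", "tue"),
               levels = c("mon", "tue", "wed", "thur", "fri")),
  location = factor(c("mi", "ny", "ny", "mi", "mi",
                      "ca", "ny", "ca", "mi", "al"),
                    levels = c("ca", "ny", "mi", "al")),
  row.names = c("52", "25", "38", "45", "42", "17", "28", "14", "47", "67"),
  stringsAsFactors = FALSE
)
print(df)
```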
We can ask how many times each location was visited:
> ddply(df, .(location), summarise, count=length(name))
  location count
1       ca     2
2       ny     3
3       mi     4
4       al     1
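For comparison, base R's `table()` can produce the same per-location counts without plyr. This is a swapped-in alternative, not what the answer uses. Only the location column is rebuilt here, and its level order is assumed from the ddply output above.

```r
# Location column from the example data (level order assumed from the ddply output)
location <- factor(c("mi", "ny", "ny", "mi", "mi", "ca", "ny", "ca", "mi", "al"),
                   levels = c("ca", "ny", "mi", "al"))

# table() counts the occurrences of each level;
# as.data.frame() turns the result into location/count columns
counts <- as.data.frame(table(location), responseName = "count")
print(counts)
```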
Or how many people visited each location on a particular day:
> ddply(df, .(location, day), summarise, count=length(name))
  location  day count
1       ca thur     1
2       ca  fri     1
3       ny  tue     2
4       ny  fri     1
5       mi  mon     1
6       mi  tue     2
7       mi  wed     1
8       al  tue     1
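A base R sketch of the two-way count is shown below, again as an alternative to plyr rather than the answer's method. A two-way `table()` includes every location/day pair, even those with zero visits, so those rows are dropped to match what ddply returns. The level orders are assumptions inferred from the ddply output above.

```r
# Day and location columns from the example data;
# level orders are inferred from the ddply output above
day <- factor(c("wed", "tue", "fri", "tue", "mon", "fri", "tue", "thur", "tue", "tue"),
              levels = c("mon", "tue", "wed", "thur", "fri"))
location <- factor(c("mi", "ny", "ny", "mi", "mi", "ca", "ny", "ca", "mi", "al"),
                   levels = c("ca", "ny", "mi", "al"))

# Cross-tabulate every location/day pair, including pairs with zero visits
by_loc_day <- as.data.frame(table(location = location, day = day),
                            responseName = "count")

# Keep only pairs that occur, as ddply does, and sort by location, then day
by_loc_day <- by_loc_day[by_loc_day$count > 0, ]
by_loc_day <- by_loc_day[order(by_loc_day$location, by_loc_day$day), ]
rownames(by_loc_day) <- NULL
print(by_loc_day)
```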
You should work through a thorough tutorial on plyr, but here is what the commands above are doing: ddply splits the data frame into the unique combinations of values in the columns you specify, and then summarises each piece using the function you specify (in our case, length).
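The split, summarise, combine idea behind ddply can be sketched in base R with `split()` and `sapply()`. This only illustrates the concept; it is not how plyr is implemented internally. Because the columns here are plain character vectors, the groups come out in alphabetical order rather than the order shown in the ddply output.

```r
# name and location columns from the example data, as plain character vectors
df <- data.frame(
  name = c("jake", "sally", "sue", "sally", "sue",
           "sally", "jake", "sue", "jim", "jim"),
  location = c("mi", "ny", "ny", "mi", "mi", "ca", "ny", "ca", "mi", "al"),
  stringsAsFactors = FALSE
)

# Split: one sub-data-frame per unique location (groups sorted alphabetically)
pieces <- split(df, df$location)

# Apply: summarise each piece, here by counting its rows with length()
counts <- sapply(pieces, function(piece) length(piece$name))

# Combine: collect the per-group results into a single data frame
result <- data.frame(location = names(counts), count = unname(counts),
                     stringsAsFactors = FALSE)
print(result)
```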
I hope this helps.