[PYTHON] Pasting factor variable values in R or Python based on date - creating school breaks
The following dataset (Break_data) was collected from the school calendar and gives the start and end date of each break.
print(Break_data)
Start End Break Year
1 2016-02-24 2016-02-29 Spring_Break 2016
2 2016-03-23 2016-03-28 Easter_Recess 2016
3 2016-10-05 2016-10-10 Mid_Term_Break 2016
4 2017-03-01 2017-03-06 Spring_Break 2017
5 2017-04-12 2017-04-17 Easter_Recess 2017
6 2017-10-04 2017-10-09 Mid_Term_Break 2017
7 2018-02-28 2018-03-05 Spring_Break 2018
8 2018-03-28 2018-04-02 Easter_Recess 2018
head(df$date)
[1] "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05"
tail(df$date)
[1] "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12"
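I am following the steps provided in this answer: https://stackoverflow.com/a/51052626/9341589
I would like to create a similar factor variable Break in my data frame df by comparing its dates against these ranges. df covers 2016-02-05 to 2018-07-12 and contains many other variables; the sampling interval is 15 minutes, so each day has 96 rows.
In my case, in addition to the values shown in this table, I want every date that does not fall between a Start and an End to be labeled as a non-break day.
Here is the R code, modified from the steps in the link above: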
library(lubridate)  # provides ymd()
Break_data$Start <- ymd(Break_data$Start)
Break_data$End <- ymd(Break_data$End)
df$date <- ymd(df$date)
LU <- Map(`:`, Break_data$Start, Break_data$End)
LU <- data.frame(value = unlist(LU),
index = rep(seq_along(LU), lapply(LU, length)))
df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]
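I think that, in addition to this, I need a for loop or a simple if statement that assigns Non_Break to the times that fall outside the Start and End ranges.
Edit: I tried it in two different ways.
FIRST - without using the mapping: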
for (i in c(1:nrow(df))){
if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29")
df$Break[i]<-"Spring_Break"
else if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28")
df$Break[i]<-"Easter_Recess"
else if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10")
df$Break[i]<-"Mid_Term_Break"
else if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06")
df$Break[i]<-"Spring_Break"
else if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17")
df$Break[i]<-"Easter_Recess"
else if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09")
df$Break[i]<-"Mid_Term_Break"
else if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05")
df$Break[i]<-"Easter_Recess"
else if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02")
df$Break[i]<-"Easter_Recess"
else (df$Break[i]<-"Not_Break")
}
The first one runs forever :) and I only get two values, Not_Break and Spring_Break.
And these are the warning messages:
Warning messages:
1: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
2: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
3: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
4: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
5: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
6: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
7: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >= ... :
the condition has length > 1 and only the first element will be used
8: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
the condition has length > 1 and only the first element will be used
(Warnings 9 to 50 repeat the same eight messages in the same order.)
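A plausible reading of these warnings, offered as a sketch rather than a confirmed diagnosis: in each condition the second comparison is written `df$date <= ...` without `[i]`, so it yields one value per row and `if` uses only the first one. The first date in df is 2016-02-05, which is earlier than every End date, so the first test effectively reduces to `df$date[i] >= "2016-02-24"`. That would explain why only Spring_Break and Not_Break appear. The Python snippet below imitates that behavior and contrasts it with the intended per-row test; the list `dates` is a made-up sample, not the real data.

```python
from datetime import date

# Illustrative sample only; the first entry mirrors the first date in df.
dates = [date(2016, 2, 5), date(2016, 2, 26), date(2016, 10, 7), date(2018, 3, 30)]
start, end = date(2016, 2, 24), date(2016, 2, 29)

# Imitates the loop as written: the upper-bound test always looks at dates[0],
# because only the first element of the vector comparison was used.
buggy = [
    "Spring_Break" if d >= start and dates[0] <= end else "Not_Break"
    for d in dates
]

# Intended logic: both bounds are tested against the current row.
fixed = ["Spring_Break" if start <= d <= end else "Not_Break" for d in dates]

print(buggy)
print(fixed)
```

In the R loop, the corresponding change would be to write `df$date[i]` on both sides of each `&`.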
SECOND - adding to the code from the link:
LU <- Map(`:`, Break_data$Start, Break_data$End)
LU <- data.frame(value = unlist(LU),
index = rep(seq_along(LU), lapply(LU, length)))
for (i in c(1:nrow(df))){
if (df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]])
else (df$date[i] >= "2016-02-05" & df$date <= "2018-07-12")
df$Break[i]<-"Not_Break"
}
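The second one also throws an error. Corrections to this code, or another implementation (in R or Python), would be appreciated.
Is there a more efficient way to do this?
Note: the dataset is publicly available at https://github.com/tomiscat/data.
Solution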
==============================
1.
library(lubridate)

# data
Break_data <- data.table::fread(
"
Start End Break Year
2016-02-24 2016-02-29 Spring_Break 2016
2016-03-23 2016-03-28 Easter_Recess 2016
2016-10-05 2016-10-10 Mid_Term_Break 2016
2017-03-01 2017-03-06 Spring_Break 2017
2017-04-12 2017-04-17 Easter_Recess 2017
2017-10-04 2017-10-09 Mid_Term_Break 2017
2018-02-28 2018-03-05 Spring_Break 2018
2018-03-28 2018-04-02 Easter_Recess 2018"
)

df <- data.frame(
  date = c("2016-02-05", "2016-02-05", "2016-02-05", "2016-02-05", "2016-02-05",
           "2016-02-05", "2016-02-26", "2016-10-07", "2018-03-30", "2018-07-12",
           "2018-07-12", "2018-07-12", "2018-07-12", "2018-07-12", "2018-07-12")
)

# mapping
Break_data$Start <- ymd(Break_data$Start)
Break_data$End <- ymd(Break_data$End)
df$date <- ymd(df$date)
LU <- Map(`:`, Break_data$Start, Break_data$End)
LU <- data.frame(value = unlist(LU),
                 index = rep(seq_along(LU), lapply(LU, length)))
df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]

# if not mapped (df$Break is NA), then set it to "Non_Break"
df$Break <- ifelse(is.na(df$Break), "Non_Break", df$Break)
df$Break <- factor(df$Break)

df
#>          date          Break
#> 1  2016-02-05      Non_Break
#> 2  2016-02-05      Non_Break
#> 3  2016-02-05      Non_Break
#> 4  2016-02-05      Non_Break
#> 5  2016-02-05      Non_Break
#> 6  2016-02-05      Non_Break
#> 7  2016-02-26   Spring_Break
#> 8  2016-10-07 Mid_Term_Break
#> 9  2018-03-30  Easter_Recess
#> 10 2018-07-12      Non_Break
#> 11 2018-07-12      Non_Break
#> 12 2018-07-12      Non_Break
#> 13 2018-07-12      Non_Break
#> 14 2018-07-12      Non_Break
#> 15 2018-07-12      Non_Break
Created on 2018-08-19 by the reprex package (v0.2.0).
Edit: full solution
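Since the question also allows Python, here is a minimal sketch of the same idea using only the standard library. It is not part of the original answer. The function names `build_lookup` and `label_dates` are hypothetical, the break ranges are copied from Break_data, and End is assumed to be inclusive, as in the R mapping. Each range is expanded into one dictionary entry per day, and any date without an entry falls back to "Non_Break".

```python
from datetime import date, timedelta

# Break calendar copied from Break_data: (Start, End, Break); End is inclusive.
BREAKS = [
    ("2016-02-24", "2016-02-29", "Spring_Break"),
    ("2016-03-23", "2016-03-28", "Easter_Recess"),
    ("2016-10-05", "2016-10-10", "Mid_Term_Break"),
    ("2017-03-01", "2017-03-06", "Spring_Break"),
    ("2017-04-12", "2017-04-17", "Easter_Recess"),
    ("2017-10-04", "2017-10-09", "Mid_Term_Break"),
    ("2018-02-28", "2018-03-05", "Spring_Break"),
    ("2018-03-28", "2018-04-02", "Easter_Recess"),
]


def build_lookup(breaks):
    """Expand each inclusive [start, end] range into one dict entry per day."""
    lookup = {}
    for start, end, name in breaks:
        day = date.fromisoformat(start)
        last = date.fromisoformat(end)
        while day <= last:
            lookup[day] = name
            day += timedelta(days=1)
    return lookup


def label_dates(dates, lookup, default="Non_Break"):
    """Return the break name for each ISO date string, or `default` if none matches."""
    return [lookup.get(date.fromisoformat(d), default) for d in dates]


lookup = build_lookup(BREAKS)
sample = ["2016-02-05", "2016-02-26", "2016-10-07", "2018-03-30", "2018-07-12"]
labels = label_dates(sample, lookup)
for d, label in zip(sample, labels):
    print(d, label)
```

Because the dictionary is built once and each row costs a single lookup, this avoids the row-by-row chain of comparisons from the first attempt, which matters with 96 rows per day.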
from https://stackoverflow.com/questions/51887163/pasting-factor-variable-values-in-r-or-python-based-on-date-creating-school-br by cc-by-sa and MIT license