
[PYTHON] Pasting factor variable values in R or Python based on date - creating school breaks



The following dataset (Break_data) of break periods was collected from a school calendar.

 print(Break_data)

 Start        End          Break       Year
1 2016-02-24 2016-02-29   Spring_Break 2016
2 2016-03-23 2016-03-28  Easter_Recess 2016
3 2016-10-05 2016-10-10 Mid_Term_Break 2016
4 2017-03-01 2017-03-06   Spring_Break 2017
5 2017-04-12 2017-04-17  Easter_Recess 2017
6 2017-10-04 2017-10-09 Mid_Term_Break 2017
7 2018-02-28 2018-03-05   Spring_Break 2018
8 2018-03-28 2018-04-02  Easter_Recess 2018

head(df$date)
[1] "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05" "2016-02-05"

tail(df$date)
[1] "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12" "2018-07-12"
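For anyone trying this in Python, the Break_data printout above can be loaded with the standard library alone. This is a minimal sketch that assumes the text is pasted exactly as printed (R's row-number column first, whitespace-separated fields, ISO dates); `BREAK_TABLE` and `parse_breaks` are illustrative names, not part of the original question.

```python
from datetime import date

# Pasted from the print(Break_data) output; the first column is R's row number.
BREAK_TABLE = """
 Start        End          Break       Year
1 2016-02-24 2016-02-29   Spring_Break 2016
2 2016-03-23 2016-03-28  Easter_Recess 2016
3 2016-10-05 2016-10-10 Mid_Term_Break 2016
4 2017-03-01 2017-03-06   Spring_Break 2017
5 2017-04-12 2017-04-17  Easter_Recess 2017
6 2017-10-04 2017-10-09 Mid_Term_Break 2017
7 2018-02-28 2018-03-05   Spring_Break 2018
8 2018-03-28 2018-04-02  Easter_Recess 2018
"""


def parse_breaks(text):
    """Return (start, end, break_name) tuples; Year is dropped since Start implies it."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        _rownum, start, end, name, _year = line.split()
        rows.append((date.fromisoformat(start), date.fromisoformat(end), name))
    return rows


breaks = parse_breaks(BREAK_TABLE)
print(len(breaks), breaks[0][2], breaks[-1][1])  # 8 Spring_Break 2018-04-02
```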

I followed the steps given in this answer: https://stackoverflow.com/a/51052626/9341589

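I would like to create a similar factor variable Break by comparing against the date range of the data frame df (which contains many variables and runs from 2016-02-05 to 2018-07-12; the sampling interval is 15 minutes, i.e. 96 rows per day).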

In my case, besides the values shown in this table, I want every date that does not fall between one of these Start and End dates to be labelled Non_Break.
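Stated as a rule: each row gets the name of the break whose Start to End range contains its date, or Non_Break when no range does. A plain Python sketch of that rule follows, assuming both Start and End are inclusive (as R's `Start:End` in the linked answer is) and that a 15-minute timestamp should be judged by its calendar date; `BREAKS` and `label` are hypothetical names.

```python
from datetime import date, datetime

# Break periods copied from Break_data; Start and End are treated as inclusive.
BREAKS = [
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
    (date(2016, 10, 5), date(2016, 10, 10), "Mid_Term_Break"),
    (date(2017, 3, 1), date(2017, 3, 6), "Spring_Break"),
    (date(2017, 4, 12), date(2017, 4, 17), "Easter_Recess"),
    (date(2017, 10, 4), date(2017, 10, 9), "Mid_Term_Break"),
    (date(2018, 2, 28), date(2018, 3, 5), "Spring_Break"),
    (date(2018, 3, 28), date(2018, 4, 2), "Easter_Recess"),
]


def label(ts, breaks=BREAKS, default="Non_Break"):
    """Return the name of the break containing ts, or default if none does."""
    # datetime is a subclass of date, so test for it first and drop the time part;
    # comparing a datetime directly with a date raises TypeError.
    d = ts.date() if isinstance(ts, datetime) else ts
    for start, end, name in breaks:
        if start <= d <= end:
            return name
    return default


print(label(datetime(2016, 2, 26, 13, 45)))  # Spring_Break
print(label(date(2018, 4, 2)))               # Easter_Recess (End is inclusive)
print(label(date(2016, 2, 5)))               # Non_Break
```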

Here is my modified version of the R code, following the steps in the link above:

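library(lubridate)

Break_data$Start <- ymd(Break_data$Start)
Break_data$End <- ymd(Break_data$End)
df$date <- ymd(df$date)

# Expand each Start:End range into its individual days (as day numbers)
LU <- Map(`:`, Break_data$Start, Break_data$End)
# One row per day, with the index of the Break_data row it came from
LU <- data.frame(value = unlist(LU),
                 index = rep(seq_along(LU), lapply(LU, length)))

# Look up each date; dates outside every range end up as NA
df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]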

I think I need to add a for loop or a simple if statement on top of this, so that times falling outside every Start to End range get Non_Break.

EDIT: I tried two approaches.

FIRST - without using the mapping:

for (i in c(1:nrow(df))){
  if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29")
    df$Break[i]<-"Spring_Break"
  else if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28")
    df$Break[i]<-"Easter_Recess"
  else if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10")
    df$Break[i]<-"Mid_Term_Break"
  else if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06")
    df$Break[i]<-"Spring_Break"
  else if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17")
    df$Break[i]<-"Easter_Recess"
  else if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09")
    df$Break[i]<-"Mid_Term_Break"
  else if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05")
    df$Break[i]<-"Easter_Recess"
  else if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02")
    df$Break[i]<-"Easter_Recess"
  else (df$Break[i]<-"Not_Break")
}

The first one runs forever :) and I only get two values: Not_Break and Spring_Break.

And these are the warning messages:

Warning messages:
1: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
2: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
3: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
4: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
5: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
6: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
7: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
8: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
  the condition has length > 1 and only the first element will be used
9: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
10: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
11: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
12: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
13: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
14: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
15: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
16: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
  the condition has length > 1 and only the first element will be used
17: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
18: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
19: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
20: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
21: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
22: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
23: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
24: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
  the condition has length > 1 and only the first element will be used
25: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
26: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
27: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
28: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
29: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
30: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
31: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
32: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
  the condition has length > 1 and only the first element will be used
33: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
34: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
35: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
36: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
37: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
38: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
39: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
40: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
  the condition has length > 1 and only the first element will be used
41: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
42: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
43: In if (df$date[i] >= "2016-10-05" & df$date <= "2016-10-10") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
44: In if (df$date[i] >= "2017-03-01" & df$date <= "2017-03-06") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
45: In if (df$date[i] >= "2017-04-12" & df$date <= "2017-04-17") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
46: In if (df$date[i] >= "2017-10-04" & df$date <= "2017-10-09") df$Break[i] <- "Mid_Term_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
47: In if (df$date[i] >= "2018-02-28" & df$date <= "2018-03-05") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
48: In if (df$date[i] >= "2018-03-28" & df$date <= "2018-04-02") df$Break[i] <- "Easter_Recess" else (df$Break[i] <- "Not_Break") :
  the condition has length > 1 and only the first element will be used
49: In if (df$date[i] >= "2016-02-24" & df$date <= "2016-02-29") df$Break[i] <- "Spring_Break" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
50: In if (df$date[i] >= "2016-03-23" & df$date <= "2016-03-28") df$Break[i] <- "Easter_Recess" else if (df$date[i] >=  ... :
  the condition has length > 1 and only the first element will be used
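Both symptoms appear to come from `df$date <= ...`, which uses the whole column instead of `df$date[i]`. Each condition therefore becomes a vector as long as df, and `if()` keeps only its first element (with this warning in the R version used here; R 4.2.0 and later raise an error instead). Every test thus reduces to `df$date[i] >= Start` combined with `df$date[1] <= End`, and since `df$date[1]` is 2016-02-05, which is before every End, any row dated on or after 2016-02-24 takes the first branch while earlier rows fall through to Not_Break. Building up to eight full-length comparison vectors per row also makes the loop's work grow with the square of the row count, which likely explains the long run time. The Python sketch below only simulates that R behaviour on a few hypothetical dates so the two-label outcome is visible; `simulate_buggy_if` and `fixed_if` are illustrative names.

```python
from datetime import date

# (start, end, label) in the same order as the question's if/else-if chain,
# abbreviated to the first three branches.
RANGES = [
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
    (date(2016, 10, 5), date(2016, 10, 10), "Mid_Term_Break"),
]

# Hypothetical stand-in for df$date; its first element matches df$date[1].
dates = [date(2016, 2, 5), date(2016, 2, 20), date(2016, 3, 25), date(2016, 10, 7)]


def simulate_buggy_if(i):
    """Mimic `if (df$date[i] >= start & df$date <= end)` inside the loop."""
    for start, end, name in RANGES:
        # R builds one logical per row j: df$date[i] >= start & df$date[j] <= end
        condition = [dates[i] >= start and d <= end for d in dates]
        # if() then looks only at the first element
        if condition[0]:
            return name
    return "Not_Break"


def fixed_if(i):
    """The intended test: use df$date[i] on both sides of the comparison."""
    for start, end, name in RANGES:
        if start <= dates[i] <= end:
            return name
    return "Not_Break"


print([simulate_buggy_if(i) for i in range(len(dates))])
# ['Not_Break', 'Not_Break', 'Spring_Break', 'Spring_Break']
print([fixed_if(i) for i in range(len(dates))])
# ['Not_Break', 'Not_Break', 'Easter_Recess', 'Mid_Term_Break']
```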

SECOND - adding to the code from the link:

LU <-  Map(`:`, Break_data$Start, Break_data$End)
LU <- data.frame(value = unlist(LU),
                 index = rep(seq_along(LU), lapply(LU, length)))

for (i in c(1:nrow(df))){
  if (df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]])
  else (df$date[i] >= "2016-02-05" & df$date <= "2018-07-12")
  df$Break[i]<-"Not_Break"
}

The second one throws an error as well. Any fix to my code, or an implementation in R or Python, would be welcome.

Is there a more efficient way to do this?

Note: the dataset is publicly available at https://github.com/tomiscat/data.
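On the efficiency question, one option, sketched here in Python under the assumption that the break periods never overlap, is to sort them by Start and use binary search from the standard-library `bisect` module, so each date costs O(log k) comparisons for k breaks instead of a walk through an if/else chain. With 15-minute sampling, 96 rows share each calendar date, so the sketch also caches one answer per distinct date. `label_one` and `label_dates` are hypothetical helpers, not existing library functions.

```python
from bisect import bisect_right
from datetime import date

# Break periods from Break_data, sorted by Start; assumed not to overlap.
BREAKS = sorted([
    (date(2016, 2, 24), date(2016, 2, 29), "Spring_Break"),
    (date(2016, 3, 23), date(2016, 3, 28), "Easter_Recess"),
    (date(2016, 10, 5), date(2016, 10, 10), "Mid_Term_Break"),
    (date(2017, 3, 1), date(2017, 3, 6), "Spring_Break"),
    (date(2017, 4, 12), date(2017, 4, 17), "Easter_Recess"),
    (date(2017, 10, 4), date(2017, 10, 9), "Mid_Term_Break"),
    (date(2018, 2, 28), date(2018, 3, 5), "Spring_Break"),
    (date(2018, 3, 28), date(2018, 4, 2), "Easter_Recess"),
])
STARTS = [start for start, _end, _name in BREAKS]


def label_one(d, default="Non_Break"):
    """Find the last break starting on or before d, then check its End."""
    i = bisect_right(STARTS, d) - 1
    if i >= 0:
        _start, end, name = BREAKS[i]
        if d <= end:  # End is inclusive
            return name
    return default


def label_dates(dates, default="Non_Break"):
    """Label many dates, computing each distinct calendar date only once."""
    cache = {}
    out = []
    for d in dates:
        if d not in cache:
            cache[d] = label_one(d, default)
        out.append(cache[d])
    return out


# 96 readings on the last day of a break, then 96 on the following day
sample = [date(2017, 10, 9)] * 96 + [date(2017, 10, 10)] * 96
labels = label_dates(sample)
print(labels[0], labels[-1])  # Mid_Term_Break Non_Break
```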

Solution

  1. ==============================

    1.

    library(lubridate)
    
    # data
    Break_data <- data.table::fread(
    " Start        End          Break       Year
     2016-02-24 2016-02-29   Spring_Break 2016
     2016-03-23 2016-03-28  Easter_Recess 2016
     2016-10-05 2016-10-10 Mid_Term_Break 2016
     2017-03-01 2017-03-06   Spring_Break 2017
     2017-04-12 2017-04-17  Easter_Recess 2017
     2017-10-04 2017-10-09 Mid_Term_Break 2017
     2018-02-28 2018-03-05   Spring_Break 2018
     2018-03-28 2018-04-02  Easter_Recess 2018"
    )
    df <- data.frame(
      date = c("2016-02-05","2016-02-05", "2016-02-05" ,"2016-02-05", "2016-02-05", "2016-02-05",
               "2016-02-26", "2016-10-07", "2018-03-30",
                "2018-07-12","2018-07-12", "2018-07-12", "2018-07-12", "2018-07-12" ,"2018-07-12")
    )
    
    # mapping
    
    Break_data$Start <- ymd(Break_data$Start)
    Break_data$End <- ymd(Break_data$End)
    df$date <- ymd(df$date)
    LU <-  Map(`:`, Break_data$Start, Break_data$End)
    LU <- data.frame(value = unlist(LU),
                     index = rep(seq_along(LU), lapply(LU, length)))
    df$Break <- Break_data$Break[LU$index[match(df$date, LU$value)]]
    
    
    # if not mapped(df$Break ==NA), then set it to "Non_break"
    df$Break <- ifelse(is.na(df$Break), "Non_Break", df$Break)
    df$Break <- factor(df$Break)
    df
    #>          date          Break
    #> 1  2016-02-05      Non_Break
    #> 2  2016-02-05      Non_Break
    #> 3  2016-02-05      Non_Break
    #> 4  2016-02-05      Non_Break
    #> 5  2016-02-05      Non_Break
    #> 6  2016-02-05      Non_Break
    #> 7  2016-02-26   Spring_Break
    #> 8  2016-10-07 Mid_Term_Break
    #> 9  2018-03-30  Easter_Recess
    #> 10 2018-07-12      Non_Break
    #> 11 2018-07-12      Non_Break
    #> 12 2018-07-12      Non_Break
    #> 13 2018-07-12      Non_Break
    #> 14 2018-07-12      Non_Break
    #> 15 2018-07-12      Non_Break
    

    Created on 2018-08-19 by the reprex package (v0.2.0).

    EDIT: full solution

Source: https://stackoverflow.com/questions/51887163/pasting-factor-variable-values-in-r-or-python-based-on-date-creating-school-br (licensed under CC BY-SA and MIT)
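Since the question also allowed Python, here is a rough standard-library translation of the answer's two steps: expand every Start to End range into single days, then look each date up and fill misses with Non_Break. It runs on the same sample dates as the reprex. This is a sketch, not part of the original answer; `map_breaks` is an illustrative name, and a dict lookup with a default stands in for `match()` followed by `ifelse(is.na(...))`.

```python
from datetime import date, timedelta

BREAK_DATA = [
    ("2016-02-24", "2016-02-29", "Spring_Break"),
    ("2016-03-23", "2016-03-28", "Easter_Recess"),
    ("2016-10-05", "2016-10-10", "Mid_Term_Break"),
    ("2017-03-01", "2017-03-06", "Spring_Break"),
    ("2017-04-12", "2017-04-17", "Easter_Recess"),
    ("2017-10-04", "2017-10-09", "Mid_Term_Break"),
    ("2018-02-28", "2018-03-05", "Spring_Break"),
    ("2018-03-28", "2018-04-02", "Easter_Recess"),
]

# Same sample dates as df in the reprex.
SAMPLE = (["2016-02-05"] * 6
          + ["2016-02-26", "2016-10-07", "2018-03-30"]
          + ["2018-07-12"] * 6)


def map_breaks(dates, break_data, default="Non_Break"):
    """Return one break label per ISO date string in dates."""
    lookup = {}
    for start, end, name in break_data:
        s, e = date.fromisoformat(start), date.fromisoformat(end)
        for k in range((e - s).days + 1):  # +1 keeps End inclusive, like Start:End
            # setdefault keeps the first range that claims a day, as match() does
            lookup.setdefault(s + timedelta(days=k), name)
    return [lookup.get(date.fromisoformat(d), default) for d in dates]


for i, (d, b) in enumerate(zip(SAMPLE, map_breaks(SAMPLE, BREAK_DATA)), start=1):
    print(f"{i:>2} {d} {b:>14}")
```

The printed labels should line up with the Break column of the reprex output above.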