Time Series date format issue in R
I am using the [dowjones][1] dataset, but I think my date format may be incorrect: when I run the zoo function to turn the data into a time series, I get this warning:
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
My code:
dow = read.table('dow_jones_index.data', header=T, sep=',')
dowts = zoo(dow$close, as.Date(as.character(dow$date), format = "%m/%d/%Y"))
The dates look like this: 5/6/2011
Does my error have to do with using an incorrect date format? Or something else?
Thank you.
EDIT:
hist(dowts, xlab='close change rate', prob=TRUE, main='Histogram',ylim=c(0,.07))
Error in hist.default(dowts, xlab = "close change rate", prob = TRUE, : character(0)
In addition: Warning messages:
1: In zoo(rval[i], index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In pretty.default(range(x), n = breaks, min.n = 1) :
  NAs introduced by coercion

[1]: https://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index
Source: https://stackoverflow.com/questions/35111052
Updated: 2021-03-14 14:03
Best answer
The problem, as the warning message indicates, is that your date values are not unique. This is because your data is in long format, with multiple stocks sharing each date. A time series has to be in a matrix-like structure, with each column representing a stock and each row a point in time. With dcast from the package reshape2 this is straightforward:
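library(zoo)
library(reshape2)
dow <- read.table('dow_jones_index.data', header = TRUE, sep = ',', stringsAsFactors = FALSE)
# delete $ symbol and coerce to numeric
dow$close <- as.numeric(sub("$", "", dow$close, fixed = TRUE))
# one row per date, one column per stock (assumes the ticker column is named "stock")
tmp <- dcast(dow, date ~ stock, value.var = "close")
# build a multivariate zoo series indexed by the parsed dates
dowts <- zoo(as.matrix(tmp[, -1]), order.by = as.Date(tmp$date, format = "%m/%d/%Y"))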
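To see why the index is not unique without downloading the file, here is a minimal sketch on made-up data. The column names `stock`, `date`, and `close`, the tickers, and all prices are assumptions chosen to mimic the layout described above. The reshape uses base R `tapply` rather than `dcast`, so the example runs without reshape2, and the zoo step is only attempted if zoo is installed.

```r
# Toy long-format data: two stocks, two weekly dates (all values are made up)
dow <- data.frame(
  stock = c("AA", "AA", "AXP", "AXP"),
  date  = c("1/7/2011", "1/14/2011", "1/7/2011", "1/14/2011"),
  close = c("$16.42", "$15.97", "$44.36", "$46.25"),
  stringsAsFactors = FALSE
)

# The format string parses every date, so the date format is not the culprit
dates <- as.Date(dow$date, format = "%m/%d/%Y")
all_parsed <- !anyNA(dates)

# Each date appears once per stock, which is what makes the index non-unique
dup_count <- sum(duplicated(dates))

# Strip the leading "$" and convert the prices to numeric
dow$close <- as.numeric(sub("$", "", dow$close, fixed = TRUE))

# Long -> wide: one row per date, one column per stock
# (mean is just a placeholder aggregator; each cell holds a single value here)
wide <- tapply(dow$close, list(format(dates), dow$stock), mean)

# Optional: wrap the wide matrix in a zoo object, now with unique dates
if (requireNamespace("zoo", quietly = TRUE)) {
  dowts <- zoo::zoo(wide, order.by = as.Date(rownames(wide)))
}
```

Once the values are numeric and each stock sits in its own column, a histogram of a single series, for example `hist(wide[, "AA"])`, should work. The `NAs introduced by coercion` warning in the question suggests that `close` was still text (with the `$` prefix) when `hist()` was called, though that is an inference from the error message rather than something the original answer states.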