Theme elements: Axis

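This article shows how to adjust axis properties when visualizing data with ggplot2, including text color, size, and angle as well as line type and thickness, so that plots look better and are easier to read.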


Theme: axis

Basics

Element              Setter            Description
axis.line            element_line()    line parallel to axis (hidden in default themes)
axis.text            element_text()    tick labels
axis.text.x          element_text()    x-axis tick labels
axis.text.y          element_text()    y-axis tick labels
axis.title           element_text()    axis titles
axis.title.x         element_text()    x-axis title
axis.title.y         element_text()    y-axis title
axis.ticks           element_line()    axis tick marks
axis.ticks.length    unit()            length of tick marks
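
The element names in the table form a hierarchy: in ggplot2's theme system, axis.text.x and axis.text.y take any property they do not set themselves from axis.text, and the axis titles behave the same way with axis.title. The following is a minimal sketch of that behaviour and is not part of the original article. The theme object th and the chosen colour and size are illustrative assumptions, and calc_element() is used here only to inspect the resolved settings.

```r
library(ggplot2)

# Start from the default theme and override two axis elements.
th <- theme_gray() +
  theme(
    axis.text   = element_text(colour = "red"),  # applies to both axes
    axis.text.x = element_text(size = 14)        # applies to the x axis only
  )

# calc_element() resolves inheritance and returns the final element.
x_text <- calc_element("axis.text.x", th)
y_text <- calc_element("axis.text.y", th)

x_text$colour  # inherited from axis.text
y_text$colour  # inherited from axis.text
x_text$size    # set directly on axis.text.x
```

Setting a property on the parent element is a convenient way to style both axes at once, while the .x and .y variants are for cases where the two axes should differ.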

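Plotting in practice

element_text(): text properties, such as color and size.
element_line(): line properties, such as linetype and size (the line width argument is named linewidth in ggplot2 3.4.0 and later).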

library(ggplot2)

set.seed(1)  # make the random y values reproducible
df <- data.frame(x = 1:100, y = sample(1:100, 100))
base <- ggplot(df, aes(x, y)) + geom_point()
base

# Exaggerated settings so that each axis element is easy to spot.
# linewidth requires ggplot2 >= 3.4.0; on older versions use size instead.
base +
  ylab("This is y axis title") +
  xlab("This is x axis title") +
  theme(
    axis.text.x = element_text(colour = "blue", size = 12),
    axis.text.y = element_text(colour = "red", size = 16),
    axis.title.x = element_text(face = "bold", colour = "red", size = 12),
    axis.title.y = element_text(face = "italic", colour = "green", size = 16),
    axis.line = element_line(colour = "blue", linewidth = 4, linetype = 3),
    axis.ticks.length = unit(0.4, "in"),
    axis.ticks = element_line(colour = "purple", linewidth = 2, linetype = 3)
  )

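Rotating the x-axis labels (angle) is common in scientific plotting. The setting below follows the recommendation in the official documentation, which favors a negative angle: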

The most common adjustment is to rotate the x-axis labels to avoid long overlapping labels. If you do this, note negative angles tend to look best and you should set hjust = 0 and vjust = 1


base + 
  theme(
    axis.text.x = element_text(color = "blue", size = 12, angle = -45, hjust = 0, vjust = 1)
  )
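
The numeric labels of the base plot are short, so the benefit of rotation is easier to see with long categorical labels. The sketch below is an illustration added for that purpose and is not from the original article: the data frame long_df and its labels are made up, and the second variant assumes ggplot2 3.3.0 or later, where guide_axis() accepts an angle argument.

```r
library(ggplot2)

# Hypothetical data with labels long enough to overlap when drawn horizontally.
long_df <- data.frame(
  group = c("a fairly long category label",
            "another long category label",
            "yet another long category label"),
  value = c(3, 7, 5)
)

p <- ggplot(long_df, aes(group, value)) + geom_col()

# Variant 1: the theme-based rotation recommended in the quoted passage.
p_theme <- p +
  theme(axis.text.x = element_text(angle = -45, hjust = 0, vjust = 1))

# Variant 2: let the axis guide rotate the labels and pick the justification.
p_guide <- p + guides(x = guide_axis(angle = 45))

p_theme
p_guide
```

The theme-based variant gives full control over hjust and vjust, whereas guide_axis() chooses the justification itself, which can be convenient when only the angle matters.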


Summary

1. To style text, use element_text().
2. To style lines, use element_line().
3. Keep the difference between tick labels (axis.text) and titles (axis.title) in mind.
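
To make the third point concrete, here is a small sketch, added as an illustration rather than taken from the original article, in which the titles and the tick labels are styled separately. The theme object th2 and the specific colour, face, and size values are assumptions chosen for the example.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = (1:10)^2)

# axis.title styles the text set by xlab()/ylab();
# axis.text styles the values printed next to the tick marks.
th2 <- theme_gray() +
  theme(
    axis.title = element_text(colour = "red", face = "bold"),
    axis.text  = element_text(size = 14)
  )

p_diff <- ggplot(df, aes(x, y)) +
  geom_point() +
  xlab("This is x axis title") +
  ylab("This is y axis title") +
  th2

# Inspect the resolved elements: the title colour does not leak into tick labels.
title_x <- calc_element("axis.title.x", th2)
text_x  <- calc_element("axis.text.x", th2)

p_diff
```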
