python处理出租车轨迹数据_1-出租车数据的基础处理,由gps生成OD(pandas).ipynb...

{

"cells": [

{

"cell_type": "markdown",

"metadata": {},

"source": [

"In this tutorial you will learn how to use Python's pandas package to clean taxi GPS data and identify trip ODs (origin-destination pairs)\n",
"\n",
"The base data provided:\n",
"\n",
" 1. Raw taxi GPS data (in the data-sample folder; a sample of 500 vehicles drawn from the full dataset)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"[An introduction to the pandas package](https://baike.baidu.com/item/pandas/17209606?fr=aladdin)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# Reading the data"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"First, read in the taxi data."

]

},

{

"cell_type": "code",

"execution_count": 2,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:53.552930Z",

"start_time": "2020-01-18T04:51:52.397018Z"

}

},

"outputs": [],

"source": [

"import pandas as pd\n",

"# Read the data\n",

"data = pd.read_csv(r'data-sample/TaxiData-Sample',header = None)\n",

"# Name the columns\n",

"data.columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']"

]

},

{

"cell_type": "code",

"execution_count": 3,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:58.299239Z",

"start_time": "2020-01-18T04:51:58.271312Z"

}

},

"outputs": [

{

"data": {


"text/plain": [

" VehicleNum Stime Lng Lat OpenStatus Speed\n",

"0 22271 22:54:04 114.167000 22.718399 0 0\n",

"1 22271 18:26:26 114.190598 22.647800 0 4\n",

"2 22271 18:35:18 114.201401 22.649700 0 0\n",

"3 22271 16:02:46 114.233498 22.725901 0 24\n",

"4 22271 21:41:17 114.233597 22.720900 0 19"

]

},

"execution_count": 3,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"# Show the first 5 rows of the data\n",

"data.head(5)"

]

},
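{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (a sketch added here, not part of the original workflow): confirm how many GPS points and distinct vehicles the sample contains, and that the columns were read with sensible dtypes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check (illustrative, not in the original notebook)\n",
"print(len(data))                     # number of GPS records\n",
"print(data['VehicleNum'].nunique()) # number of distinct vehicles\n",
"print(data.dtypes)                  # column types"
]
},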

{

"cell_type": "markdown",

"metadata": {},

"source": [

"The data format:\n",
"\n",
">VehicleNum —— vehicle ID (license plate) \n",
"Stime —— time \n",
"Lng —— longitude \n",
"Lat —— latitude \n",
"OpenStatus —— whether a passenger is aboard (0 vacant, 1 occupied) \n",
"Speed —— speed "

]

},
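{
"cell_type": "markdown",
"metadata": {},
"source": [
"A side note (a sketch added here, not part of the original workflow): Stime is stored as zero-padded HH:MM:SS text, so plain string sorting already matches chronological order. To put each vehicle's records in time order, one option is sort_values; data_sorted is a name introduced here purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: sort records by vehicle, then by time\n",
"# (zero-padded HH:MM:SS strings sort correctly as text)\n",
"data_sorted = data.sort_values(by=['VehicleNum', 'Stime'])\n",
"data_sorted.head(5)"
]
},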

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# Basic data operations"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## DataFrame and Series"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"DataFrame and Series\n",
"\n",
" > When we read in a dataset, what we get is a data table in DataFrame format, and each column of that DataFrame is a Series \n",
" In other words, a DataFrame is composed of multiple Series\n"

]

},

{

"cell_type": "code",

"execution_count": 87,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:25.713432Z",

"start_time": "2020-01-18T04:52:25.708450Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 87,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"If we want a single column of the DataFrame as a Series, we can simply use\n",
"\n",
" > data[colname]"

]

},

{

"cell_type": "code",

"execution_count": 88,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:32.097575Z",

"start_time": "2020-01-18T04:52:32.090592Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.series.Series"

]

},

"execution_count": 88,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data['Lng'])"

]

},

{

"cell_type": "markdown",

"metadata": {

"ExecuteTime": {

"end_time": "2019-09-06T09:22:43.642625Z",

"start_time": "2019-09-06T09:22:43.638487Z"

}

},

"source": [

"If we want one or several columns of the DataFrame back as a DataFrame, we can simply use\n",
"\n",
"> data[[colname, colname]]"

]

},

{

"cell_type": "code",

"execution_count": 89,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:33.013124Z",

"start_time": "2020-01-18T04:52:32.990186Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 89,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data[['Lng']])"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## Filtering data"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"Filtering data:\n",
"\n",
" When filtering, we generally use the data[condition] pattern,\n",
" where the condition is a Series of True/False boolean values, one for each row of data"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

" For example, suppose we want all rows for the vehicle with ID 22271\n",
" First we need a boolean Series aligned with the rows of data: True where the vehicle ID is 22271, False otherwise\n",
" Such a Series is easy to obtain; we only need\n",
" data['VehicleNum']==22271"

]

},

{

"cell_type": "code",

"execution_count": 90,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:44.078571Z",

"start_time": "2020-01-18T04:52:44.049646Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"0 True\n",

"1 True\n",

"2 True\n",

"3 True\n",

"4 True\n",

"Name: VehicleNum, dtype: bool"

]

},

"execution_count": 90,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"(data['VehicleNum']==22271).head(5)"

]

},

{

"cell_type": "code",

"execution_count": 92,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:51.723416Z",

"start_time": "2020-01-18T04:52:51.688510Z"

}

},

"outputs": [

{

"data": {


"text/plain": [

" VehicleNum Stime Lng Lat OpenStatus Speed\n",

"0 22271 22:54:04 114.167000 22.718399 0 0\n",

"1 22271 18:26:26 114.190598 22.647800 0 4\n",

"2 22271 18:35:18 114.201401 22.649700 0 0\n",

"3 22271 16:02:46 114.233498 22.725901 0 24\n",

"4 22271 21:41:17 114.233597 22.720900 0 19"

]

},

"execution_count": 92,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"# Get all rows for vehicle ID 22271\n",

"data[data['VehicleNum']==22271].head(5)"

]

},
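{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conditions can also be combined with & (and) or | (or), wrapping each condition in its own parentheses. For example (an illustrative sketch, not part of the original notebook), the occupied records of this vehicle:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Combine boolean conditions with &; each condition needs its own parentheses\n",
"data[(data['VehicleNum']==22271) & (data['OpenStatus']==1)].head(5)"
]
},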

{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}