python influxdb 读写dataframe

一、tags设定

Python pandas dataframe to Influxdb with column and other tags

AUTHOR manish DATE August 6, 2020

Python pandas is very powerful data processing library. It can take large data from variety of sources (CSVs, databases, excel etc) and process it in memory.

After processing the data you may have to dump it in influxdb. In influxdb each measurement has tags and fields. While writing pandas dataframe to influxdb we can define these tags.

We can use specific columns of data frame as tags along with some default tags. Let us understand this with a simple implementation.

Consider following CSV, sample.csv, having schools data of a City, State, country. Which means city, state and country are fixed tag and additionally we will make “School”, “Class” as dataframe row tags.


TimeStamp,School,Class,Students,Teachers,Absent
5/9/20 5:45:00 PM GMT,USSMS,UKG,30,2,3
5/9/20 5:45:00 PM GMT,USSMS,10,50,3,2
5/9/20 5:45:00 PM GMT,USSMS,8,40,2,0
5/9/20 6:45:00 PM GMT,St Thomas,Nur,30,2,4
5/9/20 6:45:00 PM GMT,St Thomas,6,55,3,4
5/9/20 7:45:00 PM GMT,St Thomas,9,34,2,2

Now let us look at the code, pyinflx.py.


#!/usr/bin/python3
import pandas as pd
import json
from influxdb import DataFrameClient

dbhost = 'localhost'
dbport = 8086
dbuser = 'admin'
dbpasswd = 'xxxxx'
dbname = 'schooldb'
protocol = 'line'

# Use only following fields from CSV. 
Fields = ['TimeStamp','School','Class','Students','Absent']
# Define tag fields
datatags = ['School','Class']
# Define fixed tags
fixtags = {"Country": "India", "State": "Haryana", "City": "Kurukshetra"}

# Read data from CSV without index and parse 'TimeStamp' as date.
df = pd.read_csv("sample.csv", sep=',', index_col=False, parse_dates=['TimeStamp'], usecols=Fields)

# Set 'TimeStamp' field as index of dataframe
df.set_index('TimeStamp', inplace = True)

print(df.head())

client = DataFrameClient(dbhost, dbport, dbuser, dbpasswd, dbname)
# Write data to "SchoolData" measurement of "schooldb" database.
client.write_points(df,"SchoolData",tags=fixtags,tag_columns=datatags,protocol=protocol)

Fire the code !


./pyinflx.py
                              School Class  Students  Absent
TimeStamp
2020-05-09 17:45:00+00:00      USSMS   UKG        30       3
2020-05-09 17:45:00+00:00      USSMS    10        50       2
2020-05-09 17:45:00+00:00      USSMS     8        40       0
2020-05-09 18:45:00+00:00  St Thomas   Nur        30       4
2020-05-09 18:45:00+00:00  St Thomas     6        55       4

Check data in influxdb


> select * from SchoolData
name: SchoolData
time                 Absent City        Class Country School    State   Students
----                 ------ ----        ----- ------- ------    -----   --------
2020-05-09T17:45:00Z 2      Kurukshetra 10    India   USSMS     Haryana 50
2020-05-09T17:45:00Z 3      Kurukshetra UKG   India   USSMS     Haryana 30
2020-05-09T17:45:00Z 0      Kurukshetra 8     India   USSMS     Haryana 40
2020-05-09T18:45:00Z 4      Kurukshetra 6     India   St Thomas Haryana 55
2020-05-09T18:45:00Z 4      Kurukshetra Nur   India   St Thomas Haryana 30
2020-05-09T19:45:00Z 2      Kurukshetra 9     India   St Thomas Haryana 34

Check tags in influxdb


> show tag keys
name: SchoolData
tagKey
------
City
Class
Country
School
State

二、设置dataframe的index为influx的time格式

"""
Import VIX - CBOE Volatility Index - from "vix-daily.csv" file into InfluxDB 2.0
https://datahub.io/core/finance-vix#data
"""
from collections import OrderedDict
from csv import DictReader

import ciso8601
import requests
import rx
from pytz import UTC
from rx import operators as ops

from influxdb_client import InfluxDBClient, WriteOptions
from influxdb_client.client.write.point import EPOCH

_progress = 0


def parse_row(row: OrderedDict):
    """Parse row of CSV file into LineProtocol with structure:
    CSV format:
        date,symbol,open,close,low,high,volume
        2016-01-05,WLTW,123.43,125.839996,122.309998,126.25,2163600.0
        2016-01-06,WLTW,125.239998,119.980003,119.940002,125.540001,2386400.0
        2016-01-07,WLTW,116.379997,114.949997,114.93,119.739998,2489500.0
        2016-01-08,WLTW,115.480003,116.620003,113.5,117.440002,2006300.0
        2016-01-11,WLTW,117.010002,114.970001,114.089996,117.330002,1408600.0
        2016-01-12,WLTW,115.510002,115.550003,114.5,116.059998,1098000.0
        2016-01-13,WLTW,116.459999,112.849998,112.589996,117.07,949600.0
        ...
    :param row: the row of CSV file
    :return: Parsed csv row to LineProtocol
    """
    global _progress
    _progress += 1

    if _progress % 10000 == 0:
        print(_progress)

    time = (UTC.localize(ciso8601.parse_datetime(row["date"])) - EPOCH).total_seconds() * 1e9

    return f'financial-analysis,symbol={row["symbol"]} ' \
           f'close={row["close"]},high={row["high"]},low={row["low"]},open={row["open"]} ' \
           f'{int(time)}'


def main():
    parse_row.progress = 0

    url = "https://github.com/influxdata/influxdb-client-python/wiki/data/stock-prices-example.csv"
    response = requests.get(url, stream=True)
    data = rx \
        .from_iterable(DictReader(response.iter_lines(decode_unicode=True))) \
        .pipe(ops.map(lambda row: parse_row(row)))

    client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org", debug=False)
    write_api = client.write_api(write_options=WriteOptions(batch_size=50_000, flush_interval=10_000))

    write_api.write(bucket="my-bucket", record=data)
    write_api.close()

    query = '''
    from(bucket:"my-bucket")
            |> range(start: 0, stop: now())
            |> filter(fn: (r) => r._measurement == "financial-analysis")
            |> filter(fn: (r) => r.symbol == "AAPL")
            |> filter(fn: (r) => r._field == "close")
            |> drop(columns: ["_start", "_stop", "table", "_field","_measurement"])
    '''

    result = client.query_api().query_data_frame(query=query)
    print(result.head(100))

    """
    Close client
    """
    client.close()
    # %%


if __name__ == '__main__':
    main()

原文:influxdb-client-python/stock_predictions_import_data.py at master · influxdata/influxdb-client-python · GitHub

三、读写dataframe双向

Getting Started with InfluxDB and Pandas

By Anais Dotis-Georgiou / January 16, 2020 / CommunityDeveloperFluxInfluxDBTutorial / 3 Comments

3 minutes

InfluxData prides itself on prioritizing developer happiness. A large part of maintaining developer happiness is providing client libraries that allow users to interact with the database through the language and library of their choosing. Data analysis is the task most broadly associated with Python use cases, accounting for 58% of Python tasks, so it makes sense that Pandas is the second most popular library for Python users. The 2.0 InfluxDB Python Client Data supports Pandas DataFrames to invite those data scientists to use InfluxDB with ease.

In this tutorial, we’ll learn how to query our InfluxDB instance and return the data as a DataFrame. We’ll also explore some data science resources that exist as a part of the Client repo. To learn about how to get started with the InfluxDB Python Client Library, please take a look at this blog.

Me eagerly consuming Pandas and InfluxDB Documentation. Photo by Sid Balachandran on Unsplash.

Data science resources

A variety of data science resources have been included in the InfluxDB Python Client repo to help you take advantage of the Pandas functionality of the client. I encourage you to take a look at the example notebooks. They are a collection of Jupyter Notebooks providing examples with a variety of time series data science and analytics solutions, e.g. how to integrate Tensorflow and Keras for predictions.

From InfluxDB to a DataFrame

Import the client and Pandas:

from influxdb_client import InfluxDBClient
import pandas as pd

Copy

Supply auth parameters:

my_token = my-token
my_org = "my-org"
bucket = "system"

Copy

Write your Flux query:

query= '''
from(bucket: "system")
|> range(start:-5m, stop: now())
|> filter(fn: (r) => r._measurement == "cpu")
|> filter(fn: (r) => r._field == "usage_user")
|> filter(fn: (r) => r.cpu == "cpu-total")'''

Copy

Query InfluxDB and return a Dataframe:

client = InfluxDBClient(url="http://localhost:9999", token=my_token, org=my_org, debug=False)
system_stats = client.query_api().query_data_frame(org=my_org, query=query)
display(system_stats.head())

Copy

From DataFrame to InfluxDB

Write DataFrame to InfluxDB::

From DataFrame to InfluxDB

from influxdb_client import InfluxDBClient, Point, WriteOptions
from influxdb_client.client.write_api import SYNCHRONOUS
# Preparing Dataframe: 
system_stats.drop(columns=['result', 'table','start','stop'])
# DataFrame must have the timestamp column as an index for the client. 
system_stats.set_index("_time")
_write_client.write(bucket.name, record=system_stats, data_frame_measurement_name='cpu',
                    data_frame_tag_columns=['cpu'])

Copy

Close client:

_write_client.__del__()
client.__del__()

Copy

Pandas to complement Flux with InfluxDB

Although Flux has many of the data transformation capabilities that Pandas does, InfluxDB values developers’ time. If you are dealing with a smaller dataset, you might not have much incentive to do those transformations on the server side or learn Flux. Hopefully this Pandas capability can help you execute your time series analysis faster. As always, if you run into hurdles, please share them on our community site or Slack channel. We’d love to get your feedback and help you with any problems you run into.

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值