postgresql学习_在PostgreSQL中学习这些快速技巧-CSDN博客

postgresql学习

PostgreSQL is one of the most popular open source SQL dialects. One of its main advantages is the ability to extend its functionality with some inbuilt tools.

PostgreSQL是最流行的开源SQL方言之一。它的主要优点之一是可以使用某些内置工具来扩展其功能。

Here, let's look at a few PostgreSQL tricks that you can start using to take your SQL skills to the next level.

在这里，让我们看看一些PostgreSQL技巧，您可以开始使用这些技巧将SQL技能提升到一个新的水平。

You'll find out how to:

您将了解如何：

Quickly copy files into a database
快速将文件复制到数据库
Summarise data in crosstab format
以交叉表格式汇总数据
Take advantage of arrays and JSON data in SQL
利用SQL中的数组和JSON数据
Work with geometric data
处理几何数据
Run statistical analyses directly on your database
直接在数据库上运行统计分析
Use recursion to solve problems
使用递归来解决问题

从文件复制数据 (Copy data from a file)

An easy way to quickly import data from an external file is to use the COPY function. Simply create the table you want to use, then pass in the filepath of your dataset to the COPY command.

快速从外部文件导入数据的简单方法是使用COPY功能。只需创建要使用的表，然后将数据集的文件路径传递给COPY命令。

The example below creates a table called revenue and fills it from a randomly generated CSV file.

下面的示例创建一个名为Revenue的表，并从随机生成的CSV文件中填充该表。

You can include extra parameters, to indicate the filetype (here, the file is a CSV) and whether to read the first row as column headers.

您可以包括其他参数，以指示文件类型(此处为CSV文件)以及是否将第一行读取为列标题。

You can learn more here.

您可以在此处了解更多信息。

CREATE TABLE revenue (
  store VARCHAR,
  year INT,
  revenue INT,
  PRIMARY KEY (product, year)
);

COPY revenue FROM '~/Projects/datasets/revenue.csv' WITH HEADER CSV;

使用交叉表功能汇总数据 (Summarise data using the crosstab function)

If you fancy yourself as a spreadsheet pro, you will probably be familiar with creating pivot tables from dumps of data. You can do the same in PostgreSQL with the crosstab function.

如果您喜欢作为电子表格专业人士，您可能会熟悉从数据转储创建数据透视表。您可以在PostgreSQL中使用crosstab函数执行相同的操作。

The crosstab function can take data in the form on the left, and summarise it in the form on the right (which is much easier to read). The example here will follow on with the revenue data from before.

交叉表功能可以采用左侧表格中的数据，并采用右侧表格中的数据进行汇总(这更容易阅读)。此处的示例将继续处理之前的收入数据。

First, enable the tablefunc extension with the command below:

首先，使用以下命令启用tablefunc扩展名：

CREATE EXTENSION tablefunc;

Next, write a query using the crosstab function:

接下来，使用交叉表函数编写查询：

SELECT * FROM CROSSTAB(
  'SELECT
  		*
	FROM revenue
	ORDER BY 1,2'
  ) 
AS summary(
	store VARCHAR, 
    "2016" INT, 
    "2017" INT, 
    "2018" INT
    );

There are two things to consider when using this function.

使用此功能时需要考虑两点。

First, pass in a query selecting data from your underlying table. You may simply select the table as it is (as shown here). However, you might want to filter, join or aggregate if required. Be sure to order the data correctly.
首先，传递查询以从基础表中选择数据。您可以按原样选择表格(如下所示)。但是，您可能需要过滤，加入或聚合(如果需要)。确保正确订购数据。
Then, define the output (in the example, the output is called 'summary', but you can call it any name). List the column headers you want to use and the data type they will contain.
然后，定义输出(在示例中，输出称为“摘要”，但您可以将其称为任何名称)。列出您要使用的列标题以及它们将包含的数据类型。

The output will be as shown below:

输出将如下所示：

store  |  2016   |  2017   |  2018   
---------+---------+---------+---------
 Alpha   | 1637000 | 2190000 | 3287000
 Bravo   | 2205000 |  982000 | 3399000
 Charlie | 1549000 | 1117000 | 1399000
 Delta   |  664000 | 2065000 | 2931000
 Echo    | 1795000 | 2706000 | 1047000
(5 rows)

使用数组和JSON (Work with arrays and JSON)

PostgreSQL supports multi-dimensional array data types. These are comparable to similar data types in many other languages, including Python and JavaScript.

PostgreSQL支持多维数组数据类型。这些可与许多其他语言(包括Python和JavaScript)中的类似数据类型进行比较。

You might want to use them in situations where it helps to work with more dynamic, less-structured data.

您可能希望在有助于处理更多动态，结构较少的数据的情况下使用它们。

For example, imagine a table describing published articles and subject tags. An article could have no tags, or it could have many. Trying to store this data in a structured table format would be unnecessarily complicated.

例如，假设有一张描述已发表文章和主题标签的表格。文章可以没有标签，也可以有很多标签。试图以结构化表格格式存储此数据将不必要地变得复杂。

You can define arrays using a data type, followed by square brackets. You can optionally specify their dimensions (however, this is not enforced).

您可以使用数据类型定义数组，后跟方括号。您可以选择指定其尺寸(但是，这不是强制性的)。

For example, to create a 1-D array of any number of text elements, you would use text[]. To create a three-by-three two dimensional array of integer elements, you would use int[3][3].

例如，要创建任意数量的文本元素的一维数组，可以使用text[] 。要创建一个三乘三的二维整数数组，可以使用int[3][3] 。

Take a look at the example below:

看下面的例子：

CREATE TABLE articles (
  title VARCHAR PRIMARY KEY,
  tags TEXT[]
);

To insert arrays as records, use the syntax '{"first","second","third"}'.

要将数组插入为记录，请使用语法'{"first","second","third"}' 。

INSERT INTO articles (title, tags)
  VALUES 
  ('Lorem ipsum', '{"random"}'),
  ('Placeholder here', '{"motivation","random"}'),
  ('Postgresql tricks', '{"data","self-reference"}');

There are a lot of things you can do with arrays in PostgreSQL.

在PostgreSQL中，可以对数组做很多事情。

For a start, you can check if an array contains a given element. This is useful for filtering. You can use the "contains" operator @> to do this. The query below finds all the articles which have the tag "random".

首先，您可以检查数组是否包含给定的元素。这对于过滤很有用。您可以使用“包含”运算符@>来执行此操作。下面的查询查找所有带有标签“ random”的文章。

SELECT
  *
FROM articles
WHERE tags @> '{"random"}';

You can also concatenate (join together) arrays using the || operator, or check for overlapping elements with the && operator.

您也可以使用||连接(连接)数组。运算符，或者使用&&运算符检查元素是否重叠。

You can search arrays by index (unlike many languages, PostgreSQL arrays start counting from one, instead of zero).

您可以按索引搜索数组(与许多语言不同，PostgreSQL数组从1开始计数，而不是从0开始计数)。

SELECT
	tags[1]
FROM articles;

As well as arrays, PostgreSQL also lets you use JSON as a data type. Again, this provides the advantages of working with unstructured data. You can also access elements by their key name.

除了数组之外，PostgreSQL还允许您使用JSON作为数据类型。同样，这提供了使用非结构化数据的优势。您还可以通过其键名访问元素。

CREATE TABLE sessions (
	session_id SERIAL PRIMARY KEY,
    session_info JSON
);

INSERT INTO sessions (session_info)
VALUES
('{"app_version": 1.0, "device_type": "Android"}'),
('{"app_version": 1.2, "device_type": "iOS"}'),
('{"app_version": 1.4, "device_type": "iOS", "mode":"default"}');

Again, there are many things you can do with JSON data in PostgreSQL. You can use the -> and ->> operators to "unpackage" the JSON objects to use in queries.

同样，在PostgreSQL中，您可以对JSON数据做很多事情。您可以使用->和->>运算符来“解包”要在查询中使用的JSON对象。

For example, this query finds the values of the device_type key:

例如，此查询查找device_type键的值：

SELECT
  session_info -> 'device_type' AS devices
FROM sessions;

And this query counts how many sessions were on app version 1.0 or earlier:

此查询计算应用程序版本1.0或更早版本上有多少会话：

SELECT
  COUNT(*)
FROM sessions
WHERE CAST(session_info ->> 'app_version' AS decimal) <= 1.0;

运行统计分析 (Run statistical analyses)

Often, people see SQL as good for storing data and running simple queries, but not for running more in-depth analyses. For that, you should use another tool such as Python or R or your favourite spreadsheet software.

通常，人们认为SQL对于存储数据和运行简单查询很有用，但对于运行更深入的分析却不利。为此，您应该使用其他工具，例如Python或R或您喜欢的电子表格软件。

However, PostgreSQL brings with it enough statistical capabilities to get you started.

但是，PostgreSQL具有足够的统计功能，可以帮助您入门。

For instance, it can calculate summary statistics, correlation, regression and random sampling. The table below contains some simple data to play around with.

例如，它可以计算汇总统计信息，相关性，回归和随机抽样。下表包含一些简单的数据。

CREATE TABLE stats (
  sample_id SERIAL PRIMARY KEY,
  x INT,
  y INT
);

INSERT INTO stats (x,y)
  VALUES 
  (1,2), (3,4), (6,5), (7,8), (9,10);

You can find the mean, variance and standard deviation using the functions below:

您可以使用以下功能找到均值，方差和标准差：

SELECT
	AVG(x),
    VARIANCE(x),
    STDDEV(x)
FROM stats;

You can also find the median (or any other percentile value) using the percentile_cont function:

您还可以使用percentile_cont函数找到中位数(或任何其他百分比值)：

-- median
SELECT
  PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY x) 
FROM stats;

-- 90th percentile
SELECT
  PERCENTILE_CONT(0.9)
WITHIN GROUP (ORDER BY x) 
FROM stats;

Another trick lets you calculate the correlation coefficients between different columns. Simply use the corr function.

另一个技巧是让您计算不同列之间的相关系数。只需使用corr函数。

SELECT
	CORR(x,y)
FROM stats;

PostgreSQL lets you run linear regression (sometimes called the most basic form of machine learning) via a set of inbuilt functions.

PostgreSQL使您可以通过一组内置函数运行线性回归 (有时被称为机器学习的最基本形式)。

SELECT
	REGR_INTERCEPT(x,y),
    REGR_SLOP(x,y),
    REGR_R2(x,y)
FROM stats;

You can even run Monte Carlo simulations with single queries. The query below uses the generate_series and random number functions to estimate the value of π by randomly sampling one million points inside a unit circle.

您甚至可以通过单个查询运行Monte Carlo模拟。下面的查询使用generate_series和随机数函数通过在一个单位圆内随机采样一百万个点来估计π的值。

SELECT 
	CAST(
		COUNT(*) * 4 AS FLOAT
		) / 1000000 AS pi 
FROM GENERATE_SERIES(1,1000000)
WHERE CIRCLE(POINT(0.5,0.5),0.5) @> POINT(RANDOM(), RANDOM());

处理形状数据 (Work with shape data)

Another unusual data type available in PostgreSQL is geometric data.

PostgreSQL中可用的另一种异常数据类型是几何数据。

That's right, you can work with points, lines, polygons and circles within SQL.

没错，您可以在SQL中使用点，线，多边形和圆。

Points are the basic building block for all geometric data types in PostgreSQL. They are represented as (x, y) coordinates.

点是PostgreSQL中所有几何数据类型的基本构建块。它们表示为(x，y)坐标。

SELECT
	POINT(0,0) AS "origin",
    POINT(1,1) AS "point";

You can also define lines. These can either be infinite lines (specified by giving any two points on the line). Or, they can be line segments (specified by giving the 'start' and 'end' points of the line).

您也可以定义线。这些可以是无限线(通过在该线上给出任意两个点来指定)。或者，它们可以是线段(通过指定线的“起点”和“终点”来指定)。

SELECT
	LINE '((0,0),(1,1))' AS "line",
    LSEG '((2,2),(3,3))' AS "line_segment";

Polygons are defined by a longer series of points.

多边形由更长的点定义。

SELECT
	POLYGON '((0,0),(1,1),(0,2))' AS "triangle",
	POLYGON '((0,0),(0,1),(1,1),(1,0))' AS "square",
    POLYGON '((0,0),(0,1),(2,1),(2,0))' AS "rectangle";

Circles are defined by a central point and a radius.

圆由中心点和半径定义。

SELECT
	CIRCLE '((0,0),1)' as "small_circle",
    CIRCLE '(0,0),5)' as "big_circle";

There are many functions and operators that can be applied to geometric data in PostgreSQL.

有许多函数和运算符可应用于PostgreSQL中的几何数据。

You can:

您可以：

Check if two lines are parallel with the ?|| operator:
检查两行是否与?||平行操作员：

SELECT
	LINE '((0,0),(1,1))' ?|| LINE '((2,3),(3,4))';

Find the distance between two objects with the <-> operator:
使用<->运算符找到两个对象之间的距离：

SELECT 
	POINT(0,0) <-> POINT(1,1);

Check if two shapes overlap at any point with the && operator:
使用&&运算符检查两个形状是否在任意点重叠：

SELECT
	CIRCLE '((0,0),1)' &&  CIRCLE '((1,1),1)';

Translate (shift position) a shape using the + operator:
使用+运算符平移(移动位置)形状：

SELECT
	POLYGON '((0,0),(1,2),(1,1))' + POINT(0,3);

And lots more besides - check out the documentation for more detail!

还有很多-请查看文档以获取更多详细信息！

使用递归查询 (Use recursive queries)

Recursion is a programming technique that can be used to solve problems using a function which calls itself. Did you know that you can write recursive queries in PostgreSQL?

递归是一种编程技术，可以使用调用自身的函数来解决问题。您知道您可以在PostgreSQL中编写递归查询吗？

There are three parts required to do this:

为此需要三个部分：

First, you define a starting expression.
首先，定义一个起始表达式。
Then, define a recursive expression that will be evaluated repeatedly
然后，定义一个递归表达式，该表达式将被重复求值
Finally, define a "termination criteria" - a condition which tells the function to stop calling itself, and return an output.
最后，定义一个“终止条件”-一种条件，该条件告诉函数停止调用自身，并返回输出。

The query below returns the first hundred numbers in the Fibonacci sequence:

下面的查询返回斐波那契数列的前一百个数字：

WITH RECURSIVE fibonacci(n,x,y) AS (
	SELECT
    	1 AS n ,
  		0 :: NUMERIC AS x,
    	1 :: NUMERIC AS y
  	UNION ALL
  	SELECT
    	n + 1 AS n,
  		y AS x,
    	x + y AS y 
  	FROM fibonacci 
  	WHERE n < 100
	)
SELECT
	x 
FROM fibonacci;

Let's break this down.

让我们分解一下。

First, it uses the WITH clause to define a (recursive) Common Table Expression called fibonacci. Then, it defines an initial expression:

首先，它使用WITH子句定义一个称为fibonacci的(递归) 公共表表达式。然后，它定义一个初始表达式：

WITH RECURSIVE fibonacci(n,x,y) AS (
	SELECT
    	1 AS n ,
  		0 :: NUMERIC AS x,
    	1 :: NUMERIC AS y...

Next, it defines the recursive expression that queries fibonacci:

接下来，它定义查询fibonacci的递归表达式：

...UNION ALL
  	SELECT
    	n + 1 AS n,
  		y AS x,
    	x + y AS y 
  	FROM fibonacci...

Finally, it uses a WHERE clause to define the termination criteria, and then selects column x to give the output sequence:

最后，它使用WHERE子句定义终止条件，然后选择列x给出输出序列：

...WHERE n < 100
		)
	SELECT
		x 
	FROM fibonacci;

Perhaps you can think of another example of recursion that could be implemented in PostgreSQL?

也许您可以想到另一个可以在PostgreSQL中实现的递归示例？

结束语 (Final remarks)

So, there you have it - a quick run through of some great features you may or may not have known PostgreSQL could provide. There are no doubt more features worth covering that didn't make it into this list.

因此，您已掌握了它-快速了解PostgreSQL可能提供或可能不提供的一些出色功能。毫无疑问，没有值得列入的其他功能。

PostgreSQL is a rich and powerful programming language in its own right. So, next time you are stuck figuring out how to solve a data related problem, take a look and see if PostgreSQL has you covered. You might surprised how often it does!

PostgreSQL本身就是一种丰富而强大的编程语言。因此，下一次您需要解决如何解决与数据相关的问题时，请看看PostgreSQL是否已覆盖您。您可能会惊讶它的执行频率！

Thanks for reading!

谢谢阅读！