Postgres Internals: Building a Description Tool

In previous blog posts, we have described the Postgres database and ways to interact with it using Python. Those posts provided the basics, but if you want to work with databases in production systems, then it is necessary to know how to make your queries faster and more efficient. To understand what efficiency means in Postgres, it’s important to learn how Postgres works under the hood.


In this post, we will focus on the more advanced concepts of Postgres and relational databases. To begin, we will learn how Postgres stores its own internal data for describing, debugging, and identifying the bottlenecks in a system. Then, we’ll use our knowledge on Postgres internal data to build our own version of a database description tool in Python.


Like our previous blog posts, we will be using the following tools:


  • A local version of Postgres (v9.2 or higher)
  • Python 3
  • The Python driver for Postgres, psycopg2

The data set we’ll use comes from the United States’ Department of Housing and Urban Development – also known as HUD. We have packaged the files into a zip file that contains the data as CSVs, and a Python 3 script (load_hud_tables.py) that will copy the CSVs into your local running Postgres. If you are running a Postgres server without the default connection, you’ll want to update the connection string in the script.


Using the HUD’s data set, we will work on a real world data example using available commands in every Postgres engine. Starting from a blank slate, we will investigate the tables and their datatypes. Then, we will explore the HUD tables using internal Postgres tables to give us a detailed description about the contents of the database.


To start, download and unzip the dq_postgres_internals.zip file. Go into the dq_postgres_internals/ directory, change the connection parameters in load_hud_tables.py, and then run the script. This will load the CSV files into your local Postgres instance.


Once you have loaded the files into your Postgres server, we can start off by connecting to the database. Change the connection values if your local Postgres instance differs from the defaults. Throughout this post, any future reference to the object cur will be the following connected cursor:


import psycopg2

conn = psycopg2.connect(host="127.0.0.1", dbname="postgres", user="postgres", password="password")
cur = conn.cursor()

Now that we’re connected, it’s time to start exploring what tables we have available. First, we’ll examine the Postgres internal tables which provide us details about our database. Let’s begin by describing what these internal tables are.


In every Postgres engine, there are a set of internal tables Postgres uses to manage its entire structure. These tables are located in the Postgres documentation as the groups information_schema and system catalogs. These contain all the information about data, names of tables, and types stored in a Postgres database. For example, when we use the attribute, cur.description, it is pulling information from the internal tables to display to the user.


Unfortunately, there is no detailed schema of the data sets. As a result, we need to create our own detailed description of what’s included. Let’s use an internal table from the information_schema to get a high level overview of what tables are stored in the database. The internal table is called information_schema.tables, and we can see from the documentation that there are plenty of columns to choose from:


Name                          Data Type       Description
table_catalog                 sql_identifier  Name of the database that contains the table (always the current database)
table_schema                  sql_identifier  Name of the schema that contains the table
table_name                    sql_identifier  Name of the table
table_type                    character_data  Type of the table: BASE TABLE for a persistent base table (the normal table type), VIEW for a view, FOREIGN TABLE for a foreign table, or LOCAL TEMPORARY for a temporary table
self_referencing_column_name  sql_identifier  Applies to a feature not available in PostgreSQL
reference_generation          character_data  Applies to a feature not available in PostgreSQL
user_defined_type_catalog     sql_identifier  If the table is a typed table, the name of the database that contains the underlying data type (always the current database), else null.
user_defined_type_schema      sql_identifier  If the table is a typed table, the name of the schema that contains the underlying data type, else null.
user_defined_type_name        sql_identifier  If the table is a typed table, the name of the underlying data type, else null.
is_insertable_into            yes_or_no       YES if the table is insertable into, NO if not (Base tables are always insertable into, views not necessarily.)
is_typed                      yes_or_no       YES if the table is a typed table, NO if not
commit_action                 character_data  If the table is a temporary table, then PRESERVE, else null. (The SQL standard defines other commit actions for temporary tables, which are not supported by PostgreSQL.)

At this point, we are only concerned with the names of the tables that are in the database. Taking a look at the table description above, there is a column table_name that exposes this information. Let’s query that column and see what we’re dealing with.

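One way to write that query, sketched as a small helper (a sketch, assuming the connected cursor `cur` from earlier; the function name is ours):

```python
def list_table_names(cur):
    # Ask the information schema for every table name it knows about,
    # internal tables included.
    cur.execute("SELECT table_name FROM information_schema.tables")
    # Each row comes back as a one-element tuple, so unpack the name.
    return [row[0] for row in cur.fetchall()]
```

Calling `for name in list_table_names(cur): print(name)` prints one table name per line.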

When you run the previous command, you’ll notice that the output contains an overwhelming amount of tables. Each one of these tables is found in the postgres database. The problem is: we’re including internal tables in our results.


In the output, you’ll notice that many tables start with the prefix pg_*. Each one of these tables is part of the pg_catalog group of internal tables. These are the system catalog tables we described earlier.


Then, as you may have guessed, there is also a group of other tables under the information_schema name. These tables are harder to find, though, as they follow no obvious prefix pattern. How do we know which tables are internal and which tables are user created?


We need to describe schemas to answer the question above. When dealing with relational databases, the word schema is a generalized term with multiple meanings. You may have heard schema used to describe tables (their datatypes, name, columns, etc.), or schema as the blueprint of a database.


The ambiguous meanings of schema only serve to confuse us. For Postgres, however, the term schema has been reserved for a specific purpose. In Postgres, schemas are used as a namespace for tables, with the distinct purpose of separating them into isolated groups or sets within a single database.


Let’s break this down further. Postgres uses the concept of databases to separate users and data within a Postgres server. When you create a database, you are creating an isolated environment where users can query against tables that can only be found in that particular database.


Here’s an example: suppose the Department of Homeland Security (DHS) and HUD shared the same government Postgres server, but they wanted to separate their users and data. They could create a separate database for each agency. Now, when a user wants to connect to their data, they need to specify which database to connect to, and there, and only there, can they work with their tables.


However, suppose there were analysts who wanted to do a cross-section analysis of citizenship data (the citizens table) and urban housing developments (the developments table). They would want to query both the citizens table in the dhs database and the developments table in the hud database. With Postgres, though, this is not possible from a single connection.


# Connect to the `dhs` database.
conn = psycopg2.connect(dbname='dhs')
cur = conn.cursor()

# This query works.
cur.execute('SELECT * FROM citizens') 

# This query will fail because it is in the `hud` database.
cur.execute('SELECT * FROM developments')


What if we wanted to separate tables into distinct groups, but still allow cross-table querying? This is the perfect use case for schemas. Instead of databases, using a different schema for each agency will separate the tables using a namespace, but still allow analysts to query both tables.


This is how Postgres divides their internal tables (and user created tables as well!). When a database is created, there are 3 schemas instantiated: pg_catalog (for the system catalog tables), information_schema (for the information schema tables), and public (the default schema for user created tables). Each time you issue a CREATE TABLE command in a database, by default Postgres assigns that table to the public schema.

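To make the DHS/HUD example concrete, here is a hedged sketch of what the schema-based layout could look like (the schema and table names are illustrative, carried over from the example above):

```python
# One database, one schema per agency. Unlike separate databases,
# a single connection can reach tables in both schemas.
setup_statements = [
    "CREATE SCHEMA dhs",
    "CREATE SCHEMA hud",
    "CREATE TABLE dhs.citizens (id integer PRIMARY KEY)",
    "CREATE TABLE hud.developments (id integer PRIMARY KEY)",
]

# Schema-qualified names let one query span both groups of tables:
cross_query = (
    "SELECT * FROM dhs.citizens c "
    "JOIN hud.developments d ON d.id = c.id"
)
```

With separate databases this join would be impossible; with schemas it is just a matter of prefixing each table name.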

Now, returning to our problem from before, how can we separate user created tables from the internal tables? Take a look at the columns for information_schema.tables again. Now, check to see if there exists a column that can separate user created tables from internal tables.


Name                          Data Type       Description
table_catalog                 sql_identifier  Name of the database that contains the table (always the current database)
table_schema                  sql_identifier  Name of the schema that contains the table
table_name                    sql_identifier  Name of the table
table_type                    character_data  Type of the table: BASE TABLE for a persistent base table (the normal table type), VIEW for a view, FOREIGN TABLE for a foreign table, or LOCAL TEMPORARY for a temporary table
self_referencing_column_name  sql_identifier  Applies to a feature not available in PostgreSQL
reference_generation          character_data  Applies to a feature not available in PostgreSQL
user_defined_type_catalog     sql_identifier  If the table is a typed table, the name of the database that contains the underlying data type (always the current database), else null.
user_defined_type_schema      sql_identifier  If the table is a typed table, the name of the schema that contains the underlying data type, else null.
user_defined_type_name        sql_identifier  If the table is a typed table, the name of the underlying data type, else null.
is_insertable_into            yes_or_no       YES if the table is insertable into, NO if not (Base tables are always insertable into, views not necessarily.)
is_typed                      yes_or_no       YES if the table is a typed table, NO if not
commit_action                 character_data  If the table is a temporary table, then PRESERVE, else null. (The SQL standard defines other commit actions for temporary tables, which are not supported by PostgreSQL.)

There’s a column named table_schema that fits our requirement. We can filter on this column to select all public tables. Here’s how we can write the query:


conn = psycopg2.connect(dbname="dq", user="hud_admin", password="eRqg123EEkl")
cur = conn.cursor()

cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public' ORDER BY table_name")
for table_name in cur.fetchall():
    name = table_name[0]
    print(name)


From the output, there are only three tables we will be working with. These are:


Using the table names, we can call the cur.description attribute to get a detailed look at the columns, types, and any other meta information for each table. Before, we learned that it was possible to issue a SELECT query and then call the description attribute to get the table information. However, what if we wanted to do it within the for loop for each table?


Hopefully your first thought was not to use .format(). In previous posts, we have already mentioned the issues surrounding string interpolation with .format(). The answer is to mogrify the string using the mogrify() method, or the second positional argument in the execute() method.


Unfortunately, things are not that easy. Attempting to insert the table name – using mogrify() – instead of a column name, filter key, or group by key results in an error. Here’s an example of this error:


table_name = "state_info"
bad_interpolation = cur.mogrify("SELECT * FROM %s LIMIT 0", [table_name])
# This will execute the query: SELECT * FROM 'state_info' LIMIT 0
# Notice the single quotation marks around state_info.
cur.execute(bad_interpolation)  # Throws an error


From the code snippet, you may have noticed that using mogrify() on the table name "state_info" converts the name to a Postgres string. This would be required behaviour for a column name or filter query, but not a table name. Instead, you have to use a class from the psycopg2.extensions module named AsIs.


The table name in the SELECT query does not need to be string quoted. Therefore, AsIs keeps the value as a valid, unquoted SQL representation instead of converting it to a quoted string. Using AsIs, we can check the description for each of the tables without having to write out each request!


from psycopg2.extensions import AsIs

cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public'")
for table in cur.fetchall():
    table = table[0]
    cur.execute("SELECT * FROM %s LIMIT 0", [AsIs(table)])
    print(cur.description, "\n")

With each of the descriptions printed out, we now have a detailed look at the tables we will be working with. Here’s a snippet of the output from the homeless_by_coc description:


Having explored the description attribute, you should be comfortable with the metadata available. However, we are once again faced with an integer type_code instead of a human readable type. There is too much mental overhead in remembering which human readable types they represent (i.e. TEXT, INTEGER, or BOOLEAN).


You can use psycopg2 type values to find the approximate type for each column, but we can do better than approximate those values. Using an internal table, we can accurately map the types for every column in the HUD tables.


The internal table we will use comes from the system catalog schema, pg_catalog, and it is accurately named pg_type. We advise checking out the table description in the documentation as there are too many rows to add in this section. You can find the table description here.


Within this table, there are a lot of defined columns – many that you do not need to be concerned about. However, one interesting thing to note about this table is that it can be used to create your own Postgres types from scratch. For example, using this table, you could create a HEX type that could be used to only store hexadecimal characters in your columns.


Let’s loop through the returned SELECT query and map the integer type code to the string. It should look something similar to this:


type_mappings = {
    16: 'bool',
    18: 'char',
    19: 'name',
    ...
}


Using dictionary comprehension, we can write the following:

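Against a live cursor this is a dictionary comprehension over the result of querying pg_type; here is a sketch, with a few well-known OIDs standing in for the real cur.fetchall() output:

```python
# Rows shaped like the result of:
#   cur.execute("SELECT oid, typname FROM pg_catalog.pg_type")
# The sample OIDs below are the standard ones for these types.
rows = [(16, "bool"), (18, "char"), (19, "name"), (23, "int4"), (1043, "varchar")]

type_mappings = {int(oid): typname for oid, typname in rows}
print(type_mappings[23])  # int4
```

Swapping `rows` for `cur.fetchall()` gives the full mapping for every type in the engine.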

Great! Now we have a mapping of all type codes to their type names. Using the type_mappings dictionary, the types are provided without looking them up in the documentation.


Let’s put all this together and create our own table descriptions. We want to rewrite the description attributes from a list of tuples into something human readable. We’ll want to assemble output from the previous exercises into this dictionary:


{
    "homeless_by_coc": {
        "columns": [
            {
                "name": "id",
                "type": "int4",
                "internal_size": 4
            },
            {
                "name": "year",
                "type": "date",
                "internal_size": 4
            },
            {
                "name": "state",
                "type": "char",
                "internal_size": 2
            },
            {
                "name": "coc_number",
                "type": "char",
                "internal_size": 128
            },
            {
                "name": "measures",
                "type": "varchar",
                "internal_size": 64
            },
            {
                "name": "count",
                "type": "int4",
                "internal_size": 4
            }
        ]
    },
    ...
}


Using type_mappings and table_names, the steps that will get us the required result are:

使用type_mappingstable_names ,将为我们提供所需结果的步骤是:

  1. Loop through the table_names with the table variable.
  2. Get the description attribute for the given table.
  3. Map the name of the table to a dictionary with a columns key.
  4. Recreate the columns list from the screen example by iterating through the description, and mapping the appropriate types.

Things are starting to come together. Now, to complete our investigation, we want to provide some additional information about the rows in our tables. Let’s provide our description with the number of rows in the table.


We can find the number of rows using the COUNT() aggregate function. This is extremely similar to SQLite’s aggregate functions as well as other implementations of the SQL syntax. If you want to learn more about Postgres’ aggregate functions, they are all defined in the pg_catalog.pg_aggregate internal table.


As a reminder, here’s how you would use the COUNT() function in Postgres:


SELECT COUNT(*) FROM example_table


We want the description table to look like:

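The on-screen output isn’t reproduced here, but the shape we’re aiming for is the earlier per-table dictionary with a total key added (the count value below is a placeholder, not the real figure):

```python
expected_shape = {
    "homeless_by_coc": {
        "columns": [
            {"name": "id", "type": "int4", "internal_size": 4},
            # ... the remaining columns, exactly as before ...
        ],
        "total": 12345,  # placeholder row count
    },
    # ... the other tables ...
}
```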

Instead of iterating through the table_names list, we’ll iterate on the readable_description dictionary keys:


for table in readable_description.keys():
    cur.execute("SELECT COUNT(*) FROM %s", [AsIs(table)])
    # fetchone() returns a one-element tuple, so unpack the count.
    readable_description[table]["total"] = cur.fetchone()[0]


Finally, let’s add some sample rows to the readable_description dictionary. Since there are a lot of rows in the homeless_by_coc table, we should add a limit to each query. Even if the limit is higher than the number of rows available, the query will still execute.


We’ll add the limit query within the same loop as retrieving the count. Instead of iterating over the keys twice, we can perform both operations within the same loop. However, we should be cautious of the call order of cur.fetchall().


We can overwrite query executions if we fail to immediately fetch their results. The cur.execute() command does not return read results, and it’s the user’s responsibility to request them. For example, the following query will only return the results of the LIMIT and not the COUNT:


cur.execute("SELECT COUNT(*) FROM homeless_by_coc")
cur.execute("SELECT * FROM homeless_by_coc LIMIT 100")
# Calling .fetchall() will only return the rows in the LIMIT query.
print(cur.fetchall())


Let’s add those two queries into a single code block:


Putting it all together, we now have a general script that will return a human readable dictionary of all user created tables in a database.


import psycopg2
from psycopg2.extensions import AsIs

conn = psycopg2.connect(host="127.0.0.1", dbname="postgres", user="postgres", password="password")
cur = conn.cursor()

cur.execute("SELECT oid, typname FROM pg_catalog.pg_type")
type_mappings = {
    int(oid): typname
    for oid, typname in cur.fetchall()
}

cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public'")
table_names = [table[0] for table in cur.fetchall()]

readable_description = {}
for table in table_names:
    cur.execute("SELECT * FROM %s LIMIT 0", [AsIs(table)])
    readable_description[table] = dict(
        columns=[
            dict(
                name=col.name,
                type=type_mappings[col.type_code],
                length=col.internal_size
            )
            for col in cur.description
        ]
    )


Translated from: https://www.pybloggers.com/2018/01/postgres-internals-building-a-description-tool/
