shp批量导入Postgres并输出值域

本文链接：https://blog.csdn.net/sinwingyeah/article/details/132625568

工作中常常会通过各种提资收集到大量shp以及dwg文件，经常会遇到shp中属性表字段命名不规范问题。今天先记录一下shp文件的批量处理过程。

首先本地环境包括：
数据库：PostgreSQL + PostGIS
操作系统：MacOS
GIS工具辅助：QGIS

PostGIS提供了shp2pgsql工具，shell可直接调用，参考：
http://www.bostongis.com/pgsql2shp_shp2pgsql_quickguide_20.bqg

弄成shell脚本可批量操作

#!/bin/bash

# 定义数据库连接参数
HOST="localhost"
DBNAME="postgres"
USER="postgres"
PASSWORD="yourpassword"
DIR="/path/to/data" # 请替换为你的Shapefile所在的目录

# 遍历目录中的所有.shp文件
find "$DIR" -name "*.shp" | while IFS= read -r shp; do
    # 使用shp2pgsql将Shapefile转换为SQL文件
    sql_file="${shp%.*}.sql" # 替换.shp扩展名为.sql
    # shp2pgsql -s your_srid -I $shp > $sql_file # 替换your_srid为你的坐标系统SRID
    shp2pgsql -I "$shp" > "$sql_file"

    # 使用psql将SQL文件导入到数据库中
    # 注意：以下命令使用了密码环境变量，这在某些情况下可能不是最安全的方式
    PGPASSWORD=$PASSWORD psql -h $HOST -U $USER -d $DBNAME -a -f "$sql_file" >> /dev/null

    # 输出已处理文件的信息
    echo "Processed: $shp"
done

全部执行成功之后，可以在pg里面查询所有的元数据，然后再关联上相应的值域分析结果。其中值域分析通过一个函数来查询。

CREATE OR REPLACE FUNCTION get_field_summary(table_name_input TEXT)
RETURNS TABLE (
    col_name TEXT,
    data_typ TEXT,
    max_value TEXT,
    min_value TEXT,
    null_count BIGINT,
    total_count BIGINT,
    distinct_count BIGINT
) AS $$
DECLARE
    col_info RECORD;
    sql TEXT;
BEGIN
    FOR col_info IN (SELECT column_name, data_type
                     FROM information_schema.columns
                     WHERE table_name = table_name_input 
                       AND table_schema = 'public')
    LOOP
        sql := format(
            'SELECT
                %L AS column_name,
                %L AS data_type,
                MAX(%I)::TEXT AS max_value,
                MIN(%I)::TEXT AS min_value,
                COUNT(*) FILTER (WHERE %I IS NULL) AS null_count,
                COUNT(*) AS total_count,
                COUNT(DISTINCT %I) AS distinct_count
             FROM %I',
            col_info.column_name,
            col_info.data_type,
            col_info.column_name,
            col_info.column_name,
            col_info.column_name,
            col_info.column_name,
            table_name_input
        );

        RETURN QUERY EXECUTE sql;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

跟表与字段信息进行关联查询，一次性导出值域分析结果。注意field_info里面的查询可以过滤只保留你刚刚入库的shp名称。

WITH field_info AS (
    SELECT
        table_name,
        column_name,
        data_type,
        is_nullable,
        column_default
    FROM
        information_schema.columns
    WHERE
        table_schema = 'public'
)

SELECT
    fi.table_name,
    fi.column_name,
    fi.data_type,
    fi.is_nullable,
    fi.column_default,
    gs.max_value,
    gs.min_value,
    gs.null_count,
    gs.total_count,
    gs.distinct_count
FROM
    field_info fi
JOIN LATERAL get_field_summary(fi.table_name) gs ON gs.col_name = fi.column_name
ORDER BY
    fi.table_name, fi.column_name;