基于python以底层形式读取shp

Rur1

已于 2023-11-18 18:01:02 修改

阅读量772

点赞数 4

文章标签： python 开发语言 arcgis 经验分享笔记

于 2023-11-18 17:59:01 首次发布

本文链接：https://blog.csdn.net/weixin_65627866/article/details/134480946

版权

写这篇文章的起因：GNSS课的老师喊我们编程读取shp文件，不能调库，不用pyshp或者其它的库，就以读二进制的形式，顺便深刻地理解一下shp文件到底长啥样。

我曾在csdn用力寻找相关以二进制的形式读取文件的字节，奈何所有的教程要么调包，要么C++，没有一样既符合要求又符合我自身的现状（C++编程能力约等于不会），我也费尽千辛万苦找gpt老师，但奈何其功力不足，也沦为掉包侠的奴隶，遂弃用上述两种方案；

因此，我参考了几篇CSDN上的文章后，经过3天的折腾，终于给整出来了：

总体思想是：遇到大端序逆着读，遇到小端序顺着读
先读头100个字节的坐标头文件，再去读剩下的shp实体信息（分为记录头和记录内容两个部分）
对不同的shp实体采用不同的实体信息读取方式，依据记录头判断实体是哪一种（此处以polygon为例，它的编号为5），再对应编写读取该实体的记录内容的代码

附上代码：

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project ：GNSS课程 
@File    ：读取二进制文件shp.py
@IDE     ：PyCharm 
@Author  ：Rur1sama
@Date    ：2023/11/18 17:41 
'''
import struct


def OnChangeByteOrder(indata):
    hex_str = format(indata, '08x')
    hex_str = hex_str[6:8] + hex_str[4:6] + hex_str[2:4] + hex_str[0:2]
    return int(hex_str, 16)


def OnReadLineShp(ShpFile_fp):
    num = 0
    csvFilePath = "output.csv"

    with open(csvFilePath, "w") as csvFile:
        csvFile.write("X,Y,Part\n")  # 写入CSV文件头部

        # 从ShpFile_fp中读取RecordNumber
        FileCode_bytes = ShpFile_fp.read(4)  # 文件码 应该是\x00\x00,固定为十进制的9994
        FileCode = int.from_bytes(FileCode_bytes, byteorder='little')
        FileCode = OnChangeByteOrder(FileCode)  # 改方向

        # 跳过字节范围(unused)
        ShpFile_fp.seek(20, 1)  # 从当前位置向前移动16个字节

        # 从ShpFile_fp中读取FileLength
        FileLength_bytes = ShpFile_fp.read(4)  # FileLength是指总文件长度，大小为：头文件长度 + （记录头长度 + 记录内容长度） * 记录数
        FileLength = int.from_bytes(FileLength_bytes, byteorder='little')
        FileLength = OnChangeByteOrder(FileLength)  # 改方向

        # 从ShpFile_fp中读取dimension
        dimension_bytes = ShpFile_fp.read(4)
        dimension = int.from_bytes(dimension_bytes, byteorder='little')

        # 从ShpFile_fp中读取shapeType
        shapeType_bytes = ShpFile_fp.read(4)
        shapeType = int.from_bytes(shapeType_bytes, byteorder='little')

        # 读取Box
        Box = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
        for i in range(8):
            Box[i] = struct.unpack('d', ShpFile_fp.read(8))[0]

        # ----------------------------------此处开始读shp实体信息（记录头和记录内容）---------------------------------------------------
        index = 0   #   初始化实体信息编号
        # ——————————————————————————————————读取记录号（Record Number）和记录内容长度(Content Length)——————————————————————————————————
        while True:

            # 每个记录的记录头都储存着记录号（Record Number）和记录内容长度(Content Length)，位序都是big，类型是int，记录头的长度是 8 个字节，记录号从 1 开始。
            Record_Number_bytes = ShpFile_fp.read(4)  # 代表当前记录的编号
            if not Record_Number_bytes:
                break  # 读取完所有记录时退出循环
            Record_Number = int.from_bytes(Record_Number_bytes, byteorder='little')
            Record_Number = OnChangeByteOrder(Record_Number)  # 改方向
            print("当前记录点数:", Record_Number)
            Content_Length_bytes = ShpFile_fp.read(4)  # 代表当前记录内容的长度，不包括RecordNumber和ContentLength本身的长度
            Content_Length = int.from_bytes(Content_Length_bytes, byteorder='little')
            Content_Length = OnChangeByteOrder(Content_Length)  # 改方向
            print("当前记录长度:", Content_Length)
            # ——————————————————————————————————读取记录号（Record Number）和记录内容长度(Content Length)——————————————————————————————————
            # ——————————————————————————————————读取记录内容————————————————————————————————————————————————————————————————————————————

            Newshpty = ShpFile_fp.read(4)
            Newshpty = int.from_bytes(Newshpty, byteorder='little')

            if Newshpty == 5:
                NewBox = [0.0, 0.0, 0.0, 0.0]
                for i in range(4):
                    NewBox[i] = struct.unpack('d', ShpFile_fp.read(8))[0]

                NumParts_bytes = ShpFile_fp.read(4)
                NumParts = int.from_bytes(NumParts_bytes,
                                          byteorder='little')  # NumParts代表Polygon的子环数，一个Polygon可能由多个环。
                print('有', NumParts, '个子环')
                NumPoints_bytes = ShpFile_fp.read(4)
                NumPoints = int.from_bytes(NumPoints_bytes, byteorder='little')
                print('共有', NumPoints, '个折点')

                # 读取Parts，Parts是一个int类型数组，数组元素个数等于NumParts，每个元素记录了每个子环的坐标在Points数组中的起始位置。
                Parts = []
                for i in range(NumParts):
                    Parts.append(int.from_bytes(ShpFile_fp.read(4), byteorder='little'))

                for i in range(NumParts):
                    if i != NumParts - 1:
                        pointNum = Parts[i + 1] - Parts[i]  # 每个子环的长度 ，非最后一个环
                    else:
                        pointNum = NumPoints - Parts[i]  # 最后一个环

                    PointsX = []
                    PointsY = []
                    Polygon_Part = []

                    for j in range(pointNum):
                        PointsX.append(struct.unpack('d', ShpFile_fp.read(8))[0])
                        PointsY.append(struct.unpack('d', ShpFile_fp.read(8))[0])
                        Polygon_Part.append(index) # 输入多边形编号

                    # 将坐标点写入CSV文件
                    for j in range(pointNum):
                        csvFile.write("{:.6f},{:.6f},{}\n".format(PointsX[j], PointsY[j], Polygon_Part[j]))
                    index += 1  # 当前记录加一，继续读下面一个多边形
print("坐标点已成功写入CSV文件.")


# 使用示例
with open("F:\作业\GNSS\两个多边形\第三组两个多边形边界.shp", "rb") as ShpFile_fp:
    OnReadLineShp(ShpFile_fp)

至于代码的解释和shp文件内容长啥样自己看注释吧www😏

参考文章：

shp系列（二）——利用C++进行shp文件的读（打开）

shp格式详解（一）

Rur1

关注

4
点赞
踩
4

收藏

觉得还不错? 一键收藏
3
评论
基于python以底层形式读取shp

我曾在csdn用力寻找相关以二进制的形式读取文件的字节，奈何所有的教程要么调包，要么C++，没有一样既符合要求又符合我自身的现状（C++编程能力约等于不会），我也费尽千辛万苦找gpt老师，但奈何其功力不足，也沦为掉包侠的。写这篇文章的起因：GNSS课的老师喊我们编程读取shp文件，不能调库，不用pyshp或者其它的库，就以读二进制的形式，顺便深刻地理解一下shp文件到底长啥样。至于代码的解释和shp文件内容长啥样自己看注释吧www😏。，遂弃用上述两种方案；
复制链接

扫一扫