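Guide to Using Baidu Brain VAT Invoice Recognition

Author: wangwei8638

I. Platform Access

This step is fairly simple, so I will not go into detail. See the earlier post: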

https://ai.baidu.com/forum/topic/show/943028

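II. Analyzing the API Documentation

1. API documentation: https://ai.baidu.com/docs#/OCR-API/5099e085

(1) Interface description

The API recognizes a VAT invoice and returns its fields and their values in structured form: 30 structured fields in total, including 9 basic invoice fields and 12 goods-related fields.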

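(2) Request description

The information needed is:

Request URL: https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice

Header format: Content-Type: application/x-www-form-urlencoded

Request parameter: image, the image data, base64-encoded. The base64-encoded data must not exceed 4 MB, the shortest side must be at least 15 px, the longest side at most 4096 px, and the jpg/png/bmp formats are supported. Note: the image must be base64-encoded, have the encoding header removed, and then be URL-encoded.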
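The encoding steps in the note above can be sketched as follows. This is a minimal sketch, not part of the original post: the helper names `strip_data_uri_header` and `build_request_body` are hypothetical, and it assumes that the "encoding header" means a `data:image/...;base64,` prefix such as the one browsers add. It only checks the 4 MB limit on the encoded data, not the pixel dimensions.

```python
import urllib.parse

# 4 MB limit on the base64-encoded image, as stated in the request description
MAX_BASE64_LENGTH = 4 * 1024 * 1024


def strip_data_uri_header(encoded: str) -> str:
    """Drop a leading 'data:image/...;base64,' header if one is present.

    Assumption: this prefix is what the documentation calls the encoding header.
    """
    if encoded.startswith("data:") and "," in encoded:
        return encoded.split(",", 1)[1]
    return encoded


def build_request_body(encoded_image: str) -> bytes:
    """Turn a base64 string into the URL-encoded form body for the request."""
    encoded_image = strip_data_uri_header(encoded_image)
    if len(encoded_image) > MAX_BASE64_LENGTH:
        raise ValueError("base64-encoded image exceeds 4 MB")
    # urlencode percent-encodes '+', '/' and '=' in the base64 text
    return urllib.parse.urlencode({"image": encoded_image}).encode("ascii")
```

For an image read from disk, the input string would be `base64.b64encode(image_bytes).decode("ascii")`, which carries no header, so the stripping step is then a no-op.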

2. Obtaining the access token

import json
import urllib.request

# client_id is the AK obtained from the official site; client_secret is the SK
client_id = '[AK of your Baidu Cloud application]'
client_secret = '[SK of your Baidu Cloud application]'

# Get the token
def get_token():
    host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=' + client_id + '&client_secret=' + client_secret
    request = urllib.request.Request(host)
    request.add_header('Content-Type', 'application/json; charset=UTF-8')
    response = urllib.request.urlopen(request)
    token_content = response.read()
    # Default to an empty string so the return below never hits an undefined name
    token_key = ''
    if token_content:
        token_info = json.loads(token_content.decode("utf-8"))
        token_key = token_info['access_token']
    return token_key
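
Requesting a new token before every recognition call is not strictly necessary. The following is a minimal sketch of caching the token in memory, not something the original post does. `TokenCache` and its `fetch` argument are hypothetical names, and the sketch assumes that the parsed token response carries an `expires_in` lifetime in seconds next to `access_token`.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Keep an access token in memory until shortly before it expires."""

    def __init__(self, fetch: Callable[[], Dict], margin_seconds: int = 60,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch            # hypothetical: returns the parsed token JSON
        self._margin = margin_seconds  # refresh this many seconds early
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            info = self._fetch()
            if "access_token" not in info:
                raise RuntimeError("token response has no access_token: %r" % (info,))
            self._token = info["access_token"]
            # Assumption: expires_in is a lifetime in seconds; if it is missing,
            # the token is treated as already expired and refetched next time.
            self._expires_at = now + float(info.get("expires_in", 0)) - self._margin
        return self._token
```

Here `fetch` would wrap the HTTP request and `json.loads` call from `get_token`, returning the whole parsed dictionary rather than only the token string.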

III. Recognition Results

[Image]

Recognition result output:

[Image]
IV. Source Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import base64
import json
import urllib.parse
import urllib.request

# client_id is the AK obtained from the official site; client_secret is the SK
client_id = '**************'
client_secret = '********************'


# Get the token
def get_token():
    host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=' + client_id + '&client_secret=' + client_secret
    request = urllib.request.Request(host)
    request.add_header('Content-Type', 'application/json; charset=UTF-8')
    response = urllib.request.urlopen(request)
    token_content = response.read()
    # Default to an empty string so the return below never hits an undefined name
    token_key = ''
    if token_content:
        token_info = json.loads(token_content.decode("utf-8"))
        token_key = token_info['access_token']
    return token_key


# Read the image file as raw bytes
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()


# Get VAT invoice information
def get_license_plate(path):
    request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice"

    f = get_file_content(path)
    access_token = get_token()
    # Base64-encode the image, then URL-encode it as the form body
    img = base64.b64encode(f)
    params = {"image": img}
    params = urllib.parse.urlencode(params).encode('utf-8')
    request_url = request_url + "?access_token=" + access_token
    request = urllib.request.Request(url=request_url, data=params)
    request.add_header('Content-Type', 'application/x-www-form-urlencoded')
    response = urllib.request.urlopen(request)
    content = response.read()
    if content:
        receipts = json.loads(content.decode("utf-8"))
        strover = 'Recognition result:\n'
        words_result = receipts['words_result']
        # Invoice type
        InvoiceType = words_result['InvoiceType']
        strover += '  Invoice type: {}\n'.format(InvoiceType)
        # Invoice code
        InvoiceCode = words_result['InvoiceCode']
        strover += '  Invoice code: {}\n'.format(InvoiceCode)
        # Invoice number
        InvoiceNum = words_result['InvoiceNum']
        strover += '  Invoice number: {}\n'.format(InvoiceNum)
        # Issuer
        NoteDrawer = words_result['NoteDrawer']
        strover += '  Issuer: {}\n'.format(NoteDrawer)
        # Invoice date
        InvoiceDate = words_result['InvoiceDate']
        strover += '  Invoice date: {}\n'.format(InvoiceDate)
        # Reviewer
        Checker = words_result['Checker']
        strover += '  Reviewer: {}\n'.format(Checker)
        # Payee
        Payee = words_result['Payee']
        strover += '  Payee: {}\n\n'.format(Payee)
        # Seller name
        Unit_name = words_result['SellerName']
        strover += '  Seller name: {}\n'.format(Unit_name)
        # Seller taxpayer identification number
        SellerRegisterNum = words_result['SellerRegisterNum']
        strover += '  Seller taxpayer ID: {}\n'.format(SellerRegisterNum)
        # Seller bank
        SellerBank = words_result['SellerBank']
        strover += '  Seller bank: {}\n'.format(SellerBank)
        # Seller address
        SellerAddress = words_result['SellerAddress']
        strover += '  Seller address: {}\n\n'.format(SellerAddress)
        # Purchaser name
        PurchaserName = words_result['PurchaserName']
        strover += '  Purchaser name: {}\n'.format(PurchaserName)
        # Purchaser taxpayer identification number
        PurchaserRegisterNum = words_result['PurchaserRegisterNum']
        strover += '  Purchaser taxpayer ID: {}\n'.format(PurchaserRegisterNum)
        # Purchaser bank
        PurchaserBank = words_result['PurchaserBank']
        strover += '  Purchaser bank: {}\n'.format(PurchaserBank)
        # Purchaser address
        PurchaserAddress = words_result['PurchaserAddress']
        strover += '  Purchaser address: {}\n'.format(PurchaserAddress)
        # Service name (first item of the goods list only)
        CommodityName = words_result['CommodityName']
        strover += '  Service name: {}\n'.format(CommodityName[0]['word'])
        # Total amount including tax
        AmountInFiguers = words_result['AmountInFiguers']
        strover += '  Total including tax: {}\n'.format(AmountInFiguers)
        # print(content.decode("utf-8"))
        print(strover)
        return content
    else:
        return ''


# Raw string so the backslashes in the Windows path are not treated as escapes
image_path = r'F:\paddle\p3.jpg'
get_license_plate(image_path)
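
The code above indexes `words_result` directly, so a failed request or a missing field raises a `KeyError`. The following is a minimal sketch of a more tolerant way to format the parsed response, not the author's implementation. `summarize_invoice` and `FIELDS` are hypothetical names, only a subset of the keys read above is listed, and the sketch assumes that a failed request returns JSON containing `error_code` and `error_msg`.

```python
from typing import Any, Dict, List, Tuple

# (response key, printed label) pairs; the keys are ones the code above reads
FIELDS: List[Tuple[str, str]] = [
    ("InvoiceType", "Invoice type"),
    ("InvoiceCode", "Invoice code"),
    ("InvoiceNum", "Invoice number"),
    ("InvoiceDate", "Invoice date"),
    ("SellerName", "Seller name"),
    ("PurchaserName", "Purchaser name"),
    ("AmountInFiguers", "Total including tax"),
]


def summarize_invoice(response: Dict[str, Any]) -> str:
    """Format a parsed recognition response, tolerating missing fields."""
    # Assumption: error responses carry error_code and error_msg
    if "error_code" in response:
        raise RuntimeError("OCR request failed: {} {}".format(
            response["error_code"], response.get("error_msg", "")))
    words_result = response.get("words_result", {})
    lines = ["Recognition result:"]
    for key, label in FIELDS:
        lines.append("  {}: {}".format(label, words_result.get(key, "")))
    # CommodityName is a list of items; print the first one if there is any
    commodity = words_result.get("CommodityName") or []
    first = commodity[0].get("word", "") if commodity else ""
    lines.append("  Service name: {}".format(first))
    return "\n".join(lines)
```

It would be called on the result of `json.loads(content.decode("utf-8"))`, and more fields can be covered by appending pairs to `FIELDS`.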

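V. Conclusion

The service supports structured recognition of all 30 fields on both ordinary and special VAT invoices, including basic invoice information, seller and purchaser information, goods information, and price and tax information. Recognition accuracy for the four key elements exceeds 99.9%.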
