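Table Extraction Tool Guide

Note: Before learning how to use the individual functions, we recommend reading the request description to understand the basic PDF processing flow. Each function accepts its own specific parameters, which you set when uploading the file. All other basic steps are the same.

Table extraction: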

json
{
  "lang": 8
}

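Required parameters:

lang: OCR recognition language. Supported values: 1: Simplified Chinese, 2: Traditional Chinese, 3: English, 4: Korean, 5: Japanese, 6: Latin, 7: Sanskrit, 8: Auto.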
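As a small illustration of the codes listed above, the sketch below maps each `lang` value to a named constant and builds the JSON text for the `parameter` form field. The names `OcrLang`, `toParameterJson`, and `fromCode` are hypothetical helpers introduced here for illustration; they are not part of the API.

```java
/** Hypothetical helper: OCR language codes as listed for the "lang" parameter. */
enum OcrLang {
    CHINESE_SIMPLIFIED(1), CHINESE_TRADITIONAL(2), ENGLISH(3), KOREAN(4),
    JAPANESE(5), LATIN(6), SANSKRIT(7), AUTO(8);

    private final int code;

    OcrLang(int code) { this.code = code; }

    int code() { return code; }

    /** Builds the JSON text to send in the "parameter" form field. */
    String toParameterJson() {
        return "{\"lang\": " + code + "}";
    }

    /** Looks up a constant by its numeric code; rejects unknown codes. */
    static OcrLang fromCode(int code) {
        for (OcrLang lang : values()) {
            if (lang.code == code) return lang;
        }
        throw new IllegalArgumentException("Unknown lang code: " + code);
    }
}
```

With this helper, `OcrLang.AUTO.toParameterJson()` yields the same parameter body as the JSON shown above, without hand-writing escaped quotes.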

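Java example:

Replace apiKey with the publicKey obtained from the console, file with the file you want to convert, and language with the language in which you want API error messages to be returned.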

java
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
  public static void main(String[] args) throws IOException {
    OkHttpClient client = new OkHttpClient().newBuilder()
      .build();
    // Multipart form: the file itself plus the request options.
    RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
      .addFormDataPart("file", "{{file}}",
        RequestBody.create(MediaType.parse("application/octet-stream"),
                           new File("<file>")))
      .addFormDataPart("language", "{{language}}")    // language for API error messages
      .addFormDataPart("password", "")                // empty in this example
      .addFormDataPart("parameter", "{\"lang\": 8}")  // 8 = automatic OCR language
      .build();
    Request request = new Request.Builder()
      .url("https://api-server.compdf.com/server/v1/process/documentAI/tableRec")
      .method("POST", body)
      .addHeader("x-api-key", "{{apiKey}}")
      .build();
    // try-with-resources closes the response and releases the connection.
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}

Result:

| File type | Description |
| --- | --- |
| .json | The resulting JSON file |

Content:

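| Parameter | Description |
| --- | --- |
| cost | Time spent on table recognition |
| type | Type of the table |
| angle | Rotation angle of the table |
| width | Width of the table |
| height | Height of the table |
| rows | Number of rows in the table |
| cols | Number of columns in the table |
| position | Position of the table's bounding rectangle |
| height_of_rows | Height of each row in the table |
| width_of_cols | Width of each column in the table |
| table_cells | Information about every cell in the table |
| table_cells: start_row | Row where the cell starts |
| table_cells: end_row | Row where the cell ends |
| table_cells: start_col | Column where the cell starts |
| table_cells: end_col | Column where the cell ends |
| table_cells: text | Text contained in the cell |
| table_cells: position | Position of the cell's bounding rectangle |
| table_cells: lines | Information about the text lines in the cell |
| table_cells: lines: text | Text of the line |
| table_cells: lines: score | Recognition score of the line |
| table_cells: lines: position | Position of the line |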
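To make the cell fields more concrete, here is a minimal sketch that places each cell's text into a `rows` by `cols` grid using `start_row`, `end_row`, `start_col`, and `end_col`. It assumes these indices are 1-based and inclusive, which is what the sample response suggests but the parameter list does not state. `TableGrid`, `Cell`, and `toGrid` are hypothetical names, and the JSON is assumed to have been parsed already by a library of your choice.

```java
import java.util.Arrays;
import java.util.List;

final class TableGrid {
    /** Minimal cell model holding only the fields needed for the grid. */
    static final class Cell {
        final int startRow, endRow, startCol, endCol;
        final String text;

        Cell(int startRow, int endRow, int startCol, int endCol, String text) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.startCol = startCol;
            this.endCol = endCol;
            this.text = text;
        }
    }

    /** Fills a rows x cols grid; assumes 1-based, inclusive cell indices. */
    static String[][] toGrid(int rows, int cols, List<Cell> cells) {
        String[][] grid = new String[rows][cols];
        for (String[] row : grid) {
            Arrays.fill(row, "");  // slots not covered by any cell stay empty
        }
        for (Cell c : cells) {
            for (int r = c.startRow; r <= c.endRow; r++) {
                for (int col = c.startCol; col <= c.endCol; col++) {
                    grid[r - 1][col - 1] = c.text;  // convert to 0-based
                }
            }
        }
        return grid;
    }
}
```

A merged cell spans several rows or columns, so this sketch repeats its text in every slot it covers. Writing the text only at the top-left slot would be an equally reasonable choice.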
json
{
  "cost": 7566,
  "json_items": [
    {
      "type": "table_with_line",
      "angle": 0.0,
      "width": 488,
      "height": 191,
      "rows": 4,
      "cols": 4,
      "position": [
        114,
        657,
        602,
        657,
        602,
        848,
        114,
        848
      ],
      "height_of_rows": [
        65,
        30,
        31,
        36
      ],
      "width_of_cols": [
        122,
        122,
        118,
        122
      ],
      "table_cells": [
        {
          "start_row": 1,
          "end_row": 1,
          "start_col": 1,
          "end_col": 1,
          "text": "",
          "position": [
            2,
            2,
            124,
            2,
            124,
            67,
            2,
            67
          ],
          "lines": []
        },
        {
          "start_row": 2,
          "end_row": 2,
          "start_col": 1,
          "end_col": 1,
          "text": "Absorbed",
          "position": [
            2,
            64,
            125,
            64,
            125,
            95,
            2,
            95
          ],
          "lines": [
            {
              "text": "Absorbed",
              "score": 1.0,
              "position": [
                29,
                65,
                99,
                65,
                99,
                88,
                29,
                88
              ]
            }
          ]
        }
      ]
    }
  ],
  "html_items": [
    "<table border=\"1\" width='488px' height='191px'>\n
  <tr>
    <th width='122px' height='65px'></th>
    <th width='122px' height='65px' style=\"white-space: pre-line\">Absorbed</th>
    <th width='118px' height='65px' style=\"white-space: pre-line\">Neuter</th>
    <th width='122px' height='65px' style=\"white-space: pre-line\">Fatigue</th>
  </tr>\n
  <tr>
    <th width='122px' height='30px' style=\"white-space: pre-line\">Absorbed</th>
    <th width='122px' height='30px'>
    </th>
    <th width='118px' height='30px' style=\"white-space: pre-line\">2</th>
    <th width='122px' height='30px'>
    </th>
  </tr>\n
  <tr>
    <th width='122px' height='31px' style=\"white-space: pre-line\">Neuter</th>
    <th width='122px' height='31px'>
    </th>
    <th width='118px' height='31px'>
    </th>
    <th width='122px' height='31px'>
    </th>
  </tr>\n
  <tr>
    <th width='122px' height='36px' style=\"white-space: pre-line\">Fatigue</th>
    <th width='122px' height='36px'>
    </th>
    <th width='118px' height='36px'>
    </th>
    <th width='122px' height='36px' style=\"white-space: pre-line\">8</th>\t</tr>\n</table>",
    "<table border=\"1\" width='489px' height='166px'>\n
  <tr>
    <th width='123px' height='61px' style=\"white-space: pre-line\">Expression</th>
    <th width='117px' height='61px' style=\"white-space: pre-line\">Image Num</th>
    <th width='118px' height='61px' style=\"white-space: pre-line\">Correct</th>
    <th width='125px' height='61px' style=\"white-space: pre-line\">Recognition Rate</th>
  </tr>\n
  <tr>
    <th width='123px' height='31px' style=\"white-space: pre-line\">Absorbed</th>
    <th width='117px' height='31px' style=\"white-space: pre-line\">9</th>
    <th width='118px' height='31px' style=\"white-space: pre-line\">7</th>
    <th width='125px' height='31px' style=\"white-space: pre-line\">77.8%</th>
  </tr>\n
  <tr>
    <th width='123px' height='30px' style=\"white-space: pre-line\">Neuter</th>
    <th width='117px' height='30px' style=\"white-space: pre-line\">9</th>
    <th width='118px' height='30px'>
    </th>
    <th width='125px' height='30px' style=\"white-space: pre-line\">55.6%</th>
  </tr>\n
  <tr>
    <th width='123px' height='31px' style=\"white-space: pre-line\">Fatigue</th>
    <th width='117px' height='31px' style=\"white-space: pre-line\">9</th>
    <th width='118px' height='31px'>
    </th>
    <th width='125px' height='31px' style=\"white-space: pre-line\">88.9%</th>
  </tr>\n
  <tr>
    <th width='483px' height='33px' colspan=\"4\" style=\"white-space: pre-line\">Average recognition rate: 74.1%</th>\t</tr>\n</table>"
  ]
}
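
In the sample above, each `position` array holds eight numbers. They read naturally as four (x, y) corner points listed from the top-left corner clockwise, although this is an inference from the sample rather than something the parameter list states. Under that assumption, the following sketch derives an axis-aligned bounding box from a `position` array; `PositionBox` and `boundingBox` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

final class PositionBox {
    /**
     * Returns {minX, minY, width, height} for an 8-number corner list,
     * assumed to be laid out as x1, y1, x2, y2, x3, y3, x4, y4.
     */
    static int[] boundingBox(int[] position) {
        if (position.length != 8) {
            throw new IllegalArgumentException("Expected 8 numbers, got " + position.length);
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int i = 0; i < 8; i += 2) {
            minX = Math.min(minX, position[i]);
            maxX = Math.max(maxX, position[i]);
            minY = Math.min(minY, position[i + 1]);
            maxY = Math.max(maxY, position[i + 1]);
        }
        return new int[] {minX, minY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        // "position" of the table in the sample response
        int[] table = {114, 657, 602, 657, 602, 848, 114, 848};
        System.out.println(Arrays.toString(boundingBox(table)));  // [114, 657, 488, 191]
    }
}
```

For the sample table, the computed width and height (488 and 191) match the `width` and `height` fields of the same item, which supports the corner-point reading. Taking the minimum and maximum over all four points also keeps the result sensible when `angle` is non-zero and the corners are no longer axis-aligned.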