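Smart Table Extraction
Note: Before using the individual functions, read the request description to understand the basic PDF processing workflow. Each function accepts its own specific parameters when you upload the file; all other steps are the same.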
Smart table extraction parameters:
```json
{
  "lang": "auto"
}
```
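Required parameter
lang: the recognition language. Supported values:
- auto - detect the language automatically
- english - English
- chinese - Simplified Chinese
- chinese_tra - Traditional Chinese
- korean - Korean
- japanese - Japanese
- latin - Latin
- devanagari - Devanagari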
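The lang values above can be captured in a small type so that only documented values reach the parameter field. This is a minimal plain-Java sketch; the class `TableRecParams` and its methods are hypothetical and not part of any ComPDF SDK.

```java
import java.util.Locale;

/** Hypothetical helper: builds the JSON sent in the "parameter" form field. */
final class TableRecParams {

    /** The lang values listed in this section. */
    enum Lang {
        AUTO, ENGLISH, CHINESE, CHINESE_TRA, KOREAN, JAPANESE, LATIN, DEVANAGARI;

        /** Wire value: the lowercase enum name, e.g. CHINESE_TRA -> "chinese_tra". */
        String wireValue() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    private TableRecParams() {}

    /** Returns e.g. {"lang": "auto"}; the enum ensures only documented values are sent. */
    static String toParameterJson(Lang lang) {
        return "{\"lang\": \"" + lang.wireValue() + "\"}";
    }
}
```

For example, `TableRecParams.toParameterJson(TableRecParams.Lang.AUTO)` yields the same string used in the upload examples.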
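Example
Authorization
Replace the publicKey and secretKey placeholders in the request with the publicKey and secretKey obtained from the console. The response returns the accessToken used in the following steps.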
```bash
curl --location --request POST 'https://api-server.compdf.com/server/v1/oauth/token' \
--header 'Content-Type: application/json' \
--data-raw '{
  "publicKey": "publicKey",
  "secretKey": "secretKey"
}'
```
```java
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // The token endpoint expects JSON, matching the curl Content-Type header.
        MediaType mediaType = MediaType.parse("application/json");
        RequestBody body = RequestBody.create(mediaType,
                "{\n  \"publicKey\": \"{{public_key}}\",\n  \"secretKey\": \"{{secret_key}}\"\n}");
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/oauth/token")
                .post(body)
                .build();
        // try-with-resources closes the response; the body contains the accessToken.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
```
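The accessToken has to be read from the token response before the next step. A minimal stdlib-only sketch, assuming the response body contains a string field named `accessToken` somewhere in its JSON (the full response layout is not shown in this section); `TokenHelper` is a hypothetical name, and a production client would use a JSON library rather than a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for the token request. */
final class TokenHelper {
    // Assumed field name: matches "accessToken": "<value>" anywhere in the body.
    private static final Pattern ACCESS_TOKEN =
            Pattern.compile("\"accessToken\"\\s*:\\s*\"([^\"]+)\"");

    private TokenHelper() {}

    /** JSON body for POST /server/v1/oauth/token; keys are assumed to contain no quotes. */
    static String tokenRequestBody(String publicKey, String secretKey) {
        return "{\"publicKey\": \"" + publicKey + "\", \"secretKey\": \"" + secretKey + "\"}";
    }

    /** Returns the accessToken value, or throws if the field is absent. */
    static String extractAccessToken(String responseBody) {
        Matcher m = ACCESS_TOKEN.matcher(responseBody);
        if (!m.find()) {
            throw new IllegalArgumentException("no accessToken field in response");
        }
        return m.group(1);
    }
}
```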
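Create a task
Replace the accessToken placeholder with the accessToken obtained in the previous step, and replace the language placeholder with the language you want for interface and task error messages. On success, the response contains the taskId.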
```bash
curl --location --request GET 'https://api-server.compdf.com/server/v1/task/documentAI/tableRec' \
--header 'Authorization: Bearer accessToken'
```
```java
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // Plain GET: OkHttp rejects .method("GET", body) when body is non-null.
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/task/documentAI/tableRec?language={{language}}")
                .get()
                .addHeader("Authorization", "Bearer {{accessToken}}")
                .build();
        // The response contains the taskId used in the following steps.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
```
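A sketch of the same step without a hand-written query string: it appends the optional language value with URL encoding and pulls the taskId out of the response. `CreateTaskHelper` is hypothetical, and it assumes the response carries the id as a JSON string field named `taskId`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for the "create a task" step. */
final class CreateTaskHelper {
    static final String CREATE_TASK_URL =
            "https://api-server.compdf.com/server/v1/task/documentAI/tableRec";

    // Assumption: the response holds the id as "taskId": "<value>".
    private static final Pattern TASK_ID = Pattern.compile("\"taskId\"\\s*:\\s*\"([^\"]+)\"");

    private CreateTaskHelper() {}

    /** Adds ?language=... only when a language is given; the value is URL-encoded. */
    static String createTaskUrl(String language) {
        if (language == null || language.isEmpty()) {
            return CREATE_TASK_URL;
        }
        return CREATE_TASK_URL + "?language=" + URLEncoder.encode(language, StandardCharsets.UTF_8);
    }

    /** Returns the taskId value, or throws if the field is absent. */
    static String extractTaskId(String responseBody) {
        Matcher m = TASK_ID.matcher(responseBody);
        if (!m.find()) {
            throw new IllegalArgumentException("no taskId field in response");
        }
        return m.group(1);
    }
}
```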
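Upload a file
Replace the file placeholder with the file you want to process, taskId with the taskId obtained in the previous step, language with the language for interface error messages, and accessToken with the accessToken obtained in the first step.
- Supported image formats: jpg, jpeg, png, bmp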
```bash
curl --location --request POST 'https://api-server.compdf.com/server/v1/file/upload' \
--header 'Authorization: Bearer accessToken' \
--form 'file=@"test.pdf"' \
--form 'taskId="taskId"' \
--form 'password=""' \
--form 'parameter="{ \"lang\": \"auto\" }"'
```
```java
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // {{file}} is the file name sent to the server; <file> is the local path to read.
        RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
                .addFormDataPart("file", "{{file}}",
                        RequestBody.create(MediaType.parse("application/octet-stream"), new File("<file>")))
                .addFormDataPart("taskId", "{{taskId}}")
                .addFormDataPart("language", "{{language}}")
                // Function-specific parameters: the lang value for table extraction.
                .addFormDataPart("parameter", "{ \"lang\": \"auto\" }")
                .build();
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/file/upload")
                .post(body)
                .addHeader("Authorization", "Bearer {{accessToken}}")
                .build();
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
```
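If OkHttp is not available, the same multipart/form-data body can be assembled with the JDK alone. This is a sketch under stated assumptions: the class `UploadForm` is hypothetical, it needs Java 11+ (`ByteArrayOutputStream.writeBytes`), and it only covers the kind of fields used above (file, taskId, password, language, parameter).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical stdlib-only builder for the upload request body. */
final class UploadForm {
    final String boundary = "----compdf" + UUID.randomUUID().toString().replace("-", "");
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** Adds a text field such as taskId, password, language or parameter. */
    UploadForm field(String name, String value) {
        write("--" + boundary + "\r\n");
        write("Content-Disposition: form-data; name=\"" + name + "\"\r\n\r\n");
        write(value + "\r\n");
        return this;
    }

    /** Adds a file part containing the raw file bytes. */
    UploadForm file(String name, String filename, byte[] content) {
        write("--" + boundary + "\r\n");
        write("Content-Disposition: form-data; name=\"" + name + "\"; filename=\"" + filename + "\"\r\n");
        write("Content-Type: application/octet-stream\r\n\r\n");
        out.writeBytes(content);
        write("\r\n");
        return this;
    }

    /** Appends the closing boundary and returns the body; call once. */
    byte[] build() {
        write("--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    /** Value for the Content-Type request header. */
    String contentType() {
        return "multipart/form-data; boundary=" + boundary;
    }

    private void write(String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

The resulting bytes can be sent with `java.net.http.HttpClient` via `HttpRequest.BodyPublishers.ofByteArray(...)`, setting `Content-Type` to `contentType()` and adding the usual `Authorization: Bearer` header.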
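Execute the task
Replace the taskId placeholder with the taskId obtained in the Create a task step, accessToken with the accessToken obtained in the first step, and language with the language for interface error messages.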
```bash
curl --location -g --request GET 'https://api-server.compdf.com/server/v1/execute/start?taskId=taskId' \
--header 'Authorization: Bearer accessToken'
```
```java
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // Plain GET: OkHttp rejects a request body on GET.
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/execute/start?taskId={{taskId}}&language={{language}}")
                .get()
                .addHeader("Authorization", "Bearer {{accessToken}}")
                .build();
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
```
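The execute URL takes taskId and an optional language as query parameters. A small hypothetical helper that encodes them consistently; `QueryUrls` is not part of the ComPDF API, and it simply skips empty values rather than sending them.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

/** Hypothetical query-string helper for the execute step. */
final class QueryUrls {
    static final String EXECUTE_URL = "https://api-server.compdf.com/server/v1/execute/start";

    private QueryUrls() {}

    /** Appends URL-encoded parameters in insertion order, skipping null or empty values. */
    static String withQuery(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            String v = e.getValue();
            if (v == null || v.isEmpty()) {
                continue;
            }
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8));
        }
        String q = query.toString();
        return q.isEmpty() ? baseUrl : baseUrl + "?" + q;
    }

    /** taskId is required; language is optional. */
    static String executeUrl(String taskId, String language) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("taskId", Objects.requireNonNull(taskId, "taskId"));
        params.put("language", language);
        return withQuery(EXECUTE_URL, params);
    }
}
```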
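Get task information
Replace the taskId placeholder with the taskId obtained in the Create a task step, and accessToken with the accessToken obtained in the first step.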
```bash
curl --location -g --request GET 'https://api-server.compdf.com/server/v1/task/taskInfo?taskId=taskId' \
--header 'Authorization: Bearer accessToken'
```
```java
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // Plain GET: OkHttp rejects a request body on GET.
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/task/taskInfo?taskId={{taskId}}")
                .get()
                .addHeader("Authorization", "Bearer {{accessToken}}")
                .build();
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
```
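Since a task may still be running when you first query it, one option is to call the task-info endpoint repeatedly until it reports completion. A minimal polling sketch; `TaskPoller` is hypothetical, and `isFinished` stands for whatever check you apply to the task-info response, because the status field and its values are not documented in this section.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for the task-info step. */
final class TaskPoller {
    private TaskPoller() {}

    /**
     * Calls isFinished up to maxAttempts times, sleeping intervalMillis between calls.
     * Returns the 1-based attempt that reported completion, or -1 if none did
     * (or if the thread was interrupted while waiting).
     */
    static int pollUntilDone(BooleanSupplier isFinished, int maxAttempts, long intervalMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isFinished.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return -1;
                }
            }
        }
        return -1;
    }
}
```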
Result
File type | Description |
---|---|
.json | Table recognition result |
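Content
Field | Description |
---|---|
cost | Time spent on table recognition |
json_items | List of recognized tables; each entry contains the fields below |
type | Table type (for example, table_with_line) |
angle | Rotation angle of the table |
width | Table width |
height | Table height |
rows | Number of rows in the table |
cols | Number of columns in the table |
position | Bounding box of the table |
height_of_rows | Height of each row |
width_of_cols | Width of each column |
table_cells | Information about every cell in the table |
table_cells: start_row | Row where the cell starts |
table_cells: end_row | Row where the cell ends |
table_cells: start_col | Column where the cell starts |
table_cells: end_col | Column where the cell ends |
table_cells: text | Text in the cell |
table_cells: position | Bounding box of the cell |
table_cells: lines | Text lines contained in the cell |
table_cells: lines: text | Text of the line |
table_cells: lines: score | Recognition score of the text line |
table_cells: lines: position | Position of the text line |
html_items | HTML rendering of each recognized table |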
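Because a cell can span several rows or columns (start_row to end_row, start_col to end_col), rebuilding a plain grid from table_cells takes a little care. A sketch, assuming 1-based inclusive indices as in the sample output, where the first cell has start_row 1 and start_col 1; `TableGrid` is a hypothetical name. A merged cell's text is copied into every slot it covers, comparable to the colspan="4" row in the second html_items table.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: lays out table_cells into a rows x cols grid of text. */
final class TableGrid {

    /** Mirrors one table_cells entry (1-based, inclusive indices). */
    static final class Cell {
        final int startRow, endRow, startCol, endCol;
        final String text;

        Cell(int startRow, int endRow, int startCol, int endCol, String text) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.startCol = startCol;
            this.endCol = endCol;
            this.text = text;
        }
    }

    private TableGrid() {}

    /** Slots not covered by any cell stay "". */
    static String[][] toGrid(int rows, int cols, List<Cell> cells) {
        String[][] grid = new String[rows][cols];
        for (String[] row : grid) {
            Arrays.fill(row, "");
        }
        for (Cell c : cells) {
            for (int r = c.startRow; r <= c.endRow; r++) {
                for (int col = c.startCol; col <= c.endCol; col++) {
                    grid[r - 1][col - 1] = c.text; // convert 1-based to 0-based
                }
            }
        }
        return grid;
    }
}
```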
```json
{
  "cost": 7566,
  "json_items": [
    {
      "type": "table_with_line",
      "angle": 0.0,
      "width": 488,
      "height": 191,
      "rows": 4,
      "cols": 4,
      "position": [114, 657, 602, 657, 602, 848, 114, 848],
      "height_of_rows": [65, 30, 31, 36],
      "width_of_cols": [122, 122, 118, 122],
      "table_cells": [
        {
          "start_row": 1,
          "end_row": 1,
          "start_col": 1,
          "end_col": 1,
          "text": "",
          "position": [2, 2, 124, 2, 124, 67, 2, 67],
          "lines": []
        },
        {
          "start_row": 2,
          "end_row": 2,
          "start_col": 1,
          "end_col": 1,
          "text": "Absorbed",
          "position": [2, 64, 125, 64, 125, 95, 2, 95],
          "lines": [
            {
              "text": "Absorbed",
              "score": 1.0,
              "position": [29, 65, 99, 65, 99, 88, 29, 88]
            }
          ]
        }
      ]
    }
  ],
  "html_items": [
    "<table border=\"1\" width='488px' height='191px'>\n<tr><th width='122px' height='65px'></th><th width='122px' height='65px' style=\"white-space: pre-line\">Absorbed</th><th width='118px' height='65px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='65px' style=\"white-space: pre-line\">Fatigue</th></tr>\n<tr><th width='122px' height='30px' style=\"white-space: pre-line\">Absorbed</th><th width='122px' height='30px'></th><th width='118px' height='30px' style=\"white-space: pre-line\">2</th><th width='122px' height='30px'></th></tr>\n<tr><th width='122px' height='31px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='31px'></th><th width='118px' height='31px'></th><th width='122px' height='31px'></th></tr>\n<tr><th width='122px' height='36px' style=\"white-space: pre-line\">Fatigue</th><th width='122px' height='36px'></th><th width='118px' height='36px'></th><th width='122px' height='36px' style=\"white-space: pre-line\">8</th>\t</tr>\n</table>",
    "<table border=\"1\" width='489px' height='166px'>\n<tr><th width='123px' height='61px' style=\"white-space: pre-line\">Expression</th><th width='117px' height='61px' style=\"white-space: pre-line\">Image Num</th><th width='118px' height='61px' style=\"white-space: pre-line\">Correct</th><th width='125px' height='61px' style=\"white-space: pre-line\">Recognition Rate</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Absorbed</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px' style=\"white-space: pre-line\">7</th><th width='125px' height='31px' style=\"white-space: pre-line\">77.8%</th></tr>\n<tr><th width='123px' height='30px' style=\"white-space: pre-line\">Neuter</th><th width='117px' height='30px' style=\"white-space: pre-line\">9</th><th width='118px' height='30px'></th><th width='125px' height='30px' style=\"white-space: pre-line\">55.6%</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Fatigue</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px'></th><th width='125px' height='31px' style=\"white-space: pre-line\">88.9%</th></tr>\n<tr><th width='483px' height='33px' colspan=\"4\" style=\"white-space: pre-line\">Average recognition rate: 74.1%</th>\t</tr>\n</table>"
  ]
}
```
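Each position array holds eight numbers. In the sample, the table's position [114, 657, 602, 657, 602, 848, 114, 848] reads as four (x, y) corner points, and its extents (602 - 114 = 488, 848 - 657 = 191) match the reported width and height. The hypothetical `Quad` helper below converts such an array into an axis-aligned box under that reading; for a rotated table (nonzero angle) it returns the enclosing box.

```java
/** Hypothetical helper: turns an 8-number position array into an axis-aligned box. */
final class Quad {
    final int left, top, right, bottom;

    private Quad(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }

    /** Assumes position = x1, y1, x2, y2, x3, y3, x4, y4 (four corner points). */
    static Quad fromPosition(int[] p) {
        if (p.length != 8) {
            throw new IllegalArgumentException("expected 8 numbers, got " + p.length);
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int i = 0; i < 8; i += 2) {
            minX = Math.min(minX, p[i]);
            maxX = Math.max(maxX, p[i]);
            minY = Math.min(minY, p[i + 1]);
            maxY = Math.max(maxY, p[i + 1]);
        }
        return new Quad(minX, minY, maxX, maxY);
    }

    int width() {
        return right - left;
    }

    int height() {
        return bottom - top;
    }
}
```

Applied to the table position in the sample, `Quad.fromPosition(...)` gives a 488 x 191 box starting at (114, 657).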