Intelligent Table Extraction
Note: Before learning how to use the individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function has its own parameters, which you set when uploading files. The other basic steps are the same for every function.
Intelligent Table Extraction:
{
"lang": "auto"
}
Required Parameters
lang: Supported types and definitions
- auto - Automatic language detection
- english - English
- chinese - Simplified Chinese
- chinese_tra - Traditional Chinese
- korean - Korean
- japanese - Japanese
- latin - Latin
- devanagari - Devanagari script (used for Sanskrit, Hindi, and other languages)
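The lang value is sent as a small JSON string in the upload step. As a minimal sketch, the helper below checks a value against the list above before building that string. The class and method names (TableParameter, build) are hypothetical and not part of the API. Whether the server rejects unknown values is not stated here, so the client-side check is only a convenience.

```java
import java.util.Set;

public class TableParameter {
    // Language codes copied from the list of supported lang values.
    private static final Set<String> SUPPORTED = Set.of(
        "auto", "english", "chinese", "chinese_tra",
        "korean", "japanese", "latin", "devanagari");

    /** Returns the JSON string for the upload "parameter" form field. */
    public static String build(String lang) {
        if (lang == null || !SUPPORTED.contains(lang)) {
            throw new IllegalArgumentException("Unsupported lang: " + lang);
        }
        return "{\"lang\": \"" + lang + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(build("auto")); // prints {"lang": "auto"}
    }
}
```

The returned string can be passed as the value of the parameter form field when uploading a file.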
Example
Authentication
Replace publicKey and secretKey with the public key and secret key you obtained from the console. The authentication response returns the accessToken used in the following steps.
curl
curl --location --request POST 'https://api-server.compdf.com/server/v1/oauth/token' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "publicKey": "publicKey", "secretKey": "secretKey" }'
java
import java.io.*;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // The body is JSON, so declare it as such (matches the curl example).
        MediaType mediaType = MediaType.parse("application/json");
        RequestBody body = RequestBody.create(mediaType,
            "{\n \"publicKey\": \"{{public_key}}\",\n \"secretKey\": \"{{secret_key}}\"\n}");
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/oauth/token")
            .method("POST", body)
            .build();
        // Print the response body and close the response when done.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Create Task
Replace accessToken with the access token obtained in the previous step, and language with the language in which you want error information displayed. The response data then contains the taskId.
curl
curl --location --request GET 'https://api-server.compdf.com/server/v1/task/documentAI/tableRec' \
  --header 'Authorization: Bearer accessToken'
java
import java.io.*;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // OkHttp rejects GET requests that carry a body, so none is attached.
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/task/documentAI/tableRec?language={{language}}")
            .get()
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // Print the response body and close the response when done.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Upload Files
Replace file with the file you want to convert, taskId with the task ID obtained in the previous step, language with the language in which you want error information displayed, and accessToken with the access token obtained in the first step.
- Supported image formats: jpg, jpeg, png, bmp
curl
curl --location --request POST 'https://api-server.compdf.com/server/v1/file/upload' \
  --header 'Authorization: Bearer accessToken' \
  --form 'file=@"test.pdf"' \
  --form 'taskId="taskId"' \
  --form 'password=""' \
  --form 'parameter="{ \"lang\": \"auto\" }"'
java
import java.io.*;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // Multipart form: the file, the task fields, and the lang parameter.
        RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
            .addFormDataPart("file", "{{file}}",
                RequestBody.create(MediaType.parse("application/octet-stream"), new File("<file>")))
            .addFormDataPart("taskId", "{{taskId}}")
            .addFormDataPart("language", "{{language}}")
            .addFormDataPart("parameter", "{ \"lang\": \"auto\" }")
            .build();
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/file/upload")
            .method("POST", body)
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // Print the response body and close the response when done.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Process Files
Replace taskId with the task ID obtained in the Create Task step, accessToken with the access token obtained in the first step, and language with the language in which you want error information displayed.
curl
curl --location -g --request GET 'https://api-server.compdf.com/server/v1/execute/start?taskId=taskId' \
  --header 'Authorization: Bearer accessToken'
java
import java.io.*;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // OkHttp rejects GET requests that carry a body, so none is attached.
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/execute/start?taskId={{taskId}}&language={{language}}")
            .get()
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // Print the response body and close the response when done.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Get Task Information
Replace taskId with the task ID obtained in the Create Task step, and accessToken with the access token obtained in the first step.
curl
curl --location -g --request GET 'https://api-server.compdf.com/server/v1/task/taskInfo?taskId=taskId' \
  --header 'Authorization: Bearer accessToken'
java
import java.io.*;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder().build();
        // OkHttp rejects GET requests that carry a body, so none is attached.
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/task/taskInfo?taskId={{taskId}}")
            .get()
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // Print the response body and close the response when done.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
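For projects that prefer not to depend on OkHttp, the same request can be sketched with the java.net.http classes that ship with Java 11 and later. The helper name TaskInfoRequest is hypothetical. The code only builds the request and does not send it, and URL-encoding the taskId is a defensive assumption rather than a documented requirement.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class TaskInfoRequest {
    private static final String BASE =
        "https://api-server.compdf.com/server/v1/task/taskInfo";

    /** Builds (but does not send) the GET request for a task's information. */
    public static HttpRequest build(String taskId, String accessToken) {
        // Encode the task ID so special characters cannot break the query string.
        String query = "taskId=" + URLEncoder.encode(taskId, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + "?" + query))
            .header("Authorization", "Bearer " + accessToken)
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("taskId", "accessToken");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read the JSON from the returned body.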
Result
File Type | Description |
---|---|
.JSON | Table recognition results |
Content
Parameter | Description |
---|---|
cost | Time spent on table recognition |
type | Type of the table |
angle | Angle by which the table is rotated |
width | Width of the table |
height | Height of the table |
rows | Number of rows in the table |
cols | Number of columns in the table |
position | Position of the table's bounding rectangle |
height_of_rows | Height of each row of the table |
width_of_cols | Width of each column of the table |
table_cells | Information about all cells in the table |
table_cells: start_row | Start row of a cell |
table_cells: end_row | End row of a cell |
table_cells: start_col | Start column of a cell |
table_cells: end_col | End column of a cell |
table_cells: text | Text in the cell |
table_cells: position | Position of the cell's bounding rectangle |
table_cells: lines | Text lines contained in the cell |
table_cells: lines: text | Content of the text line |
table_cells: lines: score | Recognition score of the text line |
table_cells: lines: position | Position of the text line |
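The nesting of table_cells and lines can be easier to follow as a small data model. The sketch below mirrors those fields with Java records (Java 16 or later). The type names are hypothetical, and the span helpers assume that start and end indices are inclusive. That assumption is inferred from the sample response, where a single-row cell has equal start_row and end_row, and is not stated in the table.

```java
import java.util.List;

public class TableCellModel {
    /** One entry of table_cells: lines. */
    public record Line(String text, double score, List<Integer> position) {}

    /** One entry of table_cells. */
    public record Cell(int startRow, int endRow, int startCol, int endCol,
                       String text, List<Integer> position, List<Line> lines) {
        // Assumes inclusive indices: a cell with start_row == end_row spans one row.
        public int rowSpan() { return endRow - startRow + 1; }
        public int colSpan() { return endCol - startCol + 1; }
    }

    public static void main(String[] args) {
        // Values taken from the second cell of the sample response.
        Line line = new Line("Absorbed", 1.0, List.of(29, 65, 99, 65, 99, 88, 29, 88));
        Cell cell = new Cell(2, 2, 1, 1, "Absorbed",
            List.of(2, 64, 125, 64, 125, 95, 2, 95), List.of(line));
        System.out.println(cell.rowSpan() + "x" + cell.colSpan()); // prints 1x1
    }
}
```

A JSON library of your choice can populate these records. None is used here so that the sketch needs only the standard library.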
{
"cost": 7566,
"json_items": [
{
"type": "table_with_line",
"angle": 0.0,
"width": 488,
"height": 191,
"rows": 4,
"cols": 4,
"position": [
114,
657,
602,
657,
602,
848,
114,
848
],
"height_of_rows": [
65,
30,
31,
36
],
"width_of_cols": [
122,
122,
118,
122
],
"table_cells": [
{
"start_row": 1,
"end_row": 1,
"start_col": 1,
"end_col": 1,
"text": "",
"position": [
2,
2,
124,
2,
124,
67,
2,
67
],
"lines": []
},
{
"start_row": 2,
"end_row": 2,
"start_col": 1,
"end_col": 1,
"text": "Absorbed",
"position": [
2,
64,
125,
64,
125,
95,
2,
95
],
"lines": [
{
"text": "Absorbed",
"score": 1.0,
"position": [
29,
65,
99,
65,
99,
88,
29,
88
]
}
]
}
]
}
],
"html_items": [
"<table border=\"1\" width='488px' height='191px'>\n<tr><th width='122px' height='65px'></th><th width='122px' height='65px' style=\"white-space: pre-line\">Absorbed</th><th width='118px' height='65px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='65px' style=\"white-space: pre-line\">Fatigue</th></tr>\n<tr><th width='122px' height='30px' style=\"white-space: pre-line\">Absorbed</th><th width='122px' height='30px'></th><th width='118px' height='30px' style=\"white-space: pre-line\">2</th><th width='122px' height='30px'></th></tr>\n<tr><th width='122px' height='31px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='31px'></th><th width='118px' height='31px'></th><th width='122px' height='31px'></th></tr>\n<tr><th width='122px' height='36px' style=\"white-space: pre-line\">Fatigue</th><th width='122px' height='36px'></th><th width='118px' height='36px'></th><th width='122px' height='36px' style=\"white-space: pre-line\">8</th>\t</tr>\n</table>",
"<table border=\"1\" width='489px' height='166px'>\n<tr><th width='123px' height='61px' style=\"white-space: pre-line\">Expression</th><th width='117px' height='61px' style=\"white-space: pre-line\">Image Num</th><th width='118px' height='61px' style=\"white-space: pre-line\">Correct</th><th width='125px' height='61px' style=\"white-space: pre-line\">Recognition Rate</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Absorbed</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px' style=\"white-space: pre-line\">7</th><th width='125px' height='31px' style=\"white-space: pre-line\">77.8%</th></tr>\n<tr><th width='123px' height='30px' style=\"white-space: pre-line\">Neuter</th><th width='117px' height='30px' style=\"white-space: pre-line\">9</th><th width='118px' height='30px'></th><th width='125px' height='30px' style=\"white-space: pre-line\">55.6%</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Fatigue</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px'></th><th width='125px' height='31px' style=\"white-space: pre-line\">88.9%</th></tr>\n<tr><th width='483px' height='33px' colspan=\"4\" style=\"white-space: pre-line\">Average recognition rate: 74.1%</th>\t</tr>\n</table>"
]
}
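In the sample response, each position array holds eight numbers. They appear to be the (x, y) coordinates of the four corners of a rectangle, starting at the top left and proceeding clockwise. This reading is an assumption inferred from the sample, where the table's corners span exactly its reported width and height, and is not stated in the parameter table. Under that assumption, the hypothetical helper below reduces a position array to left, top, width, and height.

```java
import java.util.Arrays;

public class TableGeometry {
    /** Returns {left, top, width, height} for an 8-value position array. */
    public static int[] boundingBox(int[] position) {
        if (position == null || position.length != 8) {
            throw new IllegalArgumentException("position must hold 8 values");
        }
        int left = Integer.MAX_VALUE, top = Integer.MAX_VALUE;
        int right = Integer.MIN_VALUE, bottom = Integer.MIN_VALUE;
        // Assumed layout: x at even indices, y at odd indices.
        for (int i = 0; i < 8; i += 2) {
            left = Math.min(left, position[i]);
            right = Math.max(right, position[i]);
            top = Math.min(top, position[i + 1]);
            bottom = Math.max(bottom, position[i + 1]);
        }
        return new int[] {left, top, right - left, bottom - top};
    }

    public static void main(String[] args) {
        // Table position from the sample response.
        int[] box = boundingBox(new int[] {114, 657, 602, 657, 602, 848, 114, 848});
        System.out.println(Arrays.toString(box)); // prints [114, 657, 488, 191]
    }
}
```

The computed width and height (488 and 191) match the width and height fields of the sample table. The cell positions in the sample use small values such as 2 and 124, which suggests they are relative to the table rather than the page. Verify this against your own output before relying on it.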