Table Extraction Tool Guide
Note: Before learning how to use the individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function accepts its own special parameters when you upload files; all other basic steps are the same.
Table Extraction:
{
"lang": 8
}
Request Parameter:
lang: OCR recognition language. Supported values: 1: Simplified Chinese; 2: Traditional Chinese; 3: English; 4: Korean; 5: Japanese; 6: Latin; 7: Sanskrit; 8: Auto.
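The numeric codes listed above map naturally onto an enum. The following is a minimal sketch rather than part of the API: `OcrLang`, `fromCode`, and `toParameterJson` are hypothetical names, and the JSON shape simply mirrors the `lang` snippet shown under Table Extraction. The request examples in this guide pass an `ocrRecognitionLang` key instead, so confirm which key your endpoint expects before relying on this.

```java
/** OCR language codes as listed in the request parameter description. */
enum OcrLang {
    SIMPLIFIED_CHINESE(1),
    TRADITIONAL_CHINESE(2),
    ENGLISH(3),
    KOREAN(4),
    JAPANESE(5),
    LATIN(6),
    SANSKRIT(7),
    AUTO(8);

    private final int code;

    OcrLang(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    /** Looks up a language by its documented numeric code. */
    static OcrLang fromCode(int code) {
        for (OcrLang lang : values()) {
            if (lang.code == code) {
                return lang;
            }
        }
        throw new IllegalArgumentException("Unsupported lang code: " + code);
    }

    /** Builds the parameter JSON; the "lang" key is an assumption taken from the snippet above. */
    String toParameterJson() {
        return "{\"lang\": " + code + "}";
    }
}
```

Keeping the codes in one place avoids scattering magic numbers such as `8` through calling code, and `fromCode` rejects values outside the documented range.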
Request Example:
Replace apiKey with the publicKey obtained from the dashboard, file with the file you want to process, and language with the language type in which you want error messages returned.
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/documentAI/tableRec' \
--header 'x-api-key: apiKey' \
--header 'Accept: */*' \
--header 'Connection: keep-alive' \
--header 'Content-Type: multipart/form-data' \
--form 'file=@"file"' \
--form 'password=""' \
--form 'parameter="{ \"ocrRecognitionLang\": \"AUTO\" }"' \
--form 'language="1"'
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder()
                .build();
        // Multipart form body: the file itself plus the per-request options.
        RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
                .addFormDataPart("file", "{{file}}",
                        RequestBody.create(MediaType.parse("application/octet-stream"),
                                new File("<file>")))
                .addFormDataPart("language", "{{language}}")
                .addFormDataPart("password", "")
                .addFormDataPart("parameter", "{ \"ocrRecognitionLang\": \"AUTO\" }")
                .build();
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v2/process/documentAI/tableRec")
                .method("POST", body)
                .addHeader("x-api-key", "{{apiKey}}")
                .build();
        // Close the response when done so the underlying connection is released.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Response Information:
A successful request returns an HTTP 200 OK status code and a JSON response body containing the task details.
Response type: application/json
Response Parameter | Data Type | Description |
---|---|---|
code | String | HTTP request status, "200" indicates success |
msg | String | Request message |
data | Object | Return result |
+taskId | String | Task ID |
+taskFileNum | int | Number of files processed in the task |
+taskSuccessNum | int | Number of files successfully processed in the task |
+taskFailNum | int | Number of files failed in the task |
+taskStatus | String | Task status |
+assetTypeId | int | Used asset type ID |
+taskCost | int | Task cost |
+taskTime | int | Task duration |
+sourceType | String | Original format |
+targetType | String | Target format |
+fileInfoDTOList | Array | Task file information |
++fileKey | String | File key |
++taskId | String | Task ID |
++fileName | String | Original file name |
++downFileName | String | Download file name |
++fileUrl | String | Original file URL |
++downloadUrl | String | Processed result file download URL |
++sourceType | String | Original format |
++targetType | String | Target format |
++fileSize | int | File size |
++convertSize | int | Processed result file size |
++convertTime | int | Processing time |
++status | String | File processing status |
++failureCode | String | File processing failure error code |
++failureReason | String | File processing failure description |
++fileParameter | String | Processing parameter |
Response Example:
{
  "code": "200",
  "msg": "success",
  "data": {
    "taskId": "f416dbcf-0c10-4f93-ab9e-a835c1f5dba1",
    "taskFileNum": 1,
    "taskSuccessNum": 1,
    "taskFailNum": 0,
    "taskStatus": "<taskStatus>",
    "assetTypeId": 0,
    "taskCost": 1,
    "taskTime": 1,
    "sourceType": "<sourceType>",
    "targetType": "<targetType>",
    "fileInfoDTOList": [
      {
        "fileKey": "<fileKey>",
        "taskId": "<taskId>",
        "fileName": "<fileName>",
        "downFileName": "<downFileName>",
        "fileUrl": "<fileUrl>",
        "downloadUrl": "<downloadUrl>",
        "sourceType": "<sourceType>",
        "targetType": "<targetType>",
        "fileSize": 24475,
        "convertSize": 6922,
        "convertTime": 8,
        "status": "<status>",
        "failureCode": "",
        "failureReason": "",
        "fileParameter": "<fileParameter>"
      }
    ]
  }
}
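After a response like this arrives, a client typically reads a few string fields such as `code` and `downloadUrl`. The sketch below pulls the first string value for a key using only the Java standard library. `ResponseFields` and `firstString` are hypothetical names, and the regular expression is a deliberate shortcut: it ignores nesting and escaped quotes, so a real JSON library is the better choice for production code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading simple string fields from a response body. */
final class ResponseFields {

    private ResponseFields() {
    }

    /**
     * Returns the first string value stored under the given key, if any.
     * Does not handle escaped quotes inside values or distinguish nesting levels.
     */
    static Optional<String> firstString(String json, String key) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

For example, `ResponseFields.firstString(body, "downloadUrl")` yields the URL of the processed result file when the field is present, and an empty `Optional` otherwise.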
Result:
File Type | Description |
---|---|
.JSON | Table recognition results |
Content:
Parameter | Description |
---|---|
cost | Time spent on table recognition |
json_items | Recognized tables as structured objects; the parameters from type through table_cells describe each object |
type | Type of the table |
angle | Rotation angle of the table |
width | Width of the table |
height | Height of the table |
rows | Number of rows in the table |
cols | Number of columns in the table |
position | Position of the table's bounding rectangle |
height_of_rows | Height of each row of the table |
width_of_cols | Width of each column of the table |
table_cells | Information about all cells in the table |
table_cells: start_row | Start row of the cell |
table_cells: end_row | End row of the cell |
table_cells: start_col | Start column of the cell |
table_cells: end_col | End column of the cell |
table_cells: text | Text in the cell |
table_cells: position | Position of the cell's bounding rectangle |
table_cells: lines | Text lines contained in the cell |
table_cells: lines: text | Content of the text line |
table_cells: lines: score | Recognition score of the text line |
table_cells: lines: position | Position of the text line |
html_items | Recognized tables as HTML markup |
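The start/end row and column parameters are enough to rebuild a table as a two-dimensional grid. The sketch below assumes the indexes are 1-based and inclusive, which matches the sample result but is not stated explicitly in this guide. `TableGrid`, `Cell`, and `toGrid` are hypothetical names, and a merged cell has its text copied into every slot it spans.

```java
import java.util.Arrays;

/** Hypothetical helper that lays table_cells entries out as a rows x cols grid. */
final class TableGrid {

    private TableGrid() {
    }

    /** Minimal stand-in for one entry of table_cells. */
    static final class Cell {
        final int startRow;
        final int endRow;
        final int startCol;
        final int endCol;
        final String text;

        Cell(int startRow, int endRow, int startCol, int endCol, String text) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.startCol = startCol;
            this.endCol = endCol;
            this.text = text;
        }
    }

    /** Fills a grid with cell text; assumes 1-based, inclusive row and column indexes. */
    static String[][] toGrid(int rows, int cols, Iterable<Cell> cells) {
        String[][] grid = new String[rows][cols];
        for (String[] row : grid) {
            Arrays.fill(row, "");
        }
        for (Cell cell : cells) {
            for (int r = cell.startRow; r <= cell.endRow; r++) {
                for (int c = cell.startCol; c <= cell.endCol; c++) {
                    grid[r - 1][c - 1] = cell.text;
                }
            }
        }
        return grid;
    }
}
```

Slots that no cell covers stay as empty strings, so the grid can be written out as CSV or similar without null checks.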
{
  "cost": 7566,
  "json_items": [
    {
      "type": "table_with_line",
      "angle": 0.0,
      "width": 488,
      "height": 191,
      "rows": 4,
      "cols": 4,
      "position": [114, 657, 602, 657, 602, 848, 114, 848],
      "height_of_rows": [65, 30, 31, 36],
      "width_of_cols": [122, 122, 118, 122],
      "table_cells": [
        {
          "start_row": 1,
          "end_row": 1,
          "start_col": 1,
          "end_col": 1,
          "text": "",
          "position": [2, 2, 124, 2, 124, 67, 2, 67],
          "lines": []
        },
        {
          "start_row": 2,
          "end_row": 2,
          "start_col": 1,
          "end_col": 1,
          "text": "Absorbed",
          "position": [2, 64, 125, 64, 125, 95, 2, 95],
          "lines": [
            {
              "text": "Absorbed",
              "score": 1.0,
              "position": [29, 65, 99, 65, 99, 88, 29, 88]
            }
          ]
        }
      ]
    }
  ],
  "html_items": [
    "<table border=\"1\" width='488px' height='191px'>\n<tr><th width='122px' height='65px'></th><th width='122px' height='65px' style=\"white-space: pre-line\">Absorbed</th><th width='118px' height='65px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='65px' style=\"white-space: pre-line\">Fatigue</th></tr>\n<tr><th width='122px' height='30px' style=\"white-space: pre-line\">Absorbed</th><th width='122px' height='30px'></th><th width='118px' height='30px' style=\"white-space: pre-line\">2</th><th width='122px' height='30px'></th></tr>\n<tr><th width='122px' height='31px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='31px'></th><th width='118px' height='31px'></th><th width='122px' height='31px'></th></tr>\n<tr><th width='122px' height='36px' style=\"white-space: pre-line\">Fatigue</th><th width='122px' height='36px'></th><th width='118px' height='36px'></th><th width='122px' height='36px' style=\"white-space: pre-line\">8</th>\t</tr>\n</table>",
    "<table border=\"1\" width='489px' height='166px'>\n<tr><th width='123px' height='61px' style=\"white-space: pre-line\">Expression</th><th width='117px' height='61px' style=\"white-space: pre-line\">Image Num</th><th width='118px' height='61px' style=\"white-space: pre-line\">Correct</th><th width='125px' height='61px' style=\"white-space: pre-line\">Recognition Rate</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Absorbed</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px' style=\"white-space: pre-line\">7</th><th width='125px' height='31px' style=\"white-space: pre-line\">77.8%</th></tr>\n<tr><th width='123px' height='30px' style=\"white-space: pre-line\">Neuter</th><th width='117px' height='30px' style=\"white-space: pre-line\">9</th><th width='118px' height='30px'></th><th width='125px' height='30px' style=\"white-space: pre-line\">55.6%</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Fatigue</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px'></th><th width='125px' height='31px' style=\"white-space: pre-line\">88.9%</th></tr>\n<tr><th width='483px' height='33px' colspan=\"4\" style=\"white-space: pre-line\">Average recognition rate: 74.1%</th>\t</tr>\n</table>"
  ]
}
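In the sample result, each position holds eight numbers. They read as four (x, y) corner points: the table's position spans x from 114 to 602 and y from 657 to 848, which agrees with its width of 488 and height of 191. That interpretation is inferred from the sample rather than stated in this guide, so treat the sketch below as an assumption. `Quad` and `bounds` are hypothetical names.

```java
/** Hypothetical helper that turns an 8-value position into an axis-aligned box. */
final class Quad {

    private Quad() {
    }

    /**
     * Returns {minX, minY, width, height}.
     * Assumes the input is four (x, y) corner points stored as x0, y0, x1, y1, ...
     */
    static int[] bounds(int[] position) {
        if (position.length != 8) {
            throw new IllegalArgumentException("Expected 8 values, got " + position.length);
        }
        int minX = Integer.MAX_VALUE;
        int minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE;
        int maxY = Integer.MIN_VALUE;
        for (int i = 0; i < position.length; i += 2) {
            minX = Math.min(minX, position[i]);
            maxX = Math.max(maxX, position[i]);
            minY = Math.min(minY, position[i + 1]);
            maxY = Math.max(maxY, position[i + 1]);
        }
        return new int[] {minX, minY, maxX - minX, maxY - minY};
    }
}
```

Taking the minimum and maximum over all four points, rather than trusting a fixed corner order, keeps the result correct even if the corners arrive in a different order or the table is slightly rotated.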
Asynchronous Request
To process files asynchronously, see the Asynchronous Request Instructions.