Intelligent Document Parsing Tool Guide

Note: Before learning how to use the individual functions, we recommend that you read the Request Workflow to understand the basic PDF processing flow. Each function accepts its own parameters when you upload files; the other basic steps are the same.

Intelligent Document Parsing:

json

{
    "getImage": "both",
    "isAllowOcr": 1,
    "imageOutputType": "base64str"
}

Required Parameters:

getImage: Image extraction type. page returns the full-page image for each page; objects returns the image objects within each page; both returns full-page images and image objects.

isAllowOcr: Whether to use OCR (0: Disable; 1: Enable).

imageOutputType: Image storage type, either base64str or url. base64str: images are returned directly in base64 format in the API result (this can produce very large responses and is not recommended for long documents). url: images are returned as platform links, which you can download to local storage or upload to your own cloud storage.
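
The three parameters above are sent as one JSON string in the parameter form field. A minimal sketch of building that string with basic validation follows; the class name ParsingParameter and the method build are hypothetical helpers, not part of the API, and the allowed values are taken from the descriptions above.

```java
import java.util.Set;

/** Hypothetical helper that builds the JSON string for the "parameter" form field. */
class ParsingParameter {
    // Allowed values as listed in the parameter descriptions.
    private static final Set<String> GET_IMAGE = Set.of("page", "objects", "both");
    private static final Set<String> OUTPUT_TYPE = Set.of("base64str", "url");

    static String build(String getImage, boolean allowOcr, String imageOutputType) {
        if (!GET_IMAGE.contains(getImage)) {
            throw new IllegalArgumentException("getImage must be page, objects, or both");
        }
        if (!OUTPUT_TYPE.contains(imageOutputType)) {
            throw new IllegalArgumentException("imageOutputType must be base64str or url");
        }
        // Both strings come from fixed whitelists, so no JSON escaping is needed.
        return String.format(
            "{\"getImage\":\"%s\",\"isAllowOcr\":%d,\"imageOutputType\":\"%s\"}",
            getImage, allowOcr ? 1 : 0, imageOutputType);
    }

    public static void main(String[] args) {
        System.out.println(build("objects", true, "url"));
    }
}
```

Rejecting unknown values locally avoids spending a request on a parameter the service would not accept.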

Request Example:

Replace apiKey with the publicKey obtained from the console, file with the file you want to parse, and language with the language type used for interface error messages.

curl

curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentParsing' \
--header 'x-api-key: apiKey' \
--header 'Accept: */*' \
--header 'Connection: keep-alive' \
--header 'Content-Type: multipart/form-data' \
--form 'file=@"file"' \
--form 'password=""' \
--form 'parameter="{  \"getImage\": \"objects\",\"isAllowOcr\":1,\"imageOutputType\":\"url\"}"' \
--form 'language="1"'

java

import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
  public static void main(String[] args) throws IOException {
    OkHttpClient client = new OkHttpClient().newBuilder()
      .build();
    // Multipart body: the file plus the language, password, and parameter fields.
    RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
      .addFormDataPart("file", "{{file}}",
        RequestBody.create(MediaType.parse("application/octet-stream"),
          new File("<file>")))
      .addFormDataPart("language", "{{language}}")
      .addFormDataPart("password", "")
      .addFormDataPart("parameter", "{  \"getImage\": \"objects\",\"isAllowOcr\":1,\"imageOutputType\":\"url\"}")
      .build();
    Request request = new Request.Builder()
      .url("https://api-server.compdf.com/server/v2/process/idp/documentParsing")
      .method("POST", body)
      .addHeader("x-api-key", "{{apiKey}}")
      .build();
    // Close the response when done and print the JSON body.
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}

Response Information:

A successful request returns an HTTP 200 OK status code and a JSON response body showing the order details.

Response type: application/json

Response Parameter	Data Type	Description
code	String	HTTP request status, "200" indicates success
message	String	Request message
data	Object	Return result
+taskId	String	Task ID
+taskFileNum	int	Number of files processed in the task
+taskSuccessNum	int	Number of files successfully processed in the task
+taskFailNum	int	Number of files failed in the task
+taskStatus	String	Task status
+assetTypeId	int	Used asset type ID
+taskCost	int	Task cost
+taskTime	int	Task duration
+sourceType	String	Original format
+targetType	String	Target format
+fileInfoDTOList	Array	Task file information
++fileKey	String	File key
++taskId	String	Task ID
++fileName	String	Original file name
++downFileName	String	Download file name
++fileUrl	String	Original file URL
++downloadUrl	String	Processed result file download URL
++sourceType	String	Original format
++targetType	String	Target format
++fileSize	int	File size
++convertSize	int	Processed result file size
++convertTime	int	Processing time
++status	String	File processing status
++failureCode	String	File processing failure error code
++failureReason	String	File processing failure description
++fileParameter	String	Processing parameter

Response Example:

json

{
    "code": "200",
    "msg": "success",
    "data": {
        "taskId": "f416dbcf-0c10-4f93-ab9e-a835c1f5dba1",
        "taskFileNum": 1,
        "taskSuccessNum": 1,
        "taskFailNum": 0,
        "taskStatus": "<taskStatus>",
        "assetTypeId": 0,
        "taskCost": 1,
        "taskTime": 1,
        "sourceType": "<sourceType>",
        "targetType": "<targetType>",
        "fileInfoDTOList": [
            {
                "fileKey": "<fileKey>",
                "taskId": "<taskId>",
                "fileName": "<fileName>",
                "downFileName": "<downFileName>",
                "fileUrl": "<fileUrl>",
                "downloadUrl": "<downloadUrl>",
                "sourceType": "<sourceType>",
                "targetType": "<targetType>",
                "fileSize": 24475,
                "convertSize": 6922,
                "convertTime": 8,
                "status": "<status>",
                "failureCode": "",
                "failureReason": "",
                "fileParameter": "<fileParameter>"
            }
        ]
    }
}
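
Each entry in fileInfoDTOList carries a downloadUrl pointing at the processed result file. As a rough sketch with no dependencies, the following pulls those URLs out of the response text with a regular expression. The class name DownloadUrlExtractor is hypothetical, and the pattern assumes the value contains no escaped quotes; a real JSON library is the more robust choice in production code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects downloadUrl values from the response body text. */
class DownloadUrlExtractor {
    // Matches "downloadUrl": "<value>" and captures the value.
    // Assumption: the value contains no escaped double quotes.
    private static final Pattern DOWNLOAD_URL =
        Pattern.compile("\"downloadUrl\"\\s*:\\s*\"([^\"]*)\"");

    static List<String> extract(String responseJson) {
        List<String> urls = new ArrayList<>();
        Matcher m = DOWNLOAD_URL.matcher(responseJson);
        while (m.find()) {
            String url = m.group(1);
            if (!url.isEmpty()) {
                urls.add(url); // skip files that have no result yet
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String sample = "{\"data\":{\"fileInfoDTOList\":[{\"downloadUrl\":\"https://example.com/a.json\"}]}}";
        System.out.println(extract(sample));
    }
}
```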

Result:

File Type	File Description
.json	JSON file with intelligent document parsing completed

Return Data Structure Explanation:

json

◆ code (integer) Operation status code
◆ message (string) Description message
◆ version (string) Version number
◆ duration (integer) Total processing time (in milliseconds)
◆ x_request_id (string) Request ID
◆ image_process (array) Whether there is a watermark
◆ msg (string) Description message
◆ result (object) Core data
  ├─ markdown (string) Markdown-formatted text of the entire document
  ├─ total_count (integer) Total number of pages in the PDF document
  ├─ total_page_number (integer) Total number of pages in the PDF document
  ├─ success_count (integer) Total number of successfully processed pages
  ├─ valid_page_number (integer) Number of successfully parsed valid pages
  ├─ excel_base64 (string) Excel file base64 encoding
  ├─ catalog (object) Table of contents tree structure
  │  └─ toc (array)
  │     ├─ pos (array): Coordinates of the four corners of the directory area, in order: left-top, right-top, right-bottom, left-bottom.
  │     ├─ paragraph_id (integer): ID of the paragraph where the title is located
  │     ├─ page_id (integer): Page number where the title is located (minimum page number is 1)
  │     ├─ hierarchy (integer): Title level, 1 for level 1 title, 2 for level 2 title, and so on
  │     ├─ pos_list (array): When title merging occurs, the coordinates of multiple titles before merging. When no title merging occurs, the coordinates of the title.
  │     ├─ title (string): Title content
  │     └─ sub_type (string): Title type: text_title, image_title, table_title
  │
  ├─ pages (array) Paginated data container
  │  ├─ status (string): Page processing status/error message
  │  ├─ page_id (number): Current page number
  │  ├─ durations (number): Page processing time (milliseconds)
  │  ├─ image_id (string): Image address
  │  ├─ width (integer): Document page width (pixels)
  │  ├─ height (integer): Document page height (pixels)
  │  ├─ angle (integer): Text orientation angle (0°: ▲ (upright)/90°: ▶ (right rotation)/180°: ▼ (inverted)/270°: ◀ (left rotation))
  │  ├─ content (array): Basic data: text lines or images, refer to textline and image descriptions
  │  └─ structured (array): Structured data, one of textblock, table, imageblock, footer, header
  │
  └─ detail (array) Markdown detailed information (structure reused "paragraph data" model)
     ├─ page_id (integer): Current paragraph page number
     ├─ paragraph_id (integer): Current paragraph ID
     ├─ outline_level (integer): Title level (up to 5 levels supported): -1 = body text, 0 = level 1 title, 1 = level 2 title, ...
     ├─ text (string): Text
     ├─ type (string): Type, paragraph (paragraph type, including body text, titles, formulas, etc.), image (image type), table (table type)
     ├─ image_url (string): Image address
     ├─ content (integer): Content type: 0 = body text (paragraph, image, table); 1 = non-body text (header, footer, sidebar)
     ├─ position (array): Coordinates of the four corners of the directory area, in order: left-top, right-top, right-bottom, left-bottom.
     ├─ sub_type (string): Subtype. When type is paragraph, possible values are catalog (table of contents), header (page header), footer (page footer), sidebar (sidebar), text (body text), text_title (text title), image_title (image title), table_title (table title); when type is image, possible values are stamp (seal), chart (chart), qrcode (QR code), barcode (barcode); when type is table, possible values are bordered (bordered table), borderless (borderless table).
     ├─ tags (array): Indicates whether there are special texts within the paragraph, including formula and handwritten.
     ├─ cells (array): Cell array, returned only when type is table
     │  ├─ row_span (integer): Cell row span, default is 1
     │  ├─ text (integer):
     │  ├─ type (integer):
     │  ├─ col (integer): Cell column number
     │  ├─ col_span (integer): Cell column span, default is 1
     │  ├─ page_id (integer):
     │  ├─ position (array): Coordinates of the four corners of the cell, in order: left-top, right-top, right-bottom, left-bottom.
     │  └─ row (integer): Cell row number
     │
     └─ caption_id (object): Original OCR text result
        ├─ page_id (integer): Page number where the title is located
        └─ paragraph_id (integer): Paragraph ID where the title is located

◆ metrics (array) Page-level performance metrics
  ├─ page_image_width (integer): Current page rendering width (pixels)
  ├─ page_image_height (integer): Current page rendering height (pixels)
  ├─ dpi (integer): Image resolution
  ├─ durations (number): Page processing time (milliseconds)
  ├─ status (string): Page processing status
  ├─ page_id (number): Current page number
  ├─ angle (integer): Text orientation angle (0°: ▲ (upright)/90°: ▶ (right rotation)/180°: ▼ (inverted)/270°: ◀ (left rotation))
  └─ image_id (string): Page image ID (download method same as pages.image_id)
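
The detail entries described above can be turned into a simple document outline by keeping only the titles and indenting them by outline_level. The sketch below assumes the entries have already been parsed into objects; the Detail class and OutlineBuilder are hypothetical and hold just the three fields used here (page_id, outline_level, text).

```java
import java.util.List;

/** Hypothetical helper: renders title entries from result.detail as an indented outline. */
class OutlineBuilder {
    /** Holds only the detail fields this sketch needs. */
    static final class Detail {
        final int pageId;
        final int outlineLevel;
        final String text;

        Detail(int pageId, int outlineLevel, String text) {
            this.pageId = pageId;
            this.outlineLevel = outlineLevel;
            this.text = text;
        }
    }

    static String toOutline(List<Detail> details) {
        StringBuilder sb = new StringBuilder();
        for (Detail d : details) {
            if (d.outlineLevel < 0) {
                continue; // -1 marks body text, which is not part of the outline
            }
            // outline_level 0 is a level 1 title, so it gets no indentation.
            sb.append("  ".repeat(d.outlineLevel))
              .append(d.text)
              .append(" (p. ").append(d.pageId).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toOutline(List.of(
            new Detail(1, 0, "Introduction"),
            new Detail(1, -1, "Some body text."),
            new Detail(2, 1, "Scope"))));
    }
}
```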

Structured Data Specification:

Content (Text Line/Image)

Image Data

Parameter	Type	Description
id	integer	Data ID
type	string	Data type (fixed value: image)
pos	array	Four-corner coordinates of the image. Format: `[top-left (x,y), top-right (x,y), bottom-right (x,y), bottom-left (x,y)]`
size	array	Image dimensions `[width, height]`
data	object	Image content object
↳ data.region	array	Image region coordinates on the page
↳ data.path	string	Image file path
↳ data.base64	string	Image file (jpg/png) base64 string
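
The pos field lists four corners, which is convenient for rotated regions but awkward when an axis-aligned rectangle is needed, for example to crop an image. The following sketch converts the corners into a bounding box. It assumes pos arrives as a flat array of eight numbers in the order x1, y1, x2, y2, x3, y3, x4, y4; that layout is an assumption, so check it against an actual response. The BoundingBox class is hypothetical.

```java
/** Hypothetical helper: axis-aligned bounding box computed from a four-corner pos array. */
class BoundingBox {
    final double minX;
    final double minY;
    final double maxX;
    final double maxY;

    BoundingBox(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    // Assumption: pos = [x1, y1, x2, y2, x3, y3, x4, y4], corners in any order.
    static BoundingBox fromCorners(double[] pos) {
        if (pos.length != 8) {
            throw new IllegalArgumentException("expected 8 numbers, got " + pos.length);
        }
        double minX = pos[0], maxX = pos[0];
        double minY = pos[1], maxY = pos[1];
        for (int i = 2; i < 8; i += 2) {
            minX = Math.min(minX, pos[i]);
            maxX = Math.max(maxX, pos[i]);
            minY = Math.min(minY, pos[i + 1]);
            maxY = Math.max(maxY, pos[i + 1]);
        }
        return new BoundingBox(minX, minY, maxX, maxY);
    }

    double width() {
        return maxX - minX;
    }

    double height() {
        return maxY - minY;
    }
}
```

Taking the minimum and maximum over all corners keeps the result correct even when the region is slightly rotated.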

Textline Data

Parameter	Type	Description
id	integer	Data ID (unique within the page)
type	string	Data type (fixed value: line)
text	string	Text line content (When sub_type=stamp, it is the seal text)
pos	array	Text line four corner coordinates
score	number	Character confidence (Generated only when OCR is performed on the input image)

Structured Data

Textblock

Parameter	Type	Description
id	integer	Data ID
type	string	Block type (fixed value: textblock)
pos	array	Text block four corner coordinates
content	array	Contained text line ID array
sub_type	string	Subtype (title/list/formula, etc.)
text	string	Block text content
outline_level	integer	Title level: `-1`=Body text, `0`=Level 1 title, `1`=Level 2 title... (Up to five levels supported)

Table Data

Parameter	Type	Description
id	integer	Data ID
type	string	Block type (fixed value: table)
sub_type	string	Table type (Default value: bordered, borderless tables need special marking)
pos	array	Table four corner coordinates
rows	integer	Total number of rows
cols	integer	Total number of columns
columns_width	array	Column width array
rows_height	array	Row height array
text	string	Table content (HTML/Markdown format)
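
When per-cell data is available (the cells array in the return data structure, with row, col, row_span, and col_span), the table can be rebuilt as a two-dimensional grid of rows by cols. The sketch below is one way to do that. It assumes row and col are zero-based and that the cell text is a string; neither is confirmed by the field descriptions, so adjust the indices if the service numbers from 1. The Cell and TableGrid classes are hypothetical.

```java
import java.util.List;

/** Hypothetical helper: expands table cells with spans into a rows x cols grid. */
class TableGrid {
    /** Holds only the cell fields this sketch needs. */
    static final class Cell {
        final int row;
        final int col;
        final int rowSpan;
        final int colSpan;
        final String text;

        Cell(int row, int col, int rowSpan, int colSpan, String text) {
            this.row = row;
            this.col = col;
            this.rowSpan = rowSpan;
            this.colSpan = colSpan;
            this.text = text;
        }
    }

    // Assumption: row and col are zero-based indices.
    static String[][] toGrid(int rows, int cols, List<Cell> cells) {
        String[][] grid = new String[rows][cols];
        for (Cell c : cells) {
            // Copy the text into every position the merged cell covers,
            // clipping spans that would run past the table edge.
            for (int r = c.row; r < c.row + c.rowSpan && r < rows; r++) {
                for (int k = c.col; k < c.col + c.colSpan && k < cols; k++) {
                    grid[r][k] = c.text;
                }
            }
        }
        return grid;
    }
}
```

Positions that no cell covers stay null, which makes gaps in the parsed table easy to detect.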

Imageblock

Parameter	Type	Description
id	integer	Data ID
type	string	Block type (fixed value: image)
pos	array	Image block four corner coordinates
text	string	Image annotation text (HTML/Markdown format)
image_url	string	Image file path
base64str	string	Image base64 encoded string
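
When imageOutputType is base64str, the image arrives inline as a base64 string and has to be decoded before it can be viewed. A minimal sketch of writing it to disk follows. The class name Base64ImageSaver is hypothetical, and stripping a data URI prefix is a defensive assumption, since the guide does not say whether the string includes one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical helper: decodes a base64 image string and writes it to a file. */
class Base64ImageSaver {
    static Path save(String base64str, Path target) throws IOException {
        // Strip an optional data URI prefix such as "data:image/png;base64,".
        int comma = base64str.indexOf(',');
        String payload = (base64str.startsWith("data:") && comma >= 0)
            ? base64str.substring(comma + 1)
            : base64str;
        // The MIME decoder tolerates line breaks inside the encoded text.
        byte[] bytes = Base64.getMimeDecoder().decode(payload);
        return Files.write(target, bytes);
    }
}
```

For long documents the url output type avoids holding large strings in memory, as noted in the parameter description.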

Footer Block

Parameter	Type	Description
type	string	Block type (fixed value: footer)
pos	array	Block four corner coordinates
blocks	array	Content block array (Can contain textblock/imageblock/table)

Header Block

Parameter	Type	Description
type	string	Block type (fixed value: header)
pos	array	Block four corner coordinates
image_url	string	Header image path
base64str	string	Header image base64 encoding
blocks	array	Content block array (Can contain textblock/imageblock/table)

Asynchronous Request

If you need to use the file asynchronous processing flow, please read the Asynchronous Request Instructions.
