Intelligent Text Extraction Tool Guide

Note: Before learning how to use the individual functions, we recommend that you read the Request Workflow to understand the basic PDF processing flow. Each function has its own parameters, which you set when uploading the file. The other steps are the same for every function.

Text Extraction:

java
{    
  "lang": 8,
  "outputFormat": 1
}

Request Parameters:

lang: OCR recognition language. Supported types and definitions: 1: Simplified Chinese; 2: Traditional Chinese; 3: English; 4: Korean; 5: Japanese; 6: Latin; 7: Sanskrit; 8: Auto.

outputFormat: Output format (1: JSON; 2: TXT).
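
The numeric codes above are easy to mistype, so a small helper that names them can be useful. The following is a minimal sketch, not part of any official SDK: the class `OcrParameter`, its enums, and the method `toJson` are hypothetical names introduced here for illustration. It only builds the JSON string that is sent as the "parameter" form field.

```java
// Hypothetical helper (an assumption for illustration, not a ComPDFKit API).
final class OcrParameter {

    // Language codes as listed for the "lang" parameter.
    enum Lang {
        SIMPLIFIED_CHINESE(1), TRADITIONAL_CHINESE(2), ENGLISH(3), KOREAN(4),
        JAPANESE(5), LATIN(6), SANSKRIT(7), AUTO(8);

        final int code;
        Lang(int code) { this.code = code; }
    }

    // Output format codes as listed for the "outputFormat" parameter.
    enum OutputFormat {
        JSON(1), TXT(2);

        final int code;
        OutputFormat(int code) { this.code = code; }
    }

    // Builds the JSON text for the "parameter" form field.
    static String toJson(Lang lang, OutputFormat format) {
        return "{\"lang\": " + lang.code + ", \"outputFormat\": " + format.code + "}";
    }

    public static void main(String[] args) {
        // prints {"lang": 8, "outputFormat": 1}
        System.out.println(toJson(Lang.AUTO, OutputFormat.JSON));
    }
}
```

Naming the codes keeps call sites readable (`Lang.AUTO` instead of `8`) and leaves the wire format unchanged.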

Java Example:

Replace apiKey with the publicKey obtained from the dashboard, file with the file you want to convert, and language with the language in which you want error messages to be returned.

java
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
  public static void main(String[] args) throws IOException {
    OkHttpClient client = new OkHttpClient().newBuilder()
      .build();
    // Multipart form body: the file to process plus the request options.
    RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
      .addFormDataPart("file", "{{file}}",
        RequestBody.create(MediaType.parse("application/octet-stream"),
                           new File("<file>")))
      .addFormDataPart("language", "{{language}}")  // language of error messages
      .addFormDataPart("password", "")              // left empty in this example
      .addFormDataPart("parameter", "{    \"lang\": 8    }")  // OCR options; 8 = Auto
      .build();
    Request request = new Request.Builder()
      .url("https://api-server.compdf.com/server/v1/process/documentAI/ocr")
      .method("POST", body)
      .addHeader("x-api-key", "{{apiKey}}")
      .build();
    // Close the response when done so the connection is released.
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.code());  // HTTP status code of the call
    }
  }
}

Result:

File Type     Description
.JSON         OCR recognition results

Content

Parameter     Description
cost          OCR recognition time
boxes         All detected object box positions of the input image
text          OCR recognition content
rec_scores    OCR text recognition score, the higher the score, the more credible the result

java
{
        "cost": 149,
        "boxes": [
            [
                150,
                71,
                198,
                71,
                198,
                110,
                150,
                110
            ],
            [
                74,
                117,
                274,
                120,
                273,
                166,
                73,
                163
            ],
            [
                99,
                179,
                249,
                182,
                249,
                208,
                99,
                205
            ],
            [
                65,
                203,
                276,
                205,
                276,
                230,
                65,
                228
            ]
        ],
        "text": [
            "EPPING",
            "Twinned with",
            "Eppingen,Germany"
        ],
        "rec_scores": [
            0.46275457739830017,
            0.9971449971199036,
            0.9649983048439026,
            0.9587073922157288
        ]
}
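
Once the result has been parsed into arrays, two small post-processing steps are often handy: turning a box into an axis-aligned rectangle, and finding which recognition scores are high enough to trust. The sketch below is illustrative only. It assumes that each entry in boxes holds four corner points in the order x1, y1, x2, y2, x3, y3, x4, y4, which the sample suggests but this guide does not state. The class `OcrResultSketch` and its methods are hypothetical names, and JSON parsing is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing helpers (an assumption, not a ComPDFKit API).
final class OcrResultSketch {

    // Assumption: box = {x1, y1, x2, y2, x3, y3, x4, y4} (four corner points).
    // Returns {x, y, width, height} of the smallest axis-aligned rectangle
    // that contains all four corners.
    static int[] boundingRect(int[] box) {
        if (box.length != 8) {
            throw new IllegalArgumentException("expected 8 coordinates");
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int i = 0; i < 8; i += 2) {
            minX = Math.min(minX, box[i]);
            maxX = Math.max(maxX, box[i]);
            minY = Math.min(minY, box[i + 1]);
            maxY = Math.max(maxY, box[i + 1]);
        }
        return new int[] {minX, minY, maxX - minX, maxY - minY};
    }

    // Returns the indices of all scores that are at least the threshold.
    static List<Integer> indicesAtLeast(double[] scores, double threshold) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] >= threshold) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // First box and the rec_scores values from the sample result.
        int[] firstBox = {150, 71, 198, 71, 198, 110, 150, 110};
        double[] scores = {0.46275457739830017, 0.9971449971199036,
                           0.9649983048439026, 0.9587073922157288};

        System.out.println(Arrays.toString(boundingRect(firstBox))); // prints [150, 71, 48, 39]
        System.out.println(indicesAtLeast(scores, 0.9));             // prints [1, 2, 3]
    }
}
```

The threshold of 0.9 is an arbitrary value chosen for the example; a suitable cutoff depends on the documents being processed.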