Intelligent Text Extraction Tool Guide
Note: Before learning how to use the individual functions, we recommend that you read the Request Workflow to understand the basic PDF processing flow. Each function accepts its own special parameters when you upload files; all other basic steps are the same.
Text Extraction:
```json
{
  "lang": 8,
  "outputFormat": 1
}
```
Request Parameter:
lang: OCR recognition language. Supported types and definitions: 1: Simplified Chinese; 2: Traditional Chinese; 3: English; 4: Korean; 5: Japanese; 6: Latin; 7: Sanskrit; 8: Auto.
outputFormat: Output format (1: JSON; 2: TXT).
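The numeric codes above are easy to mix up when written by hand, so the following is a minimal sketch of one way to build the parameter string from named constants. The class and enum names (OcrParameterBuilder, Lang, OutputFormat) are hypothetical helpers introduced for illustration and are not part of the API; only the numeric codes and the two field names come from the list above.

```java
// Hypothetical helper: builds the "parameter" JSON string from named constants.
final class OcrParameterBuilder {

    // Language codes as listed for the lang parameter.
    enum Lang {
        SIMPLIFIED_CHINESE(1), TRADITIONAL_CHINESE(2), ENGLISH(3), KOREAN(4),
        JAPANESE(5), LATIN(6), SANSKRIT(7), AUTO(8);

        final int code;

        Lang(int code) {
            this.code = code;
        }
    }

    // Output format codes as listed for the outputFormat parameter.
    enum OutputFormat {
        JSON(1), TXT(2);

        final int code;

        OutputFormat(int code) {
            this.code = code;
        }
    }

    // Both values are plain integers, so no JSON escaping is needed here.
    static String build(Lang lang, OutputFormat format) {
        return "{\"lang\": " + lang.code + ", \"outputFormat\": " + format.code + "}";
    }

    public static void main(String[] args) {
        // prints {"lang": 8, "outputFormat": 1}
        System.out.println(build(Lang.AUTO, OutputFormat.JSON));
    }
}
```

The resulting string can be passed as the value of the parameter form field when uploading the file.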
Java Example:
Replace apiKey with the publicKey obtained from the dashboard, file with the file you want to convert, and language with the language type in which you want interface error prompts to be returned.
```java
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder()
                .build();
        // Multipart form: the file to process plus the request options.
        RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
                .addFormDataPart("file", "{{file}}",
                        RequestBody.create(MediaType.parse("application/octet-stream"),
                                new File("<file>")))
                .addFormDataPart("language", "{{language}}")
                .addFormDataPart("password", "")
                // OCR options as a JSON string; lang 8 selects automatic language detection.
                .addFormDataPart("parameter", "{ \"lang\": 8 }")
                .build();
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/process/documentAI/ocr")
                .method("POST", body)
                .addHeader("x-api-key", "{{apiKey}}")
                .build();
        // try-with-resources closes the response; print the body returned by the server.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
```
Result:
File Type | Description |
---|---|
.JSON | OCR recognition results |
Content
Parameter | Description |
---|---|
cost | OCR recognition time |
boxes | Positions of all boxes detected in the input image |
text | OCR recognition content |
rec_scores | OCR text recognition scores; the higher the score, the more credible the result |
```json
{
  "cost": 149,
  "boxes": [
    [150, 71, 198, 71, 198, 110, 150, 110],
    [74, 117, 274, 120, 273, 166, 73, 163],
    [99, 179, 249, 182, 249, 208, 99, 205],
    [65, 203, 276, 205, 276, 230, 65, 228]
  ],
  "text": [
    "EPPING",
    "Twinned with",
    "Eppingen,Germany"
  ],
  "rec_scores": [
    0.46275457739830017,
    0.9971449971199036,
    0.9649983048439026,
    0.9587073922157288
  ]
}
```
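Each entry in boxes holds eight integers. The sketch below assumes these are four (x, y) corner points of a quadrilateral, which matches the sample values but is not confirmed by the parameter table, and reduces one box to an axis-aligned rectangle. The class name OcrBoxes and the method bounds are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for working with entries of the "boxes" array.
final class OcrBoxes {

    // Assumption: box = {x1, y1, x2, y2, x3, y3, x4, y4}, four corner points.
    // Returns {minX, minY, maxX, maxY}, the smallest upright rectangle containing them.
    static int[] bounds(int[] box) {
        if (box.length != 8) {
            throw new IllegalArgumentException("expected 8 values, got " + box.length);
        }
        int minX = Integer.MAX_VALUE;
        int minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE;
        int maxY = Integer.MIN_VALUE;
        // Even indices are treated as x coordinates, odd indices as y coordinates.
        for (int i = 0; i < box.length; i += 2) {
            minX = Math.min(minX, box[i]);
            maxX = Math.max(maxX, box[i]);
            minY = Math.min(minY, box[i + 1]);
            maxY = Math.max(maxY, box[i + 1]);
        }
        return new int[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        // Second box from the sample response.
        int[] second = {74, 117, 274, 120, 273, 166, 73, 163};
        // prints [73, 117, 274, 166]
        System.out.println(Arrays.toString(bounds(second)));
    }
}
```

An upright rectangle is convenient for cropping or highlighting a region, at the cost of discarding the slight rotation that the original four points can express.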