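### Real-time Speech Transcription

Source: https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Recognizes a continuous audio stream over the WebSocket protocol and returns the corresponding text stream in real time.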

```APIDOC
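## Real-time Speech Transcription

### Description
Real-time speech transcription (Real-time ASR) is built on a deep fully sequential convolutional neural network framework. It keeps a long-lived WebSocket connection between the application and the core speech transcription engine, so that a continuous audio stream is recognized and returned as a text stream in real time.
Supported audio format: pcm_s16le audio with a 16 kHz sampling rate and 16-bit sample depth.

### Parameters
#### Query Parameters
- **lang** (string) - Optional - Transcription language; defaults to Chinese when omitted.
  - Language values: `cn` for Chinese and mixed Chinese-English recognition, `en` for English. Minority languages and dialects can be added in the console under Real-time Speech Transcription > Dialects/Languages; once added, the console shows the parameter value for that dialect or language.
- **targetLang** (string) - Optional - Target translation language.
  - Example: targetLang="en"
  - To translate Chinese into English in real time, pass the parameters as follows:
  "lang=cn&transType=normal&transStrategy=2&targetLang=en"
  *Note: the translation feature must be enabled in the console.*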

### Request Example
```json
{
  "lang": "cn",
  "targetLang": "en"
}
```

### Response
#### Success Response (200)
- **text** (string) - Recognized text
- **audio_end** (boolean) - Whether the audio has ended
```
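
The query parameters listed above end up joined into a single query string such as the translation example. A minimal sketch of that assembly follows, assuming values are URL-encoded and emitted in insertion order. The class `RtasrQuerySketch` and the method `buildQuery` are hypothetical names for illustration and are not part of the SDK.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the SDK): joins query parameters in insertion order.
final class RtasrQuerySketch {

    // Builds "k1=v1&k2=v2..." with URL-encoded keys and values (requires Java 10+).
    static String buildQuery(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in the order they were added.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("lang", "cn");
        params.put("transType", "normal");
        params.put("transStrategy", "2");
        params.put("targetLang", "en");
        System.out.println(buildQuery(params));
    }
}
```

Using an ordered map keeps the output stable, which makes the string easy to compare against the documented example.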

--------------------------------

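### Audio File Speech Transcription

Source: https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Converts long audio recordings (up to 5 hours) into text, providing a basis for information processing and data mining.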

```APIDOC
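## Audio File Speech Transcription

### Description
Speech transcription (Long Form ASR) is built on a deep fully sequential convolutional neural network and converts long audio recordings (up to 5 hours) into text, providing a basis for information processing and data mining.
It transcribes pre-recorded (non-real-time) audio. After an audio file is uploaded successfully it enters a waiting queue, and the result becomes available once transcription succeeds. The turnaround time depends on the audio duration and the number of queued tasks.
If transcription takes longer than usual, the service is most likely at peak load for that period, so please wait; we commit to a maximum processing time of 5 hours for valid tasks.
To keep the transcription service running smoothly, prefer audio files longer than 5 minutes.

### Parameters
#### Query Parameters
- **speaker_number** (string) - Optional - Number of speakers. Allowed values: 0-10; 0 means blind separation (speaker count unknown).
  *Note*: speaker separation is still experimental and does not yet reach commercial-grade quality. If testing shows it does not meet your needs, use it with caution.
- **has_seperate** (string) - Optional - Whether the transcription result includes speaker separation information.
- **role_type** (string) - Optional - Supports two values:
  1: General role separation
  2: Telephone-channel role separation (for conversations where speaker_number is 2). This field only takes effect once the role separation feature has been enabled; passing it correctly improves role separation quality.
  If the field is omitted, type 1 is used by default.
- **language** (string) - Optional - Language
  cn: Chinese and mixed Chinese-English (default)
  en: English (hot words are not supported for English)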
### Request Example
```json
{
  "speaker_number": "2",
  "has_seperate": "true",
  "role_type": "1",
  "language": "cn"
}
```

### Response
#### Success Response (200)
- **result** (string) - JSON string of the transcription result
- **task_id** (string) - Task ID
```

--------------------------------

### Speech Dictation (Streaming)

Source: https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Provides instant speech-to-text for audio of up to 1 minute and returns recognition results in real time.

```APIDOC
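## Speech Dictation (Streaming)

### Description
The streaming speech dictation interface provides instant speech-to-text for audio of up to 1 minute. It returns recognition results in real time, so text arrives while the audio is still being uploaded.

### Parameters
#### Query Parameters
- **vad_eos** (int) - Optional - Silence duration for endpoint detection, in milliseconds: how long the audio must stay silent before the engine treats it as finished. Default 2000 (except for minority languages, where VAD is disabled unless this parameter is set).
- **dwa** (string) - Optional - Dynamic correction (Mandarin Chinese only)
  - wpgs: enables streaming result return
  *Note: this extension cannot be used without authorization; it can be enabled free of charge in the console under Speech Dictation (Streaming) > Advanced Features. Setting the parameter without authorization does not raise an error, but it has no effect.*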

### Request Example
```json
{
  "vad_eos": 3000,
  "dwa": "wpgs"
}
```

### Response
#### Success Response (200)
- **text** (string) - Recognized text
- **audio_end** (boolean) - Whether the audio has ended
```

--------------------------------

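### Image Generation (ImageGen)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Generates an image from a natural-language text description and returns the image data Base64-encoded. The data must be decoded before it is saved as a PNG file. Suited to scenarios that turn a creative text prompt into an image.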

```java
import cn.xfyun.api.ImageGenClient;
import java.io.FileOutputStream;
import java.util.Base64;
// JSON and JSONObject come from the JSON library used by the demo project (import not shown in the source)

ImageGenClient client = new ImageGenClient.Builder(appId, apiKey, apiSecret).build();

// Send the image generation request (prompt: "Draw a cute kitten playing on the grass")
String resp = client.send("帮我画一只可爱的小猫在草地上玩耍");

JSONObject obj = JSON.parseObject(resp);
if (obj.getJSONObject("header").getInteger("code") != 0) {
    System.err.println("Generation failed: " + resp);
    return;
}

// Extract the Base64 image data from the response
String base64Image = obj.getJSONObject("payload")
        .getJSONObject("choices")
        .getJSONArray("text")
        .getJSONObject(0)
        .getString("content");

// Decode and save as a PNG file
byte[] imageBytes = Base64.getDecoder().decode(base64Image);
String outputPath = "src/main/resources/image/gen_output.png";
try (FileOutputStream fos = new FileOutputStream(outputPath)) {
    fos.write(imageBytes);
    System.out.println("Image saved: " + outputPath);
}
```

--------------------------------

### Configure Authentication Information

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Reads authentication parameters like appId, apiKey, and apiSecret from the test.properties file. Specific keys for real-time speech transcription and audio file transcription are also available.

```properties
# Example configuration for src/main/resources/test.properties
appId=your_app_id
apiKey=your_api_key
apiSecret=your_api_secret
rtaAPIKey=your_rta_api_key
lfasrSecretKey=your_lfasr_secret_key
sparkApiPassword=your_spark_api_password
```

```java
// Read the configuration
String appId     = PropertiesConfig.getAppId();
String apiKey    = PropertiesConfig.getApiKey();
String apiSecret = PropertiesConfig.getApiSecret();

// Dedicated key for real-time speech transcription
String rtaAPIKey = PropertiesConfig.getRtaAPIKey();

// Dedicated SecretKey for audio file transcription
String lfasrSecretKey = PropertiesConfig.getLfasrSecretKey();

// Dedicated password for the Spark LLM HTTP interface
String sparkApiPassword = PropertiesConfig.getSparkApiPassword();
```
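
`PropertiesConfig` is the demo's own helper and its implementation is not shown here. As a rough sketch of how such a loader might read the same keys with the standard `java.util.Properties` API, consider the following. The class `TestPropertiesSketch` and its `parse` method are hypothetical, and the sample values are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for the demo's PropertiesConfig; illustrates the file format only.
final class TestPropertiesSketch {

    // Parses key=value text in .properties syntax; lines starting with '#' or '!' are comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String sample = "# placeholder values\n"
                + "appId=your_app_id\n"
                + "rtaAPIKey=your_rta_api_key\n";
        Properties props = parse(sample);
        System.out.println(props.getProperty("appId"));
        System.out.println(props.getProperty("rtaAPIKey"));
        // A missing key yields null, which a real loader would want to report clearly.
        System.out.println(props.getProperty("lfasrSecretKey"));
    }
}
```

Note that `//` is not comment syntax in this format: a line such as `// appId=x` is parsed as a property whose key is `//`.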

--------------------------------

### Audio File Speech Transcription (LFASR)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

This snippet shows how to perform asynchronous batch speech transcription for audio files using the LfasrClient. It follows a three-step process: upload, poll for results, and retrieve the transcription. It supports various tasks including transcription, translation, and quality inspection, with options for speaker diarization.

```APIDOC
## Audio File Speech Transcription (LFASR)

This section details the usage of the `LfasrClient` for asynchronous batch audio file transcription.

### Description

Asynchronous batch transcription for audio files up to 5 hours. It uses a three-step process: upload, poll for results, and retrieve the final transcription. Supports multiple task types like transcription (`transfer`), translation (`translate`), and quality inspection (`predict`), with an option for speaker diarization.

### Client Construction

```java
import cn.xfyun.api.LfasrClient;

LfasrClient lfasrClient = new LfasrClient.Builder(appId, lfasrSecretKey)
        // .roleType((short) 1)     // Speaker diarization: 1=General, 2=Telephony channel
        // .transLanguage("en")     // Translation target language
        // .audioMode("urlLink")    // Use remote URL for upload
        .build();
```

### Step 1: Upload Audio File

```java
import cn.xfyun.model.response.lfasr.LfasrResponse;

// Upload a local file
LfasrResponse uploadResp = lfasrClient.uploadFile("audio/lfasr.wav");
// Or upload from a remote URL
// LfasrResponse uploadResp = lfasrClient.uploadUrl("https://example.com/audio.wav");

if (!"000000".equals(uploadResp.getCode())) {
    System.err.println("Upload failed: " + uploadResp.getDescInfo());
    return;
}
String orderId = uploadResp.getContent().getOrderId();
System.out.println("Task orderId: " + orderId);
```

### Step 2: Poll for Results

```java
import cn.xfyun.model.enums.LfasrOrderStatusEnum;
import java.util.concurrent.TimeUnit;

int status = LfasrOrderStatusEnum.CREATED.getKey();
while (status != LfasrOrderStatusEnum.COMPLETED.getKey() && status != LfasrOrderStatusEnum.FAILED.getKey()) {
    LfasrResponse resultResp = lfasrClient.getResult(orderId, "transfer");
    status = resultResp.getContent().getOrderInfo().getStatus();
    System.out.println("Order status: " + LfasrOrderStatusEnum.getEnum(status).getValue());

    if (status == LfasrOrderStatusEnum.COMPLETED.getKey()) {
        // Step 3: Parse transcription results (lattice structure)
        LfasrOrderResult orderResult = gson.fromJson(resultResp.getContent().getOrderResult(), LfasrOrderResult.class);
        for (LfasrOrderResult.Lattice lattice : orderResult.getLattice()) {
            System.out.println("Role-" + lattice.getJson1Best().getSt().getRl() + ": " + extractText(lattice));
        }
        break;
    }
    TimeUnit.SECONDS.sleep(20); // Poll every 20 seconds
}
```

### Step 3: Parse Transcription Results (Example within Step 2)

```java
// This part is included within the Step 2 loop when status is COMPLETED.
// Example of parsing the lattice structure:
// LfasrOrderResult orderResult = gson.fromJson(resultResp.getContent().getOrderResult(), LfasrOrderResult.class);
// for (LfasrOrderResult.Lattice lattice : orderResult.getLattice()) {
//     System.out.println("Role-" + lattice.getJson1Best().getSt().getRl() + ": " + extractText(lattice));
// }

// Helper function to extract text from lattice (implementation not provided in source)
// static String extractText(LfasrOrderResult.Lattice lattice) { ... }
```
```
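
The polling loop in Step 2 has no upper bound on how long it waits. A minimal sketch of the same poll-until-terminal pattern with a deadline is shown below, written against plain functional interfaces so it runs without the SDK. `PollingSketch`, `pollUntil`, and the status codes in `main` are assumptions for illustration; real status values come from `LfasrOrderStatusEnum`.

```java
import java.time.Duration;
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

// Hypothetical helper (not part of the SDK): polls a status source until a terminal status or a timeout.
final class PollingSketch {

    static int pollUntil(IntSupplier statusSource, IntPredicate isTerminal,
                         Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            int status = statusSource.getAsInt();
            if (isTerminal.test(status)) {
                return status;
            }
            // Give up once the deadline has passed instead of looping forever.
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                        "Timed out waiting for a terminal status; last status=" + status);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
    }

    public static void main(String[] args) {
        int[] simulated = {0, 3, 3, 4};   // placeholder status sequence, assumed values
        int[] calls = {0};
        int last = pollUntil(
                () -> simulated[Math.min(calls[0]++, simulated.length - 1)],
                s -> s == 4 || s == -1,   // assumed "completed" and "failed" codes
                Duration.ofMillis(10), Duration.ofSeconds(1));
        System.out.println("final status: " + last + " after " + calls[0] + " polls");
    }
}
```

In real use the supplier would wrap the `getResult` call and return the order status, and the interval would be closer to the 20 seconds used above.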

--------------------------------

### Streaming Speech Dictation (IAT) with WebSocket

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Performs streaming speech dictation over WebSocket, returning recognition results while audio is still being sent. Supports file input, microphone capture, and custom streaming. Dynamic correction (`wpgs`) refines earlier results in real time, and intermediate and final results are delivered through callbacks.

```java
import cn.xfyun.api.IatClient;
import cn.xfyun.model.response.iat.IatResponse;
import cn.xfyun.model.response.iat.IatResult;
import cn.xfyun.model.response.iat.Text;
import cn.xfyun.service.iat.AbstractIatWebSocketListener;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Build the client
IatClient iatClient = new IatClient.Builder()
        .signature(appId, apiKey, apiSecret)
        .dwa("wpgs")      // Enable dynamic correction of streaming results
        .vad_eos(6000)    // Silence duration for endpoint detection (milliseconds)
        .build();

List<Text> resultSegments = new ArrayList<>();

// Send audio from a file and listen for results
iatClient.send(new File("audio/iat_pcm_16k.pcm"), new AbstractIatWebSocketListener() {
    @Override
    public void onSuccess(WebSocket webSocket, IatResponse resp) {
        if (resp.getCode() != 0) {
            System.err.println("Error code: " + resp.getCode() + ", message: " + resp.getMessage());
            // Error code lookup: https://www.xfyun.cn/document/error-code
            return;
        }
        if (resp.getData() != null && resp.getData().getResult() != null) {
            Text text = resp.getData().getResult().getText();
            // Handle wpgs dynamic correction: a "rpl" result replaces the earlier segments in range rg
            if ("rpl".equals(text.getPgs()) && text.getRg() != null) {
                for (int i = text.getRg()[0] - 1; i <= text.getRg()[1] - 1; i++) {
                    resultSegments.get(i).setDeleted(true);
                }
            }
            resultSegments.add(text);
            System.out.println("Intermediate result: " + getFinalResult(resultSegments));
        }
        if (resp.getData() != null && resp.getData().getStatus() == 2) {
            // status=2 means all results have been returned
            System.out.println("Final result: " + getFinalResult(resultSegments));
            iatClient.closeWebsocket();
        }
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable t, Response response) {
        System.err.println("Connection failed: " + t.getMessage());
    }
});

// Concatenate the segments that have not been replaced into the final text
static String getFinalResult(List<Text> segments) {
    return segments.stream()
            .filter(t -> t != null && !t.isDeleted())
            .map(Text::getText)
            .collect(Collectors.joining());
}
```
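
The `rpl` branch in the listener above is the core of dynamic correction: a replacement result marks the earlier segments in the range `rg` as deleted before the new segment is appended. The following self-contained sketch reproduces that bookkeeping with the same 1-based indexing. The `Segment` class is a hypothetical stand-in for the SDK's `Text` model, and `apd` as the name of the append mode is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of wpgs segment merging; not part of the SDK.
final class WpgsMergeSketch {

    // Stand-in for the SDK's Text model.
    static final class Segment {
        final String text;
        final String pgs;   // "apd" = append (assumed name), "rpl" = replace
        final int[] rg;     // 1-based inclusive range of earlier segments to replace, or null
        boolean deleted;

        Segment(String text, String pgs, int[] rg) {
            this.text = text;
            this.pgs = pgs;
            this.rg = rg;
        }
    }

    private final List<Segment> segments = new ArrayList<>();

    // Applies one incoming result: marks replaced segments as deleted, then appends the new one.
    void accept(Segment s) {
        if ("rpl".equals(s.pgs) && s.rg != null) {
            for (int i = s.rg[0] - 1; i <= s.rg[1] - 1 && i < segments.size(); i++) {
                if (i >= 0) {
                    segments.get(i).deleted = true;
                }
            }
        }
        segments.add(s);
    }

    // Joins every segment that has not been replaced.
    String currentText() {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            if (!s.deleted) {
                sb.append(s.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WpgsMergeSketch merger = new WpgsMergeSketch();
        merger.accept(new Segment("hello", "apd", null));
        merger.accept(new Segment("hello world", "rpl", new int[] {1, 1}));
        merger.accept(new Segment("!", "apd", null));
        System.out.println(merger.currentText()); // prints "hello world!"
    }
}
```

Unlike the listener code, the sketch bounds-checks the range so an out-of-range `rg` cannot throw.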

--------------------------------

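### Spark LLM Multi-turn Chat (WebSocket/HTTP)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Holds multi-turn conversations with the Spark large language model through either streaming WebSocket calls or synchronous HTTP POST calls. Advanced capabilities such as function calling, web search, and chain-of-thought reasoning are configurable. WebSocket calls require an AbstractSparkModelWebSocketListener implementation to handle responses.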

```java
import cn.xfyun.api.SparkChatClient;
import cn.xfyun.config.SparkModel;
import cn.xfyun.model.sparkmodel.*;
import cn.xfyun.model.sparkmodel.response.SparkChatResponse;
import cn.xfyun.service.sparkmodel.AbstractSparkModelWebSocketListener;
import java.util.ArrayList;
import java.util.List;

// Build the multi-turn conversation messages
List<RoleContent> messages = new ArrayList<>();
RoleContent systemMsg = new RoleContent();
systemMsg.setRole("system");
systemMsg.setContent("You are an intelligent assistant.");
RoleContent userMsg = new RoleContent();
userMsg.setRole("user");
userMsg.setContent("What is the weather like in Beijing today?");
messages.add(systemMsg);
messages.add(userMsg);

// Build the request parameters
SparkChatParam param = SparkChatParam.builder()
        .messages(messages)
        .chatId("session_001")
        .thinkingType("disabled") // Chain of thought: disabled/enabled
        // .webSearch(webSearch)  // Web search (supported by Pro/Max/Ultra)
        // .functions(functions)  // Function calling (supported by Max/4.0 Ultra)
        .build();

// Option 1: streaming WebSocket call
SparkChatClient wsClient = new SparkChatClient.Builder()
        .signatureWs(appId, apiKey, apiSecret, SparkModel.SPARK_X1)
        .build();

StringBuilder finalResult = new StringBuilder();
wsClient.send(param, new AbstractSparkModelWebSocketListener() {
    @Override
    public void onSuccess(WebSocket webSocket, SparkChatResponse resp) {
        if (resp.getHeader().getCode() != 0) {
            System.err.println("Error: " + resp.getHeader().getMessage());
            return;
        }
        resp.getPayload().getChoices().getText().forEach(text -> {
            if (text.getContent() != null) {
                finalResult.append(text.getContent());
                System.out.print(text.getContent()); // Print as the stream arrives
            }
        });
        // status=2 marks the last frame of the reply
        if (resp.getPayload().getChoices().getStatus() == 2) {
            System.out.println("\nFull reply: " + finalResult);
            webSocket.close(1000, "");
        }
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable t, Response response) {
        System.err.println("Connection failed: " + t.getMessage());
    }
});

// Option 2: synchronous HTTP POST call
SparkChatClient httpClient = new SparkChatClient.Builder()
        .signatureHttp(sparkApiPassword, SparkModel.SPARK_X1)
        .build();
String result = httpClient.send(param);
JSONObject obj = JSON.parseObject(result);
String content = obj.getJSONArray("choices").getJSONObject(0)
        .getJSONObject("message").getString("content");
System.out.println("HTTP reply: " + content);
```

--------------------------------

### Speech Synthesis (Streaming)

Source: https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Converts text into audio and offers a wide selection of distinctive voices (voice libraries) to choose from.

```APIDOC
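## Speech Synthesis (Streaming)

### Description
The streaming speech synthesis interface converts text into audio and offers a wide selection of distinctive voices (voice libraries) to choose from.

### Parameters
#### Query Parameters
- **vcn** (string) - Required - Voice name. Allowed values: add a trial or purchased voice in the console; the voice's parameter value is shown once it has been added.
- **rdn** (string) - Optional - How digits are pronounced in the synthesized audio
  0: automatic (default)
  1: always as a numeric value
  2: always as a digit string
  3: digit string preferred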

### Request Example
```json
{
  "vcn": "xiaoyan",
  "rdn": "0"
}
```

### Response
#### Success Response (200)
- **audio** (string) - Synthesized audio stream data
```

--------------------------------

### Spark Agent (Agent)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Calls the Xunfei Spark Agent workflow. Supports both streaming (SSE) and non-streaming modes. Allows passing dynamic parameters, tracking multi-step workflow progress, and interactive nodes (e.g., option selection).

```APIDOC
## Spark Agent (Agent)

Calls the Xunfei Spark Agent workflow. Supports both streaming (SSE) and non-streaming modes. Allows passing dynamic parameters, tracking multi-step workflow progress, and interactive nodes (e.g., option selection).

```java
import cn.xfyun.api.AgentClient;
import cn.xfyun.model.agent.AgentChatParam;
import cn.xfyun.service.agent.AgentCallback;

AgentClient client = new AgentClient.Builder(apiKey, apiSecret).build();

// Construct agent request parameters
JSONObject parameter = new JSONObject();
parameter.put("AGENT_USER_INPUT", "What is the weather today?");
AgentChatParam agentParam = AgentChatParam.builder()
        .flowId("7351431612989308928")  // Workflow ID (obtained from the console)
        .parameters(parameter)
        .build();

StringBuilder finalResult = new StringBuilder();

// Streaming (SSE) call
client.completion(agentParam, new AgentCallback() {
    @Override
    public void onEvent(Call call, String id, String type, String data) {
        JSONObject obj = JSON.parseObject(data);
        JSONObject delta = obj.getJSONArray("choices").getJSONObject(0).getJSONObject("delta");
        String content = delta.getString("content");
        if (content != null && !content.isEmpty()) {
            finalResult.append(content);
            System.out.print(content); // Streamed printing
        }
        // Workflow progress
        JSONObject step = obj.getJSONObject("workflow_step");
        if (step != null) {
            System.out.printf("Progress: %.0f%%%n", step.getFloat("progress") * 100);
        }
        String finishReason = obj.getJSONArray("choices")
                .getJSONObject(0).getString("finish_reason");
        if ("stop".equals(finishReason)) {
            System.out.println("\nFinal Result: " + finalResult);
        }
    }

    @Override
    public void onFail(Call call, Throwable t) {
        System.err.println("SSE connection failed: " + t.getMessage());
    }

    @Override
    public void onClosed(Call call) { call.cancel(); }

    @Override
    public void onOpen(Call call, Response response) {
        System.out.println("SSE connection established");
    }
});
```
```

--------------------------------

### Real-time Speech Transcription (RTASR)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

This snippet demonstrates how to use the RtasrClient for real-time speech-to-text transcription via WebSocket. It supports various input methods like file streams, byte arrays, and microphones, and can be configured for different languages and translation.

```APIDOC
## Real-time Speech Transcription (RTASR)

This section details the usage of the `RtasrClient` for real-time audio transcription.

### Description

Utilizes WebSocket for continuous audio stream transcription, suitable for scenarios requiring real-time feedback. Supports multiple languages and can be integrated with translation for simultaneous interpretation. Accepts input from file streams, byte arrays, and microphones.

### Client Construction

```java
import cn.xfyun.api.RtasrClient;

RtasrClient rtasrClient = new RtasrClient.Builder()
        .signature(appId, rtaAPIKey)
        // .lang("cn")          // Language: cn (default) / en
        // .targetLang("en")    // Target translation language (requires translation feature enabled in console)
        .build();
```

### Sending Audio Stream

```java
import cn.xfyun.model.response.rtasr.RtasrResponse;
import cn.xfyun.service.rta.AbstractRtasrWebSocketListener;
import java.io.FileInputStream;
import java.util.concurrent.CountDownLatch;

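// State shared by the listener and handleContent; in a real class, declare these as
// static fields so that the static handleContent method can reach finalResult
StringBuffer finalResult = new StringBuffer();
CountDownLatch latch = new CountDownLatch(1);

// Using FileInputStream as an example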
FileInputStream inputStream = new FileInputStream("audio/rtasr.pcm");
rtasrClient.send(inputStream, new AbstractRtasrWebSocketListener() {
    @Override
    public void onSuccess(WebSocket webSocket, String text) {
        RtasrResponse response = JSONObject.parseObject(text, RtasrResponse.class);
        // Parse the cn.st.rt structure in the data field to get the text
        String tempResult = handleContent(response.getData());
        System.out.println("Real-time result: " + finalResult + tempResult);
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable t, Response response) {
        latch.countDown();
    }

    @Override
    public void onBusinessFail(WebSocket webSocket, String text) {
        System.err.println("Business exception: " + text);
        latch.countDown();
    }

    @Override
    public void onClosed() {
        latch.countDown();
    }
});

latch.await(); // Wait for transcription to complete
```

### Handling Transcription Results

```java
// Parse the transcription structure (type=0 for complete sentences, type=1 for intermediate results)
static String handleContent(String data) {
    JSONObject cn = JSON.parseObject(data).getJSONObject("cn");
    JSONArray rtArr = cn.getJSONObject("st").getJSONArray("rt");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < rtArr.size(); i++) {
        rtArr.getJSONObject(i).getJSONArray("ws").forEach(ws -> {
            ((JSONObject) ws).getJSONArray("cw").forEach(cw ->
                    sb.append(((JSONObject) cw).getString("w")));
        });
    }
    String type = cn.getJSONObject("st").getString("type");
    if ("0".equals(type)) finalResult.append(sb);
    return "1".equals(type) ? sb.toString() : "";
}
```
```
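
When audio is supplied as a byte array, it is commonly fed to a streaming recognizer in small fixed-size frames. For 16 kHz, 16-bit PCM (assumed mono here), a 40 ms frame works out to 16000 × 2 × 0.04 = 1280 bytes. The sketch below shows only that arithmetic and the slicing. The 40 ms interval, the class `PcmFrameSketch`, and its methods are assumptions for illustration, and the SDK's own `send` methods may already handle framing internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the SDK): slices raw PCM into fixed-size frames.
final class PcmFrameSketch {

    static final int SAMPLE_RATE_HZ = 16_000;
    static final int BYTES_PER_SAMPLE = 2;   // 16-bit samples
    static final int FRAME_MILLIS = 40;      // assumed frame interval

    // Bytes in one frame of mono audio: rate * bytes per sample * duration.
    static int frameSizeBytes() {
        return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MILLIS / 1000;
    }

    // Splits the buffer into frames of frameSize bytes; the last frame may be shorter.
    static List<byte[]> split(byte[] pcm, int frameSize) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        List<byte[]> frames = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += frameSize) {
            frames.add(Arrays.copyOfRange(pcm, off, Math.min(off + frameSize, pcm.length)));
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] oneSecond = new byte[SAMPLE_RATE_HZ * BYTES_PER_SAMPLE]; // 32000 bytes of silence
        List<byte[]> frames = split(oneSecond, frameSizeBytes());
        System.out.println(frameSizeBytes() + " bytes per frame, " + frames.size() + " frames");
    }
}
```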

--------------------------------

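### Text Recognition OCR (GeneralWords)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Recognizes both printed and handwritten text. Images are transmitted Base64-encoded. Suited to document scanning, receipt recognition, and other scenarios that extract text from images.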

```java
import cn.xfyun.api.GeneralWordsClient;
import cn.xfyun.config.OcrWordsEnum;

// OcrWordsEnum.PRINT       printed text recognition
// OcrWordsEnum.HANDWRITING handwritten text recognition
GeneralWordsClient client = new GeneralWordsClient
        .Builder(appId, apiKey, OcrWordsEnum.PRINT)
        .build();

// Read the image and convert it to Base64
byte[] imageBytes = IoUtil.readBytes(new FileInputStream("image/document.jpg"));
String imageBase64 = Base64.getEncoder().encodeToString(imageBytes);

// Send the recognition request
String result = client.generalWords(imageBase64);
System.out.println("Request URL: " + client.getHostUrl());
System.out.println("Recognition result: " + result);
// Example output: {"code":0,"data":{"result":[{"content":"recognized text"}]},"message":"success"}
```

--------------------------------

### Face Comparison (FaceCompare)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Compares two face images to determine their similarity and whether they belong to the same person. Supports common image formats like JPG/PNG and returns a similarity score.

```APIDOC
## Face Comparison (FaceCompare)

Compares two face images to determine their similarity and whether they belong to the same person. Supports common image formats like JPG/PNG and returns a similarity score.

```java
import cn.xfyun.api.FaceCompareClient;

FaceCompareClient client = new FaceCompareClient
        .Builder(appId, apiKey, apiSecret)
        .build();

// Read two face images and convert them to Base64
byte[] face1Bytes = IoUtil.readBytes(new FileInputStream("image/face1.jpg"));
byte[] face2Bytes = IoUtil.readBytes(new FileInputStream("image/face2.jpg"));
String face1Base64 = Base64.getEncoder().encodeToString(face1Bytes);
String face2Base64 = Base64.getEncoder().encodeToString(face2Bytes);

// Perform face comparison
String result = client.faceCompare(face1Base64, "jpg", face2Base64, "jpg");
System.out.println("Request URL: " + client.getHostUrl());
System.out.println("Comparison Result: " + result);
// Example Output: {"code":0,"data":{"score":0.98},"message":"success"}
// A score closer to 1 indicates higher similarity between the two faces.
```
```
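
The snippet above uses `IoUtil` without showing its import. A sketch of the same read-and-encode step using only the JDK follows. `FaceImageEncodingSketch` and its methods are hypothetical names, not part of the SDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper (not part of the SDK): turns image bytes into a Base64 string.
final class FaceImageEncodingSketch {

    static String encodeBytes(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Reads the whole file into memory and encodes it; I/O failures surface as unchecked exceptions.
    static String encodeFile(Path path) {
        try {
            return encodeBytes(Files.readAllBytes(path));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeBytes(new byte[] {1, 2, 3})); // prints "AQID"
    }
}
```

`Files.readAllBytes` also closes the file for you, whereas a bare `new FileInputStream(...)` passed inline is left for the caller or the utility to close.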

--------------------------------

### Machine Translation (Translate)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Supports three translation engines: Niutrans, ITS (Intelligent Translation System), and ITS Pro. ITS Pro supports personalized terminology and returns its translation result Base64-encoded, so the result must be decoded before use.

```APIDOC
## Machine Translation (Translate)

Supports three translation engines: Niutrans, ITS (Intelligent Translation System), and ITS Pro. ITS Pro supports personalized terminology and returns its translation result Base64-encoded, so the result must be decoded before use.

```java
import cn.xfyun.api.TransClient;
import cn.xfyun.model.translate.TransParam;

TransClient client = new TransClient.Builder(appId, apiKey, apiSecret).build();

TransParam param = TransParam.builder()
        .text("神舟十二号载人飞船发射任务取得圆满成功")
        .from("cn")   // Source language
        .to("en")     // Target language
        // .resId("your_term_id")  // Personalized terminology ID (only supported by ITS Pro)
        .build();

// Niutrans
String niuResult = client.sendNiuTrans(param);
System.out.println("Niutrans Result: " + niuResult);

// ITS (Intelligent Translation System)
String itsResult = client.sendIst(param);
System.out.println("ITS Translation Result: " + itsResult);

// ITS Pro (Enhanced Intelligent Translation System), result needs Base64 decoding
String itsProResult = client.sendIstV2(param);
String textBase64 = JSON.parseObject(itsProResult)
        .getJSONObject("payload")
        .getJSONObject("result")
        .getString("text");
String decoded = new String(Base64.getDecoder().decode(textBase64), StandardCharsets.UTF_8);
System.out.println("ITS Pro Translation Result: " + decoded);
// Example Output: The launch mission of Shenzhou-12 crewed spacecraft was a complete success.
```
```
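
Decoding the ITS Pro `text` field is a plain Base64-to-UTF-8 conversion. A small JDK-only sketch is given below, with `Base64TextSketch` as a hypothetical name and a locally encoded sample string standing in for a real response.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not part of the SDK): decodes Base64 text as UTF-8.
final class Base64TextSketch {

    static String decodeUtf8(String base64) {
        // Naming the charset explicitly avoids depending on the platform default.
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample data encoded locally; a real value would come from payload.result.text.
        String sample = Base64.getEncoder()
                .encodeToString("complete success".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeUtf8(sample)); // prints "complete success"
    }
}
```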

--------------------------------

### Speech Synthesis (TTS) with WebSocket

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Converts text to streaming audio using WebSocket. Supports various speakers and audio formats (PCM/MP3). Ensure the output file path is valid.

```java
import cn.xfyun.api.TtsClient;
import cn.xfyun.model.response.TtsResponse;
import cn.xfyun.service.tts.AbstractTtsWebSocketListener;

TtsClient ttsClient = new TtsClient.Builder()
        .signature(appId, apiKey, apiSecret)
        // .vcn("xiaoyan")   // Voice name, default xiaoyan (must be added in the console)
        // .rdn("0")         // Digit pronunciation: 0=automatic, 1=numeric value, 2=digit string, 3=digit string preferred
        .build();

File outputFile = new File("audio/tts_output.mp3");

// Text to synthesize: "The weather is really nice today, I'd like to go out for a walk."
ttsClient.send("今天天气真不错，我想出去走走。", new AbstractTtsWebSocketListener(outputFile) {
    @Override
    public void onSuccess(byte[] bytes) {
        // Callback for each audio chunk (already written to outputFile automatically)
        System.out.println("Received audio chunk, length: " + bytes.length);
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable throwable, Response response) {
        System.err.println("Synthesis failed: " + throwable.getMessage());
    }

    @Override
    public void onBusinessFail(WebSocket webSocket, TtsResponse ttsResponse) {
        System.err.println("Business error: " + ttsResponse.toString());
        // Error code lookup: https://www.xfyun.cn/document/error-code
    }
});
// Once synthesis completes, outputFile contains the complete MP3 audio
```