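### Real-Time Speech Transcription

Source: https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Recognizes a continuous audio stream over the WebSocket protocol and returns the corresponding text stream in real time.

```APIDOC
## Real-Time Speech Transcription

### Description
Real-time speech transcription (Real-time ASR) is based on a deep fully sequential convolutional neural network framework. It keeps a long-lived WebSocket connection between the application and the core speech transcription engine, so that a continuous audio stream is recognized and returned as a text stream in real time.

Supported audio format: pcm_s16le audio with a 16 kHz sample rate and 16-bit sample depth.

### Parameters
#### Query Parameters
- **lang** (string) - Optional - Language of the real-time transcription; defaults to Chinese when omitted
  - Language values: Chinese and mixed Chinese-English recognition: cn; English: en. Minor languages and dialects can be added in the console under Real-Time Speech Transcription > Dialects/Languages; once added, the parameter value for that dialect or language is displayed.
- **targetLang** (string) - Optional - Target translation language
  - Example: targetLang="en"
  - To translate Chinese into English in real time, pass: "lang=cn&transType=normal&transStrategy=2&targetLang=en"
  *Note: the translation feature must be enabled in the console.*

### Request Example
```json
{
  "lang": "cn",
  "targetLang": "en"
}
```

### Response
#### Success Response (200)
- **text** (string) - Recognized text
- **audio_end** (boolean) - Whether the audio has ended
```

--------------------------------

### Audio File Speech Transcription

Source: https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Converts long audio (up to 5 hours) into text, providing a basis for information processing and data mining.

```APIDOC
## Audio File Speech Transcription

### Description
Speech transcription (Long Form ASR) is based on a deep fully sequential convolutional neural network and converts long audio (up to 5 hours) into text, providing a basis for information processing and data mining.

It transcribes pre-recorded (non-real-time) audio. After an audio file is uploaded successfully it enters a waiting queue, and the result can be retrieved once transcription succeeds. Turnaround time depends on the audio duration and the number of queued tasks.

If transcription takes longer than usual, the service is most likely at peak load for that period, so please wait. The stated commitment is that a valid task takes no more than 5 hours.

To keep the transcription service running smoothly, prefer audio files longer than 5 minutes.

### Parameters
#### Query Parameters
- **speaker_number** (string) - Optional - Number of speakers; allowed values: 0-10, where 0 means blind separation
  *Note*: speaker separation is still in testing and does not yet reach commercial quality. If it cannot meet your needs in testing, use it with caution.
- **has_seperate** (string) - Optional - Whether the transcription result includes speaker separation information
- **role_type** (string) - Optional - Supports two values
  1: general role separation
  2: telephone-channel role separation (for conversations where speaker_number is 2)
  This field only takes effect when the role separation feature has been enabled; passing it correctly improves the separation result.
  Defaults to type 1 when omitted
- **language** (string) - Optional - Language
  cn: Chinese and mixed Chinese-English (default)
  en: English (hot words are not supported for English)

### Request Example
```json
{
  "speaker_number": "2",
  "has_seperate": "true",
  "role_type": "1",
  "language": "cn"
}
```

### Response
#### Success Response (200)
- **result** (string) - JSON string of the transcription result
- **task_id** (string) - Task ID
```

--------------------------------

### Speech Dictation (Streaming)

Source: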
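https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Provides instant speech-to-text for audio up to 1 minute long, with recognition results returned in real time.

```APIDOC
## Speech Dictation (Streaming)

### Description
The streaming speech dictation interface provides instant speech-to-text for audio up to 1 minute long. Results are returned in real time, so recognized text arrives while the audio is still being uploaded.

### Parameters
#### Query Parameters
- **vad_eos** (int) - Optional - Silence duration for endpoint detection, in milliseconds: how long the silence must last before the engine treats the audio as finished. Defaults to 2000 (except for minor languages, where VAD is disabled unless this parameter is set).
- **dwa** (string) - Optional - Dynamic correction (Mandarin Chinese only)
  - wpgs: enables streaming result return
  *Note: this extension cannot be used without authorization. It can be enabled for free in the console under Speech Dictation (Streaming) > Advanced Features. Setting the parameter without authorization does not raise an error, but it has no effect.*

### Request Example
```json
{
  "vad_eos": 3000,
  "dwa": "wpgs"
}
```

### Response
#### Success Response (200)
- **text** (string) - Recognized text
- **audio_end** (boolean) - Whether the audio has ended
```

--------------------------------

### Image Generation (ImageGen)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Generates an image from a natural-language text description and returns Base64-encoded image data, which must be decoded before it is saved as a PNG file. Suited to scenarios that need images created from a text prompt.

```java
import cn.xfyun.api.ImageGenClient;
import java.io.FileOutputStream;
import java.util.Base64;

ImageGenClient client = new ImageGenClient.Builder(appId, apiKey, apiSecret).build();

// Send the image generation request
String resp = client.send("Draw a cute kitten playing on the grass");
JSONObject obj = JSON.parseObject(resp);

if (obj.getJSONObject("header").getInteger("code") != 0) {
    System.err.println("Generation failed: " + resp);
    return;
}

// Extract the Base64 image data from the response
String base64Image = obj.getJSONObject("payload")
        .getJSONObject("choices")
        .getJSONArray("text")
        .getJSONObject(0)
        .getString("content");

// Decode and save as a PNG file
byte[] imageBytes = Base64.getDecoder().decode(base64Image);
String outputPath = "src/main/resources/image/gen_output.png";
try (FileOutputStream fos = new FileOutputStream(outputPath)) {
    fos.write(imageBytes);
    System.out.println("Image saved: " + outputPath);
}
```

--------------------------------

### Face Comparison (FaceCompare)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Compares two face images for similarity. Supports common image formats such as JPG/PNG, and the result includes a similarity score. The image files must first be read and Base64-encoded.

```java
import cn.xfyun.api.FaceCompareClient;

FaceCompareClient client = new FaceCompareClient
        .Builder(appId, apiKey, apiSecret)
        .build();

// Read the two face images and Base64-encode them
byte[] face1Bytes = IoUtil.readBytes(new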
        FileInputStream("image/face1.jpg"));
byte[] face2Bytes = IoUtil.readBytes(new FileInputStream("image/face2.jpg"));
String face1Base64 = Base64.getEncoder().encodeToString(face1Bytes);
String face2Base64 = Base64.getEncoder().encodeToString(face2Bytes);

// Run the face comparison
String result = client.faceCompare(face1Base64, "jpg", face2Base64, "jpg");
System.out.println("Request URL: " + client.getHostUrl());
System.out.println("Comparison result: " + result);
// Example output: {"code":0,"data":{"score":0.98},"message":"success"}
// The closer the score is to 1, the more similar the two faces are
```

--------------------------------

### Configure Authentication Information

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Reads authentication parameters such as appId, apiKey, and apiSecret from the test.properties file. Dedicated keys for real-time speech transcription, audio file transcription, and the Spark HTTP interface are read the same way.

```properties
# src/main/resources/test.properties example configuration
appId=your_app_id
apiKey=your_api_key
apiSecret=your_api_secret
rtaAPIKey=your_rta_api_key
lfasrSecretKey=your_lfasr_secret_key
sparkApiPassword=your_spark_api_password
```

```java
// Read the configuration
String appId = PropertiesConfig.getAppId();
String apiKey = PropertiesConfig.getApiKey();
String apiSecret = PropertiesConfig.getApiSecret();

// Key dedicated to real-time speech transcription
String rtaAPIKey = PropertiesConfig.getRtaAPIKey();

// SecretKey dedicated to audio file transcription
String lfasrSecretKey = PropertiesConfig.getLfasrSecretKey();

// Password dedicated to the Spark large model HTTP interface
String sparkApiPassword = PropertiesConfig.getSparkApiPassword();
```

--------------------------------

### Image Generation (ImageGen)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Generates an image from a natural-language text description. The result is Base64-encoded image data, which can be decoded and saved as a PNG file.

```APIDOC
## Image Generation (ImageGen)

Generates an image from a natural-language text description. The result is Base64-encoded image data, which is decoded and saved as a PNG file.

```java
import cn.xfyun.api.ImageGenClient;
import java.io.FileOutputStream;
import java.util.Base64;

ImageGenClient client = new ImageGenClient.Builder(appId, apiKey, apiSecret).build();

// Send the image generation request
String resp = client.send("Draw a cute kitten playing on the grass");
JSONObject obj = JSON.parseObject(resp);

if
        (obj.getJSONObject("header").getInteger("code") != 0) {
    System.err.println("Generation failed: " + resp);
    return;
}

// Extract the Base64 image data from the response
String base64Image = obj.getJSONObject("payload")
        .getJSONObject("choices")
        .getJSONArray("text")
        .getJSONObject(0)
        .getString("content");

// Decode and save as a PNG file
byte[] imageBytes = Base64.getDecoder().decode(base64Image);
String outputPath = "src/main/resources/image/gen_output.png";
try (FileOutputStream fos = new FileOutputStream(outputPath)) {
    fos.write(imageBytes);
    System.out.println("Image saved: " + outputPath);
}
```
```

--------------------------------

### Audio File Speech Transcription (LFASR)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

This snippet shows how to perform asynchronous batch speech transcription for audio files using the LfasrClient. It follows a three-step process: upload, poll for results, and retrieve the transcription. It supports several task types, including transcription, translation, and quality inspection, with options for speaker diarization.

```APIDOC
## Audio File Speech Transcription (LFASR)

This section details the usage of the `LfasrClient` for asynchronous batch audio file transcription.

### Description
Asynchronous batch transcription for audio files up to 5 hours long. It uses a three-step process: upload, poll for results, and retrieve the final transcription. Supports multiple task types such as transcription (`transfer`), translation (`translate`), and quality inspection (`predict`), with an option for speaker diarization.

### Client Construction
```java
import cn.xfyun.api.LfasrClient;

LfasrClient lfasrClient = new LfasrClient.Builder(appId, lfasrSecretKey)
        // .roleType((short) 1)      // Speaker diarization: 1=General, 2=Telephony channel
        // .transLanguage("en")      // Translation target language
        // .audioMode("urlLink")     // Use remote URL for upload
        .build();
```

### Step 1: Upload Audio File
```java
import cn.xfyun.model.response.lfasr.LfasrResponse;

// Upload a local file
LfasrResponse uploadResp = lfasrClient.uploadFile("audio/lfasr.wav");
// Or upload from a remote URL
// LfasrResponse uploadResp = lfasrClient.uploadUrl("https://example.com/audio.wav");

if (!"000000".equals(uploadResp.getCode())) {
    System.err.println("Upload failed: " + uploadResp.getDescInfo());
    return;
}
String orderId = uploadResp.getContent().getOrderId();
System.out.println("Task orderId: " + orderId);
```

### Step 2: Poll for Results
```java
import cn.xfyun.model.enums.LfasrOrderStatusEnum;
import com.google.gson.Gson;
import java.util.concurrent.TimeUnit;

// Assumed: the snippet's `gson` variable is a plain Gson instance
Gson gson = new Gson();

int status = LfasrOrderStatusEnum.CREATED.getKey();
while (status != LfasrOrderStatusEnum.COMPLETED.getKey()
        && status != LfasrOrderStatusEnum.FAILED.getKey()) {

    LfasrResponse resultResp = lfasrClient.getResult(orderId, "transfer");
    status = resultResp.getContent().getOrderInfo().getStatus();
    System.out.println("Order status: " + LfasrOrderStatusEnum.getEnum(status).getValue());

    if (status == LfasrOrderStatusEnum.COMPLETED.getKey()) {
        // Step 3: Parse transcription results (lattice structure)
        LfasrOrderResult orderResult = gson.fromJson(resultResp.getContent().getOrderResult(), LfasrOrderResult.class);
        for (LfasrOrderResult.Lattice lattice : orderResult.getLattice()) {
            System.out.println("Role-" + lattice.getJson1Best().getSt().getRl() + ": " + extractText(lattice));
        }
        break;
    }
    TimeUnit.SECONDS.sleep(20); // Poll every 20 seconds
}
```

### Step 3: Parse Transcription Results (Example within Step 2)
```java
// This part is included within the Step 2 loop when status is COMPLETED.
// Example of parsing the lattice structure:
// LfasrOrderResult orderResult = gson.fromJson(resultResp.getContent().getOrderResult(), LfasrOrderResult.class);
// for (LfasrOrderResult.Lattice lattice : orderResult.getLattice()) {
//     System.out.println("Role-" + lattice.getJson1Best().getSt().getRl() + ": " + extractText(lattice));
// }

// Helper function to extract text from lattice (implementation not provided in source)
// static String extractText(LfasrOrderResult.Lattice lattice) { ... }
```
```

--------------------------------

### Streaming Speech Dictation (IAT) with WebSocket

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Performs streaming speech dictation (IAT) over WebSocket, returning text while the audio is still being sent. Supports file input, microphone capture, and custom streaming. Enables dynamic correction (wpgs), which lets later results replace earlier ones. Handles intermediate and final results via callbacks.

```java
import cn.xfyun.api.IatClient;
import cn.xfyun.model.response.iat.IatResponse;
import cn.xfyun.model.response.iat.IatResult;
import cn.xfyun.model.response.iat.Text;
import cn.xfyun.service.iat.AbstractIatWebSocketListener;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Build the client
IatClient iatClient = new IatClient.Builder()
        .signature(appId, apiKey, apiSecret)
        .dwa("wpgs")       // Enable streaming result correction
        .vad_eos(6000)     // End-of-speech silence detection time (milliseconds)
        .build();

List<Text> resultSegments = new ArrayList<>();

// Send from a file and listen for results
iatClient.send(new File("audio/iat_pcm_16k.pcm"), new AbstractIatWebSocketListener() {
    @Override
    public void onSuccess(WebSocket webSocket, IatResponse resp) {
        if (resp.getCode() != 0) {
            System.err.println("Error code: " + resp.getCode() + ", message: " + resp.getMessage());
            // Error code lookup: https://www.xfyun.cn/document/error-code
            return;
        }
        if (resp.getData() != null && resp.getData().getResult() != null) {
            Text text = resp.getData().getResult().getText();
            // Handle wpgs streaming corrections: "rpl" replaces segments rg[0]..rg[1] (1-based)
            if ("rpl".equals(text.getPgs()) && text.getRg() != null) {
                for (int i = text.getRg()[0] - 1; i <= text.getRg()[1] - 1; i++) {
                    resultSegments.get(i).setDeleted(true);
                }
            }
            resultSegments.add(text);
            System.out.println("Intermediate result: " + getFinalResult(resultSegments));
        }
        if (resp.getData() != null && resp.getData().getStatus() == 2) {
            // status=2 means all results have been returned
            System.out.println("Final result: " + getFinalResult(resultSegments));
            iatClient.closeWebsocket();
        }
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable t, Response response) {
        System.err.println("Connection failed: " + t.getMessage());
    }
});

// Join the segments that were not replaced into the final recognition result
static String getFinalResult(List<Text> segments) {
    return segments.stream()
            .filter(t -> t != null && !t.isDeleted())
            .map(Text::getText)
            .collect(Collectors.joining());
}
```

--------------------------------

### Spark Agent (Agent) - Streaming Call

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Calls an iFlytek Spark Agent workflow with streaming (SSE) output. Dynamic parameters can be passed in, and progress through multi-step workflows can be tracked. Events are handled by implementing the AgentCallback interface.

```java
import cn.xfyun.api.AgentClient;
import cn.xfyun.model.agent.AgentChatParam;
import cn.xfyun.service.agent.AgentCallback;

AgentClient client = new AgentClient.Builder(apiKey, apiSecret).build();

// Build the agent request parameters
JSONObject parameter = new JSONObject();
parameter.put("AGENT_USER_INPUT", "What is the weather today?");

AgentChatParam agentParam = AgentChatParam.builder()
        .flowId("7351431612989308928")   // Workflow ID (obtained from the console)
        .parameters(parameter)
        .build();

StringBuilder finalResult = new StringBuilder();

// Streaming (SSE) call
client.completion(agentParam, new AgentCallback() {
    @Override
    public void onEvent(Call call, String id, String type, String data) {
        JSONObject obj = JSON.parseObject(data);
        JSONObject delta = obj.getJSONArray("choices").getJSONObject(0).getJSONObject("delta");
        String content = delta.getString("content");
        if (content != null && !content.isEmpty()) {
            finalResult.append(content);
            System.out.print(content);   // Streamed printing
        }
        // Workflow progress
        JSONObject step = obj.getJSONObject("workflow_step");
        if (step != null) {
            System.out.printf("Progress: %.0f%%%n", step.getFloat("progress") * 100);
        }
        String finishReason = obj.getJSONArray("choices")
                .getJSONObject(0).getString("finish_reason");
        if ("stop".equals(finishReason)) {
            System.out.println("\nFinal result: " + finalResult);
        }
    }

    @Override
    public void onFail(Call call, Throwable t) {
        System.err.println("SSE connection failed: " + t.getMessage());
    }

    @Override
    public void onClosed(Call call) {
        call.cancel();
    }

    @Override
    public void onOpen(Call call, Response response) {
        System.out.println("SSE connection established");
    }
});
```

--------------------------------

### Spark Large Model Multi-Turn Chat (WebSocket/HTTP)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Holds multi-turn conversations with the Spark large model through either a streaming WebSocket call or a synchronous HTTP POST call. Advanced capabilities such as function calling, web search, and chain-of-thought can be configured. The WebSocket call requires an AbstractSparkModelWebSocketListener implementation to handle responses.

```java
import cn.xfyun.api.SparkChatClient;
import cn.xfyun.config.SparkModel;
import cn.xfyun.model.sparkmodel.*;
import cn.xfyun.model.sparkmodel.response.SparkChatResponse;
import cn.xfyun.service.sparkmodel.AbstractSparkModelWebSocketListener;
import java.util.ArrayList;
import java.util.List;

// Build the multi-turn conversation messages
List<RoleContent> messages = new ArrayList<>();
RoleContent systemMsg = new RoleContent();
systemMsg.setRole("system");
systemMsg.setContent("You are an intelligent assistant.");
RoleContent userMsg = new RoleContent();
userMsg.setRole("user");
userMsg.setContent("What is the weather like in Beijing today?");
messages.add(systemMsg);
messages.add(userMsg);

// Build the request parameters
SparkChatParam param = SparkChatParam.builder()
        .messages(messages)
        .chatId("session_001")
        .thinkingType("disabled")   // Chain-of-thought: disabled/enabled
        // .webSearch(webSearch)    // Web search (supported by Pro/Max/Ultra)
        // .functions(functions)    // Function calling (supported by Max/4.0 Ultra)
        .build();

// Option 1: streaming WebSocket call
SparkChatClient wsClient = new SparkChatClient.Builder()
        .signatureWs(appId, apiKey, apiSecret, SparkModel.SPARK_X1)
        .build();

StringBuilder finalResult = new StringBuilder();
wsClient.send(param, new AbstractSparkModelWebSocketListener() {
    @Override
    public void onSuccess(WebSocket webSocket, SparkChatResponse resp) {
        if (resp.getHeader().getCode() != 0) {
            System.err.println("Error: " + resp.getHeader().getMessage());
            return;
        }
        resp.getPayload().getChoices().getText().forEach(text -> {
            if (text.getContent() != null) {
                finalResult.append(text.getContent());
                System.out.print(text.getContent());   // Streamed printing
            }
        });
        if (resp.getPayload().getChoices().getStatus() == 2) {
            System.out.println("\nFull reply: " + finalResult);
            webSocket.close(1000, "");
        }
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable t, Response response) {
        System.err.println("Connection failed: " + t.getMessage());
    }
});

// Option 2: synchronous HTTP POST call
SparkChatClient httpClient = new SparkChatClient.Builder()
        .signatureHttp(sparkApiPassword, SparkModel.SPARK_X1)
        .build();

String result = httpClient.send(param);
JSONObject obj = JSON.parseObject(result);
String content = obj.getJSONArray("choices").getJSONObject(0)
        .getJSONObject("message").getString("content");
System.out.println("HTTP reply: " + content);
```

--------------------------------

### Real-Time Speech-to-Text (RTASR) with WebSocket

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Use this for continuous audio streams. It supports multiple languages and can be combined with translation for simultaneous interpretation. Input can come from file streams, byte arrays, or microphones.

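When audio arrives as a byte array, it is usually cut into short fixed-duration frames before being streamed. The sketch below shows only that arithmetic and slicing for 16 kHz, 16-bit mono PCM. `PcmFrameSplitter` is a hypothetical helper, not part of the SDK, and the 40 ms frame duration is an assumed pacing interval that should be checked against the service documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the SDK): splits raw PCM into fixed-duration frames.
class PcmFrameSplitter {

    // Bytes needed for `millis` of audio: rate * bytesPerSample * channels * millis / 1000.
    static int frameBytes(int sampleRateHz, int bitsPerSample, int channels, int millis) {
        return sampleRateHz * (bitsPerSample / 8) * channels * millis / 1000;
    }

    // Cuts the buffer into frames of frameSize bytes; the last frame may be shorter.
    static List<byte[]> split(byte[] pcm, int frameSize) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        List<byte[]> frames = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += frameSize) {
            int end = Math.min(offset + frameSize, pcm.length);
            frames.add(Arrays.copyOfRange(pcm, offset, end));
        }
        return frames;
    }

    public static void main(String[] args) {
        int frameSize = frameBytes(16000, 16, 1, 40);   // assumed 40 ms frames
        byte[] oneSecond = new byte[32000];              // one second of silence
        List<byte[]> frames = split(oneSecond, frameSize);
        // prints "1280 bytes per frame, 25 frames"
        System.out.println(frameSize + " bytes per frame, " + frames.size() + " frames");
    }
}
```

Each frame could then be handed to whatever byte-array send path the client exposes, with a pause of roughly one frame duration between sends so the stream approximates real time.
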
```java
import cn.xfyun.api.RtasrClient;
import cn.xfyun.model.response.rtasr.RtasrResponse;
import cn.xfyun.service.rta.AbstractRtasrWebSocketListener;
import java.io.FileInputStream;
import java.util.concurrent.CountDownLatch;

// Build the client (only appId and rtaAPIKey are required)
RtasrClient rtasrClient = new RtasrClient.Builder()
        .signature(appId, rtaAPIKey)
        // .lang("cn")          // Language: cn (default) / en
        // .targetLang("en")    // Target translation language (translation must be enabled in the console)
        .build();

StringBuffer finalResult = new StringBuffer();
CountDownLatch latch = new CountDownLatch(1);

// Send audio through an input stream
FileInputStream inputStream = new FileInputStream("audio/rtasr.pcm");
rtasrClient.send(inputStream, new AbstractRtasrWebSocketListener() {
    @Override
    public void onSuccess(WebSocket webSocket, String text) {
        RtasrResponse response = JSONObject.parseObject(text, RtasrResponse.class);
        // Parse the cn.st.rt structure in the data field to get the text
        String tempResult = handleContent(response.getData());
        System.out.println("Real-time result: " + finalResult + tempResult);
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable t, Response response) {
        latch.countDown();
    }

    @Override
    public void onBusinessFail(WebSocket webSocket, String text) {
        System.err.println("Business exception: " + text);
        latch.countDown();
    }

    @Override
    public void onClosed() {
        latch.countDown();
    }
});

latch.await();   // Wait for transcription to complete

// Parse the transcription structure (type=0 is a complete sentence, type=1 is an intermediate result).
// Note: finalResult must be visible here, for example declared as a static field.
static String handleContent(String data) {
    JSONObject cn = JSON.parseObject(data).getJSONObject("cn");
    JSONArray rtArr = cn.getJSONObject("st").getJSONArray("rt");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < rtArr.size(); i++) {
        rtArr.getJSONObject(i).getJSONArray("ws").forEach(ws -> {
            ((JSONObject) ws).getJSONArray("cw").forEach(cw ->
                    sb.append(((JSONObject) cw).getString("w")));
        });
    }
    String type = cn.getJSONObject("st").getString("type");
    if ("0".equals(type)) finalResult.append(sb);
    return "1".equals(type) ?
            sb.toString() : "";
}
```

--------------------------------

### Speech Synthesis (Streaming)

Source: https://github.com/iflytek-op/websdk-java-demo/blob/main/README.md

Converts text into speech and offers a wide range of distinctive voices (voice libraries) to choose from.

```APIDOC
## Speech Synthesis (Streaming)

### Description
The streaming speech synthesis interface converts text into speech and offers a wide range of distinctive voices (voice libraries) to choose from.

### Parameters
#### Query Parameters
- **vcn** (string) - Required - Voice. Allowed values: add a trial or purchased voice in the console; once added, the parameter value for that voice is displayed.
- **rdn** (string) - Optional - How digits are pronounced in the synthesized audio
  0: decided automatically (default)
  1: always as numeric values
  2: always as digit strings
  3: digit strings preferred

### Request Example
```json
{
  "vcn": "xiaoyan",
  "rdn": "0"
}
```

### Response
#### Success Response (200)
- **audio** (string) - Synthesized audio stream data
```

--------------------------------

### Machine Translation (Translate)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Supports three translation engines: Niutrans, iFlytek's in-house machine translation (ITS), and the enhanced in-house engine (ITS Pro). ITS Pro supports personalized terminology, and its translation result is returned Base64-encoded, so it must be decoded before use.

```java
import cn.xfyun.api.TransClient;
import cn.xfyun.model.translate.TransParam;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

TransClient client = new TransClient.Builder(appId, apiKey, apiSecret).build();

TransParam param = TransParam.builder()
        .text("神舟十二号载人飞船发射任务取得圆满成功")
        .from("cn")    // Source language
        .to("en")      // Target language
        // .resId("your_term_id")   // Personalized terminology ID (ITS Pro only)
        .build();

// Niutrans
String niuResult = client.sendNiuTrans(param);
System.out.println("Niutrans result: " + niuResult);

// In-house machine translation (ITS)
String itsResult = client.sendIst(param);
System.out.println("ITS translation result: " + itsResult);

// Enhanced in-house machine translation (ITS Pro); the result needs Base64 decoding
String itsProResult = client.sendIstV2(param);
String textBase64 = JSON.parseObject(itsProResult)
        .getJSONObject("payload")
        .getJSONObject("result")
        .getString("text");
String decoded = new String(Base64.getDecoder().decode(textBase64), StandardCharsets.UTF_8);
System.out.println("ITS Pro translation result: " + decoded);
// Example output: The launch mission of Shenzhou-12 crewed spacecraft was a complete success.
```

--------------------------------

### Spark Agent (Agent)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Calls the iFlytek Spark Agent workflow. Supports both streaming (SSE) and non-streaming modes.
Allows passing dynamic parameters, tracking multi-step workflow progress, and interactive nodes (e.g., option selection).

```APIDOC
## Spark Agent (Agent)

Calls the iFlytek Spark Agent workflow. Supports both streaming (SSE) and non-streaming modes. Allows passing dynamic parameters, tracking multi-step workflow progress, and interactive nodes (e.g., option selection).

```java
import cn.xfyun.api.AgentClient;
import cn.xfyun.model.agent.AgentChatParam;
import cn.xfyun.service.agent.AgentCallback;

AgentClient client = new AgentClient.Builder(apiKey, apiSecret).build();

// Construct agent request parameters
JSONObject parameter = new JSONObject();
parameter.put("AGENT_USER_INPUT", "What is the weather today?");

AgentChatParam agentParam = AgentChatParam.builder()
        .flowId("7351431612989308928")   // Workflow ID (obtained from the console)
        .parameters(parameter)
        .build();

StringBuilder finalResult = new StringBuilder();

// Streaming (SSE) call
client.completion(agentParam, new AgentCallback() {
    @Override
    public void onEvent(Call call, String id, String type, String data) {
        JSONObject obj = JSON.parseObject(data);
        JSONObject delta = obj.getJSONArray("choices").getJSONObject(0).getJSONObject("delta");
        String content = delta.getString("content");
        if (content != null && !content.isEmpty()) {
            finalResult.append(content);
            System.out.print(content);   // Streamed printing
        }
        // Workflow progress
        JSONObject step = obj.getJSONObject("workflow_step");
        if (step != null) {
            System.out.printf("Progress: %.0f%%%n", step.getFloat("progress") * 100);
        }
        String finishReason = obj.getJSONArray("choices")
                .getJSONObject(0).getString("finish_reason");
        if ("stop".equals(finishReason)) {
            System.out.println("\nFinal Result: " + finalResult);
        }
    }

    @Override
    public void onFail(Call call, Throwable t) {
        System.err.println("SSE connection failed: " + t.getMessage());
    }

    @Override
    public void onClosed(Call call) {
        call.cancel();
    }

    @Override
    public void onOpen(Call call, Response response) {
        System.out.println("SSE connection established");
    }
});
```
```

--------------------------------

### Real-time Speech Transcription (RTASR)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

This snippet demonstrates how to use the RtasrClient for real-time speech-to-text transcription via WebSocket. It supports several input methods, such as file streams, byte arrays, and microphones, and can be configured for different languages and for translation.

```APIDOC
## Real-time Speech Transcription (RTASR)

This section details the usage of the `RtasrClient` for real-time audio transcription.

### Description
Uses WebSocket for continuous audio stream transcription, suitable for scenarios requiring real-time feedback. Supports multiple languages and can be integrated with translation for simultaneous interpretation. Accepts input from file streams, byte arrays, and microphones.

### Client Construction
```java
import cn.xfyun.api.RtasrClient;

RtasrClient rtasrClient = new RtasrClient.Builder()
        .signature(appId, rtaAPIKey)
        // .lang("cn")          // Language: cn (default) / en
        // .targetLang("en")    // Target translation language (requires translation feature enabled in console)
        .build();
```

### Sending Audio Stream
```java
import cn.xfyun.model.response.rtasr.RtasrResponse;
import cn.xfyun.service.rta.AbstractRtasrWebSocketListener;
import java.io.FileInputStream;
import java.util.concurrent.CountDownLatch;

// Accumulates complete sentences; the latch is released when the session ends
StringBuffer finalResult = new StringBuffer();
CountDownLatch latch = new CountDownLatch(1);

// Using FileInputStream as an example
FileInputStream inputStream = new FileInputStream("audio/rtasr.pcm");
rtasrClient.send(inputStream, new AbstractRtasrWebSocketListener() {
    @Override
    public void onSuccess(WebSocket webSocket, String text) {
        RtasrResponse response = JSONObject.parseObject(text, RtasrResponse.class);
        // Parse the cn.st.rt structure in the data field to get the text
        String tempResult = handleContent(response.getData());
        System.out.println("Real-time result: " + finalResult + tempResult);
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable
            t, Response response) {
        latch.countDown();
    }

    @Override
    public void onBusinessFail(WebSocket webSocket, String text) {
        System.err.println("Business exception: " + text);
        latch.countDown();
    }

    @Override
    public void onClosed() {
        latch.countDown();
    }
});

latch.await();   // Wait for transcription to complete
```

### Handling Transcription Results
```java
// Parse the transcription structure (type=0 for complete sentences, type=1 for intermediate results)
static String handleContent(String data) {
    JSONObject cn = JSON.parseObject(data).getJSONObject("cn");
    JSONArray rtArr = cn.getJSONObject("st").getJSONArray("rt");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < rtArr.size(); i++) {
        rtArr.getJSONObject(i).getJSONArray("ws").forEach(ws -> {
            ((JSONObject) ws).getJSONArray("cw").forEach(cw ->
                    sb.append(((JSONObject) cw).getString("w")));
        });
    }
    String type = cn.getJSONObject("st").getString("type");
    if ("0".equals(type)) finalResult.append(sb);
    return "1".equals(type) ? sb.toString() : "";
}
```
```

--------------------------------

### Text Recognition OCR (GeneralWords)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Recognizes both printed and handwritten text. Images are transmitted Base64-encoded. Suited to document scanning, receipt recognition, and other scenarios that extract text from images.

```java
import cn.xfyun.api.GeneralWordsClient;
import cn.xfyun.config.OcrWordsEnum;

// OcrWordsEnum.PRINT        printed text recognition
// OcrWordsEnum.HANDWRITING  handwritten text recognition
GeneralWordsClient client = new GeneralWordsClient
        .Builder(appId, apiKey, OcrWordsEnum.PRINT)
        .build();

// Read the image and Base64-encode it
byte[] imageBytes = IoUtil.readBytes(new FileInputStream("image/document.jpg"));
String imageBase64 = Base64.getEncoder().encodeToString(imageBytes);

// Send the recognition request
String result = client.generalWords(imageBase64);
System.out.println("Request URL: " + client.getHostUrl());
System.out.println("Recognition result: " + result);
// Example output: {"code":0,"data":{"result":[{"content":"recognized text content"}]},"message":"success"}
```

--------------------------------

### Asynchronous Batch Speech-to-Text (LFASR)

Source:
https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Use this for asynchronous batch processing of audio files up to 5 hours long. It follows an upload-poll-retrieve process and supports tasks like transcription, translation, and quality inspection, with optional speaker diarization.

```java
import cn.xfyun.api.LfasrClient;
import cn.xfyun.model.enums.LfasrOrderStatusEnum;
import cn.xfyun.model.response.lfasr.LfasrResponse;
import com.google.gson.Gson;
import java.util.concurrent.TimeUnit;

// Build the client
LfasrClient lfasrClient = new LfasrClient.Builder(appId, lfasrSecretKey)
        // .roleType((short) 1)      // Speaker diarization: 1=general, 2=telephone channel
        // .transLanguage("en")      // Translation target language
        // .audioMode("urlLink")     // Upload from a remote URL
        .build();

// Assumed: the snippet's `gson` variable is a plain Gson instance
Gson gson = new Gson();

// Step 1: upload a local file (or a remote URL)
LfasrResponse uploadResp = lfasrClient.uploadFile("audio/lfasr.wav");
// LfasrResponse uploadResp = lfasrClient.uploadUrl("https://example.com/audio.wav");

if (!"000000".equals(uploadResp.getCode())) {
    System.err.println("Upload failed: " + uploadResp.getDescInfo());
    return;
}
String orderId = uploadResp.getContent().getOrderId();
System.out.println("Task orderId: " + orderId);

// Step 2: poll for the result (once every 20 seconds)
int status = LfasrOrderStatusEnum.CREATED.getKey();
while (status != LfasrOrderStatusEnum.COMPLETED.getKey()
        && status != LfasrOrderStatusEnum.FAILED.getKey()) {

    LfasrResponse resultResp = lfasrClient.getResult(orderId, "transfer");
    status = resultResp.getContent().getOrderInfo().getStatus();
    System.out.println("Order status: " + LfasrOrderStatusEnum.getEnum(status).getValue());

    if (status == LfasrOrderStatusEnum.COMPLETED.getKey()) {
        // Step 3: parse the transcription result (lattice structure)
        LfasrOrderResult orderResult = gson.fromJson(
                resultResp.getContent().getOrderResult(), LfasrOrderResult.class);
        for (LfasrOrderResult.Lattice lattice : orderResult.getLattice()) {
            System.out.println("Role-" + lattice.getJson1Best().getSt().getRl()
                    + ": " + extractText(lattice));
        }
        break;
    }
    TimeUnit.SECONDS.sleep(20);
}
```

--------------------------------

### Text Recognition OCR (GeneralWords)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Supports printed text recognition and handwritten text recognition. Images are transmitted Base64-encoded. Suited to document scanning, receipt recognition, and similar scenarios.

```APIDOC
## Text Recognition OCR (GeneralWords)

Supports printed text recognition and handwritten text recognition. Images are transmitted Base64-encoded. Suited to document scanning, receipt recognition, and similar scenarios.

```java
import cn.xfyun.api.GeneralWordsClient;
import cn.xfyun.config.OcrWordsEnum;

// OcrWordsEnum.PRINT        printed text recognition
// OcrWordsEnum.HANDWRITING  handwritten text recognition
GeneralWordsClient client = new GeneralWordsClient
        .Builder(appId, apiKey, OcrWordsEnum.PRINT)
        .build();

// Read the image and Base64-encode it
byte[] imageBytes = IoUtil.readBytes(new FileInputStream("image/document.jpg"));
String imageBase64 = Base64.getEncoder().encodeToString(imageBytes);

// Send the recognition request
String result = client.generalWords(imageBase64);
System.out.println("Request URL: " + client.getHostUrl());
System.out.println("Recognition result: " + result);
// Example output: {"code":0,"data":{"result":[{"content":"recognized text content"}]},"message":"success"}
```
```

--------------------------------

### Face Comparison (FaceCompare)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Compares two face images to determine their similarity and whether they belong to the same person. Supports common image formats like JPG/PNG and returns a similarity score.

```APIDOC
## Face Comparison (FaceCompare)

Compares two face images to determine their similarity and whether they belong to the same person. Supports common image formats like JPG/PNG and returns a similarity score.

```java
import cn.xfyun.api.FaceCompareClient;

FaceCompareClient client = new FaceCompareClient
        .Builder(appId, apiKey, apiSecret)
        .build();

// Read two face images and convert them to Base64
byte[] face1Bytes = IoUtil.readBytes(new FileInputStream("image/face1.jpg"));
byte[] face2Bytes = IoUtil.readBytes(new FileInputStream("image/face2.jpg"));
String face1Base64 = Base64.getEncoder().encodeToString(face1Bytes);
String face2Base64 = Base64.getEncoder().encodeToString(face2Bytes);

// Perform face comparison
String result = client.faceCompare(face1Base64, "jpg", face2Base64, "jpg");
System.out.println("Request URL: " + client.getHostUrl());
System.out.println("Comparison Result: " + result);
// Example Output: {"code":0,"data":{"score":0.98},"message":"success"}
// A score closer to 1 indicates higher similarity between the two faces.
```
```

--------------------------------

### Machine Translation (Translate)

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Supports three translation engines: Niutrans, iFlytek's in-house machine translation (ITS), and the enhanced in-house engine (ITS Pro). ITS Pro supports personalized terminology, and its translation result is returned Base64-encoded, so it must be decoded before use.

```APIDOC
## Machine Translation (Translate)

Supports three translation engines: Niutrans, iFlytek's in-house machine translation (ITS), and the enhanced in-house engine (ITS Pro). ITS Pro supports personalized terminology, and its translation result is returned Base64-encoded, so it must be decoded before use.

```java
import cn.xfyun.api.TransClient;
import cn.xfyun.model.translate.TransParam;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

TransClient client = new TransClient.Builder(appId, apiKey, apiSecret).build();

TransParam param = TransParam.builder()
        .text("神舟十二号载人飞船发射任务取得圆满成功")
        .from("cn")    // Source language
        .to("en")      // Target language
        // .resId("your_term_id")   // Personalized terminology ID (only supported by ITS Pro)
        .build();

// Niutrans
String niuResult = client.sendNiuTrans(param);
System.out.println("Niutrans Result: " + niuResult);

// In-house machine translation (ITS)
String itsResult = client.sendIst(param);
System.out.println("ITS Translation Result: " + itsResult);

// Enhanced in-house machine translation (ITS Pro), result needs Base64 decoding
String itsProResult = client.sendIstV2(param);
String textBase64 = JSON.parseObject(itsProResult)
        .getJSONObject("payload")
        .getJSONObject("result")
        .getString("text");
String decoded = new String(Base64.getDecoder().decode(textBase64), StandardCharsets.UTF_8);
System.out.println("ITS Pro Translation Result: " + decoded);
// Example Output: The launch mission of Shenzhou-12 crewed spacecraft was a complete success.
```
```

--------------------------------

### Speech Synthesis (TTS) with WebSocket

Source: https://context7.com/iflytek-op/websdk-java-demo/llms.txt

Converts text to streaming audio over WebSocket. Supports a range of voices and audio formats (PCM/MP3). Make sure the output file path is valid before starting synthesis.

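One way to make sure the output path is valid is to create the parent directory up front and reject a path that points at a directory. The sketch below uses only the JDK. `OutputPathCheck` and `prepareOutputFile` are hypothetical names introduced for illustration, not SDK APIs.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of the SDK): validates an output path before synthesis starts.
class OutputPathCheck {

    // Creates missing parent directories and returns the target as a File.
    // The target file itself is not created.
    static File prepareOutputFile(String path) {
        Path target = Paths.get(path).toAbsolutePath();
        if (Files.isDirectory(target)) {
            throw new IllegalArgumentException("Output path is a directory: " + target);
        }
        Path parent = target.getParent();
        try {
            if (parent != null) {
                Files.createDirectories(parent);   // no-op when the directory already exists
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create " + parent, e);
        }
        return target.toFile();
    }

    public static void main(String[] args) {
        String base = System.getProperty("java.io.tmpdir");
        File out = prepareOutputFile(new File(base, "tts-demo/audio/tts_output.mp3").getPath());
        System.out.println("Parent directory ready: " + out.getParentFile().isDirectory());
    }
}
```

The returned `File` could then be passed to the listener constructor in place of a bare `new File(...)`.
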
```java
import cn.xfyun.api.TtsClient;
import cn.xfyun.model.response.TtsResponse;
import cn.xfyun.service.tts.AbstractTtsWebSocketListener;
import java.io.File;

TtsClient ttsClient = new TtsClient.Builder()
        .signature(appId, apiKey, apiSecret)
        // .vcn("xiaoyan")    // Voice, defaults to xiaoyan (must be added in the console)
        // .rdn("0")          // Digit pronunciation: 0=auto, 1=numeric value, 2=digit string, 3=digit string preferred
        .build();

File outputFile = new File("audio/tts_output.mp3");

// The text to synthesize (Chinese, matching the xiaoyan voice)
ttsClient.send("今天天气真不错,我想出去走走。", new AbstractTtsWebSocketListener(outputFile) {
    @Override
    public void onSuccess(byte[] bytes) {
        // Audio chunk callback (the data has already been written to outputFile)
        System.out.println("Received audio chunk, length: " + bytes.length);
    }

    @Override
    public void onFail(WebSocket webSocket, Throwable throwable, Response response) {
        System.err.println("Synthesis failed: " + throwable.getMessage());
    }

    @Override
    public void onBusinessFail(WebSocket webSocket, TtsResponse ttsResponse) {
        System.err.println("Business error: " + ttsResponse.toString());
        // Error code lookup: https://www.xfyun.cn/document/error-code
    }
});
// After synthesis completes, outputFile contains the complete MP3 audio file
```
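
If PCM is requested instead of MP3, the bytes written to the output file are raw samples with no header, which many players cannot open directly. The sketch below prepends a standard 44-byte WAV header. It assumes 16 kHz, 16-bit mono PCM, which should be checked against the configured synthesis format. `WavWrapper` is a hypothetical helper, not part of the SDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the SDK): wraps raw PCM samples in a WAV container.
class WavWrapper {

    // Builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM.
    static byte[] header(int pcmLength, int sampleRateHz, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8;   // bytes per sample frame
        int byteRate = sampleRateHz * blockAlign;        // bytes per second
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + pcmLength);                      // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                                  // fmt chunk size for PCM
        buf.putShort((short) 1);                         // audio format 1 = PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRateHz);
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(pcmLength);
        return buf.array();
    }

    // Returns header + samples as one byte array, ready to be written to a .wav file.
    static byte[] wrap(byte[] pcm, int sampleRateHz, int channels, int bitsPerSample) {
        byte[] head = header(pcm.length, sampleRateHz, channels, bitsPerSample);
        byte[] wav = new byte[head.length + pcm.length];
        System.arraycopy(head, 0, wav, 0, head.length);
        System.arraycopy(pcm, 0, wav, head.length, pcm.length);
        return wav;
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[32000];                    // one second of 16 kHz 16-bit mono silence
        byte[] wav = wrap(pcm, 16000, 1, 16);            // assumed format, see the note above
        System.out.println("WAV size: " + wav.length);   // prints "WAV size: 32044"
    }
}
```

The wrapped bytes can be written with `Files.write(Paths.get("audio/tts_output.wav"), wav)` once synthesis has finished and the full PCM data is available.
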