The product demo is already done. The next step is to use an online speech API to convert text to speech, and then integrate that into the demo — so the task is not just calling a speech API, but wiring it into the product. Start with what currently looks like the best option:
Baidu's speech synthesis (TTS) API
Just discovered: it can also generate a temporary audio file and output it as a URL.
One drawback, though: it seems both a title and content are required before it will generate anything.
So first, register a Baidu developer account, then read the official docs in detail:
"Browser cross-origin (CORS)
The synthesis endpoint currently supports cross-origin requests from browsers.
CORS demo: https://github.com/Baidu-AIP/SPEECH-TTS-CORS
The token endpoint does NOT support browser CORS, so you need to obtain the token from your server side, or update it manually every 30 days."
So, get a token by opening this URL in a browser:
https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=SNjsggdYDNWtnlbKhxsPLcaz&client_secret=47d7c02dxxxxxxxxxxxxxxe7ba
which returns the token:

{
  "access_token": "24.569b3b5b470938a522ce60d2e2ea2506.2592000.1528015602.282335-11192483",
  "session_key": "9mzdDoR4p/oexxx0Yp9VoSgFCFOSGEIA==",
  "scope": "public audio_voice_assistant_get audio_tts_post wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test权限 vis-classify_flower lpq_开放 cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base",
  "refresh_token": "25.5axxxx5-xxx3",
  "session_secret": "12xxxa",
  "expires_in": 2592000
}
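Since the token endpoint does not support browser CORS, in practice this fetch has to happen server-side. A minimal sketch of parsing the token response and tracking when it expires; the function name and the redacted sample values are my own, not from Baidu's SDK:

```python
import json
import time

def parse_token_response(resp_text):
    """Parse the JSON from Baidu's oauth/2.0/token endpoint and compute
    an absolute expiry timestamp from expires_in (seconds)."""
    resp = json.loads(resp_text)
    return {
        "access_token": resp["access_token"],
        "refresh_token": resp["refresh_token"],
        # expires_in of 2592000 seconds is exactly 30 days
        "expires_at": time.time() + resp["expires_in"],
    }

# sample shaped like the real response (values redacted)
sample = '{"access_token": "24.xxx", "refresh_token": "25.xxx", "expires_in": 2592000}'
token = parse_token_response(sample)
```

Storing `expires_at` lets the server refresh proactively instead of waiting for a 502 from the synthesis call.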
Then call the synthesis URL:
http://tsn.baidu.com/text2audio?lan=zh&ctp=1&cuid=xxx_robot&tok=24.56xxx3&vol=9&per=0&spd=5&pit=5&tex=as+a+book-collector%2c+i+have+the+story+you+just+want+to+listen!
which returns the synthesized mp3. Clearly, the response body here is the mp3 content itself,
not the hoped-for temporary mp3 URL.
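As the later code confirms, the text2audio endpoint signals success vs. failure via the response's Content-Type header: audio bytes on success, a JSON error document on failure. A sketch of how a wrapper might branch on that header (the helper name and return shape are my own):

```python
import json

def classify_tts_response(content_type, body):
    """Classify a text2audio response: Baidu returns audio/mp3 bytes on
    success and an application/json error document on failure.
    Returns (is_ok, mp3_bytes, err_dict)."""
    ct = content_type.lower()
    if ct.startswith("audio/"):
        return True, body, None
    if ct.startswith("application/json"):
        return False, None, json.loads(body)
    return False, None, {"err_msg": "unexpected content-type: %s" % content_type}

ok, mp3, err = classify_tts_response("audio/mp3", b"fake-mp3-bytes")
```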
Whereas the earlier CORS demo could return a temporary mp3 URL.
Searching for the host seen in those URLs:
boscdn bpc baidu.com
it looks like one of Baidu's CDN servers, and there is also a JS API for uploading content and generating a temporary URL — but that seems to be for Baidu's internal use only?
Next steps:
1. Ideally make the Baidu token permanent, or at least valid for a year, rather than the current one month.
2. Ideally expose the generated mp3 as a URL that can be returned to the user.
It feels like the way to do this is to wrap Baidu's API inside my own Flask REST API: offer the outside world one unified endpoint that returns a time-limited mp3 URL -> internally, save the mp3 under /tmp, or put it in redis with an expire time set?
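The /tmp idea can be sketched as follows; the folder name and helper are hypothetical, and the actual cleanup would be done later by a delayed task (as the final code does with celery) or a cron job:

```python
import os
import uuid
import tempfile

# hypothetical folder under the system temp dir for synthesized audio
AUDIO_TMP_DIR = os.path.join(tempfile.gettempdir(), "tts_audio")

def save_temp_mp3(mp3_bytes):
    """Save synthesized mp3 bytes under a random, unguessable name and
    return (filename, full_path); a periodic job would delete old files."""
    os.makedirs(AUDIO_TMP_DIR, exist_ok=True)
    filename = "%s.mp3" % uuid.uuid4()
    full_path = os.path.join(AUDIO_TMP_DIR, filename)
    with open(full_path, "wb") as f:
        f.write(mp3_bytes)
    return filename, full_path

name, path = save_temp_mp3(b"fake mp3 data")
```

The returned filename can then be turned into a URL served by the Flask app itself.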
First, work on the permanent-token question:
As for documentation, I found the relevant pages, including the Python docs.
Next, I wanted to simulate what the Baidu API returns when an access_token is invalid. It occurred to me that I could call it with the previous access_token — the one that should have been invalidated by the refresh_token refresh — and see what comes back.
It turned out the old token still worked.
Fine, then: just mangle the token value to simulate an invalid one.
The response:

{
  "err_detail": "Access token invalid or no longer valid",
  "err_msg": "authentication failed.",
  "err_no": 502,
  "err_subcode": 50004,
  "tts_logid": 1007366076
}
Error code explanations:
Error code | Meaning |
500 | input not supported |
501 | invalid input parameter |
502 | token verification failed |
503 | synthesis backend error |
Searching for Baidu's err_subcode 50004 ("authentication failed") suggests it means "Passport Not Login", i.e. not logged in to a Baidu passport account (listed alongside code 400).
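For later use in the wrapper, the documented top-level codes can go into a small lookup helper (the names here are mine, not from any Baidu SDK):

```python
# Top-level err_no values from http://ai.baidu.com/docs#/TTS-API/top
BAIDU_TTS_ERRORS = {
    500: "input not supported",
    501: "invalid input parameter",
    502: "token verification failed",
    503: "synthesis backend error",
}

def describe_tts_error(err_no):
    """Map a Baidu TTS err_no to a human-readable message."""
    return BAIDU_TTS_ERRORS.get(err_no, "unknown error %d" % err_no)
```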
So, the tentative approach is:
- Wrap Baidu's speech synthesis API with Flask, internally using the Python SDK (installed via pip).
- If the call returns a dict whose err_no is 502, the token is invalid or expired;
  use the refresh_token to obtain a fresh, valid token,
  then retry the call once.
- On success, the mp3 data is returned;
  then work out where to put it and how to generate an externally accessible URL.
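The refresh-on-502 flow above can be sketched with injected callables, so the control flow is testable without hitting the real API; every name here is hypothetical:

```python
BAIDU_ERR_TOKEN_INVALID = 502

def synthesize_with_retry(text, synthesize, refresh_token):
    """Call synthesize(text) -> (is_ok, mp3_bytes, err_no); on a 502
    token error, refresh the token once and retry. The two callables
    are injected so the flow can be exercised with fakes."""
    is_ok, mp3, err_no = synthesize(text)
    if not is_ok and err_no == BAIDU_ERR_TOKEN_INVALID:
        refresh_token()
        is_ok, mp3, err_no = synthesize(text)
    return is_ok, mp3, err_no

# usage sketch with stand-in callables (the real ones would hit Baidu):
state = {"calls": 0, "refreshed": False}

def fake_synthesize(text):
    state["calls"] += 1
    if not state["refreshed"]:
        return False, None, BAIDU_ERR_TOKEN_INVALID  # first call: expired token
    return True, b"mp3-bytes", 0

def fake_refresh():
    state["refreshed"] = True

is_ok, mp3, err_no = synthesize_with_retry("hello", fake_synthesize, fake_refresh)
```

Retrying exactly once avoids looping forever if the refreshed token is also rejected.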
For reference, the CORS demo's code does this:

// for parameter meanings see https://ai.baidu.com/docs#/TTS-API/41ac79a6
audio = btts({
    ...
    onSuccess: function(htmlAudioElement) {
        audio = htmlAudioElement;
        playBtn.innerText = '播放';
    },

So btts directly returns audio as an HTML element? Looking further:

document.body.append(audio);
audio.setAttribute('src', URL.createObjectURL(xhr.response));

It seems this creates something like a local file?
Searching for URL.createObjectURL turned up:
"A File object is a file — for example, each file uploaded through an input type="file" tag is a File object.
A Blob object is binary data — for example, an object created via new Blob() is a Blob; likewise, in XMLHttpRequest, if responseType is set to blob, the response is also a Blob."
So here the response is the mp3's binary data, as a blob; it is passed to createObjectURL, which generates a temporary object URL that can be used for playback.
-> So the API I wrap later could support two modes:
- return the mp3's URL directly
- return the mp3's binary data
with the return type chosen via an input parameter.
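A sketch of what that mode switch might look like on the server side; the helper name, the field names, and the base64 inlining for the "data" mode (a JSON-friendly stand-in for returning raw bytes) are all my own assumptions:

```python
import base64

def build_tts_response(mp3_bytes, audio_url, mode="url"):
    """Shape the API response depending on the requested mode:
    'url' returns a temporary file url; 'data' inlines the mp3 as
    base64 so it can travel inside a JSON body."""
    if mode == "url":
        return {"audioUrl": audio_url}
    elif mode == "data":
        return {"audioData": base64.b64encode(mp3_bytes).decode("ascii")}
    raise ValueError("unknown mode: %s" % mode)
```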
Then on to implementation: in the front-end web page, update the logic that previously only displayed the response text so that it also parses the returned mp3 (temporary file) URL and plays it through an audio player:
var curResponseDict = respJsonObj["data"]["response"];
console.log("curResponseDict=%s", curResponseDict);
var curResponseText = curResponseDict["text"];
console.log("curResponseText=%s", curResponseText);
$('#response_text p').text(curResponseText);
var curResponseAudioUrl = curResponseDict["audioUrl"];
console.log("curResponseAudioUrl=%s", curResponseAudioUrl);
if (curResponseAudioUrl) {
    console.log("now play the response text's audio %s", curResponseAudioUrl);
    var respTextAudioObj = $(".response_text_audio_player audio")[0];
    console.log("respTextAudioObj=%o", respTextAudioObj);
    $(".response_text_audio_player .col-sm-offset-1").text(curResponseText);
    $(".response_text_audio_player audio source").attr("src", curResponseAudioUrl);
    respTextAudioObj.load();
    console.log("has load respTextAudioObj=%o", respTextAudioObj);
    respTextAudioPromise = respTextAudioObj.play();
    // console.log("respTextAudioPromise=%o", respTextAudioPromise);
    if (respTextAudioPromise !== undefined) {
        respTextAudioPromise.then(() => {
            // Auto-play started
            console.log("Auto play audio started, respTextAudioPromise=%o", respTextAudioPromise);
        }).catch(error => {
            // Auto-play was prevented
            // Show a UI element to let the user manually start playback
            console.error("play response text's audio promise error=%o", error);
            // NotAllowedError: The request is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.
        });
    }
}
With that, the returned text's audio can be played.
Then wait about one second before playing the requested (on-demand) file — so that ordering had to be solved first.
I also deliberately went on to improve error handling, displaying error details when something goes wrong; along the way:
[Solved] How to do string concatenation or formatting in js
and:
[Solved] How to get the detailed error info from the error callback of jQuery's ajax GET
[Summary]
In the end, the desired effect was achieved.
Backend:
Flask REST API, plus initialization and calls of the Baidu API:
app.py
from flask import Flask
from flask import jsonify
from flask_restful import Resource, Api, reqparse
import logging
from logging.handlers import RotatingFileHandler
from bson.objectid import ObjectId
from flask import send_file
import os
import io
import re
from urllib.parse import quote
import json
import uuid
from flask_cors import CORS
import requests
from celery import Celery

################################################################################
# Global Definitions
################################################################################

"""
http://ai.baidu.com/docs#/TTS-API/top
500 input not supported
501 invalid input parameter
502 token verification failed
503 synthesis backend error
"""
BAIDU_ERR_NOT_SUPPORT_PARAM = 500
BAIDU_ERR_PARAM_INVALID = 501
BAIDU_ERR_TOKEN_INVALID = 502
BAIDU_ERR_BACKEND_SYNTHESIS_FAILED = 503

################################################################################
# Global Variables
################################################################################

log = None
app = None

"""
{
    "access_token": "24.569bcccccccc11192484",
    "session_key": "9mxxxxxxEIB==",
    "scope": "public audio_voice_assistant_get audio_tts_post wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test权限 vis-classify_flower lpq_开放 cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base",
    "refresh_token": "25.6acfxxxx2483",
    "session_secret": "121xxxxxfa",
    "expires_in": 2592000
}
"""
gCurBaiduRespDict = {} # get baidu token resp dict
gTempAudioFolder = ""

################################################################################
# Global Function
################################################################################

def generateUUID(prefix = ""):
    generatedUuid4 = uuid.uuid4()
    generatedUuid4Str = str(generatedUuid4)
    newUuid = prefix + generatedUuid4Str
    return newUuid

#----------------------------------------
# Audio Synthesis / TTS
#----------------------------------------

def createAudioTempFolder():
    """create folder to save later temp audio files"""
    global log, gTempAudioFolder
    # init audio temp folder for later store temp audio file
    audioTmpFolder = app.config["AUDIO_TEMP_FOLDER"]
    log.info("audioTmpFolder=%s", audioTmpFolder)
    curFolderAbsPath = os.getcwd() # '/Users/crifan/dev/dev_root/company/xxx/projects/robotDemo/server'
    log.info("curFolderAbsPath=%s", curFolderAbsPath)
    audioTmpFolderFullPath = os.path.join(curFolderAbsPath, audioTmpFolder)
    log.info("audioTmpFolderFullPath=%s", audioTmpFolderFullPath)
    if not os.path.exists(audioTmpFolderFullPath):
        os.makedirs(audioTmpFolderFullPath)
        log.info("++++++ Created tmp audio folder: %s", audioTmpFolderFullPath)
    gTempAudioFolder = audioTmpFolderFullPath
    log.info("gTempAudioFolder=%s", gTempAudioFolder)

def initAudioSynthesis():
    """
    init audio synthesis related: init token
    :return:
    """
    getBaiduToken()
    createAudioTempFolder()

def getBaiduToken():
    """get baidu token"""
    global app, log, gCurBaiduRespDict
    getBaiduTokenUrlTemplate = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"
    getBaiduTokenUrl = getBaiduTokenUrlTemplate % (app.config["BAIDU_API_KEY"], app.config["BAIDU_SECRET_KEY"])
    log.info("getBaiduTokenUrl=%s", getBaiduTokenUrl)
    # https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=xxxz&client_secret=xxxx
    resp = requests.get(getBaiduTokenUrl)
    log.info("resp=%s", resp)
    respJson = resp.json()
    log.info("respJson=%s", respJson)
    #{'access_token': '24.xxx.2592000.1528609320.282335-11192484', 'session_key': 'xx+I/xx+6KwgZmw==', 'scope': 'public audio_voice_assistant_get audio_tts_post wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test权限 vis-classify_flower lpq_开放 cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base', 'refresh_token': '25.xxx', 'session_secret': 'cxxx6e', 'expires_in': 2592000}
    if resp.status_code == 200:
        gCurBaiduRespDict = respJson
        log.info("get baidu token resp: %s", gCurBaiduRespDict)
    else:
        log.error("error while get baidu token: %s", respJson)
        #{'error': 'invalid_client', 'error_description': 'Client authentication failed'}
        #{'error': 'invalid_client', 'error_description': 'unknown client id'}
        #{'error': 'unsupported_grant_type', 'error_description': 'The authorization grant type is not supported'}

def refreshBaiduToken():
    """refresh baidu token when current token invalid"""
    global app, log, gCurBaiduRespDict
    if gCurBaiduRespDict:
        refreshBaiduTokenUrlTemplate = "https://openapi.baidu.com/oauth/2.0/token?grant_type=refresh_token&refresh_token=%s&client_id=%s&client_secret=%s"
        refreshBaiduTokenUrl = refreshBaiduTokenUrlTemplate % (gCurBaiduRespDict["refresh_token"], app.config["BAIDU_API_KEY"], app.config["BAIDU_SECRET_KEY"])
        log.info("refreshBaiduTokenUrl=%s", refreshBaiduTokenUrl)
        # https://openapi.baidu.com/oauth/2.0/token?grant_type=refresh_token&refresh_token=25.1xxxx.xx.1841379583.282335-11192483&client_id=Sxxxxz&client_secret=47dxxxxa
        resp = requests.get(refreshBaiduTokenUrl)
        log.info("resp=%s", resp)
        respJson = resp.json()
        log.info("respJson=%s", respJson)
        if resp.status_code == 200:
            gCurBaiduRespDict = respJson
            log.info("Ok to refresh baidu token response: %s", gCurBaiduRespDict)
        else:
            log.error("error while refresh baidu token: %s", respJson)
    else:
        log.error("Can't refresh baidu token for previous not get token")

def baiduText2Audio(unicodeText):
    """call baidu text2audio to generate mp3 audio from text"""
    global app, log, gCurBaiduRespDict
    log.info("baiduText2Audio: unicodeText=%s", unicodeText)
    isOk = False
    mp3BinData = None
    errNo = 0
    errMsg = "Unknown error"
    if not gCurBaiduRespDict:
        errMsg = "Need get baidu token before call text2audio"
        return isOk, mp3BinData, errNo, errMsg
    utf8Text = unicodeText.encode("utf-8")
    log.info("utf8Text=%s", utf8Text)
    encodedUtf8Text = quote(unicodeText)
    log.info("encodedUtf8Text=%s", encodedUtf8Text)
    # http://ai.baidu.com/docs#/TTS-API/top
    tex = encodedUtf8Text # text to synthesize, UTF-8 encoded; fewer than 512 Chinese characters or English digits (converted to GBK on Baidu's server, must be less than 1024 bytes)
    tok = gCurBaiduRespDict["access_token"] # developer access_token from the open platform (see the "authentication" section of the docs)
    cuid = app.config["FLASK_APP_NAME"] # unique user id for UV statistics; a MAC address or IMEI is suggested, up to 60 characters
    ctp = 1 # client type, fixed value 1 for web
    lan = "zh" # language, fixed value zh (currently only mixed Chinese/English mode)
    spd = 5 # speed, 0-9, default 5 (medium)
    pit = 5 # pitch, 0-9, default 5 (medium)
    # vol = 5 # volume, 0-9, default 5 (medium)
    vol = 9
    per = 0 # voice: 0 normal female, 1 normal male, 3 emotional (Du Xiaoyao), 4 emotional (Du Yaya); default female
    getBaiduSynthesizedAudioTemplate = "http://tsn.baidu.com/text2audio?lan=%s&ctp=%s&cuid=%s&tok=%s&vol=%s&per=%s&spd=%s&pit=%s&tex=%s"
    getBaiduSynthesizedAudioUrl = getBaiduSynthesizedAudioTemplate % (lan, ctp, cuid, tok, vol, per, spd, pit, tex)
    log.info("getBaiduSynthesizedAudioUrl=%s", getBaiduSynthesizedAudioUrl)
    # http://tsn.baidu.com/text2audio?lan=zh&ctp=1&cuid=RobotQA&tok=24.5f056b15e9d5da63256bac89f64f61b5.2592000.1528609737.282335-11192483&vol=5&per=0&spd=5&pit=5&tex=as%20a%20book-collector%2C%20i%20have%20the%20story%20you%20just%20want%20to%20listen%21
    resp = requests.get(getBaiduSynthesizedAudioUrl)
    log.info("resp=%s", resp)
    respContentType = resp.headers["Content-Type"]
    respContentTypeLowercase = respContentType.lower() # 'audio/mp3'
    log.info("respContentTypeLowercase=%s", respContentTypeLowercase)
    if respContentTypeLowercase == "audio/mp3":
        mp3BinData = resp.content
        log.info("resp content is binary data of mp3, length=%d", len(mp3BinData))
        isOk = True
        errMsg = ""
    elif respContentTypeLowercase == "application/json":
        """
        {'err_detail': 'Invalid params per or lan!', 'err_msg': 'parameter error.', 'err_no': 501, 'err_subcode': 50000, 'tts_logid': 642798357}
        {'err_detail': 'Invalid params per&pdt!', 'err_msg': 'parameter error.', 'err_no': 501, 'err_subcode': 50000, 'tts_logid': 1675521246}
        {'err_detail': 'Access token invalid or no longer valid', 'err_msg': 'authentication failed.', 'err_no': 502, 'err_subcode': 50004, 'tts_logid': 4221215043}
        """
        log.info("resp content is json -> occur error")
        isOk = False
        respDict = resp.json()
        log.info("respDict=%s", respDict)
        errNo = respDict["err_no"]
        errMsg = respDict["err_msg"] + " " + respDict["err_detail"]
    else:
        isOk = False
        errMsg = "Unexpected response content-type: %s" % respContentTypeLowercase
    return isOk, mp3BinData, errNo, errMsg

def doAudioSynthesis(unicodeText):
    """
    do audio synthesis from unicode text
    if failed for token invalid/expired, will refresh token to do one more retry
    """
    global app, log, gCurBaiduRespDict
    isOk = False
    audioBinData = None
    errMsg = ""
    # # for debug
    # gCurBaiduRespDict["access_token"] = "99.569b3b5b470938a522ce60d2e2ea2506.2592000.1528015602.282335-11192483"
    log.info("doAudioSynthesis: unicodeText=%s", unicodeText)
    isOk, audioBinData, errNo, errMsg = baiduText2Audio(unicodeText)
    log.info("isOk=%s, errNo=%d, errMsg=%s", isOk, errNo, errMsg)
    if isOk:
        errMsg = ""
        log.info("got synthesized audio binary data length=%d", len(audioBinData))
    else:
        if errNo == BAIDU_ERR_TOKEN_INVALID:
            log.warning("Token invalid -> refresh token")
            refreshBaiduToken()
            isOk, audioBinData, errNo, errMsg = baiduText2Audio(unicodeText)
            log.info("after refresh token: isOk=%s, errNo=%s, errMsg=%s", isOk, errNo, errMsg)
        else:
            log.warning("try synthesized audio occur error: errNo=%d, errMsg=%s", errNo, errMsg)
            audioBinData = None
    log.info("return isOk=%s, errMsg=%s", isOk, errMsg)
    if audioBinData:
        log.info("audio binary bytes=%d", len(audioBinData))
    return isOk, audioBinData, errMsg

def testAudioSynthesis():
    global app, log, gTempAudioFolder
    testInputUnicodeText = u"as a book-collector, i have the story you just want to listen!"
    isOk, audioBinData, errMsg = doAudioSynthesis(testInputUnicodeText)
    if isOk:
        audioBinDataLen = len(audioBinData)
        log.info("Now will save audio binary data %d bytes to file", audioBinDataLen)
        # 1. save mp3 binary data into tmp file
        newUuid = generateUUID()
        log.info("newUuid=%s", newUuid)
        tempFilename = newUuid + ".mp3"
        log.info("tempFilename=%s", tempFilename)
        if not gTempAudioFolder:
            createAudioTempFolder()
        tempAudioFullname = os.path.join(gTempAudioFolder, tempFilename) # '/Users/crifan/dev/dev_root/company/xxx/projects/robotDemo/server/tmp/audio/2aba73d1-f8d0-4302-9dd3-d1dbfad44458.mp3'
        log.info("tempAudioFullname=%s", tempAudioFullname)
        with open(tempAudioFullname, 'wb') as tmpAudioFp:
            log.info("tmpAudioFp=%s", tmpAudioFp)
            tmpAudioFp.write(audioBinData)
        log.info("Done to write audio data into file of %d bytes", audioBinDataLen)
        # 2. use celery to delay delete tmp file
    else:
        log.warning("Fail to get synthesis audio for errMsg=%s", errMsg)

#----------------------------------------
# Flask API
#----------------------------------------

def sendFile(fileBytes, contentType, outputFilename):
    """Flask API use this to send out file (to browser, browser can directly download file)"""
    return send_file(
        io.BytesIO(fileBytes),
        # io.BytesIO(fileObj.read()),
        mimetype=contentType,
        as_attachment=True,
        attachment_filename=outputFilename
    )

################################################################################
# Global Init App
################################################################################

app = Flask(__name__)
CORS(app)
# app.config.from_object('config.DevelopmentConfig')
app.config.from_object('config.ProductionConfig')

logFormatterStr = app.config["LOG_FORMAT"]
logFormatter = logging.Formatter(logFormatterStr)
fileHandler = RotatingFileHandler(
    app.config['LOG_FILE_FILENAME'],
    maxBytes=app.config["LOF_FILE_MAX_BYTES"],
    backupCount=app.config["LOF_FILE_BACKUP_COUNT"],
    encoding="UTF-8")
fileHandler.setLevel(logging.DEBUG)
fileHandler.setFormatter(logFormatter)
app.logger.addHandler(fileHandler)
app.logger.setLevel(logging.DEBUG) # set root log level
log = app.logger
log.info("app=%s", app)
# log.debug("app.config=%s", app.config)

api = Api(app)
log.info("api=%s", api)

celeryApp = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celeryApp.conf.update(app.config)
log.info("celeryApp=%s", celeryApp)

aiContext = Context()
log.info("aiContext=%s", aiContext)

initAudioSynthesis()
# testAudioSynthesis()

...

#----------------------------------------
# Celery tasks
#----------------------------------------

# @celeryApp.task()
@celeryApp.task
# @celeryApp.task(name=app.config["CELERY_TASK_NAME"] + ".deleteTmpAudioFile")
def deleteTmpAudioFile(filename):
    """
    delete tmp audio file from filename
    eg: 98fc7c46-7aa0-4dd7-aa9d-89fdf516abd6.mp3
    """
    global log
    log.info("deleteTmpAudioFile: filename=%s", filename)
    audioTmpFolder = app.config["AUDIO_TEMP_FOLDER"]
    # audioTmpFolder = "tmp/audio"
    log.info("audioTmpFolder=%s", audioTmpFolder)
    curFolderAbsPath = os.getcwd() # '/Users/crifan/dev/dev_root/company/xxx/projects/robotDemo/server'
    log.info("curFolderAbsPath=%s", curFolderAbsPath)
    audioTmpFolderFullPath = os.path.join(curFolderAbsPath, audioTmpFolder)
    log.info("audioTmpFolderFullPath=%s", audioTmpFolderFullPath)
    tempAudioFullname = os.path.join(audioTmpFolderFullPath, filename) # '/Users/crifan/dev/dev_root/company/xxx/projects/robotDemo/server/tmp/audio/2aba73d1-f8d0-4302-9dd3-d1dbfad44458.mp3'
    if os.path.isfile(tempAudioFullname):
        os.remove(tempAudioFullname)
        log.info("Ok to delete file %s", tempAudioFullname)
    else:
        log.warning("No need to remove for not exist file %s", tempAudioFullname)

# log.info("deleteTmpAudioFile=%s", deleteTmpAudioFile)
# log.info("deleteTmpAudioFile.name=%s", deleteTmpAudioFile.name)
# log.info("celeryApp.tasks=%s", celeryApp.tasks)

#----------------------------------------
# Rest API
#----------------------------------------

class RobotQaAPI(Resource):
    def processResponse(self, respDict):
        """
        process response dict before return
        generate audio for response text part
        """
        global log, gTempAudioFolder
        tmpAudioUrl = ""
        unicodeText = respDict["data"]["response"]["text"]
        log.info("unicodeText=%s", unicodeText)
        if not unicodeText:
            log.info("No response text to do audio synthesis")
            return jsonify(respDict)
        isOk, audioBinData, errMsg = doAudioSynthesis(unicodeText)
        if isOk:
            audioBinDataLen = len(audioBinData)
            log.info("audioBinDataLen=%s", audioBinDataLen)
            # 1. save mp3 binary data into tmp file
            newUuid = generateUUID()
            log.info("newUuid=%s", newUuid)
            tempFilename = newUuid + ".mp3"
            log.info("tempFilename=%s", tempFilename)
            if not gTempAudioFolder:
                createAudioTempFolder()
            tempAudioFullname = os.path.join(gTempAudioFolder, tempFilename)
            log.info("tempAudioFullname=%s", tempAudioFullname) # 'xxx/tmp/audio/2aba73d1-f8d0-4302-9dd3-d1dbfad44458.mp3'
            with open(tempAudioFullname, 'wb') as tmpAudioFp:
                log.info("tmpAudioFp=%s", tmpAudioFp)
                tmpAudioFp.write(audioBinData)
            log.info("Saved %d bytes data into temp audio file %s", audioBinDataLen, tempAudioFullname)
            # 2. use celery to delay delete tmp file
            delayTimeToDelete = app.config["CELERY_DELETE_TMP_AUDIO_FILE_DELAY"]
            deleteTmpAudioFile.apply_async([tempFilename], countdown=delayTimeToDelete)
            log.info("Delay %s seconds to delete %s", delayTimeToDelete, tempFilename)
            # generate temp audio file url: /tmp/audio
            tmpAudioUrl = "http://%s:%d/tmp/audio/%s" % (
                app.config["FILE_URL_HOST"],
                app.config["FLASK_PORT"],
                tempFilename)
            log.info("tmpAudioUrl=%s", tmpAudioUrl)
            respDict["data"]["response"]["audioUrl"] = tmpAudioUrl
        else:
            log.warning("Fail to get synthesis audio for errMsg=%s", errMsg)
        log.info("respDict=%s", respDict)
        return jsonify(respDict)

    def get(self):
        respDict = {
            "code": 200,
            "message": "generate response ok",
            "data": {
                "input": "",
                "response": {
                    "text": "",
                    "audioUrl": ""
                },
                "control": "",
                "audio": {}
            }
        }
        parser = reqparse.RequestParser()
        # i want to hear the story of Baby Sister Says No
        parser.add_argument('input', type=str, help="input words")
        log.info("parser=%s", parser)
        parsedArgs = parser.parse_args()
        # log.info("parsedArgs=%s", parsedArgs)
        if not parsedArgs:
            respDict["data"]["response"]["text"] = "Can not recognize input"
            return self.processResponse(respDict)
        inputStr = parsedArgs["input"]
        log.info("inputStr=%s", inputStr)
        if not inputStr:
            respDict["data"]["response"]["text"] = "Can not recognize parameter input"
            return self.processResponse(respDict)
        respDict["data"]["input"] = inputStr
        aiResult = QueryAnalyse(inputStr, aiContext)
        log.info("aiResult=%s", aiResult)
        if aiResult["response"]:
            respDict["data"]["response"]["text"] = aiResult["response"]
        if aiResult["control"]:
            respDict["data"]["control"] = aiResult["control"]
        log.info('respDict["data"]=%s', respDict["data"])
        audioFileIdStr = aiResult["mediaId"]
        log.info("audioFileIdStr=%s", audioFileIdStr)
        if audioFileIdStr:
            audioFileObjectId = ObjectId(audioFileIdStr)
            log.info("audioFileObjectId=%s", audioFileObjectId)
            if fsCollection.exists(audioFileObjectId):
                audioFileObj = fsCollection.get(audioFileObjectId)
                log.info("audioFileObj=%s", audioFileObj)
                encodedFilename = quote(audioFileObj.filename)
                log.info("encodedFilename=%s", encodedFilename)
                respDict["data"]["audio"] = {
                    "contentType": audioFileObj.contentType,
                    "name": audioFileObj.filename,
                    "size": audioFileObj.length,
                    "url": "http://%s:%d/files/%s/%s" % (
                        app.config["FILE_URL_HOST"],
                        app.config["FLASK_PORT"],
                        audioFileObj._id,
                        encodedFilename)
                }
                log.info("respDict=%s", respDict)
                return self.processResponse(respDict)
            else:
                log.info("Can not find file from id %s", audioFileIdStr)
                respDict["data"]["audio"] = {}
                return self.processResponse(respDict)
        else:
            log.info("Not response file id")
            respDict["data"]["audio"] = {}
            return self.processResponse(respDict)

class GridfsAPI(Resource):
    def get(self, fileId, fileName=None):
        log.info("fileId=%s, fileName=%s", fileId, fileName)
        fileIdObj = ObjectId(fileId)
        log.info("fileIdObj=%s", fileIdObj)
        if not fsCollection.exists({"_id": fileIdObj}):
            respDict = {
                "code": 404,
                "message": "Can not find file from object id %s" % (fileId),
                "data": {}
            }
            return jsonify(respDict)
        fileObj = fsCollection.get(fileIdObj)
        log.info("fileObj=%s, filename=%s, chunkSize=%s, length=%s, contentType=%s",
                 fileObj, fileObj.filename, fileObj.chunk_size, fileObj.length, fileObj.content_type)
        log.info("lengthInMB=%.2f MB", float(fileObj.length / (1024 * 1024)))
        fileBytes = fileObj.read()
        log.info("len(fileBytes)=%s", len(fileBytes))
        outputFilename = fileObj.filename
        if fileName:
            outputFilename = fileName
        log.info("outputFilename=%s", outputFilename)
        return sendFile(fileBytes, fileObj.content_type, outputFilename)

class TmpAudioAPI(Resource):
    def get(self, filename=None):
        global gTempAudioFolder
        log.info("TmpAudioAPI: filename=%s", filename)
        tmpAudioFullPath = os.path.join(gTempAudioFolder, filename)
        log.info("tmpAudioFullPath=%s", tmpAudioFullPath)
        if not os.path.isfile(tmpAudioFullPath):
            log.warning("Not exists file %s", tmpAudioFullPath)
            respDict = {
                "code": 404,
                "message": "Can not find temp audio file %s" % filename,
                "data": {}
            }
            return jsonify(respDict)
        fileSize = os.path.getsize(tmpAudioFullPath)
        log.info("fileSize=%s", fileSize)
        with open(tmpAudioFullPath, "rb") as tmpAudioFp:
            fileBytes = tmpAudioFp.read()
        log.info("read out fileBytes length=%s", len(fileBytes))
        outputFilename = filename
        # contentType = "audio/mp3" # chrome use this
        contentType = "audio/mpeg" # most common and compatible
        return sendFile(fileBytes, contentType, outputFilename)

api.add_resource(PlaySongAPI, '/playsong', endpoint='playsong')
api.add_resource(RobotQaAPI, '/qa', endpoint='qa')
api.add_resource(GridfsAPI, '/files/<fileId>', '/files/<fileId>/<fileName>', endpoint='gridfs')
api.add_resource(TmpAudioAPI, '/tmp/audio/<filename>', endpoint='TmpAudio')

if __name__ == "__main__":
    app.run(
        host=app.config["FLASK_HOST"],
        port=app.config["FLASK_PORT"],
        debug=app.config["DEBUG"]
    )
config.py
class BaseConfig(object):
    DEBUG = False

    FLASK_PORT = 3xxxx
    # FLASK_HOST = "127.0.0.1"
    # FLASK_HOST = "localhost"
    # Note:
    # 1. to allow external access this server
    # 2. make sure here gunicorn parameter "bind" is same with here !!!
    FLASK_HOST = "0.0.0.0"

    # Flask app name
    FLASK_APP_NAME = "RobotQA"

    # Log File
    LOG_FILE_FILENAME = "logs/" + FLASK_APP_NAME + ".log"
    LOG_FORMAT = "[%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(funcName)s] %(message)s"
    LOF_FILE_MAX_BYTES = 2*1024*1024
    LOF_FILE_BACKUP_COUNT = 10

    # host used in returned file urls
    # FILE_URL_HOST = FLASK_HOST
    FILE_URL_HOST = "127.0.0.1"

    # Audio Synthesis / TTS
    # BAIDU_APP_ID = "1xxx3"
    BAIDU_API_KEY = "Sxxxxz"
    BAIDU_SECRET_KEY = "4xxxxxa"
    AUDIO_TEMP_FOLDER = "tmp/audio"

    # CELERY_TASK_NAME = "Celery_" + FLASK_APP_NAME
    # CELERY_BROKER_URL = "redis://localhost"
    CELERY_BROKER_URL = "redis://localhost:6379/0"
    # CELERY_RESULT_BACKEND = "redis://localhost:6379/0" # currently not using results
    CELERY_DELETE_TMP_AUDIO_FILE_DELAY = 60 * 2 # two minutes

class DevelopmentConfig(BaseConfig):
    # DEBUG = True
    # for local dev, need access remote mongodb
    MONGODB_HOST = "47.xx.xx.xx"
    FILE_URL_HOST = "127.0.0.1"

class ProductionConfig(BaseConfig):
    FILE_URL_HOST = "47.xx.xx.xx"
Frontend:
main.html
<!doctype html>
<html lang="en">
<head>
    <!-- Required meta tags -->
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> -->

    <!-- Bootstrap CSS -->
    <link rel="stylesheet" href="css/bootstrap-3.3.1/bootstrap.css">
    <!-- <link rel="stylesheet" href="css/highlightjs_default.css"> -->
    <link rel="stylesheet" href="css/highlight_atom-one-dark.css">
    <!-- <link rel="stylesheet" href="css/highlight_monokai-sublime.css"> -->
    <link rel="stylesheet" href="css/bootstrap3_player.css">
    <link rel="stylesheet" href="css/main.css">

    <title>xxx英语智能机器人演示</title>

    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
    <!--[if lt IE 9]>
        <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
        <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
</head>
<body>
    <div class="logo text-center">
        <img class="mb-4" src="img/logo_transparent_183x160.png" alt="xxx Logo" width="72" height="72">
    </div>
    <h2>xxx英语智能机器人</h2>
    <h4>xxx Bot for Kids</h4>

    <div class="panel panel-primary">
        <div class="panel-heading">
            <h3 class="panel-title">Input</h3>
        </div>
        <div class="panel-body">
            <ul class="list-group">
                <li class="list-group-item">
                    <h3 class="panel-title">Input Example</h3>
                    <ul>
                        <li>i want to hear the story of apple</li>
                        <li>say story apple</li>
                        <li>say apple</li>
                        <li>next episode</li>
                        <li>next</li>
                        <li>i want you stop reading</li>
                        <li>stop reading</li>
                        <li>please go on</li>
                        <li>go on</li>
                    </ul>
                </li>
                <li class="list-group-item">
                    <!--
                    <form>
                        <div class="form-group input_request">
                            <input id="inputRequest" type="text" class="form-control" placeholder="请输入您要说的话" value="i want to hear the story of apple">
                        </div>
                        <div class="form-group">
                            <button id="submitInput" type="submit" class="btn btn-primary btn-lg col-sm-3 btn-block">提交</button>
                            <button id="clearInput" class="btn btn-secondary btn-lg col-sm-3" type="button">清除</button>
                            <button id="clearInput" class="btn btn-info btn-lg col-sm-3 btn-block" type="button">清除</button>
                        </div>
                    </form>
                    -->
                    <div class="row">
                        <div class="col-lg-12">
                            <div class="input-group">
                                <input id="inputRequest" type="text" class="form-control" placeholder="请输入您要说的话" value="say apple">
                                <span class="input-group-btn">
                                    <button id="submitInput" type="submit" class="btn btn-primary">提交</button>
                                </span>
                            </div><!-- /input-group -->
                        </div><!-- /.col-lg-6 -->
                    </div>
                </li>
            </ul>
        </div>
    </div>

    <!--
    <div class="input_example bg-light box-shadow">
        <h5>Input Example:</h5>
        <ul>
            <li>i want to hear the story of apple</li>
            <li>next episode</li>
            <li>i want you stop reading</li>
            <li>please go on</li>
        </ul>
    </div>
    -->

    <div class="panel panel-success">
        <div class="panel-heading">
            <h3 class="panel-title">Output</h3>
        </div>
        <div class="panel-body">
            <div id="response_text" class="alert alert-success" role="alert">
                <p>here will output response text</p>
                <div class="response_text_audio_player">
                    <audio controls data-info-att="response text's audio">
                        <source src="" type="audio/mpeg" />
                    </audio>
                </div>
            </div>
            <div class="audio_player col-md-12 col-xs-12">
                <audio controls data-info-att="">
                    <source src="" type="" />
                </audio>
            </div>
            <!--
            <div id="audio_play_prevented" class="alert alert-warning alert-dismissible col-md-12 col-xs-12">
                <button type="button" class="close" data-dismiss="alert">x</button>
                <strong>Notice:</strong> Auto play prevented, please manually click above play button to play
            </div>
            -->
            <!--
            <div id="response_json" class="bg-light box-shadow">
                <pre><code class="json">here will output response</code></pre>
            </div>
            -->
            <!--
            <pre id="response_json">
                <code class="json">here will output response</code>
            </pre>
            -->
            <div id="response_json">
                <code class="json">here will output response</code>
            </div>
        </div>
    </div>

    <!-- Optional JavaScript -->
    <!-- jQuery first, then Popper.js, then Bootstrap JS -->
    <!-- <script src="js/jquery-3.3.1.js"></script> -->
    <!-- <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> -->
    <script src="js/jquery/1.11.1/jquery-1.11.1.js"></script>
    <!-- <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script> -->
    <!-- <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.0/umd/popper.min.js" integrity="sha384-cs/chFZiN24E4KMATLdqdvsezGxaGsi4hLGOzlXwp5UZB1LY//20VyM2taTB4QvJ" crossorigin="anonymous"></script> -->
    <script src="js/popper-1.14.0/popper.min.js"></script>
    <!-- <script src="js/bootstrap.js"></script> -->
    <script src="js/bootstrap-3.3.1/bootstrap.min.js"></script>
    <script src="js/highlight.js"></script>
    <script src="js/bootstrap3_player.js"></script>
    <script src="js/main.js"></script>
</body>
</html>
main.css
.logo {
    padding: 10px 2%;
}

h2 {
    text-align: center;
    margin-top: 10px;
    margin-bottom: 10px;
}

h4 {
    text-align: center;
    margin-top: 0px;
    margin-bottom: 20px;
}

form {
    text-align: center;
}

.form-group {
    /*padding-left: 1%;*/
    /*padding-right: 1%;*/
}

.input_example {
    /*padding: 1px 1%;*/
}

#response_json {
    /*width: 96%;*/
    height: 380px;
    border-radius: 10px;
    padding-top: 20px;
    /*padding-left: 1%;*/
    /*padding-right: 1%;*/
}

#response_text {
    text-align: center !important;
    font-size: 14px;
    /* padding-left: 4%; padding-right: 4%; */
}

/*pre {*/
    /*padding-left: 2%;*/
    /*padding-right: 2%;*/
/*}*/

.audio_player {
    margin-top: 10px;
    margin-bottom: 5px;
    text-align: center;
    padding-left: 0 !important;
    padding-right: 0 !important;
}

.response_text_audio_player {
    /* visibility: hidden; */
    width: 100%;
    /* height: 1px !important; */
    height: 100px;
}

/* #audio_play_prevented { display: none; } */
main.js
if (!String.format) {
    String.format = function(format) {
        var args = Array.prototype.slice.call(arguments, 1);
        return format.replace(/{(\d+)}/g, function(match, number) {
            return typeof args[number] != 'undefined' ? args[number] : match;
        });
    };
}

$(document).ready(function() {
    $('[data-toggle="tooltip"]').tooltip();

    // after receiving the response json, re-run syntax highlighting on it
    function updateHighlight() {
        console.log("updateHighlight");
        $('pre code').each(function(i, block) {
            hljs.highlightBlock(block);
        });
    }

    updateHighlight();

    $("#submitInput").click(function(event) {
        event.preventDefault();
        ajaxSubmitInput();
    });

    function ajaxSubmitInput() {
        console.log("ajaxSubmitInput");

        var inputRequest = $("#inputRequest").val();
        console.log("inputRequest=%s", inputRequest);
        var encodedInputRequest = encodeURIComponent(inputRequest);
        console.log("encodedInputRequest=%s", encodedInputRequest);

        // var qaUrl = "http://127.0.0.1:32851/qa";
        var qaUrl = "http://xxx:32851/qa";
        console.log("qaUrl=%s", qaUrl);
        var fullQaUrl = qaUrl + "?input=" + encodedInputRequest;
        console.log("fullQaUrl=%s", fullQaUrl);

        $.ajax({
            type: "GET",
            url: fullQaUrl,
            success: function(respJsonObj) {
                console.log("respJsonObj=%o", respJsonObj);
                // var respJsonStr = JSON.stringify(respJsonObj);
                // var beautifiedRespJsonStr = JSON.stringify(respJsonObj, null, '\t');
                var beautifiedRespJsonStr = JSON.stringify(respJsonObj, null, 2);
                console.log("beautifiedRespJsonStr=%s", beautifiedRespJsonStr);

                var prevOutputValue = $('#response_json').text();
                console.log("prevOutputValue=%o", prevOutputValue);
                var afterOutputValue = $('#response_json').html('<pre><code class="json">' + beautifiedRespJsonStr + '</code></pre>');
                console.log("afterOutputValue=%o", afterOutputValue);
                updateHighlight();

                var curResponseDict = respJsonObj["data"]["response"];
                console.log("curResponseDict=%s", curResponseDict);
                var curResponseText = curResponseDict["text"];
                console.log("curResponseText=%s", curResponseText);
                $('#response_text p').text(curResponseText);

                var curResponseAudioUrl = curResponseDict["audioUrl"];
                console.log("curResponseAudioUrl=%s", curResponseAudioUrl);
                if (curResponseAudioUrl) {
                    console.log("now play the response text's audio %s", curResponseAudioUrl);
                    var respTextAudioObj = $(".response_text_audio_player audio")[0];
                    console.log("respTextAudioObj=%o", respTextAudioObj);
                    $(".response_text_audio_player .col-sm-offset-1").text(curResponseText);
                    $(".response_text_audio_player audio source").attr("src", curResponseAudioUrl);
                    respTextAudioObj.load();
                    console.log("has load respTextAudioObj=%o", respTextAudioObj);

                    // after the response text's audio finishes, handle the main audio player
                    respTextAudioObj.onended = function() {
                        console.log("play response text's audio ended");

                        var dataControl = respJsonObj["data"]["control"];
                        console.log("dataControl=%o", dataControl);
                        var audioElt = $(".audio_player audio");
                        console.log("audioElt=%o", audioElt);
                        var audioObject = audioElt[0];
                        console.log("audioObject=%o", audioObject);

                        var playAudioPromise = undefined;
                        if (dataControl === "stop") {
                            // audioObject.stop();
                            audioObject.pause();
                            console.log("has pause audioObject=%o", audioObject);
                        } else if (dataControl === "continue") {
                            playAudioPromise = audioObject.play();
                        }

                        if (respJsonObj["data"]["audio"]) {
                            var audioDict = respJsonObj["data"]["audio"];
                            console.log("audioDict=%o", audioDict);
                            var audioName = audioDict["name"];
                            console.log("audioName=%o", audioName);
                            var audioSize = audioDict["size"];
                            console.log("audioSize=%o", audioSize);
                            var audioType = audioDict["contentType"];
                            console.log("audioType=%o", audioType);
                            var audioUrl = audioDict["url"];
                            console.log("audioUrl=%o", audioUrl);
                            var isAudioEmpty = (!audioName && !audioSize && !audioType && !audioUrl);
                            console.log("isAudioEmpty=%o", isAudioEmpty);
                            if (isAudioEmpty) {
                                // var pauseAudioResult = audioObject.pause();
                                // console.log("pauseAudioResult=%o", pauseAudioResult);
                                audioElt.attr("data-info-att", "");
                                // $(".col-sm-offset-1").text("");
                            } else {
                                if (audioName) {
                                    audioElt.attr("data-info-att", audioName);
                                    $(".audio_player .col-sm-offset-1").text(audioName);
                                }
                                if (audioType) {
                                    $(".audio_player audio source").attr("type", audioType);
                                }
                                if (audioUrl) {
                                    $(".audio_player audio source").attr("src", audioUrl);
                                    audioObject.load();
                                    console.log("has load audioObject=%o", audioObject);
                                }
                                console.log("dataControl=%s,audioUrl=%s", dataControl, audioUrl);
                                if (((dataControl === "") || (dataControl === "next")) && audioUrl) {
                                    playAudioPromise = audioObject.play();
                                }
                            }
                        } else {
                            console.log("empty respJsonObj['data']['audio']=%o", respJsonObj["data"]["audio"]);
                        }

                        if (playAudioPromise !== undefined) {
                            playAudioPromise.then(() => {
                                // Auto-play started
                                console.log("Auto play audio started, playAudioPromise=%o", playAudioPromise);
                                // for debug
                                // showAudioPlayPreventedNotice();
                            }).catch(error => {
                                // Auto-play was prevented: show a UI element so the user can start playback manually
                                showAudioPlayPreventedNotice();
                                console.error("play audio promise error=%o", error);
                                // NotAllowedError: The request is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.
                            });
                        }
                    };

                    var respTextAudioPromise = respTextAudioObj.play();
                    // console.log("respTextAudioPromise=%o", respTextAudioPromise);
                    if (respTextAudioPromise !== undefined) {
                        respTextAudioPromise.then(() => {
                            // Auto-play started
                            console.log("Auto play audio started, respTextAudioPromise=%o", respTextAudioPromise);
                        }).catch(error => {
                            // Auto-play was prevented: the user must start playback manually
                            console.error("play response text's audio promise error=%o", error);
                            // NotAllowedError: The request is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.
                        });
                    }
                }
            },
            error: function(jqXHR, textStatus, errorThrown) {
                console.error("jqXHR=%o, textStatus=%s, errorThrown=%s", jqXHR, textStatus, errorThrown);
                var errDetail = String.format("status={0}\n\tstatusText={1}\n\tresponseText={2}", jqXHR.status, jqXHR.statusText, jqXHR.responseText);
                var errStr = String.format("GET: {0}\nERROR:\t{1}", fullQaUrl, errDetail);
                // $('#response_text p').text(errStr);
                var responseError = $('#response_json').html('<pre><code class="html">' + errStr + '</code></pre>');
                console.log("responseError=%o", responseError);
                updateHighlight();
            }
        });
    }

    function showAudioPlayPreventedNotice() {
        console.log("showAudioPlayPreventedNotice");
        var curAudioPlayPreventedNoticeEltHtml = $("#audio_play_prevented").html();
        console.log("curAudioPlayPreventedNoticeEltHtml=%o", curAudioPlayPreventedNoticeEltHtml);
        if (curAudioPlayPreventedNoticeEltHtml !== undefined) {
            console.log("audio play prevented notice already exists, so not inserting it again");
        } else {
            var audioPlayPreventedNoticeHtml = '<div id="audio_play_prevented" class="alert alert-warning alert-dismissible col-md-12 col-xs-12"><button type="button" class="close" data-dismiss="alert">x</button><strong>Notice:</strong> Auto play was prevented, please manually click the play button above to play</div>';
            console.log("audioPlayPreventedNoticeHtml=%o", audioPlayPreventedNoticeHtml);
            $(".audio_player").append(audioPlayPreventedNoticeHtml);
        }
    }

    $("#clearInput").click(function(event) {
        // event.preventDefault();
        console.log("event=%o", event);
        $('#inputRequest').val("");
        $('#response_json').html('<pre><code class="json">here will output response</code></pre>');
        updateHighlight();
    });
});
Result:
After clicking Submit, the backend generates a temporary mp3 file and returns its URL to the frontend, which can load and play it normally:
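The backend flow described above (wrap the Baidu TTS call behind our own Flask REST API, save the synthesized mp3 to /tmp, hand the frontend a short-lived URL) can be sketched roughly as below. This is a minimal stdlib-only sketch under stated assumptions, not the demo's actual code: the helper names (`save_temp_mp3`, `purge_expired`), the `/tmp/tts_cache` directory, the `/audio/` URL prefix, and the 300-second TTL are all hypothetical, and the real service would first fetch the mp3 bytes from Baidu's `text2audio` endpoint.

```python
import os
import time
import uuid

# Hypothetical cache directory and lifetime for generated mp3 files
CACHE_DIR = "/tmp/tts_cache"
TTL_SECONDS = 300


def save_temp_mp3(mp3_bytes):
    """Save synthesized mp3 bytes under a random name and return the
    relative URL the frontend would put into the <audio> source."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    name = uuid.uuid4().hex + ".mp3"
    with open(os.path.join(CACHE_DIR, name), "wb") as f:
        f.write(mp3_bytes)
    # a Flask route would serve files under CACHE_DIR at this path
    return "/audio/" + name


def purge_expired(now=None):
    """Delete cached mp3 files older than TTL_SECONDS; return the
    names of the files that were removed."""
    now = time.time() if now is None else now
    removed = []
    if not os.path.isdir(CACHE_DIR):
        return removed
    for name in os.listdir(CACHE_DIR):
        path = os.path.join(CACHE_DIR, name)
        if now - os.path.getmtime(path) > TTL_SECONDS:
            os.remove(path)
            removed.append(name)
    return removed
```

In the real service, a Flask route such as `GET /audio/<name>` could serve the file with `send_from_directory(CACHE_DIR, name)`, and `purge_expired` could run periodically; storing the bytes in Redis with an `EXPIRE` on the key, as considered earlier, is the equivalent alternative.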
Please credit the source when reposting: 在路上 » [Solved] Use a suitable online speech synthesis API to convert text to speech