[Solved] Batch-importing plain-text script content into a backend system with Python

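I have dialogue scripts that were already written as plain text.

They need to be processed with a Python script so they can be batch-imported into the system.

Until now they were entered into the system by hand:

So the first step was to work out what the manual entry does.

The web page calls this backend API: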

Request URL: http://localhost:65000/api/v1/scripts/

Request Method: POST

Request body fields:

age_end: 5
age_start: 4
dialogs: [{type: "0", speaker: "A", content: "hello"}, {type: "0", speaker: "B", content: "hi "},…]
    0: {type: "0", speaker: "A", content: "hello"}
    1: {type: "0", speaker: "B", content: "hi "}
    2: {type: "1", audio_file_id: "5b21c7887f4d384d04535fe2", audio_file_name: "The Tunnel.mp3"}
operate_mark: "save"
place: "office 0716-1"
second_level_topic: "sleepy stories"
title: "title 0716-1"
topic: "bedtime"
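A minimal sketch of assembling that request body in Python, assuming the field names seen in the browser request are all the endpoint needs. The helper name `build_script_payload` is hypothetical and only used for illustration.

```python
import json


def build_script_payload(title, place, topic, second_level_topic,
                         age_start, age_end, dialogs):
    """Assemble the JSON body in the shape seen in the browser request."""
    return {
        "operate_mark": "save",
        "title": title,
        "place": place,
        "topic": topic,
        "second_level_topic": second_level_topic,
        "age_start": age_start,
        "age_end": age_end,
        "dialogs": dialogs,
    }


payload = build_script_payload(
    title="title 0716-1",
    place="office 0716-1",
    topic="bedtime",
    second_level_topic="sleepy stories",
    age_start=4,
    age_end=5,
    dialogs=[
        {"type": "0", "speaker": "A", "content": "hello"},
        {"type": "0", "speaker": "B", "content": "hi "},
    ],
)

# Serialized form of the body that would be POSTed to /api/v1/scripts/
body = json.dumps(payload)
print(body)
```

Keeping the payload as a plain dict until the last moment makes it easy to log and to hand to whatever HTTP client ends up sending it.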

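The corresponding web front-end source is:

The backend source is:

What is needed now:

First use Python to parse out the content of each individual script,

then call the API with a POST to create it.

[Solved] Matching multi-section script content with a regex in Python 3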
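A small sketch of the multi-section matching idea, assuming each script is a block of `key: value` header lines followed by dialogue lines and terminated by a blank line. The sample text is adapted for illustration, and `BLOCK_PATTERN` and `split_scripts` are hypothetical names.

```python
import re

SAMPLE_TEXT = (
    "topic: Health - illness\n"
    "place: home\n"
    "age: 4-5\n"
    "title: have a headache\n"
    "A: Bob, I'm not feeling good.\n"
    "B: What's wrong?\n"
    "\n"
    "topic: House - kitchen\n"
    "place: home\n"
    "age: 6-7\n"
    "title: feeling hungry\n"
    "A: Hey, Bob, what are you doing here?\n"
    "B: I'm hungry.\n"
    "\n"
)

# One match per script: four header lines, then a lazy "content" group that
# stops at the first blank line (re.S lets "." cross line breaks).
BLOCK_PATTERN = re.compile(
    r"topic:(?P<topic>[^\n]+)\n"
    r"place:(?P<place>[^\n]+)\n"
    r"age:(?P<age>[^\n]+)\n"
    r"title:(?P<title>[^\n]+)\n"
    r"(?P<content>.+?\n)\n+",
    flags=re.I | re.S,
)


def split_scripts(text):
    """Return one dict of stripped fields per script block found in text."""
    return [
        {name: value.strip() for name, value in match.groupdict().items()}
        for match in BLOCK_PATTERN.finditer(text)
    ]


scripts = split_scripts(SAMPLE_TEXT)
for script in scripts:
    print(script["title"], "|", script["age"])
```

Note that with this pattern the last block only matches if the file ends with a blank line.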

Next:

Store the parsed result in a dict, then call the backend API. It also turns out that a token has to be obtained before the POST.
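As a sketch of the "store it in a dict" step, the dialogue lines of one script can be turned into the list of dialog dicts that the API expects. This assumes every line looks like `Speaker: sentence` and that `"0"` is the type value for text dialogs, as in the captured request. `parse_dialogs` is a hypothetical helper name.

```python
import re

# One "Speaker: sentence" pair per line; re.M makes ^ and $ work per line.
DIALOG_PATTERN = re.compile(
    r"^(?P<speaker>\w+):[ \t]*(?P<sentence>.+)$", flags=re.M
)


def parse_dialogs(content):
    """Convert the dialogue text of one script into a list of dialog dicts."""
    dialogs = []
    for match in DIALOG_PATTERN.finditer(content):
        dialogs.append({
            "type": "0",  # text dialog, as seen in the captured request
            "speaker": match.group("speaker"),
            "content": match.group("sentence").strip(),
        })
    return dialogs


dialogs = parse_dialogs("J: Do you like rice?\nL: Yes, I do.\n")
print(dialogs)
```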

Along the way I needed:

[Solved] Checking in Python whether one or more words are all lowercase or start with a lowercase letter
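A short sketch of such case checks using only built-in `str` methods. The two helper names are hypothetical, and skipping words that start with a non-letter (such as `&`) is my own assumption about the desired behaviour.

```python
def is_all_lower(text):
    """True if every cased character in text is lowercase."""
    return text.islower()


def words_start_lower(text):
    """True if every word that starts with a letter starts in lowercase."""
    return all(
        word[0].islower()
        for word in text.split()
        if word[0].isalpha()
    )


for sample in ["bedtime", "Sleepy stories", "cakes & desserts"]:
    print(sample, is_all_lower(sample), words_start_lower(sample))
```

The related `str.lower()`, `str.capitalize()` and `str.title()` methods produce the case variants that can then be looked up in a list of known names.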

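To be able to call the backend API, I then needed to work out how to log in as a user and pass the JWT token:

[Solved] How to simulate a user login in Python and pass the JWT token to get access to the API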
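A minimal sketch of the login-then-authorize flow, assuming the login endpoint accepts a form-encoded `username`/`password` and returns JSON with a `token` field, and that the API expects an `Authorization: JWT <token>` header. Both function names are hypothetical. `requests` is a third-party package and is imported inside the function so the header helper can be used without it.

```python
def build_auth_headers(token, scheme="JWT"):
    """Build request headers carrying the token in the Authorization field."""
    return {
        "Content-Type": "application/json; charset=utf-8",
        "Accept": "application/json",
        "Authorization": "%s %s" % (scheme, token),
    }


def fetch_jwt_token(login_url, username, password):
    """POST the credentials and return the token from the JSON response."""
    import requests  # third-party; only needed when actually logging in

    resp = requests.post(
        login_url, data={"username": username, "password": password}
    )
    resp.raise_for_status()
    return resp.json()["token"]


# Offline demonstration with a placeholder token value
headers = build_auth_headers("abc.def.ghi")
print(headers["Authorization"])
```

In real use the token returned by `fetch_jwt_token` would be passed to `build_auth_headers`, and the resulting headers sent with every later API call.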

Calling the save API then failed:

[Solved] Error when calling the API from Python: TypeError string indices must be integers
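This post does not say what caused the error here. One common cause is indexing a JSON string with a key before it has been parsed into a dict, which the following sketch reproduces. The variable names and the sample value are made up for illustration.

```python
import json

raw = '{"token": "abc"}'  # still a str, not yet a dict

try:
    raw["token"]  # indexing a str with a str key
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Parsing first gives a dict, which can be indexed by key
parsed = json.loads(raw)
token = parsed["token"]
print(token)
```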

Then another error appeared:

[Solved] Python API call returns a 400 error: {'detail': 'JSON parse error - Expecting value: line 1 column 1 (char 0)'}
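A likely cause of this kind of 400 (an assumption, since the post does not state it) is sending a form-encoded body while the server expects JSON. With `requests`, `data=some_dict` form-encodes the dict, whereas `json=some_dict` serializes it as JSON. The sketch below imitates the server-side parse with the standard library only. `server_side_parse` is a hypothetical stand-in, not the backend's real code.

```python
import json
from urllib.parse import urlencode

payload = {"title": "title 0716-1", "age_start": 4}

form_body = urlencode(payload)   # the kind of body data=dict produces
json_body = json.dumps(payload)  # the kind of body json=dict produces


def server_side_parse(body):
    """Imitate a JSON-only endpoint parsing the request body."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        return {"detail": "JSON parse error - %s" % err}


print(server_side_parse(form_body))
print(server_side_parse(json_body))
```

The final script passes the script dict through the `json=` parameter, which is consistent with this explanation.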

The next thing to solve was:

[Solved] Resetting a user's encrypted password in the Django backend

[Summary]

The final code is as follows:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-07-11 14:12:12
# Project: BatchImportScript
# Function: Batch process written script content from txt into Naturling CMS system
# Author: Crifan Li
# Updated: 20180717

import os
import re
import logging
import requests
import sys

# make the bundled crifanLib folder importable
currentPath = os.path.split(os.path.realpath(__file__))[0]
crifanlibPath = os.path.abspath(os.path.join(currentPath, "crifanLib"))
sys.path.append(crifanlibPath)
from crifanLib import crifanMysql, crifanFile, crifanLogging

################################################################################
# Global Config
################################################################################

# is online/production or local debug environment
IsOnline = True

################################################################################
# Global Settings / Variables
################################################################################

ScriptFolder = "input"
ScriptFilenameList = ["person_1.txt", "person_2.txt"]
# # for debug
# ScriptFilenameList = ["person_2.txt", "person_1.txt"]

MysqlConfigDict_Local = {
    'host': "localhost",
    'port': 3306,
    'user': "root",
    'password': "crifan_mysql",
    'db': "naturling",
    'charset': "utf8",
}

MysqlConfigDict_Online = {
    'host': "xxxxxx",
    'port': 3306,
    'user': "root",
    'password': "pwd",
    'db': "naturling",
    'charset': "utf8",
}

MysqlConfigDict = None
if IsOnline:
    MysqlConfigDict = MysqlConfigDict_Online
else:
    MysqlConfigDict = MysqlConfigDict_Local

# level 1 topic name -> {"id": ..., "children": {level 2 topic name -> {"id": ...}}}
gTopicDict = {}

# one CMS account per script author
UserInfoDict = {
    "wuman": {
        "username": "user1",
        "password": "pwd1",
    },
    "zhouyue": {
        "username": "user2",
        "password": "pwd2",
    },
}

ApiHost = None
if IsOnline:
    ApiHost = "http://x.x.x.x"
else:
    ApiHost = "http://localhost"

Port = 65000
ApiVersion = "/api/v1"
ApiPrefix = ApiHost + ":" + str(Port) + ApiVersion # http://localhost:65000/api/v1
GetJwtTokenUrl = ApiPrefix + "/jwt-token-auth/" # http://localhost:65000/api/v1/jwt-token-auth/
CreateScriptUrl = ApiPrefix + "/scripts/" # http://localhost:65000/api/v1/scripts/

gJwtToken = ""
gHeaders = {
    'Content-Type': 'application/json; charset=utf-8',
    "Accept": 'application/json',
    "Authorization": "",
}

################################################################################
# Functions
################################################################################

def updateTokenAndHeader(userInfo):
    # log in as the given user and store the JWT token in the global headers
    global gJwtToken, gHeaders

    postBody = {
        "username": userInfo["username"],
        "password": userInfo["password"],
    }
    logging.info("GetJwtTokenUrl=%s, postBody=%s", GetJwtTokenUrl, postBody)
    getTokenResp = requests.post(GetJwtTokenUrl, data=postBody)
    logging.info("getTokenResp=%s", getTokenResp)
    respJson = getTokenResp.json()
    logging.info("respJson=%s", respJson)
    gJwtToken = respJson["token"]
    logging.info("gJwtToken=%s", gJwtToken)
    if gJwtToken:
        gHeaders["Authorization"] = "JWT " + gJwtToken
        logging.info("gHeaders=%s", gHeaders)

def generateTopic(mysqlConn):
    # fill gTopicDict with all level 1 topics and their level 2 children
    global gTopicDict

    searchTopicLevel1Sql = """SELECT * from `keyword` WHERE type="sectorTopic" ORDER BY name"""
    findTopicLevel1Ok, resultDict = mysqlConn.executeSql(searchTopicLevel1Sql)
    logging.debug("%s -> %s, %s", searchTopicLevel1Sql, findTopicLevel1Ok, resultDict)
    if not findTopicLevel1Ok:
        logging.error("Fail to find level 1 topics")
        return

    topicLevel1ItemList = resultDict["data"]
    for eachTopicLevel1Item in topicLevel1ItemList:
        topicLevel1Id = eachTopicLevel1Item["id"]
        topicLevel1Name = eachTopicLevel1Item["name"]
        gTopicDict[topicLevel1Name] = {
            "id": topicLevel1Id,
            "children": {}
        }

        findRelationSql = """SELECT * from `keyword_rel` WHERE keyword1=%d""" % (topicLevel1Id)
        findRelationOk, resultDict = mysqlConn.executeSql(findRelationSql)
        logging.debug("%s -> %s, %s", findRelationSql, findRelationOk, resultDict)
        if not findRelationOk:
            logging.error("Failed to find topic level 1 and level 2 relation")
            continue

        level2TopicIdList = resultDict["data"]
        for eachRelation in level2TopicIdList:
            level2TopicId = eachRelation["keyword2"]
            findTopicLevel2Sql = """SELECT * from `keyword` WHERE `id`=%d and `type`="%s" """ % (level2TopicId, "topic")
            findTopicLevel2Ok, resultDict = mysqlConn.executeSql(findTopicLevel2Sql)
            logging.debug("%s -> %s, %s", findTopicLevel2Sql, findTopicLevel2Ok, resultDict)
            if findTopicLevel2Ok:
                if resultDict["data"]:
                    level2Topic = resultDict["data"][0]
                    level2TopicName = level2Topic["name"]
                    gTopicDict[topicLevel1Name]["children"][level2TopicName] = {
                        "id": level2TopicId
                    }
            else:
                logging.error("Can not find topic id=%d", level2TopicId)

def findRealTopic(curTopic, topicList=[]):
    # try the topic as written, then lower-case, capitalized and title-case variants
    isExisted, realTopicName = (False, "")

    originTopic = curTopic
    allLowcaseTopic = curTopic.lower()
    capitalizedTopic = curTopic.capitalize()
    titledTopic = curTopic.title()
    logging.info("originTopic=%s,allLowcaseTopic=%s,capitalizedTopic=%s,titledTopic=%s",
                 originTopic, allLowcaseTopic, capitalizedTopic, titledTopic)

    if originTopic in topicList:
        isExisted = True
        realTopicName = originTopic
    elif allLowcaseTopic in topicList:
        isExisted = True
        realTopicName = allLowcaseTopic
    elif capitalizedTopic in topicList:
        isExisted = True
        realTopicName = capitalizedTopic
    elif titledTopic in topicList:
        isExisted = True
        realTopicName = titledTopic
    else:
        isExisted = False
        realTopicName = ""
        logging.error("Not found topic: %s", curTopic)

    logging.info("topic: %s -> isExisted=%s, realTopicName=%s", curTopic, isExisted, realTopicName)
    return isExisted, realTopicName

def checkTopic(curTopic, isChild=False, parentTopicName=""):
    isExisted, realTopicName = (False, "")

    level1TopicList = gTopicDict.keys()
    if isChild and parentTopicName:
        # is child topic
        if parentTopicName in level1TopicList:
            level1Topic = gTopicDict[parentTopicName]
            childTopicDict = level1Topic["children"]
            childTopicKeyList = childTopicDict.keys()
            isExisted, realTopicName = findRealTopic(curTopic, childTopicKeyList)
        else:
            logging.error("Can not found parent topic %s for child topic %s", parentTopicName, curTopic)
    else:
        # is level 1=parent topic
        isExisted, realTopicName = findRealTopic(curTopic, level1TopicList)

    return isExisted, realTopicName

def saveScript(curScriptDict):
    # {
    #     "place": "office 0716-1",
    #     "title": "title 0716-1",
    #     "topic": "bedtime",
    #     "second_level_topic": "sleepy stories",
    #     "age_start": 4,
    #     "age_end": 5,
    #     "operate_mark": "save",
    #     "dialogs": [{
    #         "type": "0",
    #         "speaker": "A",
    #         "content": "hello"
    #     }, {
    #         "type": "0",
    #         "speaker": "B",
    #         "content": "hi "
    #     }, {
    #         "type": "1",
    #         "audio_file_id": "5b21c7887f4d384d04535fe2",
    #         "audio_file_name": "The Tunnel.mp3"
    #     }]
    # }
    logging.debug("curScriptDict=%s", curScriptDict)
    # json= serializes the dict as a JSON request body
    saveScriptResp = requests.post(CreateScriptUrl, headers=gHeaders, json=curScriptDict)
    logging.info("saveScriptResp=%s", saveScriptResp)
    if saveScriptResp.ok:
        respJson = saveScriptResp.json()
        logging.debug("respJson=%s", respJson)
        logging.info("+++OK to create script: %s", respJson)
    else:
        logging.error("Fail to create script: %s", curScriptDict)

def processSingleScriptMatch(singleScriptMatch):
    curScriptDict = {
        "operate_mark": "save",
        "place": "",
        "title": "",
        "age_start": -1,
        "age_end": -1,
        "topic": "",
        "second_level_topic": "",
        "dialogs": []
    }

    singleScript = singleScriptMatch.group("singleScript")
    logging.debug("singleScript=%s", singleScript)
    place = singleScriptMatch.group("place")
    topic = singleScriptMatch.group("topic")
    title = singleScriptMatch.group("title")
    age = singleScriptMatch.group("age")
    logging.debug("place=%s,topic=%s,title=%s,age=%s", place, topic, title, age)

    place = place.strip()
    title = title.strip()
    curScriptDict["place"] = place
    curScriptDict["title"] = title

    topic = topic.strip()
    age = age.strip()

    # age range such as "4-5"
    ageMatch = re.search(r"(?P<ageStart>\d+)\s*-\s*(?P<ageEnd>\d+)", age)
    if ageMatch:
        ageStart = ageMatch.group("ageStart")
        ageEnd = ageMatch.group("ageEnd")
        ageStartInt = int(ageStart)
        ageEndInt = int(ageEnd)
        logging.debug("ageStartInt=%d,ageEndInt=%d", ageStartInt, ageEndInt)
        curScriptDict["age_start"] = ageStartInt
        curScriptDict["age_end"] = ageEndInt
    else:
        logging.error("!!! Can not recognize age format for: %s", age)

    # Classroom layout-routine
    # Food－vegetable
    # topicMatch = re.search("(?P<topicLevel1>\w+)\s*(\-\s*(?P<topicLevel2>[\w/\s]+))?", topic)
    # topicMatch = re.search("(?P<topicLevel1>[\w\s]+)\s*(\-\s*(?P<topicLevel2>[\w/\s]+))?", topic)
    # topicMatch = re.search("(?P<topicLevel1>[\w\s]+)\s*([\-|－]\s*(?P<topicLevel2>[\w/\s]+))?", topic)
    # sports – skiing
    # topicMatch = re.search("(?P<topicLevel1>[\w\s]+)\s*([\-|－|–]\s*(?P<topicLevel2>[\w/\s]+))?", topic)
    # Sports – tae-kwon-do
    # topicMatch = re.search("(?P<topicLevel1>[\w\s]+)\s*([\-|－|–]\s*(?P<topicLevel2>[\w/\-\s]+))?", topic)
    # Food – cakes & desserts
    topicMatch = re.search(r"(?P<topicLevel1>[\w\s]+)\s*([\-|－|–]\s*(?P<topicLevel2>[\w/\-\&\s]+))?", topic)
    if topicMatch:
        topicLevel1 = topicMatch.group("topicLevel1")
        topicLevel1 = topicLevel1.strip()
        isExisted, realTopicLevel1 = checkTopic(topicLevel1)
        if isExisted:
            curScriptDict["topic"] = realTopicLevel1

            topicLevel2 = topicMatch.group("topicLevel2")
            if topicLevel2:
                topicLevel2 = topicLevel2.strip()
                isExisted, realTopicLevel2 = checkTopic(topicLevel2, isChild=True, parentTopicName=realTopicLevel1)
                if isExisted:
                    curScriptDict["second_level_topic"] = realTopicLevel2
                else:
                    logging.error("Can not find real child topic %s from parent %s", topicLevel2, realTopicLevel1)
            else:
                logging.warning("Not found topic level 2 from topic string: %s" % topic)
        else:
            logging.error("Can not find level1 topic: %s", topicLevel1)
    else:
        logging.error("!!! Can not recognize topic format for: %s" % topic)

    content = singleScriptMatch.group("content")
    logging.debug("content=%s", content)
    # one "Speaker: sentence" pair per line
    singleDialogPattern = r"(?P<speaker>\w+):\s*(?P<sentence>[^\n]+)\n"
    singleDialogMatchIterator = re.finditer(singleDialogPattern, content, flags=re.I | re.M | re.S)
    for dialogIdx, eachDialog in enumerate(singleDialogMatchIterator):
        dialogNum = dialogIdx + 1
        logging.debug("[%d] eachDialog=%s", dialogNum, eachDialog)
        speaker = eachDialog.group("speaker")
        logging.debug("speaker=%s", speaker)
        sentence = eachDialog.group("sentence")
        logging.debug("sentence=%s", sentence)

        DialogTypeText = "0"
        # DialogTypeAudio = "1"
        curDialog = {
            "type": DialogTypeText,
            "speaker": speaker,
            "content": sentence
        }
        curScriptDict["dialogs"].append(curDialog)

    # scriptNum is the module-level counter set in the main loop below
    logging.info("[%d] curScriptDict=%s", scriptNum, curScriptDict)
    saveScript(curScriptDict)

################################################################################
# Main
################################################################################

# init logging
logFilename = crifanFile.getInputFileBasenameNoSuffix() + ".log"
crifanLogging.loggingInit(logFilename)
logging.info("Logging initialized to %s", logFilename)

mysqlConn = crifanMysql.MysqlDb(config=MysqlConfigDict)
logging.info("mysqlConn=%s", mysqlConn)

generateTopic(mysqlConn)
logging.info("gTopicDict=%s", gTopicDict)

curPath = os.getcwd()
for eachFilename in ScriptFilenameList:
    eachFullFilePath = os.path.join(curPath, ScriptFolder, eachFilename)
    logging.info("eachFullFilePath=%s", eachFullFilePath)

    # each author's file uses its own field order, and logs in with its own account
    singleScriptPattern = ""
    if eachFilename == "person_1.txt":
        # "tittle" is intentional: it matches the misspelled field name used in person_1.txt
        singleScriptPattern = r"(?P<singleScript>place:(?P<place>[^\n]+)\ntopic:(?P<topic>[^\n]+)\ntittle:(?P<title>[^\n]+)\nage:(?P<age>[^\n]+)\n(?P<content>.+?\n))\n+"
        updateTokenAndHeader(UserInfoDict["wuman"])
    elif eachFilename == "person_2.txt":
        singleScriptPattern = r"(?P<singleScript>topic:(?P<topic>[^\n]+)\nplace:(?P<place>[^\n]+)\nage:(?P<age>[^\n]+)\ntitle:(?P<title>[^\n]+)\n(?P<content>.+?\n))\n+"
        updateTokenAndHeader(UserInfoDict["zhouyue"])

    with open(eachFullFilePath, "r") as fp:
        allLine = fp.read()
        # print("allLine=%s" % allLine)
        allScriptMatchIterator = re.finditer(singleScriptPattern, allLine, flags=re.I | re.M | re.S)
        logging.info("allScriptMatchIterator=%s", allScriptMatchIterator)
        # if allScriptMatchIterator:
        for scriptIdx, curScriptMatch in enumerate(allScriptMatchIterator):
            scriptNum = scriptIdx + 1
            logging.info("[%d] curScriptMatch=%s", scriptNum, curScriptMatch)
            processSingleScriptMatch(curScriptMatch)

This processed input such as:

Place: School canteen
Topic: food
Tittle:Have lunch
Age: 3-4
J: What did you have for lunch?
L: I ate rice, fish and bread.
J: Do you like rice?
L: Yes, I do.
J: Do you like fish?
L: Yes, I do.
J: Do you like bread?
L: No, I don’t.
J: What did you drink?
L: I drank milk.
J: Do you like milk?
L: Yes, I do.

Place: home
Topic: house
Tittle: Doing housework
Age: 4-5
J: Do you like cooking, mom?
M: Yes, I do a lot. What about you?
J: Mom, you know me. I can’t cook.
M: But can you help me wash dishes?
J: Yes, I can help you.
M: Let’s make a deal, ok?
J: What kind of deal?
M: I’m going to cook.
J: And then?
M: Then you wash the dishes after the meal.
J: That’s ok. I’ d like to help you mom.
M: You are a good boy.

and:

topic: Health – illness
Place: home
Age: 4-5
title: have a headache
A: Bob, I’m not feeling good.
B: What’s wrong?
A: I’ve got a headache.
B: Oh, take some medicine, honey. Let me get some for you.
A: Thanks, Bob.

topic: House – kitchen
Place: home
Age: 6-7
title: feeling hungry
A: Hey, Bob, what are you doing here?
B: I’m hungry. I’m looking for some food.
A: We still have some vegetables in the refrigerator. I can make you a simple salad.
B: Great! Thank you!
A: What kind of dressing do you want?
B: French dressing is fine.
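The header fields in input like this carry two pieces of structure: an age range such as `4-5`, and a topic that may contain a second level after a dash (ASCII hyphen, full-width hyphen or en dash). A small sketch of pulling both apart follows. It uses `re.split` on the first dash rather than the single regular expression in the script above, and `parse_age` and `parse_topic` are hypothetical helper names.

```python
import re

AGE_PATTERN = re.compile(r"(?P<start>\d+)\s*-\s*(?P<end>\d+)")
# ASCII hyphen, full-width hyphen (U+FF0D) or en dash (U+2013)
TOPIC_SEPARATOR = re.compile(r"\s*[-－–]\s*")


def parse_age(age_text):
    """Return (age_start, age_end) as ints, or None if no range is found."""
    match = AGE_PATTERN.search(age_text)
    if not match:
        return None
    return int(match.group("start")), int(match.group("end"))


def parse_topic(topic_text):
    """Split 'Level1 – level2' on the first dash; level2 is '' if absent."""
    parts = TOPIC_SEPARATOR.split(topic_text.strip(), maxsplit=1)
    level1 = parts[0]
    level2 = parts[1] if len(parts) > 1 else ""
    return level1, level2


print(parse_age("4-5"))
print(parse_topic("Health – illness"))
print(parse_topic("Sports – tae-kwon-do"))
```

Splitting only on the first dash keeps hyphenated second-level names such as `tae-kwon-do` intact, but a first-level topic that itself contains a dash would be split incorrectly.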

In the end the scripts were saved into the CMS:

Please credit the source when reposting: 在路上 » [Solved] Batch-importing plain-text script content into a backend system with Python
