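Developing an Android Speech Recognition and Semantic Understanding App in Kotlin (olamisdk)

This article builds a speech recognition app for Android in Kotlin, using olamisdk, the SDK of the OLAMI (欧拉密) open platform.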

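1. Introduction to Kotlin

Kotlin is a JVM-based programming language created by JetBrains. JetBrains is also the maker of IntelliJ, and Android Studio is built on top of IntelliJ. Kotlin is an object-oriented language that incorporates many ideas from functional programming.

I later learned that Kotlin is named after an island (Котлин). It is a statically typed language that targets the JVM, Android, JavaScript in the browser, and native machine code, and it is 100% interoperable with Java and Android. Kotlin was designed to supply the modern language features that Java lacks and to simplify code considerably, so that developers write as little boilerplate as possible.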

2. A quick comparison of Kotlin, Java, and Swift

1. Printing Hello,World!

        Java:   System.out.println("Hello,World!");
        Kotlin: println("Hello,World!")
        Swift:  print("Hello,World!")
   

2. Variables and constants

        Java:   int mVariable = 10;
                mVariable = 20;
                static final int mConstant = 10;
        Kotlin: var mVariable = 10
                mVariable = 20
                val mConstant = 10
        Swift:  var mVariable = 10
                mVariable = 20
                let mConstant = 10

Swift and Kotlin feel more concise than Java, and Kotlin looks a lot like Swift.
   

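3. Type conversion

      Swift:
               let label = "Hello world "
               let width = 80
               let widthLabel = label + String(width)
      Kotlin:
               val label = "Hello world "
               val width = 80
               val widthLabel = label + width

Swift needs the explicit String(width); in Kotlin the + operator on String accepts any value and converts it to text.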
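String concatenation is the lenient case in Kotlin. Between numeric types Kotlin does not convert implicitly, so an Int has to be turned into a Long with an explicit call, and parsing text back into a number is explicit too. The sketch below illustrates this; the function names are made up for the example and are not part of the demo.

```kotlin
// Builds the label with a string template instead of the + operator.
fun widthLabel(label: String, width: Int): String = "$label$width"

// Kotlin never widens numbers implicitly; the conversion must be written out.
fun toLongWidth(width: Int): Long = width.toLong()

// toIntOrNull returns null for text that is not a number instead of throwing.
fun parseWidth(text: String): Int? = text.trim().toIntOrNull()

fun main() {
    println(widthLabel("Hello world ", 80))
    println(toLongWidth(80) + 1L)
    println(parseWidth(" 80 "))
    println(parseWidth("eighty"))
}
```

Returning Int? from parseWidth forces the caller to handle bad input at compile time rather than catching an exception at run time.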
   

4. Arrays

     Swift:
                var tempList = ["one", "two", "three"]
                tempList[1] = "zero"
     Kotlin:
                val tempList = arrayOf("one", "two", "three")
                tempList[1] = "zero"
   

5. Functions

      Swift:
               func greet(_ name: String, _ day: String) -> String {
                   return "Hello \(name), today is \(day)."
               }
               greet("Bob", "Tuesday")

      Kotlin:
               fun greet(name: String, day: String): String {
                   return "Hello $name, today is $day."
               }
               greet("Bob", "Tuesday")
   

6. Class declaration and usage

    Swift:

        Declaration: class Shape {
                         var numberOfSides = 0
                         func simpleDescription() -> String {
                             return "A shape with \(numberOfSides) sides."
                         }
                     }
        Usage:       var shape = Shape()
                     shape.numberOfSides = 7
                     var shapeDescription = shape.simpleDescription()
    Kotlin:

        Declaration: class Shape {
                         var numberOfSides = 0
                         fun simpleDescription() = "A shape with $numberOfSides sides."
                     }
        Usage:       var shape = Shape()
                     shape.numberOfSides = 7
                     var shapeDescription = shape.simpleDescription()
   

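As these examples show, Kotlin and Swift are very much alike. Both have the traits of a modern language: their syntax is more compact than that of an older high-level language such as Java, and it reads closer to natural language.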

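3. Development environment

This article uses Android Studio 2.0. Start Android Studio.
As the screenshot shows, choose Plugins from the Configure drop-down menu, type Kotlin into the search box, and find the "Kotlin" plugin in the results list.

[Screenshot: Plugins in the Configure drop-down menu]

The next screenshot comes from an installation where the Kotlin plugin has not been installed yet.

[Screenshot: the Kotlin plugin in the search results, not yet installed]

Click Install on the right, then restart Android Studio once the installation finishes.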

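4. Creating a new Android project

You can create an Android project and an activity in Android Studio exactly as before. This article uses a finished demo for the walkthrough.

The screenshot shows the demo project in Android Studio.
[Screenshot: the demo project]

Select the two Java files MainActivity and MessageConst, then open Code in the menu bar and choose Convert Java File to Kotlin File from the drop-down menu.
[Screenshot: Convert Java File to Kotlin File in the Code menu]

The IDE converts the files automatically and produces MainActivity.kt and MessageConst.kt. Open MainActivity.kt and a "Kotlin not configured" banner appears at the top of the editor. Click the Configure button and the IDE sets everything up for you.

Copy the two .kt files into the src/main/kotlin directory, as in the screenshot.

[Screenshot: the .kt files in the Kotlin source directory]

The converted files may contain some syntax errors, which have to be fixed by hand to follow Kotlin syntax.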

With the environment configured, let's see what changed in the Gradle files.

The project-level Gradle file looks like this:

buildscript {
    ext.kotlin_version = '1.1.3-2'
    repositories {
        jcenter()
    }
    dependencies {
        classpath 'com.android.tools.build:gradle:2.0.0'
        // Added: the Kotlin Gradle plugin dependency
        classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
    }
}

allprojects {
    repositories {
        jcenter()
    }
}
   

Now the Gradle file of a module:

apply plugin: 'com.android.application'
apply plugin: 'kotlin-android' // Added: the Kotlin Android plugin

android {
    compileSdkVersion 14
    buildToolsVersion "24.0.0"

    defaultConfig {
        applicationId "com.olami"
        minSdkVersion 8
        targetSdkVersion 14
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.txt'
        }
    }
    sourceSets {
        main.java.srcDirs += 'src/main/kotlin' // the generated .kt files must be copied into this directory
    }
}

dependencies {
    compile 'com.android.support:support-v4:18.0.0'
    compile files('libs/voicesdk_android.jar')
    compile "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version" // Added: the Kotlin standard library dependency
}
repositories {
    mavenCentral()
}

   

As shown above, if you create a Kotlin project without going through the Java-to-Kotlin conversion, you have to add these Gradle entries yourself.

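5. The OLAMI speech recognition app

First, a screenshot of the app after a recognition:
[Screenshot: the app showing a recognition result]

In MainActivity.kt: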

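override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        initHandler() // create the handler that processes messages

        initView() // set up the views, e.g. the button that starts recording

        initViaVoiceRecognizerListener() // set up the recognition callback that reports recording state and results

        init() // create and configure the speech recognizer
    }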
   

fun init() {
        // Create the OLAMI speech recognizer (the handler is already set up in onCreate)
        mOlamiVoiceRecognizer = OlamiVoiceRecognizer(this@MainActivity)
        // Reading the device id requires the READ_PHONE_STATE permission
        val telephonyManager = this.getSystemService(
                                    Context.TELEPHONY_SERVICE) as TelephonyManager
        val imei = telephonyManager.deviceId

        mOlamiVoiceRecognizer!!.init(imei)
        //set null if you do not want to notify olami server.

        // Set the callback that updates the UI with recording state and data
        mOlamiVoiceRecognizer!!.setListener(mOlamiVoiceRecognizerListener)

        // Set the supported language; use Simplified Chinese by default
        mOlamiVoiceRecognizer!!.setLocalization(
                                 OlamiVoiceRecognizer.LANGUAGE_SIMPLIFIED_CHINESE)

        // App key: generated when you register an app on the OLAMI website
        // api: pass "asr" directly; it identifies the speech recognition type
        // secret: generated when you register an app on the OLAMI website
        mOlamiVoiceRecognizer!!.setAuthorization("51a4bb56ba954655a4fc834bfdc46af1",
                                   "asr", "68bff251789b426896e70e888f919a6d", "nli")

        // Trailing-silence timeout that ends a recording; 2000 ms is recommended
        mOlamiVoiceRecognizer!!.setVADTailTimeout(2000)

        // Latitude and longitude; pass 0 if you do not want to upload a location
        mOlamiVoiceRecognizer!!.setLatitudeAndLongitude(
                                             31.155364678184498, 121.34882432933009)
    }
   

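The code is fairly simple. Tapping the start button starts recording, the callbacks arrive in OlamiVoiceRecognizerListener, and messages are then sent through the handler to update the UI.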
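The callback, message, UI-update flow can be sketched without Android or the SDK. Everything in this sketch is hypothetical: RecognizerListener, UiMessage, the MSG_ constants and RecorderScreen are stand-ins invented for illustration, not olamisdk or Android APIs, and a plain list plays the role of the Handler's message queue.

```kotlin
// Hypothetical message codes; the demo keeps its own in MessageConst.
const val MSG_RECORD_STARTED = 1
const val MSG_RESULT = 2

// Stand-in for android.os.Message: a code plus an optional payload.
data class UiMessage(val what: Int, val obj: String? = null)

// Hypothetical callback interface; the real SDK listener may look different.
interface RecognizerListener {
    fun onRecordStarted()
    fun onResult(json: String)
}

class RecorderScreen : RecognizerListener {
    private val queue = mutableListOf<UiMessage>()
    var buttonText = "Start"
        private set
    var resultText = ""
        private set

    // Callbacks never touch the UI state; they only enqueue a message.
    override fun onRecordStarted() { queue.add(UiMessage(MSG_RECORD_STARTED)) }
    override fun onResult(json: String) { queue.add(UiMessage(MSG_RESULT, json)) }

    // Plays the role of Handler.handleMessage: the one place that updates UI state.
    fun drainMessages() {
        while (queue.isNotEmpty()) {
            val msg = queue.removeAt(0)
            when (msg.what) {
                MSG_RECORD_STARTED -> buttonText = "Recording"
                MSG_RESULT -> {
                    resultText = msg.obj ?: ""
                    buttonText = "Start"
                }
            }
        }
    }
}
```

Keeping all UI changes in one place matters on Android because recognition callbacks may arrive on a background thread, while views may only be modified on the main thread.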

Now look at the view initialization code and see how it differs from the Java version:

private fun initView() {
        mBtnStart = findViewById(R.id.btn_start) as Button
        mBtnStop = findViewById(R.id.btn_stop) as Button
        mBtnCancel = findViewById(R.id.btn_cancel) as Button
        mBtnSend = findViewById(R.id.btn_send) as Button
        mInputTextView = findViewById(R.id.tv_inputText) as TextView
        mEditText = findViewById(R.id.et_content) as EditText
        mTextView = findViewById(R.id.tv_result) as TextView
        mTextViewVolume = findViewById(R.id.tv_volume) as TextView

        // Start recording
        mBtnStart!!.setOnClickListener {
            if (mOlamiVoiceRecognizer != null)
                mOlamiVoiceRecognizer!!.start()
        }

        // Stop recording and reset the start button; "开始" means "Start"
        mBtnStop!!.setOnClickListener {
            if (mOlamiVoiceRecognizer != null)
                mOlamiVoiceRecognizer!!.stop()
            mBtnStart!!.text = "开始"
            Log.i("led", "MusicActivity mBtnStop onclick 开始")
        }

        // Cancel the current recording
        mBtnCancel!!.setOnClickListener {
            if (mOlamiVoiceRecognizer != null)
                mOlamiVoiceRecognizer!!.cancel()
        }

        // Send typed text instead of speech; "输入: " means "Input: "
        mBtnSend!!.setOnClickListener {
            if (mOlamiVoiceRecognizer != null)
                mOlamiVoiceRecognizer!!.sendText(mEditText!!.text.toString())
            mInputTextView!!.text = "输入: " + mEditText!!.text
        }
    }
   

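Doesn't the code feel more concise?
The two assignments below have the same effect. The second sets the text directly through the view ids, which is much shorter than before. Note that the second form depends on the Kotlin Android Extensions plugin (apply plugin: 'kotlin-android-extensions') and its synthetic import for the layout, neither of which appears in the Gradle file shown earlier.

 mInputTextView!!.text = "输入: " + mEditText!!.text
 tv_inputText.text = "输入: " + et_content.text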
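The converted code pairs an explicit null check with the !! operator, which is what the Java-to-Kotlin converter tends to produce. Kotlin's safe-call operator ?. expresses the same intent in one step. The sketch below is a minimal illustration; FakeRecognizer and VoicePanel are hypothetical stand-ins so the code runs without the SDK.

```kotlin
// Hypothetical stand-in for the recognizer; it only records which calls were made.
class FakeRecognizer {
    val calls = mutableListOf<String>()
    fun start() { calls += "start" }
    fun sendText(text: String) { calls += "sendText:$text" }
}

class VoicePanel {
    // Nullable, like mOlamiVoiceRecognizer in the demo.
    var recognizer: FakeRecognizer? = null

    // Safe call: does nothing when recognizer is null, where !! would throw.
    fun onStartClicked() {
        recognizer?.start()
    }

    // let runs its block only for a non-null receiver; the result says whether text was sent.
    fun onSendClicked(text: String): Boolean =
        recognizer?.let { it.sendText(text); true } ?: false
}

fun main() {
    val panel = VoicePanel()
    panel.onStartClicked()                // recognizer is null: no crash, no call
    println(panel.onSendClicked("hi"))    // false
    panel.recognizer = FakeRecognizer()
    panel.onStartClicked()
    println(panel.onSendClicked("hi"))    // true
    println(panel.recognizer?.calls)      // [start, sendText:hi]
}
```

Declaring the fields as lateinit var, or initializing them lazily, would remove the nullable types altogether; which option fits depends on when the views and the recognizer become available.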
   

Now the handler:

private fun initHandler() {
        mHandler = object : Handler() {
            override fun handleMessage(msg: Message) {
                when (msg.what) {
                    // "录音中" = "Recording"
                    MessageConst.CLIENT_ACTION_START_RECORED -> mBtnStart!!.text = "录音中"
                    // "识别中" = "Recognizing"
                    MessageConst.CLIENT_ACTION_STOP_RECORED -> mBtnStart!!.text = "识别中"
                    MessageConst.CLIENT_ACTION_CANCEL_RECORED -> {
                        mBtnStart!!.text = "开始"   // "Start"
                        mTextView!!.text = "已取消"  // "Cancelled"
                    }
                    MessageConst.CLIENT_ACTION_ON_ERROR -> {
                        mTextView!!.text = "错误代码：" + msg.arg1  // "Error code: "
                        mBtnStart!!.text = "开始"
                    }
                    // "音量: " = "Volume: "
                    MessageConst.CLIENT_ACTION_UPDATA_VOLUME -> mTextViewVolume!!.text = "音量: " + msg.arg1
                    MessageConst.SERVER_ACTION_RETURN_RESULT -> {
                        if (msg.obj != null)
                            mTextView!!.text = "服务器返回: " + msg.obj.toString()  // "Server returned: "
                        mBtnStart!!.text = "开始"
                        try {
                            val message = msg.obj as String
                            var input: String? = null
                            val jsonObject = JSONObject(message)
                            val jArrayNli =
                                  jsonObject.optJSONObject("data").optJSONArray("nli")
                            val jObj = jArrayNli.optJSONObject(0)
                            var jArraySemantic: JSONArray? = null
                            if (message.contains("semantic")) {
                                // Prefer the input echoed back by the semantic result
                                jArraySemantic = jObj.getJSONArray("semantic")
                                input =
                                   jArraySemantic!!.optJSONObject(0).optString("input")
                            } else {
                                // Otherwise fall back to the raw recognition text
                                input = jsonObject.optJSONObject("data")
                                              .optJSONObject("asr").optString("result")
                            }
                            if (input != null)
                                mInputTextView!!.text = "输入: " + input  // "Input: "
                        } catch (e: Exception) {
                            e.printStackTrace()
                        }
                    }
                }
            }
        }
    }
   

The old switch/case construct has become a when block. The code is not only shorter but also closer to a modern language and easier to understand.
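The handler uses when as a statement. when can also be used as an expression that returns a value, and it can run without a subject as a chain of conditions. The sketch below shows both forms; the constants and function names are hypothetical and only loosely modeled on MessageConst.

```kotlin
// Hypothetical message codes, standing in for the constants in MessageConst.
const val ACTION_START_RECORD = 1
const val ACTION_STOP_RECORD = 2
const val ACTION_CANCEL_RECORD = 3
const val ACTION_ON_ERROR = 4

// when as an expression: each branch yields the label for the start button.
fun startButtonLabel(what: Int): String = when (what) {
    ACTION_START_RECORD -> "Recording"
    ACTION_STOP_RECORD -> "Recognizing"
    // Several values can share one branch, which replaces switch fall-through.
    ACTION_CANCEL_RECORD, ACTION_ON_ERROR -> "Start"
    // An expression must cover every Int, so an else branch is required.
    else -> "Unknown"
}

// when without a subject behaves like an if / else-if chain.
fun volumeLevel(volume: Int): String = when {
    volume <= 0 -> "silent"
    volume in 1..5 -> "quiet"
    else -> "loud"
}

fun main() {
    println(startButtonLabel(ACTION_STOP_RECORD))
    println(volumeLevel(3))
}
```

Unlike switch, a when branch never falls through to the next one, so no break statements are needed.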

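In the MessageConst.SERVER_ACTION_RETURN_RESULT branch above, the handler receives the result returned by the server and then does some simple parsing of the semantic data. A sample response: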

{
    "data": {
        "asr": {
            "result": "我要听三国演义",
            "speech_status": 0,
            "final": true,
            "status": 0
        },
        "nli": [
            {
                "desc_obj": {
                    "result": "正在努力搜索中，请稍等",
                    "status": 0
                },
                "semantic": [
                    {
                        "app": "musiccontrol",
                        "input": "我要听三国演义",
                        "slots": [
                            {
                                "name": "songname",
                                "value": "三国演义"
                            }
                        ],
                        "modifier": [
                            "play"
                        ],
                        "customer": "58df512384ae11f0bb7b487e"
                    }
                ],
                "type": "musiccontrol"
            }
        ]
    },
    "status": "ok"
}
   

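1) The type parsed from nli is musiccontrol. This is the app type returned by the grammar. The online audiobook demo only cares about the musiccontrol app type and ignores all others.

2) The text transcribed from the user's speech is read from result in asr.
3) In semantic under nli, the input value is what the user said, the same as result in asr.
modifier gives the requested action; here it is play, meaning playback is requested. The data in slots says that the song name is 三国演义 (Romance of the Three Kingdoms).
So the action is play and the content is the song name 三国演义. The demo then calls
mBookUtil.searchBookAndPlay(songName,0,0); which searches first and, once results come back, sends a play message to start playback. That completes the flow for "我要听三国演义" ("I want to listen to Romance of the Three Kingdoms").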
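The decision described in these steps can be sketched in plain Kotlin once the JSON has been read into objects. This is an illustration only, not the demo's actual code: Slot, Semantic, Command and toCommand are hypothetical names, and the JSON parsing itself (done with JSONObject in the handler) is left out so the sketch runs anywhere.

```kotlin
// Plain-Kotlin model of one entry in nli[0].semantic, mirroring the JSON fields.
data class Slot(val name: String, val value: String)
data class Semantic(
    val app: String,
    val input: String,
    val modifier: List<String>,
    val slots: List<Slot>
)

// Hypothetical result type: what the app should do next.
sealed class Command {
    data class Play(val songName: String) : Command()
    object Ignore : Command()
}

// Keep only the musiccontrol type, require the play modifier,
// then read the song name from the songname slot.
fun toCommand(nliType: String, semantic: Semantic): Command {
    if (nliType != "musiccontrol") return Command.Ignore
    if ("play" !in semantic.modifier) return Command.Ignore
    val songName = semantic.slots.firstOrNull { it.name == "songname" }?.value
    return if (songName != null) Command.Play(songName) else Command.Ignore
}

fun main() {
    val semantic = Semantic(
        app = "musiccontrol",
        input = "我要听三国演义",
        modifier = listOf("play"),
        slots = listOf(Slot("songname", "三国演义"))
    )
    println(toCommand("musiccontrol", semantic)) // Play(songName=三国演义)
}
```

In the demo, a Play result would correspond to the mBookUtil.searchBookAndPlay(songName,0,0) call, and Ignore to responses of any other app type.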

This is the semantic parsing used in the online audiobook app. For details, see the blog post: http://blog.csdn.net/ls0609/article/details/71519203