What I want to do

Try out Watson's Speech to Text by running the sample from the article below. (https://www.ibm.com/blogs/watson/2016/07/getting-robots-listen-using-watsons-speech-text-service/)

Background

I want to build a Raspberry Pi robot that converts the audio of a video into text in real time, so I am trying out Watson's speech recognition (Speech to Text).

As shown in the figure below, the final goal is speech recognition and transcription with Raspberry Pi 3 x Julius x Watson (Speech to Text). (http://qiita.com/nanako_ut/items/1e044eb494623a3961a5)

This time, I look into how to do speech recognition with Watson, which is part (4) of the figure below.

Environment

Raspberry Pi 3
Python 2.7.9

Prerequisites

The following are assumed to be ready:
- User registration with Watson (all services appear to be free of charge for one month after registration)
- A Speech to Text service created in Watson, with its credentials obtained

Note: I will write a separate post on how to create a Speech to Text service in Watson and obtain credentials.

Procedure

1. Connect with curl (audio file upload)
2. Connect with Python, part 1 (audio file upload)
3. Connect with Python, part 2 (real-time speech analysis over a WebSocket connection)

■ Connect with curl (upload audio file)

1.1 Upload audio file

Specify the audio file (test.wav) and upload it to Watson over an HTTP connection.

Note: for username:password, specify the user name and password from the credentials.

curl -X POST -u username:password --header "Content-Type: audio/wav" --header "Transfer-Encoding: chunked" --data-binary @test.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=ja-JP_BroadbandModel"
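
For reference, the same request can be assembled in Python with only the standard library. This is a minimal sketch and not part of the original steps: build_recognize_request is a hypothetical helper name, the endpoint and model are the ones from the curl command above, and the request is only built here, not sent. The Transfer-Encoding header is left out because the whole file is passed as a single body.

```python
import base64

try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request
except ImportError:  # Python 2.7, as used in this article
    from urllib import urlencode
    from urllib2 import Request

# Endpoint taken from the curl command above
RECOGNIZE_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"


def build_recognize_request(username, password, audio_bytes,
                            model="ja-JP_BroadbandModel"):
    """Build (but do not send) the POST request that the curl command issues.

    Hypothetical helper, for illustration only.
    """
    # curl's -u option becomes an HTTP Basic Authorization header
    token = base64.b64encode((username + ":" + password).encode("utf-8"))
    headers = {
        "Authorization": "Basic " + token.decode("ascii"),
        "Content-Type": "audio/wav",
    }
    url = RECOGNIZE_URL + "?" + urlencode({"model": model})
    # Passing data makes this a POST, like curl's --data-binary
    return Request(url, data=audio_bytes, headers=headers)
```

To actually send it, the returned object would be passed to urlopen (from the same urllib module), with the contents of test.wav read in binary mode as audio_bytes.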

1.2 Execution result

Something came back, but the characters are garbled. Is the Japanese recognition result in a different encoding (Shift_JIS?) from the UTF-8 used on the Raspberry Pi?
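
The cause of the garbling is not pinned down here. One way to narrow it down, assuming the response body is UTF-8 encoded JSON (an assumption, not something confirmed in this article), is to decode the raw bytes explicitly and re-serialize them without ASCII escaping. to_readable_json is a hypothetical helper name.

```python
import json


def to_readable_json(raw_bytes):
    """Decode a JSON response body as UTF-8 and pretty-print it unescaped."""
    # Decode explicitly instead of relying on the terminal's locale settings
    data = json.loads(raw_bytes.decode("utf-8"))
    # ensure_ascii=False keeps Japanese characters instead of \uXXXX escapes
    return json.dumps(data, ensure_ascii=False, indent=2)
```

If the curl output is saved to a file and its bytes decode cleanly this way, that would suggest the response itself is fine and that the terminal or locale settings are worth checking next.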

■ Connect with Python, part 1 (audio file upload)

Implemented with reference to the sample source in "Getting robots to listen: Using Watson's Speech to Text service".

2.1 Environment setup

Install watson-developer-cloud 0.23.0, the Python library for Watson.

Installing pip

This step is not required if pip is already installed. It was missing on the Raspberry Pi I am using, probably because I installed RASPBIAN JESSIE LITE on the Raspberry Pi 3.

$ python -m pip -V
/usr/bin/python: No module named pip

$ sudo apt-get install python-pip
Reading package lists... Done
Building dependency tree
~ (output omitted) ~

$ python -m pip -V
pip 1.5.6 from /usr/lib/python2.7/dist-packages (python 2.7)

Updating pip

$ sudo pip install -U pip
  Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
  Found existing installation: pip 1.5.6
    Not uninstalling pip at /usr/lib/python2.7/dist-packages, owned by OS
Successfully installed pip
Cleaning up...

$ python -m pip -V
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)

Installing watson-developer-cloud

$ sudo pip install --upgrade watson-developer-cloud
Collecting watson-developer-cloud
  Downloading watson-developer-cloud-0.23.0.tar.gz (52kB)
~ (output omitted) ~
Successfully installed pysolr-3.6.0 requests-2.12.5 watson-developer-cloud-0.23.0

2.2 Program

Copied from the referenced site.

Note: test1.wav is a recording of English speech.

`watson_test1.py`


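import json

from watson_developer_cloud import SpeechToTextV1

# Replace with the username and password from the service credentials
stt = SpeechToTextV1(username="username", password="password")

# Upload the WAV file and print the recognition result as formatted JSON
with open("test1.wav", "rb") as audio_file:
    print(json.dumps(stt.recognize(audio_file, content_type="audio/wav"), indent=2))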

2.3 Execution

Something came back, and it looks like text is being returned. However, the recording was longer than this, and the text is cut off partway through.

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.438,
          "transcript": "so we know it's coming Julio just say yeah lost me grow mandatory right here shone like a great kid fifth grader etan Allemand planning his fifth critics "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
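
To pull just the text out of a response shaped like the one above, something along these lines could be used. This is a sketch under assumptions: best_transcripts is a hypothetical helper name, and it simply takes the first entry of each "alternatives" list, matching the structure of the output shown here.

```python
def best_transcripts(response):
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]
            # Transcripts in the output above end with a trailing space
            pairs.append((top["transcript"].strip(), top.get("confidence")))
    return pairs


if __name__ == "__main__":
    # Shortened copy of the response printed above
    sample = {
        "results": [
            {
                "alternatives": [
                    {"confidence": 0.438, "transcript": "so we know it's coming "}
                ],
                "final": True,
            }
        ],
        "result_index": 0,
    }
    for text, confidence in best_transcripts(sample):
        print("%s (confidence: %s)" % (text, confidence))
```

In watson_test1.py the dictionary returned by stt.recognize(...) would be passed in place of the sample.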

■ Connect with Python, part 2 (real-time speech analysis over a WebSocket connection)

It seems that speech can be analyzed in real time by using something called WebSocket.

3.1 What is WebSocket?

According to html5rocks (https://www.html5rocks.com/ja/tutorials/websockets/basics/):

The WebSocket specification defines an API that establishes a "socket" connection between a web browser and a server. Simply put, there is a persistent connection between the client and the server, and either side can start sending data at any time.

According to @IT (http://www.atmarkit.co.jp/ait/articles/1111/11/news135.html), HTML5 adds a new communication standard called "WebSocket". Its features:

- Once a connection is established between the server and the client, data can be exchanged over the socket without having to think about the communication procedure, until the connection is explicitly closed.
- A server and all the clients that hold a WebSocket connection to it can share the same data and send and receive it in real time.
- With conventional communication techniques, an HTTP header is added to every exchange, so each connection generates a small amount of extra traffic on top of the data itself and consumes resources.
- With WebSocket, the client sends a handshake request on the first connection so that the connection can keep being used. The server returns a handshake response, and from then on that single connection stays in use.

I see...
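
To make the handshake a little more concrete: in RFC 6455 the client sends a random Sec-WebSocket-Key header, and the server proves it understood the request by returning a Sec-WebSocket-Accept value derived from that key. The sketch below computes that value. It comes from the RFC rather than from the articles quoted above, accept_key is a hypothetical helper name, and WebSocket libraries normally do this internally.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    # SHA-1 of the key concatenated with the GUID, then Base64-encoded
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from RFC 6455, section 1.3
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this one exchange, both sides keep the same connection open and can send frames at any time, which is the property the real-time analysis relies on.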

3.2 Environment setup

Install the ws4py library for WebSocket.

$ sudo pip install ws4py
Collecting ws4py
  Downloading ws4py-0.3.5-py2-none-any.whl (40kB)
    100% |████████████████████████████████| 40kB 661kB/s
Installing collected packages: ws4py
Successfully installed ws4py-0.3.5

3.3 Program