Of all the ones I have tried, I prefer Google's. The problem is that, to use it, we have to save what we say to a WAV file, convert it to FLAC, and send it through the API using a browser (quite a hassle).
Basically, the library encodes the sample into FLAC using a third-party FLAC library, then issues a request to "https://www.google.com/speechapi/v1/recognize?xjerr=1&client=chromium&lang=en-US" with a Content-Type header specifying the format (FLAC) and bit rate (8 kbit).
http://www.codeproject.com/Articles/338010/Fun-with-Google-SpeechRecognition-service
Introduction
I was excited to discover open web services like the ones Google offers, and it was amazing to hear about Google speech recognition.
In this article, I share some tips for using the Google speech recognition API in a Windows application, recording voice directly from the audio input devices. And, like a delicious spice, I turn this simple speech recognition program into a utility for quickly adding issues to a Redmine project.
Background
The basic idea was: you push a button, a timer starts together with the wave-in device opening, the main loop starts and the PCM data holding your voice is written from the buffers to a file; then the timer stops and the audio file is posted to Google for recognition.
The first task was understanding FLAC encoding in real time. You may say: 'In *nix, I could write a couple of commands in a terminal and do it all: record, encode, post the FLAC file, and receive the answer from the server. So why not encode the file with an encoder program started after recording the wave file?' Because that is boring. Just imagine: your program writes an already prepared FLAC audio file!
Since the time I wrote an application for batch converting MP3 files to OGG/Vorbis, I have kept a library that can encode PCM to Vorbis in real time; it also contained a ring buffer for that.
At that point, an equivalent handler for FLAC was not far off. You may know that Google accepts FLAC at 16 kHz, 16 bits per sample, in 1-channel (mono) format. Using an example from libFLAC, I added three functions: InitialiseEncoder, ProcessEncoder, and CloseEncoder, which respectively open the file and prepare the encoder, feed 16-bit PCM samples to the encoder, and close the file and destroy the encoder. One thing I don't understand: why can't it add metadata to the FLAC file? Maybe a charset problem?
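The three-function lifecycle just described can be sketched as follows. This is only an illustration of the open/feed/close pattern, in Java for consistency with the later listings; the PcmEncoder class is hypothetical and writes raw little-endian PCM rather than real FLAC, standing in for a libFLAC-backed encoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical encoder illustrating the InitialiseEncoder /
// ProcessEncoder / CloseEncoder lifecycle from the article.
// It writes raw 16-bit little-endian PCM, not real FLAC.
class PcmEncoder {
    private OutputStream out;

    // InitialiseEncoder: open the sink and prepare the encoder state
    void initialiseEncoder(OutputStream sink) {
        this.out = sink;
    }

    // ProcessEncoder: feed a buffer of 16-bit PCM samples
    void processEncoder(short[] samples, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            out.write(samples[i] & 0xFF);         // low byte
            out.write((samples[i] >> 8) & 0xFF);  // high byte
        }
    }

    // CloseEncoder: flush and release the encoder
    void closeEncoder() throws IOException {
        out.flush();
        out.close();
    }
}

public class EncoderDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PcmEncoder enc = new PcmEncoder();
        enc.initialiseEncoder(sink);
        enc.processEncoder(new short[]{0, 1000, -1000, 32767}, 4);
        enc.closeEncoder();
        System.out.println(sink.size()); // 4 samples * 2 bytes = 8
    }
}
```

A real implementation would replace the body of processEncoder with calls into the FLAC library, keeping the same three-call shape.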
The wonderful article WaveLib includes a wave-in API implementation that uses a Recorder class: it starts the WaveInRecorder and, in parallel, uses a thread to transmit the PCM data to the encoder.
File Uploading
The basic upload function posts the encoded FLAC file to the recognition URL; change the lang parameter as needed.
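The shape of that upload request can be sketched like this (in Java rather than the article's C#; the endpoint and parameters are the ones quoted earlier in the text, and nothing is actually sent until the connection's output stream is used):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class UploadSetup {
    public static void main(String[] args) throws Exception {
        // Recognition endpoint from the article; change lang as needed
        URL url = new URL("https://www.google.com/speechapi/v1/recognize"
                + "?xjerr=1&client=chromium&lang=en-US");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setDoOutput(true);      // we will POST a body
        connection.setRequestMethod("POST");
        // Tell the server the body is FLAC sampled at 16 kHz
        connection.setRequestProperty("Content-Type", "audio/x-flac; rate=16000");
        System.out.println(connection.getRequestMethod());
        System.out.println(connection.getRequestProperty("Content-Type"));
    }
}
```

The rate value must match the sample rate the recorder actually used, or recognition quality drops sharply.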
Issue Creating
In which cases can you use speech recognition? Maybe for creating issues? It may not be practical, but it is certainly fun.
The Redmine web application includes a REST web service. Through it, we can create as many issues as we need; just specify the project and the tracker. By the way, I could only get the list of trackers starting with version 1.3.*.
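The payload for creating an issue through Redmine's REST interface looks roughly like this (a sketch, in Java for consistency with the other listings; the project and tracker IDs are example values, and a real request would POST this XML to /issues.xml with an API key):

```java
public class RedmineIssueBody {
    // Build the XML payload for POST /issues.xml (Redmine REST API).
    // projectId and trackerId say where the issue goes; subject
    // carries the recognized text.
    static String issueXml(int projectId, int trackerId, String subject) {
        return "<issue>"
                + "<project_id>" + projectId + "</project_id>"
                + "<tracker_id>" + trackerId + "</tracker_id>"
                + "<subject>" + subject + "</subject>"
                + "</issue>";
    }

    public static void main(String[] args) {
        System.out.println(issueXml(1, 2, "Fix the login page"));
    }
}
```

A production version would escape the subject text for XML before embedding it.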
Points of Interest
When it was done, I noticed the recording timeout: it gives you 4 seconds for your speech. That may not be enough for every utterance, so maybe the form needs a stop button?
The ring buffer saves you from data loss when recording directly to FLAC like this: as the data comes from the wave-in device, it goes into the ring buffer.
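A minimal ring buffer of the kind described might look like this (a sketch; the class name and capacity are my own, and a real one would synchronize the wave-in callback against the encoder thread):

```java
// Minimal single-threaded ring buffer sketch; a real one would
// synchronize the producer (wave-in callback) and consumer (encoder).
class RingBuffer {
    private final byte[] buf;
    private int head, tail, size;

    RingBuffer(int capacity) { buf = new byte[capacity]; }

    // Returns the number of bytes actually written (drops the rest when full)
    int write(byte[] data, int len) {
        int written = 0;
        while (written < len && size < buf.length) {
            buf[tail] = data[written++];
            tail = (tail + 1) % buf.length;
            size++;
        }
        return written;
    }

    // Returns the number of bytes actually read
    int read(byte[] out, int len) {
        int read = 0;
        while (read < len && size > 0) {
            out[read++] = buf[head];
            head = (head + 1) % buf.length;
            size--;
        }
        return read;
    }
}

public class RingBufferDemo {
    public static void main(String[] args) {
        RingBuffer rb = new RingBuffer(4);
        int w = rb.write(new byte[]{1, 2, 3, 4, 5}, 5); // only 4 fit
        byte[] out = new byte[4];
        int r = rb.read(out, 4);
        System.out.println(w + " " + r); // 4 4
    }
}
```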
History
License
Introduction
As I already mentioned in my article A low-level audio player in C#, there are no
built-in classes in the .NET framework for dealing with sound. This holds true not
only for audio playback, but also for audio capture.
It should be noted, though, that the Managed DirectX 9 SDK does include
classes for high-level and low-level audio manipulation. However, sometimes
you don't want your application to depend on the full DX 9 runtime just to do basic sound playback and capture, and there are also some areas where Managed DirectSound doesn't help at all (for example, multi-channel sound playback and capture).
Nevertheless, I strongly recommend using Managed DirectSound for sound playback and capture unless you have a good reason not to.
This article describes a sample application that uses the waveIn and waveOut APIs in C# through P/Invoke to capture an audio signal from the sound card's input, and play it back (almost) at the same time.
catch
{
Stop();
throw;
}
}
The WaveInRecorder constructor takes five parameters. Except for the last
parameter, their meaning is the same as in WaveOutPlayer.
The first parameter is the ID of the wave input device that you want to use. The
value -1 represents the default system device, but if your system has more than
one sound card, then you can pass any number from 0 to the number of
installed sound cards minus one, to select a particular device.
The second parameter is the format of the audio samples.
The third and fourth parameters are the size of the internal wave buffers and the number of buffers to allocate. You should set these to reasonable values: smaller buffers give you lower latency, but the captured audio may have gaps if your computer is not fast enough.
The fifth and last parameter is a delegate that is called periodically as the internal audio buffers fill with captured data. In the sample application, we just write the captured data to the FIFO.
Similarly, the Filler method is called every time the player needs more data. Our implementation just reads the data from the FIFO.
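The recorder-callback/filler pair can be sketched like this (in Java instead of the article's C#; the in-memory deque is a stand-in for WaveLib's FIFO class, and the method names are my own):

```java
import java.util.ArrayDeque;

public class FifoDemo {
    // Stand-in for WaveLib's FIFO: bytes written by the recorder
    // callback and read back by the player's filler.
    static final ArrayDeque<Byte> fifo = new ArrayDeque<>();

    // Called when an input buffer is full of captured data
    static void recorderCallback(byte[] data, int size) {
        for (int i = 0; i < size; i++) fifo.addLast(data[i]);
    }

    // Called when the player needs more data; zero-fill on underrun
    static void filler(byte[] data, int size) {
        for (int i = 0; i < size; i++) {
            data[i] = fifo.isEmpty() ? 0 : fifo.removeFirst();
        }
    }

    public static void main(String[] args) {
        recorderCallback(new byte[]{10, 20, 30}, 3);
        byte[] out = new byte[4];
        filler(out, 4); // fourth byte is zero-filled (underrun)
        System.out.println(out[0] + " " + out[3]); // 10 0
    }
}
```

Zero-filling on underrun keeps playback running as silence rather than stalling the player.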
Conclusion
# curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speechapi/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F myfile="@C:\input.flac" -k -o "C:\output.txt"
It works excellently! Just a few notes:
1) when copying and pasting, watch out for different quote characters
2) make sure that rate=16000 matches the actual sample rate (set it in Audacity before recording)!
3) I had somewhat better results recording in mono
Does anyone have anything about:
* Expect: 100-continue
I am missing these milliseconds waiting for it...
package test;

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// The class and main declarations were lost in the original listing;
// the names below are reconstructed.
public class Test {
    public static void main(String[] args) throws Exception {
        // Read the recorded FLAC sample from disk
        Path path = Paths.get("C:\\Users\\CDAC\\Downloads\\priyanka.flac");
        byte[] data = Files.readAllBytes(path);

        // The host part of the URL was lost in the original listing;
        // "https://www.google.com/" is assumed from the endpoint quoted
        // earlier in the article
        URL url = new URL("https://www.google.com/" +
                "speech-api/v1/recognize?" +
                "xjerr=0&client=speech2text&lang=en-US&maxresults=20");

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setDoOutput(true);
        connection.setDoInput(true);
        connection.setInstanceFollowRedirects(false);
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type", "audio/x-flac; rate=16000");
        connection.setRequestProperty("User-Agent", "speech2text");
        connection.setConnectTimeout(60000);
        connection.setUseCaches(false);

        // Send the raw FLAC bytes as the POST body
        DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
        wr.write(data);
        wr.flush();
        wr.close();

        System.out.println("Done");

        // Read and print the recognition response line by line
        BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()));
        String decodedString;
        while ((decodedString = in.readLine()) != null) {
            System.out.println(decodedString);
        }
        in.close();
        connection.disconnect();
    }
}