The State Of Linux Voice Recognition

Introduction

I spend a lot of time researching topics, and quite often I think of an angle for an article whilst walking to the train station or when I am out and about generally.

The other evening, whilst walking the 1.5 kilometres to the station from my workplace, I thought "wouldn't it be good if I could record what I wanted to say and have it automatically transcribed into a text file that I could edit and tidy up later".

I have spent many long hours looking at the different options available for voice recognition and dictation, including recording directly into the microphone using Linux dictation software, recording a file in MP3 or WAV format and converting it from the command line, and using Chrome and Android apps.

This article highlights my findings after days of hard work.

Linux Options

Finding dictation and voice recognition software for Linux isn't as easy as it might be, and the options that are available aren't the most intuitive.

This Wikipedia page has a list of possible options, including CMU Sphinx, Julius and Simon.

I am using SparkyLinux, which is based on Debian Testing, at the moment and I can tell you that the only voice recognition package available in the repositories is Sphinx.

The native Linux programs I ended up trying were PocketSphinx, which I used to convert WAV files to text, and Freespeech-VR, which is a Python application that lets you record directly from the microphone.

I also tried a couple of Chrome apps, including VoiceNote II and Dictanote.

Finally I tried the "Dictation and Email" and "Talk and Talk Dictation" Android apps.

Freespeech-VR

Freespeech-VR isn't available in the standard repositories. I downloaded the files from here.

After downloading and extracting the contents of the zip file I opened a terminal and navigated to the folder where the files had been extracted.

I typed the following command to launch freespeech-vr.

sudo python freespeech-vr

I have a headset with a decent microphone and a clear southern English accent.

The following text appeared in the freespeech-vr window:

Welcome to the unit of dogs of the result today And The Certainty Of The Way Of Handling Tests That Should Be Tested If Sending Using The System The Way Of Speaking By One Yourself Ended Only By Trust Of Staying And The Means Of One Golden chickens as a program Ea where I am called the next inch calls the phone This file Quickly enough cases on the phone to Hands- The space that sphinx Goes That it is not phones will be shared A trained and tools Use speaking When you have finished Say the file used Last story A And using Where how does it happen that this succeeded This Linux was like do you avoid it

I would like to say now that this is not the Unit Of Dogs website and I never said anything to do with golden chickens. I was actually trying to explain the process of using voice recognition software.

I tried the software a few times, varying my pitch and speed, but the accuracy remained poor.

PocketSphinx

PocketSphinx can take a WAV file and convert it to text from the command line.

PocketSphinx is available in the Debian repositories and should be available for most distributions.

The main problem I found with PocketSphinx is that you almost need a degree in the concepts of voice recognition, language files, dictionaries and how to train the system.

After installing PocketSphinx you should go to the CMU Sphinx website and read as much information as possible. You need to download the following model file.

(If you are not a native English speaker, choose the appropriate language model.)

The documentation for PocketSphinx, and Sphinx in general, is hard for a layperson to understand, but from what I can make out the dictionary files provide the list of known words together with their pronunciations, and the language models describe which sequences of words are likely.

To test PocketSphinx I used a recording of my own voice, a clip of Al Pacino in "The Devil's Advocate" and a clip of Morgan Freeman. The point was to try out different voices, and for me nobody tells a story as clearly as Morgan Freeman and nobody delivers a line like Al Pacino.

For PocketSphinx to work it needs a WAV file, and the file needs to be in a specific format. If your file is in MP3 format, use the following ffmpeg command to convert it to WAV:

ffmpeg -i inputfilename.mp3 -acodec pcm_s16le -ar 16000 outputfilename.wav
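Before handing a converted file to PocketSphinx it can be worth checking that it really has the expected format. The sketch below is only an illustration and uses Python's standard `wave` module. The 16 kHz rate and 16-bit sample width mirror the `-ar 16000` and `pcm_s16le` options in the ffmpeg command above. The mono check is an extra assumption on my part (the command above does not set a channel count; ffmpeg's `-ac 1` option would do that). The function name `wav_format_problems` is made up for this example.

```python
import wave

# Assumed target format. Rate and sample width follow the ffmpeg command above;
# the single-channel (mono) requirement is an assumption of this sketch.
EXPECTED_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1


def wav_format_problems(path):
    """Return a list of format mismatches; an empty list means the file looks fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE} Hz"
            )
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected {EXPECTED_SAMPLE_WIDTH}"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"file has {wav.getnchannels()} channels, expected {EXPECTED_CHANNELS}"
            )
    return problems
```

Calling `wav_format_problems("outputfilename.wav")` returns an empty list when the file matches, or a list of short descriptions of what differs.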

To run PocketSphinx use the following command:

pocketsphinx_continuous -dict /usr/share/pocketsphinx/model/lm/en_US/cmu07a.dic -infile voice2.wav -lm cmusphinx-5.0-en-us.lm 2> voice2.log

pocketsphinx_continuous takes a WAV file and converts it to text.

In the command above, pocketsphinx is told to use the dictionary file "/usr/share/pocketsphinx/model/lm/en_US/cmu07a.dic" with the language model "cmusphinx-5.0-en-us.lm". The file being converted to text is voice2.wav (a recording I made of my own voice). Finally, the 2> sends all the verbose output that you don't really need to a file called voice2.log. The actual results of the test are displayed in the terminal window.
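If you want to run the same conversion from a script rather than by hand, something along these lines should work. It is a rough sketch, not an official wrapper: it uses only the `-dict`, `-infile` and `-lm` options from the command above, and the function names `build_pocketsphinx_command` and `transcribe` are invented for illustration. It assumes `pocketsphinx_continuous` is on your PATH.

```python
import subprocess


def build_pocketsphinx_command(dict_path, lm_path, wav_path):
    """Build the argument list for the pocketsphinx_continuous call shown above."""
    return [
        "pocketsphinx_continuous",
        "-dict", dict_path,
        "-infile", wav_path,
        "-lm", lm_path,
    ]


def transcribe(dict_path, lm_path, wav_path, log_path):
    """Run the recogniser and return whatever it prints on standard output.

    The verbose diagnostics on standard error are written to log_path,
    which mirrors the `2>` redirection in the shell command.
    """
    command = build_pocketsphinx_command(dict_path, lm_path, wav_path)
    with open(log_path, "w") as log_file:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=log_file,
            text=True,
            check=True,  # raise CalledProcessError if the tool exits with an error
        )
    return result.stdout
```

Keeping the command construction separate from the call that runs it makes it easy to print the argument list and compare it with what you would type in the terminal.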

The results using my own voice were as follows:

welcome to the following about this week with a lesson about which recognition software by the minute

The results aren't as outlandish as the freespeech-vr ones, but they aren't really usable either. I then tried PocketSphinx with the Al Pacino clip, but this returned no results at all.

Finally I tried the voice of Morgan Freeman from the movie "Bruce Almighty", and here are the results:

000000000: we will be to him
000000001: is all that yeah hard of the day for now now is where we live most i am a hot friend
000000002: in the elevator who is the small key of the baseball hour or knowing what you must do in life
000000003: which are those that will heal
000000004: they didn't write
000000005: they have it for me
000000006: you have to be the rules
000000007: i'm waiting for you
000000008: and learned here that it was pictures it was the killer's christmas party
000000009: comes another way of writing o. donkey i thought that few are always dressed
000000010: as a united problem it won't give good that i am limited to them at that time when we are not all that thinks i am in the world i will make houses and see that
000000011: the father who has it
000000012: which is more about this
000000013: that is given
000000014: everything else that doesn't fall down much
000000015: right in the autumn
000000016: i'm holding on just fine
000000017: not happy when i think they will be that what will make everything married in cha there was nothing we like unlike the way
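The numbered lines above all have the same shape: a nine-digit counter, a colon, then the recognised text. If you save the terminal output, a small script can strip the counters and join the pieces into one block of text for editing. This is a sketch based only on the layout visible in the listing above. The nine-digit pattern is inferred from that listing rather than from any documentation, and the function names are made up for this example.

```python
import re

# Matches lines such as "000000003: some recognised words".
# The nine-digit width is an assumption taken from the listing above.
UTTERANCE_LINE = re.compile(r"^(\d{9}):\s*(.*)$")


def parse_utterances(output_text):
    """Return (utterance_number, text) tuples, ignoring lines with another shape."""
    utterances = []
    for line in output_text.splitlines():
        match = UTTERANCE_LINE.match(line.strip())
        if match:
            utterances.append((int(match.group(1)), match.group(2)))
    return utterances


def join_transcript(utterances):
    """Join the non-empty utterance texts into one space-separated string."""
    return " ".join(text for _, text in utterances if text)
```

For example, `join_transcript(parse_utterances(saved_output))` gives you a single line of text that is ready to paste into an editor and correct.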

My testing cannot be considered scientific, and the PocketSphinx developers might say that I am not using the software correctly. There is also a process called voice training which can be used to build better dictionaries and language files.

My main conclusion, however, is that it is too difficult for ordinary everyday use.

VoiceNote II

VoiceNote II is a Chrome app which uses the Google voice recognition API.

If you are using the Chrome or Chromium browser you can install VoiceNote II from the Web Store.

The VoiceNote II interface is laid out in an unusual way: you set the language at the bottom of the window and the edit button is also at the bottom, but the record button is in the top right corner.

The first thing you need to do is choose the language, which you can do by clicking the globe icon.

To start recording, click the microphone icon and start speaking into your microphone. I found that speaking slowly was the key to getting the best results, as it gives the software a chance to keep up.

The results were not great, as can be seen below:

Hello and welcome to connect. About.com today's topics about voice to text conversion economy conversion of dreelm farrell by the year 2008 as conversion and it is said to be well supported the best way to get addon text voice to show the 2014debian package or rpm open voice type in speech to open it if it wants to choose You chose edinburgh French of Germany finds time in a single agreement of the microphone when you have finished writing your text as a text file to work best is the most standard English accent of the south of england best but i go to that text of torrentalong with the actual document and you can see by the errors it makes you listen

Dictanote

Dictanote is another Chrome app that can be used for dictation and it comes across as much more sophisticated, but the results were no better than those from VoiceNote II.

I only used the demo version of Dictanote, which prevents you from creating new documents but lets you dictate over text that is already in the editor. I was able to test the voice recognition, but the results were no better than VoiceNote II, so I didn't sign up for the pro version.

Dictation And Mail

"Dictation And Mail" is an Android application that uses the Google voice recognition API.

The results from "Dictation and Mail" were better than those from any other program I had tried up to this point.

Hello welcome to Linux about., today we are talking about converting audio to text

The trick with "Dictation and Mail" is to speak slowly, to enunciate as clearly as you can and to keep to an even tone.

After you have finished speaking you can email the results.

Talk And Talk Dictation

The other Android app that I tried was "Talk And Talk Dictation".

The interface of this app was the best of the bunch and the voice recognition worked really well. After recording the dictation I was able to share the results in various ways, including by email.

welcome to linux about.com today we are talking about converting speech to text

As you can see, the text above is about as accurate as you could expect to get. Speaking slowly is the key.

Summary

Native Linux has some way to go with regards to voice recognition, and dictation in particular. There are some applications that use the Google Voice API, but they aren't yet available in the repositories.

The ChromeOS apps are somewhat better, but the best results were achieved using my Android phone. Perhaps the phone has a better microphone, which gives the voice recognition software a better chance of converting the speech.

For voice recognition to be truly usable it needs to be much more accurate, with minimal setup required. You shouldn't need to mess around with language models and dictionaries to make it work.

I appreciate that the whole art of voice recognition is very challenging, because everybody has a different voice and there are so many dialects from region to region within a single country, never mind the hundreds of languages used across the world.

Therefore, my assessment is that voice recognition software is still a work in progress.