Showing posts with label google summer of code. Show all posts

Wednesday, July 16, 2014

Back to India for Patch Review


Back to India and work !

After having a word with the mentor, I have already submitted a raw patch, so that it can be tested by others and I can get some early feedback.
You can find the patch here.

Regarding the to do's,

-> Changing the volume, pitch and rate is working. For the volume part, I had to modify nsISpeechService.idl so that a volume parameter can also be passed to the speak() function.

-> The boundary events give a correct timestamp now. I have used a built-in Windows function for this.

-> I have tested the Pause and the Resume button. They are working perfectly.
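
To make the volume, pitch and rate item a bit more concrete, here is a minimal sketch of how the Web Speech API values could be translated into the ranges SAPI works with. The Web Speech API uses 0 to 1 for volume, 0.1 to 10 for rate and 0 to 2 for pitch, while SAPI uses 0 to 100 for volume and -10 to 10 for rate and pitch adjustments. The function names and the mapping formulas (especially the logarithmic one for rate) are assumptions made for illustration, not necessarily what the patch does.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def to_sapi_volume(volume):
    """Web Speech volume (0.0 to 1.0) -> SAPI volume (0 to 100)."""
    return round(clamp(volume, 0.0, 1.0) * 100)


def to_sapi_rate(rate):
    """Web Speech rate (0.1 to 10, 1 is normal) -> SAPI rate (-10 to 10, 0 is normal).

    Assumption: a logarithmic mapping, so that 1.0 lands on 0 and the two
    extremes land on -10 and 10. Other mappings are equally possible.
    """
    return round(10 * math.log10(clamp(rate, 0.1, 10.0)))


def to_sapi_pitch(pitch):
    """Web Speech pitch (0 to 2, 1 is normal) -> SAPI pitch adjustment (-10 to 10).

    Assumption: a simple linear mapping around the default pitch of 1.
    """
    return round((clamp(pitch, 0.0, 2.0) - 1.0) * 10)
```

Out-of-range input is clamped first, so a page asking for a volume of 2 would simply get the loudest setting.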

Aim for the next week :

-> To test the cancel functionality

-> To make Sapi concurrent, so that we can have two browser windows using it independently.

Both of these will require a lot of testing.

The plan is to complete the work for Windows in the next two weeks, so that I can jump to the next platform.

Hopefully, by the next post, I will have completed all the major functionality.

Stay tuned !

And, I have received the mid term payment :)

Wednesday, June 25, 2014

Mid-term is here !


After running the first version of the Sapi Service, it has been comparatively less challenging to add more functions to it. I have added the following this week :-

-> speak function is working for each call now.

-> I have added the following events support -> start/end of speech, word and sentence boundary.

-> The stop/pause/resume/interrupt functions for the speech.

-> Selecting a specific voice for the speech

All of these functions have been tested here : http://itsyash.github.io/webspeechdemos/
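
For readers unfamiliar with the speech events, here is a small sketch of the order in which a page would see them for one utterance: a start event, a boundary event for each sentence and word (identified by the character index where it begins), and an end event. The function name, the tuple layout and the regular expressions used to find boundaries are assumptions for illustration only. A real engine such as SAPI reports the boundaries itself while speaking.

```python
import re


def utterance_events(text):
    """Yield (event_type, name, char_index) in the order a listener would see them."""
    # Assumption: a sentence starts at the first non-space character of the
    # text, or at the first non-space character after '.', '!' or '?'.
    sentence_starts = {
        m.start(1) for m in re.finditer(r"(?:^\s*|[.!?]\s+)(\S)", text)
    }
    yield ("start", None, 0)
    for word in re.finditer(r"\S+", text):
        if word.start() in sentence_starts:
            yield ("boundary", "sentence", word.start())
        yield ("boundary", "word", word.start())
    # The index reported with the end event is illustrative.
    yield ("end", None, len(text))
```

Calling `list(utterance_events("Hi there. Bye now."))` gives one sentence boundary at index 0 and another at index 10, with a word boundary for each of the four words in between the start and end events.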

The to-do’s for the next week would be:



Adding the support to change properties of the speech such as Pitch, Rate and Volume.
And then, after testing each functionality thoroughly, submit the patch for reviewing.

This week is also the mid term week. I am running a bit behind schedule, but I have almost completed the Speech Adapter for Windows, which was the goal for the mid term evaluation. The delay has been due to the time invested in learning the Windows API.


And.., Mount Fuji awaits this weekend. Will update more in the next post.
P.S. Tokyo has awesome weather.

Stay tuned for more :)

Sunday, June 15, 2014

Tokyo, Mac and Mozilla !


Yes, I am in Tokyo currently and will be staying till July. This is a place where you can get Beer from the vending machines on the streets :D
I have been between places for the past few days, so have not been regular in posting the updates on my project. But I promise I will be regular from now.

Coming to the project, after spending hours on it, I have finally managed to make significant progress. It has been quite challenging to dive simultaneously into the completely new Windows APIs and the Mozilla codebase, and then to integrate the two. But the support of this awesome community is what drives me to get through every hurdle I face.

I have a very basic version of Sapi Synthesis Service running on my system :D .

The current status of the service is that it supports the speak() and getVoices() (details about each voice) methods, and some other minor methods as well. The speak() method currently works only the first time it is called, as I haven't implemented the nsISpeechTask and nsISpeechTaskCallback interfaces yet.
That would be the first to-do in the coming week.

I have also committed all the speech API functionality to my Sapi Git repository here, for reference.

Next to do's would be:
-> completely working speak() function
-> handling the pitch, rate etc. qualities
-> adding the tts events.

Also, I will soon be writing about my experience of writing the code so far and the difficulties I have been facing, so that it serves as a good reference for others.

Keep an eye out for the next post :)

Monday, May 12, 2014

Mozilla on the New Mac :)

Hi,

After going through the Pico Service, the previous week had two major tasks :

> Testing Pico on my Linux machine

> Setting up SAPI on Windows

Regarding the first part, I have tested Pico on my Linux machine. Here are the steps if anyone wants to try :
Install Pico on your Linux machine. It's a simple one-liner. After that:

1. Set the LD_LIBRARY_PATH environment variable to wherever your libttspico.so is.
2. Set the PICO_LANG_PATH environment variable to wherever the language files are.

3. Add "ac_add_options --enable-synth-pico" to your .mozconfig, in the root directory.
4. Build Firefox again and then set "media.webspeech.synth.enabled" to true in about:config.
This preference is not present by default in Firefox, so you have to create it. I have created Bug 1007834 so that the preference is present by default, and it will soon be included.
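
Steps 1 to 3 can also be scripted. The following is a minimal sketch, assuming hypothetical install locations such as /opt/pico/lib and /opt/pico/lang (use wherever libttspico.so and the language files actually live on your machine). The helper names are made up for this example. Step 4 still has to be done by hand in about:config.

```python
import os

# The build option from step 3.
MOZCONFIG_OPTION = "ac_add_options --enable-synth-pico"


def pico_environment(lib_dir, lang_dir, base_env=None):
    """Return a copy of the environment with the two Pico variables set."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    # Put the Pico library directory first, keeping any existing entries.
    env["LD_LIBRARY_PATH"] = (
        lib_dir if not existing else lib_dir + os.pathsep + existing
    )
    env["PICO_LANG_PATH"] = lang_dir
    return env


def with_pico_option(mozconfig_text):
    """Return the .mozconfig text with the Pico option present exactly once."""
    lines = mozconfig_text.splitlines()
    if MOZCONFIG_OPTION not in (line.strip() for line in lines):
        lines.append(MOZCONFIG_OPTION)
    return "\n".join(lines) + "\n"
```

The dictionary returned by `pico_environment` can be passed as the `env` argument of `subprocess.run` when launching the freshly built Firefox, and `with_pico_option` can be applied to the contents of .mozconfig before rebuilding.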


So, these are the steps that you need to follow. After that, you can test the api here.
You can also test this page with other browsers that support TTS, and file bugs if you find any.
The second part was to get started with SAPI, the Microsoft Speech API. The API provides a high-level interface between the application and speech engines. It implements all the low-level details needed to control and manage the real-time operations of various speech engines.
So, I began with setting up SAPI on my Windows 8 machine and have started with some basic programs to get familiar with it, which I'll be updating here : MS-SAPI-demo.
I haven't done any Windows programming till now, so it will take some extra effort and time to get familiar with COM (Component Object Model) and the Win32 API, which are needed to get the most out of this API.

That's it for this week, and btw ! I got myself a new Mac :D

Cheers ! :)


Wednesday, May 7, 2014

Preparing to Code !


The current status is that we already have the Pico Speech Synthesis Service on Gonk, i.e. the Firefox OS devices already have synthesis via the Pico engine.

After discussing with my mentor, Eitan Isaacson, my first step in the project was to study the implementation of the Pico service, to get inspiration for the future work.

After spending some time on the service, I have understood the basic workflow of the process. In this post, I will be explaining, or rather documenting, the same, so that it helps in the future.


For any OS, desktop support should be implemented as nsISpeechService and nsISpeechTaskCallback. When speak() is called on that interface, it is provided with an nsISpeechTask object that has methods for doing all the things that we would want to do.

Following are the Pico specific classes:

nsPicoService :> our main service, subclasses nsISpeechService.


PicoCallbackRunnable :> a runnable class that subclasses nsISpeechTaskCallback.

Helper Classes : 


PicoApi :> acts as a wrapper for us and directly interacts with the pico library.

PicoVoice :> handles the voices


PicoSynthDataRunnable :> a runnable class that is used to send the synthesized data back to the browser.


The functions of all the classes are defined in the following workflow:

The browser calls the speak method of the nsPicoService, with a reference to the nsISpeechTask object, along with four other parameters : the text to utter, a unique voice identifier, the speech rate and the pitch.


The service then instantiates a new PicoCallbackRunnable object by passing all these parameters, along with itself, and obtains a reference to that object.

Then, the PicoCallbackRunnable is executed on a new worker thread. In this process, all the text is fed to the engine in buffers of specified size and the output data from the engine is received in chunks.


These chunks are then sent to the DispatchSynthDataRunnable method. This method wraps each chunk in a PicoSynthDataRunnable object.
This runnable is then executed on the main thread again, and it sends the synthesized data back to the browser, using the functions of the nsISpeechTask object passed to it.
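
The threading pattern in this workflow can be sketched in a few lines of Python. This is only an illustration of the idea (worker thread produces chunks, main thread delivers them), not the Mozilla code: `fake_engine`, `SpeechTask`, `synthesize_on_worker` and the buffer size of 8 are all hypothetical stand-ins, and a simple queue plays the role of dispatching runnables to the main thread.

```python
import queue
import threading


def fake_engine(text_buffer):
    """Stand-in for the speech engine: returns fake 'audio' bytes for a text buffer."""
    return text_buffer.upper().encode("utf-8")


class SpeechTask:
    """Hypothetical stand-in for nsISpeechTask: receives audio on the main thread."""

    def __init__(self):
        self.audio = []
        self.finished = False

    def send_audio(self, chunk):
        self.audio.append(chunk)

    def finish(self):
        self.finished = True


def synthesize_on_worker(text, task, main_queue, buffer_size=8):
    """Worker-thread body: feed the text to the engine in fixed-size buffers."""
    for start in range(0, len(text), buffer_size):
        chunk = fake_engine(text[start:start + buffer_size])
        # Do not touch the task from this thread; post a callable that the
        # main thread will run instead.
        main_queue.put(lambda c=chunk: task.send_audio(c))
    main_queue.put(task.finish)


def speak(text, task):
    """Start synthesis on a worker and run the posted callables on this thread."""
    main_queue = queue.Queue()
    worker = threading.Thread(
        target=synthesize_on_worker, args=(text, task, main_queue)
    )
    worker.start()
    while not task.finished:
        main_queue.get()()  # run the next posted callable on the "main" thread
    worker.join()
```

The point of the design is that the slow synthesis loop never blocks the main thread, while the task object is only ever touched from the main thread.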


The nsPicoService is also used for other utility functions, such as initializing the PicoApi class, registering the voices and loading/unloading the Pico engine (this is done via the PicoApi class, of course).

This is it for this week. Next week, my aim would be to test the Pico service and, with that, start with Windows.

Cheers !

Wednesday, April 30, 2014

Defining the work in bugzilla



As one of the first steps, I have created the following bugs, to keep record of what I'll be covering in the project.



Bug #1003439 ==> A meta-bug for the Desktop TTS, that will keep track of the development on all the three OSes.


I'll be starting with the Windows platform. Bug #1003457 will keep track of the development on Windows.
I plan to start working from 5th/6th May.

After I am done with Windows, I plan to move on to Mac and then, if time permits, Linux.

Bug #1003452 and Bug #1003464 will keep track of the development on Mac and Linux, respectively.

Before starting on Windows, I'll be studying the current Pico implementation.

Tuesday, April 29, 2014

Summer of Code with Mozilla !!


Big news is that I have been selected as a Google Summer of Code Intern with Mozilla this year :D


My work includes adding the Text to Speech API to Firefox for desktop OSes. You can refer here for more details on the project.


I’ll be blogging about my progress here, so look out if you’re interested.