We are a product development company working on a voice processing application. In this application we want to perform speaker recognition on the audio files that users upload to our backend. Users can upload voice recordings from our iOS / Android mobile app. After receiving an upload, the backend should compare it against the user's existing voice samples and determine whether the user's voice is present in the newly uploaded audio file. Here is the flow:
1. The user creates a profile for the first time by entering email / mobile number, password, etc.
2. The user uploads 4 sample voice files at signup. Each sample file is 10-15 seconds long. These samples are used for comparison when the user later uploads voice recordings.
3. When the user finishes signup, the application takes the user to the home screen.
4. The user records new speech in the application and uploads it to the server.
5. The server receives the file and validates it.
6. After validation, the server verifies whether the user's voice is present in this audio file by comparing it against the sample audio files the user uploaded at signup.
7. If the user's voice is identified in the audio file, we record in the database that the user's voice was found in it, then upload the audio file to AWS S3 and send a response back to the mobile app.
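The comparison in step 6 can be sketched as below. This is only an illustration under stated assumptions: it presumes each audio file has already been converted into a fixed-length speaker embedding (a list of floats) by some model that this posting does not specify. The names `cosine_similarity`, `mean_embedding`, `verify_speaker` and the threshold of 0.75 are hypothetical and would need tuning on real recordings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def mean_embedding(samples: Sequence[Vector]) -> List[float]:
    """Average the enrollment embeddings into one profile vector."""
    if not samples:
        raise ValueError("at least one enrollment sample is required")
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def verify_speaker(enrolled: Sequence[Vector], uploaded: Vector,
                   threshold: float = 0.75) -> bool:
    """Return True if the upload is close enough to the enrolled profile.

    The threshold is an assumed value, not one taken from the posting.
    """
    profile = mean_embedding(enrolled)
    return cosine_similarity(profile, uploaded) >= threshold
```

With the 4 signup samples as `enrolled` and the new recording as `uploaded`, the boolean result is what step 7 would store in the database. Note that this compares one embedding for the whole file; detecting the user's voice inside a recording with several speakers would also require splitting the audio into segments first.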
That is the full process. Every registered user of our application should be able to upload audio files and have speaker recognition performed as described above. When multiple users upload audio files to the server at the same time, the speaker recognition module should handle all comparisons without failures or noticeable slowdowns. The speaker recognition feature should provide at least 80% accuracy when comparing voices.
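One way the 80% target could be checked is sketched below. It assumes a labeled set of test trials, each a similarity score paired with whether the recording really contained the enrolled user's voice. The function name `verification_accuracy` and the sample values are hypothetical; the posting does not define how accuracy is to be measured.

```python
from typing import Iterable, Tuple


def verification_accuracy(trials: Iterable[Tuple[float, bool]],
                          threshold: float) -> float:
    """Fraction of trials where the accept/reject decision matches the label.

    Each trial is (similarity_score, is_same_speaker). A score at or above
    the threshold counts as "user's voice found".
    """
    trials = list(trials)
    if not trials:
        raise ValueError("no trials to evaluate")
    correct = sum(1 for score, same in trials if (score >= threshold) == same)
    return correct / len(trials)


# Made-up scores for illustration only, not measured results.
example_trials = [(0.9, True), (0.8, True), (0.6, True),
                  (0.3, False), (0.7, False)]
meets_target = verification_accuracy(example_trials, threshold=0.75) >= 0.80
```

Measuring this on recordings that were not used to pick the threshold gives a more honest estimate than reusing the same data.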
We also tried the Speaker Recognition API provided by Azure, but its accuracy was poor for our use case. That is why we decided to build this feature on our own.
Our backend application is built with the Python Flask framework. The databases we use are Postgres and MongoDB. Building the speaker recognition module in Python, with C++ support where needed, works well for us, since Python has powerful packages for mathematical work. If you prefer another programming language, feel free to use it, but the accuracy must not be less than 80%.
Please feel free to ping me with any questions. I will do my best to clarify.
I'll use libraries such as friture to analyze the audio and compare recordings. I'll try a few libraries and choose the one that gives the best accuracy.