Play YouTube Videos with your Voice in Python 3 (YouTube API v3, Pyaudio, Speech Recognition)

Dhruv Padhiyar
4 min read · May 19, 2020
Photo by Jason Rosewell on Unsplash

“Let your voice be heard”, “The future is in your voice”.

Let me introduce myself. I'm a student at K.J. Somaiya Institute of Engineering and Information Technology, a college affiliated with Mumbai University, and I'm currently in the final year of my undergraduate degree in computer engineering. I'm a tech enthusiast and I write most of my programs in Python, which probably tells you a lot about me.

Far from being a passing fad, the overwhelming success of speech-enabled products like Amazon Alexa, Google Assistant, and Jarvis from Iron Man has shown that some degree of speech recognition will be an essential part of household tech for the foreseeable future. The reasons are fairly obvious: incorporating speech recognition into your Python application offers a level of interactivity and accessibility that few technologies can match.

So I have created a simple speech recognition program that plays the desired video from YouTube using Python 3 and the YouTube API v3.

So let's begin. For this program to work, we need a few libraries.

The Google APIs Client Library for Python:

pip install --upgrade google-api-python-client
pip install --upgrade google-auth-oauthlib google-auth-httplib2

Some Python libraries:

pip install pyaudio
pip install SpeechRecognition

And any video player that supports streaming from a URL. I have used VLC for simplicity.

Now let us start with the actual code.

from pyaudio import PyAudio, paInt16

CHUNK = 1024            # frames read per buffer
FORMAT = paInt16        # 16-bit samples (this constant has the value 8)
CHANNELS = 2
RATE = 44100            # samples per second
RECORD_SECONDS = 4

p = PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("Say something to search on YouTube")
frames = []             # raw audio chunks collected from the microphone
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("Searching....")

stream.stop_stream()
stream.close()
p.terminate()

The code above opens the microphone and records for 4 seconds.
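
To see where the loop count comes from: each stream.read(CHUNK) call returns CHUNK frames, so covering RECORD_SECONDS at RATE frames per second takes roughly RATE / CHUNK * RECORD_SECONDS reads. Here is a minimal sketch of that arithmetic using the same constants. The helper name chunks_needed is made up for illustration and is not part of PyAudio.

```python
def chunks_needed(rate: int, chunk: int, seconds: float) -> int:
    """Number of whole chunk reads used to cover `seconds` of audio."""
    return int(rate / chunk * seconds)


RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 4

n = chunks_needed(RATE, CHUNK, RECORD_SECONDS)
print(n)                 # number of stream.read() calls
print(n * CHUNK / RATE)  # seconds of audio actually captured
```

Because the count is rounded down, the recording ends up slightly shorter than the requested 4 seconds.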

import wave

WAVE_OUTPUT_FILENAME = "output.wav"

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

The code above saves the recorded audio to a file named output.wav.
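
If you want to check the WAV-writing step without a microphone, the sketch below writes a synthetic tone with the same wave calls and reads the file back to confirm its duration. It assumes 16-bit stereo audio like the recording settings above. The names make_tone_frames, wav_duration, and tone_check.wav are hypothetical and exist only for this example.

```python
import math
import os
import struct
import tempfile
import wave

RATE = 44100
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio


def make_tone_frames(seconds: float, freq: float = 440.0) -> bytes:
    """Generate a 16-bit stereo sine tone as raw interleaved PCM bytes."""
    frames = bytearray()
    for i in range(int(RATE * seconds)):
        sample = int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / RATE))
        frames += struct.pack("<hh", sample, sample)  # left and right channel
    return bytes(frames)


def wav_duration(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


path = os.path.join(tempfile.gettempdir(), "tone_check.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(make_tone_frames(1.0))

print(wav_duration(path))  # length in seconds of the file just written
```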

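from speech_recognition import Recognizer, AudioFile, UnknownValueError, RequestError

r = Recognizer()
temp_audio = AudioFile(WAVE_OUTPUT_FILENAME)
with temp_audio as source:
    audio = r.record(source)
try:
    output = r.recognize_google(audio)
except (UnknownValueError, RequestError):
    # Speech was unintelligible or the recognition service could not be reached
    raise SystemExit("Error, try again")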

Here we use the SpeechRecognition library installed earlier and pass it the recorded audio file for transcription.
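
The recognizer returns a plain string, and people often start with a command word such as "play" that should not be part of the search. Below is a small sketch of one way to tidy the transcript before sending it to YouTube. The function to_search_query is a hypothetical helper, and it only strips a leading command word, which is a design choice for this example rather than something the library does.

```python
def to_search_query(transcript: str, command_word: str = "play") -> str:
    """Drop a leading command word and collapse extra whitespace."""
    words = transcript.strip().split()
    if words and words[0].lower() == command_word:
        words = words[1:]
    return " ".join(words)


print(to_search_query("play lo-fi beats"))
print(to_search_query("playlist of rock songs"))
```

Matching whole words keeps queries like "playlist of rock songs" intact, which a plain substring replacement would mangle.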

from googleapiclient.discovery import build
from os import system

DEVELOPER_KEY = "AIXXXXXXXXXX"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                developerKey=DEVELOPER_KEY)
# type="video" restricts results to videos, so the 'videoId' field is present
search_keyword = youtube.search().list(q=output, part="id,snippet",
                                       type="video", maxResults=1).execute()
URLS = f"https://www.youtube.com/watch?v={search_keyword['items'][0]['id']['videoId']}"
system(f"vlc {URLS} &")

Here you have to replace DEVELOPER_KEY with your own key. You can get one with the following steps:

Step 1: Set up your project and credentials. Create or select a project in the API Console. Complete the following tasks in the API Console for your project:

In the library panel, search for the YouTube Data API v3. Click into the listing for that API and make sure the API is enabled for your project.

In the credentials panel, create an API key. You will use the API key to make API requests that do not require user authorization. For example, you do not need user authorization to retrieve information about a public YouTube channel.
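
Once you have a key, you may prefer not to paste it into the script. This is a minimal sketch of reading it from an environment variable instead. The variable name YOUTUBE_API_KEY and the helper load_api_key are assumptions made for this example, not something the YouTube API requires.

```python
import os


def load_api_key(var_name: str = "YOUTUBE_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

In the script you would then write DEVELOPER_KEY = load_api_key() in place of the hard-coded string.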

Boom! There you go: you have created a simple speech recognition program in Python and played your YouTube video.

FULL CODE:

from pyaudio import PyAudio, paInt16
import wave
from speech_recognition import Recognizer, AudioFile, UnknownValueError, RequestError
from googleapiclient.discovery import build
from os import system

CHUNK = 1024
FORMAT = paInt16  # 16-bit samples (this constant has the value 8)
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 4
WAVE_OUTPUT_FILENAME = "output.wav"
DEVELOPER_KEY = "AIXXXXXXXXXX"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                developerKey=DEVELOPER_KEY)

# Record from the microphone
p = PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("Say something to search on YouTube")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("Searching....")
stream.stop_stream()
stream.close()
p.terminate()

# Save the recording as a WAV file
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

# Transcribe the recording
r = Recognizer()
temp_audio = AudioFile(WAVE_OUTPUT_FILENAME)
with temp_audio as source:
    audio = r.record(source)
try:
    output = r.recognize_google(audio)
except (UnknownValueError, RequestError):
    # Speech was unintelligible or the recognition service could not be reached
    raise SystemExit("Error, try again")

# Drop the command word; str.replace returns a new string, so reassign it
if "play" in output:
    output = output.replace("play", "").strip()

# Search YouTube (videos only) and open the first result in VLC
search_keyword = youtube.search().list(q=output, part="id,snippet",
                                       type="video", maxResults=1).execute()
URLS = f"https://www.youtube.com/watch?v={search_keyword['items'][0]['id']['videoId']}"
print(URLS)
system(f"vlc {URLS} &")
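
The last step of the script indexes straight into the search response, which fails if the first item is not a video or if nothing was found. Here is a hedged sketch of a more defensive version, run against a hand-written dictionary shaped like a search response. The helpers first_video_url, vlc_command, and play, along with the placeholder IDs, are made up for this example. Passing the command to subprocess as a list is an alternative to os.system that avoids handing the URL to a shell.

```python
import subprocess
from typing import List, Optional


def first_video_url(response: dict) -> Optional[str]:
    """Return the watch URL of the first video in a search response, if any."""
    for item in response.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id:
            return f"https://www.youtube.com/watch?v={video_id}"
    return None


def vlc_command(url: str) -> List[str]:
    """Build the argument list used to open a URL in VLC."""
    return ["vlc", url]


def play(url: str) -> None:
    """Launch VLC in the background (assumes `vlc` is on the PATH)."""
    subprocess.Popen(vlc_command(url))


# Hand-written stand-in for a search response; the IDs are placeholders.
sample_response = {
    "items": [
        {"id": {"kind": "youtube#channel", "channelId": "CHANNEL_ID_PLACEHOLDER"}},
        {"id": {"kind": "youtube#video", "videoId": "VIDEO_ID_PLACEHOLDER"}},
    ]
}

url = first_video_url(sample_response)
print(url)
```

When first_video_url returns None, the script can print a message and exit instead of crashing; otherwise it can call play(url).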

You can follow this project on GitHub at https://github.com/xaviruvpadhiyar98/YoutubeAutomation

Now it's your turn to create something cool. Go try out the YouTube API v3. Until then, see you next time! Good luck!
