Make an initial request to get the auth cookies

As a music lover and full-stack developer, I often watch music videos on YouTube. However, as a non-native English speaker, I sometimes struggle to make out all the lyrics, especially for fast-paced raps or songs with heavy accents. I thought it would be amazing if YouTube music videos had built-in lyric subtitles, like a karaoke feature, so I could fully appreciate the poetic wordplay while watching the visual storytelling.

I scoured the internet for sites providing timed lyrics that could be used as subtitles, but came up empty. Then a friend recommended I check out the Musixmatch Chrome extension, which displays scrolling lyrics on YouTube music videos. Perfect! Well, almost – Musixmatch didn’t allow exporting the lyrics as subtitle files. Time to put on my hacker hat and dive into some browser-based reverse engineering.

Inspecting Musixmatch’s Network Requests

I fired up the Chrome developer tools, switched to the Network tab, and started playing a music video. I knew the lyric data must be coming from Musixmatch’s servers, so I typed "musix" into the filter bar. Lo and behold, I spotted a request to a Musixmatch API endpoint returning the fully timed lyrics in a custom text format.

The request URL looked something like this:

https://apic.musixmatch.com/ws/1.1/macro.subtitles.get?format=json&q_track=https://www.youtube.com/watch?v=ABC123&user_language=en&utm_source=youtube&...

Upon closer inspection, I realized the URL contained two key parameters:

  • q_track: The YouTube video URL
  • user_language: The language of the lyrics (e.g. "en" for English)

Most of the other parameters looked like typical web analytics fodder. However, the URL also contained a very long, random-looking signature parameter. Hmm…
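To make the shape of that URL concrete, here is a minimal sketch that assembles the readable parameters with Python's standard library. The helper name build_query is hypothetical, and the analytics and signature parameters elided above are deliberately left out, so the result is not a complete working request. Note that urlencode percent-escapes the nested YouTube URL, whereas the URL shown above is unescaped.

```python
from urllib.parse import urlencode

API_BASE = "https://apic.musixmatch.com/ws/1.1/macro.subtitles.get"

def build_query(video_id: str, language: str = "en") -> str:
    """Return the API URL with only the parameters described above."""
    params = {
        "format": "json",
        "q_track": f"https://www.youtube.com/watch?v={video_id}",
        "user_language": language,
    }
    # urlencode escapes the nested YouTube URL so it survives as a single value.
    return f"{API_BASE}?{urlencode(params)}"

print(build_query("ABC123"))
```

Building the query from a dict keeps each parameter visible on its own line, which makes it easier to compare requests for different videos.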

Unraveling Musixmatch’s Signed Requests

I tried opening the raw API URL directly, but was greeted with an empty response. The plot thickens. On a hunch, I pasted the URL into an incognito window. Empty again. This usually points to authentication cookies being involved.

Inspecting the API request headers confirmed my suspicion – there was a Cookie header with some Musixmatch-specific values. These cookies seemed to be set whenever you loaded a page with the Musixmatch extension active.

To unravel the signature parameter, I made API calls for a few different videos and compared the URLs. I noticed the signature always contained a series of integers interspersed with random letters, like:

90rt120b114xz70xv82w85vv90a94hn90vb102av86

After staring at these signatures for longer than I’d like to admit, I had a breakthrough. The letters were just a red herring! If I extracted just the integers, I got:

90 120 114 70 82 85 90 94 90 102 86
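Stripping the letters by hand gets old fast, so here is a small sketch of the same extraction using re.findall(). The function name signature_numbers is my own; the sample string is the signature shown above.

```python
import re

def signature_numbers(signature: str) -> list[int]:
    """Return the runs of digits in a signature, in order, as integers."""
    return [int(run) for run in re.findall(r"\d+", signature)]

sample = "90rt120b114xz70xv82w85vv90a94hn90vb102av86"
numbers = signature_numbers(sample)
print(numbers)       # [90, 120, 114, 70, 82, 85, 90, 94, 90, 102, 86]
print(len(numbers))  # 11
```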

Exactly 11 numbers. What else contains 11 characters? YouTube video IDs! For example, oHg5SJYRHA0. On a hunch, I converted the video ID to ASCII codes:

111 72 103 53 83 74 89 82 72 65 48
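For reference, that conversion is a one-liner with ord(). This is just a sketch of the step described above, using the same example ID.

```python
video_id = "oHg5SJYRHA0"

# ord() maps each character to its ASCII code point.
codes = [ord(ch) for ch in video_id]

print(codes)       # [111, 72, 103, 53, 83, 74, 89, 82, 72, 65, 48]
print(len(codes))  # 11
```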

Not an exact match to the decoded signature, but awfully close. Turns out Musixmatch doesn’t use the video ID directly, but performs some fixed arithmetic transforms I was able to figure out after a few more minutes of pattern matching. I had cracked the case!
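The comparison step is easy to script. The sketch below lines up a video ID's ASCII codes with the integers decoded from a signature and returns the element-wise differences, which is one way to go hunting for a fixed pattern. The helper name deltas and the toy values are made up for illustration. They are not real Musixmatch data, and the actual transform is not reproduced here.

```python
def deltas(video_id: str, decoded: list[int]) -> list[int]:
    """Return decoded[i] - ord(video_id[i]) for each position."""
    if len(video_id) != len(decoded):
        raise ValueError("video ID and decoded signature differ in length")
    return [number - ord(char) for char, number in zip(video_id, decoded)]

# Toy input (hypothetical): every code shifted by the same amount.
print(deltas("abc", [100, 101, 102]))  # [3, 3, 3]
```

Running this over several ID and signature pairs captured from the same requests makes a repeating offset, if there is one, much easier to spot than eyeballing the raw numbers.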

Writing the Lyrics Downloader Script

Now that I knew how to forge my own API requests, it was time to write a Python script to automate the following:

  1. Open a Musixmatch URL to extract the authentication cookies
  2. Perform the arithmetic to convert a video ID into the signature
  3. Build the full API URL with all the required parameters
  4. Make the API request with the cookies and parse the response into the standard SRT subtitle format

Here’s the key snippet for making the authenticated API request:

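import requests

# Make an initial request to get the auth cookies
session = requests.Session()
session.get("https://www.musixmatch.com")
cookies = session.cookies.get_dict()

video_id = "oHg5SJYRHA0"
# Simplified: this only concatenates the raw ASCII codes. The real signature
# also applies the arithmetic transforms and letter padding described above.
signature = "".join(str(ord(char)) for char in video_id)

# The trailing "..." stands for the remaining parameters, elided here
url = f"https://apic.musixmatch.com/ws/1.1/macro.subtitles.get?format=json&q_track=https://www.youtube.com/watch?v={video_id}&user_language=en&signature={signature}&..."

# The session already carries the cookies; passing them again makes that explicit
response = session.get(url, cookies=cookies)

lyrics = response.json()["message"]["body"]["macro_calls"]["track.subtitles.get"]["message"]["body"]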

The last step was parsing the custom Musixmatch lyrics format into the standard SRT format using Python’s built-in re module. I’ll spare you the hairy regex details, but if you’ve ever used re.findall(), you know that with enough persistence you can pull structured data out of just about any text format.
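The output side of that conversion is simpler to show than the parsing side. Here is a minimal sketch that writes SRT text from already-parsed cues. It assumes the lyrics have been reduced to (start_seconds, end_seconds, text) tuples, which is my own intermediate shape and not the Musixmatch format. The function names srt_timestamp and to_srt are likewise hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(cues) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Joining with "\n" leaves one blank line between numbered cues.
    return "\n".join(blocks)

cues = [(0.5, 2.0, "First line"), (2.0, 4.25, "Second line")]
print(to_srt(cues))
```

Each SRT cue is a running number, a timing line, and the text, followed by a blank line, so the writer needs no third-party library.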

Building a Minimal Web UI with Flask

I now had a functioning Python script that, when given a YouTube video ID, would download and save the lyrics as an SRT file. Yay! But what if I wanted to make it a web app for others to use?

Enter Flask – a lightweight Python web framework perfect for single-page apps. I whipped up a minimal UI with an input box for pasting YouTube video IDs/URLs and a download button for spitting out the SRT file.

The key parts of the Flask app:

from flask import Flask, request, Response

app = Flask(__name__)

@app.route("/")
def home():
    # Minimal unstyled form that posts the video ID to /download
    return """
    <form method="POST" action="/download">
        <input type="text" name="video_id" placeholder="Enter YouTube Video ID or URL">
        <button type="submit">Download Lyrics</button>
    </form>
    """

@app.route("/download", methods=["POST"])
def download():
    video_id = request.form["video_id"]
    subtitles = get_lyrics(video_id)  # The Python function we wrote earlier
    # The Content-Disposition header makes the browser save the response as a file
    return Response(
        subtitles,
        mimetype="text/plain",
        headers={
            "Content-Disposition": f"attachment; filename={video_id}.srt"
        },
    )

I’ve omitted the full HTML/CSS for brevity, but even without any styling, this tiny Flask app does the job. Punch in a video ID or URL, smash that download button, and you’ve got karaoke-ready lyrics in a flash!
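Since the form accepts either an ID or a full URL, the input needs normalizing before it reaches the lyrics function or the download filename. Here is one possible sketch using only the standard library. The helper name extract_video_id, the list of accepted hosts, and the assumed ID alphabet (letters, digits, hyphen, underscore) are my own choices and not something taken from the app above.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: IDs are 11 characters drawn from letters, digits, "-" and "_".
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}

def extract_video_id(value: str):
    """Return the video ID from a bare ID or a YouTube URL, or None."""
    value = value.strip()
    if ID_PATTERN.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/")
    elif host in YOUTUBE_HOSTS:
        # Watch links carry the ID in the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [None])[0]
    if candidate and ID_PATTERN.fullmatch(candidate):
        return candidate
    return None

print(extract_video_id("https://www.youtube.com/watch?v=oHg5SJYRHA0"))  # oHg5SJYRHA0
```

Returning None for anything unrecognized lets the route respond with an error instead of passing arbitrary text into a filename header.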

Closing Thoughts

Let me be clear – this reverse engineering exercise was done purely for educational purposes. I have no intention of stealing Musixmatch’s hard work or monetizing this tool. I simply wanted to peek under the hood and learn how their Chrome extension worked its lyrical magic.

By intercepting network requests, analyzing URL parameters, dissecting response formats, and stitching it all together in a web app, I gained a newfound appreciation for the work that goes into building robust, authenticated APIs and browser extensions. Reverse engineering is a powerful learning tool for any developer looking to level up their web tinkering skills.

So go forth, aspiring hacker, and see what other extensions you can dissect! Just remember to use your newfound superpowers for good and not evil. Happy coding!
