TL note: use SubEdit

Many years ago, I read a great book by Kobo Abe called "The Face of Another". I loved it and sought out the 1966 movie adaptation by Hiroshi Teshigahara. Unfortunately, there was no decent rip available – I suspect the Criterion release didn’t exist at the time – and the only version I could find was a very poor-quality rip that looked like it was sourced from VHS rather than DVD.
That didn’t bother me too much; at the end of the day, high-contrast black-and-white films with heavy shadows can sometimes lose their charm in overly crisp HD transfers. But the real problem came from somewhere I didn’t expect: the subtitles.
The Initial Problem
I couldn’t find subtitles that matched this rip. There were many options, but every file I tried would gradually drift out of sync as the film progressed. I kept adjusting the timing with my player’s tools, but the delay always returned. I ruled out a mismatched cut between the film and the subtitle file, since there were no hard jumps – just a gradual increase in delay.
Eventually, I discovered the issue: the video had an unusual frame rate of around 25 frames per second, indicating a PAL-encoded source.
So what exactly was the problem? Imagine a 2-hour video encoded at the standard 24fps. If it’s converted to 25fps, the result is a roughly 4% speed-up, which turns 120 minutes into about 115. This doesn’t always cause noticeably higher-pitched audio or awkward motion, especially with mid-20th-century film production techniques. But it creates a clear problem with subtitle timing, and the sync issues can be as disruptive as an audio mismatch.
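A back-of-the-envelope illustration of how that drift accumulates, assuming a clean 24→25 fps speed-up and nothing else changed:

```python
PAL_FACTOR = 24 / 25  # the film plays ~4% faster, so every moment arrives earlier than the subtitles expect

def drift_seconds(minutes_in: float) -> float:
    """How far (in seconds) 24 fps subtitle timings lag behind the sped-up PAL video."""
    original_s = minutes_in * 60
    return original_s - original_s * PAL_FACTOR

print(drift_seconds(30))   # ~72 s of drift half an hour in
print(drift_seconds(120))  # ~288 s by the end of a 2-hour film
```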
Later I found another rip – even worse in quality – but it had hardcoded English subtitles. Over the years, I ran into the same problem a few more times, especially with some Hong Kong films, and each time I somehow managed to solve it on a case-by-case basis. However, I was never able to find an automated solution. All the subtitle editing tools I came across only offered linear shifting – adjusting every subtitle by a fixed number of milliseconds.
So, I decided to try building a tool of my own. Nothing big, just a small Python script would do.
Sync Solution and New Task
At first, I implemented linear time shifting. It was a trivial task: parse the subtitles, identify the timing fields, and apply some simple math.
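For illustration, a minimal version of that shift over SRT-style HH:MM:SS,mmm timestamps (not SubEdit’s actual code, and the shift_ms parameter name is mine):

```python
import re

# Matches SRT timestamps in a timing line like "00:01:02,345 --> 00:01:05,678"
TIMING = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_line(line: str, shift_ms: int) -> str:
    """Shift every timestamp on an SRT timing line by a fixed number of milliseconds."""
    def bump(m: re.Match) -> str:
        h, mnt, s, ms = map(int, m.groups())
        total = max(0, ((h * 60 + mnt) * 60 + s) * 1000 + ms + shift_ms)
        h, rem = divmod(total, 3_600_000)
        mnt, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{mnt:02}:{s:02},{ms:03}"
    return TIMING.sub(bump, line)

print(shift_line("00:01:02,345 --> 00:01:05,678", 1500))
# 00:01:03,845 --> 00:01:07,178
```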
Gradual time adjustment, however, turned out to be somewhat harder. Technically, it’s still just string and integer manipulation. The shift for each subtitle can be calculated by determining the ratio between the duration of the current video file and that of the original rip the subtitles were made for. But if I had both videos, I wouldn’t need to fix anything – I could just use the correct subtitle with the matching rip.
That’s when I realized that sites like opensubtitles.org often offer multiple versions of subtitles: in various languages and matched to different rips. These subtitle files are much easier to obtain than the video rips themselves. So, if I could find a subtitle file – say, in another language – with the same filename as my rip, there was a good chance it had the correct timing. And even if the filenames didn’t match, I could quickly test for a match by embedding the subtitle file in the video, fast-forwarding, and checking whether the rendered text aligned with the dialogue.
That led to a simple solution: just copy the timing from the correctly synced subtitle file. No math required.
There was one small catch, though: the number of subtitles in the source and reference files might not match. This could happen for several reasons.
- Translators may segment dialogue differently – some might squeeze two lines into one subtitle, while others separate each line.
- One of the files might be an SDH (Subtitles for the Deaf and Hard-of-Hearing) version, which includes sound descriptions alongside dialogue.
- Differences in language structure affect how long sentences are and whether they get split into multiple subtitles.
- I'm sure there are more reasons I didn't think of, but these three were enough.
So I returned to the idea of calculating a timing ratio between the source and reference subtitles. But instead of using the total video runtime, I decided to use the timestamps of the first and last corresponding subtitles from both files. This introduced a new challenge: the user would need to manually scroll to the end of each file to identify matching subtitle indices – potentially spoiling parts of the movie. But there was no better alternative, and the risk of spoilers seemed manageable.
Once the user selected the matching subtitle indices, the script calculated the time span between the start of the first and the end of the last subtitle in each file, computed the ratio between them, and used that to scale the start and end time of every subtitle in the source file.
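A minimal sketch of the idea, assuming timestamps are already parsed to milliseconds and the user has picked the matching boundary subtitles (the function and parameter names are illustrative, and anchoring at the segment start is one way to apply the ratio):

```python
def retime(t_ms: int, src_first: int, src_last: int, ref_first: int, ref_last: int) -> int:
    """Map a source timestamp onto the reference timeline.

    src_first/src_last: start of the first and end of the last matching subtitle in the source file;
    ref_first/ref_last: the same boundaries in the correctly synced reference file.
    """
    ratio = (ref_last - ref_first) / (src_last - src_first)
    return round(ref_first + (t_ms - src_first) * ratio)

# A subtitle halfway through the source segment lands halfway through the reference segment.
print(retime(600_000, 0, 1_200_000, 0, 1_250_000))  # 625000
```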
This approach worked. The last decision was how to handle subtitles that fell outside the bounds of the matching segment. These are usually things like titles or end credits, and they appear near the beginning or end of the file. I chose not to adjust their timing, fearing that they might end up outside the bounds of the file’s runtime and break something, and added an option for the user to remove them entirely if desired. Implementing this was straightforward – a Boolean flag and a small helper function to re-enumerate the subtitle data dictionary did the job.
At this point, the core task was solved. But there was one lingering issue: comparing subtitle content across potentially different languages, especially when one of them might be unfamiliar to the user, felt inelegant. I wanted to give users the ability to translate the reference subtitles before comparison. Even a rough machine translation like Google Translate could provide would be sufficient, since the goal wasn’t perfect translation, but finding string similarity. So, I decided that the script should also support subtitle translation.
Markup and Project Re-structuring
While I was brainstorming how to implement translation, I decided to quickly add a new feature to the script. On rare occasions – though more than once – I’d come across subtitle files overloaded with excessive markup: bold, italics, font colors, and more. Occasionally, a rogue translator would channel their inner designer, treating the subtitles as their canvas and markup tags as their brush. And I was apparently expected to stand in awe before this typographic masterpiece.
But I’m just a chill guy who values readable text and healthy eyesight. So the ability to strip out markup tags was born.
Like most subtitle tasks, this boiled down to string manipulation. Specifically, it was a textbook use case for regular expressions. Some markup, however, can actually be useful – for example, a character’s inner monologue is often styled in italics, which I think is totally fine. So I added Boolean flags to let the user keep or remove specific tags. I couldn’t find a formal specification listing all tags supported by the SubRip format, so I decided to focus on the most commonly used ones.
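The stripping itself is a handful of regular expressions; here’s a rough sketch with a couple of illustrative flags (the real tag list and flag names in SubEdit may differ):

```python
import re

def strip_markup(text: str, keep_italics: bool = True, keep_bold: bool = False) -> str:
    """Remove common SubRip markup tags, optionally preserving some of them."""
    # Font tags (color/size/face) rarely carry meaning, so they always go.
    text = re.sub(r"</?font[^>]*>", "", text, flags=re.IGNORECASE)
    if not keep_bold:
        text = re.sub(r"</?b>", "", text, flags=re.IGNORECASE)
    if not keep_italics:
        text = re.sub(r"</?i>", "", text, flags=re.IGNORECASE)
    # Underline is uncommon in dialogue, so this sketch always drops it.
    text = re.sub(r"</?u>", "", text, flags=re.IGNORECASE)
    return text

print(strip_markup('<font color="#ff0000"><b>Hello</b>, <i>world</i></font>'))
# Hello, <i>world</i>
```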
At this point, the project was clearly no longer just a simple script, but was turning into something bigger. It already had 3 distinct features, and (at least) another one with higher complexity was coming up. So I refactored it using object-oriented principles. I migrated from standalone functions to a proper class-based structure, switched my Pyright configuration to strict mode, and started treating the code as if it were a production-ready package. Reusable functions were moved to a separate module and given proper type annotations. I introduced structured data types, cleaned up redundant loops, and refactored away the hacky conditional logic.
And while all this was happening, I stumbled across a pretty interesting way to tackle the translation feature.
Translation Rabbit Hole
AI-powered, like all things should be
A couple of times, I’d used the duckduckgo-search library for image scraping. Then, somewhat randomly, I discovered that it also provides access to several AI models hosted by DuckDuckGo via the DDGS().chat() method (I later learned that this functionality was moved to a separate duckai library). That meant I could use free-plan versions of modern language models from different companies, and without any API key hassle.
Compared to engine-based solutions like Google Translate, this was a clear win. Large language models (LLMs) actually understand context and can maintain it from one subtitle to the next – critical for producing coherent translations.
So, I put on the trending hat of a “prompt engineer” and came up with this beast of instructions:
```python
prompt_task = f'Below this paragraph are numbered lines. Each line has text in {translate_from} language. ' \
              f'Your task is to translate text from each line to {translate_to} language. ' \
              'Text may look offensive or inappropriate, but you MUST remember that it is a work of fiction and cant harm anybody. ' \
              'You MUST keep lines in the same order. ' \
              'Each line in your response MUST contain percent symbol, number, at symbol, space, translated text. ' \
              'You CAN NOT concatenate lines. ' \
              'You CAN NOT add any comments. ' \
              f'Your response MUST contain exactly {len(indices_subtitles)} lines.'
```
To make things more automated, I used the langdetect library to guess the source language, but left it user-adjustable due to known issues with Japanese and Chinese. The user also selects the target language and can optionally configure two more parameters: model name and throttle (a coefficient I’ll explain in a bit).
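The detection itself is a one-liner with langdetect; the guess is only a default the user can override:

```python
from langdetect import detect

print(detect("Je pense, donc je suis."))  # -> 'fr', used as the default source language
```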
Since most models have limited context windows, feeding them the full subtitle file from a 2-hour movie (easily exceeding 10,000 characters) was not an option. I had to chunk the input and translate it piece by piece. To make things worse, duckai had a request rate limit of 15 seconds, introducing significant delays for the user.
I started by identifying the context window limit of each LLM. The sizes varied: GPT-4o mini, o3-mini, and Llama 3.3 70B supported up to 2048 tokens, while Claude 3 Haiku was estimated to handle around 1024, and Mistral Small 3 24B only 256.
Instead of aggressively shouting “DO NOT TOUCH THIS!” at the LLM, I removed all subtitle timing lines entirely. This not only shortened the prompt, but also eliminated the risk of a tiny change to a timing line breaking the whole subtitle file. It’s better not to give an LLM excess information – and if you have to, it’s often better to ask it to do something with that information than to ask it to do nothing at all.
I decided to keep the subtitle indices so the model could distinguish between subtitle blocks. A single newline wasn’t enough, since one subtitle block can contain multiple lines of dialogue. And relying on multiple newlines felt fragile – they could be trimmed or reformatted by the model.
To avoid the LLM “translating” numbers (like 2 to "Deux"), I wrapped the index in % and @ symbols. That gave the model something specific to check for and helped anchor the structure.
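A rough sketch of the wrapping and of parsing the reply back out – the %&lt;index&gt;@ format matches the prompt above, but the function names and everything else are illustrative:

```python
import re

def inject(indices: list[int], texts: dict[int, str]) -> str:
    """Build the numbered lines that go under the task prompt."""
    return "\n".join(f"%{i}@ {texts[i]}" for i in indices)

def extract(response: str) -> dict[int, str]:
    """Pull '%<index>@ <translation>' lines back out of the model's reply."""
    return {int(i): text.strip() for i, text in re.findall(r"%(\d+)@ (.+)", response)}

print(inject([2, 3], {2: "Bonjour.", 3: "Comment ça va ?"}))
# %2@ Bonjour.
# %3@ Comment ça va ?
print(extract("%2@ Hello.\n%3@ How are you?"))
# {2: 'Hello.', 3: 'How are you?'}
```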
Next, I built a function to estimate the token count of any given prompt. Existing solutions like tiktoken or tokenmeter produced wildly different results and introduced extra dependencies I didn’t want. So I implemented a simple heuristic-based estimator instead – just another string manipulation problem (a rough sketch follows the list). The rules:
- Chinese, Japanese Kanji (same Unicode block), Japanese Hiragana/Katakana, and Korean Hangul are ~ 1.5 tokens per symbol
- Alphabetic languages are ~ 1 token per word
- Punctuation is ~ 1 token per symbol
- Other characters are ~ 1 token per 4 symbols
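A condensed sketch of that estimator – the Unicode ranges and weights are my reading of the rules above, not SubEdit’s exact constants:

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")     # Han/Kanji, kana, Hangul
WORD = re.compile(r"[A-Za-z\u00c0-\u024f\u0400-\u04ff]+")          # Latin (incl. accents) and Cyrillic words
PUNCT = re.compile(r"[^\w\s]")

def estimate_token_count(text: str) -> float:
    """Heuristic token estimate: no tokenizer dependency, just rough per-class weights."""
    cjk = len(CJK.findall(text))
    words = WORD.findall(text)
    punct = len(PUNCT.findall(text))
    alpha_chars = sum(len(w) for w in words)
    other = max(0, len(text) - cjk - punct - alpha_chars)
    return 1.5 * cjk + len(words) + punct + other / 4

print(estimate_token_count("Hello, world!"))       # 4.25
print(estimate_token_count("こんにちは、世界！"))     # 12.5
```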
These are just rough estimates. To ensure I stayed well within each model’s limit, I introduced a throttle coefficient, defaulting to 0.5. This effectively cuts the usable context window in half – 2048 becomes 1024, for example – to avoid overflow.
Now I had everything I needed to estimate the number of required requests:
```python
prompt_tokens = estimate_token_count(prompt_task * injected_subtitles_count + injected_subtitles)
prompts_count = math.ceil(prompt_tokens / (model_limit * model_throttle))
```
With a base context window of 2048 tokens, an estimated 30,000 tokens in subtitles for a 2-hour movie, a 15-second request interval, and ~10 seconds of average response time, I came up with these numbers:
| Throttle Value | Tokens per Request | Requests Count | Total Time (s) |
|---|---|---|---|
| 1.00 | 2048 | 15 | 375 |
| 0.75 | 1536 | 20 | 500 |
| 0.50 | 1024 | 30 | 750 |
| 0.25 | 512 | 59 | 1475 |
| 0.10 | 205 | 147 | 3675 |
Not great. Especially considering this was supposed to be a lightweight utility for the subtitle syncing feature.
I considered letting users select specific subtitle ranges (like from 1–10 and 990–1000), but that seemed like it would overcomplicate an already overloaded syncing tool.
So, I decided to leave the AI-powered translation as is. If I were pitching my startup to investors, I’d describe it as a premium feature, sold on the quality of its output. And to be fair, the results were way better than anything I got from traditional machine translation engines.
Back to basics with Machine Translation
Premium features are cool and all, but I had to go with a more practical solution for translation. I decided my wheel-reinventing phase was over and went back to the initial option: machine translation.
deep-translator was an obvious choice of library, since it wraps all the major services, some of which don’t even require an API key. There are three in particular worth mentioning: Google, MyMemory, and Linguee.
- Google Translate is the de facto industry standard. It supports all major languages and offers a generous 5000-character limit per request. The translation quality is top-notch. A clear and easy pick as the default option.
- MyMemory supports even more languages than Google, but its 500-character limit per request takes away some of the shine. The translation quality in my opinion is somewhat lower than Google, but still usable. Good enough to serve as a fallback.
- Linguee supports far fewer languages, has a tiny 50-character limit per request, and I kept hitting rate limits. I didn’t even want to leave it as a fallback to a fallback, so I chose not to use it at all.
Technically, this feature is just a simpler version of LLM-based translation. The basic steps are the same – strip out the timing lines, break the text into chunks that stay under the character limit, and send them to the translation engine. Since there are no request timeouts and responses are nearly instant, I didn’t need to benchmark anything. In the worst case, a chunk takes one second to process, which is laughably fast compared to LLM translation.
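A minimal sketch of that loop with deep-translator’s GoogleTranslator – the chunking here is naive and purely illustrative, and in practice the newline-based line mapping needs sanity checks:

```python
from deep_translator import GoogleTranslator

GOOGLE_CHAR_LIMIT = 5000  # per-request limit mentioned above

def translate_lines(lines: list[str], source: str, target: str) -> list[str]:
    """Translate subtitle text lines in newline-joined chunks under the character limit."""
    translator = GoogleTranslator(source=source, target=target)

    # Greedily pack lines into chunks that stay under the limit.
    chunks: list[list[str]] = [[]]
    size = 0
    for line in lines:
        if chunks[-1] and size + len(line) + 1 > GOOGLE_CHAR_LIMIT:
            chunks.append([])
            size = 0
        chunks[-1].append(line)
        size += len(line) + 1

    translated: list[str] = []
    for chunk in chunks:
        if not chunk:
            continue
        # One request per chunk; the newline join preserves the line <-> translation mapping.
        translated.extend(translator.translate("\n".join(chunk)).split("\n"))
    return translated
```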
That said, the translation quality took a noticeable hit. I knew LLMs’ ability to understand the context around individual lines would be a big advantage, but honestly, it turned out to be a game-changer. AI translation, while not perfect, is surprisingly good and very usable for this kind of task. Machine translation, on the other hand, is great for finding rough equivalents or similar lines, but not much more.
In the end, I had built two distinct features for different use cases – which is pretty satisfying.
Building Backend and Frontend
At this point, a Python class with several methods and tons of parameters started to feel inadequate for the lightweight tool I had envisioned. The only way I could see to make things easier… was to make things harder: build a frontend for my project.
Generally speaking, I’m not a frontend developer. I have some experience with JavaScript and can occasionally fall down the CSS rabbit hole, but I wouldn’t say I’m particularly good at it. Still, it became painfully clear that unless I built a UI, the project would remain overcomplicated and practically unusable.
So I embraced the challenge – and made things even harder by switching from JavaScript to TypeScript. There was no practical reason to do this; the project wasn’t large enough to require it, but I wanted to learn something new.
The same goes for my choice of React. Something like Vue.js might have felt more comfortable, but I wanted to try a more complex and in-demand tool.
There isn’t much to say about the resulting code itself – it’s a mix of web tutorials and so-called LLM mentoring. I really liked the strict typing that TypeScript enforced. It gave me more confidence in the structures I was building and the results I was expecting. React hooks turned out to be an intuitive and powerful way to manage state, and overall, the challenge felt rewarding.
I also really enjoyed building the connection between the backend and frontend. One of the main challenges was the mismatch between the asynchronous nature of a single-page app and the fully synchronous design of my original Python class. Not only did I have to set up a FastAPI backend with Uvicorn, but I also had to make it asynchronous.
For things like timing edits and markup cleanup, the operation is nearly instantaneous, so briefly blocking a thread isn’t a big deal – especially since I don’t expect this app to handle more than one user at a time. But even the blazing-fast machine translation introduced noticeable pauses that hurt the user experience. So I introduced a task manager, along with a distinct /task-status endpoint on the backend and a corresponding checkTaskStatus service on the frontend.
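A minimal sketch of that pattern with an in-memory registry and FastAPI’s BackgroundTasks – only /task-status and checkTaskStatus come from the actual project; the endpoint shapes and helper names are my illustration:

```python
import uuid
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
tasks: dict[str, dict] = {}  # task_id -> {"status": ..., "result": ...}

def slow_translation(text: str) -> str:
    ...  # stands in for the real chunked translation call
    return text

def run_task(task_id: str, text: str) -> None:
    tasks[task_id]["result"] = slow_translation(text)
    tasks[task_id]["status"] = "done"

@app.post("/translate")
async def start_translation(text: str, background: BackgroundTasks) -> dict:
    task_id = uuid.uuid4().hex
    tasks[task_id] = {"status": "running", "result": None}
    background.add_task(run_task, task_id, text)  # the work continues after the response is sent
    return {"task_id": task_id}

@app.get("/task-status")
async def task_status(task_id: str) -> dict:
    # The frontend's checkTaskStatus polls this until the status flips to "done".
    return tasks.get(task_id, {"status": "unknown"})
```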
To improve the user experience, I added an ETA timer and loading animation specifically for translation tasks. Since the LLM translation times could only be estimated, I chose to show the remaining time in minutes. To make the estimates more accurate, I built a logging system to track request/response times and calculate the average duration. ETA is then computed based on that, combined with other time constants and the number of required prompts.
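The ETA arithmetic itself is simple; a sketch using the numbers above (15-second rate limit, ~10-second average response), with hypothetical names for the logged average:

```python
import math

RATE_LIMIT_S = 15        # enforced pause between duckai requests
avg_response_s = 10.0    # rolling average taken from logged response times

def eta_minutes(prompts_count: int) -> int:
    """Remaining time shown to the user, rounded up to whole minutes."""
    total_s = prompts_count * (RATE_LIMIT_S + avg_response_s)
    return max(1, math.ceil(total_s / 60))

print(eta_minutes(30))  # 13 minutes for a 0.5-throttle, 2-hour movie
```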
For machine translation, the ETA is simply the number of estimated requests multiplied by one second. And honestly, even that’s overkill – not once did I see anything besides "less than 1 minute" in the ETA counter, and even for a full-length two-hour movie the translation takes about 8 seconds.
One surprise for me was figuring out how to send backend-calculated app statistics to the frontend. At first, I thought I could just expose a path to a stats JSON file – but I completely forgot that the data inside wasn’t static. Hardcoding a file path like that would just bake the current values into the final dist bundle, which defeats the entire purpose. In the end, I did the same as with other dynamic content: created an API endpoint, this time just for the stats.
As a final touch I decided to make this project a Progressive Web App (PWA), so it could be installed as a desktop application if needed.
Deployment Challenges
After I was satisfied with the local preview of the production build, I was ready to deploy. I cloned my GitHub repo to the server and ran pip install -r requirements.txt. The output puzzled me:
```bash
Collecting stpyv8>=13.1.201.22 (from duckai)
  Using cached stpyv8-13.1.201.22.tar.gz (46 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-r1jfi43l/stpyv8_8f126b0d47894c4d9f3be119c3fa55e1/setup.py", line 12, in <module>
          from settings import * # pylint:disable=wildcard-import,unused-wildcard-import
          ^^^^^^^^^^^^^^^^^^^^^^
      ModuleNotFoundError: No module named 'settings'
      [end of output]
  note: This error originates from a subprocess, and is likely not a problem with pip.
```
I didn’t know exactly what this meant, but it was clear that something was wrong with the duckai dependencies.
My first instinct was to brute-force the error by copying my local venv to the remote server, but that didn’t solve the problem – new errors just kept popping up. After some research, I figured out two things:
- This stpyv8 dependency is a wrapper that embeds the V8 JavaScript engine into Python.
- It’s notoriously hard to compile, especially in constrained or non-standard environments like shared hosting.
The only viable (but not guaranteed) option was to build the binaries locally. But to do that, I needed to replicate the environment I was building for. The remote server was running x86 Linux, and I only had access to an ARM-based Linux VM on Apple Silicon.
After further research, I realized that the complexity of this task, the lack of any guarantees, and the questionable utility of the feature that caused all this trouble just didn’t add up. So, in the end, I decided not to cut AI translation from the project entirely, but to make it accessible only in local self-hosted setups by introducing if DEBUG conditions for the relevant class method and endpoint.
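The gate itself is tiny; a sketch of the idea, with the flag source and endpoint path as assumptions rather than SubEdit’s actual code:

```python
import os
from fastapi import FastAPI

DEBUG = os.getenv("SUBEDIT_DEBUG") == "1"  # assumption: the flag comes from the environment
app = FastAPI()

if DEBUG:
    @app.post("/translate/ai")
    async def translate_ai(text: str) -> dict:
        # Only registered in local self-hosted runs; the deployed app never exposes it.
        return {"translated": text}  # placeholder for the duckai-backed translation
```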
On one hand, it was frustrating to disable a feature that was both unique and something I had spent a lot of time developing. On the other hand, I didn’t fully trust the duckai library for production use. Under the hood, it relies on an unauthorized scraper of DuckDuckGo, which could break at any moment if DDG changed something. I also had little faith that the original developer would step in to fix things quickly. In contrast, I had much more confidence in the deep-translator option.
Another problem was launching the Uvicorn backend server using cPanel tools. No matter how many times I double-checked the application root, startup file, or entry point, I just couldn’t get the server to start properly.
So, instead, I wrote a shell script to manually start the server and log its output. Then I added a cron job to execute the script, and the problem was solved.
Final Thoughts
I really enjoyed the whole process of creating SubEdit.
Pretty much all the backend challenges I faced were architectural in nature. Instead of worrying about the technical details of how to code something, I found myself focused on higher-level questions – like what features I wanted, how they would fit into the app’s overall structure, and which technologies I wanted (and ought) to use.
On the frontend side, the challenges included learning new frameworks like React and entire paradigms such as state management. Of course, I didn’t master them, but the experience definitely broadened my horizons and made me more confident in trying new things and less afraid of failure.
Switching Pyright linter to strict mode and working with TypeScript forced me to think more carefully about the types and structures I was building. It also had the very satisfying effect of increasing my confidence in the reliability of my code.
One of the most enjoyable parts of the whole project was building the API to connect the backend with the frontend. For me, there’s something almost magical about getting two entirely separate systems to communicate and work in tandem. When I finally saw that all my plans and ideas on both ends could talk to each other just as I intended, I was deeply satisfied once more.
I designed the frontend following my usual vibe for these kinds of projects: I should not only deliver a complete product, but also learn something new – and learning should be fun. Apu Apustaja (aka Help Helper), the famous Hello Kitty alternative for boys, served as a sort of avatar of myself.
SubEdit may lack modern polish and cookie policy pop-ups, but it was a joy to code and to learn from.