Overview

Nyora for Android is a fast, ad-free comic and manga reader for phones and tablets, built end to end by a single developer. The problem it tackles is a real one. Manga is read by tens of millions of people, but a large share of it is only published in Japanese, Chinese, or Korean and never officially translated into English. Nyora's headline engineering achievement closes that gap directly: tap once on a page of foreign-language art and the app finds every speech bubble, translates the text, and paints the translation back onto the artwork exactly where the original words were — in real time, while you can still pinch-zoom and pan the page. Around that feature it adds free cross-device cloud sync, offline downloads, and access to more than 1,100 content sources. The codebase is genuinely production-scale: roughly 1,300 Kotlin files and ~120,000 lines. It reads from a mature open-source parser engine (the Kotatsu engine) as an underlying dependency — the way a web application depends on a database driver — while the original engineering that distinguishes Nyora lives in the translation, rendering, and sync subsystems layered on top.

Technology stack

The app is written in Kotlin and targets Android 6.0 and up (minSdk 23), compiled against SDK 36, so it runs across roughly a decade of devices. The user interface is built almost entirely with the classic Android View system and Material Components / Material You: there are 178 XML layouts, with view binding enabled app-wide. Jetpack Compose — Google's newer declarative UI toolkit — is enabled and used for a handful of newer screens, but the View system remains the bulk of the interface. Objects are wired together with Hilt (a dependency-injection framework built over Dagger); dependency injection is the pattern that lets subsystems stay loosely coupled instead of reaching directly into each other. Local data lives in a Room SQLite database, background jobs run on WorkManager, and all concurrency uses Kotlin Coroutines and Flow. Networking goes through OkHttp, with a small Ktor-based :translator Gradle module. Images render through Coil 3 and SubsamplingScaleImageView (SSIV), a specialised view that displays very large images at smooth zoom without exhausting memory. The 1,100-plus sources are driven by a QuickJS-backed parser engine: each source is effectively a small JavaScript scraper, and quickjs-kt is the embedded engine that runs them. On-device computer vision and translation come from Google ML Kit — separate text recognisers for Japanese, Chinese, Korean, and Latin scripts, plus offline Translate and Language ID — with cloud sync on Supabase and authentication via Google Sign-In. Each choice is deliberate: ML Kit runs offline so translation works without a server round-trip, and Supabase supplies auth plus a serverless backend without operating custom infrastructure.

Architecture

The guiding principle is isolation. Every new capability lives in its own self-contained package — ai/ for translation, sync/supabase/ for cloud sync — alongside the standalone :translator Gradle module, all wired in through Hilt so they compose cleanly with the engine's existing singletons rather than being threaded invasively through inherited code. That discipline is what lets a feature as large as the translation pipeline sit on top of a big codebase without destabilising it.

To picture the system end to end, follow a single tap. A reader opens a Japanese chapter. The source's behaviour is defined by a small JavaScript parser that the QuickJS engine runs to fetch the list of page images; the parser version is pinned in the build to commit 59c033ecfd so every build is reproducible. The page bitmap is handed to MangaTranslator, the orchestrator. It pre-processes the image, runs optical character recognition (OCR — converting pixels of text into actual characters), groups the detected fragments back into whole speech bubbles, then drives a three-stage translation flow. Results stream back through a Kotlin SharedFlow — a broadcast channel the reader subscribes to — and TranslationOverlayView, a custom view layered over the image, paints the translated text onto each bubble and keeps it pinned there as the user zooms. Separately, when a chapter is finished, a SupabaseSyncWorker pushes the new reading position to the cloud so the same progress appears on the user's other devices.

Hard problems solved

Translation that appears instantly, then quietly gets better

The problem. A good translation of a comic bubble, one that reads naturally and carries the right tone, needs a large language model (LLM), the kind of AI system behind tools like ChatGPT. An LLM call takes seconds. If the reader has to stare at a blank page for several seconds each time, the feature is unusable.

Why the obvious approach fails. A single-pass pipeline (OCR, then call the best translator, then show the result) forces the user to wait for the slowest stage on every page.

The solution. MangaTranslator runs a staged state machine, TRANSLATING → MT → REFINED, over a per-chapter MutableSharedFlow. The instant OCR finishes, draft bubbles appear. A fast tier (offline ML Kit machine translation, abbreviated MT) fills in usable text within a frame or two. A background LLM tier later replaces it with polished, context-aware dialogue. Each bubble upgrades independently. A single ordering guard, update.state.ordinal >= block.state.ordinal, ensures a slower, lower-quality tier can never overwrite a better result that has already landed.

Why it works. The reader sees text almost immediately and watches it sharpen, rather than waiting on the slowest path. Three ConcurrentHashMap caches (refined text, MT text, and page blocks, all keyed by chapter) make re-opening any page instant and the whole operation idempotent.
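The ordering guard can be illustrated with a minimal sketch in plain Kotlin. The enum order mirrors the TRANSLATING → MT → REFINED stages named above. The Block and ChapterBlocks types are hypothetical stand-ins for illustration, not the app's actual classes, and the SharedFlow plumbing is left out.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Declaration order matters: a higher ordinal means a higher-quality tier.
enum class BlockState { TRANSLATING, MT, REFINED }

// Hypothetical shape of one bubble's current text.
data class Block(val id: String, val text: String, val state: BlockState)

// Hypothetical per-chapter store used only to demonstrate the guard.
class ChapterBlocks {
    private val blocks = ConcurrentHashMap<String, Block>()

    /** Applies [update] unless a better tier has already landed; returns true if applied. */
    fun apply(update: Block): Boolean {
        var applied = false
        blocks.compute(update.id) { _, current ->
            if (current == null || update.state.ordinal >= current.state.ordinal) {
                applied = true
                update
            } else {
                current
            }
        }
        return applied
    }

    fun get(id: String): Block? = blocks[id]
}

fun main() {
    val chapter = ChapterBlocks()
    chapter.apply(Block("b1", "こんにちは", BlockState.TRANSLATING))
    chapter.apply(Block("b1", "Hello there.", BlockState.REFINED)) // LLM result lands first
    val applied = chapter.apply(Block("b1", "Hello", BlockState.MT)) // late MT result is rejected
    println("$applied ${chapter.get("b1")?.text}") // prints "false Hello there."
}
```

Because the comparison uses >=, replaying an update at the same tier is harmless, which is consistent with the idempotence the caches are said to provide.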

Reading text off a page when you don't know its language

The problem. ML Kit ships a separate recogniser for each script: one for Japanese, one for Chinese, one for Korean, one for Latin. They are mutually exclusive. But you cannot know a page's language before you read it, and many pages mix scripts.

Why the obvious approach fails. Guessing the language first and then running a single recogniser gets the guess wrong on exactly the ambiguous pages that matter most, producing garbage output.

The solution. OcrProvider runs all four recognisers in parallel on every page (via Kotlin async) and scores each result with a CJK-weighted heuristic (text length plus a bonus for every Chinese, Japanese, or Korean codepoint) so a Latin-only reading can never win on a page that is actually Japanese. Before any of that, each page is pre-processed to make text legible to the recogniser: upscaled 1.5x, desaturated, and contrast-stretched 1.8x through an Android ColorMatrix, so characters separate cleanly from the grey screentone dots manga uses for shading. One subtle detail matters here: hardware-accelerated bitmaps are first copied to a software configuration, because they cannot be drawn onto the software canvas the filter requires.

Why it works. The highest-scoring recogniser wins per page, so the system adapts to whatever language it is handed without a fragile up-front guess.
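A CJK-weighted score of the kind described above might look like the following sketch. The bonus weight, the function names, and the map of recogniser outputs are assumptions for illustration; the text states only that the score is text length plus a bonus per CJK codepoint.

```kotlin
// Hypothetical weight: the source does not give the actual bonus value.
const val CJK_BONUS = 2

// Treats Han, Hiragana, Katakana and Hangul codepoints as CJK.
fun isCjk(codePoint: Int): Boolean = when (Character.UnicodeScript.of(codePoint)) {
    Character.UnicodeScript.HAN,
    Character.UnicodeScript.HIRAGANA,
    Character.UnicodeScript.KATAKANA,
    Character.UnicodeScript.HANGUL -> true
    else -> false
}

/** Score = number of codepoints, plus CJK_BONUS for each CJK codepoint. */
fun score(text: String): Int {
    var total = 0
    var i = 0
    while (i < text.length) {
        val cp = text.codePointAt(i)
        total += 1 + (if (isCjk(cp)) CJK_BONUS else 0)
        i += Character.charCount(cp) // step over surrogate pairs correctly
    }
    return total
}

/** Picks the recogniser whose output scores highest; keys are recogniser names. */
fun pickBest(results: Map<String, String>): Map.Entry<String, String>? =
    results.maxByOrNull { score(it.value) }

fun main() {
    val results = mapOf(
        "latin" to "Z hl= lz",       // a Latin recogniser misreading kana
        "japanese" to "こんにちは",
    )
    println(pickBest(results)?.key) // prints "japanese"
}
```

In this example the Latin misreading scores 8 and the Japanese reading scores 15, so the shorter but correct result wins.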

Rebuilding speech bubbles out of fragments

The problem. OCR does not return clean bubbles. It returns dozens of disconnected text boxes, and a single line of dialogue is often shattered across several of them.

Why the obvious approach fails. Translating each fragment on its own produces word-salad: the translator never sees a complete sentence, so it cannot produce a grammatical one.

The solution. mergeBlocksIntoBubbles clusters the raw boxes with a geometry-aware proximity test that weighs horizontal and vertical overlap and the gaps between boxes relative to their size (with a centre-distance fallback), then re-orders the merged fragments top-to-bottom and unions their bounding rectangles into one bubble.

Why it works. The MT and LLM stages now receive coherent, complete dialogue units, which is the difference between grammatical localisation and gibberish. This also feeds a small narrative-context tracker, StoryBrain, which carries running context across the chapter so the LLM keeps character names and tone consistent from page to page.
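The clustering idea can be sketched as follows, under stated assumptions. The Box type, the GAP_RATIO threshold, and the union-find grouping are illustrative choices, not the logic of mergeBlocksIntoBubbles itself, and the centre-distance fallback is omitted.

```kotlin
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float, val text: String) {
    val width: Float get() = right - left
    val height: Float get() = bottom - top
}

// Hypothetical threshold: two boxes join when the gap on each axis is at most
// this fraction of the shortest side among the two boxes.
const val GAP_RATIO = 0.6f

fun near(a: Box, b: Box): Boolean {
    // A gap of zero or less means the boxes overlap on that axis.
    val gapX = maxOf(a.left, b.left) - minOf(a.right, b.right)
    val gapY = maxOf(a.top, b.top) - minOf(a.bottom, b.bottom)
    val limit = GAP_RATIO * minOf(a.width, a.height, b.width, b.height)
    return gapX <= limit && gapY <= limit
}

/** Groups nearby boxes, orders each group top-to-bottom, and unions the rectangles. */
fun mergeBoxes(boxes: List<Box>): List<Box> {
    val parent = IntArray(boxes.size) { it }
    fun find(i: Int): Int {
        var root = i
        while (parent[root] != root) root = parent[root]
        return root
    }
    for (i in boxes.indices) {
        for (j in i + 1 until boxes.size) {
            if (near(boxes[i], boxes[j])) parent[find(i)] = find(j)
        }
    }
    return boxes.indices.groupBy { find(it) }.values.map { members ->
        val parts = members.map { boxes[it] }.sortedBy { it.top }
        Box(
            left = parts.minOf { it.left },
            top = parts.minOf { it.top },
            right = parts.maxOf { it.right },
            bottom = parts.maxOf { it.bottom },
            text = parts.joinToString("") { it.text },
        )
    }
}

fun main() {
    val merged = mergeBoxes(
        listOf(
            Box(0f, 0f, 100f, 20f, "今日は"),
            Box(0f, 24f, 100f, 44f, "いい天気"),
            Box(400f, 300f, 480f, 320f, "うん"),
        )
    )
    println(merged.map { it.text }) // prints "[今日はいい天気, うん]"
}
```

Scaling the allowed gap by box size, rather than using a fixed pixel distance, keeps the test meaningful across pages scanned at different resolutions.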

Holding 60fps while lettering a live, zooming page

The problem. The translated text must sit exactly on each bubble and stay pinned there as the user pinch-zooms and pans, and it must do so at 60 frames per second, which leaves roughly 16 milliseconds to draw each frame.

Why the obvious approach fails. Re-measuring text, choosing font sizes, and re-laying-out paragraphs on every frame blows that budget, and the reader stutters.

The solution. TranslationOverlayView does zero layout work per frame. Once per bubble it caches the expensive results: the binary-searched StaticLayout (font size fitted between 10 and 42sp so the text fills the bubble without overflowing), the expanded background rectangle, and a text and shadow colour chosen from the page's actual luminance, all keyed by block id plus state plus a hash of the text. Then onDraw reads SSIV's current scale and its sourceToViewCoord(0,0) anchor exactly once, applies a single canvas translate-and-scale, and paints all backgrounds in one pass and all text in a second.

Why it works. The per-frame cost collapses to cheap cached draws under one transform, so the overlay tracks continuous zoom and pan without dropping frames.
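The fit-once-then-cache pattern can be sketched without Android classes. Here the fits predicate stands in for building a StaticLayout and checking its height, and toyFits is a deliberately crude model that treats sizes as pixels. LayoutKey, FitCache, and all function names are hypothetical; only the 10 to 42 range and the cache key fields come from the text.

```kotlin
const val MIN_SP = 10
const val MAX_SP = 42

/**
 * Largest size in MIN_SP..MAX_SP for which [fits] holds, or MIN_SP if none does.
 * Assumes [fits] is monotonic: once a size overflows, every larger size does too.
 */
fun fitTextSize(fits: (Int) -> Boolean): Int {
    var lo = MIN_SP
    var hi = MAX_SP
    var best = MIN_SP
    while (lo <= hi) {
        val mid = (lo + hi) / 2
        if (fits(mid)) {
            best = mid
            lo = mid + 1
        } else {
            hi = mid - 1
        }
    }
    return best
}

// Cache key fields follow the text: block id, state, and a hash of the text.
data class LayoutKey(val blockId: String, val state: String, val textHash: Int)

class FitCache {
    private val sizes = HashMap<LayoutKey, Int>()
    var searches = 0
        private set

    /** Runs the binary search only on a cache miss. */
    fun sizeFor(blockId: String, state: String, text: String, fits: (Int) -> Boolean): Int =
        sizes.getOrPut(LayoutKey(blockId, state, text.hashCode())) {
            searches++
            fitTextSize(fits)
        }
}

// Toy stand-in for text measurement: square glyphs and a 1.2x line height.
fun toyFits(text: String, widthPx: Int, heightPx: Int): (Int) -> Boolean = { size ->
    val perLine = widthPx / size
    perLine > 0 && ((text.length + perLine - 1) / perLine) * size * 1.2 <= heightPx
}

fun main() {
    val cache = FitCache()
    val text = "I won't lose to you!"
    val fits = toyFits(text, widthPx = 160, heightPx = 90)
    val first = cache.sizeFor("b7", "REFINED", text, fits)
    val second = cache.sizeFor("b7", "REFINED", text, fits) // cache hit, no new search
    println("size=$first searches=${cache.searches} same=${first == second}")
}
```

Including the state in the key means a bubble is re-fitted when its text is upgraded from one tier to the next, but never during zoom or pan.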

Respecting a rate-limited LLM without stalling the reader

The problem. The refinement LLM is metered at roughly 40 requests per minute. Fire one request per bubble and the quota is exhausted in seconds, after which the service throttles you.

Why the obvious approach fails. Per-bubble calls are both rate-limited into failure and needlessly slow, since each request carries fixed overhead.

The solution. Refinement is debounced and batched. A 1,200 ms debounce window (BATCH_DEBOUNCE_MS) collects work; dialogues are chunked up to MAX_BATCH_DIALOGUES = 12 per request; and a deliberate 1.5-second pacing delay between preload chunks keeps the app under the 40-RPM ceiling. Assuming chunks are sent one after another, a 1.5-second gap alone caps the rate at 60 / 1.5 = 40 requests per minute, and each request's own latency pushes the real rate lower. The prompt also instructs the model to reassemble sentences that CJK languages split across multiple bubbles.

Why it works. Far fewer, larger requests stay inside the quota while still streaming polished text back per bubble, and the shared batch context actually improves translation quality, because the model sees neighbouring lines at once.
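The batching and pacing arithmetic can be checked with a short sketch. MAX_BATCH_DIALOGUES and the 1.5-second delay come from the text; the PACING_DELAY_MS name, both functions, and the assumption that chunks are sent strictly one after another are illustrative. The debounce window is not modelled here.

```kotlin
const val MAX_BATCH_DIALOGUES = 12 // from the text
const val PACING_DELAY_MS = 1_500L // value from the text; the constant name is hypothetical

/** Splits pending dialogues into request-sized chunks. */
fun <T> toBatches(dialogues: List<T>): List<List<T>> = dialogues.chunked(MAX_BATCH_DIALOGUES)

/**
 * Upper bound on requests per minute when requests run sequentially,
 * each taking [latencyMs], with a fixed pacing delay between them.
 */
fun maxRequestsPerMinute(latencyMs: Long): Double = 60_000.0 / (PACING_DELAY_MS + latencyMs)

fun main() {
    val bubbles = (1..30).map { "bubble $it" }
    println(toBatches(bubbles).map { it.size }) // prints "[12, 12, 6]"
    println(maxRequestsPerMinute(0))            // prints "40.0"
    println(maxRequestsPerMinute(1_500))        // prints "20.0"
}
```

Thirty bubbles become three requests instead of thirty, and any realistic response time keeps the sequential rate well below the quota.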

Adding free cross-device sync to an engine that had none

The problem. The underlying engine stores everything locally, with no concept of a user account or the cloud, so a reader's library and reading position are trapped on one device.

Why the obvious approach fails. Building a full custom authentication-and-sync backend from scratch is a large, error-prone surface to own and operate.

The solution. Google Sign-In returns an id_token (a signed proof of identity) that Nyora exchanges at Supabase's grant_type=id_token endpoint for session tokens; the user id is parsed directly from the returned access token's JWT. A single Supabase Edge Function at /functions/v1/nyora-sync performs delta push and pull of favourites, categories, history, bookmarks, and exact reading progress, and SupabaseSyncWorker schedules it through WorkManager. Switching accounts resets the sync watermark for a clean snapshot, and credentials are injected at build time with a baked fallback.

Why it works. The same backend contract is shared verbatim with five sibling Nyora apps, so a chapter started on Android resumes mid-page on web, iOS, or desktop.
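Reading the user id out of an access token could look like the sketch below. It assumes the id is carried in the standard sub claim, which is how Supabase access tokens are laid out; the function name is hypothetical, and the regex is used only to keep the example dependency-free where a real app would use a JSON parser. It uses java.util.Base64, which on Android needs API 26 or later, so an app with minSdk 23 would need android.util.Base64 instead. The sketch decodes the payload without verifying the signature.

```kotlin
import java.util.Base64

/** Returns the `sub` claim from a JWT payload, or null if the token is malformed. */
fun userIdFromJwt(jwt: String): String? {
    val parts = jwt.split(".")
    if (parts.size != 3) return null // expect header.payload.signature
    val payload = try {
        // JWT segments are base64url without padding; this decoder accepts that.
        String(Base64.getUrlDecoder().decode(parts[1]), Charsets.UTF_8)
    } catch (e: IllegalArgumentException) {
        return null
    }
    return Regex("\"sub\"\\s*:\\s*\"([^\"]+)\"").find(payload)?.groupValues?.get(1)
}

fun main() {
    // Build a throwaway token locally; the signature segment is a placeholder.
    val enc = Base64.getUrlEncoder().withoutPadding()
    val header = enc.encodeToString("{\"alg\":\"HS256\"}".toByteArray())
    val payload = enc.encodeToString("{\"sub\":\"user-123\",\"role\":\"authenticated\"}".toByteArray())
    println(userIdFromJwt("$header.$payload.sig")) // prints "user-123"
}
```

Decoding without verification is reasonable only because the token has just been received from the auth server over TLS; the server remains responsible for validating it on every sync request.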

Engineering highlights

State-machine-driven, flicker-free progressive translation across three quality tiers, with an ordering guard that prevents a slow tier from clobbering a better result.
Parallel four-recogniser OCR ensemble with CJK-weighted scoring and a screentone-aware pre-processing pass (1.5x upscale, desaturate, 1.8x contrast).
Geometry-based bubble reconstruction that turns scattered OCR fragments into coherent dialogue, feeding context-aware LLM localisation backed by a per-chapter StoryBrain tracker.
An allocation-light custom overlay renderer that caches all layout and paints under a single canvas transform to hold 60fps during continuous zoom and pan.
Debounced, 12-per-batch LLM refinement with explicit 1.5-second pacing to stay inside a 40-RPM quota.
Cross-platform sync via Google id_token exchanged for a Supabase session and a delta-sync Edge Function, shared verbatim with five sibling apps.
Clean Hilt-wired modularisation (ai/, sync/supabase/, the :translator module) layered onto a ~120,000-line codebase with a build-pinned parser-engine commit for reproducibility.

What this demonstrates

This is large-codebase ownership at a senior level: integrating computer vision, on-device machine learning, real-time custom rendering, and a cloud backend into a ~120,000-line Android codebase built on an upstream open-source engine — without destabilising it — and decomposing a hard, latency-sensitive feature into independently-converging stages so the user never waits on the slowest one. It shows fluency across the full Android surface: the View system, Coroutines and Flow, Hilt, Room, WorkManager, and hand-written View.onDraw rendering. It shows comfort with the performance discipline of a 16-millisecond frame budget, and the judgment to keep new subsystems isolated and testable. It also reflects a practitioner's habit of verifying on real hardware via :app:installDebug rather than assuming that code which compiles is code that works.

Stack