Nyora for macOS

A native SwiftUI Mac reader that translates whole pages of Japanese, Chinese, and Korean comics on-device, while driving a headless Kotlin engine as a private background service over a local-only API.

Stack

  • Swift 6 (strict concurrency, language mode v6)
  • SwiftUI (macOS 15+ native front end)
  • Swift Package Manager (executable target)
  • Apple Vision (VNRecognizeTextRequest + RecognizeDocumentsRequest)
  • Core Image (CIFilter preprocessing pipeline)
  • FoundationModels (on-device Apple Intelligence, macOS 26+)
  • Kotlin Multiplatform / Kotlin-JVM engine (nyora-shared submodule)
  • GraalVM polyglot (GraalJS) for JS source parsers
  • SQLDelight + OkHttp + Jsoup (JVM helper)
  • C dyld interpose shim (ConcurrencyShim)
  • WKWebView (Cloudflare/Turnstile solver)
  • Bundled Adoptium Temurin 17 JRE
  • create-dmg + ad-hoc codesign + Homebrew cask

Overview

Nyora for macOS is a native comic reader built from scratch in SwiftUI for Apple Silicon Macs. It connects to hundreds of online catalogues, displays chapters in a dedicated viewer with paged and webtoon modes, downloads chapters for offline use, and keeps a single library in sync across six platforms. Its standout feature is whole-page translation. Press one keyboard shortcut on a page of untranslated Japanese, Chinese, or Korean comic art, and the app finds the text, translates it, and shows the result next to the original artwork. The step that reads the text from the image runs entirely on the user's own Mac, so the page itself never leaves the device for that stage.

The hardest engineering here is not the catalogue browser. It is that this single application is two completely different software runtimes stitched into one product. The part the user sees is a modern Swift 6 / SwiftUI front end. Behind it runs a headless Kotlin engine — the same catalogue engine that powers Nyora's Windows, Linux, Android, iOS, and web builds — which the Mac app launches and controls as a private background process. Layered on top is an on-device machine-vision pipeline tuned for a problem that off-the-shelf text recognition does not solve well: reading the stylised, vertically-set Japanese packed inside comic speech bubbles.

Technology stack

The front end is about 21,500 lines of Swift 6 across 47 files, built with Swift Package Manager and compiled in strict-concurrency language mode v6. That mode is the Swift compiler's most demanding setting: it rejects code it cannot prove free of data races, where two threads touch the same mutable data at once, which eliminates a whole class of crashes before the app ever runs. It targets macOS 15 and later and is built on SwiftUI, Apple's modern declarative UI framework, so the app behaves like a native Mac application rather than a ported window.

The translation pipeline is built on three Apple technologies plus an optional language-model pass. Apple Vision (its VNRecognizeTextRequest API and the newer RecognizeDocumentsRequest document recogniser) detects and reads text in images. A Core Image chain cleans each page first. A bundled MangaOCR model converted to Core ML, Apple's on-device machine-learning format, handles the vertical Japanese that Vision struggles with. The optional final pass polishes the wording through Apple Intelligence (Apple's FoundationModels framework) or a user-supplied language-model key. Vision and Core ML were chosen because they run on the Mac's Neural Engine and GPU with no server and no per-page cost. For the recognition stage, that is what turns "your reading never leaves your machine" into a real guarantee rather than a marketing line.

The catalogue engine lives in the nyora-shared Kotlin Multiplatform module — the single shared codebase reused by every Nyora platform. It runs each site's JavaScript parser through GraalVM (GraalJS), a JavaScript runtime that runs on the Java virtual machine (JVM), backed by OkHttp for networking, Jsoup for HTML parsing, and SQLDelight for the local database. This layer is the open-source parser engine Nyora builds on, packaged here as a single fat JAR and shipped with a bundled Adoptium Temurin 17 Java runtime so the user never needs to install Java. The app is distributed as a roughly 100 MB ad-hoc-signed .dmg and a Homebrew cask.

Architecture

The catalogue engine is Kotlin and depends on GraalJS, OkHttp, and Jsoup — none of which can be called directly from Swift. Rewriting it in Swift would mean maintaining a second, drifting copy of logic that already works on five other platforms. So Nyora instead runs the Kotlin engine as a headless JVM sidecar: a genuine background process with no window of its own, owned entirely by the Mac app.

On launch, a Swift component called HelperLauncher finds a Java runtime — preferring the one bundled inside the .app — starts the helper, and waits for it to claim a random local network port and write that port number to a file. A second component, NyoraHelperBridge, then polls a /health endpoint until the engine reports ready, after which it drives the engine through 66 REST endpoints on 127.0.0.1, the loopback address that stays inside the machine and never reaches the network.
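The startup handshake described above can be sketched in a few lines. This is a minimal Python illustration, not the app's Swift code: the function names, timeouts, and the port-file format (a single integer, written atomically) are assumptions. Only the /health endpoint and the 127.0.0.1 loopback address come from the description.

```python
import time
from pathlib import Path
from urllib.error import URLError
from urllib.request import urlopen


def read_port(port_file: Path, timeout: float = 10.0, interval: float = 0.1) -> int:
    """Wait until the helper has written its port number, then return it.

    Assumes the helper writes the whole number in one atomic step.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            text = port_file.read_text().strip()
            if text:
                return int(text)
        except (FileNotFoundError, ValueError):
            pass  # file not there yet, or its content is not a number yet
        time.sleep(interval)
    raise TimeoutError(f"helper never wrote a port to {port_file}")


def http_probe(port: int) -> bool:
    """Return True when GET /health on the loopback address answers 200."""
    try:
        with urlopen(f"http://127.0.0.1:{port}/health", timeout=1.0) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False  # engine not listening yet, or not ready


def wait_until_healthy(port: int, probe=http_probe,
                       timeout: float = 30.0, interval: float = 0.25) -> None:
    """Poll the health probe until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(port):
            return
        time.sleep(interval)
    raise TimeoutError(f"engine on port {port} never reported ready")
```

Passing the probe as a parameter keeps the polling loop testable without a running engine, and the two timeouts give the launcher a clear failure path if the helper never starts.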

Following one real action end to end makes the design concrete. A user searches for a title. SwiftUI sends an HTTP request to the local helper; the helper runs that source's JavaScript parser inside GraalJS, fetches and scrapes the site with OkHttp and Jsoup, and returns clean JSON. The user opens a chapter; the bridge asks for the page list, and the images come back through an image-proxy endpoint that re-attaches the headers each source expects (such as Referer and User-Agent) so hotlink-protected images actually load. The user presses translate; that work runs in-process on the Swift side, inside the roughly 4,200-line vision-and-AI subsystem, fully independent of the engine. The result is one canonical parser layer shared by all six platforms, with a Mac front end that still feels entirely native.
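The image-proxy step is the least obvious part of that flow, so here is a small sketch of the idea in Python. The header table, source identifier, and URLs are hypothetical placeholders; the engine's real data model is not shown in this write-up.

```python
from urllib.request import Request

# Hypothetical per-source header table. The real engine's structure is an assumption.
SOURCE_HEADERS = {
    "example-source": {
        "Referer": "https://example.com/",
        "User-Agent": "Mozilla/5.0 (Macintosh) ExampleReader/1.0",
    },
}


def build_upstream_request(source_id: str, image_url: str) -> Request:
    """Build the upstream image request with the headers that source expects.

    Unknown sources get no extra headers.
    """
    headers = SOURCE_HEADERS.get(source_id, {})
    return Request(image_url, headers=headers)
```

The front end only ever asks the local proxy for an image; the proxy is the single place that knows which headers each source needs.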

Hard problems solved

Vertical Japanese that general-purpose OCR can't read

Japanese comics set their text vertically — top to bottom, columns running right to left (tategaki) — in stylised fonts crammed into speech bubbles. Optical character recognition (OCR), the technology that turns text in an image into machine-readable characters, is trained mostly on horizontal lines. General OCR, Apple Vision included, often returns nothing at all for a vertical bubble, so the region is lost before translation even starts. "Just run Vision" therefore fails precisely where comics need it most.

The solution is a rotated, multi-recogniser ensemble in a 1,570-line OcrProvider component paired with a purpose-built model. Each page is first cleaned in Core Image: adaptive upscaling, denoising to remove the dotted screentone shading that confuses OCR, grayscale conversion, contrast, and sharpening — with a gentler version of the chain for small crops so the thin strokes of katakana don't dissolve. Vision then runs as a four-language ensemble (Japanese, Simplified Chinese, Korean, English) to catch horizontal and Latin text. In parallel, the bundled MangaOCR Core ML model, built specifically for vertical bubble lettering, reads the page through an overlapping 4×5 grid of tiles so no bubble is cut by a seam. The two result sets are merged and de-duplicated by how much their bounding boxes overlap (intersection-over-union, IoU) and by text similarity, then filtered to strip repetition, scanlator credits, and stray numbers. It works because each recogniser covers the other's blind spot, and the geometry-aware merge keeps everything they collectively found without counting any bubble twice.
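The geometry-aware merge can be illustrated with a short Python sketch. The thresholds, the Region type, and the exact duplicate rule (strong overlap alone, or moderate overlap plus similar text) are assumptions for illustration; the write-up only states that boxes are compared by IoU and text by similarity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Region:
    x: float
    y: float
    w: float
    h: float
    text: str
    confidence: float


def iou(a: Region, b: Region) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1]; SequenceMatcher stands in for the real metric."""
    return SequenceMatcher(None, a, b).ratio()


def is_duplicate(a: Region, b: Region,
                 strong_iou: float = 0.5, weak_iou: float = 0.2,
                 min_text: float = 0.6) -> bool:
    """Assumed rule: strong overlap, or moderate overlap with similar text."""
    overlap = iou(a, b)
    if overlap >= strong_iou:
        return True
    return overlap >= weak_iou and similarity(a.text, b.text) >= min_text


def merge(vision: list, manga_ocr: list) -> list:
    """Keep the most confident region from each group of duplicates."""
    kept = []
    for cand in sorted(vision + manga_ocr, key=lambda r: r.confidence, reverse=True):
        if not any(is_duplicate(cand, k) for k in kept):
            kept.append(cand)
    return kept
```

Sorting by confidence before the greedy pass means that when both recognisers find the same bubble, the more confident reading survives.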

The thread-starvation deadlock under brute force

The highest-quality OCR tier brute-forces stubborn bubbles. It tries several tonal treatments (raw, black-and-white, inverted) crossed with several rotations — up to two dozen recognition attempts on a single bubble. Fired off naively, that froze the whole application.

The first implementation used Grand Central Dispatch (GCD), Apple's standard concurrency system, with a semaphore: each attempt seized a real worker thread and then blocked it, sitting idle while it waited for Vision to finish. GCD has a soft ceiling of about 64 worker threads. With hundreds of blocked attempts queued, every thread was held hostage by a waiting task, and no task could finish to free one. That is a classic deadlock, confirmed by sampling the hung process.

The fix has two parts. First, replace the semaphore with an OperationQueue capped at 8 concurrent operations; queued work consumes no thread at all until it actually begins, so the thread pool is never drained. Second, front the entire grid with a 64-bit perceptual-hash (dHash) de-duplicator that collapses near-identical image variants: when "binarize" and "invert" happen to produce visually equivalent crops, they hash within a tuned Hamming distance and only one Vision call runs. A CJK early-stop and a 500-entry crop cache then make re-translating a page instant.

It works because the real bottleneck was thread occupancy, not raw CPU, and the dedup step removes redundant work before it is ever scheduled.
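Both halves of that fix can be sketched together. The following Python is an analogue, not the Swift implementation: ThreadPoolExecutor stands in for the bounded OperationQueue, the dHash operates on a crop already downscaled to 8 rows of 9 grayscale values, and the Hamming threshold of 4 is an assumed placeholder for the tuned value.

```python
from concurrent.futures import ThreadPoolExecutor


def dhash(gray) -> int:
    """64-bit difference hash of an 8-row by 9-column grayscale grid.

    Each bit records whether a pixel is brighter than its right neighbour.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedupe(variants, max_distance: int = 4):
    """Keep the first variant of each near-identical cluster.

    variants is a list of (name, gray_grid) pairs.
    """
    kept = []
    for name, gray in variants:
        h = dhash(gray)
        if all(hamming(h, seen) > max_distance for _, _, seen in kept):
            kept.append((name, gray, h))
    return [(name, gray) for name, gray, _ in kept]


def recognise_all(variants, recognise, max_workers: int = 8):
    """Run recognise(name, gray) on the unique variants, at most 8 at a time.

    Pending items wait in the executor's queue without holding a thread.
    """
    unique = dedupe(variants)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda v: recognise(*v), unique))
```

Deduplicating before submission matters as much as the cap: a variant that is dropped never occupies a slot in the queue at all.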

A macOS 26 runtime crash that made the app unlaunchable

On the newest macOS, compiled with the newest SDK, the app crashed on launch every single time — inside Apple's own framework code, before any Nyora logic ran. SwiftUI constantly checks "am I on the main thread?" to keep UI updates safe. On that specific OS-and-SDK combination, the object representing the main thread carried corrupted metadata, so every one of those checks crashed.

You cannot patch Apple's frameworks, and waiting for an Apple fix is not a way to ship. The solution is a 60-line C component, ConcurrencyShim, that installs a dyld __interpose table — a documented mechanism for substituting one function for another at load time. It swaps the two broken Swift-runtime checks for a simple, correct one: pthread_main_np(), which asks the operating system directly whether the current thread is the main thread. dyld rewires those call sites inside Apple's own libraries before any Swift code runs, with no changes to application code. It works because the replacement answers the exact same question through a lower, uncorrupted path, and the main actor is, by definition, always the main thread.

Cloudflare challenges from a headless fetcher

Many sources sit behind Cloudflare, which serves a JavaScript challenge page to anything that looks like a bot. The engine's OkHttp client is a pure HTTP fetcher: it cannot run that challenge's JavaScript, so it receives the challenge instead of the content. The solution is a two-phase WKWebView solver on the Swift side. A hidden, off-screen web view silently clears ordinary JavaScript challenges. For interactive "managed" challenges (Turnstile) that demand a human click, a real window appears, the user clicks once, and the resulting cf_clearance cookie is captured and handed back to the engine. Critically, the User-Agent string is pinned identically across the Swift web view and the Kotlin OkHttp client, because the clearance cookie is bound to the exact User-Agent that earned it — any mismatch silently voids it.
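The User-Agent pinning rule is easy to get wrong, so a guard at the hand-off point is worth illustrating. This Python sketch uses a hypothetical Clearance type and function name; only the cf_clearance cookie name and the requirement that both clients send the same User-Agent come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clearance:
    domain: str
    cookie_value: str  # value of the cf_clearance cookie captured by the web view
    user_agent: str    # the exact User-Agent the web view sent when it earned the cookie


def headers_for(clearance: Clearance, client_user_agent: str) -> dict:
    """Build request headers for the HTTP client, refusing a mismatched User-Agent.

    Failing loudly here is a design choice for this sketch: a mismatch would
    otherwise just produce another challenge page with no obvious cause.
    """
    if client_user_agent != clearance.user_agent:
        raise ValueError("User-Agent differs from the one that earned cf_clearance")
    return {
        "User-Agent": client_user_agent,
        "Cookie": f"cf_clearance={clearance.cookie_value}",
    }
```

In practice both sides would read the User-Agent from one shared constant, which makes the mismatch impossible rather than merely detected.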

Cross-platform sync without corrupting the cloud

Sync has to merge one user's library across six apps without any device clobbering another's data. During development, the Mac helper started overwriting cloud records with placeholder values: it was pushing every local row, including restored entries whose source metadata couldn't be decoded, and stamping them Unknown. A documented investigation traced several distinct faults — request bodies built as untyped maps that Kotlin serialization can't encode at runtime, pull data-transfer objects that needed user_id treated as write-only, and restored history rows being re-pushed with local timestamps instead of the original cloud ones. The fix added a sync-specific upsert that preserves cloud timestamps and never re-pushes pulled rows, plus a guard that refuses to push any manga whose source reference is unresolved. A device that cannot fully decode a record now stays silent rather than degrading the shared copy.
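The two rules in that fix, preserving cloud timestamps on pull and refusing to push unresolved or pulled rows, can be sketched as follows. The Row fields and function names are hypothetical; the real schema lives in the Kotlin engine and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    manga_id: str
    source_id: Optional[str]  # None when the source reference could not be resolved
    updated_at: int           # epoch seconds
    pulled: bool = False      # True when the row arrived from the cloud


def upsert_pulled(local: dict, cloud_row: Row) -> None:
    """Store a pulled row, keeping the cloud timestamp and marking it as pulled."""
    local[cloud_row.manga_id] = Row(
        manga_id=cloud_row.manga_id,
        source_id=cloud_row.source_id,
        updated_at=cloud_row.updated_at,  # never replaced with the local clock
        pulled=True,
    )


def rows_to_push(local: dict) -> list:
    """Select rows that are safe to push.

    Skips rows that came from the cloud and rows whose source is unresolved.
    """
    return [
        row for row in local.values()
        if not row.pulled and row.source_id is not None
    ]
```

With these two rules a device that cannot decode a record simply contributes nothing for it, which matches the "stays silent" behaviour described above.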

Engineering highlights

  • A native SwiftUI front end (~21,500 lines, 47 files) driving a headless Kotlin/JVM engine over 66 loopback REST endpoints — one canonical parser layer reused across all six Nyora platforms.
  • An on-device CJK OCR pipeline (~4,200 lines) fusing an Apple Vision four-language ensemble with a bundled MangaOCR Core ML model over an overlapping 4×5 tile grid, merged by IoU and text-similarity dedup.
  • Diagnosed and fixed a GCD thread-starvation deadlock by moving to a bounded OperationQueue plus 64-bit perceptual-hash dedup, with the post-mortem kept in the source tree.
  • Shipped a 60-line C dyld-interpose shim that works around a macOS 26 Swift-runtime crash with no application-code changes — runtime-level systems debugging.
  • Built a two-phase WKWebView Cloudflare solver with User-Agent-pinned cookie hand-off between the Swift web view and the Kotlin OkHttp client.
  • Authored a reproducible build-dmg.sh: cached JRE download, release build, deep ad-hoc codesign with allow-jit and disable-library-validation entitlements so Metal JIT runs under an ad-hoc signature, a HiDPI DMG layout, and a Homebrew cask with --no-quarantine.
  • Concurrency-safe by construction: OCR, translation, and engine I/O are Swift actors compiled under language mode v6, the compiler's strictest data-race setting.

What this demonstrates

This build shows fluency across boundaries most engineers never cross in a single project: SwiftUI and strict Swift 6 concurrency; the Apple Vision / Core ML / FoundationModels machine-vision stack tuned for a genuinely hard CJK OCR problem; cross-runtime systems design, where a native app launches and orchestrates a JVM/GraalVM engine over a local-only API; low-level runtime patching with dyld interpose; careful distributed-sync reasoning to avoid corrupting shared data; and the full unsigned-distribution toolchain for macOS, from entitlements to Homebrew. As much as the code, it demonstrates judgment. Every hard decision — the 8-wide queue, the Hamming threshold, the executor shim, the sync guard — is documented in the source tree alongside the measurements and crash reports that justified it.