MetaWriter

Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning

Wenhao Gu, Li Gu, Ching Yee Suen, Yang Wang
Dept. of Computer Science & Software Engineering, Concordia University

Abstract

Recent advancements in handwritten text recognition (HTR) have enabled the effective conversion of handwritten text to digital formats. However, achieving robust recognition across diverse writing styles remains challenging. Traditional HTR methods lack writer-specific personalization at test time due to limitations in model architecture and training strategies. Existing attempts to bridge this gap through gradient-based meta-learning still require labeled examples and rely on parameter-inefficient fine-tuning, leading to substantial computational and memory overhead.

To overcome these challenges, we propose an efficient framework that formulates personalization as prompt tuning, incorporating an auxiliary image reconstruction task with a self-supervised loss to guide prompt adaptation using unlabeled test-time examples. To ensure that the self-supervised loss effectively reduces text recognition error, we leverage meta-learning to learn the optimal initialization of the prompts. As a result, our method efficiently captures unique writing styles by updating less than 1% of the model's parameters, while eliminating the need for time-intensive annotation.
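To make the prompt-tuning formulation concrete, the sketch below shows one plausible PyTorch realization: the backbone encoder is frozen and a small set of learnable prompt vectors is prepended to the image tokens, so the prompts are the only tensors updated during personalization. The module name PromptedEncoder, the prompt count, and the embedding width are illustrative assumptions rather than the paper's exact configuration.

import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    # Hypothetical sketch: prepend learnable prompt vectors to a frozen
    # ViT-style encoder so that only the prompts are tuned per writer.
    def __init__(self, encoder: nn.Module, num_prompts: int = 16, dim: int = 768):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # the backbone stays frozen
            p.requires_grad = False
        # Meta-learned prompt initialization; the only trainable tensor here.
        self.prompts = nn.Parameter(torch.zeros(num_prompts, dim))
        nn.init.normal_(self.prompts, std=0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, seq_len, dim) patch embeddings of a text-line image.
        b = patch_tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return self.encoder(torch.cat([prompts, patch_tokens], dim=1))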

We validate our approach on the RIMES and IAM Handwriting Database benchmarks, where it consistently outperforms previous state-of-the-art methods while using 20× fewer parameters. We believe this represents a significant advancement in personalized handwritten text recognition, paving the way for more reliable and practical deployment in resource-constrained scenarios.

Challenges in HTR

Handwriting style variations

The left panel shows examples of handwritten text from different writers, highlighting variations in letter shapes, spacing, and stroke patterns. The right panel presents the model's predictions for each example, demonstrating the difficulty in accurately recognizing diverse handwriting styles.

Method Overview

Method overview

Our framework during training: the handwritten texts from a specific writer are divided into an unlabeled support set and a labeled query set. The images in the support set are masked, concatenated with the meta prompt vectors, and passed through a shared image encoder, followed by reconstruction with a Masked Autoencoder (MAE) decoder. The writer-specific prompt vectors are derived in the inner loop, which refines the meta prompt vectors through a single gradient step on this self-supervised loss. These writer-specific prompt vectors are then combined with the document images from the query set and used as input to the HTR model, which predicts a sequence of tokens representing the written content.
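The training procedure above amounts to a bi-level optimization, sketched below for a single writer episode. The helpers model.reconstruct (the masked-reconstruction branch) and model.recognize (the HTR branch), the single inner gradient step, and the token-level cross-entropy are simplifying assumptions standing in for the components described in the paper.

import torch
import torch.nn.functional as F

def meta_train_step(model, meta_prompts, support_imgs, query_imgs, query_labels,
                    inner_lr=0.1, mask_ratio=0.75):
    # Inner loop: adapt the prompts with the self-supervised loss computed on
    # the unlabeled, masked support images (a single gradient step).
    recon_loss = model.reconstruct(support_imgs, meta_prompts, mask_ratio)
    grads = torch.autograd.grad(recon_loss, meta_prompts, create_graph=True)[0]
    writer_prompts = meta_prompts - inner_lr * grads

    # Outer loop: the adapted, writer-specific prompts are fed together with
    # the labeled query images to the HTR model; the recognition loss drives
    # the update of the meta prompt initialization and any shared weights.
    logits = model.recognize(query_imgs, writer_prompts)          # (B, T, vocab)
    outer_loss = F.cross_entropy(logits.flatten(0, 1), query_labels.flatten())
    return outer_loss

Because the inner step is built with create_graph=True, backpropagating outer_loss differentiates through the adaptation itself, which is what lets meta-learning find a prompt initialization from which one self-supervised step reliably lowers recognition error.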

Experimental Results

Performance comparison IAM

Comparison with state-of-the-art methods on the IAM dataset. Our method outperforms prior methods in both Character Error Rate (CER) and Word Error Rate (WER).

Performance comparison RIMES

Performance comparison on the RIMES dataset, demonstrating consistent improvements across different adaptation scenarios.

Parameter Efficiency

Parameter efficiency

Our approach updates only 0.08M parameters during personalization, less than 1% of the total model parameters, making it highly efficient for deployment in resource-constrained environments.
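As a sanity check on what counts as trainable here, the small helper below (a generic PyTorch utility, not code from the paper) reports the fraction of parameters that would actually receive gradients once everything except the prompt vectors has been frozen, as in the encoder sketch above.

import torch.nn as nn

def personalization_footprint(model: nn.Module) -> str:
    # Count parameters that still require gradients after freezing the backbone.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return (f"{trainable / 1e6:.2f}M trainable / {total / 1e6:.1f}M total "
            f"({100 * trainable / total:.2f}%)")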

Citation

@inproceedings{gu2024metawriter,
  title={MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning},
  author={Gu, Wenhao and Gu, Li and Suen, Ching Yee and Wang, Yang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}