OCR - Part II

Written by Nobody on 14/01/2022

Part II

Besides the myriad of free-form styles, the main problems of OCR are unequal character sizes, different fonts, and colors that can be tarnished or blurred by the surroundings or by the quality of the image. For example, all characters of Lucida Console have the same width, while the width of Dialog characters varies. Examples:

[Images: font samples (Lucida Console and others)]

In this brief tutorial I show you two different implementations:

  • Dynamic OCR: recognition is based on the distinctive features of each character,
  • Static OCR: recognition is based on a given font table.

With the first one it is relatively difficult to cover all the different fonts, while the second one is easier but restricted to one font. The distinctive features of the letter A in Dialog and in Lucida Console are similar, but the Courier A deviates slightly at the top and at the feet. A dynamic A recognizer that works with Lucida Console and Dialog could therefore fail with Courier. A static A recognizer, on the other hand, requires three different character font tables, one per font.
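To make the static approach concrete, here is a minimal sketch of a font-table lookup, assuming each character has already been cut out and binarized into a fixed-size grid. The three 3×5 glyphs in the hypothetical FONT_TABLE are made up for illustration, and matching by Hamming distance (the number of differing cells) with an error threshold is my choice, not necessarily the author's method. A real table would hold the rendered glyphs of one concrete font.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal static-OCR sketch: one font table maps each character to its glyph bitmap. */
class StaticOcr {
    // Hypothetical 3x5 glyphs ('#' = ink); a real table would be rendered from one concrete font.
    static final Map<Character, String[]> FONT_TABLE = new LinkedHashMap<>();
    static {
        FONT_TABLE.put('H', new String[] {"#.#", "#.#", "###", "#.#", "#.#"});
        FONT_TABLE.put('I', new String[] {"###", ".#.", ".#.", ".#.", "###"});
        FONT_TABLE.put('L', new String[] {"#..", "#..", "#..", "#..", "###"});
    }

    /** Counts the cells in which two equally sized glyphs differ (Hamming distance). */
    static int distance(String[] a, String[] b) {
        if (a.length != b.length) return Integer.MAX_VALUE;
        int d = 0;
        for (int r = 0; r < a.length; r++) {
            if (a[r].length() != b[r].length()) return Integer.MAX_VALUE;
            for (int c = 0; c < a[r].length(); c++)
                if (a[r].charAt(c) != b[r].charAt(c)) d++;
        }
        return d;
    }

    /** Returns the closest table character, or '?' if none differs in at most maxErrors cells. */
    static char recognize(String[] glyph, int maxErrors) {
        char best = '?';
        int bestDist = maxErrors + 1;
        for (Map.Entry<Character, String[]> e : FONT_TABLE.entrySet()) {
            int d = distance(glyph, e.getValue());
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] noisyH = {"#.#", "#.#", "###", "#.#", "#.."}; // an H with one missing pixel
        System.out.println(recognize(noisyH, 1));              // prints H
    }
}
```

The threshold tolerates a few noisy pixels, but the glyph must have the same shape and size as the table entries, so a Courier A would still need its own Courier table.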

We start with the dynamic OCR.

Dynamic OCR

As mentioned in Part I, the image containing the string to be OCRed should first be "cleaned up" by "translucentifying" the unnecessary surroundings, i.e. by making every pixel that does not have the string's color transparent:

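    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.ImageIO;

    // method name chosen for illustration; rgb is the color of the selected pixel (the string's color)
    public static BufferedImage translucentify(String imageFile, int rgb) throws IOException {
      rgb &= 0xFFFFFF; // only the RGB of the selected pixel (the color of the string)
      BufferedImage img = ImageIO.read(new File(imageFile));
      if (img == null) throw new IOException("unsupported image format: " + imageFile);
      int width  = img.getWidth();
      int height = img.getHeight();
      // black: 0x000000, white: 0xFFFFFF
      BufferedImage bImg = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
      for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
          int p = img.getRGB(x, y);
          if ((p & 0x00FFFFFF) == rgb) {       // parentheses needed: == binds tighter than &
            bImg.setRGB(x, y, p | 0xFF000000); // keep this pixel and make it 100% opaque
          } else {                             // translucentify this pixel
            bImg.setRGB(x, y, p & 0x00FFFFFF); // alpha = 0: fully transparent
          }
        }
      return bImg;
    }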
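The cleanup above keeps a pixel only if its color equals the selected RGB exactly. Since colors can be tarnished or blurred (anti-aliased edges, for example), one possible variation, not part of the original, is to accept colors within a per-channel tolerance. The class ColorMatch, the method similar() and the tolerance values are hypothetical:

```java
/** Sketch: color comparison with a per-channel tolerance instead of exact equality. */
class ColorMatch {
    /**
     * Returns true if every RGB channel of pixel p differs from rgb by at most tol.
     * The alpha byte of p is ignored, like the mask (p & 0x00FFFFFF) in the cleanup code.
     */
    static boolean similar(int p, int rgb, int tol) {
        int dr = Math.abs(((p >> 16) & 0xFF) - ((rgb >> 16) & 0xFF));
        int dg = Math.abs(((p >> 8) & 0xFF) - ((rgb >> 8) & 0xFF));
        int db = Math.abs((p & 0xFF) - (rgb & 0xFF));
        return dr <= tol && dg <= tol && db <= tol;
    }

    public static void main(String[] args) {
        int black = 0x000000;
        System.out.println(similar(0xFF101010, black, 32)); // true: dark gray edge pixel
        System.out.println(similar(0xFF808080, black, 32)); // false: mid gray is too far away
    }
}
```

Replacing the condition `(p & 0x00FFFFFF) == rgb` with `ColorMatch.similar(p, rgb, tol)` would then also keep near-matching pixels; with `tol = 0` it behaves like the exact comparison.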

and then focused on the string only (by the color of the string), i.e. cropped to the bounding box of its non-transparent pixels:

    static final int ALPHA = 0xFF000000; // alpha mask (assumed; not defined in the original snippet)

    // img is the translucentified image; returns the area of the string, or null if there is none
    public static BufferedImage focus(BufferedImage img) { // method name chosen for illustration
      int width = img.getWidth();
      int height = img.getHeight();
      // (xa, ya): upper-left corner, (xe, ye): lower-right corner of all non-transparent pixels
      int xa = width, ya = height, xe = -1, ye = -1;
      for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
          if ((img.getRGB(x, y) & ALPHA) != 0) {
            if (x < xa) xa = x;
            if (y < ya) ya = y;
            if (x > xe) xe = x;
            if (y > ye) ye = y;
          }
      if (xe < 0) return null; // no pixel of the string's color found
      int wi = 1 + xe - xa, he = 1 + ye - ya; // exact bounding box, no tolerance margin
      // focus only on the area with the given RGB and copy the pixels into a new BufferedImage
      BufferedImage bImg = new BufferedImage(wi, he, BufferedImage.TYPE_INT_ARGB);
      for (int b = 0; b < he; ++b)
        for (int a = 0; a < wi; ++a)
          bImg.setRGB(a, b, img.getRGB(xa + a, ya + b));
      return bImg;
    }
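Before single letters can be recognized, the focused string has to be split into letters. This step is not shown here; one common way, sketched below under the assumption that neighboring letters never touch or overlap horizontally, is to scan the columns and cut at every column that contains no opaque pixel. ColumnSegmenter and segment() are hypothetical names:

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: split a focused (cropped, translucentified) string into letters at empty columns. */
class ColumnSegmenter {
    /** Returns {startX, endX} (both inclusive) for each run of columns containing an opaque pixel. */
    static List<int[]> segment(BufferedImage img) {
        List<int[]> letters = new ArrayList<>();
        int start = -1; // first column of the current letter; -1 while inside a gap
        for (int x = 0; x < img.getWidth(); x++) {
            boolean hasInk = false;
            for (int y = 0; y < img.getHeight() && !hasInk; y++)
                hasInk = (img.getRGB(x, y) & 0xFF000000) != 0; // alpha != 0: pixel belongs to the string
            if (hasInk && start < 0) {
                start = x;                             // a letter begins
            } else if (!hasInk && start >= 0) {
                letters.add(new int[] {start, x - 1}); // an empty column ends it
                start = -1;
            }
        }
        if (start >= 0) letters.add(new int[] {start, img.getWidth() - 1}); // letter at the right edge
        return letters;
    }

    public static void main(String[] args) {
        // 7x3 image, transparent by default; ink in columns 0-1 and 4-6
        BufferedImage img = new BufferedImage(7, 3, BufferedImage.TYPE_INT_ARGB);
        for (int x : new int[] {0, 1, 4, 5, 6}) img.setRGB(x, 1, 0xFF000000);
        for (int[] r : segment(img)) System.out.println(r[0] + ".." + r[1]); // prints 0..1 then 4..6
    }
}
```

Touching or kerned letters would defeat this simple cut and would need a finer split.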

Once each letter (or character) has been "DOTified", implementing letter recognition is just a chore. Example: letterH():

    /**
     * Specification of xy[10]:
     *   [0]: x coordinate X
     *   [1]: y coordinate Y
     *   [2]: xW (the letter width)
     *   [3]: yH (the letter height)
     *   [4]: gap between 2 letters
     *   [5]: returns the found letter
     *   [6]: starting Y value
     *   [7]: ending Y value
     *   [8]: starting X
     *   [9]: ending X
     * leftT, upperY, lowerY, onVertical, noVertical, onHorizontal and noHorizontal
     * are helper methods of the author's OCR class; they are not shown here.
     */
    public boolean letterH(int[] xy) {
      int a  = xy[0] + ((xy[2] - xy[0]) >> 1); // x halfway between xy[0] and xy[2]
      int t  = leftT(xy[0], xy[1]);
      int up = upperY(xy[0] + t);
      int ud = lowerY(xy[0] + t);
      // upper case H?
      if (onVertical(xy[3], xy[0], xy[1]) && onVertical(xy[3], xy[2] - 1, xy[1]) && onHorizontal(xy[2], xy[0], up) &&
          noHorizontal(xy[2] - 4, xy[0] + 4, xy[1] + 3) && noVertical(up, a, xy[1]) && noVertical(xy[3], a, ud + 1) && xy[5] > xy[1]) {
        xy[5] = (int) 'H';
        return true;
      }
      // lower case h?
      int Y = Math.min(xy[1], xy[6]);
      if (onVertical(xy[3], xy[0], Y) && noVertical(up, xy[2] - 1, xy[1]) && onHorizontal(xy[2], xy[0], up) &&
          noHorizontal(xy[2], xy[0] + t, xy[1] + ((up - xy[1]) >> 1)) && noVertical(xy[3], a, ud + 1)) {
        xy[5] = (int) 'h';
        return true;
      }
      return false;
    }
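Since the helpers used by letterH() are not shown, here is a self-contained toy that illustrates the same feature-based idea on a tightly cropped, binarized glyph: two full-height side strokes, a full-width crossbar, and a middle column that is empty except at the crossbar. The grid representation, the class FeatureH and its methods are assumptions for illustration, not the author's implementation:

```java
/** Toy feature-based recognizer for 'H' on a binarized glyph (ink[y][x] == true means opaque). */
class FeatureH {
    /** True if column x has ink in every row. */
    static boolean columnFull(boolean[][] ink, int x) {
        for (boolean[] row : ink) if (!row[x]) return false;
        return true;
    }

    /** True if row y has ink in every column. */
    static boolean rowFull(boolean[][] ink, int y) {
        for (boolean cell : ink[y]) if (!cell) return false;
        return true;
    }

    static boolean isH(boolean[][] ink) {
        int h = ink.length;
        if (h < 3) return false;
        int w = ink[0].length, mid = w / 2;
        if (w < 3) return false;
        // feature 1: full-height strokes in the leftmost and rightmost columns
        if (!columnFull(ink, 0) || !columnFull(ink, w - 1)) return false;
        // feature 2: the middle column is empty in the top and bottom rows (rules out a box or a U)
        if (ink[0][mid] || ink[h - 1][mid]) return false;
        // feature 3: the middle column has ink exactly in the full-width crossbar rows, and there is one
        boolean hasBar = false;
        for (int y = 1; y < h - 1; y++) {
            boolean bar = rowFull(ink, y);
            if (ink[y][mid] != bar) return false;
            hasBar |= bar;
        }
        return hasBar;
    }

    /** Builds a glyph from rows of '#' (ink) and '.' (background). */
    static boolean[][] parse(String... rows) {
        boolean[][] ink = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            ink[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) ink[y][x] = rows[y].charAt(x) == '#';
        }
        return ink;
    }

    public static void main(String[] args) {
        System.out.println(isH(parse("#...#", "#...#", "#####", "#...#", "#...#"))); // true
        System.out.println(isH(parse("#...#", "##..#", "#.#.#", "#..##", "#...#"))); // false (an N)
    }
}
```

Because feature 1 expects the stems in the very first and last columns, a serif H whose serifs stick out beyond the stems, as in Courier, would be rejected: the kind of font-dependent failure of dynamic OCR described earlier.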

The OCR of the letter H is based on the following computed features:

[Image: features used to recognize the letter H]

and the result is:

[Image: recognition result]
