Saturday, May 21, 2016

An Informal Trial of the Instant Camera Translator - Product and Patents

The name is my own loose translation. In technical terms this is a kind of OCR (optical character recognition). I have tried several OCR apps, but none felt very usable, and some required taking a photo first before translating. The more impressive ones would display the translation in a side panel while the phone camera was aimed at the text, which is already quite remarkable. Presumably, though, these existing apps all rely on some open translation engine behind the scenes!
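
Conceptually, the "photograph, then translate" pipeline behind those earlier apps fits in a few lines. This is only a sketch: it assumes the pytesseract OCR binding is installed, and translate() is a hypothetical stand-in for whichever translation engine an app actually calls.

from PIL import Image
import pytesseract

def translate(text, target="zh-TW"):
    # Hypothetical stand-in for the open translation engine these apps use.
    raise NotImplementedError

def photo_translate(image_path, target="zh-TW"):
    # Step 1: OCR the photographed page into plain text.
    text = pytesseract.image_to_string(Image.open(image_path))
    # Step 2: hand the recognized text to the translation engine.
    return translate(text, target)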

Seeing the newly released "Google Translate" app (https://goo.gl/t847Q), I felt a historic moment had arrived, and my first thought was traveling abroad. According to Google, this is an "artificial intelligence" product, and the official site even uses Chinese in its examples (I wonder whether users in other countries see the same content; Google already lets us see only the information inside our own "echo chamber"!).

[Product]
An informal trial (taking the chance to touch on the presidential handover and Taiwan's first female president in the news).

Launch the "Google Translate" app; its features include camera translation (the target of this test), voice translation, and handwriting translation.


This is CNN's coverage of President Tsai Ing-wen:

Open the camera translation feature (the small camera icon) and aim at the webpage. You can choose to take a photo and translate word by word, or have the entire view "instantly translated". In testing, instant translation is not supported for every language pair (editor's note: fortunately, Chinese is not translated all that well!):

Besides the EgyptAir accident, the BBC News front page also carries news of President Tsai Ing-wen:

Instant camera translation:

Another page:

Paper documents can also be translated instantly through the camera:

Korean is the language I am most helpless with, so I tested it on the Korean patent office's website:

Instant camera translation does not yet support Korean, but you can still photograph the screen to translate: a prompt for selecting words appears first, and you can capture a particular portion to translate:

[Patents]
In keeping with this patent blog's "mission", beyond introducing the app we should also discuss some patents.

A quick search of recent patents on Google Patents with keywords such as "inassignee:google", "translate", "OCR", and "camera" (setting aside pending applications for now) turns up several of interest. Google clearly has a sizable patent portfolio here as well, one of the ways it has become the world's most valuable company (it has now surpassed Apple).
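
As a rough, reproducible sketch, the same search can be composed as a patents.google.com URL. The parameter names below are my assumption based on the public search interface, not a documented API.

from urllib.parse import urlencode

# Assumed parameter names for the patents.google.com search page.
params = {"q": "translate OCR camera", "assignee": "google"}
print("https://patents.google.com/?" + urlencode(params))
# e.g. https://patents.google.com/?q=translate+OCR+camera&assignee=google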

Browsing a few of these patents, Google has carved this seemingly simple feature into a portfolio covering several angles: server-terminal architecture, camera frame capture, splitting up the image, triggering optical recognition, recognizing the content, and optimizing the image.

The Gallery in petapator/patentbell quickly surfaces many patents of interest; I have not studied them in detail here and only list a few.

From the drawings one can quickly spot this patent, which matches the product itself:

A patent pointing directly at Google Translate's new feature:
US9239833
Claim 1 defines a method whose content is essentially the steps of operating this app and the flow of displaying the translation. Honestly, this still strikes me as a very "soft" set of steps; measured against the courts' current §101 standards for pure software patents, I have some doubts. The remaining claims follow the standard software-patent drafting pattern of a system and a non-transitory storage medium, but all of them incorporate the steps of claim 1.
1. A method performed by data processing apparatus, the method comprising:
receiving an image;
identifying multiple distinct text blocks that each includes text depicted in the image;
identifying multiple collections of related text blocks based on visual characteristics of text depicted in the related text blocks and an arrangement of the text depicted in the related text blocks, each collection of related text blocks including text having matching visual characteristics;
selecting, for the image and from a plurality of presentation contexts, a presentation context based on the identified collections, wherein each presentation context has a corresponding user interface for presenting a translation of at least a portion of the text included in a particular collection of the identified collections, wherein the user interface for each presentation context is different from the user interface for other presentation contexts, and wherein the user interface for a first presentation context of the plurality of presentation contexts includes a translation of a different collection of text blocks than the user interface for a second presentation context of the plurality of presentation contexts;
identifying the user interface that corresponds to the selected presentation context; and
presenting a translation of the at least a portion of the text included in the particular collection of the selected presentation context using the selected user interface, while not presenting a translation of text included in another collection of the identified collections.
From the specification: "The translator 115 includes a text identifier 120 that can identify text in images and other types of documents. In some implementations, the text identifier 120 analyzes images using optical character recognition (“OCR”) to identify text depicted by the images. The text identifier 120 can detect text in multiple different languages. For example, the text identifier 120 may include an OCR engine that is capable of recognizing text in multiple languages, or an OCR engine for each of multiple different languages."
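
To make the claimed flow concrete, here is a minimal runnable sketch of how I read claim 1. Every name in it (TextBlock, the style-based grouping, the largest-collection selection rule, fake_translate) is my own invention for illustration; the patent recites the steps, not an implementation.

from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str
    style: str  # stand-in for visual characteristics (font, size, color)

def group_collections(blocks):
    # Collections of related text blocks: here, blocks whose styles match.
    groups = {}
    for block in blocks:
        groups.setdefault(block.style, []).append(block)
    return list(groups.values())

def select_context(collections):
    # Toy selection rule: prefer the largest collection (say, body text over
    # a headline); the claim leaves the actual selection criteria open.
    return max(collections, key=len)

def fake_translate(text):
    return "<translated:" + text + ">"  # stand-in for the MT engine

def present_translation(blocks):
    collections = group_collections(blocks)
    chosen = select_context(collections)
    # Present a translation of the chosen collection only, while not
    # presenting translations of the other collections (claim 1, last step).
    return fake_translate(" ".join(block.text for block in chosen))

blocks = [TextBlock("Tsai sworn in", "headline"),
          TextBlock("Taiwan's first female president", "body"),
          TextBlock("took office on May 20", "body")]
print(present_translation(blocks))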

Other patents on possibly peripheral technologies:
US9235049

Claim 1 defines a camera system, a typical claim that wraps a software procedure in hardware. The conventional lens has a depth of field, first and second in-focus fields of view, and a macro range distinct from the depth-of-field range; the point is simply to transfer sharpness from the peripheral viewing region into the central viewing region within the macro range. This presumably helps focusing when using OCR and improves text recognition.
1. A camera system, comprising:
an image sensor; and
a lens positioned in front of the image sensor to focus image light onto the image sensor, the lens including:
a depth of field (“DOF”) range;
a first in-focus field of view (“FOV”) within the DOF range;
a macro range that is distinct and separate from the DOF range, wherein the macro range is a near field relative to the DOF range; and
a second in-focus FOV within the macro range that is smaller than the first in-focus FOV within the DOF range, wherein the lens transfers sharpness from a peripheral viewing region within the macro range into a central viewing region within the macro range.
From the specification: "In contrast, macro range 210 is designed to facilitate image recognition (“IR”), bar code scanning, or optical character recognition (“OCR”) using a narrower in-focus FOV (e.g., FOV2=30 degrees). "
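
As a toy model of that lens geometry: only the FOV2 = 30 degrees figure comes from the specification excerpt above; every other number below is invented.

DOF_RANGE = (0.5, float("inf"))   # metres; the ordinary depth-of-field range
MACRO_RANGE = (0.05, 0.2)         # a near field distinct from the DOF range
FOV1, FOV2 = 70, 30               # degrees; FOV2 < FOV1 per the claim

def in_focus_fov(distance_m):
    if MACRO_RANGE[0] <= distance_m <= MACRO_RANGE[1]:
        return FOV2  # narrow FOV; sharpness concentrated in the central region
    if DOF_RANGE[0] <= distance_m <= DOF_RANGE[1]:
        return FOV1
    return None      # out of focus in both ranges

print(in_focus_fov(0.1))   # 30 -> macro range, suited to OCR/barcode scanning
print(in_focus_fov(2.0))   # 70 -> ordinary photography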

US9087235

1. A method performed by data processing apparatus, the method comprising:
receiving, from a device, an image query that includes an image;
identifying textual characters in a region of the image and structural information associated with the textual characters in the region of the image, the structural information specifying a position of at least one of the textual characters with respect to one or more reference point elements in the image of the image query;
retrieving, using one or more of the textual characters and the structural information, a canonical document that includes the one or more textual characters at a location in the canonical document that is consistent with the structural information; and
sending, to the device, at least a portion of the canonical document.
From the specification: "The disclosed embodiments relate generally to the field of optical character recognition (OCR), and in particular to displaying a canonical source document containing strings of high quality text extracted from a visual query."
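
Reading claim 1 as a server-side lookup, a toy sketch might look like the following. The two-document corpus and the reduction of "structural information" to a character offset are my own simplifications.

CORPUS = {
    "doc-a": "Tsai Ing-wen was sworn in as Taiwan's first female president.",
    "doc-b": "EgyptAir flight MS804 disappeared over the Mediterranean.",
}

def retrieve_canonical(query_text, approx_offset, tolerance=10):
    # Return the canonical document containing the OCR'd characters at a
    # location consistent with the structural information (here, an offset).
    for doc_id, text in CORPUS.items():
        pos = text.find(query_text)
        if pos >= 0 and abs(pos - approx_offset) <= tolerance:
            return doc_id, text
    return None

print(retrieve_canonical("first female", approx_offset=40))  # ('doc-a', ...)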

US9116890

1. A computer-implemented method comprising:
receiving an output of performing an image capture process on a rendered document;
determining that the output includes a particular symbol;
determining a particular action that is associated with the particular symbol; and
transmitting an instruction to a document management system to perform the particular action.
From the specification: "Optical Character Recognition (OCR) technologies have traditionally focused on images that include a large amount of text, for example from a flatbed scanner capturing a whole page. OCR technologies often need substantial training and correcting by the user to produce useful text. OCR technologies often require substantial processing power on the machine doing the OCR, and, while many systems use a dictionary, they are generally expected to operate on an effectively infinite vocabulary."
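
Claim 1 is essentially a symbol-to-action dispatch, which a few lines can illustrate. The symbol table and the DocumentManagementSystem stub are hypothetical.

class DocumentManagementSystem:
    def perform(self, action, document):
        print(action, "->", document)

SYMBOL_ACTIONS = {
    "#archive": "archive_document",
    "#ocr": "run_full_ocr",
}

def handle_capture(capture_output, document, dms):
    # If the capture output includes a particular symbol, transmit the
    # associated instruction to the document management system.
    for symbol, action in SYMBOL_ACTIONS.items():
        if symbol in capture_output:
            dms.perform(action, document)
            return action
    return None

handle_capture("invoice page 3 #archive", "invoice.pdf", DocumentManagementSystem())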

US8897598

1. A computer-implemented method comprising:
for each of a first image of a first view of an object and a second image of a second view of the object, generating a respective feature point descriptor for each of multiple feature points included in the image;
determining that a quantity of one or more of the feature point descriptors from the first image that are (i) indicated as similar to one or more feature point descriptors associated within a predefined sub-region of the second image, and (ii) associated within a corresponding predefined sub-region of the first image, satisfies a quantity threshold; and
based on determining that a quantity of one or more of the feature point descriptors from the first image that are (i) indicated as similar to one or more feature point descriptors associated within a same predefined sub-region of the second image, and (ii) associated within a same predefined sub-region of the first image, satisfies a quantity threshold, creating a mapping between the predefined sub-region of the first image and the predefined sub-region of the second image.
From the specification: "The above-described aspects of the disclosure may be advantageous for rapidly reconstructing video streams into high quality document images capable of being translated by an OCR."
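
A toy rendering of claim 1, with feature-point descriptors simplified to integers and "similar" reduced to exact equality purely for illustration:

def map_subregions(desc1, desc2, regions1, regions2, threshold=3):
    # Map corresponding sub-regions of the two views when enough descriptor
    # pairs match (the claim's quantity threshold).
    mappings = []
    for r1, r2 in zip(regions1, regions2):
        d2 = {desc2[i] for i in r2}
        matches = sum(1 for i in r1 if desc1[i] in d2)
        if matches >= threshold:
            mappings.append((r1, r2))
    return mappings

desc1 = [10, 11, 12, 13, 20, 21]   # descriptors from the first view
desc2 = [10, 11, 12, 99, 20, 22]   # descriptors from the second view
regions1 = [(0, 1, 2, 3), (4, 5)]  # predefined sub-regions, first image
regions2 = [(0, 1, 2, 3), (4, 5)]  # predefined sub-regions, second image
print(map_subregions(desc1, desc2, regions1, regions2))
# only the first sub-region pair has enough matches to be mapped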

US9113076

1. A mobile device comprising:
one or more data processors that execute instructions that cause the one or more data processors to perform operations comprising:
measuring a relative orientation between a document and the mobile device; and
comparing the measured relative orientation to a threshold relative orientation associated with a document capture mode of the mobile device; and
an action component that performs operations comprising:
automatically transitioning the mobile device to the document capture mode when the measured relative orientation is within the threshold relative orientation associated with the document capture mode, wherein the threshold relative orientation is based on an analysis that includes:
analyzing occurrences when a user of the mobile device rejected automatic launching of an application that performs actions in response to text captures; and
analyzing occurrences when a user of the mobile device manually launched the application that performs actions in response to text captures.
From the specification: "The shapes of characters in most commonly used fonts are related. For example, in most fonts, the letter “c” and the letter “e” are visually related—as are “t” and “f”, etc. The OCR process is enhanced by use of this relationship to construct templates for letters that have not been scanned yet."
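
Claim 1 reads like an orientation-triggered mode switch with a learned threshold. A minimal sketch, reducing the accept/reject analysis to a running average over accepted launches (all numbers invented):

class CaptureModeTrigger:
    def __init__(self, threshold_deg=15.0):
        self.threshold = threshold_deg
        self.accepted_angles = []

    def observe(self, angle_deg, user_accepted):
        # Analyze occurrences where the user accepted or rejected the
        # automatic launch (claim 1's analysis steps, greatly simplified).
        if user_accepted:
            self.accepted_angles.append(angle_deg)
            self.threshold = sum(self.accepted_angles) / len(self.accepted_angles)

    def should_capture(self, angle_deg):
        # Auto-transition when the measured relative orientation is within
        # the threshold relative orientation.
        return angle_deg <= self.threshold

trigger = CaptureModeTrigger()
trigger.observe(10.0, user_accepted=True)
trigger.observe(25.0, user_accepted=False)
print(trigger.should_capture(8.0))   # True -> switch to document capture mode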

Google's official blog: https://googleblog.blogspot.tw/2016/05/translate-where-you-need-it-in-any-app.html
Google+ news: https://plus.google.com/+GoogleTaiwan/posts/dFcyzDo5B4y

my two cents:
If there were time to analyze Google's portfolio strategy, its patent families, and its filings in each country, that is where the really interesting material would be.

(Warning) The biggest enemy humanity should guard against today is probably Google (perhaps together with Facebook). (This blog, hosted on Blogger, depends on it heavily too; any day it gets annoyed, it can take me down at will.) It collects our data, understands us, predicts our behavior, feeds us the content we want, keeps us from knowing what other people think (feeding us only content that agrees with us or that we agree with), sways search results, blurs commercial and genuine information, and has developed artificial-intelligence products that put many people out of work, including the Go players whose profession relies most on the brain.

(Warning) Outside of necessary work, use Facebook less. It has already "narrowed" our field of view: everything we see is content we agree with, or content that agrees with us, creating the illusion that the whole world is happy, agrees with us, and trades snark back and forth just like we do.

Ron
