Pdfsandwich: Converting PDF Files to Text
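Pdfsandwich is a tool that adds a text layer to PDF files whose text exists only as images, such as scanned books. It uses optical character recognition (OCR) to create an additional layer containing the text recognized on each original page, which makes the text easy to copy and process.
Pdfsandwich is a command-line tool. Compared with similar software, it preprocesses the scanned images before recognition, for example by deskewing the page layout and removing dark edges.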
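Since the tool is driven from the command line, a batch job can wrap it in a small script. The sketch below is only an illustration: the file names are hypothetical, and the `-lang` and `-o` options are used as I recall them from pdfsandwich's usage text, so check the tool's own help output before relying on them.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_pdf, lang="eng", output_pdf=None):
    """Return the pdfsandwich argument list for one input file.

    Assumption: `-lang` selects the OCR language and `-o` names the
    output file; verify both against your installed version.
    """
    cmd = ["pdfsandwich", "-lang", lang]
    if output_pdf is not None:
        cmd += ["-o", str(output_pdf)]
    cmd.append(str(input_pdf))
    return cmd


def run_ocr(input_pdf, lang="eng", output_pdf=None):
    """Run pdfsandwich if it is installed.

    Returns the process exit code, or None when the executable
    cannot be found on PATH.
    """
    if shutil.which("pdfsandwich") is None:
        return None
    return subprocess.run(build_command(input_pdf, lang, output_pdf)).returncode


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command line.
    print(build_command(Path("scanned_book.pdf"),
                        output_pdf=Path("scanned_book_ocr.pdf")))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command line without having pdfsandwich installed.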
Results
The final recognition result:
Visionaries
I I7
and silver ligree ornaments ; gold and silver ower-stands, etc. ;
elaborate coloured patterns of carpets in brilliant tints are not
uncommon.
Another peculiarity resides in the extreme restlessness of
my visual objects. It is often very difficult to keep them still,
as well as from changing in character. They will rapidly oscil-
late or else rotate to a most perplexing degree, and when the
characters change at the same time a critical examination is
almost impossible. When the process is in full activity,l feel
as if I were a mere spectator at a diorama of a very eccentric
kind, and was in no way concerned with the getting up of the
performance.
When a. succession of images has been passing, I sometimes
alez ermz'ne to introduce an object, say a watch. Very often it is
next to impossible to succeed. There is an evident struggle.
The watch, pure and simple, will not come; but some hybrid
structure appears something round, perhaps but it lapses into
a warming-pan or other unexpected object.
This practice has brought to my mind very clearly the dis-
tinction between at least one form of automatism of the brain
and volition; but the strength of the former is enormous, for
the visual objects, when in full career of the change, are impera-
tive in their refusal to be interfered with.
[...]
Getting the code
SVN Checkout
svn checkout svn://svn.code.sf.net/p/pdfsandwich/code/trunk/src pdfsandwich
