If this really works...
Author | Content |
---|---|
caitlyn Feb 07, 2013 11:57 AM EDT |
If this really works it's a huge big deal. pdfedit can do it in theory, but in practice it doesn't do a good job at all. I know this is one tool I could really use. |
Steven_Rosenber Feb 07, 2013 12:19 PM EDT |
As a journalist myself, even having a PDF based on electronic records is a pain to deal with. I find it better to try to persuade the agency to submit a spreadsheet because that's usually the source of the material in the first place. |
dinotrac Feb 07, 2013 12:27 PM EDT |
Ditto on the REAL BIG DEAL. PDFs have become a sort of lingua franca -- in spite of being a proprietary format -- and a free facility for processing them would make a number of things possible. |
caitlyn Feb 07, 2013 12:40 PM EDT |
I don't know how many times it would have been great to extract some piece of documentation or some information from a PDF and I really just couldn't do it with the tools available to me. dino, you're right, PDFs are ubiquitous nowadays. |
Bob_Robertson Feb 07, 2013 1:39 PM EDT |
Gee, I thought my being happy when I found a PDF from which I could copy text was kind of silly. Glad I'm not the only one. PDF itself fills a need, for printing and presentation. But as a transmitter of data, no. |
jdixon Feb 07, 2013 1:50 PM EDT |
Most of the PDFs I get are scanned images, so there's no text to recover. :( You have to OCR them. From the article, that's what they're doing here too. Fortunately, the freeocr program for Windows (it uses the tesseract engine) seems to work halfway well, though it's very poor at maintaining formatting. I expect tesseract would work equally well under Linux, but I don't know if there's a good GUI front end for it or not. |
mrbobeau Feb 07, 2013 8:18 PM EDT |
There are some GUIs for tesseract. The one I like is gimagereader. |
jdixon Feb 08, 2013 10:00 AM EDT |
Thanks for the feedback, mrbobeau. |