Matthew Gray's Blog: My ScanSnap Workflow

Sunday, April 13, 2008

My ScanSnap Workflow

I bought and absolutely love my Fujitsu ScanSnap S510M. It is a high-speed compact full-duplex full-color sheet-feed scanner. I put a document in the sheet feeder, press a button on the scanner and a few seconds later there is a PDF in a pre-configured folder with the contents of the document. It's an amazing product. Read any number of other rave reviews of it.

In the roughly three months I've had it, I've scanned over 1200 documents, comprising over 5000 pages. I use "Yep!" for managing all of these and all of my other pre-existing and downloaded PDFs. In total, I have over 3000 PDFs, and Yep handles this fine. Overall, I like "Yep!" very much. It has one or two bugs that irritate me, but it's well worth the price.

The software that comes with the ScanSnap includes both the scanning software itself and FineReader, the OCR software. Both work well functionally, but the user interface of each is lacking. Fortunately, it's possible to completely disable the user interface of each, so I never see it. Unfortunately, the integration with the OCR has a moderate flaw, though there's an easy way around it. Specifically, OCR takes far longer than scanning, and the software doesn't queue up OCR jobs. This means that, with automatic OCR enabled, you can't scan a second document until the first has finished OCRing.

So, here's my workflow/setup: I have the ScanSnap Manager set up to scan to PDF, with no OCR and without prompting, into a directory called "New Scans/Fresh". I have two other directories under "New Scans": "Being OCRd" and "OCRd". I often glance through "Fresh" in Yep to see if the scanner misrotated a page or something (in general it is very good). Every so often, usually about once a week, before going to bed, I drag all the files in "Fresh" into "Being OCRd", and then drag all of those onto FineReader. I have FineReader configured to OCR the files in place, and depending on how many documents I have scanned since the last OCR batch, it can take several hours. I go to sleep. Later, I drag the completed OCRd PDFs to the "OCRd" directory.
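
For anyone who would rather script the two drag-and-drop steps, here is a minimal Python sketch of the same staging idea. It only moves files between the three folders described above. It does not run FineReader, which still has to be pointed at "Being OCRd" by hand. The function name move_pdfs and the choice to skip name collisions rather than overwrite are assumptions made for illustration, not part of the original setup.

```python
import shutil
from pathlib import Path
from typing import List


def move_pdfs(src: Path, dst: Path) -> List[Path]:
    """Move every *.pdf directly inside src into dst and return the new paths.

    Hypothetical helper: files whose names already exist in dst are left
    where they are, so nothing is overwritten.
    """
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(src.glob("*.pdf")):
        target = dst / pdf.name
        if target.exists():
            continue  # leave the file in src rather than clobber a scan
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


# Example usage (paths are assumptions; adjust to wherever "New Scans" lives):
#   root = Path.home() / "Documents" / "New Scans"
#   move_pdfs(root / "Fresh", root / "Being OCRd")   # before the overnight OCR run
#   move_pdfs(root / "Being OCRd", root / "OCRd")    # once the OCR run has finished
```

Keeping "Being OCRd" as a separate folder is what makes this safe: new scans can keep landing in "Fresh" while the OCR batch runs, and nothing in the batch changes underneath it.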

Gradually, documents pile up in "OCRd", and periodically I go through them in Yep and clean up. It's worth noting, though, that having them unfiled in a big pile in "OCRd" is still quite useful. With Yep I can search through those and find things quickly even if I haven't done proper "filing". Whenever I want to do some filing, I use Yep's "Browse by search folder" mode, which shows me a list of directories that contain PDFs. I don't use tagging as the primary organization scheme, but I do use it and will describe it later. First I select the "OCRd" directory and it shows me all the PDFs that are pending. Usually I'll spot something obvious like a mortgage bill and I'll type "mortgage" in the search box, and the view will be narrowed to just things mentioning "mortgage". Sometimes this will include some things other than mortgage statements, but often it narrows the view nicely to a homogeneous set. I select all of them and drag them into the appropriate sub-folder of one of my two "filing cabinets", all within Yep.

I use two folders as filing cabinets. One is just a standard folder under Documents which contains stuff like correspondence, recipes and local restaurant menus. The other is an encrypted sparse image for things like bills and account statements. Some of the hierarchy is obvious ("Bills/Discover" or whatever), but mostly I don't worry too much about it because I know search works well enough. As I mentioned above, I don't use tags as the primary organization scheme, but I do use them for task-oriented groupings. For example, I used a tag for "2007 taxes" since that included statements from a number of accounts. Similarly, when we bought a new house, I had a "mortgage application" tag.

The system works great, and I'm not a big "organization guy". It's allowed me to shred and recycle paperwork with abandon because if I have something and I think I might want it, I scan it, and get rid of the clutter. The resulting scans are small in size, coming in on average under 100k per page. I love the ability to quickly and easily find any bill, document or other paperwork and my wife loves it as well.
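
To put the size figure in perspective, here is a rough back-of-the-envelope sketch in Python using the numbers quoted in this post (a bit over 5000 pages at under 100k per page). It assumes "k" means kilobytes and that 1 MB is 1000 kB. The function name estimate_archive_mb is made up for the example.

```python
def estimate_archive_mb(pages: int, kb_per_page: float) -> float:
    """Rough archive size in megabytes, assuming 1 MB = 1000 kB."""
    return pages * kb_per_page / 1000


# 5000 pages at an assumed ceiling of 100 kB each
print(estimate_archive_mb(5000, 100))  # 500.0
```

So even treating 100k per page as a ceiling, the scanned archive described here comes to roughly half a gigabyte.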

Overall, I wholeheartedly recommend the ScanSnap and "Yep!".

5 comments:

Kirit, 08 September, 2008 10:51
Matthew,

I have recently acquired a ScanSnap. I am very excited about it. Here is my workflow. I would like your comments on it.

I use ScanSnap Manager to create a profile for FineReader by turning off quick scan.

When I installed Yep, it created a folder called Yep Documents.

I have configured FineReader to put all its documents in the Yep Documents folder. So here is how it goes. All the documents that need to be scanned and OCRed are put into the ScanSnap so they get scanned. Then they get OCRed by FineReader. In the preferences of FineReader, I have clicked “Ask Me” for the name of the file. So, I give it a unique name, something that will be relevant.

Once a week I go into this folder and drag all the documents to the Evernote desktop application, where I have created a notebook for PDFs only. So, locally, when I am looking for a document, I look through Yep, and remotely I find them through Evernote. This way I have the best of both worlds.

Kirit, 08 September, 2008 10:52
Please respond to kiritvora@gmail.com

Unknown, 25 September, 2008 14:52
Does anyone know how to change the file type/creator of a regular PDF file so that FineReader will scan it? I get downloadable bank statements which I cannot OCR due to FineReader being picky about the files it will work with.

NK, 25 December, 2008 15:54
Matthew-
I was going to pull the trigger on Yep but would like to know if you still use the same workflow. Did you buy the other products (Leap and Deep) that bundle with Yep? When you OCR a document with ABBYY, does it affect the image of the original PDF? Nice post.

mkgray, 25 December, 2008 16:11
NK: I still use Yep and essentially the same workflow (though, compared to when I first got it, I rely more on Yep to quickly "eyeball" things than on the OCR). I did not buy or upgrade to Leap/Deep.

Unfortunately, OCRing the files does degrade the image quality of the original PDF, but it's usually pretty minor and worth it for the OCR. For scans I know won't OCR (e.g., children's drawings) I move them straight from "Fresh" to "OCRd" to maintain the image quality.

Professional

Previously, I was the CTO at an 802.11 location and security company, Newbury Networks in Boston. In June 1999, I received my Master's degree from the MIT Media Lab. I graduated from MIT (undergraduate) in June 1997, in physics. Prior to that I was CTO of net.Genesis from 1994 to 1996.

While at MIT, I was one of the three members of the Student Information Processing Board (SIPB) who set up www.mit.edu in the spring of 1993. I am also a former/inactive member of the Apache group, a volunteer group of developers of Apache, the world's most popular web server.