Sunday, April 13, 2008

My ScanSnap Workflow

I bought a Fujitsu ScanSnap S510M and absolutely love it. It is a high-speed, compact, full-duplex, full-color sheet-fed scanner. I put a document in the sheet feeder, press a button on the scanner, and a few seconds later a PDF of the document appears in a pre-configured folder. It's an amazing product, and there are any number of other rave reviews of it out there.

In the roughly three months I've had it, I've scanned over 1200 documents comprising over 5000 pages. I use "Yep!" to manage all of these along with my pre-existing and downloaded PDFs. In total, I have over 3000 PDFs, and Yep handles this fine. Overall, I like "Yep!" very much. It has one or two bugs that irritate me, but it's well worth the price.

The software that comes with the ScanSnap includes both the scanning software itself and FineReader, the OCR software. The functionality of both is good, but the user interface of each is lacking. Fortunately, it's possible to completely disable the user interface of each, so I never see it. Unfortunately, the integration between scanning and OCR has a moderate flaw, though there's an easy way around it. Specifically, OCR takes far longer than scanning, and the software doesn't queue up OCR jobs. This means that with automatic OCR enabled, you can't scan a second document until the first has finished OCRing.

So, here's my workflow/setup: I have the ScanSnap Manager set up to scan to PDF, with no OCR and without prompting, into a directory called "New Scans/Fresh". I have two other directories under "New Scans": "Being OCRd" and "OCRd". I often glance through "Fresh" in Yep to see if the scanner misrotated a page or something (in general it is very good). Periodically, often only once a week or so, before going to bed, I drag all the files in "Fresh" into "Being OCRd", and then drag all of those onto FineReader. I have FineReader configured to OCR the files in place, and depending on how many documents I have scanned since the last OCR batch, it can take several hours. I go to sleep. Later, I drag the completed OCRd PDFs to the "OCRd" directory.
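
The drag-and-drop shuffle between folders described above could also be scripted. Here is a minimal Python sketch of just the file-moving step, assuming the folder names from my setup; the function name move_pdfs is made up for illustration, and nothing here is part of ScanSnap Manager or FineReader. It does not start the OCR itself.

```python
import shutil
from pathlib import Path
from typing import List


def move_pdfs(src: Path, dst: Path) -> List[Path]:
    """Move every PDF directly inside src into dst; return the new paths."""
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(src.iterdir()):
        # Only plain files with a .pdf extension (any letter case) are moved.
        if not item.is_file() or item.suffix.lower() != ".pdf":
            continue
        target = dst / item.name
        if target.exists():
            # Leave the file where it is rather than overwrite an older scan.
            continue
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

Calling move_pdfs(Path("New Scans/Fresh"), Path("New Scans/Being OCRd")) before bed, and move_pdfs(Path("New Scans/Being OCRd"), Path("New Scans/OCRd")) the next morning, would mirror the two drags. Files whose names already exist in the destination are skipped, so nothing is overwritten.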

Gradually, documents pile up in "OCRd", and periodically I go through them in Yep and clean up. It's worth noting, though, that having them unfiled in a big pile in "OCRd" is still quite useful. With Yep I can search through those and find things quickly even if I haven't done proper "filing". Whenever I do want to do some filing, I use Yep's "Browse by search folder" mode, which shows me a list of directories that contain PDFs. I don't use tagging as the primary organization scheme, but I do use it and will describe it later. First I select the "OCRd" directory, and it shows me all the PDFs that are pending. Usually I'll spot something obvious like a mortgage bill, so I'll type "mortgage" in the search box, and the view will be narrowed to just things mentioning "mortgage". Sometimes this will include some things other than mortgage statements, but often it will nicely narrow it to a homogeneous set. I select all of them and drag them into the appropriate sub-folder of one of my two "filing cabinets", all within Yep.
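
Yep does the search-then-drag filing step in its own interface, but the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: file_matching is a made-up name, and extract_text is a stand-in for whatever PDF text extractor you have on hand (it is passed in as a parameter, so no particular library is assumed).

```python
import shutil
from pathlib import Path
from typing import Callable, List


def file_matching(pending: Path, cabinet_folder: Path, keyword: str,
                  extract_text: Callable[[Path], str]) -> List[Path]:
    """Move PDFs in pending whose text mentions keyword into cabinet_folder."""
    cabinet_folder.mkdir(parents=True, exist_ok=True)
    needle = keyword.lower()
    filed = []
    for item in sorted(pending.iterdir()):
        if not item.is_file() or item.suffix.lower() != ".pdf":
            continue
        # Case-insensitive substring match on the OCRd text of the document.
        if needle not in extract_text(item).lower():
            continue
        target = cabinet_folder / item.name
        if target.exists():
            # Never overwrite something that has already been filed.
            continue
        shutil.move(str(item), str(target))
        filed.append(target)
    return filed
```

As with the search box, a plain keyword match can catch things that are not actually mortgage statements, so it is worth eyeballing the result before trusting it.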

I use two folders as filing cabinets. One is just a standard folder under Documents, which contains stuff like correspondence, recipes and local restaurant menus. The other is an encrypted sparse image for things like bills and account statements. Some of the hierarchy is obvious, like "Bills/Discover" or whatever, but mostly I don't worry too much about it because I know search works well enough. As I mentioned above, I don't use tags as the primary organization scheme, but I do use them for task-oriented groupings. For example, I used a tag for "2007 taxes" since that included statements from a number of accounts. Similarly, when we bought a new house, I had a "mortgage application" tag.

The system works great, even though I'm not a big "organization guy". It's allowed me to shred and recycle paperwork with abandon: if I have something and I think I might want it, I scan it and get rid of the clutter. The resulting scans are small, coming in on average at under 100k per page. I love the ability to quickly and easily find any bill, document or other paperwork, and my wife loves it as well.

Overall, I wholeheartedly recommend the ScanSnap and "Yep!".