You have reached the blog of
Matthew Gray
matthew@gray.org
I am a father, board gamer and software engineer.

Internet

In addition to my blog (this page), you can find me on BoardGameGeek, Twitter, Flickr, LinkedIn, FriendFeed, and various other places. I also have a slightly stale homepage.

Personal

I am an avid board gamer. I am one of the (volunteer) admins of BoardGameGeek, maintainer of the GameStoreDB, board game blogger, and gaming software geek.

Professional

I am a staff software engineer at Google. Previously, I was the CTO of Newbury Networks, an 802.11 location and security company in Boston. In June 1999, I received my master's degree from the MIT Media Lab. I graduated from MIT (undergraduate) in June 1997, in physics. Prior to that, I was CTO of net.Genesis from 1994 to 1996.

While at MIT, I was one of the three members of the Student Information Processing Board (SIPB) who set up www.mit.edu in the spring of 1993. I am also a former/inactive member of the Apache group, a volunteer group of developers of Apache, the world's most popular web server.

Blog

Sunday, April 13, 2008

My ScanSnap Workflow

I bought and absolutely love my Fujitsu ScanSnap S510M. It is a high-speed compact full-duplex full-color sheet-feed scanner. I put a document in the sheet feeder, press a button on the scanner and a few seconds later there is a PDF in a pre-configured folder with the contents of the document. It's an amazing product. Read any number of other rave reviews of it.

In the roughly three months I've had it, I've scanned over 1200 documents, comprising over 5000 pages. I use "Yep!" for managing all of these and all of my other pre-existing and downloaded PDFs. In total, I have over 3000 PDFs, and Yep handles this fine. Overall, I like "Yep!" very much. It has one or two bugs that irritate me, but it's well worth the price.

The software that comes with the ScanSnap includes both the scanning software itself and FineReader, the OCR software. Both work well, but the user interface of each is lacking. Fortunately, it's possible to completely disable the user interface of each, so I never see it. Unfortunately, the integration with the OCR has a moderate flaw, but there's an easy way around it. Specifically, OCR takes far longer than scanning, and the software doesn't queue up OCR jobs. This means that if you have automatic OCR enabled, you can't scan a second document until the first has finished OCRing.

So, here's my workflow/setup: I have the ScanSnap Manager set up to scan to PDF, with no OCR and without prompting, into a directory called "New Scans/Fresh". I have two other directories under "New Scans": "Being OCRd" and "OCRd". I often glance through "Fresh" in Yep to see if the scanner misrotated a page or something (in general it is very, very good). Every so often, usually only once a week or so, before going to bed, I drag all the files in "Fresh" into "Being OCRd", and then drag all of those onto FineReader. I have FineReader configured to OCR the files in place, and depending on how many documents I have scanned since the last OCR batch, it can take several hours. I go to sleep. Later, I drag the completed OCRd PDFs to the "OCRd" directory.
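The first drag in that routine (everything from "Fresh" into "Being OCRd") is easy to script. What follows is only a minimal sketch of that one step, not something I actually run: the function name is made up for illustration, it assumes the scans end in ".pdf", and it leaves the FineReader step and the final move to "OCRd" as manual drags.

```python
import shutil
from pathlib import Path


def stage_for_ocr(fresh: Path, being_ocrd: Path) -> list[Path]:
    """Move every PDF from `fresh` into `being_ocrd` and return the new paths.

    Hypothetical helper: the folder roles come from the workflow above,
    but the function itself is an illustration only.
    """
    being_ocrd.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(fresh.glob("*.pdf")):
        target = being_ocrd / pdf.name
        if target.exists():
            # Never overwrite a scan that is already waiting for OCR.
            continue
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

Calling it with the two folder paths, for example `stage_for_ocr(Path("New Scans/Fresh"), Path("New Scans/Being OCRd"))`, returns the list of files that were moved. Where "New Scans" lives on disk is up to you.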

Gradually, documents pile up in "OCRd", and periodically I go through them in Yep and clean up. It's worth noting, though, that having them unfiled in a big pile in "OCRd" is still quite useful: with Yep I can search through them and find things quickly even if I haven't done proper "filing". Whenever I do want to do some filing, I use Yep's "Browse by search folder" mode, which shows me a list of directories that contain PDFs. I don't use tagging as the primary organization scheme, but I do use it and will describe it later. First I select the "OCRd" directory and it shows me all the PDFs that are pending. Usually I'll spot something obvious like a mortgage bill, type "mortgage" in the search box, and the view narrows to just the things mentioning "mortgage". Sometimes this includes things other than mortgage statements, but often it nicely narrows the view to a homogeneous set. I select all of them and drag them into one of my two "filing cabinets" and the appropriate sub-folder, all within Yep.
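The narrowing step is just a case-insensitive text match over the OCRd contents. Here is a toy sketch of the idea, assuming the text of each PDF has already been extracted into a dictionary. This is not how Yep works internally (I don't know how it does it), and the function name and sample data are made up.

```python
def narrow(texts: dict[str, str], query: str) -> list[str]:
    """Return the names of documents whose text mentions `query`, ignoring case.

    `texts` maps a file name to its already-extracted OCR text (an assumption
    of this sketch; no text extraction happens here).
    """
    needle = query.casefold()
    return sorted(name for name, text in texts.items() if needle in text.casefold())


if __name__ == "__main__":
    pending = {
        "scan-001.pdf": "Your Mortgage Statement for March",
        "scan-002.pdf": "Discover card bill",
        "scan-003.pdf": "Refinance your mortgage today!",
    }
    # The junk-mail flyer matches too, which is why the result still
    # needs a quick look before dragging everything into a folder.
    print(narrow(pending, "mortgage"))
```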

I use two folders as filing cabinets. One is just a standard folder under Documents, which contains stuff like correspondence, recipes and local restaurant menus. The other is an encrypted sparse image for things like bills and account statements. Some of the hierarchy is obvious ("Bills/Discover" or whatever), but mostly I don't worry too much about it because I know search works well enough. As I mentioned above, I don't use tags as the primary organization scheme, but I do use them for task-oriented groupings. For example, I used a tag for "2007 taxes" since that included statements from a number of accounts. Similarly, when we bought a new house, I had a "mortgage application" tag.

The system works great, even though I'm not a big "organization guy". It's allowed me to shred and recycle paperwork with abandon: if I have something and think I might want it, I scan it and get rid of the clutter. The resulting scans are small, averaging under 100k per page. I love the ability to quickly and easily find any bill, document or other paperwork, and my wife loves it as well.
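For a rough sense of scale, the figures in this post (over 5000 pages, under 100k per page on average) can be multiplied out. This is only a back-of-the-envelope sketch: it assumes "100k" means 100 kilobytes, treats both numbers as round approximations, and uses a function name made up for illustration.

```python
def estimate_archive_mb(pages: int, avg_kb_per_page: float) -> float:
    """Rough archive size in megabytes, using 1 MB = 1000 KB."""
    return pages * avg_kb_per_page / 1000


if __name__ == "__main__":
    # Round figures taken from the post; both are approximations.
    print(f"roughly {estimate_archive_mb(5000, 100):.0f} MB")
```

With those round numbers, three months of scanning comes to something on the order of 500 MB.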

Overall, I wholeheartedly recommend the ScanSnap and "Yep!".

Posted by Matthew Gray at 5:46 PM
Labels: General, Technology

5 comments:

Kirit Vora said...

Matthew,

I have recently acquired a ScanSnap. I am very excited about it. Here is my workflow. I would like your comments on this.

I use ScanSnap Manager to create a profile for FineReader by turning off quick scan.

When I installed Yep, it created a folder called Yep Documents.

I have configured FineReader to put all its documents in the Yep Documents folder. So here it goes. All the documents that need to be scanned and OCRed are put into the ScanSnap so they get scanned. Then they get OCRed by FineReader. In the preferences of FineReader, I have clicked “Ask Me” for the name of the file. So, I give it a unique name, something that will be relevant.

Once a week I go into this folder and drag all the documents to the Evernote desktop application, where I have created a notebook for PDFs only. So, locally, when I am looking for a document, I look through Yep, and remotely I find them through Evernote. This way I have the best of both worlds.

08 September, 2008 10:51

Kirit Vora said...

Please respond to kiritvora@gmail.com

08 September, 2008 10:52

kip said...

Does anyone know how to change the file type/creator of a regular PDF file so that FineReader will scan it? I get downloadable bank statements which I cannot OCR due to FineReader being picky about the files it will work with.

25 September, 2008 14:52

NK said...

Matthew-
I was going to pull the trigger on Yep but would like to know if you still use the same workflow. Did you buy the other products (Leap and Deep) that bundle with Yep? When you OCR a document with ABBYY, does it affect the image of the original PDF? Nice post.

25 December, 2008 15:54

Matthew Gray said...

NK: I still use Yep and essentially the same workflow (though I rely on Yep to quickly "eyeball" things more than the OCR, compared to when I first got it). I did not buy or upgrade to Leap/Deep.

OCRing the files does, unfortunately, degrade the image quality of the original PDF, but the loss is usually pretty minor and worth it for the OCR. For scans I know won't OCR (e.g., children's drawings), I move them straight from "Fresh" to "OCRd" to maintain the image quality.

25 December, 2008 16:11

Disclaimer

I work for Google as a Software Engineer. This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.