SharePoint Scanning Planning – Part 2 – Separation

Document Examination and Separation

One of the key steps in preparing for document scanning and capture is to identify how you will separate or split documents.  What is separation and how does it work?  Details below:

For those of you that are new to document management and capture, document separation is the notion of how we can determine when a document begins and ends.  With most simple scanning software, this process is easy.  You load a single document in the feeder, click scan, and when it is done, you name it and save it.  With advanced capture, you can load multiple documents into the feeder, scan them all at once, and use a separation method to split them into individual digital documents.    This is a massive time saver.  Imagine loading 20 individual documents into a scanner one at a time, scanning each individually, and then entering information about each.   Below are some key separation methods any advanced capture suite should have:

Fixed Page Count Separation – This allows you to split based on a certain page count.  So if you scan a 100-page stack of two-page forms, you will have 50 separate documents in your capture interface.
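To make the idea concrete, here is a minimal Python sketch of fixed page count separation.  It assumes the scanner has already handed you an ordered list of page images (represented here by simple names); `split_fixed` is a made-up helper for illustration, not a function from any capture product.

```python
from typing import List, Sequence


def split_fixed(pages: Sequence[str], pages_per_doc: int) -> List[List[str]]:
    """Group an ordered page stream into documents of a fixed length."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    return [list(pages[i:i + pages_per_doc])
            for i in range(0, len(pages), pages_per_doc)]


# A 100-page stack of two-page forms yields 50 documents.
stack = [f"page-{n}" for n in range(1, 101)]
documents = split_fixed(stack, 2)
print(len(documents))  # 50
```

Note that a leftover page at the end of an uneven stack simply becomes a short final document, which is usually a sign that a page was missed during prep.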

Barcode Separation – Probably the most pervasive separation method is the barcode separator.  Place a sheet with a specific barcode pattern between each document, and you are off to the races.  To give you the most flexibility, applications should support the following enhanced barcode separation methods:

  • Separate on any barcode
  • Separate on specific barcode terms and patterns
  • Separate on barcode type
  • Separate on barcode count
  • Separate on a certain number of barcodes on a page
  • Separate when a barcode changes

You want to make sure your barcode engine supports 1D and 2D barcodes without the purchase of any expensive modules or add-ons, and it should also have a simple feature that lets you split 2D barcodes and identify separation terms.
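Two of the barcode methods above, sketched in Python.  This assumes the barcode engine has already read each page and given you the barcode value (or nothing) per page; the `Page` tuple, the function names and the "SEP" value are all hypothetical and only there to show the logic.

```python
import re
from typing import List, Optional, Tuple

# (page name, barcode value read from that page, or None if it has no barcode)
Page = Tuple[str, Optional[str]]


def split_on_separator(pages: List[Page], pattern: str) -> List[List[str]]:
    """Start a new document at every sheet whose barcode matches pattern.

    The separator sheets themselves are dropped from the output.
    """
    docs: List[List[str]] = []
    current: List[str] = []
    for name, barcode in pages:
        if barcode is not None and re.fullmatch(pattern, barcode):
            if current:
                docs.append(current)
            current = []
        else:
            current.append(name)
    if current:
        docs.append(current)
    return docs


def split_on_change(pages: List[Page]) -> List[List[str]]:
    """Start a new document whenever a page's barcode differs from the last one seen.

    Pages carrying the barcode are kept, since here the barcode is printed
    on the document itself rather than on a separator sheet.
    """
    docs: List[List[str]] = []
    current: List[str] = []
    last: Optional[str] = None
    for name, barcode in pages:
        if barcode is not None and barcode != last:
            if current:
                docs.append(current)
            current = []
            last = barcode
        current.append(name)
    if current:
        docs.append(current)
    return docs


with_sheets = [("sep1", "SEP"), ("a1", None), ("a2", None), ("sep2", "SEP"), ("b1", None)]
print(split_on_separator(with_sheets, r"SEP"))  # [['a1', 'a2'], ['b1']]

printed_codes = [("p1", "INV-1"), ("p2", "INV-1"), ("p3", "INV-2"), ("p4", None)]
print(split_on_change(printed_codes))  # [['p1', 'p2'], ['p3', 'p4']]
```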

Patch Code Separation – So what the heck is a patch code?  Just an old school horizontal barcode.  Below is an example.  If you work in the medical field, most medical billing forms will have these on them, and some scanners actually support using patch codes to shift scanner settings during the scanning process.  For flexibility, choose an application that supports patch code separation.

[Image: Patch Code Example]

Optical Character Recognition (OCR) Separation – OCR is the process of converting a scanned or imported image into searchable text.  OCR separation searches for a key word, term or phrase on the document, and treats the page where it is found as the first page of a new document.  This is a preferred method, as you don’t have to kill trees to print cover sheets, and it makes document preparation simple (no inserting separator sheets).  For example, if you are scanning contracts, and you want to split when you find an 8-digit contract number in the right-hand corner, this comes in very handy.  Your application needs several key features to make sure you get high separation accuracy:

  • Scan at 300DPI and use an app that has image processing software to clean up the page.  Also, your image processing engine must allow processing of imported PDFs and TIFFs if you plan to harvest documents.  Some image correction/processing engines only work with scanners.
  • Ensure your capture application allows you to use expression matching (regular expressions) so you have the utmost flexibility in finding separation patterns.
  • Character sets are key.  These provide the ability to tell the OCR engine the type of characters you are looking for (A-Z, 0-9, etc), so if it misidentifies a character, it auto-corrects the information.
  • Finally, top-of-the-line applications also allow you to separate when OCR terms change.  So you can look for that contract number, and only split when you find a new one.
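Here is a rough Python sketch of how the last three points fit together for the 8-digit contract number example.  It assumes the OCR engine has already produced plain text for each page; the confusion table, the "at least six real digits" rule and the function names are my own illustrative choices, not the behavior of any particular product.

```python
import re
from typing import List, Optional

# Letters an OCR engine commonly mistakes for digits (illustrative, not exhaustive).
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})

# Expression matching: an 8-character token made of digits or digit look-alikes.
CANDIDATE = re.compile(r"\b[0-9OoIlSB]{8}\b")


def contract_number(ocr_text: str) -> Optional[str]:
    """Return the first 8-digit contract number on the page, or None.

    The character set is digits only, so look-alike letters are corrected.
    Tokens with fewer than six real digits are ignored as probable words.
    """
    for match in CANDIDATE.finditer(ocr_text):
        raw = match.group()
        if sum(ch.isdigit() for ch in raw) >= 6:
            return raw.translate(DIGIT_FIXES)
    return None


def split_when_number_changes(pages: List[str]) -> List[List[int]]:
    """Return documents as lists of page indexes, splitting only on a new number."""
    docs: List[List[int]] = []
    current: List[int] = []
    last: Optional[str] = None
    for index, text in enumerate(pages):
        number = contract_number(text)
        if number is not None and number != last:
            if current:
                docs.append(current)
            current = []
            last = number
        current.append(index)
    if current:
        docs.append(current)
    return docs


ocr_pages = [
    "Contract 20451789 page 1",
    "terms and conditions",
    "Contract 2O45l789 page 3",   # OCR misread 0 as O and 1 as l
    "Contract 20451790 page 1",
]
print(split_when_number_changes(ocr_pages))  # [[0, 1, 2], [3]]
```

The misread number on the third page is corrected back to 20451789, so it stays with the first contract instead of triggering a false split.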

Intelligent Character Recognition (ICR) Separation – ICR is the process of converting scanned images of hand printing to text.  This method can be utilized to split pages when certain patterns in hand printing are detected.  Note:  all of the features required to ensure accuracy for OCR separation should also be considered if you utilize this method.

Document Import and Separation – There are several separation methods that can be key to success if you need to import large volumes of documents, or you want to process documents scanned from copiers, network scanners, or fax machines.  Below are several separation methods required for any document capture from imported files:

  • New File Separation – This method of separation will look at a directory, pick up files, and maintain each new file as its own digital document.
  • Folder-based separation – This is a key method if you are importing documents and want to combine them based on the folder.  One example might be a law firm that has a folder structure of case documents on different subjects for the case and wants to combine each folder into a single PDF file.
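The two import methods above can be sketched in a few lines of Python with the standard `pathlib` module.  The function names and the sample case folders are hypothetical, and the sketch only decides which files belong together; actually merging each group into a single PDF is left to your capture tool.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def new_file_separation(root: Path) -> List[List[Path]]:
    """Every file found under root becomes its own document."""
    return [[p] for p in sorted(root.rglob("*")) if p.is_file()]


def folder_separation(root: Path) -> Dict[str, List[Path]]:
    """Files sharing a folder form one document, keyed by folder path relative to root."""
    docs: Dict[str, List[Path]] = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            key = p.parent.relative_to(root).as_posix()
            docs.setdefault(key, []).append(p)
    return docs


# Build a small throwaway folder tree to try it on.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("case-001/pleadings.pdf", "case-001/exhibits.pdf", "case-002/contract.pdf"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(b"")
    per_file = len(new_file_separation(root))
    per_folder = {k: [p.name for p in v] for k, v in folder_separation(root).items()}

print(per_file)    # 3
print(per_folder)  # {'case-001': ['exhibits.pdf', 'pleadings.pdf'], 'case-002': ['contract.pdf']}
```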

Blank Page Separation – I only mention this as I would always, always avoid it unless absolutely necessary, especially if you are scanning in duplex.  Most implementations of this method, unless operated under strict preparation by knowledgeable operators, become an absolute mess. (Just my humble opinion ;)   )

Separation Scripting – Finally, for those rare and special occasions, you always want a product that has a pre-built scripting interface for customizing the whole process if necessary.  Now let me be clear, not a sales rep “Yeah, we can do that” (which usually means $20,000 in professional services), but a product that has simple hooks into the separation function, and that allows a simple “yes or no” answer based on some parameter or criteria that anyone with basic scripting skills can write.  When would you use something like this?  Usually for very complex jobs where the original documents cannot be modified, but you need to put some logic in place to split documents.
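To show what I mean by a simple yes-or-no hook, here is a Python sketch.  The `separate` loop, the hook signature and the `has_letterhead` flag are all invented for illustration; real products expose their own scripting interfaces, so treat this as the shape of the idea only.

```python
from typing import Callable, Dict, List

Page = Dict[str, object]
# The hook sees the incoming page and the document built so far, and answers yes or no.
SeparationHook = Callable[[Page, List[Page]], bool]


def separate(pages: List[Page], is_new_document: SeparationHook) -> List[List[Page]]:
    """Run the page stream through a user-supplied yes/no separation hook."""
    docs: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if current and is_new_document(page, current):
            docs.append(current)
            current = []
        current.append(page)
    if current:
        docs.append(current)
    return docs


def starts_with_letterhead(page: Page, current: List[Page]) -> bool:
    """Example hook: split when a page carries a letterhead, or after 3 pages."""
    return bool(page.get("has_letterhead")) or len(current) >= 3


scanned = [
    {"name": "p1", "has_letterhead": True},
    {"name": "p2"},
    {"name": "p3", "has_letterhead": True},
    {"name": "p4"},
    {"name": "p5"},
    {"name": "p6"},
]
result = [[p["name"] for p in doc] for doc in separate(scanned, starts_with_letterhead)]
print(result)  # [['p1', 'p2'], ['p3', 'p4', 'p5'], ['p6']]
```

The point is that the hook is a handful of lines of business logic, while the product keeps ownership of the scanning loop.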

The last separation topic I want to cover is something called triggered separation.  Let me set the stage on this one, and describe a process which is near and dear to every accounting manager’s heart: invoices.  So you have a stack of invoices, some single page, some multi-page, and you are stuck with a dilemma.  If I use barcode separators, and I have 100 single page invoices, do I really have to put 100 barcode separators between them all?  Separation triggers allow you to scan single page and multi-page documents all together.  So in this example, you can stack your singles, and then put separators between the documents in your stack of variable length invoices.  Put a trigger sheet between the two stacks (this tells the capture software to switch from single page separation to barcode-based separation), and scan the whole stack in one fell swoop.  This is a huge time saver in high volume environments, and can also allow you to build redundant separation logic, so you get the highest accuracy in separation with the least amount of document preparation.  Phewwww.  That was geeky.
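For the scripting-minded, the trigger logic boils down to a tiny state machine.  In this Python sketch each sheet is assumed to be already classified as a "page", "separator" or "trigger" (in practice the barcode engine would decide that); the labels and the `triggered_split` name are hypothetical.

```python
from typing import List, Tuple

# (kind, name), where kind is "page", "separator" or "trigger"
Sheet = Tuple[str, str]


def triggered_split(sheets: List[Sheet]) -> List[List[str]]:
    """Single-page separation until a trigger sheet, then separator-based separation."""
    docs: List[List[str]] = []
    current: List[str] = []
    mode = "single"
    for kind, name in sheets:
        if kind == "trigger":
            # Close anything in progress and switch separation methods.
            if current:
                docs.append(current)
                current = []
            mode = "barcode"
        elif kind == "separator":
            if current:
                docs.append(current)
                current = []
        elif mode == "single":
            docs.append([name])      # every page is its own document
        else:
            current.append(name)     # accumulate until the next separator
    if current:
        docs.append(current)
    return docs


stack = [
    ("page", "inv1"), ("page", "inv2"),          # single page invoices
    ("trigger", "switch"),
    ("page", "inv3-p1"), ("page", "inv3-p2"),    # multi-page invoices
    ("separator", "sep"),
    ("page", "inv4-p1"),
]
print(triggered_split(stack))
# [['inv1'], ['inv2'], ['inv3-p1', 'inv3-p2'], ['inv4-p1']]
```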

Do you really need all of this?  Does separation have to be that complex?  The whole goal here is to have as much as you possibly can in the tool kit to ensure you can meet all the capture needs within your organization.  I liken it to buying the base model of a car with no accessories, and then wishing every day you had power windows, the iPod Kit, cruise control, 4WD, etc.

So now you have examined your documents, and figured out how to efficiently scan and split.

Document Routing and Microsoft SharePoint

I see a ton of companies struggling with the question:  How do I get my copiers to scan to SharePoint?

I go back and forth on the idea of panel applications that enable intelligent routing at the copier.  It always comes back to contention at the device.  I recall one instance where an admin had all her documents piled on the copier, they were using eCopy, and she was scanning one document at a time, and sending them to SharePoint.  During her 20 minutes of copier hoarding, at least 10 people walked up, and walked away.

There are several things that I believe are absolutely critical to enabling copiers as scanning and capture onramps to SharePoint:

  1. Document Separators are an absolute requirement!!!  You have to be able to take a whole stack of documents, place barcode/routing separators between them, throw them all in the hopper and hit the green button.
  2. Intelligent Routing is required.  Separators need to provide document intelligence, and give the user the ability to pre-index the document through the use of a barcode creation utility, or an Optical Mark Recognition (OMR) routing sheet with check boxes.
  3. Flexibility in routing is required.  An application that can provide automatic routing to SharePoint based on barcodes or checkboxes can provide ultimate flexibility for the users.  The ability to route to site, library and folder is necessary, and the ability to set content type and file naming is also key.
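As a rough illustration of checkbox-based routing, here is a Python sketch that turns the boxes checked on an OMR routing sheet into a site, library, folder and content type.  The routing table, the site paths and the function names are all made up for the example, and the sketch stops at building the target path; the actual upload to SharePoint is left to the capture application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Route:
    site: str
    library: str
    folder: str
    content_type: str


# Hypothetical routing table: one entry per checkbox on the routing sheet.
ROUTES: Dict[str, Route] = {
    "Invoices": Route("/sites/finance", "Invoices", "Inbox", "Invoice"),
    "Contracts": Route("/sites/legal", "Contracts", "Unreviewed", "Contract"),
}


def resolve_route(checked_boxes: List[str]) -> Route:
    """Pick the destination for a document from the boxes checked on its routing sheet."""
    matches = [ROUTES[box] for box in checked_boxes if box in ROUTES]
    if len(matches) != 1:
        raise ValueError("routing sheet must have exactly one known box checked")
    return matches[0]


def target_path(route: Route, file_name: str) -> str:
    """Build the site/library/folder/file path the document should land in."""
    return "/".join([route.site.rstrip("/"), route.library, route.folder, file_name])


route = resolve_route(["Invoices"])
print(route.content_type)                   # Invoice
print(target_path(route, "INV-1001.pdf"))   # /sites/finance/Invoices/Inbox/INV-1001.pdf
```

Rejecting sheets with zero or several boxes checked keeps an ambiguous routing sheet from silently sending a document to the wrong library.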

Here is a sample of a routing sheet:   Scanning Route-SP-Dynamic-Template

SharePoint Document Scanning and 2D Barcodes

I have had a ton of inquiries on where to find the barcodes I have shown in my previous post – Using 2D Barcodes when Scanning into SharePoint.  There is a great site that lets you create these at the following URL:

Online 2D Barcode Generator

I use this link to create barcodes for demos, etc., and have many customers that have purchased the Datamatrix or PDF417 font for generating barcodes on their own documents internally.

Note that you need a document capture solution to read these at scan time.

Barcode Cover Sheets and SharePoint

So, in my quest for the ultimate scanning application for SharePoint, I continue to test new technologies, but there are just so many companies jumping on the SharePoint Bandwagon. The most recent technology is one that I have been testing quite a bit lately, and I have mentioned it quite a bit in this BLOG (PSI:Capture).

Imagine users from all over your organization walking up to Multi-Function Devices (MFDs or Copiers), and scanning their documents. Only they are using barcode separator/cover sheets. How do they work? Well, the sheet can serve two purposes: separation and data. The separation function allows you to take a stack of 10 documents, put the separator sheets between each, and then scan the whole stack. The software finds the separator sheets and knows when one document begins and another ends. The data in the barcode can also be read, and populated into columns within the SharePoint application. An example?

Take this paragraph:

“Early on in his life as a midshipman at the Naval Academy, the most important lesson John McCain learned was that to sustain his self-respect for a lifetime it would be necessary for him to have the honor of serving something greater than his self-interest — service to his country. John McCain has always put his country’s interests before any party, special interest and even his own self-interest. He has always and will always do what is right for our country.”

I know, I know, a technology BLOG is no place for political innuendo…sorry. ; )  Needed a quick paragraph and the convention was on.

Now take this barcode:

[Image: Separation barcode]

The entire barcode is the paragraph above encoded.  It can be read, and entered into a column in SharePoint.  Now this is an extreme example, but these 2D barcodes allow over 1000 characters in a thumbnail sized symbol.  Barcode generators will also allow a separation character, so you can embed multiple pieces of data for different columns within the symbol.  So think of the possibilities…

Each user assigned to a project within SharePoint could have their own cover sheet that has their name, project, location embedded.  All they need to do is scan their documents with the cover sheet on top, and the next thing they know, their paper document is on the SharePoint site with their name attached, and all the fields filled in.
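Once the capture software has decoded the 2D barcode into a string, splitting it into columns is the easy part.  This Python sketch assumes a pipe character as the separation character and three column names (Name, Project, Location); both are my own choices for illustration, as is the sample payload.

```python
from typing import Dict, Sequence


def payload_to_columns(payload: str, columns: Sequence[str],
                       separator: str = "|") -> Dict[str, str]:
    """Split a decoded barcode payload into named column values."""
    values = [value.strip() for value in payload.split(separator)]
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} values but the barcode held {len(values)}"
        )
    return dict(zip(columns, values))


# Hypothetical cover sheet payload for one user.
decoded = "Jane Smith|Bridge Retrofit|Denver"
fields = payload_to_columns(decoded, ["Name", "Project", "Location"])
print(fields)  # {'Name': 'Jane Smith', 'Project': 'Bridge Retrofit', 'Location': 'Denver'}
```

Pick a separation character that can never appear in the data itself, otherwise a value containing it will throw off the column count.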

No, not science fiction and very affordable.  There are several applications that support this function, most notably, PSIGEN products and Kofax products.  Some good additional info on ScanGuru.