The Two Most Popular Features for Scanning to SharePoint

When I look at all the customers I have worked with and examine the features most often used in a SharePoint environment, two stand out: Routing Sheets and Advanced Data Extraction.  I would say 90% of my customers use these features in some way to make the process automated and efficient.  So, what are they?  How do they work?  Both are outlined below.

Routing Sheets

I have mentioned these quite a bit on the blog, and they lend themselves nicely to distributed scanning from MFPs/copiers, fax machines, network scanners, etc.  A routing sheet is a combination of barcodes and/or checkboxes that lets end users index a document before they scan it.  The information is then translated into metadata.  Reading the checkboxes requires Optical Mark Recognition, or OMR, so make sure your scanning product supports it.  Below are some samples:

Legal Routing Sheet

HR Routing Sheet

Advanced Data Extraction (ADE)

Many of the solutions out there today support what is called Zoning, or the ability to pick information from a specific area on a page and enter it as metadata.  ADE takes that to a whole new level and provides the ability to match patterns and extract information wherever it appears.  So if a customer needs an order number that is 6 digits and always starts with a 7, the extraction engine can search the whole page and extract it.  This is a huge time saver, and allows the utmost in automation and verification of data.
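To make the pattern-matching idea concrete, here is a minimal sketch in Python. It assumes the page has already been run through OCR and is available as plain text; the function name and the sample text are made up for illustration, and a real capture product will have its own way of defining patterns.

```python
import re

# Assumption: an order number is exactly six digits, starts with 7,
# and is not part of a longer run of digits (such as a phone number).
ORDER_NUMBER = re.compile(r"(?<!\d)7\d{5}(?!\d)")


def extract_order_numbers(page_text):
    """Return every candidate order number found anywhere in the OCR text."""
    return ORDER_NUMBER.findall(page_text)


# Hypothetical OCR output from a scanned page.
sample = "Invoice 0042  PO: 712345  Phone: 555-7123456  Total: $701.00"
print(extract_order_numbers(sample))
```

The lookarounds on either side of the pattern are what keep a seven-digit phone number from being mistaken for an order number, which is the kind of verification that zoning alone does not give you.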

 

What is “Bridging the Gap”?

 The movement towards an office with less paper and more efficiency can be quite difficult, and with the wrong tools can end in failure.  The key challenge is a process I call “Bridging the Gap”, which uses several applications to create a bridge between the physical and digital world, and helps create a seamless process.  So what is required?  How do you create the bridge?

 

On one side of the gap, you have your physical environment: file cabinets, inboxes, stacks of folders on desks, etc.  There are two components that facilitate the crossing:

  • Scanning Hardware – scanners allow the conversion of paper documents into digital documents or images.  Organizations can use scanning copiers, fax machines or dedicated scanners to digitize.
  • Capture Software – capture software works with the scanning hardware to create an efficient and automated bridging process.  It controls the flow of digitized documents, standardizing how they are routed, and using OCR, Barcodes, Advanced Data Extraction (ADE) and other features to automate the collection of information.  It spans the gap and creates a connection to the other side or the repository.

Once the gap has been spanned, the documents need to land somewhere, just as physical documents land in a file cabinet, an inbox on someone's desk or another location in the organization.  Below are the two components that exist on the far side of the gap:


  • Workflow Software – think of this as the digital inbox and outbox…on steroids.  Workflow software is used to create a digital mirror of your physical processes.  It can move files around, create approval steps, send email automatically and perform logic that usually requires human intervention.  Some organizations don't have this component on the other side of the gap.
  • Repository – think of the repository as both a temporary and a permanent file cabinet: it can hold files during a workflow process, or keep an archive copy once the whole process is complete.  You can search, sort and organize, print, distribute and copy.  Most repositories allow full text search, provided the capture software has created a searchable file format, and also allow column-based searching for specific criteria.

I have seen many organizations try to bridge the gap while missing one of the pieces above, or with a piece that cannot meet all their needs.  A missing component can reduce the overall value of the system.  For example, take a scanning copier that an AP department uses to scan invoices.  They email themselves the scans, open them, rename them and then save them into their repository.  Without capture software to automate the naming and routing, this is a highly inefficient process.  Without capture, files are also not made searchable through OCR, which reduces efficiency during search.  Another example might be a repository that cannot provide all the bits and pieces an organization may require.  Take the organization that just saves PDFs to a network directory.  This may be fine for many organizations that merely need a simple archive to house their files.  But what about an audit event, or a legal issue that may require extensive searching and sorting?


“Bridging the Gap” and creating an office with less paper can provide an organization with countless benefits, given proper planning and design and the inclusion of all the above components.

Document Routing and Microsoft SharePoint

I see a ton of companies struggling with the question:  How do I get my copiers to scan to SharePoint?

I go back and forth on the idea of panel applications that enable intelligent routing at the copier.  It always comes back to contention at the device.  I recall one instance where an admin using eCopy had all her documents piled on the copier and was scanning them one at a time, sending each to SharePoint.  During her 20 minutes of copier hoarding, at least 10 people walked up, and walked away.

There are several things that I believe are absolutely critical to enabling copiers as scanning and capture onramps to SharePoint:

  1. Document Separators are an absolute requirement!!!  You have to be able to take a whole stack of documents, place barcode/routing separators between them, throw them all in the hopper and hit the green button.
  2. Intelligent Routing is required.  Separators need to provide document intelligence, and give the user the ability to pre-index the document through the use of a barcode creation utility, or an Optical Mark Recognition (OMR) routing sheet with check boxes.
  3. Flexibility in routing is required.  An application that can provide automatic routing to SharePoint based on barcodes or checkboxes gives users the most flexibility.  The ability to route to site, library and folder is necessary, and the ability to set content type and file naming is also key.
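As a rough illustration of the three points above, the sketch below turns the boxes checked on a routing sheet into a SharePoint destination: site, library, folder, content type and file name. Everything in it is an assumption for illustration: the checkbox names, the site and library names and the naming convention are hypothetical, and it only decides where a document should go. It does not read the sheet or talk to SharePoint.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Destination:
    site: str
    library: str
    folder: str
    content_type: str
    file_name: str


# Hypothetical routing table: checkbox label -> (site, library, content type).
DOCUMENT_TYPES = {
    "invoice": ("Finance", "Invoices", "Invoice"),
    "contract": ("Legal", "Contracts", "Contract"),
    "resume": ("HR", "Recruiting", "Resume"),
}


def route(checked, client, scanned_on):
    """Pick a destination from the set of checked boxes on the routing sheet."""
    matches = sorted(name for name in DOCUMENT_TYPES if name in checked)
    if len(matches) != 1:
        # Zero or several document types checked: send to manual review.
        raise ValueError(f"expected exactly one document type, got {matches}")
    site, library, content_type = DOCUMENT_TYPES[matches[0]]
    file_name = f"{client}_{content_type}_{scanned_on:%Y%m%d}.pdf"
    return Destination(site, library, client, content_type, file_name)


print(route({"invoice"}, "Acme", date(2011, 3, 4)))
```

Rejecting a sheet with no box or more than one box checked is a design choice in this sketch; the point is that a bad routing sheet should end up in a review queue rather than in the wrong library.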

Here is a sample of a routing sheet:   Scanning Route-SP-Dynamic-Template

PSIGEN Releases PSI:Capture 4.0

Ok, talk about a game changer.  Take a look at version 4.0 of PSI:Capture.  The new release from the mature document capture company has over 100 new features.  It provides Intelligent Character Recognition (ICR) to read hand printing, a whole set of new forms processing technology, enhanced Optical Character Recognition (OCR) for SharePoint, and Dynamic Routing for SharePoint.  For a list of features and functions, go to Document Capture 4.0-PSI:Capture.

 

Questions to ask before you start your SharePoint scanning, imaging or capture project

So you want to use Microsoft SharePoint as storage for scanned images? Take a quick breath and don’t charge in too fast, as there are many facets of this type of project that need to be considered.

What type of volume are you scanning on a daily basis?

  
You need to take a deep dive into departmental and end user needs, and really look at the volume of pages they need to image and capture. This brings up a point I discuss on a daily basis: Do you want to scan or capture? You may read this and say, what in the world are you talking about, so here is an explanation:
Let’s define scanning applications and their feature set. A scanning application is just a means to quickly and easily convert paper to digital form. These applications are well suited to environments with very basic needs, and to what I call “onesie-twosie” scanning, or low volume environments. Their feature sets provide very basic functionality, and may allow basic separation and very basic integration with SharePoint. The majority of scanning hardware vendors bundle these applications with their hardware, although there are vendors that have taken it to the next level, and provide enhanced scanning capabilities beyond the typical bundled software.
Document capture software can be used for basic scanning needs, but takes you to a whole new level from a “capture” perspective. These applications typically have a number of ways to “slice and dice” documents, and really focus on efficiency and on minimizing the time required to scan, index and capture data. Capture software provides numerous ways to automatically populate columns, including barcode reading, database lookups, OCR, and data extraction. True capture applications provide integration with scanners, folders with images, SharePoint WebDAV folders, etc. Any organization that is serious about processing paper documents, and wants to do it in the most efficient, standardized manner, should look seriously at advanced capture applications.
Capture applications are typically well suited to high volume situations or in situations where data can be extracted automatically. Scanning applications are suited for very simple operations, and usually suited to low volume.

What type of scanning device(s) are you going to utilize?

 
There are only a few applications out there that will provide you with the ability to scan from any type of device. Are you going to use network based scanning devices or direct connect scanners? Look into support in these specific areas:
• What type of drivers are supported? ISIS, TWAIN, and VRS should all be covered.
• Can hot folder functionality provide the auto-import and processing of all different image types, PDF included? Hot folder functionality should span local, network and WebDAV folders.
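For readers who have not used a hot folder before, here is a minimal sketch of the idea: a capture process looks at a folder on a schedule and picks up any supported file it has not handled yet. The list of extensions and the function name are assumptions for illustration; a real capture application will also worry about half-written files, locking and error handling.

```python
from pathlib import Path

# Assumption: the image types this hypothetical hot folder accepts.
SUPPORTED = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def pending_files(hot_folder, already_seen):
    """Return supported files in hot_folder that have not been processed yet."""
    found = []
    for path in sorted(Path(hot_folder).iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        if path.suffix.lower() not in SUPPORTED:
            continue  # skip file types the capture process cannot import
        if path.name in already_seen:
            continue  # skip files picked up on an earlier pass
        found.append(path)
    return found
```

A caller would run `pending_files` every few seconds, hand each returned file to the capture process, and add its name to `already_seen`. The same function works for a local path or a mapped network share, since both look like ordinary folders to Python.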
Beware of “panel” based applications. They are typically very static, and can create a line at the MFP/copier while people enter information about their documents at the actual device.


What output format do you want in the SharePoint libraries?

 
Scanning and capture applications today provide a broad array of image output formats, but the standard seems to be PDF Image with Hidden Text. This provides an all-in-one container for the original image and the searchable text. Install the PDF IFilter, and you have a searchable content store. There are some specialized uses that may require other formats. For instance, if you are importing JPEGs with EXIF tags with your advanced capture application, you will want to keep the original JPEG file with tags intact rather than performing a conversion.


What Scanning and Capture features will be necessary in your environment?


What features should you look for? This is the most difficult question of them all, and you really need to find an application with a broad feature set so you can cover today’s needs and the needs of your organization in the future. This blog post is a great place to start:
Trends in Scanning and Capture




How much storage space will I require? Where are you going to store your images?


Just a few stats here to get you on your way:
• The standard scanned page can be estimated at 50 KB in size (at 300 DPI)
• A file cabinet contains between 10,000 and 12,000 pages
This can give you a quick idea of how much storage will be required, and let you do some growth estimation over time.
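Here is that arithmetic as a small Python sketch. The 50 KB per page and the 10,000 to 12,000 pages per cabinet come from the stats above; the 250 working days per year, the decimal units (1 MB = 1,000 KB) and the example volumes are my own assumptions, so swap in your own numbers.

```python
KB_PER_PAGE = 50                    # estimate from above, at 300 DPI
PAGES_PER_CABINET = (10_000, 12_000)  # low and high estimates from above


def cabinet_storage_mb(cabinets, kb_per_page=KB_PER_PAGE):
    """Return the (low, high) storage estimate in MB for a number of file cabinets."""
    low_pages, high_pages = PAGES_PER_CABINET
    return (cabinets * low_pages * kb_per_page / 1000,
            cabinets * high_pages * kb_per_page / 1000)


def yearly_growth_gb(pages_per_day, working_days=250, kb_per_page=KB_PER_PAGE):
    """Return the estimated new storage per year in GB (working_days is an assumption)."""
    return pages_per_day * working_days * kb_per_page / 1_000_000


print(cabinet_storage_mb(1))    # one cabinet: roughly 500 to 600 MB
print(cabinet_storage_mb(20))   # twenty cabinets: roughly 10 to 12 GB
print(yearly_growth_gb(1000))   # 1,000 pages a day: about 12.5 GB a year
```

So a backfile of 20 cabinets plus 1,000 new pages a day lands somewhere around 22 to 25 GB after the first year, before any versioning or index overhead.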
You should also use these numbers to decide whether to use the SharePoint content database for storage, or to utilize Remote BLOB Storage (RBS). SharePoint 2010 with SQL Server 2008 R2 allows this without the need for additional software through the FILESTREAM provider.


How will I view images once they are in SharePoint?


Without a viewer add-on, SharePoint will require you to open an image to view its pages. This can be problematic if you are serving up large image files. Definitely take a look at some of the image viewer add-ons for SharePoint. My favorite, VizitSP SharePoint Viewer, provides the ability to view/preview, annotate, image process, search (column-based and full text) and have multiple images open in a tabbed view. This is an absolute necessity if you are going to give end users the best experience possible.

Just some questions to get the gears turning and make sure you get all the pieces to the puzzle.

SharePoint 2010 and Document Sets

So many good posts are coming out on the web for SharePoint 2010. I am working to figure out all the angles on how to improve SharePoint as an imaging, scanning and capture platform, and Document Sets seem to be a great focal point. Here is a great article outlining them and how to use them:

Document Sets and SharePoint 2010

How are Companies using SharePoint Scanning Applications?

There is a trend right now in SharePoint scanning and capture: I really see organizations leveraging MFPs and copiers to provide scanning to the masses.  As you can well imagine, this could be an absolute nightmare without document capture software that can standardize the posting of information to Document Libraries.

The trend right now is the use of barcode cover sheets and/or OMR routing sheets.  The routing sheets seem to be the hot method at this point in time, because users can keep a pre-printed stack near the copier, and when they need to scan, they grab a sheet and a Sharpie, color in the boxes that determine the metadata, and then scan.  Below is a link to an example OMR sheet for a law firm.  Note that the check boxes are mapped to a document library, and are also used to create the file folder and file name.

SharePoint Scanning OMR Sheet

I will go over some barcode sheet examples in a later post.