PSIGEN Releases PSI:Capture 4.0

OK, talk about a game changer. Take a look at version 4.0 of PSI:Capture: the new release from the mature document capture company has over 100 new features. It provides the ability to perform Intelligent Character Recognition (ICR) to read hand printing, a whole set of new forms processing technology, enhanced Optical Character Recognition (OCR) for SharePoint, and Dynamic Routing for SharePoint. For a list of features and functions, go to Document Capture 4.0-PSI:Capture.

 

What scanning and capture model should you choose?

Model, what the heck does that mean?

In traditional scanning and capture, there are three well-recognized scanning models: centralized, decentralized and distributed. Below I will cover each in detail:

  • Centralized – Ah, centralized…the old school method. Imagine a room with ten blue hairs feeding big iron scanners, and the hum of paper over rollers filling the air. This is the traditional scanning model, where paper is shipped to a central location and a few highly trained operators with high-speed scanners capture and process it. This process is easily standardized, but the operators are usually not the knowledge workers who know the most about the documents.
  • Decentralized – As bandwidth got cheaper, companies began to look for ways to put the scanning task into the hands of the end users. The decentralized model provides branch-level scanning, usually with smaller desktop hardware, and gives more control to the knowledge workers. Things get scanned more quickly, and the indexing process is less error-prone.
  • Distributed – With the advent of network-connected scanners, copiers and fax machines, distributed scanning has evolved to be the model of choice for SharePoint. It puts the scanning and capture task in the hands of everyone in the organization. It does have some drawbacks though: you usually need some software to standardize and govern the whole process, security becomes an issue when scanners are available to everyone, and most manufacturers have limited integration options for ECM.

Typically, a SharePoint scanning and capture environment requires some type of hybrid solution that blends all three models. Beware: you will need a capture application that can thrive in all of these environments.

 

Questions to ask before you start your SharePoint scanning, imaging or capture project

So you want to use Microsoft SharePoint as storage for scanned images? Take a quick breath and don’t charge in too fast, as there are many facets of this type of project that need to be considered.

What type of volume are you scanning on a daily basis?

  
You need to take a deep dive into departmental and end user needs, and really look at the volume of pages they need to image and capture. This brings up a point I discuss on a daily basis: do you want to scan or capture? You may read this and say, what in the world are you talking about? Here is an explanation:
Let’s define scanning applications and their feature set. A scanning application is just a means to take paper and quickly and easily convert it to digital form. These applications are well suited to environments with very basic needs, and to what I call “onesie-twosie” scanning, or low volume environments. Their feature sets provide very basic functionality, and may allow basic document separation and very basic integrations with SharePoint. The majority of scanning hardware vendors bundle these applications with their hardware, although there are vendors that have taken it to the next level and provide enhanced scanning capabilities beyond the typical bundled software.
Document capture software can be utilized for basic scanning needs, but takes you to a whole new level from a “capture” perspective. These applications typically have a number of ways to “slice and dice” documents, and really focus on efficiency and on minimizing the time required to scan, index and capture data. Capture software provides numerous ways to automatically populate columns, including barcode reading, database lookups, OCR, and data extraction. True capture applications integrate with scanners, image folders, SharePoint WebDAV folders, and so on. Any organization that is serious about processing paper documents, and wants to do it in the most efficient, standardized manner, should look seriously at advanced capture applications.
Capture applications are typically well suited to high volume situations, or to situations where data can be extracted automatically. Scanning applications are suited to very simple, usually low volume, operations.

What type of scanning device(s) are you going to utilize?

 
There are only a few applications out there that will provide you with the ability to scan from any type of device. Are you going to use network based scanning devices or direct connect scanners? Look into support in these specific areas:
• What type of drivers are supported? ISIS, TWAIN, and VRS should all be supported.
• Can hot folder functionality provide the auto-import and processing of all different image types, PDF included? Hot folder functionality should span local, network and WebDAV folders.
Beware of “panel” based applications. They are typically very static, and can cause a line to form at the MFP/copier as people enter information about their documents at the actual device.
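To make the hot folder idea above a little more concrete, here is a minimal Python sketch of a single polling pass over a watched folder. It is only an illustration, not how any particular capture product works: the extension list, the function names and the `process` callback are all assumptions made for this example, and it only covers folders reachable as a normal file system path (a local folder, a network share, or a WebDAV location that has been mapped as a drive).

```python
from pathlib import Path

# Hypothetical list of file types the hot folder should pick up (PDF included).
IMPORT_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp"}


def find_new_files(hot_folder, already_seen):
    """Return supported files in hot_folder that have not been picked up yet."""
    new_files = []
    for path in sorted(Path(hot_folder).iterdir()):
        is_supported = path.suffix.lower() in IMPORT_EXTENSIONS
        if path.is_file() and is_supported and path not in already_seen:
            new_files.append(path)
    return new_files


def poll_once(hot_folder, already_seen, process):
    """Run one polling pass: hand each new file to process() and remember it."""
    picked_up = find_new_files(hot_folder, already_seen)
    for path in picked_up:
        process(path)            # e.g. queue the file for separation, OCR, indexing
        already_seen.add(path)
    return picked_up
```

A real capture application would run a pass like this on a timer, and would also need to deal with files that are still being written when the pass runs, which this sketch ignores.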


What output format do you want in the SharePoint libraries?

 
Scanning and capture applications today provide a broad array of image output formats, but the standard seems to be PDF Image with Hidden Text. This provides an all-in-one container for the original image and the searchable text. Install the PDF IFilter, and you have a searchable content store. There are some specialized uses that may require other formats. For instance, if you are importing JPEGs with EXIF tags with your advanced capture application, you will want to keep the original JPEG file with its tags intact rather than performing a conversion.


What Scanning and Capture features will be necessary in your environment?


What features should you look for? This is the most difficult question of them all, and you really need to find an application with a broad feature set to make sure you can cover both today’s needs and the future needs of your organization. This blog post is a great place to start:
Trends in Scanning and Capture




How much storage space will you require? Where are you going to store your images?


Just a few stats here to get you on your way:
• The standard scanned page can be estimated at 50 KB in size (at 300 DPI)
• A file cabinet contains between 10,000 and 12,000 pages
This can give you a quick idea of how much storage will be required, and let you do some growth estimation over time.
You should also use these numbers to decide whether to use the SharePoint database for content storage or to utilize Remote BLOB Storage (RBS). SharePoint 2010 with SQL Server 2008 R2 allows this through the FILESTREAM provider, without the need for additional software.
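As a rough illustration of the arithmetic, here is a small Python sketch that turns the two stats above into a storage estimate. The 50 KB page size and the 12,000 pages per cabinet (the upper end of the range) come from this post; the 250 working days per year and the sample volumes in the usage lines are assumptions made up for the example, so plug in your own numbers.

```python
PAGE_SIZE_KB = 50           # from the post: one scanned page at 300 DPI
PAGES_PER_CABINET = 12_000  # from the post: upper end of the 10,000-12,000 range
KB_PER_GB = 1024 * 1024


def estimate_storage_gb(pages_per_day, backlog_cabinets=0, working_days=250):
    """Return (backlog_gb, yearly_growth_gb) for a scanning project.

    working_days=250 is an assumed number of scanning days per year.
    """
    backlog_pages = backlog_cabinets * PAGES_PER_CABINET
    backlog_gb = backlog_pages * PAGE_SIZE_KB / KB_PER_GB
    yearly_growth_gb = pages_per_day * working_days * PAGE_SIZE_KB / KB_PER_GB
    return backlog_gb, yearly_growth_gb


if __name__ == "__main__":
    # Hypothetical department: 20 file cabinets of backlog, 2,000 new pages a day.
    backlog, growth = estimate_storage_gb(pages_per_day=2000, backlog_cabinets=20)
    print(f"Backlog: {backlog:.1f} GB, growth per year: {growth:.1f} GB")
```

With those sample numbers the backlog works out to roughly 11 GB and the yearly growth to roughly 24 GB. Treat the result as a ballpark only, since color scans and different compression settings can move the per-page size a long way from 50 KB.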


How will I view images once they are in SharePoint?


Without a viewer add-on, SharePoint will require you to open an image to view its pages. This can be problematic if you are serving up large image files. Definitely take a look at some of the image viewer add-ons for SharePoint. My favorite, VizitSP SharePoint Viewer, provides the ability to view/preview, annotate, image process, search (column based and full text) and have multiple images open in a tabbed view. A viewer like this is an absolute necessity if you are going to give end users the best experience possible.

Just some questions to get the gears turning and make sure you get all the pieces to the puzzle.