Poll Results: What type of scanners are companies using to scan to SharePoint?

I conducted a poll over about three months to see what types of devices people were using to scan to Microsoft SharePoint.  The results are in tune with what I see in the field: folks are using a distributed scanning model with SP to put scanning into the hands of knowledge workers.  Below are the results:

Oops. Did we back up the file cabinets?

If you look at the headlines over the past few years, you cannot help but notice the number of natural disasters that have occurred. When discussing business continuity or disaster planning in my conferences with IT and departmental management, I always pose the question: Do you have a plan for your paper?

Just about every company has implemented some type of plan for backing up its important digital files. Some go to the extreme with data snapshots that can be recovered from multiple locations. But companies typically don’t take the same approach with their paper assets. The good ole file cabinet, the protector of all things paper, will provide protection, right?

Companies need to take a good hard look at their paper and assess the business impact should a disaster destroy their file room. Backing up your paper nowadays is neither hard nor expensive when compared to the legal implications and the time it would take to reproduce (if possible) contracts, customer files, sales records and the like.

Any paper backup plan involves a concept I call Bridging the Gap (BTG). BTG involves hardware and capture software to digitize paper and build the bridge to the digital world, and then a repository on the “other side” to house the records and make search and retrieval simple. The repository can be as simple as a set of named network folders, or as complex as a true ECM system like MS SharePoint. Take the initiative and back up your paper today.

The Two Most Popular Features for Scanning to SharePoint

So, when I look at all the customers I have worked with and examine the features they apply most in a SharePoint environment, two stand out: Routing Sheets and Advanced Data Extraction.  I would say 90% of all my customers use these features in some way to make the process automated and efficient.  So, what are they?  How do they work?  Outlined below:

Routing Sheets

I have mentioned these quite a bit on the BLOG, and they lend themselves nicely to distributed scanning from MFPs/copiers, fax machines, network scanners, etc.  A routing sheet is a combination of barcodes and/or checkboxes that lets end users index documents before the scan.  The information can then be translated into metadata.  Reading the checkboxes requires Optical Mark Recognition, or OMR, so make sure your scanning product supports OMR.  Below are some samples:

Legal Routing Sheet

HR Routing Sheet
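To make the routing-sheet idea concrete, here is a minimal Python sketch of the step where decoded sheet data becomes metadata. It assumes the capture product has already read the barcode and the ticked OMR boxes; the box IDs, the `ClientID-MatterID` barcode convention and the column names are all hypothetical, not any product's API.

```python
# Hypothetical checkbox layout for a legal routing sheet: each OMR box ID
# maps to the (metadata column, value) it sets when ticked.
CHECKBOX_MAP = {
    "box_contract": ("DocumentType", "Contract"),
    "box_pleading": ("DocumentType", "Pleading"),
    "box_confidential": ("Confidential", "Yes"),
}


def routing_sheet_to_metadata(barcode_value: str, checked_boxes: set[str]) -> dict[str, str]:
    """Combine a decoded routing-sheet barcode and ticked OMR boxes into metadata."""
    # Hypothetical barcode convention: "<ClientID>-<MatterID>".
    client_id, matter_id = barcode_value.split("-", 1)
    metadata = {"ClientID": client_id, "MatterID": matter_id}
    for box_id in sorted(checked_boxes):
        column, value = CHECKBOX_MAP[box_id]
        # Two ticked boxes for one single-value column means the sheet was filled in wrong.
        if column in metadata and metadata[column] != value:
            raise ValueError(f"conflicting marks for column {column!r}")
        metadata[column] = value
    return metadata


print(routing_sheet_to_metadata("C1042-M007", {"box_contract", "box_confidential"}))
```

Rejecting conflicting marks instead of silently picking one is a design choice: a routing sheet filled in wrong is better sent to a person than filed under the wrong document type.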

Advanced Data Extraction (ADE)

Many of the solutions out there today support what is called Zoning, the ability to pick information from a specific area on a page and enter it as metadata.  ADE takes that to a whole new level and provides the ability to match patterns and extract information.  So if a customer needs an order number that is 6 digits and always starts with a 7, the extraction engine can search the whole page and extract it.  This is a huge time saver, and allows the utmost in automation and verification of data.
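The order number rule above maps directly onto a regular expression. A minimal Python sketch, assuming the OCR text of the page is already available as a string (the sample text is made up):

```python
import re

# 6-digit order numbers that always start with 7, per the example above.
# The \b word boundaries stop the pattern from matching inside longer digit runs.
ORDER_NUMBER = re.compile(r"\b7\d{5}\b")


def extract_order_numbers(page_text: str) -> list[str]:
    """Return every candidate order number found anywhere on the page."""
    return ORDER_NUMBER.findall(page_text)


ocr_text = "Invoice 12345  Order No: 712345  Ref 7123456  PO 612345"
print(extract_order_numbers(ocr_text))  # ['712345']
```

Unlike a zone, the pattern does not care where on the page the number sits, which is exactly the difference between Zoning and ADE described above.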

 

What is “Bridging the Gap”?

The movement towards an office with less paper and more efficiency can be quite difficult, and with the wrong tools it can end in failure.  The key challenge is a process I call “Bridging the Gap”, which combines several applications to build a bridge between the physical and digital worlds and create a seamless process.  So what is required?  How do you create the bridge?

On one side of the gap, you have your physical environment: file cabinets, inboxes, stacks of folders on desks, etc.  There are two components that facilitate the crossing:

  • Scanning Hardware – scanners allow the conversion of paper documents into digital documents or images.  Organizations can use scanning copiers, fax machines or dedicated scanners to digitize.
  • Capture Software – capture software works with the scanning hardware to create an efficient and automated bridging process.  It controls the flow of digitized documents, standardizing how they are routed, and using OCR, Barcodes, Advanced Data Extraction (ADE) and other features to automate the collection of information.  It spans the gap and creates a connection to the other side or the repository.

Once the gap has been spanned, the documents need to land somewhere, just as physical documents land in a file cabinet, an inbox on someone’s desk or another location in the organization.  Below are the two components that exist on the far side of the gap:


  • Workflow Software – think of this as the digital inbox and outbox…on steroids.  Workflow software creates a digital mirror of your physical processes.  It can move files around, create approval steps, send email automatically and perform logic that usually requires human intervention.  Some organizations don’t have this component on the other side of the gap.
  • Repository – Think of the repository as both a temporary and a permanent file cabinet: it can hold files during a workflow process, or keep an archive copy once the whole process is complete.  You can search, sort, organize, print, distribute and copy.  Most repositories allow full-text search if the capture software has created a searchable file format, and also allow column-based searching on specific criteria.

I have seen many organizations try to bridge the gap while missing one of the pieces above, or with a piece that cannot meet all their needs.  A missing component can reduce the overall value of the system.  For example, take an AP department that uses a scanning copier to scan invoices.  Staff email themselves the scans, open them, rename them and then save them into their repository.  Without capture software to automate the naming and routing, this is a highly inefficient process.  Without capture, the files are also not made searchable through OCR, which makes later searches less efficient.

Another example is the lack of a repository that provides all the bits and pieces an organization may require.  Take the organization that just saves PDFs to a network directory.  This may be fine for many organizations that merely need a simple archive to house their files.  But what about an audit event, or a legal issue that requires extensive searching and sorting?


“Bridging the Gap” and creating an office with less paper can provide an organization countless benefits, provided there is proper planning and design and all of the above components are included.

SharePoint Scanning Planning – Part 4 – Document Scanning Models

Document Scanning Models

After doing some planning on the hardware types and document scanning volumes, the next step is to examine what type of model you need to deploy.  There are typically three standard models for document scanning and capture: Centralized, Decentralized and Distributed.

Each model has its own pros/cons, and below I will examine each, and dive into some detail.

Centralized

Ah, the centralized model.  Some call this old school scanning and capture, as for many years this was the only way to get the job done and convert your paper to digital form.  This model provides a central scanning center to perform mass conversion for the organization.  The operation can be run by in-house personnel, managed in-house by a services provider, or outsourced to a scanning service bureau.  It requires high volume/high speed hardware, and typically uses advanced capture software to allow for the utmost in automation and efficiency.  The hardware and software operators are typically highly trained, and there are usually only a few of them.  Paper and/or digital media is shipped to the central location and processed through a set, standardized capture workflow.

Centralized Pros

  • Easily standardized process due to a limited number of skilled/trained scan operators
  • High speed hardware/software results in minimal processing time once paper is received
  • Centralized reporting and control of overall process
  • No loading on WAN infrastructure
  • Centralized backup and restore

Centralized Cons

  • Usually a long delay before documents are available
  • High cost due to shipping of documents
  • High maintenance costs
  • High training costs to bring on new operators
  • Disaster recovery planning issues if centralized site is down
  • Operators are typically not knowledgeable in the documents they are indexing

Decentralized

Over time, as bandwidth and scanning hardware/software prices went down, the obvious move was to decentralize the whole scanning and capture process.  This move placed scanning in the branches, and allowed the whole document capture process to be performed by those who had working knowledge of the documents.  Smaller, desktop class hardware could be used, and most capture companies made batch scanning and upload to the centralized repository simple to accomplish.

Decentralized Pros

  • Scan operators are well versed in the documents they scan
  • Documents are available almost immediately
  • No shipping or transfer costs for documents
  • Branch control of the whole scanning process

Decentralized Cons

  • Standardization can be an issue
  • No centralized control or reporting
  • WAN Bandwidth consumption can be high
  • Licensing costs can be high depending on software utilized

Distributed

The advance of network-based scanning devices and the lowering of bandwidth pricing led to the newest model, the Distributed Model.  Distributed Scanning allows for just about anyone in the organization to walk up to a network scanning device/scanning copier/fax machine and send documents to a repository.  The devices are typically multi-faceted, and along with repository integration, can provide scan to network folder, FTP and email.  Collaborative back-end systems, like Microsoft SharePoint, lend themselves nicely to this model, as they allow anyone to participate in a Document Workspace.

Distributed Pros

  • Puts scanning in the hands of everyone in the organization
  • Provides a great launching pad for collaborative solutions
  • Simple, easy to use interfaces allow for minimal training and quick adoption
  • Capture and indexing is now in the hands of the true document owner
  • One-to-many solution provides a single device to service many users

Distributed Cons

  • Lack of standardization without software addition
  • Security and document control can be major issues
  • Bandwidth from smaller branches can be a problem with larger scans
  • Lack of hardware integrations with back-end systems

So, most organizations today are combining the above models to create a Hybrid Scanning and Capture solution, and leveraging all the strengths together to minimize the weaknesses of any one model.   Another strategy is to tie scanning models to specific business processes, as most lend themselves nicely to specific scanning and capture workflows.

Hardware and Choosing Your Scanning Model

Most organizations will choose their model to leverage their existing hardware investment.  This can lead to decisions that seem good at the time, but on deeper examination it can make sense to realign hardware with the best model.  Take, for example, a company that instantly leans toward a distributed model and attempts to leverage its copier fleet, which is currently under lease.  As the part of this guide that covers scanning hardware points out, copiers are not always a fit for the type of scanning you need to perform.  Consider a branch accounting department that is looking to scan receipts or check stubs: will the copier perform well with mixed original sizes?  Just a word of caution: examine the paper, workflow and document types to get the best feel, and adopt the best model.

Capturing Non-image and Office Documents into SharePoint

So, a few clients lately have been asking how to capture non-scanned/non-image documents.  There seem to be a ton of scanning apps on the market, but not many that process Office docs and other file types.  Quite a few people are not aware of the built-in features you can use with Office 2010 and SharePoint 2010.  First off, in most cases you can access SharePoint content through WebDAV, and it will behave just like a network folder.  You can navigate sites, libraries and folders, and save documents into them just by drag and drop.  I have this set up on our SP server, and routinely add volumes of documents with a simple drag or copy.

Secondly, Office 2010 can display document properties, which will show all your SharePoint metadata fields.  This is a huge time saver, and requires no third-party app to save documents and add metadata.  Check it out before you spend big money on an app that just does what Office 2010 gives you natively.  Just some thoughts.
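If you want to script the WebDAV drag-and-drop approach described above, the Windows WebDAV client (the WebClient service) exposes a library as a UNC path. The sketch below only converts a library URL into that path form; it assumes the `\\host@SSL\DavWWWRoot\...` naming convention of the Windows WebDAV redirector and uses a made-up host name, so check the resulting path in Explorer before relying on it.

```python
from urllib.parse import unquote, urlsplit


def sharepoint_url_to_unc(url: str) -> str:
    """Turn a SharePoint library URL into a Windows WebDAV (WebClient) UNC path."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("expected an http(s) URL with a host name")
    host = parts.hostname
    if parts.scheme == "https":
        host += "@SSL"                 # HTTPS is requested with the @SSL suffix
    if parts.port:
        host += f"@{parts.port}"       # a non-default port follows as @port
    # Decode %20 etc. and switch to Windows path separators.
    path = unquote(parts.path).strip("/").replace("/", "\\")
    return f"\\\\{host}\\DavWWWRoot\\{path}"


print(sharepoint_url_to_unc("https://sp.example.com/sites/Legal/Shared%20Documents"))
# \\sp.example.com@SSL\DavWWWRoot\sites\Legal\Shared Documents
```

On a Windows machine with the WebClient service running, ordinary file calls such as `shutil.copy2(source, unc_path)` should then be able to drop files into the library, just as a manual drag would. This only moves files; metadata still has to be filled in through SharePoint or Office.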

SharePoint Scanning Planning – Part 3 – Scanning Hardware

Now that I have covered Sizing and Storage in Part 1 and Document Separation in Part 2, we can start to take a look at scanning hardware.  There are several key questions you need to answer:  Can I use pre-existing hardware such as copiers or fax machines?  Do I need a dedicated scanner?  If I choose to buy a scanner, what features/characteristics are important?

Some may argue you need to decide on a scanning model before you dive into hardware (distributed, centralized, or decentralized), but I will cover this in the next section.

So let’s start with a key question:

Scanning Copier or Dedicated Scanner?

Scanning Multifunction Peripherals (MFPs/copiers) have become standard in most offices. I receive the same question all the time from prospects and customers: Can’t I just use my copier for scanning? In many cases, for a typical office with typical documents, a copier can be an appropriate component of a scanning solution. As offices become more complex in the way they handle their documents, or expand their scanning efforts to other departments, dedicated scanners are usually required to achieve the desired result.

Below are some interesting statistics provided by InfoTrends:

  • 65% of office workers use digital copiers/MFPs
  • Over 50% use the “scan” feature daily
  • 71% expect scanning requirements to increase from year to year
  • 72% believe it is necessary to view images before processing
  • 36% will require dedicated scanners versus MFP devices
  • 36% believe they will need both scanners and MFPs

So what are the benefits/drawbacks to scanning with both types of devices? Below is a summary:

Benefits of MFPs as scanners:

  • Leverage your existing investment in the MFP
  • Most copier maintenance plans do not charge for scans, so you get “free” maintenance for the scanning function (no print/copy, no click charge)
  • MFP manufacturers are really focusing on scanning capabilities: fast speeds, better quality and enhanced drivers, etc.
  • Network scanning functions:
      • Scan to email
      • Scan to Windows folders
      • Scan to FTP
  • One-to-Many relationship: all workers can use one device.


Drawbacks of MFPs:

  • Contention – copying, scanning and printing may cause “a line at the copier”
  • Poor performance with differing paper sizes
  • Lack of color dropout (scanning blue or black backgrounds will result in a black page)
  • Lack of image correction capabilities (auto deskew, despeckle, black border removal, streak removal, etc.)
  • Small document feeders (50 – 100 pages)
  • On average, file sizes are 10-20% larger
  • Duplex scanning or a higher DPI setting greatly slows scanning below the rated speed
  • Black and white scanning only on some models


Benefits of Dedicated Scanners:

  • Convenience – scan at your desk
  • Duplexing does not slow down scanner
  • Color dropout
  • Superior image quality due to enhancement features
  • Ease in handling differing paper sizes/types
  • Larger document feeders (1000+ pages on some models)
  • Smaller file sizes
  • Ability to preview scanned documents at scan time


Drawbacks of Dedicated Scanners:

  • One-to-one relationship – directly connected to a PC
  • Additional maintenance costs


Above are all the pluses and minuses, but in a nutshell, when should you use a dedicated scanner?

  • Scanning 50+ documents per day
  • Workers that are constantly scanning throughout the day
  • Mixed paper sizes, weights and colors
  • Poor quality, older documents or when image enhancement is required
  • OCR or ICR applications
  • High volume copying and printing environments
  • Large Document scanning
  • High security environments
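The checklist above can be turned into a quick screening helper when you survey many departments at once. This is only a sketch: the `ScanProfile` fields are hypothetical names for the bullets above, and the only numeric threshold is the 50+ documents per day from the list.

```python
from dataclasses import dataclass


@dataclass
class ScanProfile:
    """Hypothetical description of one department's scanning workload."""
    documents_per_day: int
    scans_all_day: bool = False
    mixed_paper: bool = False
    poor_originals: bool = False
    needs_ocr_or_icr: bool = False
    busy_copier: bool = False
    large_documents: bool = False
    high_security: bool = False


def dedicated_scanner_reasons(p: ScanProfile) -> list[str]:
    """Return the checklist items that apply; any hit points toward a dedicated scanner."""
    checks = [
        (p.documents_per_day >= 50, "50+ documents per day"),
        (p.scans_all_day, "constant scanning throughout the day"),
        (p.mixed_paper, "mixed paper sizes, weights and colors"),
        (p.poor_originals, "poor quality or older originals needing enhancement"),
        (p.needs_ocr_or_icr, "OCR or ICR applications"),
        (p.busy_copier, "high volume copying and printing on the MFP"),
        (p.large_documents, "large document scanning"),
        (p.high_security, "high security environment"),
    ]
    return [reason for applies, reason in checks if applies]


branch_ap = ScanProfile(documents_per_day=80, mixed_paper=True)
print(dedicated_scanner_reasons(branch_ap))
# ['50+ documents per day', 'mixed paper sizes, weights and colors']
```

Returning the reasons rather than a bare yes/no makes the recommendation easy to explain to the department that has to fund the scanner.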

 

 

Now that you have an idea of the pros/cons of both types of scanning devices, let’s take a look at the different features of scanning devices, and what to look for when purchasing a dedicated scanner.

Scanning Speed

Scanning speed is a main area of focus when researching scanning hardware. A scanner’s price usually rises with its speed, but you have to ask yourself one question: How long do you have to accomplish your scanning tasks? If you buy that cheapo scanner at an office products store that scans at 8 pages per minute, good luck getting those 10 file cabinets scanned. Also note that manufacturers typically rate their scanner speeds at 200 DPI. If you need high quality images, or are performing OCR, 300 DPI will probably be necessary. This will significantly slow down your scanning speed, as will color scanning and duplex (2-sided) scanning on some models.
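A little arithmetic shows why speed matters. This sketch estimates pure feed time only (no document prep, feeder reloads or jams). The pages per cabinet and the `derate` factor for 300 DPI, color or duplex are hypothetical inputs, not figures from any manufacturer.

```python
def scan_hours(pages: int, rated_ppm: float, derate: float = 1.0) -> float:
    """Estimate pure feed time in hours for a scanning job.

    rated_ppm is the manufacturer's rating (typically quoted at 200 DPI).
    derate is a hypothetical fraction (0 < derate <= 1) of that speed you
    actually get at 300 DPI, in color or in duplex; check your model's specs.
    """
    if pages < 0 or rated_ppm <= 0 or not 0 < derate <= 1:
        raise ValueError("invalid job parameters")
    return pages / (rated_ppm * derate) / 60


# Hypothetical job: 10 cabinets at an assumed 5,000 pages each.
job_pages = 10 * 5_000
print(f"8 ppm desktop:        {scan_hours(job_pages, 8):.1f} h")              # 104.2 h
print(f"60 ppm at half speed: {scan_hours(job_pages, 60, derate=0.5):.1f} h")  # 27.8 h
```

Even at half its rated speed, the faster scanner finishes this hypothetical job in roughly a quarter of the feed time.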

 

Document Feeder Capacity

The document feeder lets you load anywhere from 1 to 1000+ sheets into the scanner. The feeder capacity you require depends on the volume of paperwork you are scanning, and on whether you are using an intelligent capture application that can use separator sheets to split documents automatically. If you are a law firm that routinely scans 200 page documents, then that is a good starting point for your feeder size requirements. This allows you to load your documents and then let the scanner do the work.

Another focus area related to the feeder is the maximum and minimum paper sizes. If you intend to scan legal size paper or insurance cards, make sure the scanner can handle them.

Daily Duty Cycle

The Duty Cycle (DC) is a rating of the scanner’s durability, and defines just how much paper you can feed through the hardware in a day. If you are scanning 3000 pages per day, you do not want to buy a small desktop scanner with a DC of 750. What happens if you exceed this number? Nothing at first, but over time the wear and tear on the unit will show up as jams, misfeeds, skewing, etc. This number is also tied to the replacement of consumables (rollers and pads). If you continually exceed the DC, you will more than pay for a higher level scanner in consumables over time, and your maintenance costs may go way up.
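The 3000-pages-versus-750 example works out to running the small scanner at four times its rating every day. A small sketch of that check, plus a helper that picks the lowest-rated model that still fits; the model names, their duty cycles and the 20% headroom are hypothetical.

```python
from typing import Optional


def duty_cycle_ratio(pages_per_day: int, rated_daily_duty_cycle: int) -> float:
    """How many times its rated daily duty cycle a scanner would be run."""
    return pages_per_day / rated_daily_duty_cycle


def smallest_adequate_model(pages_per_day: int, models: dict[str, int],
                            headroom: float = 1.2) -> Optional[str]:
    """Pick the model with the lowest duty cycle that still covers the daily
    volume plus a safety margin (headroom is a hypothetical 20% by default)."""
    needed = pages_per_day * headroom
    adequate = [(cycle, name) for name, cycle in models.items() if cycle >= needed]
    return min(adequate)[1] if adequate else None


# Hypothetical catalogue: model name -> rated daily duty cycle (pages).
catalogue = {"desktop-A": 750, "workgroup-B": 3000, "departmental-C": 6000}
print(duty_cycle_ratio(3000, 750))               # 4.0
print(smallest_adequate_model(3000, catalogue))  # departmental-C
```

The headroom is why the 3000-page model is not chosen for a 3000-page day: running exactly at the rating leaves no margin for busy days.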

 

Scanning Mode

Most scanners nowadays can scan both sides of your document, but there are still some lingering models that will only do simplex scanning. Also, if you have the requirement to scan color documents, ensure that color scanning is supported.

Warranty and Service

Not all warranties are created equal. Some scanner manufacturers provide “depot” service, where you have to ship your scanner in for warranty service. Others will provide onsite warranty service for a specified period of time. The length of the warranty also varies, anywhere from 30 days to a full year. Scanner service is a separate purchase, and in some cases can be a shock to the purchaser. A basic service plan on a mid-range scanner can cost over $1000 per year. Get an advanced plan that provides Preventative Maintenance visits, and you could be in the $1500 – $2000 range, depending on your model. Get all the details up front; some manufacturers will provide multi-year discounts on service.

Image Processing

Definitely investigate the image processing software that comes bundled with your scanner.  This software will improve the quality of your images, remove shading, borders, etc.  Many manufacturers now bundle third-party image processing software (Kofax VRS), while several build their own into their drivers.  Most capture software also has built-in image processing components.

So hopefully this answers the majority of your questions on hardware.  Remember, hardware is just part of the overall capture solution.  Follow-on articles will cover software selection and required features.


SharePoint Scanning Planning – Part 2 – Separation

Document Examination and Separation

One of the key steps in preparing for document scanning and capture is to identify how you will separate or split documents.  What is separation and how does it work?  Details below:

For those of you that are new to document management and capture, document separation is the notion of how we can determine when a document begins and ends.  With most simple scanning software, this process is easy.  You load a single document in the feeder, click scan, and when it is done, you name it and save it.  With advanced capture, you can load multiple documents into the feeder, scan them all at once, and use a separation method to split them into individual digital documents.    This is a massive time saver.  Imagine loading 20 individual documents into a scanner one at a time, scanning each individually, and then entering information about each.   Below are some key separation methods any advanced capture suite should have:

Fixed Page Count Separation – This allows you to split based on a certain page count.  So if you scan a 100-page stack of two-page forms, you will have 50 separate documents in your capture interface.
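Fixed page count separation is simple chunking. A minimal sketch, where pages are just placeholder labels rather than real images; the choice to reject a short final document is an assumption, not something every capture product does.

```python
def split_fixed(pages: list, pages_per_doc: int) -> list[list]:
    """Split a scanned batch into documents of exactly pages_per_doc pages."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    docs = [pages[i:i + pages_per_doc] for i in range(0, len(pages), pages_per_doc)]
    # A short final document usually means a misfeed or a missing page, so
    # flag it instead of silently accepting it.
    if docs and len(docs[-1]) != pages_per_doc:
        raise ValueError(f"batch of {len(pages)} pages is not a multiple of {pages_per_doc}")
    return docs


batch = [f"page{n}" for n in range(1, 101)]  # a 100-page stack of two-page forms
documents = split_fixed(batch, 2)
print(len(documents), documents[0])          # 50 ['page1', 'page2']
```

The weakness of this method is visible in the check: one missing or double-fed page shifts every document after it, which is why it suits only uniform forms.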

Barcode Separation – probably the most pervasive separation method is a barcode separator.  Place a sheet with a specific barcode pattern between each document, and you are off to the races.  To give you the most flexibility, applications should support the following enhanced barcode separation methods:

  • Separate on any barcode
  • Separate on specific barcode terms and patterns
  • Separate on barcode type
  • Separate on barcode count
  • Separate on a certain number of barcodes on a page
  • Separate when a barcode changes
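Two of the barcode methods listed above are sketched below: separating on dedicated separator sheets (any barcode, or only barcodes matching a pattern) and separating when a barcode value changes. Barcode decoding is assumed to have already happened; each page is a hypothetical dict of an ID and its decoded barcode values.

```python
import re
from typing import Optional

# Each page is a hypothetical dict produced by a barcode engine:
# {"id": <page id>, "barcodes": [<decoded values>]}.


def separate_on_separator_sheets(pages: list[dict], pattern: Optional[str] = None) -> list[list[str]]:
    """Start a new document at each separator sheet and drop the sheet itself.

    pattern=None means "separate on any barcode"; otherwise a barcode value
    must fully match the regular expression to count as a separator.
    """
    regex = re.compile(pattern) if pattern else None
    docs: list[list[str]] = []
    current: list[str] = []
    for page in pages:
        is_separator = any(regex is None or regex.fullmatch(code) for code in page["barcodes"])
        if is_separator:
            if current:
                docs.append(current)
            current = []
            continue  # separator sheets are not part of any document
        current.append(page["id"])
    if current:
        docs.append(current)
    return docs


def separate_on_barcode_change(pages: list[dict]) -> list[list[str]]:
    """Keep pages together until a different barcode value appears.

    Here the barcode is printed on the content pages themselves, so every
    page is kept; pages without a barcode stay with the current document.
    """
    docs: list[list[str]] = []
    current: list[str] = []
    last: Optional[str] = None
    for page in pages:
        code = page["barcodes"][0] if page["barcodes"] else None
        if code is not None and code != last:
            if current:
                docs.append(current)
            current = []
            last = code
        current.append(page["id"])
    if current:
        docs.append(current)
    return docs


batch = [
    {"id": "sep1", "barcodes": ["SEP-001"]},
    {"id": "p1", "barcodes": []},
    {"id": "p2", "barcodes": []},
    {"id": "sep2", "barcodes": ["SEP-002"]},
    {"id": "p3", "barcodes": ["INV12345"]},
]
print(separate_on_separator_sheets(batch, r"SEP-\d{3}"))  # [['p1', 'p2'], ['p3']]
print(separate_on_separator_sheets(batch))                # [['p1', 'p2']]
```

The second call shows why pattern-based separation matters: with "any barcode", the invoice's own barcode on `p3` is mistaken for a separator and the page is dropped.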

You want to make sure your barcode engine supports 1D and 2D barcodes without the purchase of any expensive modules or add-ons, and it should also have a simple feature that lets you split the data in 2D barcodes into fields and identify separation terms.

Patch Code Separation – So what the heck is a patch code?  Just an old school horizontal barcode.  Below is an example.  If you work in the medical field, most medical billing forms will have these on them, and some scanners actually support using patch codes to shift scanner settings during the scanning process.  For flexibility, choose an application that supports patch code separation.

Scanning Patch Code

Patch Code Example

Optical Character Recognition (OCR) Separation – OCR is the process of converting a scanned or imported image into searchable text.  OCR separation searches for a key word, term or phrase on the document, and will recognize that page as the first page in a new document.  This is a preferred method, as you don’t have to kill trees to print cover sheets, and it makes document preparation simple (no inserting separator sheets).  For example, if you are scanning contracts and want to split whenever you find an 8 digit contract number in the right-hand corner, this comes in very handy.  Several capabilities are absolutely required in your application to get high separation accuracy:

  • Scan at 300 DPI and use an app that has image processing software to clean up the page.  Also, your image processing engine must allow processing of imported PDFs and TIFFs if you plan to harvest documents.  Some image correction/processing engines only work with scanners.
  • Ensure your capture application allows you to use expression matching (regular expressions) so you have the utmost flexibility in finding separation patterns.
  • Character sets are key.  These let you tell the OCR engine the type of characters you are looking for (A-Z, 0-9, etc.), so if it misidentifies a character, it auto-corrects the information.
  • Finally, top line applications also allow you to separate when OCR terms change.  So you can look for that contract number, and only split when you find a new one.
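The three OCR requirements above (regular expressions, character sets, and splitting only when the term changes) fit together in a few lines. This sketch works on OCR text that is assumed to already exist; the look-alike mapping and the "at least 6 real digits" threshold are hypothetical stand-ins for a product's character set feature.

```python
import re
from typing import Optional

# Hypothetical "digits only" character set: letters OCR commonly confuses
# with digits are mapped back to the digit they most likely represent.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "D": "0", "I": "1", "l": "1",
                             "S": "5", "B": "8", "Z": "2"})

# Assumed contract number layout: an 8-character token of digits or look-alikes.
CANDIDATE = re.compile(r"\b[0-9OoDIlSBZ]{8}\b")


def find_contract_number(page_text: str) -> Optional[str]:
    """Return the first 8-digit contract number on a page, corrected via the character set."""
    for token in CANDIDATE.findall(page_text):
        # Hypothetical threshold: require mostly real digits so ordinary
        # words made of look-alike letters are not mistaken for numbers.
        if sum(ch.isdigit() for ch in token) >= 6:
            return token.translate(DIGIT_FIXES)
    return None


def split_on_contract_change(page_texts: list[str]) -> list[list[int]]:
    """Start a new document only when a different contract number appears."""
    docs: list[list[int]] = []
    current: list[int] = []
    last: Optional[str] = None
    for index, text in enumerate(page_texts):
        number = find_contract_number(text)
        if number is not None and number != last:
            if current:
                docs.append(current)
            current = []
            last = number
        current.append(index)
    if current:
        docs.append(current)
    return docs


pages = [
    "Contract No. 1234567O  Page 1",  # OCR misread the final 0 as O
    "Terms and conditions continue",
    "Contract No. 12345670  Page 3",  # same contract after correction, so no split
    "Contract No. 87654321  Page 1",
]
print(split_on_contract_change(pages))  # [[0, 1, 2], [3]]
```

Without the character set, page 0 and page 2 would carry different strings and the first contract would be wrongly split in two; that is the accuracy gain the bullets above describe.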

Intelligent Character Recognition (ICR) Separation – ICR is the process of converting scanned images of hand printing to text.  This method can be used to split pages when certain patterns in hand printing are detected.  Note:  all of the features required to ensure accuracy for OCR separation should also be considered if you use this method.

Document Import and Separation – There are several separation methods that can be key to success if you need to import large volumes of documents, or you want to process documents scanned from copiers, network scanners, or fax machines.  Below are several separation methods required for any document capture from imported files:

  • New File Separation – This method of separation will look at a directory, pick up files, and maintain each new file as its own digital document.
  • Folder-based separation – This is a key method if you are importing documents and want to combine them based on the folder.  One example might be a law firm that has a folder structure of case documents on different subjects for the case and wants to combine each folder into a single PDF file.
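Both import methods above come down to how files on disk are grouped. A minimal sketch using Python's `pathlib`; the folder and file names are made up, and merging each group into a single PDF is left out because it would need a third-party PDF library.

```python
import tempfile
from pathlib import Path


def new_file_separation(inbox: Path) -> list[list[Path]]:
    """Each file dropped into the inbox folder becomes its own document."""
    return [[f] for f in sorted(inbox.iterdir()) if f.is_file()]


def folder_based_separation(root: Path) -> dict[str, list[Path]]:
    """Group all files under each immediate subfolder of root into one document."""
    groups: dict[str, list[Path]] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(f for f in folder.rglob("*") if f.is_file())
        if files:
            groups[folder.name] = files
    return groups


# Demo on a throwaway folder tree shaped like the law firm example (names are made up).
with tempfile.TemporaryDirectory() as tmp:
    case = Path(tmp)
    for rel in ["Pleadings/complaint.pdf", "Pleadings/answer.pdf", "Discovery/interrogatories.tif"]:
        target = case / rel
        target.parent.mkdir(exist_ok=True)
        target.write_bytes(b"")
    singles = [[f.name for f in doc] for doc in new_file_separation(case / "Pleadings")]
    merged = {name: [f.name for f in files] for name, files in folder_based_separation(case).items()}

print(singles)  # [['answer.pdf'], ['complaint.pdf']]
print(merged)   # {'Discovery': ['interrogatories.tif'], 'Pleadings': ['answer.pdf', 'complaint.pdf']}
```

Sorting by name keeps page order predictable; a real import job might sort by file timestamp instead, which is a judgment call that depends on how the files were produced.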

Blank Page Separation – I only mention this because I would always, always avoid it unless absolutely necessary, especially if you are scanning in duplex, where the blank back side of a page can be mistaken for a separator.  Unless documents are strictly prepared by knowledgeable operators, most implementations of this method become an absolute mess. (Just my humble opinion ;)   )

Separation Scripting – Finally, for those rare and special occasions, you always want a product that has a pre-built scripting interface for customizing the whole process if necessary.  Now let me be clear: I don’t mean a sales rep saying “Yeah, we can do that” (which usually means $20,000 in professional services), but a product that has simple hooks into the separation function, returning a simple “yes or no” based on some parameter or criteria, which anyone with basic scripting skills can write.  When would you use something like this?  Usually for very complex jobs where the original documents cannot be modified, but you need to put some logic in place to split documents.
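The “yes or no” hook described above is easy to picture as a callback. In this sketch, `run_separation` and the `(page index, page text)` hook signature are hypothetical, standing in for whatever interface a given capture product exposes.

```python
from typing import Callable

# Hypothetical hook signature: called for every page after the first and
# answers one question, "does a new document start here?".
SeparationHook = Callable[[int, str], bool]


def run_separation(page_texts: list[str], hook: SeparationHook) -> list[list[int]]:
    """Group page indexes into documents using a user-supplied yes/no hook."""
    docs: list[list[int]] = []
    for index, text in enumerate(page_texts):
        # The first page always opens a document; afterwards the hook decides.
        if not docs or hook(index, text):
            docs.append([])
        docs[-1].append(index)
    return docs


def invoice_hook(index: int, text: str) -> bool:
    """Example rule: a page whose text begins with 'INVOICE' starts a new document."""
    return text.lstrip().upper().startswith("INVOICE")


pages = ["INVOICE 1001", "page 2 of 1001", "Invoice 1002", "INVOICE 1003", "page 2 of 1003"]
print(run_separation(pages, invoice_hook))  # [[0, 1], [2], [3, 4]]
```

Keeping the hook to a single boolean answer is what makes it writable by someone with basic scripting skills: the product owns the batching, and the script owns only the rule.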

The last separation topic I want to cover is something called triggered separation.  Let me set the stage on this one, and describe a process which is near and dear to every accounting manager’s heart: invoices.  So you have a stack of invoices, some single page, some multi-page, and you are faced with a dilemma.  If I use barcode separators and I have 100 single page invoices, do I really have to put 100 barcode separators between them all?  Separation triggers allow you to scan single page and multi-page documents all together.  In this example, you can stack your singles, and then put separators between your variable-length documents.  Put a trigger sheet between the two stacks (this tells the capture software to switch from single page separation to barcode-based separation), and scan the whole stack in one fell swoop.  This is a huge time saver in high volume environments, and can also allow you to build redundant separation logic, so you get the highest accuracy in separation with the least amount of document preparation.  Phewwww.  That was geeky.
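The invoice scenario above is a small state machine: one-page-per-document mode until the trigger sheet, then separator mode. In this sketch, sheets are assumed to be already classified by the capture engine as hypothetical `("page", id)`, `("trigger", "")` or `("separator", "")` pairs.

```python
# Each sheet is a hypothetical (kind, page_id) pair already classified by the
# capture engine: "page", "trigger" (switch modes) or "separator" (barcode sheet).
Sheet = tuple[str, str]


def triggered_separation(sheets: list[Sheet]) -> list[list[str]]:
    """Single-page separation until a trigger sheet, then barcode-separator mode."""
    docs: list[list[str]] = []
    current: list[str] = []
    mode = "single"
    for kind, page_id in sheets:
        if kind == "trigger":
            mode = "barcode"          # everything after this uses separator sheets
            continue
        if kind == "separator":
            if current:
                docs.append(current)  # close the variable-length document
            current = []
            continue
        if mode == "single":
            docs.append([page_id])    # each single-page invoice is its own document
        else:
            current.append(page_id)
    if current:
        docs.append(current)
    return docs


stack = [
    ("page", "inv1"), ("page", "inv2"), ("page", "inv3"),
    ("trigger", ""),
    ("page", "inv4-p1"), ("page", "inv4-p2"),
    ("separator", ""),
    ("page", "inv5-p1"), ("page", "inv5-p2"), ("page", "inv5-p3"),
]
print(triggered_separation(stack))
# [['inv1'], ['inv2'], ['inv3'], ['inv4-p1', 'inv4-p2'], ['inv5-p1', 'inv5-p2', 'inv5-p3']]
```

In this sketch the trigger sheet also opens the first multi-page document, so no separator is needed directly after it; whether a real product behaves the same way is worth checking.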

Do you really need all of this?  Does separation have to be that complex?  The whole goal here is to have as much as you possibly can in the tool kit to ensure you can meet all the capture needs within your organization.  I liken it to buying the base model of a car with no accessories, and then wishing every day you had power windows, the iPod Kit, cruise control, 4WD, etc.

So now you have examined your documents, and figured out how to efficiently scan and split.