Poll Results: What type of scanners are companies using to scan to SharePoint?

I ran a poll for about three months to see what types of devices people are using to scan to Microsoft SharePoint.  The results are in tune with what I see in the field: folks are using a distributed scanning model with SharePoint to put scanning into the hands of knowledge workers.  Below are the results:

The Two Most Popular Features for Scanning to SharePoint

When I look at all the customers I have worked with and examine the feature sets most often applied in a SharePoint environment, two stand out: Routing Sheets and Advanced Data Extraction.  I would say 90% of my customers use these features in some way to make the process automated and efficient.  So, what are they?  How do they work?  Both are outlined below:

Routing Sheets

I have mentioned these quite a bit on the blog, and they lend themselves nicely to distributed scanning from MFPs/copiers, fax machines, network scanners, etc.  A routing sheet is a combination of barcodes and/or checkboxes that allows end users to index a document prior to the scan.  The information can then be translated into metadata.  Reading the checkboxes requires Optical Mark Recognition, or OMR, so make sure your scanning product supports it.  Below are some samples:

Legal Routing Sheet

HR Routing Sheet
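
To make the "translated into metadata" step concrete, here is a minimal sketch.  It assumes the capture product has already read the barcode and detected which checkboxes were marked; the checkbox labels, column names and the `routing_sheet_to_metadata` function are all hypothetical and not taken from any particular product.

```python
# Hypothetical layout for an HR routing sheet: each checkbox label maps to
# the metadata column and value it should set.
CHECKBOX_MAP = {
    "Resume": ("DocumentType", "Resume"),
    "Review": ("DocumentType", "Performance Review"),
    "Confidential": ("Confidential", "Yes"),
}


def routing_sheet_to_metadata(barcode_value: str, checked_boxes: list[str]) -> dict[str, str]:
    """Turn the recognition results from one routing sheet into metadata columns."""
    # Assumption: the barcode on this sheet carries the employee ID.
    metadata = {"EmployeeID": barcode_value}
    for label in checked_boxes:
        column, value = CHECKBOX_MAP[label]
        metadata[column] = value
    return metadata


# Example: barcode read as "E10482", two boxes ticked by the user.
print(routing_sheet_to_metadata("E10482", ["Review", "Confidential"]))
```

The point is that the user indexes the document with a pen before scanning, and the software only has to look up what each mark means.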

Advanced Data Extraction (ADE)

Many of the solutions out there today support what is called Zoning, or the ability to pick information from a specific area on a page and enter it as metadata.  ADE takes that to a whole new level and provides the ability to match patterns and extract information wherever it appears.  So if a customer needs an order number that is 6 digits and always starts with a 7, the extraction engine can search the whole page and extract it.  This is a huge time saver, and allows the utmost in automation and verification of data.
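
The order number example can be sketched with a regular expression.  This is only an illustration of the pattern-matching idea, assuming the page has already been run through OCR; the sample text and the `extract_order_numbers` function are made up for the example, and a real extraction engine will do considerably more.

```python
import re

# Rule from the example above: a 6-digit number that starts with 7.
# The lookarounds keep the pattern from matching inside a longer run of digits.
ORDER_NUMBER = re.compile(r"(?<!\d)7\d{5}(?!\d)")


def extract_order_numbers(page_text: str) -> list[str]:
    """Return every order-number candidate found anywhere in the OCR text."""
    return ORDER_NUMBER.findall(page_text)


# Hypothetical OCR output for one scanned page.
sample_page = """
ACME Supply Co.        Invoice 1048213
Order No: 712345       Date: 03/14
Phone: 555-701-2345    Total: $7,123.45
"""

print(extract_order_numbers(sample_page))  # ['712345']
```

Notice that nothing in the rule says where on the page the number lives, which is the difference from Zoning.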

 

What is “Bridging the Gap”?

The movement towards an office with less paper and more efficiency can be quite difficult, and with the wrong tools it can end in failure.  The key challenge is a process I call “Bridging the Gap”, which uses several applications to create a bridge between the physical and digital worlds and helps create a seamless process.  So what is required?  How do you create the bridge?

 

On one side of the gap, you have your physical environment: file cabinets, inboxes, stacks of folders on desks, etc.  There are two components that facilitate the crossing:

  • Scanning Hardware – scanners allow the conversion of paper documents into digital documents or images.  Organizations can use scanning copiers, fax machines or dedicated scanners to digitize.
  • Capture Software – capture software works with the scanning hardware to create an efficient and automated bridging process.  It controls the flow of digitized documents, standardizes how they are routed, and uses OCR, barcodes, Advanced Data Extraction (ADE) and other features to automate the collection of information.  It spans the gap and creates a connection to the other side, the repository.

Once the gap has been spanned, the documents need to land somewhere, just as physical documents land in a file cabinet, an inbox on someone's desk or another location in the organization.  Below are the two components that exist on the far side of the gap:


  • Workflow Software – think of this as the digital inbox and outbox…on steroids.  Workflow software is used to create a digital mirror of your physical processes.  It can move files around, create approval steps, send email automatically and perform logic that usually requires human intervention.  Some organizations don't have this component on the other side of the gap.
  • Repository – think of the repository as both a temporary and a permanent file cabinet: it can hold files during a workflow process, or keep an archive copy once the whole process is complete.  You can search, sort and organize, print, distribute and copy.  Most repositories allow full-text search, provided the capture software has created a searchable file format, and also allow column-based searching for specific criteria.

I have seen many organizations try to bridge the gap while missing one of the pieces above, or with a piece that cannot meet all their needs.  A missing component can impact the overall value of the system.  For example, take a scanning copier that an AP department uses to scan invoices.  They email themselves the scans, open them, rename them and then save them into their repository.  Without capture software to automate the naming and routing, this is a highly inefficient process.  Without capture, files are also not made searchable through OCR, which reduces efficiency during search.  Another example might be a repository that cannot provide all the bits and pieces an organization requires.  Take the organization that just saves PDFs to a network directory.  This may be fine for many organizations that merely need a simple archive to house their files.  But what about an audit event, or a legal issue that may require extensive searching and sorting?


“Bridging the Gap” and creating an office with less paper can provide an organization countless benefits, given proper planning and design and the inclusion of all the above components.

Imaging File Size Comparison for Planning – Color and DPI

When planning for scanning to SharePoint, here is a quick matrix for the impact DPI and color can have on file size, and the size of your content DBs.

Scanning Mode / DPI            File Size
Black and White – 200 DPI      26 KB
Black and White – 300 DPI      38 KB
Black and White – 400 DPI      51 KB
Black and White – 600 DPI      80 KB
Grayscale – 300 DPI            301 KB
Color – 300 DPI                577 KB
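
For planning, the matrix can be turned into a rough yearly storage estimate.  The sketch below assumes each figure is the size of a single scanned page in kilobytes, and the workload (2,000 pages per day, 250 working days per year) is a made-up example; plug in your own volumes.

```python
# Sizes in KB, copied from the matrix above (assumed to be per page).
PAGE_SIZE_KB = {
    ("bw", 200): 26,
    ("bw", 300): 38,
    ("bw", 400): 51,
    ("bw", 600): 80,
    ("grayscale", 300): 301,
    ("color", 300): 577,
}


def yearly_storage_gb(mode: str, dpi: int, pages_per_day: int, work_days: int = 250) -> float:
    """Estimate one year of scanned content in GB (1 GB = 1024 * 1024 KB)."""
    total_kb = PAGE_SIZE_KB[(mode, dpi)] * pages_per_day * work_days
    return total_kb / (1024 * 1024)


# Hypothetical workload: 2,000 pages per day.
for mode, dpi in PAGE_SIZE_KB:
    print(f"{mode:>9} @ {dpi} DPI: {yearly_storage_gb(mode, dpi, 2000):7.1f} GB/year")
```

Under these assumptions, color at 300 DPI needs roughly 15 times the storage of black and white at the same resolution, and that is before SharePoint versioning or other overhead is counted.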

5 Tips for Optimizing Image Size When Scanning to SharePoint

I find quite a few customers are not optimizing their scanning process, and as a result are creating very large image files, slamming their network and bloating their content databases.  Below are 5 tips to live by when scanning to SharePoint:

  1. Scanning at anything greater than 300 DPI is unnecessary.  DPI can be a huge killer, and can really bloat your file size.  For most instances, 200 DPI is perfectly fine for archive purposes.  If you are using OCR or performing data extraction, 300 DPI will give you a great quality image.  Anything beyond that will give you no better quality, but increases the file size substantially.
  2. Use color and grayscale sparingly.  Color and grayscale files can be massive, and can be a huge burden on many different aspects of any SharePoint system.  Use them only when absolutely necessary, as black and white images are perfectly acceptable in almost every instance.
  3. Image processing is key.  Having an image processing engine that can despeckle, deshade and remove black borders will reduce file size and conserve storage.
  4. Check your copiers.  Most copiers today like to show off their fancy color capabilities and typically come with default settings that create color scans.  Check the DPI and color settings to make sure your users are not unknowingly creating massive files.
  5. TIFF or PDF?  This could be a whole additional conversation, and possibly the next post.  There really is no difference in file size for the same scanned image, and I find PDF is becoming the de facto standard in imaging.
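
As a rough illustration of tips 1, 2 and 4, the check below reviews a device's default scan settings against those guidelines.  The `ScanProfile` structure and its field names are hypothetical; how you actually read the defaults off a copier depends on the device.

```python
from dataclasses import dataclass


@dataclass
class ScanProfile:
    """Hypothetical description of a device's default scan settings."""
    name: str
    dpi: int
    color_mode: str          # "bw", "grayscale", or "color"
    uses_ocr: bool = False   # True if the scans feed OCR or data extraction


def review_profile(profile: ScanProfile) -> list[str]:
    """Return warnings for settings that conflict with the tips above."""
    warnings = []
    if profile.dpi > 300:
        warnings.append(f"{profile.name}: {profile.dpi} DPI is above 300; bigger files, no better quality")
    elif profile.dpi > 200 and not profile.uses_ocr:
        warnings.append(f"{profile.name}: 200 DPI is enough for archive-only scans")
    if profile.color_mode != "bw":
        warnings.append(f"{profile.name}: {profile.color_mode} scans are far larger than black and white")
    return warnings


# Example: a copier left on its factory defaults.
for warning in review_profile(ScanProfile("Lobby copier", 600, "color")):
    print(warning)
```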

 

 

5 Keys to a Successful SharePoint Scanning Project

Below are 5 primary keys to implementing a successful SharePoint scanning / imaging project:

1.  Make sure you do some in-depth storage planning.

When imaging to SharePoint or Office 365, you need to plan not only for storage requirements, but also for the load on your network.  Scanning, if done incorrectly, can create a huge burden on your network and bloat your content databases.  More info here: SharePoint Scanning Storage Planning

2.  Leverage existing scanning devices for the pilot project.

Giving users a familiar interface will go miles towards acceptance.  Make it easy, and leverage copiers or other scanners already within the organization to make the transition to paperless workflows familiar.  More on scanning hardware here: SharePoint Scanning Hardware

3.  Involve end users in SharePoint design.

I have seen so many projects where IT just builds what they think users want.  Make the layout of the site a collaborative effort, and build your site and library structures accordingly.  Map paper documents to digital, and leverage content types and managed metadata.  Finally, capture drives search, so make sure appropriate columns are put in place so users can find, sort and create views simply and easily.

4.  Leverage folders for quick adoption.

Here we go, the old folder argument.  Along with creating a familiar environment, users love folders, and they give quite a bit of power in the SharePoint world.  Adding them costs nothing, and they can be turned off for users who don’t want them.  Use folders.

5.  Automation is key, and necessary for standardization.

Make sure you utilize a scanning application that allows for a standardization rule set.  Site, library, content type, folder, file naming and terms should all be controllable and set automatically.  Automation makes standardization easy and totally transparent, giving you a repeatable, consistent scanning and capture process.
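
To show what such a rule set might look like, here is a minimal sketch.  The `CaptureRule` structure, the template syntax and the site and library names are all hypothetical; the code only builds the destination and file name from extracted metadata and does not call any SharePoint API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureRule:
    """Hypothetical rule: where a document type lands and how it is named."""
    site: str
    library: str
    content_type: str
    folder_template: str     # filled from metadata, e.g. "{vendor}/{year}"
    filename_template: str   # e.g. "{vendor}_{invoice_no}.pdf"


def apply_rule(rule: CaptureRule, metadata: dict[str, str]) -> dict[str, str]:
    """Build the destination path and content type for one scanned document."""
    folder = rule.folder_template.format(**metadata)
    filename = rule.filename_template.format(**metadata)
    return {
        "destination": f"{rule.site}/{rule.library}/{folder}/{filename}",
        "content_type": rule.content_type,
    }


# Example rule for AP invoices, with metadata as it might come out of capture.
invoice_rule = CaptureRule(
    site="/sites/finance",
    library="Invoices",
    content_type="Invoice",
    folder_template="{vendor}/{year}",
    filename_template="{vendor}_{invoice_no}.pdf",
)
print(apply_rule(invoice_rule, {"vendor": "Acme", "year": "2024", "invoice_no": "712345"}))
```

Because the rule, not the user, decides the name and location, every invoice ends up filed the same way no matter who scanned it.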

PDFs and SharePoint: What is recommended?

When scanning to SharePoint, capturing pre-existing images, and creating searchable PDFs, there are several things you should make sure you can enable in your capture software.  Below is a laundry list:

  1. PDF Image + Hidden Text is the preferred format.  Most scanning devices/applications will allow you to create PDFs, but note that these are image PDFs, and not searchable.  The de facto standard right now in the imaging industry is the PDF Image + Hidden Text format.  This requires a capable OCR engine to produce the text layer, and is what I call a “suitcase” document: it contains a pristine image, and a hidden text layer for search.
  2. Ensure your document capture software can import PDF files.  Just about every organization has pre-existing scanned PDF files.  In almost every case, these are purely PDF Image format, and cannot be searched, or crawled through the PDF iFilter in SharePoint.  If your capture application can import and process PDFs, you have the ability to harvest these documents, extract metadata, and OCR them to create searchable PDFs in PDF Image + Hidden Text format.
  3. Require the ability to create and populate custom PDF headers.  PDF headers allow custom metadata to be built into the core PDF file.  Why is this necessary?  Once again, I go back to the “suitcase” analogy: you always want to pack everything you need.  If you create a searchable PDF, and pack metadata into the headers, the file is now an all-inclusive data package.  Headers speed up search, and provide flexibility if you ever export files, or import your PDFs into another system.
  4. Require support for the latest standard.  PDF/A is the latest and greatest standard, and the goal of this ISO standard was to build a file format suitable for long-term archiving.  Ensure you can support this option.