5 Keys to a Successful SharePoint Scanning Project

Below are 5 primary keys to implementing a successful SharePoint scanning / imaging project:

1.  Make sure you do some in-depth storage planning.

When imaging to SharePoint or Office 365, you need to plan not only for storage requirements but also for the load on your network.  Scanning, if done incorrectly, can create a huge burden on your network and bloat your content databases.  More info here: SharePoint Scanning Storage Planning

2.  Leverage existing scanning devices for the pilot project.

Giving users a familiar interface will go miles towards acceptance.  Make it easy, and leverage copiers or other scanners already within the organization so the transition to paperless workflows feels familiar.  More on scanning hardware here: SharePoint Scanning Hardware

3.  Involve end users in SharePoint design.

I have seen so many projects where IT just builds what they think users want.  Make the layout of the site a collaborative effort, and build your site and library structures accordingly.  Map paper documents to digital, and leverage content types and managed metadata.  Finally, remember that capture drives search: make sure appropriate columns are put in place so users can find, sort and create views simply and easily.

4.  Leverage folders for quick adoption.

Here we go, the old folder argument.  Along with creating a familiar environment, users love folders, and they give quite a bit of power in the SharePoint world.    Adding them costs nothing, and they can be turned off for users who don’t want them.  Use folders.

5.  Automation is key, and necessary for standardization.

Make sure you utilize a scanning application that allows for a standardization rule set.  Site, library, content type, folder, file naming and terms should all be controllable and set automatically.  Automation makes standardization easy and totally transparent, giving you a repeatable, consistent scanning and capture process.

New SharePoint 2010 Limitations on Storage

Holy cow…4TB on content DBs.  Below is quoted from TechNet:

Content databases of up to 4 TB are supported when the following requirements are met:

  • Disk sub-system performance of 0.25 IOPs per GB. 2 IOPs per GB is recommended for optimal performance.
  • You must have developed plans for high availability, disaster recovery, future capacity, and performance testing.

You should also carefully consider the following factors:

  • Requirements for backup and restore may not be met by the native SharePoint Server 2010 backup for content databases larger than 200 GB. It is recommended to evaluate and test SharePoint Server 2010 backup and alternative backup solutions to determine the best solution for your specific environment.
  • It is strongly recommended to have proactive skilled administrator management of the SharePoint Server 2010 and SQL Server installations.
  • The complexity of customizations and configurations on SharePoint Server 2010 may necessitate refactoring (or splitting) of data into multiple content databases. Seek advice from a skilled professional architect and perform testing to determine the optimum content database size for your implementation. Examples of complexity may include custom code deployments, use of more than 20 columns in property promotion, or features listed as not to be used in the over 4 TB section below.
  • Refactoring of site collections allows for scale out of a SharePoint Server 2010 implementation across multiple content databases. This permits SharePoint Server 2010 implementations to scale indefinitely. This refactoring will be easier and faster when content databases are less than 200 GB.
  • It is suggested that for ease of backup and restore that individual site collections within a content database be limited to 100 GB. For more information, see Site collection limits.

For more information on SharePoint Server 2010 data size planning, see Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).
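
As a quick sanity check on the quoted figures, here is a minimal Python sketch that turns the IOPs-per-GB guidance and the 200 GB and 100 GB thresholds into numbers for a given content database.  The function names are made up for illustration, and the 1 TB = 1024 GB conversion is an assumption, not something TechNet specifies in the quote.

```python
# Figures taken from the TechNet quote above (SharePoint Server 2010).
MIN_IOPS_PER_GB = 0.25
RECOMMENDED_IOPS_PER_GB = 2.0
BACKUP_REVIEW_GB = 200            # native backup may not be enough above this
SITE_COLLECTION_SUGGESTED_GB = 100
MAX_SUPPORTED_GB = 4 * 1024       # assumption: 1 TB = 1024 GB


def iops_targets(content_db_gb):
    """Return (minimum, recommended) disk IOPs for a content database size in GB."""
    return (content_db_gb * MIN_IOPS_PER_GB,
            content_db_gb * RECOMMENDED_IOPS_PER_GB)


def planning_notes(content_db_gb, largest_site_collection_gb):
    """List which of the quoted thresholds a planned database crosses."""
    notes = []
    if content_db_gb > MAX_SUPPORTED_GB:
        notes.append("content database is over the 4 TB figure")
    if content_db_gb > BACKUP_REVIEW_GB:
        notes.append("over 200 GB: test native backup against alternatives")
    if largest_site_collection_gb > SITE_COLLECTION_SUGGESTED_GB:
        notes.append("site collection is over the suggested 100 GB")
    return notes


if __name__ == "__main__":
    minimum, recommended = iops_targets(1024)
    print(f"1 TB content DB: {minimum:.0f} IOPs minimum, {recommended:.0f} recommended")
    for note in planning_notes(1024, 150):
        print("-", note)
```

A full 4 TB database works out to 1,024 IOPs at the minimum rate and 8,192 at the recommended rate under that conversion, which is worth checking against your disk sub-system before the scanners are switched on.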

SharePoint Scanning Planning – Part 1 – Storage and Sizing

With SharePoint Scanning and Capture, as with any project, planning is essential to success.  If you are going to use scanning software to send scanned images to a SharePoint Content Database, you need to lay some groundwork.  This is the first in a series of planning articles.

One of the key areas of planning for any scanning/capture implementation is sizing and storage.  Many of the customers we work with have no real grasp of the volume of paper they deal with on a day-to-day basis, and when they start digitizing their paper, they are often quite surprised at how much of it they push through the system.  Obviously, this can cause serious issues on many different fronts.  So how do you estimate the amount of paper?  There are several key conversion factors used by the document management industry, as outlined below:

 

Description                              Number of Pages              Storage
1 Scanned Page – 8.5 × 11                1                            50 KB
1 Scanned Page – 11 × 17                 1                            100 KB
1 File Cabinet – 4 drawers               10,000                       500 MB
1 Box                                    2,500                        125 MB
1 Linear Inch                            100                          5 MB
1 E Size Engineering Drawing (48 × 36)   16 (8.5 × 11 equivalents)    800 KB

This table is a basic planning tool, and can be used as a starting point.  One thing to remember is that these are all standard pages: full text pages, not full-image magazine pages.  The other thing to keep in mind is that for boxes and file cabinets we have listed the average number of pages they contain.  In the imaging world, we deal with images, not pages.  What is the difference?  A page may have 2 sides, which are converted digitally into 2 images.  So effectively, if the box you are scanning holds double-sided pages, you will have to double the storage required.
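
The conversion factors above are easy to turn into a quick calculator.  Below is a minimal Python sketch, not a finished tool: the function name and the duplex_fraction parameter are made up for illustration, the file cabinet is taken as 10,000 pages (which is what 500 MB at 50 KB per page works out to), and 1 MB is treated as 1,000 KB to match the table.

```python
# Pages per unit of paper, from the conversion table above.
PAGES_PER_UNIT = {
    "file_cabinet": 10_000,  # 4-drawer cabinet; 500 MB / 50 KB per page
    "box": 2_500,
    "linear_inch": 100,
}
KB_PER_IMAGE = 50  # one scanned 8.5 x 11 image


def estimate_storage_mb(counts, duplex_fraction=0.0):
    """Estimate storage in MB for a paper inventory.

    counts maps a unit name to a quantity, e.g. {"box": 40}.
    duplex_fraction is the share of pages printed on both sides;
    each of those pages becomes two images instead of one.
    """
    pages = sum(PAGES_PER_UNIT[unit] * qty for unit, qty in counts.items())
    images = pages * (1 + duplex_fraction)
    return images * KB_PER_IMAGE / 1000  # the table uses 1 MB = 1,000 KB


if __name__ == "__main__":
    inventory = {"file_cabinet": 2, "box": 10}
    print(estimate_storage_mb(inventory))                       # single-sided
    print(estimate_storage_mb(inventory, duplex_fraction=1.0))  # all double-sided
```

Setting duplex_fraction to 1.0 reproduces the "double the storage" rule for a box of fully double-sided pages; anything in between covers a mixed box.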

Some other key factors that can contribute to storage and sizing:

DPI Setting – one of the key questions we always receive is “What DPI should I set on my scanner?”  For most basic scanning and archive applications, you can set your scanner to 200 DPI.  If you are doing OCR or any type of advanced data extraction, you always want a 300 DPI image for maximum accuracy.  Anything beyond that is just a space killer: it will slow down your process and really bloat your files.

Black and White, Greyscale and Color – always use black and white scanning to keep file sizes at an absolute minimum.  Greyscale and color scanning should only be used when absolutely necessary, as file sizes are just crazy: roughly 8 to 15 times larger than black and white at the same DPI.  Below is a table of file sizes for the same letter, which had about 50% page coverage.

 

Scanning Mode / DPI           File Size
Black and White – 200 DPI     26 KB
Black and White – 300 DPI     38 KB
Black and White – 400 DPI     51 KB
Black and White – 600 DPI     80 KB
Greyscale – 300 DPI           301 KB
Color – 300 DPI               577 KB
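
To see how much each setting costs relative to the others, the sizes in the table can be compared against a baseline.  The Python sketch below is illustrative only: the sizes are the single sample letter from the table, so the ratios are rough guides rather than guarantees, and the names are made up for this example.

```python
# File sizes in KB for the same sample letter, from the table above.
SAMPLE_SIZES_KB = {
    ("black_and_white", 200): 26,
    ("black_and_white", 300): 38,
    ("black_and_white", 400): 51,
    ("black_and_white", 600): 80,
    ("greyscale", 300): 301,
    ("color", 300): 577,
}


def size_ratio(mode, dpi, baseline=("black_and_white", 200)):
    """How many times larger a (mode, dpi) scan is than the baseline scan."""
    return SAMPLE_SIZES_KB[(mode, dpi)] / SAMPLE_SIZES_KB[baseline]


if __name__ == "__main__":
    for (mode, dpi) in SAMPLE_SIZES_KB:
        print(f"{mode} at {dpi} DPI: {size_ratio(mode, dpi):.1f}x the baseline")
```

Multiplying a black and white storage estimate by the relevant ratio gives a first guess at what the same volume would need in greyscale or color.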

Image Processing – image cleanup can significantly reduce file sizes, and it is very important to use this feature whenever you can.  Despeckle, deshade, border removal, etc. will eliminate unnecessary noise in scanned images, and reduce your storage requirement by 10-30% depending on the quality of your documents.

Image Format – There is a lot of misinformation on the market about TIFF versus PDF.  I always hear “We want to store as TIFF because PDFs are just too big.”  That is just not the case.  An image scanned to PDF is just a TIFF in PDF clothing (or in a PDF wrapper, to be more exact), and the PDF overhead is almost negligible.  The de facto standard in imaging today is rapidly becoming the PDF image with hidden text.  This gives you a nice little file with the pristine image and the converted OCR text in the background.  The text layer adds negligible size to the file.

So now, with all this info, you can estimate volume in images, and then come up with required storage on a monthly, yearly or project basis.
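
Putting it together, here is a minimal Python sketch of a monthly and yearly projection.  The function and parameter names are hypothetical; the 50 KB default comes from the conversion table, the cleanup saving should be picked from the 10-30% range mentioned under Image Processing, and decimal units (1 MB = 1,000 KB, 1 GB = 1,000 MB) are assumed to match the table.

```python
def project_storage(images_per_month, kb_per_image=50, cleanup_savings=0.0):
    """Project storage for a steady scanning volume.

    images_per_month: images (not pages) scanned each month.
    kb_per_image: average image size in KB (50 KB for a standard page).
    cleanup_savings: fraction saved by image cleanup, e.g. 0.2 for 20%.
    """
    if not 0.0 <= cleanup_savings < 1.0:
        raise ValueError("cleanup_savings must be between 0 and 1")
    monthly_mb = images_per_month * kb_per_image * (1 - cleanup_savings) / 1000
    yearly_gb = monthly_mb * 12 / 1000
    return {"monthly_mb": monthly_mb, "yearly_gb": yearly_gb}


if __name__ == "__main__":
    # Example: 100,000 images a month, with and without a 20% cleanup saving.
    print(project_storage(100_000))
    print(project_storage(100_000, cleanup_savings=0.2))
```

Run it with your own monthly image count, then compare the yearly figure against the content database limits you are planning around.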