Quality Assurance is critical in any SharePoint Scanning Project
Ah, QA. Is it really necessary? During a recent session with Microsoft, we dove deep into the scanning QA process and some best practices for ensuring a high-quality end product. This is a summary of the best practices we shared during our meeting, and the 10 critical steps we have found to be of the utmost importance in any SharePoint scanning operation:
- Have someone else check the work. Scanning can be a mundane, tiring task, and any knowledge worker processing more than 10 documents per day should have a downstream individual check both their images and data. A scanning best practice has always been to create a task-specific scanning “assembly line” where workers are responsible for specific tasks: capture, indexing and QA.
- Use automation whenever possible. Ah, the human race…fantastic, but prone to error. The technology now exists to automate data collection and data entry through a wide variety of means: Advanced Data Extraction (ADE), database lookups and many others. Using them can reduce data entry errors and make the process much more efficient.
- Use validation and exception processing. Most true capture applications can automatically examine data and make sure it meets set criteria. The most basic check is a required field left blank, but most of our customers use advanced validation that checks the pattern of entered data, runs database validation to ensure the data matches a corporate record, and uses custom scripting to enforce business rule sets.
- Always use expected count validation. Paper is paper is paper. Counting the pages, documents and folders being scanned, and entering those counts into a validation interface before scanning, ensures everything is scanned and accounted for. This enables a physical-to-digital validation, and prevents poorly stacked and prepped paper records from being left out of the process.
- Use scanning hardware that can help. Most scanning hardware today has double-feed detection to make sure pages are not stuck together as they go through the feeder. These vary from ultrasonic sensors that send a pulse through the paper to technologies that check the length of the image. Canon has some great technology in this arena and can save you the pain of finding out 3 years down the line that you don’t have the most important page.
- Use QA sampling to save time. Make sure you utilize auto-viewing and sampling to speed up the process. Most true capture applications will allow you to run a process that samples every nth image or page. This can be a huge benefit in organizations scanning high volumes.
- Check the repository. QA during capture is all well and good, but there also needs to be a check of the end resting place. Why? To make sure the documents and data are being placed in the right location, with the right rules applied and the right data fields populated.
- Use reporting to get a warm fuzzy. I always highly recommend using a dual-stream output: place your images and metadata in your repository, and your data and scanning statistics in a reporting database. This can help you track your scanning operations and make sure all your documents are being processed. Some advanced customers bounce this data off a line-of-business system to find exceptions or missing documents.
- QA the images AND the data. These two go hand in hand. Having an interface with a dual view, where you can see both the batch/folder/document/image structure and a spreadsheet of the data, is paramount to making sure all facets of your end product are in order. Some key features: interactive image cleanup tools, blank page flagging, and a thumbnail view.
- Involve the document owners. I have seen time and time again where organizations like to have a 3rd party or non-process owners scan, index and QA documents. Unless the scanning operators are thoroughly trained in document types, classification and data, this can be a recipe for disaster. The fix? Have a document expert QA the documents downstream in the capture workflow to catch any errors.
Scanning to SharePoint with Barcodes
Barcodes drive scanning efficiency
I am seeing more and more creative barcode usage in “tagging” documents and driving them into the correct place in SharePoint. With the right capture solution, barcodes can be used to:
- Dynamically build the destination SharePoint URL for the site.
- Dynamically designate the SharePoint library.
- Build a custom filename and folder name based on barcode elements.
- Set content type, both on folder and document.
- Set document set information.
- Populate metadata columns.
- Build the term store dynamically through managed metadata.
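To make the barcode-driven routing concrete, here is a minimal sketch. The pipe-delimited barcode format (department|library|year|name) and the site URL are invented for illustration; real capture products let you define your own barcode conventions and destination mappings.

```python
BASE_URL = "https://intranet.example.com/sites"  # assumed site collection root

def route_from_barcode(barcode: str) -> dict:
    """Split a cover-sheet barcode into a SharePoint destination and metadata."""
    department, library, year, name = barcode.split("|")
    return {
        "site_url": f"{BASE_URL}/{department}",   # dynamically built site URL
        "library": library,                        # dynamically designated library
        "folder": year,                            # folder name from a barcode element
        "filename": f"{name}.pdf",                 # custom filename
        "metadata": {"Department": department, "Year": year},  # column values
    }

dest = route_from_barcode("HR|Onboarding|2024|Smith-Jane")
print(dest["site_url"])  # https://intranet.example.com/sites/HR
```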
Now obviously you need an application that can facilitate the recognition process during scan to SharePoint processing. Below is an overview video of how the process can work.
SharePoint Scanning Created Efficiency
See the Case Study by PSIGEN on how a SharePoint Records Center is being used as a scanned document repository:
Microsoft Legal SharePoint Scanning Case Study
Don’t let Copiers Run the SharePoint Show
Scanning copiers / MFPs can arm unsuspecting users and bring down your SharePoint farm with excessively large files. How? Here is the scoop: if you are using scanning copiers to send image files to SharePoint, note the configuration issues below with factory-set copiers:
- Almost all copiers come pre-configured to scan in color. Those newfangled MFPs are getting better and better at scanning, and, by the way, most either scan in color or have color auto-sense enabled out of the box. Why does that matter? A typical page scanned in color is 20-30 times the size of a black-and-white scanned page. That single-page file goes from 25K to 577K…just imagine that 30-page contract.
- Scanning DPI is almost always set above 200. In the scanning world, 200 DPI is typically fine for normal scanning operations. In some cases, when OCR or data extraction is in use, 300 DPI can help a bit. File size grows roughly with the square of the DPI, so every bump up is expensive.
- No pre-configured one-touch buttons. Out of the box, most copiers leave all the settings to the end users. With no pre-configured scanning profiles, or “one touches”, this can mean disaster. Beware of the 600 DPI, full-color, uncompressed bitmap coming your way.
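The arithmetic behind the points above is easy to sketch. The figures below are uncompressed sizes for a letter-size page before compression (actual on-disk sizes like the 25K/577K examples depend on the compression used), but they show why color and DPI matter so much.

```python
def raw_page_bytes(dpi: int, color: bool, width_in=8.5, height_in=11.0) -> int:
    """Uncompressed size of one letter-size scanned page."""
    pixels = int(width_in * dpi) * int(height_in * dpi)
    bits_per_pixel = 24 if color else 1   # 24-bit RGB vs 1-bit bitonal
    return pixels * bits_per_pixel // 8

bw_200 = raw_page_bytes(200, color=False)
color_200 = raw_page_bytes(200, color=True)
print(color_200 // bw_200)                    # 24 -- color is ~24x bitonal at the same DPI
print(raw_page_bytes(600, False) // bw_200)   # 9 -- tripling the DPI means ~9x the pixels
```

Pixel count scales with the square of the DPI, and going from 1-bit black-and-white to 24-bit color multiplies the raw data by 24, which lines up with the 20-30x figure quoted above.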
Too Many Options for End Users
So, how do you prevent the inevitable? Below are some tips for recommended settings for MFPs:
- Black and White
- Only add color for specific departmental needs
- Use TIFF and PDF (no uncompressed formats)
- When available, use linearized PDF / Web Fast
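The recommendations above amount to a couple of locked-down "one touch" profiles. A sketch of what that configuration might look like is below; the profile names and keys are illustrative only, since actual MFP configuration is vendor-specific.

```python
# Hypothetical one-touch scan profiles reflecting the tips above.
SCAN_PROFILES = {
    "Scan to SharePoint (default)": {
        "color_mode": "black_and_white",
        "dpi": 200,
        "format": "PDF",          # compressed; never uncompressed bitmaps
        "linearized_pdf": True,   # "web fast" PDF where the device supports it
    },
    "Scan in Color (departmental use only)": {
        "color_mode": "color",
        "dpi": 300,
        "format": "TIFF",
    },
}

default = SCAN_PROFILES["Scan to SharePoint (default)"]
print(default["dpi"], default["color_mode"])  # 200 black_and_white
```

The point is that end users pick a button, not a pile of settings.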
End Users Gone Wild Part II
Have you armed your end users?
Have you set yourself up for failure with regard to SharePoint? As a former IT guy, I can speak from experience. If there is nothing to govern standardization, there will be no standardization. Corporate file shares are a prime example. The problem is, every user has their own way of filing, “foldering” and naming, and as a result, most organizational file servers are a prime example of “End Users Gone Wild”. So how do you prevent the SharePoint organization grenade from hitting your servers? Some tips below:
- Control and plan your architecture from the start. Like most Microsoft technologies, SharePoint looks so easy. Click here, add here, create here. In the hands of the uneducated, a massive mess can immediately ensue. If you are deploying a SharePoint scanning project, here are some great tips to kick off the planning process: 10 Tips for Planning SharePoint Scanning. There are also many other references on the web for basic architecture planning and deployment; here is a TechNet reference: SharePoint Deployment, Planning and Architecture. Do not let your deployment become a sandbox.
- Standardization is key. Use every possible measure to make sure the process of adding documents, scanned or not, is a repeatable and standardized process. SharePoint allows for a number of “controlled” column types, like choice lists and managed metadata fields that are linked to a set structure. Providing users with an easy pick list for metadata can make their lives easier, and also standardize your information store.
- Use required fields. Pick a few pieces of metadata and make them required for entry. Too many fields and users won’t bother entering them, and you will quickly have a repository that is not searchable. Here is a post on defining how you want to find documents: How do you want to find your documents in SharePoint?
- Use third-party software to help in the quest for standardization. From a scanning perspective, there are some great third-party apps that can make adding documents a standardized repeatable process. These apps can control content type, taxonomy, folder structure and file naming to ensure a structured site. Automatically naming and creating folders for documents is extremely important, and makes your repository more usable.
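The automatic naming and foldering called out above can be sketched simply: derive the path and filename from a few required metadata fields so every document lands in a predictable place. The field names and path scheme here are hypothetical; third-party capture tools let you define your own.

```python
import re
from datetime import date

def standardized_path(metadata: dict) -> str:
    """Build a repeatable folder path and filename from required metadata."""
    def clean(value: str) -> str:
        # Strip characters SharePoint disallows in file and folder names.
        return re.sub(r'[~#%&*{}\\:<>?/+|"]', "", value).strip()

    doc_type = clean(metadata["DocumentType"])
    customer = clean(metadata["Customer"])
    doc_date = metadata.get("Date") or date.today().isoformat()
    return f"{doc_type}/{customer}/{doc_date}_{customer}_{doc_type}.pdf"

path = standardized_path({"DocumentType": "Invoice", "Customer": "Acme Co",
                          "Date": "2024-05-01"})
print(path)  # Invoice/Acme Co/2024-05-01_Acme Co_Invoice.pdf
```

Because the path is a pure function of the metadata, two operators scanning the same document type always produce the same structure, which is exactly the repeatability the tip above is after.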
A few tips to help. Know of any other best practices? Please add to the discussion.