I ran a poll for about 3 months to see what types of devices people were using to scan to Microsoft SharePoint. The results are in tune with what I see in the field: folks are using a distributed scanning model with SharePoint to put scanning into the hands of the knowledge workers. Below are the results:
What is “Bridging the Gap”?
The movement towards an office with less paper and more efficiency can be quite difficult, and with the wrong tools can end in failure. The key challenge is a process I call “Bridging the Gap”, which uses several applications to create a bridge between the physical and digital world, and helps create a seamless process. So what is required? How do you create the bridge?
On one side of the gap, you have your physical environment: file cabinets, inboxes, stacks of folders on desks, etc. There are two components that facilitate the crossing:
- Scanning Hardware – scanners allow the conversion of paper documents into digital documents or images. Organizations can use scanning copiers, fax machines or dedicated scanners to digitize.
- Capture Software – capture software works with the scanning hardware to create an efficient and automated bridging process. It controls the flow of digitized documents, standardizing how they are routed, and using OCR, Barcodes, Advanced Data Extraction (ADE) and other features to automate the collection of information. It spans the gap and creates a connection to the other side or the repository.
Once the gap has been spanned, the documents need to land somewhere, just as physical documents land in a file cabinet, an inbox on someone's desk or another location in the organization. Below are the two components that exist on the far side of the gap:
- Workflow Software – think of this as the digital inbox and outbox…on steroids. Workflow Software is utilized to create a digital mirror of your physical processes. It can move files around, create approval steps, send email automatically and perform logic that usually requires human intervention. Some organizations don't have this component on the other side of the gap.
- Repository – Think of the repository as a temporary and permanent file cabinet that can hold files during a workflow process, or as an archive copy once the whole process is complete. You can search, sort and organize, print, distribute and copy. Most repositories can allow full text search, if the capture software has created a searchable file format, and also allow column based searching for specific criteria.
I have seen many organizations try to bridge the gap while missing one of the pieces above, or with a piece that cannot meet all their needs. A missing component can reduce the overall value of the system. For example, take a scanning copier that an AP department uses to scan invoices. They email themselves the scans, open them, rename them and then save them into their repository. Without capture software to automate the naming and routing, this is a highly inefficient process. Without capture, files are also not made searchable through OCR, which reduces efficiency during search. Another example might be a repository that cannot provide all the bits and pieces an organization may require. Take the organization that just saves PDFs to a network directory. This may be fine for many organizations that merely need a simple archive to house their files. But what about an audit event, or a legal issue that may require extensive searching and sorting?
“Bridging the Gap” and creating an office with less paper can provide an organization countless benefits, given proper planning and design and the inclusion of all the above components.
SharePoint Scanning Planning – Part 1 – Storage and Sizing
With SharePoint Scanning and Capture, as with any project, planning is essential to success. If you are going to use scanning software to send scanned images to a SharePoint Content Database, you need to lay some groundwork. This is the first in a series of planning articles.
One of the key areas of planning for any scanning/capture implementation is sizing and storage. Many of the customers we work with have no real grasp on the volume of paper they deal with on a day to day basis, and when they make the migration to digitizing their paper, they are often quite surprised at the amount of paper they push through the system. Obviously, this can cause some serious issues on many different fronts. So how do you estimate the amount of paper? There are several key conversion factors used by the document management industry, as outlined below:
| Description | Number of Pages | Storage |
| 1 Scanned Page – 8.5×11 | 1 | 50KB |
| 1 Scanned Page – 11×17 | 1 | 100KB |
| 1 File Cabinet – 4 drawers | 10,000 | 500MB |
| 1 Box | 2500 | 125MB |
| 1 Linear Inch | 100 | 5MB |
| 1 E Size Engineering Drawing (48×36) | 16 – 8.5×11 | 800KB |
This table is a basic planning tool, and can be used as a starting point. One thing to remember is that these are all standard pages: full text pages, not full-image magazine pages. The other thing to keep in mind is that the figures listed for boxes and file cabinets are the average number of pages they contain. In the imaging world, we deal with images, not pages. What is the difference? A page may have 2 sides, which are converted digitally into 2 images. So effectively, if you are scanning a box of double-sided pages, you will have to double the storage required.
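As a rough illustration of how the conversion table can be applied, here is a minimal Python sketch. The per-unit page counts and the 50KB-per-image figure come straight from the table; the function name and the `duplex_fraction` parameter are my own inventions for the example, and the result is only as good as those planning averages.

```python
# Planning figures taken from the conversion table above (letter-size,
# black-and-white text pages). They are rough averages, not measurements.
KB_PER_IMAGE = 50
PAGES_PER_UNIT = {
    "file_cabinet": 10_000,  # 4-drawer cabinet
    "box": 2_500,
    "linear_inch": 100,
}


def estimate_storage_mb(unit, count, duplex_fraction=0.0):
    """Estimate storage in MB for `count` units of paper.

    duplex_fraction is the share of pages printed on both sides (0.0 to 1.0);
    each double-sided page produces two images instead of one.
    """
    if not 0.0 <= duplex_fraction <= 1.0:
        raise ValueError("duplex_fraction must be between 0 and 1")
    pages = PAGES_PER_UNIT[unit] * count
    images = pages * (1 + duplex_fraction)
    return images * KB_PER_IMAGE / 1000  # the table treats 1MB as 1,000KB


print(estimate_storage_mb("box", 12))                       # single-sided
print(estimate_storage_mb("box", 12, duplex_fraction=1.0))  # all double-sided
```

Twelve single-sided boxes come out at 1,500MB, and the same boxes fully double-sided at 3,000MB, which matches the "double the storage" rule of thumb.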
Some other key factors that can contribute to storage and sizing:
DPI Setting – one of the key questions we always receive is “What DPI should I set on my scanner?” For most basic scanning and archive applications, you can set your scanner to 200 DPI. If you are doing OCR or any type of advanced data extraction, you always want a 300 DPI image for maximum accuracy. Anything beyond that is just a space killer: it will slow down your process and really bloat your files.
Black and White, Greyscale and Color – always use black and white scanning to keep file sizes at an absolute minimum. Greyscale and color scanning should only be used when absolutely necessary, as file sizes are just crazy: in the sample below, greyscale is roughly 8 times, and color roughly 15 times, the size of black and white at the same 300 DPI. Below is a table of file sizes for the same letter. The letter had about 50% page coverage.
| Scanning Mode/DPI | File Size |
| Black and White – 200 DPI | 26KB |
| Black and White – 300 DPI | 38KB |
| Black and White – 400 DPI | 51KB |
| Black and White – 600 DPI | 80KB |
| Greyscale – 300 DPI | 301KB |
| Color – 300 DPI | 577KB |
Image Processing – image cleanup can significantly reduce file sizes, and it is very important to use this feature whenever you can. Despeckle, deshade, border removal, etc. will eliminate unnecessary noise in scanned images, and reduce your storage requirement by 10-30% depending on the quality of your documents.
Image Format – There is a lot of misinformation on the market about TIFF versus PDF. I always hear “We want to store as TIFF because PDFs are just too big.” Just not the case. An image scanned to PDF is just a TIFF in PDF clothing (Or a PDF wrapper to be more exact). The PDF overhead is almost negligible. The de facto standard in imaging today is rapidly becoming the PDF image with hidden text. This gives you a nice little file with the pristine image, and converted OCR text in the background. The text layer adds negligible size to the file.
So now, with all this info, you can estimate volume in images, and then come up with required storage on a monthly, yearly or project basis.
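To make that last step concrete, here is a hedged Python sketch of a yearly and monthly projection. The per-image sizes are the sample-letter numbers from the table above. The daily volume of 2,000 images, the 250 working days per year and the function and parameter names are all assumptions made up for the example, so substitute your own figures.

```python
# Sample file sizes in KB from the single-letter test above (about 50% page
# coverage). Real documents will vary, so treat these as placeholders.
KB_PER_IMAGE = {
    ("bw", 200): 26,
    ("bw", 300): 38,
    ("bw", 400): 51,
    ("bw", 600): 80,
    ("gray", 300): 301,
    ("color", 300): 577,
}


def project_storage_gb(images_per_day, mode, dpi, working_days=250, cleanup_savings=0.0):
    """Project yearly storage in GB for a given scan mode and DPI.

    cleanup_savings is the assumed reduction from image processing
    (the text above cites 10-30%), expressed as a fraction.
    working_days=250 is an assumption, not a figure from the article.
    """
    kb_per_image = KB_PER_IMAGE[(mode, dpi)] * (1 - cleanup_savings)
    yearly_kb = images_per_day * working_days * kb_per_image
    return yearly_kb / 1_000_000  # decimal GB, good enough for planning


# Hypothetical volume: 2,000 images per working day.
for mode, dpi in [("bw", 200), ("bw", 300), ("color", 300)]:
    yearly = project_storage_gb(2_000, mode, dpi)
    print(f"{mode} @ {dpi} DPI: {yearly:.1f} GB/year, {yearly / 12:.2f} GB/month")
```

Running the same volume through different modes side by side is a quick way to show a customer what a "just scan everything in color" decision costs in storage.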
Document Routing and Microsoft SharePoint
I see a ton of companies struggling with the question: How do I get my copiers to scan to SharePoint?
I go back and forth on the idea of panel applications that enable intelligent routing at the copier. It always comes back to contention at the device. I recall one instance where an admin had all her documents piled on the copier. They were using eCopy, and she was scanning one document at a time and sending each one to SharePoint. During her 20 minutes of copier hoarding, at least 10 people walked up, and walked away.
There are several things that I believe are absolutely critical to enabling copiers as scanning and capture onramps to SharePoint:
- Document Separators are an absolute requirement!!! You have to be able to take a whole stack of documents, place barcode/routing separators between them, throw them all in the hopper and hit the green button.
- Intelligent Routing is required. Separators need to provide document intelligence, and give the user the ability to pre-index the document through the use of a barcode creation utility, or an Optical Mark Recognition (OMR) routing sheet with check boxes.
- Flexibility in routing is required. An application that can provide automatic routing to SharePoint based on barcodes or checkboxes can provide ultimate flexibility for the users. The ability to route to site, library and folder is necessary, and the ability to set content type and file naming is also key.
Here is a sample of a routing sheet: Scanning Route-SP-Dynamic-Template
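To illustrate the separator idea from the list above, here is a small Python sketch of how a scanned stack might be split into routed documents. Everything specific in it is hypothetical: the `ROUTE|site|library|folder|content type` barcode layout, the class and function names and the sample values are invented for the example and are not the format used by eCopy or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class RoutedDocument:
    site: str
    library: str
    folder: str
    content_type: str
    pages: list = field(default_factory=list)


def parse_separator(barcode):
    """Parse a hypothetical 'ROUTE|site|library|folder|content type' barcode."""
    parts = barcode.split("|")
    if len(parts) != 5 or parts[0] != "ROUTE":
        raise ValueError(f"not a routing separator: {barcode!r}")
    return RoutedDocument(*parts[1:])


def split_batch(pages):
    """Split one scanned stack into documents.

    `pages` is a list of (page_name, barcode_or_None) tuples in scan order.
    A page carrying a routing barcode starts a new document and is not
    itself stored as a document page.
    """
    documents = []
    for name, barcode in pages:
        if barcode is not None and barcode.startswith("ROUTE|"):
            documents.append(parse_separator(barcode))
        elif documents:
            documents[-1].pages.append(name)
        else:
            raise ValueError(f"page {name!r} appears before any separator sheet")
    return documents


# One stack, two documents, one press of the green button.
batch = [
    ("sep1", "ROUTE|/sites/finance|Invoices|2010/Acme|Invoice"),
    ("p1", None),
    ("p2", None),
    ("sep2", "ROUTE|/sites/hr|Personnel|Smith|Application"),
    ("p3", None),
]
for doc in split_batch(batch):
    print(doc.site, doc.library, doc.folder, doc.content_type, doc.pages)
```

The point of the design is that all routing decisions travel with the paper, so the user spends seconds at the device rather than keying index values at the panel.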
Scanning, Capture and Managed Metadata
So, really what part does managed metadata play in the whole SharePoint Scanning and Capture scene? If you are unfamiliar with Managed Metadata and the Term Store, there is a great post here: SharePoint Managed Metadata. In SharePoint 2010, you have the ability to build a Taxonomy, or tiered classification structure for all content within the repository. So for example, you could build a level called Accounting Document Types, with sub-levels of AP, AR and Contracts. They could have lower strata that included each of the types of documents that could exist in that department. In SharePoint, this taxonomy is housed within the Term Store, and individual terms are stored within term sets.
So how do I use all of this in scanning and capture? Many capture applications are read-only to the Term Store (if they even have the feature set), requiring you to build the taxonomy prior to deploying the solution. I would definitely recommend a capture app that allows dynamic building of the Term Store based on document characteristics. This allows you to “build as you go”, populating managed metadata columns with key information as you capture documents. PSIGEN just released this in their latest enhanced feature set for SharePoint.
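The “build as you go” idea boils down to a get-or-create walk down the taxonomy. The Python sketch below shows that logic on a plain in-memory tree. It does not call the SharePoint Term Store API, it is not how PSIGEN implements the feature, and the `TermNode` class is a stand-in invented for the example.

```python
class TermNode:
    """One level of a taxonomy: a term plus its child terms."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def get_or_create(self, path):
        """Walk `path` (a list of term names), adding any missing terms.

        Returns the deepest term and a flag saying whether anything was created.
        """
        node, created = self, False
        for name in path:
            if name not in node.children:
                node.children[name] = TermNode(name)
                created = True
            node = node.children[name]
        return node, created


# Pre-built part of the taxonomy, using the example from the text.
root = TermNode("Accounting Document Types")
root.get_or_create(["AP"])
root.get_or_create(["AR"])
root.get_or_create(["Contracts"])

# A captured document classified as an AP invoice adds the missing term.
term, created = root.get_or_create(["AP", "Invoice"])
print(term.name, created)   # Invoice True
term, created = root.get_or_create(["AP", "Invoice"])
print(term.name, created)   # Invoice False
```

A read-only capture application would stop at the lookup and reject the document (or leave the column blank) when the term is missing, which is why the taxonomy has to be finished before deployment in that case.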
To Folder or Not to Folder (in SharePoint). That is the question.
Should I use folders in SharePoint?
I am always in search of opinions on the use of folders within SharePoint.
Arguments For Folders in SharePoint:
- End users are comfortable with them. The transition to any new technology is always easier, and adoption rates are higher, the more end users can apply “old school” ways to any new interface.
- Folders, although merely logical in SharePoint, provide a hierarchical structure, and some standardization.
- For the power user, you can get rid of the infantile folders, and create a custom view that eliminates them.
- There is always the item limit within any view: 2,000 was the recommended ceiling in SharePoint 2007, and SharePoint 2010 sets its list view threshold at 5,000 by default. My understanding is that folders in SharePoint can break up your library into segments, so you don't need to worry about these limits when rendering a list.
- Logical structure can help down the line for any reorganization, export or migration of data and files.
- For scanning to SharePoint, most advanced capture technologies provide custom foldering as a migration method to SharePoint. Why not use it if it is there?
Arguments Against Folders in SharePoint:
- Folders are “old school”, and have no place within SharePoint libraries, especially in SharePoint 2010. Customized views, content types and document sets should be utilized for organization and viewing.
- SharePoint should not be used like a file system. It is a database, and the search interface should be used to find what you are looking for in the content databases, versus the folder “Hunt and Peck” method.
- Encouraging end users to create folders within a SharePoint Library will only lead to the end users “gone wild” scenario that happened to our file share system.
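For anyone leaning towards the custom foldering that capture tools offer, here is a hedged Python sketch of the idea: build the folder path from index fields, then check whether any folder would end up holding more items than your view limit. The field names (`department`, `year`, `vendor`) and the function names are hypothetical, and the limit is a parameter because the right number depends on your SharePoint version and configuration.

```python
from collections import Counter


def folder_path(index):
    """Build a folder path from index fields (hypothetical field names)."""
    return f"{index['department']}/{index['year']}/{index['vendor']}"


def oversized_folders(documents, limit):
    """Return the folders that would hold more than `limit` items."""
    counts = Counter(folder_path(doc) for doc in documents)
    return {path: n for path, n in counts.items() if n > limit}


docs = [
    {"department": "AP", "year": 2010, "vendor": "Acme"},
    {"department": "AP", "year": 2010, "vendor": "Acme"},
    {"department": "AP", "year": 2010, "vendor": "Globex"},
    {"department": "AR", "year": 2009, "vendor": "Initech"},
]
print(oversized_folders(docs, limit=1))  # {'AP/2010/Acme': 2}
```

If a folder scheme still produces oversized folders on realistic volumes, that is a sign to add another level (month, for instance) or to lean on metadata and views instead.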
SharePoint and Document Redaction
Been seeing a large number of requests for redacting documents in SharePoint for the purposes of sharing documents with masked confidential information. Cool demo of SharePoint document redaction app:
GeoTagged Images in SharePoint?
Cool video on extracting geo information from smart phone images, and plotting them in SharePoint:
PSIGEN Releases PSI:Capture 4.0
OK, talk about a game changer. Take a look at version 4.0 of PSI:Capture. The new release from the mature document capture company has over 100 new features. It provides the ability to perform Intelligent Character Recognition (ICR) to read hand printing, a whole set of new forms processing technology, enhanced Optical Character Recognition (OCR) for SharePoint, and Dynamic Routing for SharePoint. For a list of features and functions, go to Document Capture 4.0-PSI:Capture.
AIIM Capture and Business Process Survey
Here are some great bullets from the latest AIIM Survey:
- The strongest driver for scanning and capture is improved searchability and knowledge sharing across the business, followed by productivity improvements, reduced office costs and better customer service.
- 58% of SharePoint users are not storing scanned image files and only 9% are executing any workflow or BPM with scanned images. File sizes and the ability to handle scanned image throughput are the biggest concerns.
- 39% of responding organizations reach positive payback on their investments in scanning, capture and BPM within 12 months, rising to 60% within 18 months. Automatic document classification shows a particularly high return for the 19% of respondents utilizing it.
- 60% of respondents have one or more capture and BPM systems. Of these, 39% have a single system in use for all applications. Of those with multiple systems, 80% are looking to converge to a single system.
- Although respondents expressed a preference to source workflow and BPM as part of an ECM suite or as part of SharePoint, the decision maker for capture and BPM is likely to be a department or Line of Business head, compared to a Head of IT or Head of Compliance for the ECM system.
http://www.aiim.org/pdfdocuments/IW_Capture-and-BPM_2010.pdf
