We find ourselves in the 21st century using a 2000+ year old technology: paper. Paper provides an ideal medium to store, transmit, and modify information. History has proven that this medium can last almost forever. The conundrum is that paper does not scale, and it does not offer the efficiency of its digital counterpart.
The process of handling paper is: receive, interpret, classify, organize, and store. Once the information has been stored, you access it by locating it in the organizational structure and viewing the pages until the one in question is found. Digital technology can speed up storage and retrieval immensely, provided a proper implementation has been devised.
Over the years the holy grail of going paperless has been an office with no paper in it at all. While this would be nice in concept, in practice it's a whole other story. Paper is sometimes the most efficient way for a human to process and analyze information. Take the task of reconciling a bank account. It's really simple to print a copy and cross through each transaction as it is accounted for. Without paper, you would have to find another way to keep track of which transactions have been accounted for.
Backups
Computers crash, hard drives fail, RAID arrays get dropped. Keep your paper copies. The question is how much time to invest in them. When going digital, make sure to back up. Your paper copies still serve as the ultimate backup. We recommend keeping those copies around; the difference is that you can skip the organizing and filing process and go right to storage!
If you are going to organize papers digitally on the computer, don't organize twice, once in paper files and again in digital files. Scan the material, then envelope or box it by the date it was scanned. Store the paper. There is no need to organize the paper for quick access; that is what we will use the computer for.
Feasibility
Scanning documents for computerized storage can be a daunting task if not done right. Files can be 25MB per page, making them useless should they need to be emailed. A properly scanned file can be as small as 81KB per page. This means you can attach 25 pages to an email before you reach a 2MB attachment limit. For email services with a 10MB limit, you can attach about 125 pages to an email.
For storage purposes you want to make sure that you have the capacity your company needs, and that the files are as small as possible. Let's take a single-person business as an example.
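The attachment arithmetic can be checked with a few lines of Python. This is only a sketch: the 81KB page size and the 2MB and 10MB limits come from the text, while the function name and the 1MB = 1024KB convention are assumptions made for illustration. It also ignores any overhead the mail system adds.

```python
# Sketch: how many scanned pages fit under an email attachment limit.
# pages_per_email is a hypothetical helper; 1 MB is assumed to be 1024 KB.

def pages_per_email(limit_mb: float, page_kb: float = 81.0) -> int:
    """Return the number of whole pages that fit within limit_mb."""
    limit_kb = limit_mb * 1024
    return int(limit_kb // page_kb)


if __name__ == "__main__":
    for limit in (2, 10):
        print(f"{limit} MB limit: {pages_per_email(limit)} pages")
```

Under these assumptions the 2MB case gives 25 pages and the 10MB case gives 126, in line with the rounded figures quoted above.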
ABC Services
ABC Services provides customers with the TV and satellite equipment they need. Each day John, who is the owner, technician, and secretary, writes about 8 invoices. Each invoice is accompanied by a check and a work order. Half of the invoices are accompanied by a new service contract. Each contract is 5 pages, front and back. Each week a deposit is made and about 5 bills are paid. All purchases of equipment, tools, and parts total about 10 receipts a week. Add in account notices, vendor information, and customer information, and this might yield:
Document type      Pages per year
Bills                         260
Checks Written                260
Invoices                    2,080
Contracts                  10,400
Work Orders                 2,080
Checks Received             2,080
Receipts                      520
Client Files                2,080
Vendor Files                  540
Total                      20,300
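The yearly figures can be reproduced from the description of John's routine. The sketch below assumes 260 working days (52 weeks of 5 days) and one check written per bill paid; neither is stated outright, but both match the numbers in the table. The client and vendor file counts are taken from the table as given, since the text does not say how they were derived.

```python
# Sketch: rebuild the ABC Services page counts from the stated daily and
# weekly volumes. WORK_DAYS and the one-check-per-bill rule are assumptions.

WEEKS = 52
WORK_DAYS = WEEKS * 5          # 260 working days per year (assumed)

INVOICES_PER_DAY = 8
CONTRACT_PAGES = 5 * 2         # 5 sheets, front and back

pages_per_year = {
    "Bills": 5 * WEEKS,
    "Checks Written": 5 * WEEKS,                      # assumed: one per bill
    "Invoices": INVOICES_PER_DAY * WORK_DAYS,
    "Contracts": (INVOICES_PER_DAY // 2) * CONTRACT_PAGES * WORK_DAYS,
    "Work Orders": INVOICES_PER_DAY * WORK_DAYS,
    "Checks Received": INVOICES_PER_DAY * WORK_DAYS,
    "Receipts": 10 * WEEKS,
    "Client Files": 2080,                             # taken from the table
    "Vendor Files": 540,                              # taken from the table
}

total_pages = sum(pages_per_year.values())

if __name__ == "__main__":
    for name, pages in pages_per_year.items():
        print(f"{name:<16} {pages:>6}")
    print(f"{'Total':<16} {total_pages:>6}")
```

Contracts dominate the total, which is worth knowing when deciding whether duplex scanning is needed.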
Based upon a sample of Internetwork Consulting's files, the average scanned accounting file is about 86.163 KB. That's a sample of 2983 files, totaling 251MB. The files are black and white PNG images. This translates into 1708.11MB for a year of John's files. It would take John approximately 599 years to fill 1TB (1000GB) of storage. If you had a company of 85 people, and each produced as much paper as John does, it would take 7 years to fill 1TB of storage. Now consider 2TB of storage. That would take a company of 170 employees 7 years to fill!
Only recently, with the large capacity and low cost of disk space, has it become more practical to store files digitally than on paper. In about a cubic foot of space you can store about 24,319,400 letter sized pages. And when you want to access these files, you don't have to leave your desk and physically sift through an ocean of paper. You type your search into the computer and focus on another task until the file is found for you. Another advantage is that when you are done with the file, you don't have to put it away!
Now that you're hopefully sold on digital document filing, let's discuss the components of this system.
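The storage projection is simple enough to recompute for your own numbers. The following is a rough sketch using the figures from the text (86.163 KB per file, 20,300 pages per year, 1TB counted as 1000GB); the function names and the 1024-based KB/MB/GB conversions are assumptions chosen because they reproduce the quoted results.

```python
# Sketch: yearly storage use and years needed to fill a disk.
# Function names are hypothetical; conversions use 1024 KB per MB and
# 1024 MB per GB, which matches the figures quoted in the text.

AVG_FILE_KB = 86.163       # average scanned page, from the sample
PAGES_PER_YEAR = 20300     # one person's yearly volume


def mb_per_year(pages: int = PAGES_PER_YEAR, avg_kb: float = AVG_FILE_KB) -> float:
    """Storage consumed per person per year, in MB."""
    return pages * avg_kb / 1024


def years_to_fill(capacity_gb: float, employees: int = 1) -> float:
    """Years until capacity_gb is full at the per-person rate above."""
    capacity_mb = capacity_gb * 1024
    return capacity_mb / (mb_per_year() * employees)


if __name__ == "__main__":
    print(f"Per person: {mb_per_year():.2f} MB/year")
    print(f"1 person, 1000 GB: {years_to_fill(1000):.0f} years")
    print(f"85 people, 1000 GB: {years_to_fill(1000, 85):.1f} years")
    print(f"170 people, 2000 GB: {years_to_fill(2000, 170):.1f} years")
```

Changing `AVG_FILE_KB` to a color or grayscale page size shows quickly how much the scan settings matter for capacity planning.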
Scanner Selection
Selecting a scanner to go digital is not a task to be taken lightly. While most cheap "all-in-one" printers might suffice, there are some things to consider: time, paper handling, and quality. The demands of digitizing paper for archiving are minimal compared to the abilities of modern scanners. What you want to look at in a scanner is its speed (how many pages per minute), its paper handling (will you need to scan mishandled or mangled documents?), and finally the quality you are looking for (fax, screen, or print).
Flatbeds, Feeders, ADFs, and Duplexing
There are three major forms of document handling. Some scanners offer multiple methods for scanning. For example, a flatbed scanner may be equipped with an automatic document feeder (ADF).
A flatbed scanner has a lid that opens and a glass that you set the document on for scanning. This works great for items that don't feed nicely, or mangled paper that would jam a feeder.
A lot of flatbed scanners have an ADF attached to the flatbed lid. This is extremely useful, as loading 20300 pages a year by hand can be daunting. The ADF can handle the majority of the paper coming through your office. The ADF has trouble with receipts printed on lightweight paper and with mangled items, which is where the flatbed proves useful.
A feeder scanner skips the flatbed and pulls the paper over the optical elements to scan it. It works much like an ADF, but without the flatbed, and usually in a smaller footprint. Some of these scanners use a drum instead of wheels, resulting in more flexibility in what you can scan (Neat Receipts, for example).
Another nice paper handling feature is automatic duplexing. This allows you to capture the front and back of the page in one pass. Some scanners only offer manual duplexing, requiring you to feed the document through twice.
Speed and Capacity
Two numbers matter here: scanner speed, measured in pages per minute (PPM), and how many pages the document handler can hold. Most cheap printers will advertise that they can handle about 10 pages. They might not even advertise the speed. If speed is not mentioned, expect slow.
Internetwork Consulting has an HP OfficeJet 5610 AIO that we started using for scanning. It was perfect for testing going "paperless". This unit can scan at up to 2400 DPI and 1.5 PPM, and the ADF can handle about 15 pages. Our cost was about $99.00, roughly 1/3 the cost of the scanners intended for this job.
When you buy a $300+ scanner, you typically get a faster scan and greater document loading capacity. Depending on the number of pages you plan on scanning per day, you will want to buy a scanner that does not turn it into an all-day task. Look for at least 10 pages per minute, and enough capacity to hold the longest document you can reasonably expect to scan.
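To get a feel for what scanner speed means in daily use, the sketch below estimates the scanning time per working day. It assumes the 20,300 pages per year from the ABC Services example spread over 260 working days, and it ignores time spent loading the feeder, clearing jams, or making a second pass for manual duplexing. The function name is hypothetical.

```python
# Sketch: minutes of pure scanning time per working day at a given speed.
# Assumes 260 working days per year; real throughput will be lower once
# paper loading and jams are counted.

def scan_minutes_per_day(pages_per_year: float,
                         pages_per_minute: float,
                         work_days: int = 260) -> float:
    pages_per_day = pages_per_year / work_days
    return pages_per_day / pages_per_minute


if __name__ == "__main__":
    for ppm in (1.5, 10):
        minutes = scan_minutes_per_day(20300, ppm)
        print(f"{ppm} PPM: about {minutes:.0f} minutes per day")
```

At 1.5 PPM the example workload takes about 52 minutes of scanning a day; at 10 PPM it drops to about 8 minutes.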
Resolution, Colors, and Format
Scanning documents to produce quality digital representations is not as easy as some might guess. First you need to know about how a computer represents the image. Each image is made up of dots, and the resolution is the number of dots per inch (DPI). The higher the DPI, the larger the file.
Historical DPI settings are as follows:
DPI     Reference Use
72      Screen
96      Typical Monitor
150     Fax
300     Draft Printing
600     Typical Laser
1200    Photo
2400    Typical Maximum
When scanning documents, Internetwork Consulting recommends using about 300 DPI. This is twice as good as faxing, and half as good as printing. While this is supposed to be half as good as printing, you will be surprised when you scan a document at 300 DPI and print it. It might even be hard to tell the difference between the scan and the original. On a 300 DPI letter size page, you will be scanning approximately 8,415,000 dots into the computer (that's equivalent to an 8 megapixel camera).
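The dot count follows directly from the page dimensions. As a small sketch (the function name is made up for illustration, and a letter page is taken as 8.5 by 11 inches):

```python
# Sketch: number of dots captured when scanning a page at a given DPI.
# Defaults assume a US letter page, 8.5 x 11 inches.

def dots_per_page(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    width_dots = round(width_in * dpi)
    height_dots = round(height_in * dpi)
    return width_dots * height_dots


if __name__ == "__main__":
    for dpi in (150, 300, 600):
        print(f"{dpi} DPI: {dots_per_page(dpi):,} dots")
```

Because the count grows with the square of the DPI, doubling the resolution from 300 to 600 DPI quadruples the number of dots.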
If scanned as a color document you should expect about a 25MB file, as gray scale an 8MB file, and as black and white a 1MB file. For most documentation purposes, scanning as black and white will suffice: faxes have been black and white for 20 years. Only use color where necessary, and only on what is necessary, since the file will be 24 times larger. Internetwork Consulting hardly uses gray scale: either black and white suffices, or software manipulation is required. When software manipulation is required we like the full spectrum of information that color provides. A majority, 95+ percent, of our scans are done in plain black and white. For shades of gray, the scanner or scanner driver will "dither" the document to simulate gray. This is what a laser printer does, as toner is either black or absent for white.
The key with black and white is the brightness. For example, a lot of scanners in black and white mode cannot pick up red ink. By making the brightness darker, you can get the scanner to pick up the red ink.
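These sizes follow from the number of bits stored per dot before any compression. The sketch below assumes 24 bits per dot for color, 8 for gray scale, and 1 for black and white, on a 300 DPI letter page; it ignores file headers and row padding, and the names are illustrative only.

```python
# Sketch: uncompressed size of a 300 DPI letter page in each color mode.
# Bit depths are the usual ones (24, 8, 1) and are stated here as assumptions.

DOTS = 2550 * 3300   # 8.5 in x 11 in at 300 DPI

BITS_PER_DOT = {
    "color": 24,
    "gray scale": 8,
    "black and white": 1,
}


def raw_bytes(mode: str) -> int:
    """Uncompressed size in bytes for the given color mode."""
    return DOTS * BITS_PER_DOT[mode] // 8


if __name__ == "__main__":
    for mode in BITS_PER_DOT:
        print(f"{mode:<16} {raw_bytes(mode) / 1_000_000:.1f} MB")
```

The 24-to-1 ratio in bits per dot is where the "24 times larger" figure for color comes from.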
Finally, there is getting the file down to the target 81KB. This is done through the file format used. Scanning to PDF can be a great solution, but it puts all the scanned images into one PDF file. Internetwork Consulting stores most of its documents in PNG format. This is a format that compresses well for the number of colors you use. If your software doesn't support PNG, a compressed TIFF will suffice. Make sure the compression is CCITT Group 4. This compression will pick up a run of 20,000 white dots and represent it by saying "repeat white 20,000 times", which is far better than listing out each dot.
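The "repeat 20,000 times" idea is known as run-length encoding. The sketch below shows only that basic idea on a single row of dots; it is not the actual CCITT Group 4 or PNG algorithm, both of which are considerably more elaborate. The function name and the 0 = white, 1 = black convention are assumptions for illustration.

```python
# Sketch: run-length encode one row of black and white dots.
# Illustrates the principle only; real fax and PNG compression differ.
from itertools import groupby


def run_lengths(row):
    """Collapse a row of dots (0 = white, 1 = black) into (value, count) runs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


if __name__ == "__main__":
    row = [0] * 20000 + [1] * 12 + [0] * 5
    print(run_lengths(row))   # [(0, 20000), (1, 12), (0, 5)]
```

A row of 20,017 dots collapses into three pairs, which is why pages with a lot of white space compress so well.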
Based upon the "5% coverage" figure used by printer manufacturers, PNG or CCITT Group 4 compressed images will be anywhere from 5% to 50% of the 1MB black and white size. That means your pages should typically be from 50KB to 500KB. Internetwork Consulting's invoices usually scan to about 120KB. This can be attributed to our use of a gray background instead of boxes. The gray background counts as coverage, and increases the file size.
JPG format is typically useful for pictures and photographs only. It uses a lossy compression optimized for photographic images. For a black and white document it will usually generate a file the size of the PNG, or larger. PNG uses lossless compression at every color depth: on black and white scans it achieves results comparable to CCITT Group 4, and on color images it keeps every detail rather than discarding some as JPG does.
If you are using TIFF, make sure that you turn compression on. By default TIFF stores the image raw, without compression. This leads to a maximum quality image, and the largest file.
Organization
Now that we can scan to our hearts' content using the proper settings and hardware, the question is how to organize it all. When generating 20,000 or more files a year, you will need a system: either software based or manual.
Internetwork Consulting's reference system is a manual system. We found most of the commercial programs to be lacking in user interface design, and to overcomplicate a time-tested system. Our manual system is laid out a lot like our file cabinets were before going digital.
Software
Software based systems promise to answer all the questions, but in practice they sometimes just add more hoops to jump through. Try the software before spending a large sum of cash: enter at least a month of documents into it. You will quickly learn whether the software will work for you.
Most of the software systems that were affordable were not network capable. Some of the network capable programs did not allow unrestricted file system access to the files, while others were too complicated or had a user interface that was unbearable to look at. The network setup of some of these systems was more complicated than needed.
The last hurdle was price. A lot of the systems start at $200 per computer, and work their way up to $1500 per computer.