The demand for information has exploded in the past few years as the Internet revolution continues into its third decade. Thanks to the advent of the Internet, humanity has published so much data online that it is simply too much for an ordinary person to digest. Seriously, I doubt that Superman or the Flash could get through it all.
If you decide to learn a little something about the World Wide Web, you will find that many articles claim a typical search engine index is composed of 1 trillion documents and that these indexes may add as many as 1 billion documents every day. While a vast number of these Web pages vanish when big hosting services (like Yahoo!’s GeoCities or Vox) shut down, online storage continues its upward spiral in both capacity and the amount of data stored.
One person simply cannot find it all, read it all, or even begin to understand it. But what makes all this online information truly overwhelming is that the numbers I gave above apply only to those sites that are part of the “Indexable Web” or the “Light Web”. Some people say many trillions more Web-ready pages are buried in uncrawlable indexes and databases dubbed the Hidden Web or the Deep Web. I am sure it’s true that many such unreachable archives have their own search engines, and that some hide behind subscription barriers or are published in proprietary formats. The developers of these archives also provide specialized search tools that let you search all that remote, hard-to-reach content.
Somewhere between these two regions, which differ by only a few factors, lie the public archives. Typically called “public records”, these public databases provide their own search functions, yet they are often made more accessible through commercial background records search utilities. According to the Background Records blog, there are many, many public record Web databases, all of which can be searched in some way. Some of them are indexed in Bing and Google, but many can be searched only with their own tools or through third-party indexes.
These background records most often come from federal or state archives, although some are published by for-profit sources such as business and telephone directories, class or school reunion sites, and so on. In the same way that a job site exemplifies typical people records management, these archives exemplify “life records” management. Even so, the popular view associates public records with records from government archives.
In fact, one area where public records search is fast becoming important is the screening of job applicants. Human resources managers may run very quick searches on potential new hires to see where they have lived and worked, and whether they have a criminal record. This kind of fact-checking may raise questions about information provided on (or omitted from) job applicants’ resumes.
When you decide to sift through public data to learn about a job applicant, even for a single detailed background review, you may not have the time or the proper resources to search through that much data. This is why the people search industry has become big business. I have been told that financial analysts at several firms put the industry’s annual revenues in the billions of dollars. The reason the industry makes so much money is that finding all that information is just impossible for any one individual.
It may sound cheesy to say so, but typical Web search only lightly brushes the surface of the data universe. The people search industry faces an uphill climb even though it is focused on only a small portion of the entire Internet. Finding the information is hard enough, and sorting it out and producing high-quality reports is not easy either. Quite a few academic papers discuss the reliability and economics of background checks. One lesson we should learn from all this is that the quality of your online background profile may have a very significant impact on your lifestyle, and there may be very little you can do about that.