Wednesday, December 5, 2012

Methods for Searching Electronic Documents


By Karl Obayi


When it comes to searching for electronic data contained inside a computer for use in litigation, the meaning of "search" takes on a broader and more detailed scope. Searching for data on a computer is obviously not the same as searching for a document within a file cabinet.
With the file cabinet search, you are dealing with printed paper that you can see, touch, smell and feel. However, searching for data that resides inside a computer amounts to searching for documents you cannot feel, smell or touch.
Interestingly, when you see your document on the computer screen as text, numbers or a combination of both, what actually represents that text and those numbers is a combination of 1's and 0's known as binary digits.
In summary, the computer does not store data in English, French, German or Spanish. The computer stores data as different combinations of 1's and 0's (binary). However, the computer has an automated way of converting what you type on your keyboard to what you can understand on the screen. How does it do this? Patience, my friend. This will be the topic of another article.
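Without going into the full detail, the 1's and 0's behind a piece of text can be made visible with a few lines of Python. This is only a minimal sketch: the function names to_bits and from_bits are my own, and UTF-8 is assumed as the encoding.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Return the bits that encode `text`, one 8-bit group per byte."""
    return " ".join(format(byte, "08b") for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Reverse of to_bits: turn groups of 8 bits back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode(encoding)


print(to_bits("Hi"))  # 01001000 01101001
```

The same bits can be turned back into readable text with from_bits, which is in essence what happens each time a document is displayed on screen.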
But for now, let me introduce you to some methods for searching for computer data that is not printed on paper but rather resides on the computer's hard disk as 1's and 0's.
Because of the volume of documents we can now store on computers, the job of searching for documents is becoming a more complex undertaking, especially if the computer in question was used by a third party.
Computer users employ different names to save their files. Some of these names will suggest what the file contains, but a lot of file names will not suggest the file content. For example, I may save a file with the name "orange" on my computer. If you conduct a search on my computer for the word "fruit", the document saved as "orange" will not be revealed.
The above position represents what happens with a general search on the computer. However, there are methods and specialised computer search applications that can do a better job. With these, for example, a search for "fruit" can reveal a file saved as "orange".
Consequently, a detailed search for documents on the computer requires further knowledge and techniques. This generally requires a starting point referred to as a "keyword" search. We will now touch on various types of searches. But first, what is a "keyword"?
Keyword
Very often with computer searches, the emphasis is at first not on the name of a document. Rather, the search is for an operative word or string of data that will lead to all the relevant documents that contain the search term. Consequently, the use of "keywords" is intrinsic to computer search.
Your ability to formulate the appropriate keyword or search terminology is crucial. During the search process, the computer will only respond to the questions you have asked. If you input a search for "oranges", the computer will not search for cars. Nor does the computer know, on its own, that oranges and grapes belong to the same class. You must therefore find a way of telling the computer to return "grapes" even where you have searched for "oranges".
To drive the point further, the computer will generally not return the word "died" if you searched for the text string "passed away". However, there are methods of making the computer return words that belong to the same family of meaning.
Let us now address some of these methods that will help enhance our search capabilities on computers. Note that some of these techniques are only possible with special software installed on a computer, e.g. the dtSearch application. Search applications enhance the computer's ability to go beyond the ordinary search for keywords and enable it to conduct a search in a more detailed and humanlike manner.
Indexed search
Given the volume of data that computers can now store, thanks to the ever-decreasing cost of computer storage (hard drives), even a small laptop can now hold many billions of bytes of data.
The more data there is, the longer it takes the computer to process a search request for a particular file or keyword. However, the computer can proactively compile a table of every word found in the documents on the computer, together with where each word occurs. Subsequently, any time a search is conducted, the computer will reference this table of words instead of the documents themselves. The compiled table of words is known as an index. Its primary purpose is to save time.
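The table of words described above can be sketched in Python as a dictionary that maps each word to the files containing it. This is a simplified illustration only, not how any particular product builds its index; the file names and the helper names tokenize, build_index and indexed_search are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def indexed_search(index: dict[str, set[str]], word: str) -> list[str]:
    """Look the word up in the index instead of re-reading the documents."""
    return sorted(index.get(word.lower(), set()))


documents = {  # hypothetical sample files
    "orange.txt": "Oranges are a citrus fruit.",
    "memo.txt": "The meeting is moved to Friday.",
    "grape.txt": "A grape is a small fruit.",
}
index = build_index(documents)
print(indexed_search(index, "fruit"))  # ['grape.txt', 'orange.txt']
```

The index is built once, ahead of time. Each later search is then a single dictionary lookup, however many documents there are. Note also that the file named "orange.txt" is found by a search for "fruit" because the index is built from file contents rather than file names.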
Terms like "live search" refer to searches that generally do not make reference to the index table; instead, the search is conducted directly on the computer storage device. This takes more time and consumes more computer resources.
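By way of contrast, a live search can be pictured as reading through every document each time a question is asked. The sketch below assumes the documents are already loaded as text; the sample files and the name live_search are hypothetical.

```python
def live_search(documents: dict[str, str], term: str) -> list[str]:
    """Scan the full text of every document for the term, on every query."""
    needle = term.lower()
    return sorted(
        name for name, text in documents.items() if needle in text.lower()
    )


documents = {  # hypothetical sample files
    "orange.txt": "Oranges are a citrus fruit.",
    "memo.txt": "The meeting is moved to Friday.",
    "grape.txt": "A grape is a small fruit.",
}
print(live_search(documents, "fruit"))  # ['grape.txt', 'orange.txt']
```

Nothing is prepared in advance here, so the work done for each query grows with the total amount of text stored.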
Noise words
To increase the efficiency of the search, most specialised search, e-discovery and forensic applications provide an opportunity to exclude what are generally referred to as "noise words" during a search. This approach saves time and computer resources. Noise words are the usual everyday words you will find in most documents, e.g. to, a, was, etc. Other items that can also be excluded from a search are files with well-known names, such as files created and used automatically by the computer system, e.g. anti-virus files and user profile files.
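Excluding noise words can be sketched as a simple filter applied before words are searched or indexed. The list below is a short illustrative one of my own; real applications ship with their own, usually much longer, lists.

```python
import re

# Illustrative noise-word list (an assumption, not taken from any product).
NOISE_WORDS = frozenset({"a", "an", "and", "the", "to", "was", "of", "in", "is"})


def remove_noise(text: str, noise_words: frozenset[str] = NOISE_WORDS) -> list[str]:
    """Split the text into lower-case words and drop the noise words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in noise_words]


print(remove_noise("The invoice was sent to a client in March"))
# ['invoice', 'sent', 'client', 'march']
```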
Concept search
This concerns the use of search queries to identify a group of documents that share a similar thread or subject matter. The approach is aimed at locating a group of documents rather than just one document.
Predictive search
As the name implies, this is a search methodology that seeks to predict the class of documents containing the desired word or text which is the subject of the search. It involves the computer analysing a sample group of documents, drawing out the distinctions or similarities in their contents, and applying what it has learned from that sample to all the other documents being searched.
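The idea of learning from a sample and applying it to the rest can be illustrated with a deliberately crude sketch. Real predictive tools rely on far more sophisticated statistical models; the word-weighting scheme, the sample documents and the names train and predict below are all assumptions made for illustration.

```python
import re
from collections import Counter


def words(text: str) -> set[str]:
    """Return the distinct lower-case words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train(samples: list[tuple[str, bool]]) -> Counter:
    """Weight each word: +1 per relevant sample containing it, -1 per irrelevant one."""
    weights = Counter()
    for text, relevant in samples:
        for word in words(text):
            weights[word] += 1 if relevant else -1
    return weights


def predict(weights: Counter, text: str) -> bool:
    """Predict relevance: True when the summed word weights are positive."""
    return sum(weights[word] for word in words(text)) > 0


reviewed = [  # hypothetical sample already marked by a human reviewer
    ("Invoice for the shipment of oranges", True),
    ("Payment overdue on the orange shipment invoice", True),
    ("Lunch menu for the office party", False),
    ("Office party parking arrangements", False),
]
weights = train(reviewed)
print(predict(weights, "Second invoice for the delayed shipment"))  # True
print(predict(weights, "Party at the office on Friday"))            # False
```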
Boolean search
A Boolean search allows words to be combined during the search for digital documents. Special words known as Boolean operators are used for this purpose, for example AND, OR, NOT and NEAR. The use of Boolean operators allows you to limit or expand the scope of your search.
Most internet search engines, e.g. Google, actually default to the use of Boolean operators during a search operation.
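The Boolean operators map naturally onto operations on sets of documents. The sketch below is an illustration under my own assumptions: the sample files and the helpers docs_with and near are hypothetical, and NEAR is taken to mean "within a given number of words", which individual applications may define differently.

```python
import re

documents = {  # hypothetical sample files
    "a.txt": "Invoice for oranges delivered on Monday",
    "b.txt": "Invoice for grapes and oranges",
    "c.txt": "Grapes were out of stock",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def docs_with(word: str) -> set[str]:
    """Return the names of the documents containing the word."""
    return {name for name, text in documents.items() if word.lower() in tokenize(text)}


def near(text: str, first: str, second: str, distance: int) -> bool:
    """True when the two words occur within `distance` words of each other."""
    tokens = tokenize(text)
    first_at = [i for i, word in enumerate(tokens) if word == first.lower()]
    second_at = [i for i, word in enumerate(tokens) if word == second.lower()]
    return any(abs(i - j) <= distance for i in first_at for j in second_at)


print(sorted(docs_with("oranges") & docs_with("grapes")))  # AND -> ['b.txt']
print(sorted(docs_with("oranges") | docs_with("grapes")))  # OR  -> ['a.txt', 'b.txt', 'c.txt']
print(sorted(docs_with("invoice") - docs_with("grapes")))  # NOT -> ['a.txt']
print(near(documents["a.txt"], "invoice", "oranges", 2))   # NEAR -> True
```

AND narrows the result to documents containing both words, OR widens it to documents containing either, and NOT removes documents containing the excluded word.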
Fuzzy search
Fuzzy searching will find a word even if it is misspelled. For example, a search for the word "orangos" may return a search result of "oranges". This type of search will greatly assist in circumstances where you suspect there may be issues with the spelling of relevant words, especially for documents that have been scanned and are likely to contain misread or omitted text.
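One common way to measure how close two spellings are is the Levenshtein edit distance, i.e. the number of single-character changes needed to turn one word into the other. The sketch below uses it for a fuzzy match; this is one possible technique, not necessarily the one a given search application uses, and the name fuzzy_search and the word list are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def fuzzy_search(term: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return the words within max_distance edits of the search term."""
    term = term.lower()
    return [word for word in vocabulary if edit_distance(term, word.lower()) <= max_distance]


print(fuzzy_search("orangos", ["oranges", "organs", "grapes"]))  # ['oranges']
```

Raising max_distance makes the search more forgiving, at the cost of returning more unrelated words.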
Phonic search
A phonic search will return words that sound almost alike and begin with the same letter, e.g. a search for "spade" may also return the word "Spain".
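A well-known way of comparing words by sound is the Soundex code, which keeps the first letter and turns the following consonants into digits. The sketch below uses it purely as an illustration; I am not suggesting it is the algorithm used by any specific search product, and the name phonic_search and the sample names are hypothetical.

```python
# Soundex digit for each consonant; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code of a word, e.g. 'Robert' -> 'R163'."""
    letters = [char for char in word.lower() if char.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        if letter not in "hw":  # h and w do not separate repeated codes
            previous = digit
    return (code + "000")[:4]


def phonic_search(term: str, vocabulary: list[str]) -> list[str]:
    """Return the words that share the search term's Soundex code."""
    target = soundex(term)
    return [word for word in vocabulary if soundex(word) == target]


print(phonic_search("Smith", ["Smythe", "Smithe", "Jones"]))  # ['Smythe', 'Smithe']
```

Which words count as sounding alike depends on the algorithm chosen. Under Soundex, for instance, "spade" (S130) and "Spain" (S150) receive different codes, so a tool that pairs them must be using a looser phonetic rule.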
Synonym search
A synonym search will return words with the same or nearly the same meaning, e.g. a search for the word "buy" may return the word "purchase".
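In its simplest form, a synonym search expands the search term with related words from a thesaurus before searching. The tiny thesaurus, the sample files and the names expand and synonym_search below are hypothetical; real applications draw on far larger word lists.

```python
SYNONYMS = {  # hypothetical, hand-made thesaurus
    "buy": {"purchase", "acquire"},
    "died": {"deceased", "passed away"},
}


def expand(term: str) -> set[str]:
    """Return the term together with every word in the same synonym family."""
    term = term.lower()
    related = {term}
    for head, group in SYNONYMS.items():
        family = group | {head}
        if term in family:
            related |= family
    return related


def synonym_search(documents: dict[str, str], term: str) -> list[str]:
    """Return documents containing the term or any of its synonyms.

    Matching is by substring so that phrases such as "passed away" work;
    as a side effect, "buy" would also match "buyer".
    """
    terms = expand(term)
    return sorted(
        name for name, text in documents.items()
        if any(candidate in text.lower() for candidate in terms)
    )


documents = {  # hypothetical sample files
    "contract.txt": "The company agreed to purchase the land.",
    "note.txt": "He passed away in 2010.",
    "memo.txt": "Meeting moved to Friday.",
}
print(synonym_search(documents, "buy"))   # ['contract.txt']
print(synonym_search(documents, "died"))  # ['note.txt']
```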
Stemming search
A stemming search will return other forms built on the search word; for example, a search for "cat" may return words like "cats", "catlike", etc.
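Stemming can be sketched as stripping common endings so that related forms reduce to the same root. The suffix list below is a crude assumption of mine chosen to fit the example; production stemmers, such as those based on the Porter algorithm, apply many more rules.

```python
SUFFIXES = ("like", "ing", "ed", "es", "s")  # illustrative, checked in this order


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of root."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemming_search(term: str, vocabulary: list[str]) -> list[str]:
    """Return the words whose stem equals the stem of the search term."""
    target = stem(term)
    return [word for word in vocabulary if stem(word) == target]


print(stemming_search("cat", ["cats", "catlike", "category", "dog"]))
# ['cats', 'catlike']
```

Note that "category" is not returned even though it starts with "cat", because it is not formed from "cat" plus one of the listed endings.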
In conclusion, it is possible to search for documents on your computer in a more intelligent and detailed manner, subject to your having the necessary additional computer applications and knowledge of the methods described above.