Thursday, April 2, 2009

Cryptography & Encryption


Many forms of Internet communication, such as electronic mail and World Wide Web browsing, are not secure for sending and receiving information. Information sent by these means may include sensitive personal data that can be intercepted. A great deal of commercial activity takes place on the Internet, and many web sites require users to fill in forms with sensitive personal information such as telephone numbers, addresses and credit card details. To do that safely, users want secure, private communication with the other party. Online users may need private and secure communications for other reasons as well: they may simply not want third parties to read their e-mails or alter their content.

What is Cryptography?
Cryptography (1), defined as "the science and study of secret writing," concerns the ways in which communications and data can be encoded to prevent disclosure of their contents through eavesdropping or message interception, using codes (2), ciphers (3) and other methods, so that only certain people can see the real message. Although the science of cryptography is very old, the desktop-computer revolution has made it possible for cryptographic techniques to become widely used and accessible to nonexperts. David Kahn traces the history of cryptography from Ancient Egypt into the computer age (4). As Kahn's research shows, from Julius Caesar to Mary, Queen of Scots (5) to Abraham Lincoln's Civil War ciphers, cryptography has long been part of history. Over time, increasingly complex codes, cipher machines and, eventually, computer-based algorithms were created. At the end of World War I, the German engineer Arthur Scherbius invented the Enigma cipher machine, which the German armed forces later adopted for secure communications (6). During World War II, British codebreakers broke Enigma traffic; the intelligence this produced was code-named Ultra.

Why Have Cryptography?

Encryption is the science of changing data so that it is unrecognisable and useless to an unauthorised person. Decryption is changing it back to its original form. The most secure techniques use a mathematical algorithm and a variable value known as a 'key'. The selected key (often any random character string) is input on encryption and is integral to the changing of the data. The EXACT same key MUST be input to enable decryption of the data. This is the basis of the protection: if the key (sometimes called a password) is known only by authorised individual(s), the data cannot be exposed to other parties. Only those who know the key can decrypt it. This is known as 'private key' cryptography, which is the best-known form.
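The "exact same key" rule can be illustrated with a toy private key cipher. The Python sketch below is only an illustration under stated assumptions, not DES or any real standard: it stretches the key into a keystream with SHA-256 and XORs that stream with the data, so running the same function with the same key reverses the encryption, while any other key yields gibberish. The names `keystream` and `xor_cipher` are hypothetical. It uses no nonce, so reusing a key for two messages is insecure; it must not protect real data.

```python
import hashlib

def keystream(key: str, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes (toy construction, not a standard)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        # Hash key + counter to obtain the next 32 bytes of keystream.
        out.extend(hashlib.sha256(f"{key}:{counter}".encode("utf-8")).digest())
        counter += 1
    return bytes(out[:length])

def xor_cipher(data: bytes, key: str) -> bytes:
    """Encrypt or decrypt: XOR-ing twice with the same keystream restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

plaintext = b"Meet me at the bank at noon on Friday"
ciphertext = xor_cipher(plaintext, "correct key")
print(xor_cipher(ciphertext, "correct key"))             # b'Meet me at the bank at noon on Friday'
print(xor_cipher(ciphertext, "wrong key") == plaintext)  # False
```

Because encryption and decryption are the same operation here, the whole secret lies in the key, which is exactly why it must be shared only with authorised parties.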


What is Encryption?

"Encryption is basically an indication of users' distrust of the security of the system, the owner or operator of the system, or law enforcement authorities." (7)

Encryption transforms original information, called plaintext or cleartext, into transformed information, called ciphertext, codetext or simply cipher, which usually has the appearance of random, unintelligible data. The transformed information, in its encrypted form, is called the cryptogram. (8) 

The encryption algorithm determines how simple or how complex the process of transformation will be (9). Used together with related techniques such as message authentication codes and digital signatures, encryption can provide confidentiality, integrity and authenticity for information transferred from A to B: the transmission is kept secret, it can be checked for tampering, and it can be shown to have been sent by A. Each of these three properties may be important, for different reasons, when data is transmitted over the Internet (10).
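Integrity and authenticity are usually provided by a mechanism that works alongside encryption. One common choice (an assumption here; the post does not name a technique) is a message authentication code such as HMAC, available in Python's standard `hmac` module. The sketch assumes A and B agreed on a shared secret key in advance; `make_tag` and `check_tag` are hypothetical helper names.

```python
import hashlib
import hmac

SHARED_KEY = b"hypothetical key agreed by A and B in advance"

def make_tag(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """A computes an HMAC-SHA256 tag and sends it along with the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def check_tag(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """B recomputes the tag; any change to the message (or a wrong key) makes it differ."""
    return hmac.compare_digest(make_tag(message, key), tag)

message = b"Transfer $100 to account 42"
tag = make_tag(message)
print(check_tag(message, tag))                         # True
print(check_tag(b"Transfer $900 to account 42", tag))  # False
```

An HMAC does not hide the message, so it complements encryption rather than replacing it. Because both parties hold the same key, it also cannot prove to a third party which of them produced the message; that requires the digital signatures mentioned in endnote 10.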

Who needs Cryptography?
The ability to protect and secure information is vital to the growth of electronic commerce and of the Internet itself. Many people need or want to use communications and data security in different areas. Banks around the world use encryption (11) to process financial transactions, which involve transfers of huge amounts of money from one bank to another. Banks also use encryption to protect their customers' personal identification numbers (PINs) at automated teller machines.

"As the economy continues to move away from cash transactions towards "digital cash", both customers and merchants will need the authentication provided by unforgeable digital signatures in order to prevent forgery and transact with confidence." (12) 

This is an important issue for Internet users. Many companies, and even online shopping malls, sell anything from flowers to bottles of wine over the Internet, and these transactions are made using credit cards and secure web browsers that employ encryption. Customers want to be sure that their credit card information and other financial details are safe when sent across a multinational network. This can only work with strong encryption and unforgeable authentication methods.

Business and commercial companies with trade secrets also use, or would like to use, encryption against high-tech eavesdropping and industrial espionage. Professionals such as lawyers, doctors, dentists and accountants, who handle confidential information in the course of their work, will need encryption if they are to rely on the Internet in the future. Criminals also use encryption to conceal illegal activities and to make their crimes harder to trace. More importantly, people need or desire electronic security against government intrusion or surveillance (13) of their activities on the Internet.

Cryptographic Keys: Private and Public 
More complex ciphers use a secret key to control a long sequence of complicated substitutions (14) and transpositions (15). There are two general categories of cryptographic systems: private key and public key systems.
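The two classical building blocks named here can be sketched in a few lines of Python. `caesar` is a substitution cipher that shifts each letter three places, as in Caesar's cipher described in endnote 14; `transpose` and `untranspose` implement a simple columnar transposition in which the number of columns acts as the key. All three are hypothetical helper names and historical toys; modern ciphers such as DES instead apply many rounds of substitutions and permutations to bits.

```python
def caesar(text: str, shift: int = 3) -> str:
    """Substitution: replace each A-Z / a-z letter with the one `shift` places later, wrapping around."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)  # spaces, digits and punctuation pass through unchanged
    return "".join(out)

def transpose(text: str, cols: int) -> str:
    """Transposition: write the text in rows of `cols` characters, read it out column by column."""
    return "".join(text[c::cols] for c in range(cols))

def untranspose(cipher: str, cols: int) -> str:
    """Reverse `transpose` by refilling the columns and reading row by row."""
    rows, extra = divmod(len(cipher), cols)
    columns, pos = [], 0
    for c in range(cols):
        length = rows + (1 if c < extra else 0)  # the first `extra` columns are one longer
        columns.append(cipher[pos:pos + length])
        pos += length
    return "".join(columns[i % cols][i // cols] for i in range(len(cipher)))

print(caesar("ATTACK AT DAWN"))        # DWWDFN DW GDZQ
print(caesar("DWWDFN DW GDZQ", -3))    # ATTACK AT DAWN
print(transpose("ATTACKATDAWN", 4))    # ACDTKATAWATN
print(untranspose("ACDTKATAWATN", 4))  # ATTACKATDAWN
```

Substitution changes which symbols appear but keeps their order; transposition keeps the symbols but changes their order. Each alone is easy to break, which is why stronger ciphers combine both under the control of a key.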

Private Key Cryptography
Private key systems use a single key, which is used both to encrypt and to decrypt the information. Both sides of the transmission need a copy of this same key, and the key must be kept secret from everyone else. The security of the transmission therefore depends on how well the key is protected. The Data Encryption Standard ("DES"), adopted by the US Government in 1977, operates on this basis and was long the US federal standard. DES keys are 56 bits long (16). The key length was criticised, and it was suggested that the short key was designed to be long enough to frustrate corporate eavesdroppers, but short enough to be broken by the National Security Agency ("NSA") (17). Export of DES was controlled by the State Department. DES is now old and no longer considered secure. In the 1990s the US government proposed Skipjack, an algorithm that involves escrowed encryption, as an alternative; DES was ultimately succeeded as the federal standard by the Advanced Encryption Standard (AES) in 2001.
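The arithmetic behind the key-length criticism is easy to check. The sketch below computes the DES keyspace from endnote 16 and a rough exhaustive-search time, with a 128-bit key for comparison. The rate of one billion keys per second is purely a hypothetical assumption chosen for illustration, not a measured figure; on average an attacker must try half of all keys before finding the right one.

```python
KEY_BITS_DES = 56
SECONDS_PER_DAY = 86_400

def keyspace(key_bits: int) -> int:
    """Number of distinct keys of the given length."""
    return 2 ** key_bits

def average_search_days(key_bits: int, keys_per_second: float) -> float:
    """Expected days for exhaustive search, trying on average half of all keys."""
    return keyspace(key_bits) / 2 / keys_per_second / SECONDS_PER_DAY

ASSUMED_RATE = 1e9  # hypothetical attacker speed: one billion keys per second

print(keyspace(KEY_BITS_DES))                                   # 72057594037927936, roughly 72 quadrillion
print(round(average_search_days(KEY_BITS_DES, ASSUMED_RATE)))  # 417
print(average_search_days(128, ASSUMED_RATE) > 1e24)           # True
```

Each extra key bit doubles the work, so the 72 extra bits of a 128-bit key multiply the search effort by 2^72 at the same assumed speed.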

Public Key Cryptography
In a public key system there are two keys: a public key and a private key. Each user has both keys; the private key must be kept secret, while the public key is made publicly known. The two keys are mathematically related. If A encrypts a message with his private key, then B, the recipient of the message, can decrypt it with A's public key, which shows that the message came from A. Similarly, anyone who knows A's public key can send him a message by encrypting it with that public key; A will then decrypt it with his private key. The idea of public key cryptography was published by Whitfield Diffie and Martin Hellman in 1976, and in 1977 Rivest, Shamir and Adleman ("RSA") developed the best-known public key algorithm in the US. Public key cryptography is easier to manage than private key cryptography because each user needs only one key pair for all the messages that he or she sends or receives, and the public key can be shared openly instead of having to be exchanged in secret. Public key operations are, however, much slower than private key ones.
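The mathematical relationship between the two keys can be illustrated with textbook RSA using deliberately tiny primes. This is a toy sketch, not a usable implementation: real RSA keys use primes hundreds of digits long and add padding schemes, both omitted here. `pow(e, -1, phi)` computes the modular inverse and requires Python 3.8 or later; the function names are hypothetical.

```python
# Textbook RSA with tiny primes: an illustration only, never secure.
p, q = 61, 53
n = p * q                # 3233: public modulus, part of both keys
phi = (p - 1) * (q - 1)  # 3120: kept secret along with p and q
e = 17                   # public exponent (must share no factor with phi)
d = pow(e, -1, phi)      # 2753: private exponent, the modular inverse of e

def encrypt(m: int) -> int:
    """Anyone with A's public key (e, n) can encrypt a number m < n for A."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only A, who knows the private exponent d, can decrypt."""
    return pow(c, d, n)

def sign(m: int) -> int:
    """A 'encrypts' with the private key, producing a signature."""
    return pow(m, d, n)

def verify(m: int, s: int) -> bool:
    """Anyone can check the signature with A's public key."""
    return pow(s, e, n) == m

c = encrypt(65)
print(c, decrypt(c))         # 2790 65
print(verify(65, sign(65)))  # True
print(verify(66, sign(65)))  # False
```

The two directions in the paragraph above correspond to `encrypt`/`decrypt` (confidentiality for messages sent to A) and `sign`/`verify` (proof that a message came from A).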

Endnotes:
1. The word cryptography comes from Greek: kryptos means "hidden", while graphia stands for "writing".
2. A code is a system of communication that relies on a pre-arranged mapping of meanings, such as those found in a codebook.
3. A cipher is different from a code: it is a method of encrypting any text regardless of its content.
4. David Kahn, The Codebreakers, Macmillan Company, New York: 1972.
5. Mary, Queen of Scots, lost her life in the 16th century because an encrypted message she sent from prison was intercepted and deciphered.
6. See David Kahn, Seizing the Enigma, Houghton Mifflin, Boston: 1991.
7. Lance Rose, Netlaw: Your Rights in the Online World, Osborne McGraw-Hill, 1995, page 182.
8. Deborah Russell and G.T. Gangemi, Sr., "Encryption" from Computer Security Basics, O'Reilly & Associates, Inc., California: 1991, pp 165-179, taken from Lance J. Hoffman, Building in Big Brother: The Cryptography Policy Debate, Springer-Verlag, New York: 1995, at page 14.
9. Ibid.
10. While military and secret services will require a confidential transmission, it will be important for banks to have accurate information about their transactions by electronic means. Authentication techniques provide digital signatures, which are unique for every transaction and cannot be forged.
11. The U.S. Department of the Treasury requires encryption of all U.S. electronic funds transfer messages. See Gerald Murphy, U.S. Dep't of Treasury, Directive: Electronic Funds and Securities Transfer Policy - Message Authentication and Enhanced Security, No. 16-02, section 3 (Dec. 21, 1992).
12. A. Michael Froomkin, "The Metaphor is the Key: Cryptography, the Clipper Chip and the Constitution" [1995] U. Penn. L. Rev. 143, 709-897, at 720.
13. E.g. the FBI during the 1970s wiretapped and bugged the communications of the Black Panthers and other dissident groups. See Sanford J. Ungar, FBI 137 (1975). Also, between 1953 and 1973, the CIA opened and photographed almost 250,000 first-class letters within the US, from which it compiled a database of almost 1.5 million names. See Church Committee Report, S. Rep. No. 755, 94th Cong., 2d Sess., pt. 2, 1976, at 6.
14. Substitution ciphers replace the actual bits, characters, or blocks of characters with substitutes, e.g. one letter replaces another letter. Julius Caesar's military use of such a cipher was the first clearly documented case. In Caesar's cipher each letter of an original message is replaced with the letter three places beyond it in the alphabet.
15. Transposition ciphers rearrange the order of the bits, characters, or blocks of characters that are being encrypted and decrypted.
16. This means that there are 2^56, or roughly 72 quadrillion, different possible keys.
17. See James Bamford, The Puzzle Palace: A Report on America's Most Secret Agency, 1982.

Tuesday, March 31, 2009

Normalizing Your Database


First Normal Form (1NF)

First Normal Form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

What do these rules mean when contemplating the practical design of a database? It’s actually quite simple.

The first rule dictates that we must not duplicate data within the same row of a table. Within the database community, this concept is referred to as atomicity: each column of a row should hold a single, indivisible value, and the same kind of data should not be spread across several columns. Tables that comply with this rule are said to be atomic. Let’s explore this principle with a classic example: a table within a human resources database that stores the manager-subordinate relationship. For the purposes of our example, we’ll impose the business rule that each manager may have one or more subordinates while each subordinate may have only one manager.

Intuitively, when creating a list or spreadsheet to track this information, we might create a table with the following fields:
• Manager
• Subordinate1
• Subordinate2
• Subordinate3
• Subordinate4

However, recall the first rule imposed by 1NF: eliminate duplicative columns from the same table. Clearly, the Subordinate1-Subordinate4 columns are duplicative. Take a moment and ponder the problems raised by this scenario. If a manager only has one subordinate – the Subordinate2-Subordinate4 columns are simply wasted storage space (a precious database commodity). Furthermore, imagine the case where a manager already has 4 subordinates – what happens if she takes on another employee? The whole table structure would require modification.

At this point, a second bright idea usually occurs to database novices: We don’t want to have more than one Subordinate column, and we want to allow for a flexible amount of data storage. Let’s try something like this:
• Manager
• Subordinates

where the Subordinates field contains multiple entries in the form "Mary, Bill, Joe"

This solution is closer, but it also falls short of the mark. The Subordinates column is still duplicative and non-atomic. What happens when we need to add or remove a subordinate? We need to read and write the entire contents of the Subordinates field. That’s not a big deal in this situation, but what if one manager had one hundred employees? It also complicates the process of selecting data from the database in future queries.
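Both problems can be demonstrated with Python's built-in `sqlite3` module (chosen only for illustration; the post does not assume a particular database). The table name `ManagerList` and the sample managers Sue and Tom are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ManagerList (Manager TEXT, Subordinates TEXT)")
conn.executemany(
    "INSERT INTO ManagerList VALUES (?, ?)",
    [("Sue", "Mary, Bill, Joe"), ("Tom", "Billy, Ann")],  # hypothetical data
)

# "Who manages Bill?" A substring search also matches "Billy".
matches = conn.execute(
    "SELECT Manager FROM ManagerList WHERE Subordinates LIKE '%Bill%' ORDER BY Manager"
).fetchall()
print(matches)  # [('Sue',), ('Tom',)]  (Tom is a false match)

# Removing Bill means reading, editing and rewriting Sue's whole Subordinates value.
(current,) = conn.execute(
    "SELECT Subordinates FROM ManagerList WHERE Manager = 'Sue'"
).fetchone()
remaining = [name.strip() for name in current.split(",") if name.strip() != "Bill"]
conn.execute(
    "UPDATE ManagerList SET Subordinates = ? WHERE Manager = 'Sue'",
    (", ".join(remaining),),
)
print(conn.execute(
    "SELECT Subordinates FROM ManagerList WHERE Manager = 'Sue'"
).fetchone())  # ('Mary, Joe',)
```

The string parsing happens in application code rather than in SQL, which is exactly the complication the paragraph above warns about.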

Here’s a table that satisfies the first rule of 1NF:
• Manager
• Subordinate

In this case, each subordinate has a single entry, but managers may have multiple entries.

Now, what about the second rule: identify each row with a unique column or set of columns (the primary key)? You might take a look at the table above and suggest the use of the subordinate column as a primary key. In fact, the subordinate column is a good candidate for a primary key due to the fact that our business rules specified that each subordinate may have only one manager. However, the data that we’ve chosen to store in our table makes this a less than ideal solution, because names are not unique. If we already have an employee named Jim and hire another employee named Jim, how do we store the new Jim's manager-subordinate relationship in the database?
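A small sketch of that problem, again using Python's `sqlite3` module with a hypothetical `Reports` table and hypothetical managers Sue and Tom: with the subordinate's name as the primary key, the database refuses to store the second Jim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The questionable design: the subordinate's name is the primary key.
conn.execute("CREATE TABLE Reports (Manager TEXT, Subordinate TEXT PRIMARY KEY)")
conn.execute("INSERT INTO Reports VALUES ('Sue', 'Jim')")

try:
    # A newly hired second Jim who reports to Tom.
    conn.execute("INSERT INTO Reports VALUES ('Tom', 'Jim')")
    second_jim_stored = True
except sqlite3.IntegrityError:
    second_jim_stored = False  # duplicate primary key value rejected
print(second_jim_stored)  # False
```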

It’s best to use a truly unique identifier (such as an employee ID) as a primary key. Our final table would look like this:
• Manager ID
• Subordinate ID

Now, our table is in first normal form!
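A minimal sketch of this final 1NF table in SQLite, via Python's `sqlite3` module. The table name `ManagerSubordinate` and the employee ID values are hypothetical; a real design would presumably also have an employee table holding names, which the post does not describe. Making `SubordinateID` the primary key encodes the business rule that each subordinate has only one manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ManagerSubordinate (
        ManagerID     INTEGER NOT NULL,
        SubordinateID INTEGER PRIMARY KEY  -- each subordinate has exactly one manager
    )
""")
# Hypothetical employee IDs: 1 manages 10, 11 and 12; 2 manages 13.
conn.executemany(
    "INSERT INTO ManagerSubordinate (ManagerID, SubordinateID) VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 13)],
)
# A manager taking on another employee is one more row, with no change to the table structure.
conn.execute("INSERT INTO ManagerSubordinate VALUES (1, 14)")

team = [row[0] for row in conn.execute(
    "SELECT SubordinateID FROM ManagerSubordinate WHERE ManagerID = 1 ORDER BY SubordinateID"
)]
print(team)  # [10, 11, 12, 14]
```

Two employees who share a name now simply have different IDs, so the problem with a second Jim disappears.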

Second Normal Form (2NF)

Having covered the basic requirements of first normal form (1NF), let's continue our journey and cover the principles of second normal form (2NF).

The general requirements of 2NF are:
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.
These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in new table(s) and creating relationships between those tables.

Let's look at an example. Imagine an online store that maintains customer information in a database. They might have a single table called Customers with the following elements: 
• CustNum
• FirstName
• LastName
• Address
• City
• State
• ZIP
This table quickly accumulates redundant data. Suppose two customers live in Sea Cliff, NY 11579 and two live in Miami, FL 33157: we would then be storing each of those city/state/ZIP entries twice. Now, that might not seem like too much added storage in our simple example, but imagine the wasted space if we had thousands of rows in our table. Additionally, if the ZIP code for Sea Cliff were to change, we'd need to make that change in many places throughout the database.

In a 2NF-compliant database structure, this redundant information is extracted and stored in a separate table. Our new table (let's call it ZIPs) might have the following fields: 
• ZIP
• City
• State
If we want to be super-efficient, we can even fill this table in advance -- the post office provides a directory of all valid ZIP codes and their city/state relationships. Surely, you've encountered a situation where this type of database was utilized. Someone taking an order might have asked you for your ZIP code first and then knew the city and state you were calling from. This type of arrangement reduces operator error and increases efficiency. 

Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary key from the ZIPs table) to create that relationship. Here's our new Customers table: 
• CustNum
• FirstName
• LastName
• Address
• ZIP
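We've now minimized the amount of redundant information stored within the database and our structure is in second normal form! (Strictly speaking, the formal definition of 2NF concerns columns that depend on only part of a composite primary key. Because CustNum is a single-column key, City and State depending on ZIP rather than directly on CustNum is what formal treatments call a transitive dependency, the subject of third normal form.)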
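The resulting design can be sketched in SQLite through Python's `sqlite3` module. The ZIP entries come from the post; the customer rows are hypothetical. The `REFERENCES` clause declares the foreign key, a single-table lookup mirrors the "ZIP code first" ordering scenario, and a join reassembles each customer's full address. Storing ZIP as text (to preserve leading zeros) is a design assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE ZIPs (
        ZIP   TEXT PRIMARY KEY,  -- text keeps any leading zeros
        City  TEXT NOT NULL,
        State TEXT NOT NULL
    );
    CREATE TABLE Customers (
        CustNum   INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName  TEXT,
        Address   TEXT,
        ZIP       TEXT REFERENCES ZIPs(ZIP)  -- foreign key to the ZIPs table
    );
""")
conn.executemany("INSERT INTO ZIPs VALUES (?, ?, ?)",
                 [("11579", "Sea Cliff", "NY"), ("33157", "Miami", "FL")])
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann", "Lee", "1 Main St", "11579"),  # hypothetical customers
     (2, "Bob", "Ray", "2 Oak Ave", "11579"),
     (3, "Cy", "Fox", "3 Bay Rd", "33157")],
)

# Look up city and state from a ZIP code alone.
lookup = conn.execute("SELECT City, State FROM ZIPs WHERE ZIP = ?", ("33157",)).fetchone()
print(lookup)  # ('Miami', 'FL')

# Rebuild full addresses with a join instead of storing City/State per customer.
rows = conn.execute("""
    SELECT c.CustNum, c.FirstName, z.City, z.State, c.ZIP
    FROM Customers AS c JOIN ZIPs AS z ON c.ZIP = z.ZIP
    ORDER BY c.CustNum
""").fetchall()
print(rows)

# A ZIP code missing from ZIPs is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Customers VALUES (4, 'Di', 'Oh', '4 Elm St', '99999')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```

If Sea Cliff's ZIP data ever changed, only one row in `ZIPs` would need updating.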


Third Normal Form (3NF)

There are two basic requirements for a database to be in third normal form:
• Already meet the requirements of both 1NF and 2NF.
• Remove columns that are not fully dependent upon the primary key, that is, columns whose values are determined by other non-key columns.

Imagine that we have a table of widget orders that contains the following attributes:
• Order Number
• Customer Number
• Unit Price
• Quantity
• Total

Remember, our first requirement is that the table must satisfy the requirements of 1NF and 2NF. Are there any duplicative columns? No. Do we have a primary key? Yes, the order number. Therefore, we satisfy the requirements of 1NF. Are there any subsets of data that apply to multiple rows? No, so we also satisfy the requirements of 2NF.

Now, are all of the columns fully dependent upon the primary key? The customer number varies with the order number and it doesn't appear to depend upon any of the other fields. What about the unit price? This field could be dependent upon the customer number in a situation where we charged each customer a set price. However, suppose we sometimes charge the same customer different prices. In that case the unit price depends on the individual order, and therefore on the order number. The quantity of items also varies from order to order, so we're OK there.
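Whether one column depends on another is ultimately a business rule, but sample data can at least reveal when a suspected dependency does not hold. The sketch below uses a hypothetical helper `determines` and hypothetical orders. Data that is consistent with a dependency does not prove it; a single counterexample, however, disproves it.

```python
from collections import defaultdict

def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of column `lhs` always come with equal values of `rhs`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())

# Hypothetical widget orders; customer 7 paid two different unit prices.
orders = [
    {"OrderNumber": 1, "CustomerNumber": 7, "UnitPrice": 2.50, "Quantity": 10},
    {"OrderNumber": 2, "CustomerNumber": 7, "UnitPrice": 2.25, "Quantity": 40},
    {"OrderNumber": 3, "CustomerNumber": 9, "UnitPrice": 2.50, "Quantity": 5},
]
print(determines(orders, "CustomerNumber", "UnitPrice"))  # False
print(determines(orders, "OrderNumber", "UnitPrice"))     # True
```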

What about the total? It looks like we might be in trouble here. The total can be derived by multiplying the unit price by the quantity, so it depends on those two non-key columns rather than directly on the primary key. We must remove it from the table to comply with the third normal form. Perhaps we use the following attributes:
• Order Number
• Customer Number
• Unit Price
• Quantity

Now our table is in 3NF. But, you might ask, what about the total? This is a derived field and it's best not to store it in the database at all. We can simply compute it "on the fly" when performing database queries. For example, we might have previously used this query to retrieve order numbers and totals:
SELECT OrderNumber, Total
FROM WidgetOrders

We can now use the following query:
SELECT OrderNumber, UnitPrice * Quantity AS Total
FROM WidgetOrders

to achieve the same results without violating normalization rules.
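The final query can be tried end to end in SQLite through Python's `sqlite3` module. The column names follow the post's query; the order rows are hypothetical, and `ORDER BY` is added only to make the output order predictable. As an optional variant, a view (here the hypothetical `WidgetOrderTotals`) lets existing queries keep selecting a `Total` column while the value is still computed on the fly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE WidgetOrders (
        OrderNumber    INTEGER PRIMARY KEY,
        CustomerNumber INTEGER NOT NULL,
        UnitPrice      REAL    NOT NULL,
        Quantity       INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO WidgetOrders VALUES (?, ?, ?, ?)",
    [(1, 7, 2.50, 10), (2, 7, 2.25, 40), (3, 9, 2.50, 5)],  # hypothetical orders
)

# The post's query, with ORDER BY added so the output order is predictable.
totals = conn.execute(
    "SELECT OrderNumber, UnitPrice * Quantity AS Total FROM WidgetOrders ORDER BY OrderNumber"
).fetchall()
print(totals)  # [(1, 25.0), (2, 90.0), (3, 12.5)]

# Optional: a view so older queries can still select a "Total" column by name.
conn.execute("""
    CREATE VIEW WidgetOrderTotals AS
    SELECT OrderNumber, UnitPrice * Quantity AS Total FROM WidgetOrders
""")
view_totals = conn.execute(
    "SELECT OrderNumber, Total FROM WidgetOrderTotals ORDER BY OrderNumber"
).fetchall()
print(view_totals == totals)  # True
```

Because the total is always computed from the stored price and quantity, it can never fall out of step with them.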