Thursday, May 7, 2009

Network Topologies


In computer networking, topology refers to the layout of connected devices. This article introduces the standard topologies of networking.

Topology in Network Design
Think of a topology as a network's virtual shape or structure. This shape does not necessarily correspond to the actual physical layout of the devices on the network. For example, the computers on a home LAN may be arranged in a circle in a family room, but it would be highly unlikely to find a ring topology there. 
Network topologies are categorized into the following basic types: 
• bus 
• ring 
• star 
• tree 
• mesh
More complex networks can be built as hybrids of two or more of the above basic topologies.


Bus Topology
 
Bus networks (not to be confused with the system bus of a computer) use a common backbone to connect all devices. A single cable, the backbone, functions as a shared communication medium that devices attach or tap into with an interface connector. A device wanting to communicate with another device on the network sends a broadcast message onto the wire that all other devices see, but only the intended recipient actually accepts and processes the message.
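The shared-medium behavior described above can be sketched in a few lines of Python (the `Bus` and `Device` classes are illustrative, not part of any real networking API): every attached device sees each frame on the backbone, but only the addressed device keeps it.

```python
# A minimal sketch of shared-medium (bus) delivery.
class Bus:
    def __init__(self):
        self.devices = []

    def attach(self, device):
        self.devices.append(device)
        device.bus = self

    def broadcast(self, frame):
        # The backbone carries the frame past every tap.
        for device in self.devices:
            device.receive(frame)

class Device:
    def __init__(self, address):
        self.address = address
        self.inbox = []

    def send(self, dest, payload):
        self.bus.broadcast({"dest": dest, "payload": payload})

    def receive(self, frame):
        # All devices see the frame, but only the intended
        # recipient accepts it; the others discard it.
        if frame["dest"] == self.address:
            self.inbox.append(frame["payload"])

bus = Bus()
a, b, c = Device("A"), Device("B"), Device("C")
for d in (a, b, c):
    bus.attach(d)

a.send("C", "hello")   # B sees the frame too, but only C processes it
```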
Ethernet bus topologies are relatively easy to install and don't require much cabling compared to the alternatives. 10BASE2 ("ThinNet") and 10BASE5 ("ThickNet") were both popular Ethernet cabling options for bus topologies many years ago. However, bus networks work best with a limited number of devices. If more than a few dozen computers are added to a network bus, performance problems will likely result. In addition, if the backbone cable fails, the entire network effectively becomes unusable.


Monday, May 4, 2009

DSL and Cable Broadband Speed Tweaks

So-called broadband speed tweaks are techniques to improve performance of DSL and cable Internet connections. Home networking enthusiasts started experimenting with cable and DSL tweaks many years ago, when broadband Internet services first became popular.

Performance tweaks for low-speed (dialup) Internet often do not work for broadband links. Additionally, while broadband speed tweaks originally focused on increasing the performance of general Web surfing, speed tweaks are now more commonly made to tune specific applications like P2P file sharing systems and games.

Limitations of Broadband Speed Tweaks
First, broadband tweaks should be made only after your network is tested and running reliably. Speed tweaks are performance optimizations only; they are not designed to fix installation errors or basic network configuration issues.

You should expect broadband tweaks might yield only small speed increases, and then only in certain situations. For example, a tweak to improve the performance of one online game may only benefit that title and then only initially when it is loading. Broadband tweaks may help certain applications (like games) but at the same time slow down others (like Web browsing). In general, assume any performance benefits you obtain may be on the order of 10-20% gain rather than 100-200%.

Finally, speed tweaks also can create instability on some networks. Depending on the type of equipment and Internet service you use, some tweaks will be technically incompatible and need to be avoided.

Types of Broadband Speed Tweaks
The most common broadband tweaks involve adjusting various parameters of the TCP/IP network protocol, typically:
• TCP receive window size
• Maximum Transmission Unit (MTU)
• Maximum Segment Size (MSS)
• Time-To-Live (TTL)

The Microsoft Windows Registry contains default values for TCP/IP parameters. You can apply these speed tweaks by using a Registry editor or a utility like TCP Optimizer to change some of the default values on each computer, rebooting after each change. Other operating systems like Linux and Mac OS X provide alternative mechanisms to tune TCP/IP parameters.
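As a cross-platform illustration of the receive-window idea, the sketch below uses Python's socket API to inspect and request a larger per-socket TCP receive buffer. This is an analogy to the system-wide Registry tweak, not the tweak itself, and the operating system may round, double, or cap whatever value you request.

```python
import socket

# Inspect the default TCP receive buffer, then request a larger one
# (256 KB) for this one socket. The OS may adjust the requested value.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
tuned_rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print("default:", default_rcvbuf, "tuned:", tuned_rcvbuf)
s.close()
```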

Another common broadband tweak entails manipulating Web browser settings. For example, suppressing the download of large images saves network bandwidth that can be used instead to download other data faster.

Finally, though less common, a few speed tweaks modify settings on routers and modems. For example, TCP/IP MTU settings can be changed on a broadband router separate from individual computers on the network.



About Web Accelerators for Broadband Tweaks
Speed tweaks have traditionally been applied to the network manually by an administrator, one device at a time, but in recent years software applications have been developed to help automate and maintain tweaks.

So-called Internet download accelerators are pre-packaged software programs that automatically apply speed tweaks to a computer. Installing and running an accelerator program will automatically make the Registry, Web browser and other configuration changes. The more sophisticated applications collect information about your computers and network and apply tweaks intelligently to ensure maximum benefit.

While many Web accelerators are designed specifically for dialup networks, examples of accelerator applications useful for broadband are:
• Download Accelerator Plus
• Google Web Accelerator
• Netfury
• ONSPEED

Making Broadband Tweaks Work
Because speed tweaks can cause computer and network crashes if made improperly, test each change methodically. If possible, use a proven Web accelerator program rather than configuring tweaks manually, and test each change individually before making the next one.

To determine whether a speed tweak is working, use an Internet speed test service to measure your Internet performance before and after making a tweak. In addition, try local file transfers, Web downloads, online games, and other applications you use often to assess whether a tweak makes any noticeable difference. Do not hesitate to undo a change if you cannot observe any benefit.


Wi-Fi Hotspots

A Wi-Fi hotspot is a wireless access point that provides Internet access to network devices in public locations such as downtown centers, cafes, airports and hotels. Businesses and schools are increasingly using Wi-Fi hotspots for their internal (intranet) networks. Home wireless networks also use similar Wi-Fi technology.

Requirements to Use Wi-Fi Hotspots
Computers (and other devices) connect to hotspots using a Wi-Fi network adapter. Newer laptop computers contain built-in adapters, but most other computers do not. Wi-Fi network adapters can be purchased and installed separately. Depending on the type of computer and personal preferences, USB, PC Card, ExpressCard, or even PCI card adapters can be used.

Public Wi-Fi hotspots normally require a paid subscription. The sign-up process involves providing credit card information online or by phone and choosing a service plan. Some service providers offer plans that work at thousands of hotspots throughout the country.

A few pieces of technical information are also required to access Wi-Fi hotspots. The network name (also called SSID) distinguishes hotspot networks from each other. Encryption keys (a long series of letters and numbers) scramble the network traffic to and from a hotspot; most businesses require these as well. Service providers supply this profile information for their hotspots.

Finding Wi-Fi Hotspots
Computers can automatically scan for hotspots within range of their wireless signal. These scans identify the network name (SSID) of the hotspot, allowing the computer to initiate a connection.

Instead of using a computer to find hotspots, some people prefer to use a separate gadget called a Wi-Fi finder. These small devices scan for hotspot signals similarly to computers, and many provide some indication of signal strength to help pinpoint their exact location.

Before traveling to a far-away place, you can look up the locations of Wi-Fi hotspots using online wireless hotspot finder services.

Connect To Wi-Fi Hotspots
The process for connecting to a Wi-Fi hotspot works similarly on home, business and public wireless networks. With the profile (network name and encryption settings) applied on the wireless network adapter, you initiate the connection from your computer operating system (or software that was supplied with the network adapter). Paid or restricted hotspot services will require you to log in with a user name and password the first time you access the Internet.

Dangers of Wi-Fi Hotspots
Although few incidents of hotspot security issues are reported in the press, many people remain skeptical of their safety. Some caution is justified as a hacker with good technical skills can break into your computer through a hotspot and potentially access your personal data.

Taking a few basic precautions will help ensure reasonable safety when using Wi-Fi hotspots. First, research the public hotspot service providers and choose only reputable ones who use strong security settings on their networks. Next, ensure you do not accidentally connect to non-preferred hotspots by checking your computer's settings. Finally, be aware of your surroundings and watch for suspicious individuals in the vicinity who may be reading your screen or even plotting to steal your computer.

Is It Legal to Use Open Access Wi-Fi Internet Connections?
Wi-Fi wireless technology simplifies the sharing of network connections between multiple computers and people. Even if you don't subscribe to Internet service, you can log on to public hotspots or to a neighbor's wireless access point to get online. However, using someone else's Internet service isn't always a good idea. It may even be illegal.
Using a public Wi-Fi hotspot is normally legal when you have the service provider's permission and follow their terms of service. These terms may include the following:
• restrictions on time of day when, or locations where, the public Internet may be accessed
• limits on the amount of network bandwidth that may be used
• restrictions on viewing of inappropriate Web sites or engaging in questionable online activities

Using a neighbor's wireless access point may not be legal even with their permission. The answer varies depending on the policies of residential Internet service providers and plans among other factors.

Many U.S. states prohibit unauthorized access to computer networks including open Wi-Fi networks. While interpretations of these laws may vary, some precedents have been set in recent years:
• 2007 - a man in Michigan was fined for using the Wi-Fi hotspot of a local cafe, from his car
• 2006 - Illinois man David Kauchak was fined for unauthorized use of a local agency's Wi-Fi access point
• 2005 - a Florida man faced felony charges for piggybacking onto a neighbor's Internet connection without their permission

Similar restrictions on using open Wi-Fi networks exist outside the U.S. as well:
• 2006 - In Singapore, a teenager received a probationary sentence for wirelessly accessing a neighbor's Internet connection without their permission.
• 2005 - In the U.K., Gregory Straszkiewicz was fined and his computer confiscated for using a local resident's Internet service illicitly

Just as entering a home or business without the owner's permission is considered trespassing (even if the doors are unlocked), likewise accessing wireless Internet connections (even open access ones) can be considered an illegal activity. At a minimum, obtain consent from the operator of any Wi-Fi access point before using their service. Read any online Terms of Service documentation carefully when signing on, and contact the owner offline if necessary to ensure compliance.


Summary
Wi-Fi hotspots are becoming an increasingly common form of Internet access. Connecting to a hotspot requires a wireless network adapter, knowledge of the profile information of that hotspot, and sometimes a subscription to a paid service. Computers and Wi-Fi finder gadgets both are capable of scanning the nearby area for Wi-Fi hotspots, and several online services allow you to find far-away hotspot locations. Whether using a home, business or public hotspot, the connection process remains essentially the same. Likewise, as with any wireless network, security issues for Wi-Fi hotspots need to be managed.


Connect Two Home Computers for File Sharing

The simplest kind of home network contains exactly two computers. You can use this kind of network to share files, a printer or another peripheral device, and even an Internet connection. To connect two computers for sharing these and other network resources, consider the options described below.

Connecting Two Computers Directly With Cable
The traditional method to network two computers involves making a dedicated link by plugging one cable into the two systems. Several alternatives exist for networking two computers in this manner:
• Ethernet crossover cable 
• Null modem serial cable or parallel peripheral cable 
• Special-purpose USB cables 

Ethernet - Of the above choices, the Ethernet method is preferred as it supports a reliable, high-speed connection with minimal configuration required. Additionally, Ethernet technology offers the most general-purpose solution, allowing networks with more than two computers to be built fairly easily later. If one of your computers possesses an Ethernet adapter but the other has USB, an Ethernet crossover cable can still be used by first plugging a USB-to-Ethernet converter unit into the computer's USB port.

Serial and parallel - This type of cabling, called Direct Cable Connection (DCC) when using Microsoft Windows, offers lower performance but the same basic functionality as Ethernet cables. You may prefer this option if you have such cables readily available and network speed is not a concern. Serial and parallel cables are never used to network more than two computers.

USB - Ordinary USB cables must not be used to connect two computers directly to each other. Attempting to do so can electrically damage the computers! However, special USB cables designed for direct connection exist that can be used safely. You may prefer this option over others if your computers lack functional Ethernet network adapters. 

Making dedicated connections with Ethernet, USB, serial or parallel cables requires that:
1. each computer has a functioning network interface with an external jack for the cable, and
2. the network settings on each computer are appropriately configured.

A phone line or power cord cannot be used to directly connect two computers to each other for networking.


Network Bandwidth and Latency

What Is Network Latency?
Bandwidth is just one element of what a person perceives as the speed of a network. Latency is another element that contributes to network speed. The term latency refers to any of several kinds of delays typically incurred in processing of network data. A so-called low latency network connection is one that generally experiences small delay times, while a high latency connection generally suffers from long delays.

Latency vs. Bandwidth
Although the theoretical peak bandwidth of a network connection is fixed according to the technology used, the actual bandwidth you will obtain varies over time and is affected by high latencies. Excessive latency creates bottlenecks that prevent data from filling the network pipe, thus decreasing effective bandwidth. The impact of latency on network bandwidth can be temporary (lasting a few seconds) or persistent (constant) depending on the source of the delays.

Latency of Satellite Internet Service
Satellite Internet service illustrates the difference between latency and bandwidth on computer networks. Satellite Internet connections possess both high bandwidth and high latency. When loading a Web page, for example, most satellite users can observe a noticeable delay from the time they enter a Web address to the time the page begins loading. This high latency is due primarily to propagation delay as the request message travels at the speed of light to the distant satellite station and back to the home network. Once the messages arrive on Earth, however, the page loads quickly, as on other high-bandwidth Internet connections (DSL or cable).

Besides propagation delays, latency may also involve transmission delays (properties of the physical medium) and processing delays (such as passing through proxy servers or making network hops on the Internet).

Measuring Network Latency
Network tools like ping tests and traceroute measure latency by determining the time it takes a given network packet to travel from source to destination and back, the so-called round-trip time. Round-trip time is not the only way to specify latency, but it is the most common. On DSL or cable Internet connections, latencies of less than 100 milliseconds (ms) are typical, and less than 25 ms is desired. Satellite Internet connections, on the other hand, average 500 ms or higher latency.

Summary
Two key elements of network performance are bandwidth and latency. The average person is more familiar with the concept of bandwidth because that is the measure advertised by manufacturers of network equipment. However, latency matters equally to the end user experience, as the behavior of satellite Internet connections illustrates. Businesses use the term Quality of Service (QoS) to refer to measuring and maintaining consistent performance on a network by managing both bandwidth and latency in a coordinated fashion.
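The round-trip-time measurement that ping performs can be sketched in Python. This toy version bounces a message off a local echo server on the loopback interface (so no real network is involved) and times the round trip with a high-resolution clock.

```python
import socket
import threading
import time

# A rough sketch of round-trip time measurement, in the spirit of ping:
# timestamp a message, bounce it off an echo server, measure elapsed time.
def echo_once(listener):
    conn, _ = listener.accept()
    data = conn.recv(1024)
    conn.sendall(data)      # echo the message straight back
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # any free port on loopback
listener.listen(1)
threading.Thread(target=echo_once, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
start = time.perf_counter()
client.sendall(b"ping")
reply = client.recv(1024)
rtt_ms = (time.perf_counter() - start) * 1000   # round-trip time in ms
client.close()

print(f"round-trip time: {rtt_ms:.3f} ms")
```

On loopback the measured time is a fraction of a millisecond; over a DSL, cable, or satellite link the same technique would report the much larger latencies discussed above.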

Thursday, April 2, 2009

Cryptography & Encryption


Communications over the Internet, such as electronic mail or World Wide Web browsing, are not secure for sending and receiving information. Information sent by those means may include sensitive personal data which may be intercepted. There is commercial activity on the Internet, and many web sites require users to fill out forms that include sensitive personal information such as telephone numbers, addresses, and credit card numbers. To do that, users would like to have secure, private communication with the other party. Online users may need private and secure communications for other reasons as well. They may simply not want third parties to browse and read their e-mails or alter their content.

What is Cryptography ?
Cryptography, defined as "the science and study of secret writing," concerns the ways in which communications and data can be encoded to prevent disclosure of their contents through eavesdropping or message interception, using codes (2), ciphers (3), and other methods, so that only certain people can see the real message. Although the science of cryptography is very old, the desktop-computer revolution has made it possible for cryptographic techniques to become widely used and accessible to nonexperts. David Kahn traces the history of cryptography from Ancient Egypt into the computer age (4). According to Kahn's research, from Julius Caesar to Mary, Queen of Scots (5) to Abraham Lincoln's Civil War ciphers, cryptography has been a part of history. Over the centuries, complex codes, algorithms and machines were created. After World War I, the Germans developed the Enigma machine to secure their communications (6). Enigma codes were decrypted by the British under the secret Ultra project during World War II.

Why Have Cryptography ?

Encryption is the science of changing data so that it is unrecognisable and useless to an unauthorised person. Decryption is changing it back to its original form. The most secure techniques use a mathematical algorithm and a variable value known as a 'key'. The selected key (often any random character string) is input on encryption and is integral to the changing of the data. The exact same key must be input to enable decryption of the data. This is the basis of the protection: if the key (sometimes called a password) is known only by authorized individuals, the data cannot be exposed to other parties. Only those who know the key can decrypt it. This is known as 'private key' cryptography, which is the most well known form.
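The single-key principle can be demonstrated with a toy XOR cipher in Python. This is purely illustrative: the same key both encrypts and decrypts, exactly as described above, but a real system would use a vetted algorithm, never this sketch.

```python
# Toy private-key (symmetric) encryption: XOR each byte of the data
# with the key, repeating the key as needed. Applying the SAME key
# again reverses the transformation.
def xor_crypt(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"meet at noon"
key = b"secret"

ciphertext = xor_crypt(plaintext, key)   # encrypt with the key
recovered = xor_crypt(ciphertext, key)   # decrypt with the same key
```

Anyone who knows `key` can recover the plaintext; anyone who does not sees only scrambled bytes.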


What is Encryption ?

"Encryption is basically an indication of users' distrust of the security of the system, the owner or operator of the system, or law enforcement authorities." (7)

Encryption transforms original information, called plaintext or cleartext, into transformed information, called ciphertext, codetext or simply cipher, which usually has the appearance of random, unintelligible data. The transformed information, in its encrypted form, is called the cryptogram. (8) 

The encryption algorithm determines how simple or how complex the process of transformation will be (9). Encryption provides confidentiality, integrity and authenticity of the information transferred from A to B: the transmission remains secret, its integrity can be verified (the data has not been tampered with), and it is authentic (the information was indeed sent by A). All three of these properties may be important, for different reasons, for the transmission of data over the Internet (10).

Who needs Cryptography ?
The ability to protect and secure information is vital to the growth of electronic commerce and to the growth of the Internet itself. Many people need or want to use communications and data security in different areas. Banks use encryption methods all around the world (11) to process financial transactions. These involve transfers of huge amounts of money from one bank to another. Banks also use encryption methods to protect their customers' ID numbers at bank automated teller machines.

"As the economy continues to move away from cash transactions towards "digital cash", both customers and merchants will need the authentication provided by unforgeable digital signatures in order to prevent forgery and transact with confidence." (12) 

This is an important issue for Internet users. There are many companies and even shopping malls selling anything from flowers to bottles of wine over the Internet, and these transactions are made by the use of credit cards and secure Internet browsers incorporating encryption techniques. Customers on the Internet would like to be confident that their credit card information and other financial details are secure when sent across a multi-national environment. That will only work through the use of strong and unforgeable encryption methods.

Also, business and commercial companies with trade secrets use or would like to use encryption against high-tech eavesdropping and industrial espionage. Professionals such as lawyers, doctors, dentists or accountants who handle confidential information throughout their activities will need encryption if they are to rely on the Internet in the future. Criminals do use encryption and will use it to cover their illegal activities and to make untraceable perfect crimes possible. More important, people need or desire electronic security from government intrusions or surveillance (13) into their activities on the Internet.

Cryptographic Keys: Private and Public 
More complex ciphers use a secret key to control a long sequence of complicated substitutions (14) and transpositions (15). There are two general categories of cryptographic keys: Private key and public key systems. 
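The substitution idea mentioned above (and described in endnote 14) can be sketched with Caesar's cipher, where each letter is replaced by the letter three places beyond it in the alphabet. This is a toy example of substitution, not a secure cipher.

```python
# Caesar's substitution cipher: shift each letter by a fixed amount,
# wrapping around the alphabet; non-letters pass through unchanged.
def caesar(text: str, shift: int) -> str:
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)

cipher = caesar("ATTACK AT DAWN", 3)   # encrypt: shift forward by 3
plain = caesar(cipher, -3)             # decrypt: shift back by 3
```

The "key" here is simply the shift amount; with only 25 possible keys, the cipher is trivially breakable, which is why modern ciphers combine long sequences of substitutions and transpositions under a much larger key.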

Private Key Cryptography
Private key systems use a single key. The single key is used both to encrypt and decrypt the information. Both sides of the transmission use the same key, and the key must be kept secret from everyone else. The security of the transmission will depend on how well the key is protected. The US Government developed the Data Encryption Standard ("DES"), which operates on this basis and became the official US standard. DES keys are 56 bits (16) long. The length of the key was criticised, and it was suggested that the short key was designed to be long enough to frustrate corporate eavesdroppers, but short enough to be broken by the National Security Agency ("NSA") (17). Export of DES is controlled by the State Department. The DES system is getting old and becoming insecure. The US government has offered to replace DES with a new algorithm called Skipjack, which involves escrowed encryption.

Public Key Cryptography
In the public key system there are two keys: a public and a private key. Each user has both keys, and while the private key must be kept secret, the public key is publicly known. Both keys are mathematically related. If A encrypts a message with his private key, then B, the recipient of the message, can decrypt it with A's public key. Similarly, anyone who knows A's public key can send him a message by encrypting it with that public key. A will then decrypt it with his private key. The best-known public key algorithm, RSA, was developed in 1977 by Rivest, Shamir and Adleman in the US. This kind of cryptography is more convenient for key management than private key cryptography, because each user needs only one key pair to encrypt and decrypt all the messages that he or she sends or receives.
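The mathematical relationship between the two keys can be shown with a textbook RSA sketch using tiny primes. This is for illustration only; the primes are far too small to be secure, and real implementations add padding and many other safeguards.

```python
# Textbook RSA with toy primes: the public key (e, n) encrypts,
# and only the matching private key (d, n) decrypts.
p, q = 61, 53
n = p * q                   # modulus, shared by both keys
phi = (p - 1) * (q - 1)     # totient of n
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent: modular inverse of e mod phi

message = 65                          # a message encoded as a number < n
ciphertext = pow(message, e, n)       # anyone can encrypt with the public key
recovered = pow(ciphertext, d, n)     # only the private key holder can decrypt
```

Because `d` is derived from the factors of `n`, publishing `n` and `e` reveals nothing practical about `d` so long as `n` is too large to factor; with these toy values, of course, anyone could recover it.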

Endnotes:
1. The word cryptography comes from Greek: kryptos means "hidden" while graphia stands for "writing".
2. A code is a system of communication that relies on a pre-arranged mapping of meanings such as those found in a codebook.
3. A cipher is different from a code; it is a method of encrypting any text regardless of its content.
4. David Kahn, The Codebreakers, Macmillan Company, New York: 1972.
5. Mary, Queen of Scots, lost her life in the 16th century because an encrypted message she sent from prison was intercepted and deciphered.
6. See David Kahn, Seizing the Enigma, Houghton Mifflin, Boston: 1991.
7. Lance Rose, Netlaw: Your Rights in the Online World, Osborne McGraw-Hill, 1995, page 182.
8. Deborah Russell and G.T. Gangemi, Sr., "Encryption" from Computer Security Basics, O'Reilly & Associates, Inc., California: 1991, pp 165-179, taken from Lance J. Hoffman, Building in Big Brother: The Cryptography Policy Debate, Springer-Verlag, New York: 1995, at page 14.
9. ibid.
10. While military and secret services will require a confidential transmission, it will be important for banks to have accurate information of their transactions by electronic means. Authentication techniques provide digital signatures which are unique for every transaction and cannot be forged.
11. The U.S. Department of the Treasury requires encryption of all U.S. electronic funds transfer messages. See Gerald Murphy, U.S. Dep't of Treasury, Directive: Electronic Funds and Securities Transfer Policy - Message Authentication and Enhanced Security, No. 16-02, section 3 (Dec. 21, 1992).
12. A. Michael Froomkin, "The Metaphor is the Key: Cryptography, the Clipper Chip and the Constitution" [1995] U. Penn. L. Rev. 143, 709-897, at 720.
13. E.g. the FBI during the 1970s wiretapped and bugged the communications of the Black Panthers and other dissident groups. See Sanford J. Ungar, FBI 137, (1975). Also, between 1953 and 1973, the CIA opened and photographed almost 250,000 first class letters within the US, from which it compiled a database of almost 1.5 million names. See Church Committee Report, S. Rep. No. 755, 94th Cong., 2d Sess., pt. 2, 1976, at 6.
14. Substitution ciphers replace the actual bits, characters, or blocks of characters with substitutes, e.g. one letter replaces another letter. Julius Caesar's military use of such a cipher was the first clearly documented case. In Caesar's cipher each letter of an original message is replaced with the letter three places beyond it in the alphabet.
15. Transposition ciphers rearrange the order of the bits, characters, or blocks of characters that are being encrypted and decrypted.
16. This means that there are 72 quadrillion different possible keys.
17. See James Bamford, The Puzzle Palace: A Report on America's Most Secret Agency, 1982.

Tuesday, March 31, 2009

Normalizing Your Database


First Normal Form (1NF)

First Normal Form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column (the primary key).

What do these rules mean when contemplating the practical design of a database? It’s actually quite simple.

The first rule dictates that we must not duplicate data within the same row of a table. Within the database community, this concept is referred to as the atomicity of a table. Tables that comply with this rule are said to be atomic. Let’s explore this principle with a classic example – a table within a human resources database that stores the manager-subordinate relationship. For the purposes of our example, we’ll impose the business rule that each manager may have one or more subordinates while each subordinate may have only one manager.

Intuitively, when creating a list or spreadsheet to track this information, we might create a table with the following fields:
• Manager
• Subordinate1
• Subordinate2
• Subordinate3
• Subordinate4

However, recall the first rule imposed by 1NF: eliminate duplicative columns from the same table. Clearly, the Subordinate1-Subordinate4 columns are duplicative. Take a moment and ponder the problems raised by this scenario. If a manager only has one subordinate – the Subordinate2-Subordinate4 columns are simply wasted storage space (a precious database commodity). Furthermore, imagine the case where a manager already has 4 subordinates – what happens if she takes on another employee? The whole table structure would require modification.

At this point, a second bright idea usually occurs to database novices: We don’t want to have more than one column and we want to allow for a flexible amount of data storage. Let’s try something like this:
• Manager
• Subordinates

where the Subordinates field contains multiple entries in the form "Mary, Bill, Joe"

This solution is closer, but it also falls short of the mark. The subordinates column is still duplicative and non-atomic. What happens when we need to add or remove a subordinate? We need to read and write the entire contents of the table. That’s not a big deal in this situation, but what if one manager had one hundred employees? Also, it complicates the process of selecting data from the database in future queries.

Here’s a table that satisfies the first rule of 1NF:
• Manager
• Subordinate

In this case, each subordinate has a single entry, but managers may have multiple entries.

Now, what about the second rule: identify each row with a unique column or set of columns (the primary key)? You might take a look at the table above and suggest the use of the subordinate column as a primary key. In fact, the subordinate column is a good candidate for a primary key due to the fact that our business rules specified that each subordinate may have only one manager. However, the data that we’ve chosen to store in our table makes this a less than ideal solution. What happens if we hire another employee named Jim? How do we store his manager-subordinate relationship in the database? 

It’s best to use a truly unique identifier (such as an employee ID) as a primary key. Our final table would look like this:
• Manager ID
• Subordinate ID

Now, our table is in first normal form!
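The final design can be sketched with Python's built-in sqlite3 module (the table and column names here are illustrative, not from the article). Each manager-subordinate pair is one atomic row keyed by employee IDs, so a manager taking on a fifth subordinate is just another row, with no schema change.

```python
import sqlite3

# The 1NF manager-subordinate table: one atomic pair per row.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE reports (
        subordinate_id INTEGER PRIMARY KEY,  -- each subordinate has one manager
        manager_id     INTEGER NOT NULL
    )
""")
db.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [(101, 1), (102, 1), (103, 2)],
)

# A new hire reporting to manager 1 is simply another row.
db.execute("INSERT INTO reports VALUES (104, 1)")

count = db.execute(
    "SELECT COUNT(*) FROM reports WHERE manager_id = 1"
).fetchone()[0]
print(count)  # manager 1 now has 3 subordinates
```

Note how the `PRIMARY KEY` on `subordinate_id` enforces our business rule that each subordinate has exactly one manager.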

Second Normal Form (2NF)

Over the past month, we've looked at several aspects of normalizing a database table. First, we discussed the basic principles of database normalization. Last time, we explored the basic requirements laid down by the first normal form (1NF). Now, let's continue our journey and cover the principles of second normal form (2NF). 

Recall the general requirements of 2NF:
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.
These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in new table(s) and creating relationships between those tables. 

Let's look at an example. Imagine an online store that maintains customer information in a database. They might have a single table called Customers with the following elements: 
• CustNum
• FirstName
• LastName
• Address
• City
• State
• ZIP
A brief look at this table reveals a small amount of redundant data. We're storing the "Sea Cliff, NY 11579" and "Miami, FL 33157" entries twice each. Now, that might not seem like too much added storage in our simple example, but imagine the wasted space if we had thousands of rows in our table. Additionally, if the ZIP code for Sea Cliff were to change, we'd need to make that change in many places throughout the database. 

In a 2NF-compliant database structure, this redundant information is extracted and stored in a separate table. Our new table (let's call it ZIPs) might have the following fields: 
• ZIP
• City
• State
If we want to be super-efficient, we can even fill this table in advance -- the post office provides a directory of all valid ZIP codes and their city/state relationships. Surely, you've encountered a situation where this type of database was utilized. Someone taking an order might have asked you for your ZIP code first and then knew the city and state you were calling from. This type of arrangement reduces operator error and increases efficiency. 

Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary key from the ZIPs table) to create that relationship. Here's our new Customers table: 
• CustNum
• FirstName
• LastName
• Address
• ZIP
We've now minimized the amount of redundant information stored within the database and our structure is in second normal form!


Third Normal Form (3NF)

There are two basic requirements for a database to be in third normal form:
• Already meet the requirements of both 1NF and 2NF.
• Remove columns that are not fully dependent upon the primary key.

Imagine that we have a table of widget orders that contains the following attributes:
• Order Number
• Customer Number
• Unit Price
• Quantity
• Total

Remember, our first requirement is that the table must satisfy the requirements of 1NF and 2NF. Are there any duplicative columns? No. Do we have a primary key? Yes, the order number. Therefore, we satisfy the requirements of 1NF. Are there any subsets of data that apply to multiple rows? No, so we also satisfy the requirements of 2NF.

Now, are all of the columns fully dependent upon the primary key? The customer number varies with the order number and doesn't appear to depend upon any of the other fields. What about the unit price? This field could be dependent upon the customer number if we charged each customer a set price. However, in our business we sometimes charge the same customer different prices from one order to the next. Therefore, the unit price is fully dependent upon the order number. The quantity of items also varies from order to order, so we're OK there.

What about the total? It looks like we might be in trouble here. The total can be derived by multiplying the unit price by the quantity, therefore it's not fully dependent upon the primary key. We must remove it from the table to comply with the third normal form. Perhaps we use the following attributes:
• Order Number
• Customer Number
• Unit Price
• Quantity

Now our table is in 3NF. But, you might ask, what about the total? This is a derived field and it's best not to store it in the database at all. We can simply compute it "on the fly" when performing database queries. For example, we might have previously used this query to retrieve order numbers and totals:
SELECT OrderNumber, Total
FROM WidgetOrders

We can now use the following query:
SELECT OrderNumber, UnitPrice * Quantity AS Total
FROM WidgetOrders

to achieve the same results without violating normalization rules.
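The query rewrite above can be verified with Python's sqlite3 module. The WidgetOrders layout mirrors the 3NF table in the text; the two sample orders are made up for illustration:

```python
import sqlite3

# Demonstrating the derived Total: it is computed on the fly, never stored.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE WidgetOrders (
        OrderNumber    INTEGER PRIMARY KEY,
        CustomerNumber INTEGER,
        UnitPrice      REAL,
        Quantity       INTEGER
    )
""")
conn.executemany(
    "INSERT INTO WidgetOrders VALUES (?, ?, ?, ?)",
    [(1001, 7, 2.50, 4), (1002, 9, 3.00, 2)],
)

# The 3NF-friendly query: Total is derived from UnitPrice * Quantity.
totals = conn.execute("""
    SELECT OrderNumber, UnitPrice * Quantity AS Total
    FROM WidgetOrders
    ORDER BY OrderNumber
""").fetchall()
```

Because Total is never stored, it can never get out of sync with UnitPrice and Quantity.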


Database Normalization Basics


If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone's asked you "Is that database normalized?" or "Is that in BCNF?" All too often, the reply is "Uh, yeah." Normalization is often brushed aside as a luxury that only academics have time for. However, knowing the principles of normalization and applying them to your daily database design tasks really isn't all that complicated, and it can go a long way toward improving the consistency and efficiency of your database.

In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. Future articles will provide in-depth explorations of the normalization process. 

What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. 

The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article. 
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms. 

First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
• Meet all the requirements of the first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
• Meet all the requirements of the second normal form.
• Remove columns that are not dependent upon the primary key.

Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
• Meet all the requirements of the third normal form.
• A relation is in 4NF if it has no multi-valued dependencies.
Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
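To make the 4NF rule concrete, here is an illustrative sketch using Python's sqlite3 module. The Student/Skill/Hobby example is an assumption, not taken from the text: a student's skills and hobbies are independent multi-valued facts, so a single (Student, Skill, Hobby) table would need one row per combination, and 4NF stores each fact in its own table instead:

```python
import sqlite3

# Illustrative 4NF decomposition (the Student/Skill/Hobby example is
# hypothetical). Independent multi-valued facts go in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentSkills  (Student TEXT, Skill TEXT,
                                 PRIMARY KEY (Student, Skill));
    CREATE TABLE StudentHobbies (Student TEXT, Hobby TEXT,
                                 PRIMARY KEY (Student, Hobby));
    INSERT INTO StudentSkills  VALUES
        ('Sue', 'SQL'), ('Sue', 'Python'), ('Sue', 'Go');
    INSERT INTO StudentHobbies VALUES ('Sue', 'Chess'), ('Sue', 'Tennis');
""")

# Three skills and two hobbies take 3 + 2 = 5 rows here, versus the
# 3 * 2 = 6 combination rows a single combined table would require.
row_total = (conn.execute("SELECT COUNT(*) FROM StudentSkills").fetchone()[0]
             + conn.execute("SELECT COUNT(*) FROM StudentHobbies").fetchone()[0])
```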


Monday, March 30, 2009

Database Keys

Monday, March 30, 2009 0

As you may already know, databases use tables to organize information. (If you don’t have a basic familiarity with database concepts, read What is a Database?) Each table consists of a number of rows, each of which corresponds to a single database record. So, how do databases keep all of these records straight? It’s through the use of keys. 

Primary Keys
The first type of key we’ll discuss is the primary key. Every database table should have one or more columns designated as the primary key. The value this key holds should be unique for each record in the database. For example, assume we have a table called Employees that contains personnel information for every employee in our firm. We’d need to select an appropriate primary key that would uniquely identify each employee. Your first thought might be to use the employee’s name. 
This wouldn’t work out very well because it’s conceivable that you’d hire two employees with the same name. A better choice might be to use a unique employee ID number that you assign to each employee when they’re hired. Some organizations choose to use Social Security Numbers (or similar government identifiers) for this task because each employee already has one and they’re guaranteed to be unique. However, the use of Social Security Numbers for this purpose is highly controversial due to privacy concerns. (If you work for a government organization, the use of a Social Security Number may even be illegal under the Privacy Act of 1974.) For this reason, most organizations have shifted to the use of unique identifiers (employee ID, student ID, etc.) that don’t share these privacy concerns. 
Once you decide upon a primary key and set it up in the database, the database management system will enforce the uniqueness of the key. If you try to insert a record into a table with a primary key that duplicates an existing record, the insert will fail. 
Most databases are also capable of generating their own primary keys. Microsoft Access, for example, may be configured to use the AutoNumber data type to assign a unique ID to each record in the table. Keep in mind that such an auto-generated key carries no meaning outside the database; on the other hand, many designers prefer these surrogate keys precisely because, unlike real-world attributes, their values never need to change.
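The uniqueness enforcement described above can be demonstrated with Python's sqlite3 module; the Employees table and its data are made up for illustration:

```python
import sqlite3

# Sketch of the DBMS enforcing primary-key uniqueness.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employees VALUES (1, 'Jim Smith')")

# A second record with the same key is rejected by the database itself,
# not by application code.
try:
    conn.execute("INSERT INTO Employees VALUES (1, 'Jim Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The failed insert raises an integrity error rather than silently creating a duplicate record.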

Foreign Keys
The other type of key that we’ll discuss in this course is the foreign key. These keys are used to create relationships between tables. Natural relationships exist between tables in most database structures. Returning to our employees database, let’s imagine that we wanted to add a table containing departmental information to the database. This new table might be called Departments and would contain a large amount of information about the department as a whole. We’d also want to include information about the employees in the department, but it would be redundant to have the same information in two tables (Employees and Departments). Instead, we can create a relationship between the two tables. 
Let’s assume that the Departments table uses the Department Name column as the primary key. To create a relationship between the two tables, we add a new column to the Employees table called Department. We then fill in the name of the department to which each employee belongs. We also inform the database management system that the Department column in the Employees table is a foreign key that references the Departments table. The database will then enforce referential integrity by ensuring that all of the values in the Departments column of the Employees table have corresponding entries in the Departments table. 
Note that there is no uniqueness constraint for a foreign key. We may (and most likely do!) have more than one employee belonging to a single department. Similarly, there’s no requirement that an entry in the Departments table have any corresponding entry in the Employees table. It is possible that we’d have a department with no employees.
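The Employees/Departments relationship described above can be sketched with Python's sqlite3 module. One caveat: SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on. The table names follow the text; the sample data are made up:

```python
import sqlite3

# Sketch of the foreign-key relationship between Employees and Departments.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this opt-in
conn.executescript("""
    CREATE TABLE Departments (DeptName TEXT PRIMARY KEY);
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT,
        Department TEXT REFERENCES Departments(DeptName)
    );
    INSERT INTO Departments VALUES ('Sales');
""")

# A valid reference is accepted.
conn.execute("INSERT INTO Employees VALUES (1, 'Ann', 'Sales')")

# A dangling reference is rejected by referential integrity.
try:
    conn.execute("INSERT INTO Employees VALUES (2, 'Bob', 'Shipping')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
```

Note that the reverse is allowed: a department may exist with no employees, since foreign keys carry no uniqueness or coverage requirement.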

Definition:

The primary key of a relational table uniquely identifies each record in the table. It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a single attribute or multiple attributes in combination. For example, imagine we have a STUDENTS table that contains a record for each student at a university. The student's unique student ID number would be a good choice for a primary key in the STUDENTS table. The student's first and last name would not be a good choice, as there is always the chance that more than one student might have the same name.


A foreign key is a field in a relational table that matches the primary key column of another table. The foreign key can be used to cross-reference tables.

A database management system (DBMS) is the software that allows a computer to perform database functions of storing, retrieving, adding, deleting and modifying data. Relational database management systems (RDBMS) implement the relational model of tables and relationships. Microsoft Access, MySQL, Microsoft SQL Server, Oracle and FileMaker Pro are all examples of database management systems.

Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the linked table are reflected in the primary table. 

Consider the situation where we have two tables: Employees and Managers. The Employees table has a foreign key attribute entitled ManagedBy which points to the record for that employee’s manager in the Managers table. Referential integrity enforces the following three rules:

  1. We may not add a record to the Employees table unless the ManagedBy attribute points to a valid record in the Managers table.
  2. If the primary key for a record in the Managers table changes, all corresponding records in the Employees table must be modified using a cascading update.
  3. If a record in the Managers table is deleted, all corresponding records in the Employees table must be deleted using a cascading delete.
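The three rules above can be exercised with SQLite's cascade actions via Python's sqlite3 module. This is a sketch: the integer manager IDs are made up, and the column names follow the text:

```python
import sqlite3

# Sketch of cascading update and cascading delete on the
# Employees.ManagedBy -> Managers relationship from the text.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this opt-in
conn.executescript("""
    CREATE TABLE Managers (ManagerID INTEGER PRIMARY KEY);
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        ManagedBy  INTEGER REFERENCES Managers(ManagerID)
                   ON UPDATE CASCADE ON DELETE CASCADE
    );
    INSERT INTO Managers VALUES (10), (20);
    INSERT INTO Employees VALUES (1, 10), (2, 10), (3, 20);
""")

# Rule 2: changing a manager's key cascades to that manager's employees.
conn.execute("UPDATE Managers SET ManagerID = 11 WHERE ManagerID = 10")

# Rule 3: deleting a manager cascades, removing that manager's employees.
conn.execute("DELETE FROM Managers WHERE ManagerID = 20")

remaining = conn.execute(
    "SELECT EmployeeID, ManagedBy FROM Employees ORDER BY EmployeeID"
).fetchall()
```

After the cascades, employees 1 and 2 follow their manager's new key, and employee 3 is gone along with manager 20.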


Tuesday, March 24, 2009

What is a Database?

Tuesday, March 24, 2009 0

Databases are designed to offer an organized mechanism for storing, managing and retrieving information. They do so through the use of tables. If you’re familiar with spreadsheets like Microsoft Excel, you’re probably already accustomed to storing data in tabular form. It’s not much of a stretch to make the leap from spreadsheets to databases. Let’s take a look. 

Database Tables

Just like Excel tables, database tables consist of columns and rows. Each column contains a different type of attribute and each row corresponds to a single record. For example, imagine that we were building a database table that contained names and telephone numbers. We’d probably set up columns named “FirstName”, “LastName” and “TelephoneNumber.” Then we’d simply start adding rows underneath those columns that contained the data we’re planning to store. If we were building a table of contact information for our business that has 50 employees, we’d wind up with a table that contains 50 rows. 
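The contact table described above can be built in a few lines with Python's sqlite3 module. The Contacts table name is an assumption; the columns come from the text and the two sample rows are made up:

```python
import sqlite3

# Sketch of the contact table: one column per attribute, one row per record.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Contacts (
        FirstName       TEXT,
        LastName        TEXT,
        TelephoneNumber TEXT
    )
""")
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?, ?)",
    [("Ann", "Smith", "555-0101"), ("Bob", "Jones", "555-0102")],
)
row_count = conn.execute("SELECT COUNT(*) FROM Contacts").fetchone()[0]
```

A business with 50 employees would simply insert 50 such rows.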

Databases and Spreadsheets

At this point, you’re probably asking yourself an obvious question – if a database is so much like a spreadsheet, why can’t I just use a spreadsheet? Databases are actually much more powerful than spreadsheets in the way you’re able to manipulate data. Here are just a few of the actions that you can perform on a database that would be difficult if not impossible to perform on a spreadsheet:

  • Retrieve all records that match certain criteria
  • Update records in bulk
  • Cross-reference records in different tables
  • Perform complex aggregate calculations
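Several of the operations listed above can be sketched with Python's sqlite3 module; the Orders table and its data are made up for illustration:

```python
import sqlite3

# A runnable sketch of filtering, bulk updates, and aggregation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Customer TEXT, Amount REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [("Ann", 10.0), ("Ann", 5.0), ("Bob", 7.5)])

# Retrieve all records that match certain criteria.
big_spenders = conn.execute(
    "SELECT Customer FROM Orders WHERE Amount > 6 ORDER BY Customer"
).fetchall()

# Update records in bulk (a 10% increase on every order).
conn.execute("UPDATE Orders SET Amount = Amount * 1.10")

# Perform complex aggregate calculations.
totals = dict(conn.execute(
    "SELECT Customer, SUM(Amount) FROM Orders GROUP BY Customer"
).fetchall())
```

Doing any of these by hand across thousands of spreadsheet rows would be tedious and error-prone; here each is a single statement.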


Definition

  • A database is a collection of information organized into interrelated tables of data and specifications of data objects.
  • A table in a relational database is a predefined format of rows and columns that define an entity.
  • Database tables are composed of individual columns corresponding to the attributes of the object.
  • In a relational database, a row consists of one set of attributes (or one tuple) corresponding to one instance of the entity that a table schema describes.
  • An attribute is a single data item related to a database object. The database schema associates one or more attributes with each database entity.
  • A database record consists of one tuple for a given relational table. In a relational database, records correspond to rows in each table.