Checksums are used to make ethernet communications robust. I’ll attempt to explain how that works and why leveraging checksums in a network qualification is efficient and effective.
Practically every packet of information passed in an ethernet network is carefully checked for accuracy and, if the test fails, the packet is discarded and automatically resent (the resend itself is typically handled by a higher-level protocol such as TCP). Ethernet devices like the network interface cards (NICs) in computers follow strict guidelines and communication protocols before passing data from the network on to the computer.
While an ethernet device prepares a packet of data to be sent over the ethernet, it uses an algorithm to create a checksum from that data. The checksum cannot be used to re-create the data, but the receiver uses it to verify that the data it received is identical to the original data that formed the packet.
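To make that concrete, here is a minimal sketch in Python of a sender attaching a checksum to its data. It assumes the CRC-32 function from Python's standard `zlib` module as a stand-in for the check that real ethernet hardware performs in silicon; the `make_frame` name and the four-byte trailer layout are simplifications for illustration, not the exact on-the-wire format.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Return the payload with a 4-byte CRC-32 checksum appended.

    Hypothetical helper: a real NIC computes its frame check in hardware.
    """
    checksum = zlib.crc32(payload)              # 32-bit unsigned integer
    return payload + checksum.to_bytes(4, "big")

payload = b"one hamburger, one order of fries"
frame = make_frame(payload)

# The checksum is always 4 bytes, no matter how large the payload is,
# which is why it cannot be used to re-create the data.
print(len(frame) - len(payload))
```

Note that the checksum is a fixed size regardless of the payload, so it can confirm the data but never rebuild it.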
You might ask, “How can the checksum be part of the data packet it is a checksum of?” The data is like the contents of a sack lunch and the receipt is the checksum. The receiving network card is designed to separate the datagram from the header and checksum portions of the packet and perform the test, just like I have no problem separating the receipt from my lunch.
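The receiving side of that idea can be sketched the same way. This Python example assumes a simplified frame made of a payload followed by a four-byte CRC-32 trailer (computed with the standard `zlib` module); `split_frame` and `is_intact` are hypothetical names, and real frames also carry headers that are left out here.

```python
import zlib

def split_frame(frame: bytes):
    """Separate the lunch (payload) from the receipt (4-byte checksum)."""
    payload, trailer = frame[:-4], frame[-4:]
    return payload, int.from_bytes(trailer, "big")

def is_intact(frame: bytes) -> bool:
    """Recompute the checksum from the payload and compare it to the receipt."""
    payload, received_checksum = split_frame(frame)
    return zlib.crc32(payload) == received_checksum

payload = b"one hamburger, one order of fries"
good = payload + zlib.crc32(payload).to_bytes(4, "big")

# Same receipt, but one byte of the lunch has been altered in transit.
damaged = b"one hamburger, one order of frie_" + good[-4:]

print(is_intact(good))     # True
print(is_intact(damaged))  # False
```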
Ethernet networks like the internet were designed to provide for this kind of inherent testing so the infrastructure would not need to be disassembled for testing. The components with the greatest risk of failure are not the wires and fibers but the interface cards and devices that they connect.
Inherent testing assures that the network translation devices are tested along with the connection interfaces where the “rubber meets the road,” so to speak. It is critical that the tests include these connections “in situ” since even a tiny piece of lint can scatter the light in fiber-optic connections or increase resistance in copper connections.
The checksum tests are embedded in the underlying protocols that define how data are sent over the network. They cannot be turned off by end users or adjusted to allow a percentage of error: any anomaly requires a resend of the entire packet. Persistent failures result in a complete loss of communication, not delivery of inaccurate information.
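The "resend or nothing" behavior described above can be sketched as a small simulation. Everything here is an assumption for illustration: the `deliver` function, the `LinkDown` exception, the limit of five tries, and the way damage is injected on chosen attempts. Real protocols have their own retry rules, but the outcome is the same in spirit: either intact data arrives or none does.

```python
import zlib

class LinkDown(Exception):
    """Raised when every attempt fails the checksum test (hypothetical)."""

def deliver(payload: bytes, corrupt_attempts: set, max_tries: int = 5):
    """Send payload over a simulated link, resending whole frames on failure.

    corrupt_attempts lists the 1-based attempt numbers on which the
    simulated link damages the first byte of the frame.
    """
    for attempt in range(1, max_tries + 1):
        frame = bytearray(payload + zlib.crc32(payload).to_bytes(4, "big"))
        if attempt in corrupt_attempts:
            frame[0] ^= 0xFF                      # simulate damage in transit
        received, trailer = bytes(frame[:-4]), bytes(frame[-4:])
        if zlib.crc32(received) == int.from_bytes(trailer, "big"):
            return received, attempt              # only intact data gets through
    raise LinkDown("persistent checksum failures; nothing was delivered")

data, tries = deliver(b"lunch", corrupt_attempts={1, 2})
print(data, tries)  # b'lunch' 3
```

If every attempt is damaged, `deliver` raises `LinkDown` rather than returning altered data, which mirrors the point that persistent failures mean lost communication, not wrong information.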
Let’s say that someone offers to drive to town and get a lunch for me. If my goal is to mitigate the risk that they deliver an inaccurate lunch, I can approach the problem from two angles:
I could check their car over and carefully survey the road surface for the route I expect they will travel.
Or, knowing that they might take either the Interstate or the county road, I could just wait until they return and check the receipt (i.e. checksum) against the contents of my lunch sack. If one BYTE is missing from my hamburger, the checksum test will fail and I’ll send them back for another entire packet, er I mean, sack lunch.
With the checksum approach there is an opportunity for a slow lunch delivery (especially if I keep sending them back until they no longer eat part of my lunch), but the risk I wanted to mitigate was the opportunity for an inaccurate lunch. Testing the wire and fiber components of the infrastructure doesn’t do anything to assure the integrity of the packets delivered: if they don’t precisely match the checksum they will never be delivered.
So, how can we quantitatively determine the state of our network infrastructure from the NIC of the server to the NIC of the client PC, including any other ethernet data acquisition devices, without unplugging anything? We could add up the number of resend requests that are sent when data packets fail the checksum test. Switches and hubs have been designed to track statistics like these for many years. Investments made to report this data would surely provide a return by empowering the organization to track and trend the true health of the network.
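As a rough illustration of that kind of tally, the Python sketch below counts how many frames pass or fail the checksum test and reports a failure rate. The `LinkHealth` class and the sample frames are invented for this example; an actual switch exposes its own counters through its management interface, and this does not model any particular vendor's format.

```python
import zlib
from dataclasses import dataclass

@dataclass
class LinkHealth:
    """Hypothetical running tally of checksum results for one link."""
    frames: int = 0
    failures: int = 0

    def record(self, frame: bytes) -> bool:
        """Check one frame (payload + 4-byte CRC-32) and update the tally."""
        self.frames += 1
        ok = zlib.crc32(frame[:-4]) == int.from_bytes(frame[-4:], "big")
        if not ok:
            self.failures += 1
        return ok

    @property
    def failure_rate(self) -> float:
        return self.failures / self.frames if self.frames else 0.0

def frame_for(payload: bytes) -> bytes:
    return payload + zlib.crc32(payload).to_bytes(4, "big")

health = LinkHealth()
for payload in (b"burger", b"fries", b"shake"):
    health.record(frame_for(payload))
health.record(b"bXrger" + frame_for(b"burger")[-4:])   # one damaged frame

print(health.frames, health.failures, health.failure_rate)  # 4 1 0.25
```

Trending a number like `failure_rate` over time is one way to watch the health of a link without unplugging anything.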
Reports and summaries of packet failures are nice, but not necessary to leverage the power of the checksum in network qualification documents. All that is needed there is a savvy explanation of how all your network devices checksum essentially every ethernet* packet for quality and automatically request that the data be resent on error.
Hopefully this blog entry will assist you in that task.** You are welcome to use any part of it without reference but please leave me a comment if it has been helpful or entertaining. You are welcome to reference the work in its permanent archive: http://jimf.name/checksum-mitigates-network-risks/
* Note that the checksum cited here does not apply to non-ethernet protocols like RS-232 and RS-485 serial communications. Implementations of serial and other connections require some other means of qualifying accuracy.
** The author is not responsible or liable for misuse or interpretation of the information presented here: USE AT YOUR OWN RISK!