1. Overview
Data storage and compression are crucial in computer science because they determine how efficiently information is stored and transmitted. Understanding how data is measured and compressed allows us to optimize file sizes, reduce storage requirements, and improve data transfer speeds, all essential for modern computing.
Key Definitions
- Bit: The smallest unit of data in a computer, representing a 0 or 1.
- Nibble: A group of 4 bits.
- Byte: A group of 8 bits.
- Kibibyte (KiB): 1024 bytes.
- Mebibyte (MiB): 1024 KiB.
- Gibibyte (GiB): 1024 MiB.
- Tebibyte (TiB): 1024 GiB.
- Pebibyte (PiB): 1024 TiB.
- Exbibyte (EiB): 1024 PiB.
- Data Compression: Reducing the size of a file by removing redundant or less important data.
- Lossy Compression: A compression technique that permanently removes some data, resulting in a smaller file size but potentially lower quality.
- Lossless Compression: A compression technique that reduces file size without losing any data, allowing the original file to be perfectly reconstructed.
- Run Length Encoding (RLE): A lossless compression technique that replaces each run of consecutive identical values with a count and the value.
- Sample Rate: The number of audio samples taken per second, measured in Hertz (Hz).
- Sample Resolution/Bit Depth: The number of bits used to represent each audio sample.
- Colour Depth: The number of bits used to represent the colour of a single pixel in an image.
Core Content
Data Storage Measurement
Computers use binary (0s and 1s) because electronic circuits can easily represent two states: on (1) or off (0). This makes processing simple and reliable.
With binary prefixes, each unit of data storage is 1024 (2^10) times the size of the previous one:
| Unit | Abbreviation | Value |
| --- | --- | --- |
| Byte | B | 1 byte |
| Kibibyte | KiB | 1024 bytes |
| Mebibyte | MiB | 1024 KiB |
| Gibibyte | GiB | 1024 MiB |
| Tebibyte | TiB | 1024 GiB |
| Pebibyte | PiB | 1024 TiB |
| Exbibyte | EiB | 1024 PiB |
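The relationship between the units can be checked with a few lines of Python. This is a minimal sketch; the names `UNITS`, `bytes_in`, and `to_human` are illustrative and not part of any standard library.

```python
# Binary (IEC) prefixes: each unit is 1024 times the previous one.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the given unit."""
    return 1024 ** UNITS.index(unit)

def to_human(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1024."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

print(bytes_in("MiB"))      # 1048576
print(to_human(1_500_000))  # 1.43 MiB
```

Dividing by 1024 repeatedly is all that is needed to move up the table one row at a time.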
File Size Calculations
- Image File Size:
- Formula: File Size (bytes) = (Width * Height * Colour Depth) / 8
- Example: An image is 1000 pixels wide, 500 pixels high, and has a colour depth of 24 bits. File Size = (1000 * 500 * 24) / 8 = 1,500,000 bytes, which is 1.5 MB (decimal, where 1 MB = 1,000,000 bytes) or approximately 1.43 MiB (1,500,000 / 1,048,576).
- Sound File Size:
- Formula: File Size (bytes) = (Sample Rate * Sample Resolution * Duration * Number of Channels) / 8
- Example: An audio file has a sample rate of 44100 Hz, a sample resolution of 16 bits, a duration of 60 seconds, and 2 channels (stereo). File Size = (44100 * 16 * 60 * 2) / 8 = 10,584,000 bytes, which is 10.584 MB (decimal) or approximately 10.09 MiB.
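Both formulas above can be expressed as short Python functions, which is a handy way to check hand calculations. This is a sketch for uncompressed data only; the function and parameter names are illustrative, and real files also carry headers and metadata that these formulas ignore.

```python
def image_size_bytes(width_px, height_px, colour_depth_bits):
    """Uncompressed image size: total bits divided by 8 bits per byte."""
    return width_px * height_px * colour_depth_bits / 8

def sound_size_bytes(sample_rate_hz, resolution_bits, duration_s, channels=1):
    """Uncompressed sound size: total bits divided by 8 bits per byte."""
    return sample_rate_hz * resolution_bits * duration_s * channels / 8

MIB = 1024 ** 2  # bytes in one mebibyte

image = image_size_bytes(1000, 500, 24)
sound = sound_size_bytes(44100, 16, 60, channels=2)

print(image, f"{image / MIB:.2f} MiB")  # 1500000.0 1.43 MiB
print(sound, f"{sound / MIB:.2f} MiB")  # 10584000.0 10.09 MiB
```

The results match the two worked examples, and the MiB figures show why it matters whether an answer is quoted in decimal or binary units.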
Data Compression
Purpose and Need:
- Reduces file size for:
- Less storage space needed.
- Faster transmission/download times.
- Reduced bandwidth usage.
- Essential for:
- Streaming media.
- Web content delivery.
- Mobile devices with limited storage.
Lossy Compression:
- Permanently removes data considered less important (e.g., high frequencies in audio or subtle colour changes in images).
- Smaller file sizes but potential loss of quality.
- Suitable when perfect quality is not essential.
- Examples: JPEG (images), MP3 (audio), MPEG (video).
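The idea that lossy compression discards data permanently can be illustrated with a toy example that reduces the bit depth of some samples. This is only a sketch under simplifying assumptions: the sample values are made up, and real formats such as JPEG and MP3 use far more sophisticated methods to decide what to discard.

```python
def reduce_bit_depth(samples, drop_bits=8):
    """Lossy step: discard the lowest `drop_bits` bits of each sample."""
    return [s >> drop_bits for s in samples]

def restore(samples, drop_bits=8):
    """Best-effort reconstruction: the discarded bits come back as zeros."""
    return [s << drop_bits for s in samples]

original = [12345, 20000, 513, 65535]    # hypothetical 16-bit sample values
compressed = reduce_bit_depth(original)  # each value now fits in 8 bits
restored = restore(compressed)

print(compressed)            # [48, 78, 2, 255]
print(restored)              # [12288, 19968, 512, 65280]
print(restored == original)  # False
```

The restored values are close to the originals but not identical, which is exactly the trade-off described above: half the storage, with some detail gone for good.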
Lossless Compression:
- No data is lost; the original file can be perfectly reconstructed.
- Files are usually larger than with lossy compression, but quality is fully preserved.
- Used for text, executables, and images where detail is critical.
- Examples: ZIP (general files), PNG (images), GIF (images).
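Perfect reconstruction can be demonstrated with Python's standard `zlib` module, which implements DEFLATE, the compression method most commonly used inside ZIP files. The test data here is made up and deliberately repetitive so that it compresses well.

```python
import zlib

# Highly repetitive data compresses well with a lossless algorithm.
original = b"AAAABBBBCCCDD" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print(restored == original)  # True: every byte is recovered
```

Unlike a lossy method, decompression returns exactly the bytes that went in, which is why lossless compression is required for text and executables.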
Run Length Encoding (RLE):
- A lossless compression technique.
- Replaces repeated sequences of data with a count and the repeated value.
- Effective for images with large areas of the same colour.
- Example: "AAAABBBBCCCDD" becomes "4A4B3C2D".
Pseudocode Example:
INPUT: String of characters
OUTPUT: Compressed string

// Strings are indexed from 1. STRING() converts the integer count to text.
FUNCTION RLECompress(inputString)
    // An empty input has no runs to encode
    IF LENGTH(inputString) == 0 THEN
        RETURN ""
    ENDIF
    compressedString = ""
    count = 1
    FOR i = 1 TO LENGTH(inputString) - 1
        IF inputString[i] == inputString[i+1] THEN
            count = count + 1
        ELSE
            // The run has ended: write its length and its character
            compressedString = compressedString + STRING(count) + inputString[i]
            count = 1
        ENDIF
    ENDFOR
    // Add the last character group
    compressedString = compressedString + STRING(count) + inputString[LENGTH(inputString)]
    RETURN compressedString
ENDFUNCTION

// Example
stringToCompress = "AAAABBBBCCCDD"
compressedResult = RLECompress(stringToCompress)
OUTPUT compressedResult  // Displays "4A4B3C2D"
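The pseudocode above translates to Python roughly as follows. This is a minimal sketch: the function names are illustrative, and the decompression function assumes the original text contains no digit characters, since digits would be confused with the counts.

```python
import re

def rle_compress(text: str) -> str:
    """Replace each run of identical characters with <count><character>."""
    if not text:
        return ""
    parts = []
    count = 1
    # Compare each character with the one that follows it.
    for prev, cur in zip(text, text[1:]):
        if cur == prev:
            count += 1
        else:
            parts.append(f"{count}{prev}")
            count = 1
    parts.append(f"{count}{text[-1]}")  # add the last character group
    return "".join(parts)

def rle_decompress(encoded: str) -> str:
    """Expand <count><character> pairs (assumes no digits in the original text)."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(ch * int(n) for n, ch in pairs)

compressed = rle_compress("AAAABBBBCCCDD")
print(compressed)                  # 4A4B3C2D
print(rle_decompress(compressed))  # AAAABBBBCCCDD
```

Running the decompression step on the output gives back the original string exactly, which is what makes RLE a lossless technique.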
Exam Focus
- Data Representation: Be prepared to explain why binary is used, relating it to electronic circuits and on/off states.
- File Size Calculations: Practice calculating image and audio file sizes. Pay attention to units (bits, bytes, KB, MB, etc.). Show your working.
- Compression: Clearly distinguish between lossy and lossless compression. Give advantages and disadvantages of each in specific situations.
- RLE: Understand how RLE works and when it is most effective. Be prepared to compress a given string using RLE.
- Technical Terminology: Use precise language. Don't just say "makes the file smaller"; say "reduces the file size by removing redundant data."
Common Mistakes to Avoid
- ❌ Wrong: "Computers use binary because they understand it." ✓ Right: "Computers use binary because electronic circuits have two states (on/off) representing 0 and 1, enabling simple and reliable processing."
- ❌ Wrong: "MP3 is a type of compression." ✓ Right: "MP3 is an audio file format that uses lossy compression."
- ❌ Wrong: "Lossy compression is good because the file is smaller." ✓ Right: "Lossy compression is advantageous when file size is a priority and a slight reduction in quality is acceptable, such as in streaming video."
- ❌ Wrong: "RLE makes files smaller" ✓ Right: "RLE is effective when there are long runs of repeated data"
- ❌ Omitting units (bytes, KB, MB, etc.) when calculating file sizes. Always state the units.
Exam Tips
- Always read the question carefully and identify what is being asked. Are they asking for the type of compression or an example?
- When calculating file sizes, show all your working steps clearly. This allows for partial credit even if the final answer is incorrect.
- When explaining the advantages and disadvantages of compression techniques, provide specific examples to illustrate your points.
- Practice RLE compression on different strings to become comfortable with the process. Consider cases where RLE is not effective (e.g., a completely random string).
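To see when RLE helps and when it does not, the short Python sketch below compares input and output lengths for three made-up strings. It uses `itertools.groupby` from the standard library to find the runs; the function name `rle` is illustrative.

```python
from itertools import groupby

def rle(text):
    """Encode each run of identical characters as <count><character>."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))

for s in ["AAAAAAAAAAAA", "AAAABBBBCCCDD", "ABCDEFGH"]:
    encoded = rle(s)
    print(s, "->", encoded, f"({len(s)} -> {len(encoded)} characters)")

# AAAAAAAAAAAA -> 12A (12 -> 3 characters)
# AAAABBBBCCCDD -> 4A4B3C2D (13 -> 8 characters)
# ABCDEFGH -> 1A1B1C1D1E1F1G1H (8 -> 16 characters)
```

With no repeated runs, every character gains a count of 1 and the "compressed" string doubles in length, so RLE is only worthwhile on data with long runs.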