1. Overview
Data representation is fundamental to computer science because it explains how computers store and manipulate information. Understanding how text, sound, and images are converted into binary form allows us to appreciate the limitations and possibilities of digital technology and to make informed choices about file formats and compression techniques. This topic also lays the groundwork for understanding more advanced concepts such as data structures and algorithms.
Key Definitions
- Character Set: A collection of characters (letters, numbers, symbols) that a computer system recognises. Each character is assigned a unique numerical code.
- ASCII (American Standard Code for Information Interchange): A 7-bit character encoding standard representing 128 characters.
- Unicode: A character encoding standard that supports a vast number of characters, including those from different languages worldwide.
- Sample Rate: The number of audio samples taken per second, measured in Hertz (Hz).
- Sample Resolution (Bit Depth): The number of bits used to represent each audio sample, determining the accuracy of the sound's amplitude.
- Pixel: The smallest unit of a digital image, containing colour information.
- Resolution: The number of pixels in an image, usually expressed as width x height (e.g., 1920 x 1080).
- Colour Depth: The number of bits used to represent the colour of a single pixel.
- Bitmap Image: An image represented as a grid of pixels, each containing colour data. Also known as raster images.
- Vector Graphic: An image stored as mathematical descriptions of shapes (lines, curves, polygons).
Core Content
Text Representation
- Computers represent text using character sets, which assign a numerical code to each character.
- ASCII:
- Uses 7 bits to represent 128 characters (0-127).
- Includes uppercase and lowercase letters (A-Z, a-z), digits (0-9), punctuation marks, and control characters (e.g., line feed, carriage return).
- Limited to representing English characters and basic symbols.
- Extended ASCII uses 8 bits, allowing for 256 characters, but isn't a universal standard.
- Unicode:
- Supports over 143,000 characters from almost all writing systems worldwide.
- Is stored using encoding schemes such as UTF-8 (most common for web pages), UTF-16, and UTF-32. UTF-8 and UTF-16 are variable-length; UTF-32 uses a fixed 4 bytes per character.
- UTF-8 uses 1-4 bytes per character.
- Each character is assigned a unique code point.

Example ASCII codes:
| Character | Decimal | Binary |
|---|---|---|
| A | 65 | 01000001 |
| B | 66 | 01000010 |
| Z | 90 | 01011010 |
| a | 97 | 01100001 |
| 0 | 48 | 00110000 |
| Space | 32 | 00100000 |
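Note: Uppercase and lowercase letters have different codes (A=65, a=97).
- Advantage of Unicode: Supports almost all languages, so text is globally compatible.
- Disadvantage of Unicode: Can produce larger files than ASCII, because a character may need more than one byte. In UTF-16 and UTF-32 even simple English text takes more space; in UTF-8 the 128 ASCII characters still use 1 byte each.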
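
The codes in the table and the UTF-8 byte counts can be checked with a short Python sketch. The helper names `to_binary` and `utf8_length` are made up for illustration; `ord`, `format`, and `str.encode` are standard Python built-ins.

```python
def to_binary(ch: str, bits: int = 8) -> str:
    """Return a character's code as a zero-padded binary string."""
    return format(ord(ch), f"0{bits}b")


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses to store one character."""
    return len(ch.encode("utf-8"))


# Characters from the ASCII table above: decimal code and 8-bit binary.
for ch in ["A", "B", "Z", "a", "0", " "]:
    print(repr(ch), ord(ch), to_binary(ch))

# UTF-8 is variable-length: the characters below need 1, 2, 3 and 4 bytes.
# (e-acute, euro sign and an emoji are written as escapes to avoid
# copy-and-paste problems.)
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(repr(ch), utf8_length(ch), "byte(s)")
```

The first loop reproduces the table rows (for example, `A` gives 65 and `01000001`). The second loop shows why Unicode text can be larger than ASCII text once characters outside the first 128 are used.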
Sound Representation
- Sound is an analogue signal; computers need to convert it into a digital form (binary) through a process called sampling.
- The analogue sound wave's amplitude is measured at regular intervals.
- Sample Rate:
- Measured in Hertz (Hz) - samples per second.
- Higher sample rate means more samples are taken per second, resulting in a more accurate representation of the original sound.
- Higher sample rate = better sound quality, but larger file size.
- Example: CD quality audio uses a sample rate of 44,100 Hz (44.1 kHz).
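
The effect of sample rate on file size can be sketched with the usual formula for uncompressed audio: sample rate x sample resolution x duration x number of channels. The function name `audio_size_bits` is hypothetical, and the 16-bit, two-channel (stereo), 60-second values are assumptions chosen for the example rather than figures from these notes.

```python
def audio_size_bits(sample_rate_hz: int, bit_depth: int,
                    duration_s: int, channels: int = 1) -> int:
    """Size of uncompressed audio in bits.

    size = samples per second x bits per sample x seconds x channels
    """
    return sample_rate_hz * bit_depth * duration_s * channels


# Assumed example: 60 seconds at 44,100 Hz, 16 bits per sample, stereo.
cd_bits = audio_size_bits(44_100, 16, 60, channels=2)
cd_bytes = cd_bits // 8  # 8 bits in a byte

print(cd_bits, "bits")    # 84672000 bits
print(cd_bytes, "bytes")  # 10584000 bytes

# Doubling the sample rate doubles the file size.
print(audio_size_bits(88_200, 16, 60, channels=2) // cd_bits)  # 2
```

Real audio files are often smaller than this because of compression, so treat the result as the size of the raw sample data only.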
Image Representation
- Digital images are represented as a grid of pixels.
- Bitmap Images (Raster Images):
- Each pixel contains colour information.
- Resolution: The number of pixels in the image (width x height).
- Higher resolution = more pixels = more detail, but a larger file size.
- Example: A 1920x1080 image has approximately 2 million pixels.
- Colour Depth: The number of bits used to represent the colour of each pixel.
- 1 bit = 2 colours (e.g., black and white)
- 8 bits = 2^8 = 256 colours (commonly used for GIFs)
- 24 bits = 2^24 = 16,777,216 colours (True colour, commonly used for JPEGs and PNGs)
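
The link between colour depth, resolution, and file size can be sketched as follows. The function names are made up for illustration, and the calculation assumes an uncompressed bitmap with no metadata, so real JPEG, PNG, or GIF files will usually be smaller.

```python
def colours_available(colour_depth_bits: int) -> int:
    """Number of different colours that n bits per pixel can represent (2^n)."""
    return 2 ** colour_depth_bits


def image_size_bits(width_px: int, height_px: int, colour_depth_bits: int) -> int:
    """Size of an uncompressed bitmap in bits: width x height x colour depth."""
    return width_px * height_px * colour_depth_bits


for depth in (1, 8, 24):
    print(depth, "bits per pixel ->", colours_available(depth), "colours")

pixels = 1920 * 1080
size_bits = image_size_bits(1920, 1080, 24)
print(pixels, "pixels")           # 2073600 pixels
print(size_bits, "bits")          # 49766400 bits
print(size_bits // 8, "bytes")    # 6220800 bytes
```

Increasing either the resolution or the colour depth increases the result, which is the quality versus file size trade-off described above.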
Exam Focus
- Examiners expect you to understand how the parameters of data representation (e.g., sample rate, sample resolution, colour depth, resolution) relate to file size: increasing any of them improves quality but also increases file size.
- Use precise technical terms: "sample rate", "sample resolution", "colour depth", "resolution", "character set", "encoding".
- Explain why computers use binary, not just that they "understand" it.
- Be prepared to calculate file sizes based on given parameters (e.g., audio duration, sample rate, colour depth).
- Understand the differences between bitmap and vector images and their respective advantages and disadvantages.
Common Mistakes to Avoid
- ❌ Wrong: "Computers use binary because it's what they understand." ✓ Right: "Computers use binary because electronic circuits have two states (on/off), which can be represented as 0 and 1. This enables simple and reliable processing and storage of data."
- ❌ Wrong: "Higher resolution is always better." ✓ Right: "Higher resolution results in more detail in the image, but also increases file size. You need to consider the trade-off between quality and file size."
- ❌ Wrong: "ASCII supports all languages." ✓ Right: "ASCII is limited to 128 characters and primarily supports English characters. Unicode is needed to support a wider range of languages."
- ❌ Wrong: Failing to specify units. ✓ Right: Always include units like Hz, bits, bytes, KB, MB, pixels, etc. in your answers and calculations.
- ❌ Wrong: Not explaining what sample rate or bit depth actually mean. ✓ Right: Make sure you can clearly explain what each term represents and how it affects the quality of the sound or image.
Exam Tips
- When asked to explain why computers use binary, focus on the ease of implementation with electronic circuits (on/off states).
- For file size calculations, pay careful attention to units (bits vs. bytes, KB vs. MB) and make sure you show your working.
- When comparing ASCII and Unicode, highlight Unicode's ability to represent characters from multiple languages as the main advantage.
- If asked about choosing between bitmap and vector graphics, consider the type of image (photorealistic vs. simple shapes) and the need for scalability.
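
For the tip on units in file size calculations, the working can be laid out step by step as in the sketch below. The function name `describe_size` is hypothetical. Whether 1 KB means 1,000 or 1,024 bytes depends on the convention your course uses, so the factor is a parameter here and the default of 1,000 is an assumption.

```python
def describe_size(size_bits: int, kb_factor: int = 1000) -> dict:
    """Convert a size in bits to bytes, KB and MB, keeping every step.

    kb_factor is the number of bytes in 1 KB (1000 or 1024, depending on
    the convention in use).
    """
    size_bytes = size_bits / 8          # 8 bits = 1 byte
    size_kb = size_bytes / kb_factor    # bytes -> KB
    size_mb = size_kb / kb_factor       # KB -> MB
    return {"bits": size_bits, "bytes": size_bytes, "KB": size_kb, "MB": size_mb}


# 8,000,000 bits using 1 KB = 1,000 bytes
print(describe_size(8_000_000))

# 16,777,216 bits using 1 KB = 1,024 bytes
print(describe_size(16_777_216, kb_factor=1024))
```

Writing out each conversion separately mirrors showing your working in an exam and makes a bits versus bytes slip easier to spot.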