Encoding & Transformation

Encoding is the process of converting information from one representation to another. Transformation is changing data's shape, format, or structure while preserving its meaning. Together, they are how information moves between systems, languages, and contexts.

You do this constantly in everyday life — converting between languages, currencies, units of measurement, and formats. This chapter explores how encoding and transformation work and why they matter in technology.

What Is Encoding?

Encoding means representing information using a particular system of symbols or rules. The information stays the same; the representation changes.

Everyday Encoding

The number "seven" can be encoded many ways:

English:     seven
Spanish:     siete
Roman:       VII
Binary:      111
Morse code:  --...
Tally marks: |||| ||
Braille:     (a pattern of raised dots)

Each encoding represents the same concept — the quantity seven — using a different system. The choice of encoding depends on who or what needs to read it.
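As a rough illustration, the same quantity can be re-encoded in a few lines of Python. The helper to_roman and the dictionary name encodings are made up for this sketch and only handle small numbers.

```python
# Value/symbol pairs for a tiny Roman numeral encoder (illustrative, covers 1-39 only)
ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n: int) -> str:
    """Encode a small positive integer as a Roman numeral."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

seven = 7
encodings = {
    "decimal": str(seven),
    "binary": format(seven, "b"),
    "roman": to_roman(seven),
}

# Decoding the binary form recovers the same quantity
decoded = int(encodings["binary"], 2)

for system, text in encodings.items():
    print(f"{system}: {text}")
```

The quantity never changes. Only the symbols used to write it down do, and decoding brings you back to the same number.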

Language as Encoding

Language itself is an encoding of thought. When you say "the cat sat on the mat," you are encoding an image or idea into a sequence of sounds (or letters). The listener decodes those sounds back into the image.

Translation between languages is re-encoding:

English:   The cat sat on the mat.
French:    Le chat s'est assis sur le tapis.
Japanese:  猫がマットの上に座った。

The meaning is preserved (approximately). The encoding changes.

What Is Transformation?

Transformation changes the shape or structure of data while keeping the essential content intact.

Currency Conversion

$100 USD = €92.50 EUR = ¥15,000 JPY (approximate)

The value is preserved; the representation changes. Currency conversion is a transformation defined by an exchange rate, and that rate changes over time.
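A minimal sketch of this kind of conversion might look like the following. The rate is simply the approximate figure from the example above, hard-coded for illustration, and the function name convert is an assumption of this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative rate based on the approximate figures above; real rates change constantly
USD_TO_EUR = Decimal("0.925")

def convert(amount: Decimal, rate: Decimal) -> Decimal:
    """Multiply by the rate and round to two decimal places (cents)."""
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

euros = convert(Decimal("100"), USD_TO_EUR)
print(euros)
```

Decimal is used here instead of float because money amounts are usually expected to round predictably to cents.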

Unit Conversion

72°F = 22.2°C
1 mile = 1.609 kilometers
1 kilogram = 2.205 pounds

Same physical reality, different measurement systems. The transformation follows a specific rule or formula.
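The rules behind the conversions above can be written as small functions. This is only a sketch; the function names are chosen for illustration.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Subtract the 32-degree offset, then rescale by 5/9."""
    return (f - 32) * 5 / 9

def miles_to_km(miles: float) -> float:
    """One international mile is defined as exactly 1.609344 km."""
    return miles * 1.609344

def kg_to_lb(kg: float) -> float:
    """One pound is defined as exactly 0.45359237 kg."""
    return kg / 0.45359237

print(round(fahrenheit_to_celsius(72), 1))
print(round(miles_to_km(1), 3))
print(round(kg_to_lb(1), 3))
```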

Map Projections

A globe is a three-dimensional representation of Earth. A flat map is a two-dimensional transformation of that globe. Every flat map distorts something — area, shape, distance, or direction — because you cannot perfectly represent a sphere on a flat surface.

Mercator projection:     Preserves angles and local shapes, distorts area
                         (Greenland looks as big as Africa, which is
                         actually about 14 times larger)
Gall-Peters projection:  Preserves area, distorts shapes
                         (continents look stretched)
Robinson projection:     Compromises on everything for a balanced look

Each projection is a different transformation of the same underlying data, optimized for different purposes.
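To make the Mercator distortion concrete, here is a small sketch using the standard spherical Mercator formula. It assumes a perfectly spherical Earth, and the latitude of 72 degrees north is just a rough stand-in for Greenland.

```python
import math

def mercator_y(lat_deg: float, radius: float = 1.0) -> float:
    """Vertical map coordinate for a latitude on a spherical Mercator map."""
    lat = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + lat / 2))

def mercator_area_scale(lat_deg: float) -> float:
    """Factor by which Mercator inflates areas at a latitude: 1 / cos^2(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

# Sample latitudes chosen for illustration only
for name, lat in [("Equator", 0.0), ("60 degrees north", 60.0), ("72 degrees north", 72.0)]:
    print(f"{name}: area inflated about {mercator_area_scale(lat):.1f}x")
```

Areas near the equator are drawn at roughly true scale, while areas at high latitudes are inflated many times over, which is why Greenland looks so large on a Mercator map.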

Encoding in Technology

Character Encoding

Computers store everything as numbers. To store text, each character is assigned a number. This assignment is a character encoding.

ASCII encoding (early standard):
  A = 65
  B = 66
  a = 97
  1 = 49
  ! = 33

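Unicode code points (the numbers that UTF-8, the modern standard, stores as one to four bytes each):
  A = 65
  é = 233
  中 = 20013
  😀 = 128512

ASCII covers only English letters, digits, and basic symbols: 128 characters in total. Unicode assigns a number to characters from virtually every writing system in the world, plus emoji and special symbols, and UTF-8 is the most common way to store those numbers as bytes. For the characters that ASCII covers, UTF-8 produces exactly the same bytes.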

When you see garbled text on a webpage (characters like "Ã©" instead of "é"), it usually means the text was encoded in one system and decoded in another. The underlying bytes are usually intact; they are being interpreted with the wrong encoding.
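The following sketch reproduces that kind of mismatch in Python. It assumes the wrong decoder is Latin-1, which is a common culprit but not the only one; the variable names are illustrative.

```python
text = "é"

code_point = ord(text)                # the number assigned to the character
utf8_bytes = text.encode("utf-8")     # how UTF-8 stores that number: two bytes

# Decode the UTF-8 bytes with the wrong encoding (Latin-1 assumed here)
garbled = utf8_bytes.decode("latin-1")

# The bytes were never damaged, so reversing the mistake recovers the text
repaired = garbled.encode("latin-1").decode("utf-8")

print(code_point, utf8_bytes, garbled, repaired)
print(len("😀".encode("utf-8")))      # emoji need more bytes in UTF-8
```

Because only the interpretation was wrong, undoing the wrong decode and applying the right one restores the original character.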

Image Encoding

A digital photograph encodes visual information as numbers:

Each pixel stores:
  Red value:    0-255
  Green value:  0-255
  Blue value:   0-255

A single pixel:
  Red = 66, Green = 135, Blue = 245 -> a shade of blue

A 1920x1080 image:
  1920 x 1080 = 2,073,600 pixels
  Each pixel = 3 bytes
  Total = ~6.2 megabytes (uncompressed)
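The arithmetic above can be checked with a short sketch. It assumes 8 bits per color channel and counts a megabyte as one million bytes; pack_rgb is a hypothetical helper name.

```python
def pack_rgb(red: int, green: int, blue: int) -> bytes:
    """Encode one pixel as three bytes, one per color channel (0-255 each)."""
    return bytes([red, green, blue])

width, height, bytes_per_pixel = 1920, 1080, 3

pixels = width * height
size_bytes = pixels * bytes_per_pixel
size_mb = size_bytes / 1_000_000      # assuming 1 MB = 1,000,000 bytes

print(pack_rgb(66, 135, 245).hex())   # the shade of blue from the example
print(pixels, size_bytes, round(size_mb, 1))
```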

Image formats like JPEG and PNG use different compression strategies to reduce file size. JPEG sacrifices some quality for smaller files (lossy compression). PNG preserves exact quality but typically produces larger files for photographs (lossless compression). The choice is a trade-off between file size and fidelity.

Data Transformation in Technology

Serialization

Serialization transforms data from a program's internal format into a format that can be stored or transmitted:

In the program's memory:
  An object with fields: name, age, email

Serialized to JSON:
  {"name": "Maria", "age": 34, "email": "maria@email.com"}

Serialized to XML:
  <person>
    <name>Maria</name>
    <age>34</age>
    <email>maria@email.com</email>
  </person>

The receiving system deserializes — transforms the stored format back into its internal representation. Serialization and deserialization are how systems exchange data.
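Here is one way the JSON round trip might look in Python, using only the standard library. The Person class and the serialize and deserialize helpers are names invented for this sketch.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Person:
    name: str
    age: int
    email: str

def serialize(person: Person) -> str:
    """Turn the in-memory object into a JSON string for storage or transmission."""
    return json.dumps(asdict(person))

def deserialize(payload: str) -> Person:
    """Rebuild the in-memory object from its JSON form."""
    return Person(**json.loads(payload))

maria = Person(name="Maria", age=34, email="maria@email.com")
payload = serialize(maria)
restored = deserialize(payload)

print(payload)
print(restored == maria)
```

The sender and receiver only need to agree on the JSON shape, not on how each program represents a person internally.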

Data Pipelines

A data pipeline is a sequence of transformations that raw data passes through to become useful:

Raw data:  Messy spreadsheet from a supplier with inconsistent formatting

Step 1 - Clean:     Fix misspellings, standardize date formats
Step 2 - Validate:  Remove rows with missing required fields
Step 3 - Transform: Convert currencies to USD, convert units to metric
Step 4 - Enrich:    Add geographic coordinates from addresses
Step 5 - Load:      Insert into the database in the correct format

Each step transforms the data into a more useful form. The raw input and the final output represent the same information, but the final form is clean, consistent, and ready for use.
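The steps above can be sketched as a chain of small functions. Everything here is hypothetical: the sample rows, the fixed exchange rate, and the assumption that slash-separated dates from this supplier are MM/DD/YYYY. The enrich and load steps are left out to keep the sketch short.

```python
from datetime import datetime

# Hypothetical fixed rate and sample rows, for illustration only
EUR_TO_USD = 1.08
RAW_ROWS = [
    {"item": " Widget ", "date": "03/04/2026", "price": "10.00", "currency": "EUR"},
    {"item": "gadget", "date": "2026-04-05", "price": "", "currency": "USD"},
    {"item": "Gizmo", "date": "2026-04-06", "price": "5.50", "currency": "USD"},
]

def clean(row: dict) -> dict:
    """Step 1: trim and lowercase names, standardize dates to YYYY-MM-DD."""
    row = dict(row)
    row["item"] = row["item"].strip().lower()
    if "/" in row["date"]:  # assumed to be MM/DD/YYYY for this supplier
        parsed = datetime.strptime(row["date"], "%m/%d/%Y")
        row["date"] = parsed.strftime("%Y-%m-%d")
    return row

def is_valid(row: dict) -> bool:
    """Step 2: keep only rows where every required field is present."""
    return all(row[field] for field in ("item", "date", "price"))

def transform(row: dict) -> dict:
    """Step 3: convert the price to USD."""
    row = dict(row)
    price = float(row["price"])
    if row["currency"] == "EUR":
        price *= EUR_TO_USD
    row["price_usd"] = round(price, 2)
    return row

def run_pipeline(rows: list) -> list:
    cleaned = (clean(r) for r in rows)
    valid = (r for r in cleaned if is_valid(r))
    return [transform(r) for r in valid]

result = run_pipeline(RAW_ROWS)
for row in result:
    print(row)
```

Each function does one job and hands its output to the next, so a step can be tested or replaced without touching the others.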

Lossless vs Lossy

An important distinction in encoding and transformation:

Lossless

The original can be perfectly reconstructed from the encoded form. Nothing is lost.

Everyday: Translating "hello" to "hola" and back to "hello" (a word with a direct equivalent)
Tech:     ZIP compression, PNG images, FLAC audio

Lossy

Some information is deliberately discarded to achieve a benefit (usually smaller size). The original cannot be perfectly reconstructed.

Everyday: Summarizing a 300-page book into a 2-page summary
Tech:     JPEG images, MP3 audio, video streaming

Lossy encoding is acceptable when the lost information is not important for the use case. A music listener may not notice the frequencies MP3 removes. But a professional audio engineer might.
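The difference can be seen in a small sketch. zlib, the compression library behind ZIP-style and PNG compression, stands in for the lossless case. Rounding numbers to one decimal place is a deliberately crude stand-in for the lossy case and is not how JPEG or MP3 actually work.

```python
import zlib

# Lossless: compress, then decompress, and get the exact original back
original = b"encoding and transformation " * 20
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Lossy (toy example): rounding throws away detail that cannot be recovered
samples = [0.12, 0.57, 0.93, 0.31]
lossy = [round(s, 1) for s in samples]
print(lossy, lossy == samples)
```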

Round-Trip Problems

A round trip is when data is encoded, transmitted or stored, and then decoded back. Problems arise when the round trip is not perfect:

Problem: Enter a price as $1,234.56 in a form
  -> System stores it as text: "1,234.56"
  -> Another system, using a locale where the comma is the decimal separator, reads it as a number: 1.23456 (wrong!)
  -> The comma was interpreted as a decimal separator and the period as a thousands separator

Problem: Text written in UTF-8
  -> Stored in a system that assumes ASCII
  -> Characters outside ASCII become garbled or are replaced with "?"
  -> If the system replaced them, re-saving does not fix it: the original data is lost

Round-trip fidelity — ensuring data survives the journey from one format to another and back — is a constant concern in technology.
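Both problems can be reproduced in a few lines. This is a sketch under stated assumptions: parse_us and parse_eu are hypothetical helpers that mimic two number-formatting conventions, and the ASCII system is assumed to replace characters it cannot store with "?".

```python
def parse_us(text: str) -> float:
    """Treat the comma as a thousands separator and the period as the decimal point."""
    return float(text.replace(",", ""))

def parse_eu(text: str) -> float:
    """Treat the period as a thousands separator and the comma as the decimal point."""
    return float(text.replace(".", "").replace(",", "."))

price_text = "1,234.56"
right = parse_us(price_text)
wrong = parse_eu(price_text)
print(right, wrong)

# A system that only stores ASCII and substitutes "?" for anything else
original = "café"
stored = original.encode("ascii", errors="replace")
recovered = stored.decode("ascii")
print(stored, recovered == original)
```

In the first case the text is intact and only the reading is wrong. In the second case the accented character was replaced at storage time, so nothing downstream can bring it back.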

Common Pitfalls

Assuming one encoding fits all

ASCII works for English text but fails for other languages. A format that works for one system may not work for another. Always consider the full range of data you need to handle.

Losing data in transformation

Every lossy transformation is irreversible. Converting a high-resolution image to a low-resolution JPEG and then trying to enlarge it back does not restore the original quality. Keep original data when possible.
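A toy version of the shrink-then-enlarge problem, using a list of numbers in place of image pixels. The functions downsample and upsample are simplified stand-ins for real image resizing.

```python
def downsample(values: list) -> list:
    """Average each pair of neighbors, halving the resolution (lossy)."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]

def upsample(values: list) -> list:
    """Repeat each value twice to get back to the original length."""
    return [v for v in values for _ in range(2)]

original = [10, 20, 30, 40, 50, 60, 70, 80]
small = downsample(original)
enlarged = upsample(small)

print(small)
print(enlarged)
print(enlarged == original)
```

The enlarged list has the right length but not the original values, because the differences between neighbors were averaged away.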

Ignoring encoding when sharing data

When two systems exchange data, they must agree on the encoding. Sending UTF-8 text to a system expecting ASCII, or sending dates in MM/DD/YYYY format to a system expecting DD/MM/YYYY, causes errors.

Not validating after transformation

After transforming data, check that the result is correct. Did the currency conversion use the right rate? Did the date format conversion handle ambiguous dates (is 03/04/2026 March 4th or April 3rd)?
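The date question can be made concrete with Python's standard datetime module. The is_ambiguous helper is a hypothetical check written for this sketch, not a library function.

```python
from datetime import datetime

text = "03/04/2026"

# The same text parses successfully under both conventions
as_month_first = datetime.strptime(text, "%m/%d/%Y").date()
as_day_first = datetime.strptime(text, "%d/%m/%Y").date()
print(as_month_first, as_day_first)

def is_ambiguous(date_text: str) -> bool:
    """True when both leading numbers could be a month and swapping them changes the date."""
    first, second, _year = date_text.split("/")
    return int(first) <= 12 and int(second) <= 12 and first != second

print(is_ambiguous("03/04/2026"), is_ambiguous("13/04/2026"))
```

Neither parse raises an error, which is exactly why this kind of mistake needs an explicit check after the transformation.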

Chaining lossy transformations

Each lossy step degrades the data further. Saving a JPEG, editing it, and saving as JPEG again introduces more quality loss each time. This is called generation loss.

Key Takeaways

  • Encoding represents information using a particular system of symbols. Transformation changes data's shape or format while preserving meaning.
  • Everyday encoding includes language and number systems. Everyday transformations include currency conversion, unit conversion, and map projections.
  • In technology, character encoding (ASCII, UTF-8), image encoding (JPEG, PNG), and audio encoding (MP3, FLAC) are fundamental.
  • Data pipelines are sequences of transformations that turn raw data into usable information.
  • Lossless transformations preserve all information. Lossy transformations discard some information for practical benefits.
  • When systems exchange data, they must agree on encoding. Mismatches cause garbled or incorrect data.