Error handling is the part of Python everyone skips until something breaks in production. In Data Engineering, pipelines fail constantly — API timeouts, corrupted CSV rows, database hiccups, missing files. The difference between a fragile script and a production pipeline is whether you anticipated those failures. Robust error handling means your code logs the problem, retries intelligently, or fails safely without losing progress.


Basic Syntax

Try-Except Block

try:
    # Code that might raise an exception
    result = 10 / x
except ZeroDivisionError:
    # Handle the specific exception
    print("Cannot divide by zero")

Multiple Except Blocks

try:
    # Risky code
    file = open("data.csv")
    value = int(file.read())
    result = 100 / value
except FileNotFoundError:
    print("File does not exist")
except ValueError:
    print("Could not convert to integer")
except ZeroDivisionError:
    print("Cannot divide by zero")
except Exception as e:
    # Catch any other exception
    print(f"Unexpected error: {e}")

Try-Except-Else-Finally

try:
    file = open("data.txt")
    content = file.read()
except FileNotFoundError:
    print("File not found")
else:
    # Executes ONLY if no exception occurred
    print(f"Read {len(content)} characters")
finally:
    # ALWAYS executes, even if exception or return in except
    if file:
        file.close()
    print("Cleanup complete")

Key concept: Only ONE except block executes. Python checks top-to-bottom and stops at the first match.


Quick Reference Examples

Catching and Logging Exceptions

import logging
 
logger = logging.getLogger(__name__)
 
try:
    data = fetch_from_api()
    process(data)
except ConnectionError as e:
    logger.error(f"API connection failed: {e}", exc_info=True)
    # exc_info=True includes full traceback in logs
except Exception as e:
    logger.critical(f"Unexpected error: {e}", exc_info=True)
    raise  # Re-raise to propagate to caller

Using Else and Finally (Resource Cleanup)

database_connection = None
 
try:
    database_connection = connect_to_db("postgresql://...")
    cursor = database_connection.cursor()
    cursor.execute("SELECT * FROM users")
    results = cursor.fetchall()
except ConnectionError as e:
    logger.error(f"Could not connect to database: {e}")
    results = []
else:
    print(f"Successfully retrieved {len(results)} rows")
finally:
    # Always close the connection, whether success or failure
    if database_connection:
        database_connection.close()
        logger.info("Database connection closed")

Re-raising Exceptions with Context

try:
    validate_csv(file)
except ValidationError as e:
    logger.error(f"CSV validation failed: {e}")
    raise ValueError(f"Invalid CSV: {e}") from e
    # 'from e' preserves original exception in traceback

Custom Exception Handling Pattern

def load_config(config_file: str) -> dict:
    """Load config safely with fallback."""
    try:
        with open(config_file) as f:
            return json.load(f)
    except FileNotFoundError:
        logger.warning(f"Config not found: {config_file}, using defaults")
        return DEFAULT_CONFIG
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON in {config_file}: {e}")
        raise ValueError(f"Corrupted config file: {e}") from e

Common Built-in Exceptions

ExceptionWhen It OccursExample
ValueErrorInvalid value for operationint("abc")
TypeErrorWrong type for operation"5" + 5
KeyErrorKey not found in dictd["missing_key"]
IndexErrorList index out of rangelst[999]
FileNotFoundErrorFile doesn’t existopen("missing.txt")
ZeroDivisionErrorDivision by zero10 / 0
ConnectionErrorNetwork/database failureAPI call times out
TimeoutErrorOperation takes too longSlow API response
AttributeErrorAttribute doesn’t existobj.missing_attr

Creating Custom Exceptions

Custom exceptions help you handle domain-specific errors clearly.

Simple Custom Exception

class InvalidTemperatureError(Exception):
    """Raised when temperature is outside valid range."""
    pass
 
def validate_temperature(celsius: float) -> float:
    if celsius < -273.15:
        raise InvalidTemperatureError(
            f"Temperature {celsius}°C is below absolute zero"
        )
    return celsius

Custom Exception with Additional Data

class DataValidationError(Exception):
    """Raised when data validation fails."""
    
    def __init__(self, message: str, invalid_rows: list, row_count: int):
        super().__init__(message)
        self.invalid_rows = invalid_rows
        self.row_count = row_count
    
    def summary(self) -> str:
        percent = (len(self.invalid_rows) / self.row_count) * 100
        return f"{len(self.invalid_rows)} invalid rows ({percent:.1f}%)"
 
# Usage
try:
    validate_csv(data)
except DataValidationError as e:
    logger.error(f"Validation failed: {e.summary()}")
    logger.debug(f"Invalid rows: {e.invalid_rows[:10]}")  # Show first 10

Data Engineering Patterns

Retry Logic with Exponential Backoff

import time
from typing import Callable, Any
 
def retry(max_attempts: int = 3, delay: float = 1.0, backoff: float = 2.0):
    """Decorator to retry a function with exponential backoff."""
    def decorator(func: Callable) -> Callable:
        def wrapper(*args, **kwargs) -> Any:
            attempt = 0
            current_delay = delay
            
            while attempt < max_attempts:
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    attempt += 1
                    if attempt >= max_attempts:
                        logger.error(f"{func.__name__} failed after {max_attempts} attempts")
                        raise
                    
                    logger.warning(
                        f"{func.__name__} failed (attempt {attempt}/{max_attempts}), "
                        f"retrying in {current_delay}s: {e}"
                    )
                    time.sleep(current_delay)
                    current_delay *= backoff
        return wrapper
    return decorator
 
@retry(max_attempts=5, delay=1.0, backoff=2.0)
def fetch_from_api(url: str) -> dict:
    """Retry up to 5 times with exponential backoff (1s, 2s, 4s, 8s, 16s)."""
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.json()

Validating ETL Data with Custom Exceptions

def extract_and_validate(file_path: str) -> list[dict]:
    """Extract CSV with validation."""
    try:
        with open(file_path) as f:
            reader = csv.DictReader(f)
            records = list(reader)
    except FileNotFoundError:
        raise DataSourceError(f"File not found: {file_path}")
    except csv.Error as e:
        raise DataSourceError(f"CSV parsing error: {e}") from e
    
    # Validate data
    invalid_rows = []
    for i, record in enumerate(records, start=1):
        try:
            validate_record(record)
        except ValueError as e:
            invalid_rows.append((i, record, str(e)))
    
    if invalid_rows:
        raise DataValidationError(
            f"Found {len(invalid_rows)} invalid records",
            invalid_rows,
            len(records)
        )
    
    return records
 
def validate_record(record: dict) -> None:
    """Validate a single record."""
    if not record.get("id"):
        raise ValueError("Missing required field: id")
    
    try:
        int(record["id"])
    except ValueError:
        raise ValueError(f"id must be integer, got: {record['id']}")

Fallback Pattern for Missing Data

def load_data_with_fallback(primary_source: str, fallback_source: str) -> list[dict]:
    """Try primary source, fall back to secondary on failure."""
    try:
        logger.info(f"Attempting to load from {primary_source}")
        return load_from_database(primary_source)
    except ConnectionError as e:
        logger.warning(f"Primary source failed: {e}, using fallback")
        try:
            return load_from_database(fallback_source)
        except Exception as e:
            logger.critical(f"Both sources failed: {e}")
            raise DataUnavailableError(
                f"Could not load data from {primary_source} or {fallback_source}"
            ) from e

Tips & Gotchas

  • Never use bare except: — It catches everything including KeyboardInterrupt and SystemExit, making it impossible to stop your program.
# ❌ BAD: Catches ALL exceptions, even Ctrl+C
try:
    process_data()
except:
    print("Error")
 
# ✅ GOOD: Catch specific exceptions
try:
    process_data()
except (ValueError, TypeError) as e:
    logger.error(f"Data error: {e}")
except Exception as e:
    logger.critical(f"Unexpected error: {e}")
  • Catch specific exceptions first, generic ones last. Python stops at the first match.
# ❌ Wrong order (ValueError will never match)
try:
    int("abc")
except Exception as e:
    print(e)
except ValueError as e:
    print("Invalid number")
 
# ✅ Correct order
try:
    int("abc")
except ValueError as e:
    print("Invalid number")
except Exception as e:
    print(e)
  • Use from e when re-raising to preserve the original traceback. This is critical for debugging.
# ❌ Loses original error context
try:
    risky_operation()
except Exception:
    raise ValueError("Operation failed")
 
# ✅ Preserves traceback
try:
    risky_operation()
except Exception as e:
    raise ValueError("Operation failed") from e
  • The finally block always executes, even if you return in the except block. Use it for cleanup (closing files, connections, releasing locks).
try:
    file = open("data.txt")
    return file.read()  # Looks like function ends here
except FileNotFoundError:
    return "No data"
finally:
    file.close()  # But this STILL executes!
  • The else block runs ONLY if no exception occurred. Use it to separate “success path” from “error handling.”
try:
    result = compute_expensive_operation()
except ValueError as e:
    logger.error(f"Computation failed: {e}")
else:
    # Only executes if compute_expensive_operation() succeeded
    save_result_to_database(result)
    logger.info("Successfully saved result")
  • Log exceptions with exc_info=True to get full tracebacks. Don’t just print the message.
# ❌ Loses debugging info
except Exception as e:
    print(f"Error: {e}")
 
# ✅ Includes full traceback in logs
except Exception as e:
    logger.error(f"Error: {e}", exc_info=True)


Key Takeaway:
Anticipate failures, catch specific exceptions, log with full context, and always clean up resources in finally blocks. Production code is not about preventing errors—it’s about handling them gracefully.