Python Modules, Functions & Lists

These three concepts are where Python stops being a calculator and starts being a real codebase. Modules give your code structure, functions make logic reusable, and lists are how you move batches of records before handing them off to a DataFrame. Without a solid grip on all three, you’ll find yourself copy-pasting logic across files and wondering why debugging takes three times longer than it should.

Production ETL pipelines live inside modules (extractors.py, transformers.py). Each transformation is a function with type hints. Data moves through lists first. Get these fundamentals right and everything that follows — classes, decorators, Airflow DAGs — makes much more sense.


Quick Reference

Module Import (Standard Pattern)

# ✅ PREFERRED: Standard library → third-party → local (blank lines between)
import os
import sys
from datetime import datetime
 
import pandas as pd
import requests
 
from my_pipeline import extract_data, load_data

Function with Type Hints

def transform_temperature(celsius: float, target_unit: str = "F") -> float:
    """
    Convert Celsius to Fahrenheit or Kelvin.
    
    Args:
        celsius: Temperature value in Celsius
        target_unit: "F" for Fahrenheit, "K" for Kelvin
        
    Returns:
        Converted temperature as float
        
    Raises:
        ValueError: If target_unit is not recognized
    """
    if target_unit == "F":
        return (celsius * 9/5) + 32
    elif target_unit == "K":
        return celsius + 273.15
    else:
        raise ValueError(f"Unknown unit: {target_unit}")

List Comprehension (Pythonic Loop)

# ❌ Old way: imperative loop
celsius = [0, 10, 20, 30, 40]
fahrenheit = []
for c in celsius:
    fahrenheit.append((c * 9/5) + 32)
 
# ✅ Pythonic way: comprehension
fahrenheit = [(c * 9/5) + 32 for c in celsius]
 
# With filtering
hot_celsius = [c for c in celsius if c >= 20]  # [20, 30, 40]

Key Concepts

1. Modules

Definition: A module is a .py file containing Python code (functions, classes, variables). Modules prevent code duplication and provide namespaces—essential when scaling from scripts to pipelines.

Usage: Organize related functionality into separate modules. In ETL, typical structure is extract.py, transform.py, load.py. Import at the top of your file; Python caches imports (runs once per session).
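The import cache mentioned above is visible directly in `sys.modules`, which maps module names to the already-loaded module objects:

```python
import sys
import math  # first import: Python executes the module and caches it

import math  # repeated import: a cheap cache lookup, the module is NOT re-run

# Every loaded module lives in the sys.modules cache.
print("math" in sys.modules)        # True
print(sys.modules["math"] is math)  # True: same cached object
```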

Best practices:

  • Import at module level (top of file), not inside functions.
  • Group imports: standard library → third-party → local.
  • Use absolute imports (from my_package import func) over relative imports.
  • Never use from module import * (pollutes namespace, unclear where names come from).
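To make "a module is just a .py file" concrete, here is a minimal sketch that writes a tiny pipeline-style module to disk and imports it with an absolute import. The file name `transformers_demo.py` and function `to_fahrenheit` are illustrative, not from a real project:

```python
import importlib
import pathlib
import sys

# Create a throwaway module file, mimicking a pipeline's transform module.
pathlib.Path("transformers_demo.py").write_text(
    "def to_fahrenheit(celsius: float) -> float:\n"
    "    return (celsius * 9 / 5) + 32\n"
)

sys.path.insert(0, ".")  # make the current directory importable
transformers_demo = importlib.import_module("transformers_demo")

print(transformers_demo.to_fahrenheit(100.0))  # 212.0
```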

2. Functions

Definition: Functions are reusable blocks of code that take inputs (parameters) and return outputs. They’re the atoms of transformations in data pipelines.

Usage: Wrap any logic you’d repeat twice into a function. Use type hints to document expected types. Write docstrings to explain what the function does and edge cases.

Core patterns:

  • Positional arguments: func(a, b) — order matters, best for ≤3 params.
  • Keyword arguments: func(a=1, b=2) — explicit, readable, best for complex calls.
  • Default arguments: func(a, b="default") — optional parameters.
  • *args: func(*items) — variable-length tuple of positional args.
  • **kwargs: func(**config) — variable-length dict of keyword args.
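The five patterns above can all appear in one signature. A minimal sketch (the function name and fields are made up for illustration):

```python
def build_report(title: str, *rows: str, sep: str = " | ", **meta: str) -> str:
    """Join rows under a title; extra keyword args become a metadata footer."""
    body = sep.join(rows)                                  # *rows is a tuple
    footer = ", ".join(f"{k}={v}" for k, v in meta.items())  # **meta is a dict
    return f"{title}: {body} ({footer})" if footer else f"{title}: {body}"

# Positional arg, *args, keyword default override, and **kwargs in one call:
print(build_report("temps", "0C", "20C", sep=" / ", owner="etl", env="dev"))
# temps: 0C / 20C (owner=etl, env=dev)
```

Note that `sep` sits after `*rows`, which makes it keyword-only: callers must spell it out, which keeps complex calls readable.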

3. Lists

Definition: Lists are ordered, mutable sequences of items. They’re the primary data structure for processing batches of records before moving to DataFrames.

Usage: Create with [...], access with indexing (list[0], list[-1]), modify with .append(), .insert(), .remove(), .pop(). Iterate with for item in list:.

Key methods:

  • .append(x) — add to end.
  • .insert(i, x) — insert at index.
  • .remove(x) — remove first occurrence.
  • .pop(i) — remove and return at index.
  • .sort(), .reverse() — in-place modifications.
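The methods above, applied in sequence (note that `.pop()` returns the removed item, while `.sort()` mutates in place and returns `None`):

```python
records = ["beta", "alpha"]
records.append("delta")     # ["beta", "alpha", "delta"]
records.insert(1, "gamma")  # ["beta", "gamma", "alpha", "delta"]
records.remove("beta")      # ["gamma", "alpha", "delta"]
last = records.pop()        # removes and returns "delta"
records.sort()              # in place: ["alpha", "gamma"]
print(records, last)        # ['alpha', 'gamma'] delta
```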

Common Patterns

| Pattern | Use Case | Example |
| --- | --- | --- |
| Import organization | Keeping code readable and imports debuggable | `import os; import pandas as pd; from .utils import helper` |
| Function with type hints | Making functions self-documenting and IDE-friendly | `def process(data: list[dict]) -> list[dict]:` |
| List comprehension with filter | Transforming and selecting data in one pass | `[x*2 for x in nums if x > 0]` |
| Nested comprehension | Flattening 2D structures (matrices, lists of lists) | `[item for row in matrix for item in row]` |
| Default function argument | Optional parameters without overloading | `def run(config_path: str = "default.yaml"):` |
| `*args` in function | Unknown number of similar arguments | `def log_events(*events): print(*events)` |
| `**kwargs` in function | Configuration dicts (common in Airflow) | `def task(task_id: str, **context): ...` |
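The two comprehension patterns from the table, worked through on a small matrix. In a nested comprehension, the `for` clauses read left to right in the same order as the equivalent nested loops:

```python
matrix = [[1, -2], [3, 4]]

# Nested comprehension: outer loop over rows first, inner loop second.
flat = [item for row in matrix for item in row]    # [1, -2, 3, 4]

# Transform + filter in one pass.
doubled_positive = [x * 2 for x in flat if x > 0]  # [2, 6, 8]

print(flat, doubled_positive)
```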

Tips & Gotchas

  • Imports belong at the top of the file. An import buried inside a function fails only when that function is first called, possibly deep into a pipeline run; a top-level import fails immediately at load time, where it’s easy to spot.

  • from module import * is dangerous. If two modules export the same name, which one wins? Use explicit imports like from module import func1, func2.

  • Functions return None by default. If you write a function without a return statement, it returns None. Check every code path.

  • Mutable default arguments are shared. Don’t write def func(items=[]): — the default list is created once, at definition time, and persists across calls. Use def func(items=None): and set items = [] inside when items is None. (Avoid items = items or [], which also silently discards an empty list the caller passed in.)

  • List comprehensions must be readable. If your comprehension is more than one line or has >2 conditions, use a regular loop. Code clarity beats cleverness.

  • Lists are ordered by position; dicts are keyed. Since Python 3.7 dicts are guaranteed to preserve insertion order, but a dict models key-value lookup, not sequence position. Use a list when order is the point, a dict when you need key-value pairs.

  • Remember: imports are cached. Re-running import my_module in a Jupyter notebook does not pick up edits you made to the file — Python just returns the cached module. Use importlib.reload(my_module) or restart the kernel.
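The mutable-default gotcha from the list above is easy to demonstrate side by side with the `None`-sentinel fix:

```python
def bad_append(item, items=[]):  # default list created ONCE, at def time
    items.append(item)
    return items

def safe_append(item, items=None):  # None sentinel: fresh list per call
    if items is None:
        items = []
    items.append(item)
    return items

print(bad_append(1))                   # [1]
print(bad_append(2))                   # [1, 2]  <- same list as last call!
print(safe_append(1), safe_append(2))  # [1] [2]
```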



Key Takeaway: Organize your code into modules, wrap your logic in typed functions, and move your data through lists. These three habits separate readable, maintainable pipelines from scripts that only the original author can debug — and only on a good day.