
πŸ›[BUG]:AttributeError: 'NoneType' object has no attribute 'patches_enabled' during import (Multi-GPU without megatron-core)Β #834

@mg-huang

Description

Version

0.13.0

On which installation method(s) does this occur?

No response

Describe the issue


📌 Title

[Bug] AttributeError: 'NoneType' object has no attribute 'patches_enabled' during import (Multi-GPU without megatron-core)

πŸ“ Description

When running inference on a multi-GPU system without distributed dependencies (e.g., megatron-core) installed, importing physicsnemo causes a fatal AttributeError.

The code correctly uses a try...except block to catch the ImportError for distributed libraries, falling back to setting ShardTensor = None. However, line 223 of natten_patches.py performs an unguarded module-level access to ShardTensor.patches_enabled, so every import of the module crashes when the distributed dependencies are absent.

🔄 Steps to Reproduce

  1. Use a multi-GPU environment (e.g., 2x NVIDIA RTX/A100).
  2. Install earth2studio and physicsnemo without megatron-core or apex.
  3. Attempt to import the module:
    python -c "from physicsnemo.domain_parallel.shard_utils.natten_patches import ShardTensor"
  4. Observe the crash.

❌ Error Traceback

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../site-packages/physicsnemo/domain_parallel/shard_utils/natten_patches.py", line 223, in <module>
    "natten.functional", "na2d", enabled=ShardTensor.patches_enabled
AttributeError: 'NoneType' object has no attribute 'patches_enabled'

πŸ” Root Cause

In physicsnemo/domain_parallel/shard_utils/natten_patches.py, the fallback logic sets ShardTensor to None if distributed dependencies are missing.
However, at line 223, the code executes at the module level:

patch_method(
    "natten.functional", "na2d", enabled=ShardTensor.patches_enabled
)

Since ShardTensor is None, accessing .patches_enabled raises an AttributeError, making this a hard blocker for any non-distributed multi-GPU inference.
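The failure mode can be reproduced standalone, using a stand-in variable in place of the real physicsnemo import (names here are illustrative, not the literal library source):

```python
# Stand-in for the fallback branch: when megatron-core is missing,
# the except ImportError branch assigns ShardTensor = None.
ShardTensor = None

# The unguarded module-level access then raises AttributeError,
# exactly as in the traceback above:
try:
    enabled = ShardTensor.patches_enabled
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'patches_enabled'
```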

💡 Proposed Solution

Guard the attribute access with an explicit None check, or use getattr, so the import succeeds gracefully for non-distributed users:

# Suggested fix for line 223:
patch_method(
    "natten.functional", "na2d", enabled=getattr(ShardTensor, 'patches_enabled', False)
)
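With the guard in place, the same None fallback no longer crashes: getattr simply returns the False default, while a real class with the attribute is read normally. A quick illustration (the _FakeShardTensor class is a hypothetical stand-in):

```python
# Case 1: distributed deps absent, ShardTensor fell back to None.
ShardTensor = None
enabled = getattr(ShardTensor, "patches_enabled", False)
print(enabled)  # False -- import proceeds instead of raising

# Case 2: the real class is available; the attribute is read as usual.
class _FakeShardTensor:
    patches_enabled = True

enabled = getattr(_FakeShardTensor, "patches_enabled", False)
print(enabled)  # True
```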

🖥️ Environment

  • OS: Linux
  • Python: 3.11
  • PyTorch: 2.5.1
  • Hardware: Multi-GPU (e.g., 2 GPUs)

Metadata

Assignees

No one assigned

Labels

? - Needs Triage (need team to review and classify), bug (something isn't working)
