obsolete Issue: Pandas to_sql 'Engine' object has no attribute 'cursor' - matifaro/pandas GitHub Wiki

Issue - Pandas to_sql 'Engine' object has no attribute 'cursor'

The issue

The issue Pandas to_sql 'Engine' object has no attribute 'cursor' can be found in https://github.com/pandas-dev/pandas/issues/60684.

When using pandas.to_sql() with a SQLAlchemy engine, a runtime error occurs: 'Engine' object has no attribute cursor, since it assumes that the con parameter is a DBAPI connection, with a .cursor() method. While passing an Engine, we must first get a proper connection, as it does not have that attribute.

Requirements

Developers use DataFrama.to_sql() to upload large DataFrames to a database. It is expected that raw DBAPI connections or SQLAlchemy connections are accepted.

  • As a developer, I want df.to_sql() to work when passing a SQLAlchemy Engine by automatically connecting it, so I don't have to manually call .connect() every time.

Source code files

The following files are involved in the issue:

  • pandas/io/sql.py - Contains to_sql() and database handling code
  • pandas/tests/io/test_sql.py - Contains tests for to_sql()
def to_sql(
    frame,
    name: str,
    con,
    schema: str | None = None,
    if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail",
    index: bool = True,
    index_label: IndexLabel | None = None,
    chunksize: int | None = None,
    dtype: DtypeArg | None = None,
    method: Literal["multi"] | Callable | None = None,
    engine: str = "auto",
    **engine_kwargs,
) -> int | None:

    if if_exists not in ("fail", "replace", "append", "delete_rows"):
        raise ValueError(f"'{if_exists}' is not valid for if_exists")

    if isinstance(frame, Series):
        frame = frame.to_frame()
    elif not isinstance(frame, DataFrame):
        raise NotImplementedError(
            "'frame' argument should be either a Series or a DataFrame"
        )

    with pandasSQL_builder(con, schema=schema, need_transaction=True) as pandas_sql:
        return pandas_sql.to_sql(
            frame,
            name,
            if_exists=if_exists,
            index=index,
            index_label=index_label,
            schema=schema,
            chunksize=chunksize,
            dtype=dtype,
            method=method,
            engine=engine,
            **engine_kwargs,
        )

Design of the fix

Because this issue stems from the common confusion between SQLAlchemy Engine and raw database connections, there is a growing consensus that pandas.to_sql() should be more flexible and user-friendly in handling these types. After researching the problem and typical usage patterns, we designed a solution that covers the following situations:

  • When a user passes a SQLAlchemy Engine instead of a direct DBAPI connection, Pandas should automatically create a connection by calling engine.connect() internally, without raising an error;

  • When a user passes a valid DBAPI connection, with a .cursor() method, Pandas should proceed without modification, maintaining backward compatibility.

In order to make the fix work as naturally as possible, we planned to minimize user impact, having the function intelligently detect the type of the con parameter without requiring the user to change their code.

Class Diagram

Fix source code

In our fix, we detect if con has a .connect() method but not a .cursor() method and, if so, we automatically call con = con.connect() before proceeding.

def to_sql(
    frame,
    name: str,
    con,
    schema: str | None = None,
    if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail",
    index: bool = True,
    index_label: IndexLabel | None = None,
    chunksize: int | None = None,
    dtype: DtypeArg | None = None,
    method: Literal["multi"] | Callable | None = None,
    engine: str = "auto",
    **engine_kwargs,
) -> int | None:

    if hasattr(con, "connect") and not hasattr(con, "cursor"):
        con = con.connect()


    if if_exists not in ("fail", "replace", "append", "delete_rows"):
        raise ValueError(f"'{if_exists}' is not valid for if_exists")

    if isinstance(frame, Series):
        frame = frame.to_frame()
    elif not isinstance(frame, DataFrame):
        raise NotImplementedError(
            "'frame' argument should be either a Series or a DataFrame"
        )

    with pandasSQL_builder(con, schema=schema, need_transaction=True) as pandas_sql:
        return pandas_sql.to_sql(
            frame,
            name,
            if_exists=if_exists,
            index=index,
            index_label=index_label,
            schema=schema,
            chunksize=chunksize,
            dtype=dtype,
            method=method,
            engine=engine,
            **engine_kwargs,
        )

With this fix, we ensure compatibility with SQLAlchemy engines without impacting other connection types.

Test for the new code with the fix:


def test_to_sql_sqlalchemy_engine():
    engine = create_engine('sqlite:///:memory:')
    test_table_name = 'test_table'
    test_df = pd.DataFrame({
        'id': [1, 2, 3],
        'name': ['Alice', 'Bob', 'Charlie']
    })
    test_df.to_sql(test_table_name, con=engine, if_exists='replace', index=False)
    result = pd.read_sql_table(test_table_name, con=engine)
    pd.testing.assert_frame_equal(test_df, result)

Submit the fix

Currently we haven't submitted this fix yet given that we are still waiting on a response from the head of the collaborators.