obsolete Issue: Pandas to_sql 'Engine' object has no attribute 'cursor' - matifaro/pandas GitHub Wiki
Issue - Pandas to_sql 'Engine' object has no attribute 'cursor'
The issue
The issue Pandas to_sql 'Engine' object has no attribute 'cursor'
can be found in https://github.com/pandas-dev/pandas/issues/60684.
When using pandas.to_sql()
with a SQLAlchemy engine, a runtime error occurs: 'Engine' object has no attribute cursor, since it assumes that the con
parameter is a DBAPI connection, with a .cursor() method. While passing an Engine, we must first get a proper connection, as it does not have that attribute.
Requirements
Developers use DataFrama.to_sql()
to upload large DataFrames to a database. It is expected that raw DBAPI connections or SQLAlchemy connections are accepted.
- As a developer, I want
df.to_sql()
to work when passing a SQLAlchemy Engine by automatically connecting it, so I don't have to manually call.connect()
every time.
Source code files
The following files are involved in the issue:
pandas/io/sql.py
- Containsto_sql()
and database handling codepandas/tests/io/test_sql.py
- Contains tests forto_sql()
def to_sql(
frame,
name: str,
con,
schema: str | None = None,
if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail",
index: bool = True,
index_label: IndexLabel | None = None,
chunksize: int | None = None,
dtype: DtypeArg | None = None,
method: Literal["multi"] | Callable | None = None,
engine: str = "auto",
**engine_kwargs,
) -> int | None:
if if_exists not in ("fail", "replace", "append", "delete_rows"):
raise ValueError(f"'{if_exists}' is not valid for if_exists")
if isinstance(frame, Series):
frame = frame.to_frame()
elif not isinstance(frame, DataFrame):
raise NotImplementedError(
"'frame' argument should be either a Series or a DataFrame"
)
with pandasSQL_builder(con, schema=schema, need_transaction=True) as pandas_sql:
return pandas_sql.to_sql(
frame,
name,
if_exists=if_exists,
index=index,
index_label=index_label,
schema=schema,
chunksize=chunksize,
dtype=dtype,
method=method,
engine=engine,
**engine_kwargs,
)
Design of the fix
Because this issue stems from the common confusion between SQLAlchemy Engine and raw database connections, there is a growing consensus that pandas.to_sql()
should be more flexible and user-friendly in handling these types. After researching the problem and typical usage patterns, we designed a solution that covers the following situations:
-
When a user passes a SQLAlchemy Engine instead of a direct DBAPI connection, Pandas should automatically create a connection by calling
engine.connect()
internally, without raising an error; -
When a user passes a valid DBAPI connection, with a .cursor() method, Pandas should proceed without modification, maintaining backward compatibility.
In order to make the fix work as naturally as possible, we planned to minimize user impact, having the function intelligently detect the type of the con parameter without requiring the user to change their code.
Fix source code
In our fix, we detect if con has a .connect()
method but not a .cursor()
method and, if so, we automatically call con = con.connect()
before proceeding.
def to_sql(
frame,
name: str,
con,
schema: str | None = None,
if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail",
index: bool = True,
index_label: IndexLabel | None = None,
chunksize: int | None = None,
dtype: DtypeArg | None = None,
method: Literal["multi"] | Callable | None = None,
engine: str = "auto",
**engine_kwargs,
) -> int | None:
if hasattr(con, "connect") and not hasattr(con, "cursor"):
con = con.connect()
if if_exists not in ("fail", "replace", "append", "delete_rows"):
raise ValueError(f"'{if_exists}' is not valid for if_exists")
if isinstance(frame, Series):
frame = frame.to_frame()
elif not isinstance(frame, DataFrame):
raise NotImplementedError(
"'frame' argument should be either a Series or a DataFrame"
)
with pandasSQL_builder(con, schema=schema, need_transaction=True) as pandas_sql:
return pandas_sql.to_sql(
frame,
name,
if_exists=if_exists,
index=index,
index_label=index_label,
schema=schema,
chunksize=chunksize,
dtype=dtype,
method=method,
engine=engine,
**engine_kwargs,
)
With this fix, we ensure compatibility with SQLAlchemy engines without impacting other connection types.
Test for the new code with the fix:
def test_to_sql_sqlalchemy_engine():
engine = create_engine('sqlite:///:memory:')
test_table_name = 'test_table'
test_df = pd.DataFrame({
'id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})
test_df.to_sql(test_table_name, con=engine, if_exists='replace', index=False)
result = pd.read_sql_table(test_table_name, con=engine)
pd.testing.assert_frame_equal(test_df, result)
Submit the fix
Currently we haven't submitted this fix yet given that we are still waiting on a response from the head of the collaborators.