Pandas Series: to_sql() function

Last update on September 15 2022 12:57:02 (UTC/GMT +8 hours)

Series-to_sql() function

The to_sql() function is used to write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [1] are supported. Tables can be newly created, appended to, or overwritten.

Syntax:

Series.to_sql(self, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)

Parameters:

Name	Description	Type/Default Value	Required / Optional
name	Name of SQL table.	string	Required
con	Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects.	sqlalchemy.engine.Engine or sqlite3.Connection	Required
schema	Specify the schema (if database flavor supports this). If None, use default schema.	string	Optional
if_exists	How to behave if the table already exists. fail: Raise a ValueError. replace: Drop the table before inserting new values. append: Insert new values to the existing table.	{'fail', 'replace', 'append'}, default 'fail'	Required
index	Write DataFrame index as a column. Uses index_label as the column name in the table.	bool, default True	Required
index_lable	Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.	string or sequence, default None	Required
chunksize	Rows will be written in batches of this size at a time. By default, all rows will be written at once.	int	Optional/td>
dtype	Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode.	dict	Optional
method	Controls the SQL insertion clause used: None : Uses standard SQL INSERT clause (one per row). 'multi': Pass multiple values in a single INSERT clause. callable with signature (pd_table, conn, keys, data_iter). Details and a sample callable implementation can be found in the section insert method.	{None, ‘multi’, callable}, default None	Required
dropna	If true, ALL nan rows will not be written to store.	bool, default False	Required
errors	Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.	str, default ‘strict’	Required

Raises: ValueError
When the table already exists and if_exists is 'fail' (the default).

Notes Timezone aware datetime columns will be written as Timestamp with timezone type with SQLAlchemy if supported by the database. Otherwise, the datetimes will be stored as timezone unaware timestamps local to the original timezone.

Example - Create an in-memory SQLite database:

Create a table from scratch with 3 rows:

Python-Pandas Code:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df = pd.DataFrame({'name' : ['User A', 'User B', 'User C']})
df

Output:

   name
0	User A
1	User B
2	User C

Python-Pandas Code:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df = pd.DataFrame({'name' : ['User A', 'User B', 'User C']})
df.to_sql('users', con=engine)
engine.execute("SELECT * FROM users").fetchall()

Output:

[(0, 'User A'), (1, 'User B'), (2, 'User C')]

Python-Pandas Code:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine
df1 = pd.DataFrame({'name' : ['User D', 'User E']})
df1.to_sql('users', con=engine, if_exists='append')
engine.execute("SELECT * FROM users").fetchall()

Output:

[(0, 'User A'), (1, 'User B'), (2, 'User C'), (0, 'User D'), (1, 'User E')]

Example - Overwrite the table with just df1:

Python-Pandas Code:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine
df1 = pd.DataFrame({'name' : ['User D', 'User E']})
df1.to_sql('users', con=engine, if_exists='append')
df1.to_sql('users', con=engine, if_exists='replace',
           index_label='id')
engine.execute("SELECT * FROM users").fetchall()

Output:

[(0, 'User D'), (1, 'User E')]

Example - Specify the dtype (especially useful for integers with missing values). Notice that while pandas is forced to store the data as floating point, the database supports nullable integers. When fetching the data with Python, we get back integer scalars:

Python-Pandas Code:

import numpy as np
import pandas as pd
df = pd.DataFrame({"X": [2, None, 3]})
df

Output:

   X
0	2.0
1	NaN
2	3.0

Python-Pandas Code:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import Integer
df.to_sql('integers', con=engine, index=False,
          dtype={"X": Integer()})
engine.execute("SELECT * FROM integers").fetchall()

Output:

[(2,), (None,), (3,)]

Previous: Series-to_hdf() function
Next: Series-to_json() function