Pandas: How to replace values and convert the dtype of a column

This is a memo with sample pandas code that removes the comma (,) from numbers stored as strings using the replace() function and then converts the data type.

As noted in the comments of the sample code, when the Series holds str values you use the '.str' accessor (.str.replace()) to replace part of each string, whereas replace() called directly on the Series replaces whole values, which is what you want once the column is float.
I only use this occasionally, so it is a distinction I tend to forget.
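
The difference between the two calls can be seen in a minimal sketch. The Series `s` and the result names below are made up for illustration and are not part of the original sample; the sketch assumes a pandas version in which `.str.replace()` accepts the `regex` argument.

```python
import pandas as pd

# Hypothetical sample Series of numbers stored as strings
s = pd.Series(["1,000", "2,000"])

# Series.replace() matches whole values by default,
# so searching for "," alone leaves the strings untouched
unchanged = s.replace(",", "")

# .str.replace() works on substrings inside each string
stripped = s.str.replace(",", "", regex=False)

# Series.replace() can also replace substrings when regex=True
stripped_regex = s.replace(",", "", regex=True)

print(unchanged.tolist())
print(stripped.tolist())
print(stripped_regex.tolist())
```

Either of the last two forms removes the commas. The '.str' accessor is probably the more readable choice when the intent is plain substring replacement.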

import pandas as pd

# Sample data
sample_list = {'sampleA':["1,000","2,000","3,000"],
               'sampleB':["4,000","5,000","6,000"]}

# Create dataframe
df = pd.DataFrame(sample_list)
print(df)
"""
  sampleA sampleB
0   1,000   4,000
1   2,000   5,000
2   3,000   6,000
"""

# Confirm the data type of the values
print(type(df['sampleA'][0]))
"""
<class 'str'>
"""

# If the values are str, use the '.str' accessor to remove the commas, then convert to float
df['sampleA'] = df['sampleA'].str.replace(',','').astype(float)
print(df)
"""
   sampleA sampleB
0   1000.0   4,000
1   2000.0   5,000
2   3000.0   6,000
"""

# If the values are float, call replace() directly; it replaces whole values (here 1000 -> 0)
df['sampleA'] = df['sampleA'].replace(1000, 0)
print(df)
"""
   sampleA sampleB
0      0.0   4,000
1   2000.0   5,000
2   3000.0   6,000
"""
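
The sample above converts only sampleA. As a rough sketch, the same replacement and conversion could be applied to every column at once with `DataFrame.apply()`. This assumes every column holds comma-formatted number strings; the name `converted` is made up for illustration.

```python
import pandas as pd

# Same sample data as above, rebuilt so this block runs on its own
df = pd.DataFrame({'sampleA': ["1,000", "2,000", "3,000"],
                   'sampleB': ["4,000", "5,000", "6,000"]})

# For each column: remove the commas, then convert the column to float
converted = df.apply(lambda col: col.str.replace(',', '', regex=False).astype(float))

print(converted)
print(converted.dtypes)
```

If some columns are not strings, `.str` raises an error on them, so in that case it would be safer to select only the string columns first.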

About
Kuniyoshi Takemoto is the founder of Amelt.net LLC and the editor of this blog (www.amelt.net).