Python: How to handle an error when converting a number with a minus sign to a float


This post is also available in: 日本語 (Japanese)

When working with financial data in Python 3, converting it to float with the pandas astype(float) method occasionally fails because of the minus sign in the data:

ValueError: could not convert string to float: '−12'

This happens because more than one character can represent a minus sign. Here is how to deal with it.

There are several symbols that represent the minus sign.

The minus sign you type on an ordinary keyboard is the following character, called HYPHEN-MINUS (U+002D).

import unicodedata
print(unicodedata.name("-"))  # HYPHEN-MINUS
print("-".encode('utf-8'))  # b'-' (a single byte, 0x2D)
print(b'-'.decode('utf-8'))  # -

However, other characters can also represent a minus sign.
The following one is called MINUS SIGN (U+2212).

import unicodedata
print(unicodedata.name("−"))  # MINUS SIGN
print("−".encode('utf-8'))  # b'\xe2\x88\x92' (three bytes in UTF-8)
print(b'\xe2\x88\x92'.decode('utf-8'))  # −
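Because the two characters look almost identical on screen, it can help to print the code point and Unicode name of every character in a suspicious value. A minimal sketch using only the standard library; the helper name describe_chars is hypothetical and not part of the original post.

```python
import unicodedata
from typing import List


def describe_chars(text: str) -> List[str]:
    """Return 'U+XXXX NAME' for each character in text (hypothetical helper)."""
    result = []
    for ch in text:
        # unicodedata.name raises ValueError for unnamed characters unless a default is given
        name = unicodedata.name(ch, "<unnamed>")
        result.append(f"U+{ord(ch):04X} {name}")
    return result


for line in describe_chars("\u221212"):  # "\u2212" is MINUS SIGN, so this is "−12"
    print(line)
# U+2212 MINUS SIGN
# U+0031 DIGIT ONE
# U+0032 DIGIT TWO

print("-" == "\u2212")  # False: they are different code points
```

Running it on a value copied from your data shows immediately whether the sign is U+002D or something else.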

If you try to convert a number written with MINUS SIGN (U+2212) to float with astype(float), you get the error "ValueError: could not convert string to float".
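A quick way to reproduce the problem, both with plain Python and with pandas. This sketch writes MINUS SIGN as the escape "\u2212" so it cannot be confused with HYPHEN-MINUS in the source code; the exact wording of the pandas error message may differ between versions.

```python
import pandas as pd

# Plain float() accepts the ASCII HYPHEN-MINUS as a sign
print(float("-12"))  # -12.0

# ...but not MINUS SIGN (U+2212)
try:
    float("\u221212")
except ValueError as exc:
    print(exc)  # could not convert string to float: '−12'

# pandas astype(float) on a column of strings fails in the same way
s = pd.Series(["\u221212", "10"])
try:
    s.astype(float)
except ValueError as exc:
    print(exc)
```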

Sample code to convert numbers with a minus sign (MINUS SIGN) to float

The following sample code replaces the MINUS SIGN in the sample data with HYPHEN-MINUS and then converts the values to float.
Financial data often contains percentages, so the sample data includes them as well.

import pandas as pd

# Sample data with MINUS SIGN
sample = [["−12%","10%","0%"],["−8%","−4%","5%"]]
df = pd.DataFrame(sample)
print(df)
#       0    1   2
# 0  −12%  10%  0%
# 1   −8%  −4%  5%

# Loop through the column names in order:
# remove "%", replace MINUS SIGN with HYPHEN-MINUS, and convert to float
for i in df.columns:
    df[i] = df[i].str.replace('%', '', regex=False).str.replace('−', '-', regex=False).astype(float)
print(df)
#       0     1    2
# 0 -12.0  10.0  0.0
# 1  -8.0  -4.0  5.0
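If the data may contain other minus-like characters too, the replacements can be collected into a single translation table. A minimal sketch, assuming the characters listed in MINUS_VARIANTS are the ones you want treated as a minus sign; that list and the helper name to_float_column are hypothetical, not from the original post. It builds the table with str.maketrans, applies it with Series.str.translate, and runs the helper on every column with DataFrame.apply instead of an explicit loop.

```python
import pandas as pd

# Characters to map to ASCII HYPHEN-MINUS (U+002D). This list is an assumption:
# adjust it to whatever actually appears in your data.
MINUS_VARIANTS = {
    "\u2212": "-",  # MINUS SIGN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\ufe63": "-",  # SMALL HYPHEN-MINUS
    "\uff0d": "-",  # FULLWIDTH HYPHEN-MINUS
    "%": "",        # also drop the percent sign
}
TRANSLATION = str.maketrans(MINUS_VARIANTS)


def to_float_column(col: pd.Series) -> pd.Series:
    """Normalise minus signs, strip '%', and convert to float (hypothetical helper)."""
    # astype(str) makes .str usable even if some cells are not strings
    cleaned = col.astype(str).str.strip().str.translate(TRANSLATION)
    return cleaned.astype(float)


sample = [["−12%", "10%", "0%"], ["−8%", "−4%", "5%"]]
df = pd.DataFrame(sample)
df = df.apply(to_float_column)  # calls the helper once per column
print(df)
```

Be careful with EN DASH in particular: it is also used for ranges such as "1–5", so only include it if it really means minus in your data.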

Kuniyoshi Takemoto is the founder of LLC and the editor of this blog. Follow me on LinkedIn.