
When working with financial data in Python 3, converting it to float with the `astype(float)` function occasionally fails because of the minus sign in the data:

```
ValueError: could not convert string to float: '−12'
```

This happens because more than one character can represent a minus sign. This post explains how to deal with it.
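The snippet below is a minimal sketch showing that the failure comes from Python's own string-to-float conversion, not from pandas alone. The built-in `float()` gives the same message. The value `"\u221212"` is a made-up example of "−12" written with MINUS SIGN (U+2212).

```python
# Hypothetical value: "−12" written with MINUS SIGN (U+2212) instead of "-".
value = "\u221212"

try:
    float(value)
except ValueError as e:
    print(f"ValueError: {e}")  # ValueError: could not convert string to float: '−12'

# After swapping in the ASCII HYPHEN-MINUS, parsing succeeds.
print(float(value.replace("\u2212", "-")))  # -12.0
```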


## There are several symbols that represent the minus sign

The minus sign you type on an ordinary keyboard is the following character.

It is called **HYPHEN-MINUS (U+002D)**.

```python
import unicodedata

# HYPHEN-MINUS
print(unicodedata.name("-"))  # HYPHEN-MINUS
print("-".encode('utf-8'))    # b'-'
print(b'-'.decode('utf-8'))   # -
```

However, it is not the only character used as a minus sign.

The following one is called **MINUS SIGN (U+2212)**.

```python
import unicodedata

# MINUS SIGN
print(unicodedata.name("−"))            # MINUS SIGN
print("−".encode('utf-8'))              # b'\xe2\x88\x92'
print(b'\xe2\x88\x92'.decode('utf-8'))  # −
```
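MINUS SIGN is not the only lookalike. The sketch below prints the Unicode names of a few other characters that can end up in data where a minus was intended. It also checks whether `float()` accepts each one as a sign. This particular list is my own assumption, not taken from the original data, and it is not exhaustive.

```python
import unicodedata

# Hypothetical selection of minus lookalikes; adjust to what your data contains.
candidates = ["-", "\u2212", "\u2010", "\u2012", "\u2013", "\ufe63", "\uff0d"]

for ch in candidates:
    # float() only treats ASCII "+" / "-" as a sign, so lookalikes raise ValueError.
    try:
        float(ch + "1")
        parses = True
    except ValueError:
        parses = False
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<25} float() accepts it: {parses}")
```

Only the first entry, U+002D HYPHEN-MINUS, should report `True`.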

If you try to convert a number written with MINUS SIGN (U+2212) to float with the `astype(float)` function, you get the error **"ValueError: could not convert string to float"**. Python's string-to-float conversion only accepts the ASCII HYPHEN-MINUS as a sign.
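When a large table fails to convert, the error reports only the first value that could not be parsed. The sketch below locates every affected cell before converting, using `Series.str.contains` with `regex=False`. The DataFrame and its column names `a` and `b` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: a mix of MINUS SIGN (U+2212) and ASCII "-" values.
df = pd.DataFrame({"a": ["\u221212", "-3"], "b": ["10", "\u22124"]})

# True wherever a cell contains U+2212, i.e. where astype(float) would fail.
mask = df.apply(lambda col: col.str.contains("\u2212", regex=False))
print(mask)
print(mask.sum())  # number of affected cells per column
```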

## Sample code to convert numbers written with MINUS SIGN to float type

The following sample code replaces MINUS SIGN with HYPHEN-MINUS in the sample data and then converts the values to float.

Financial data often contains percentages, so the sample data does as well.

```python
import pandas as pd

# Sample data that uses MINUS SIGN (U+2212)
sample = [["−12%", "10%", "0%"], ["−8%", "−4%", "5%"]]
df = pd.DataFrame(sample)
print(df)
#       0    1   2
# 0  −12%  10%  0%
# 1   −8%  −4%  5%

# Loop through the columns in order:
# strip "%", replace MINUS SIGN with HYPHEN-MINUS, and convert to float
for i in df.columns:
    df[i] = df[i].str.replace('%', '', regex=False).str.replace('−', '-', regex=False).astype(float)
print(df)
#       0     1    2
# 0 -12.0  10.0  0.0
# 1  -8.0  -4.0  5.0
```
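The sample above handles only U+2212. If your data may contain other lookalikes, one option is a single translation table applied with `Series.str.translate`. The sketch below assumes every character in `MINUS_LIKE` really means a minus in your data. `MINUS_LIKE` and `to_float_frame` are hypothetical names, and the input mixes lookalikes on purpose.

```python
import pandas as pd

# Assumption: each of these characters means "minus" in your data.
MINUS_LIKE = "\u2212\u2010\u2012\u2013\ufe63\uff0d"
TO_ASCII_MINUS = str.maketrans({ch: "-" for ch in MINUS_LIKE})

def to_float_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Map minus lookalikes to '-', strip '%', then cast every column to float."""
    cleaned = df.apply(
        lambda col: col.str.translate(TO_ASCII_MINUS).str.replace("%", "", regex=False)
    )
    return cleaned.astype(float)

# Hypothetical input using MINUS SIGN, EN DASH and FULLWIDTH HYPHEN-MINUS.
sample = [["\u221212%", "10%", "0%"], ["\u20138%", "\uff0d4%", "5%"]]
print(to_float_frame(pd.DataFrame(sample)))
```

A translation table replaces all the listed characters in one pass instead of chaining several `replace` calls. Be careful with dashes such as EN DASH, though: they are sometimes used as range separators ("10–20"), and mapping them blindly would corrupt such values.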