Context
I need to clean financial data with mixed number formats. The data was entered manually by different departments: some use "." as the decimal separator and "," as the grouping separator (e.g. US notation: $1,000,000.00), while others use "," as the decimal separator and "." as the grouping separator (e.g. the notation used in certain European countries: $1.000.000,00).
Input:
Here's a fictional example set:
df <- data.frame(Y2019 = c("17.530.000,03", "28000000.05", "256.000,23", "23,000",
                           "256.355.855", "2565467,566", "225,4534.126"))
Y2019
1 17.530.000,03
2 28000000.05
3 256.000,23
4 23,000
5 256.355.855
6 2565467,566
7 225,4534.126
Desired result:
Y2019
1 17530000.03
2 28000000.05
3 256000.23
4 23000
5 256355855
6 2565467.566
7 2254534.126
My attempt:
I got pretty close by treating the first occurrence (counting from the right) of "," or "." as the decimal separator and stripping all other occurrences. However, some entries have no decimal part (e.g. entries 4 and 5), so for those the strategy misreads a grouping separator as the decimal separator.
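For illustration, here is a minimal sketch of what that strategy could look like in base R. The helper name `parse_last_sep` is hypothetical and chosen only for this example; it is a rough reconstruction of the idea, not necessarily the exact code I ran.

```r
# Hypothetical helper: treat the right-most "," or "." as the decimal
# separator and drop every other "," or "." from the string.
parse_last_sep <- function(x) {
  x <- as.character(x)
  has_sep <- grepl("[.,]", x)
  # Greedy (.*) makes the pattern split at the LAST separator.
  int_part  <- sub("^(.*)[.,]([^.,]*)$", "\\1", x)
  frac_part <- sub("^(.*)[.,]([^.,]*)$", "\\2", x)
  # Remove any remaining (grouping) separators from the integer part.
  int_part <- gsub("[.,]", "", int_part)
  # Strings without any separator are passed through unchanged.
  as.numeric(ifelse(has_sep, paste0(int_part, ".", frac_part), x))
}

x <- c("17.530.000,03", "28000000.05", "256.000,23", "23,000",
       "256.355.855", "2565467,566", "225,4534.126")
res <- parse_last_sep(x)

# Entries 4 and 5 are parsed as 23 and 256355.855 rather than the
# desired 23000 and 256355855, because their only separators are
# grouping separators.
print(format(res, nsmall = 3, scientific = FALSE))
```

The other five entries match the desired result, so what is missing is a rule for deciding when the right-most separator is a grouping separator rather than a decimal one.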
Any input is greatly appreciated!