Remove invalid character from file
I was having trouble opening a .C file. It would open fine with nano
but geany didn't like it.
file -i MAIN.C
would only give me MAIN.C: application/octet-stream; charset=binary, which is what file says when it can't recognize the charset.
Detecting the actual charset
Using Python's chardet:
chardetect MAIN.C
MAIN.C: Windows-1254 with confidence 0.3362399919117238
A confidence of about 0.34 makes this a weak guess. Windows-1254 shouldn't be a problem for geany anyway, but let's try converting the file from Windows-1254 to UTF-8.
Converting the charset
With recode:
recode Windows-1254..UTF8 MAIN.C
recode: MAIN.C failed: Invalid entry in « CP1254..UTF-8 »
With iconv:
iconv -f Windows-1254 -t utf-8 -o OUT.C MAIN.C
iconv: illegal input sequence at position 198
So it turns out there's some rogue data at position 198 that prevents the file from being interpreted correctly.
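Before deleting anything, it can help to look at the bytes iconv complains about. Here is a minimal sketch using od, assuming the reported position is a 0-based byte offset. The peek_bytes helper and the demo file are made up for illustration and are not part of the original workflow.

```shell
# Hypothetical helper: dump COUNT bytes of FILE starting at byte OFFSET.
# usage: peek_bytes FILE OFFSET COUNT
peek_bytes() {
    # -A d : show offsets in decimal (same unit iconv reports)
    # -t x1: print each byte as two hex digits
    # -j/-N: skip OFFSET bytes, then read COUNT bytes
    od -A d -t x1 -j "$2" -N "$3" "$1"
}

# Demo file (an assumption, standing in for MAIN.C): a stray 0x85 byte
# sits at offset 6, right after "int x;".
tmp=$(mktemp)
printf 'int x;\205\n' > "$tmp"

# Dump 4 bytes starting at offset 4: 'x', ';', the stray byte, newline.
peek_bytes "$tmp" 4 4
```

For the real file, something like peek_bytes MAIN.C 190 16 would show the neighbourhood of position 198. Any byte of 80 or above in the hex dump is outside plain ASCII.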
Removing non-printable characters from the file
Using sed:
sed $'s/[^[:print:]\t]//g' MAIN.C > OUT.C
did the trick.
file OUT.C
OUT.C: C source, Non-ISO extended-ASCII text, with LF, NEL line terminators
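The file output above still reports extended-ASCII bytes, so some non-ASCII data survived the sed pass. If a strictly ASCII result is wanted, one option is a byte-wise filter with tr in the C locale. This is a sketch rather than what the original fix used; the strip_non_ascii helper and the demo input are made up for illustration. It also drops legitimate accented characters, so it only suits files that should be plain ASCII.

```shell
# Hypothetical helper: copy SRC to DST, keeping only tab, LF, CR and
# printable ASCII (0x20-0x7E). Every other byte is deleted.
# usage: strip_non_ascii SRC DST
strip_non_ascii() {
    # LC_ALL=C makes tr work on single bytes instead of locale characters.
    # -c complements the set, -d deletes everything in the complement.
    LC_ALL=C tr -cd '\11\12\15\40-\176' < "$1" > "$2"
}

# Demo input (an assumption, standing in for MAIN.C) with a stray 0x85 byte.
src=$(mktemp)
dst=$(mktemp)
printf 'int x;\205\t/* ok */\n' > "$src"

strip_non_ascii "$src" "$dst"
cat "$dst"
```

Running file on the result should then report plain ASCII text rather than Non-ISO extended-ASCII.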
Source
https://stackoverflow.com/questions/43108359/how-to-remove-all-special-characters-in-linux-text