Wednesday, July 8, 2015

Quick Tip: RegEx strings - changing "/" to "\" with re.sub() in Python

So, here's a quick tip on using Python and Regular Expressions to change a slash ("/") to a backslash ("\").

Here's the code, and I'll explain it afterwards:

>>> import re
>>> re.sub("/", '\\\\', "fooas/dsadsf")
'fooas\\dsadsf'
>>> print(re.sub("/", '\\\\', "fooas/dsadsf"))
fooas\dsadsf
>>> print(re.sub("/", '\\\\', "fooas/dsa/d/sf"))
fooas\dsa\d\sf
>>> data = "it/in/it"
>>> print(re.sub("/", '\\\\', data))
it\in\it

Notice that we're using four backslashes to accomplish what we're trying to do. Now, remember this is Python we're dealing with, so \ is the escape character in string literals: it introduces sequences such as \t (tab), \r (carriage return), and \n (new line).

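With that being said, in a Python string literal, a single \ character is written as "\\", that is, a double backslash. (That's also why the first result above shows up as 'fooas\\dsadsf': without print, the interpreter displays the repr of the string, which really contains just one backslash.) So, when we go to replace one instance of a slash, we have to use four backslashes. Why?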
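Before getting to the answer, here's a quick sketch to check that first claim for yourself. The variable names are made up for this example:

```python
# One backslash character, written as an escaped literal.
single = "\\"
# Two backslash characters, written as four in the source code.
double = "\\\\"

print(len(single))   # 1
print(len(double))   # 2
print(single)        # prints \
print(repr(single))  # prints '\\'
```

The repr line is the same effect you see when the interpreter echoes a result without print.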

Because the backslash gets unescaped twice: once by Python when it reads the string literal, and once more by re.sub() when it processes the replacement string, where \ is also an escape character. So the four backslashes in the source, '\\\\', become "\\" (two backslash characters) in the string that actually reaches re.sub(), which then becomes a single "\" in the result.
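If counting backslashes gets tiresome, here are a few spellings that should give the same result. Treat this as a sketch, not the one true way: the helper name to_backslashes is made up for this example, while re.sub() with a raw string or a function replacement, and str.replace(), are standard Python.

```python
import re

def to_backslashes(text):
    """Replace every "/" in text with a single backslash."""
    # Raw string: Python leaves the two backslashes alone, and
    # re.sub's replacement processing turns them into one.
    return re.sub("/", r"\\", text)

# Same result as the four-backslash version.
assert to_backslashes("fooas/dsa/d/sf") == re.sub("/", "\\\\", "fooas/dsa/d/sf")

# A function replacement is used as-is (no second round of unescaping),
# so one literal backslash is enough.
via_function = re.sub("/", lambda match: "\\", "it/in/it")

# For a fixed character swap like this one, no regex is needed at all.
via_replace = "it/in/it".replace("/", "\\")

print(to_backslashes("it/in/it"))   # it\in\it
print(via_function == via_replace)  # True
```

The raw string only removes the first layer of escaping (the Python literal); re.sub() still needs its two backslashes to produce one.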

This concludes your lesson in confusing syntax for the day. Thanks for playing!

-A
