<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Monzool's Personal Publishing &#187; Regular Expressions</title>
	<atom:link href="http://monzool.net/blog/category/programming/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://monzool.net/blog</link>
	<description>a/ Jan Skriver Sørensen</description>
	<lastBuildDate>Wed, 09 May 2012 19:37:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Help On Python Regular Expressions</title>
		<link>http://monzool.net/blog/2007/10/15/help-on-python-regular-expressions/</link>
		<comments>http://monzool.net/blog/2007/10/15/help-on-python-regular-expressions/#comments</comments>
		<pubDate>Mon, 15 Oct 2007 20:38:21 +0000</pubDate>
		<dc:creator>monzool</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://monzool.net/blog/2007/10/15/help-on-python-regular-expressions/</guid>
		<description><![CDATA[REGULAR EXPRESSIONS ARE a powerful friend, but the friendship doesn&#8217;t come easy. Regular expressions can be somewhat baffling to get a grasp on, but once understood, the possibilities are almost endless. When developing the search expression used in HTML Parsing With Beautiful Soup I realized that my regular expression knowledge had gotten a bit rusty. [...]]]></description>
			<content:encoded><![CDATA[<p><strong>REGULAR EXPRESSIONS ARE</strong> a powerful friend, but the friendship doesn&#8217;t come easy. <a href="http://www.regular-expressions.info/" title="Regular Expression information">Regular expressions</a> can be somewhat baffling to get a grasp on, but once understood, the possibilities are almost endless.</p>
<p>When developing the search expression used in <a href="http://monzool.net/blog/2007/10/15/html-parsing-with-beautiful-soup/" class="locallink">HTML Parsing With Beautiful Soup</a>, I realized that my regular expression knowledge had gotten a bit rusty. Fortunately, I was lucky twice over: 1) it was a Python program, so the Python shell was available, and 2) I found <a href="http://en.wikipedia.org/wiki/David_Mertz">David Mertz</a>&#8217;s book <a href="http://gnosis.cx/TPiP/">Text Processing in Python</a>.
</p>
<p>The Python shell makes it easy to experiment with and tweak regular expressions on the fly, but the downside is that it&#8217;s not easy to visually evaluate the outcome of your current expression. David&#8217;s book helped in two ways. It has extensive theory on Python regular expression syntax, but most superhero-like is the small function it provides, which makes it possible to see the outcome of an evaluated expression.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># Credits: David Mertz</span>
<span style="color: #ff7700;font-weight:bold;">def</span> re_show<span style="color: black;">&#40;</span>pat, s<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span> pat, <span style="color: #dc143c;">re</span>.<span style="color: black;">M</span> <span style="color: black;">&#41;</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span> <span style="color: #483d8b;">&quot;{<span style="color: #000099; font-weight: bold;">\g</span>&lt;0&gt;}&quot;</span>, s.<span style="color: black;">rstrip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: black;">&#41;</span>, <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span></pre></div></div>

<p class="section">Using regular expressions in Python requires importing the regular expression library, <code>re</code>.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span></pre></div></div>

<p>If you are using the Python shell, enter the same line at the prompt. The function by David Mertz can also be typed directly into the shell.</p>
<div class="dotbox">
<pre>
>>> import re
>>> def re_show(pat, s):
...    print re.compile( pat, re.M ).sub( "{\g<0>}", s.rstrip() ), '\n'
>>>
</pre>
</div>
<p>
<p>The <code>re_show</code> wrapper prints the source text and marks each match of the expression by enclosing it in a &#8216;{&#8217; and &#8216;}&#8217; pair.
</p>
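<p>The function above uses the Python 2 <code>print</code> statement. For readers on Python 3, where <code>print</code> is a function, the helper could be adapted roughly as follows. This is only a sketch, not the version from the book: the name <code>re_mark</code> and the sample text are made up here, and the function returns the marked-up string instead of printing it so the result can be reused.</p>

```python
import re

def re_mark(pat, s):
    # Hypothetical Python 3 adaptation of re_show: wrap every match in braces.
    # The raw string keeps \g<0> (the whole match) from being treated as a string escape.
    return re.compile(pat, re.M).sub(r"{\g<0>}", s.rstrip())

# Made-up sample text: every run of digits gets marked.
print(re_mark(r"\d+", "room 101, floor 3"))
```

<p>Returning the string rather than printing it is a design choice for this sketch; wrapping the call in <code>print()</code> gives the same shell experience as the original.</p>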
<p class="section">Next, create some example text to experiment on.</p>
<div class="dotbox">
<pre>
>>> s = 'if (Hulk.color != "green"): print "Grey Hulk"'
</pre>
</div>
<p class="section">Now the experiments can begin. The following matches everything from the first &#8216;(&#8217; to the last &#8216;)&#8217;.</p>
<div class="dotbox">
<pre>
>>> re_show(r'\(.*\)', s)
</pre>
</div>
<p>Result: </p>
<p>
  <code>if </code><strong><code>{</code></strong><code>(Hulk.color != "green")</code><strong><code>}</code></strong><code>: print "Grey Hulk"</code>
  </p>
</p>
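<p>The match runs all the way to the last &#8216;)&#8217; because <code>.*</code> is greedy. A small sketch in Python 3 syntax, using a made-up sample string with two parenthesized groups, contrasts it with the non-greedy <code>.*?</code>:</p>

```python
import re

# Hypothetical sample text with two separate parenthesized groups.
text = 'f(a) + g(b)'

# Greedy: .* grabs as much as possible, so one match spans both groups.
greedy = re.findall(r'\(.*\)', text)

# Non-greedy: .*? stops at the first ')' it can reach, giving one match per group.
lazy = re.findall(r'\(.*?\)', text)

print(greedy)
print(lazy)
```

<p>In the Hulk example there is only one pair of parentheses, so both forms would mark the same text.</p>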
<p class="section">
<p>Another example is a case-insensitive match on the colors of the Hulk.</p>
<div class="dotbox">
<pre>
>>> re_show(r'(?i)green|grey', s)
</pre>
</div>
<p>Result: </p>
<p>
  <code>if (Hulk.color != "</code><strong><code>{</code></strong><code>green</code><strong><code>}</code></strong><code>"): print "</code><strong><code>{</code></strong><code>Grey</code><strong><code>}</code></strong><code> Hulk"</code>
 </p>
</p>
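<p>Instead of an inline <code>(?i)</code>, the flag can also be passed to the <code>re</code> functions directly. A minimal sketch in Python 3 syntax, reusing the example string and calling <code>re.sub</code> itself rather than the helper:</p>

```python
import re

s = 'if (Hulk.color != "green"): print "Grey Hulk"'

# re.IGNORECASE has the same effect as a leading (?i) in the pattern.
marked = re.sub(r'green|grey', r'{\g<0>}', s, flags=re.IGNORECASE)
print(marked)
```

<p>Which form to use is mostly a matter of taste; the inline flag keeps everything in the pattern string, while the <code>flags</code> argument keeps the pattern itself shorter.</p>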
<p class="section">This is just a minuscule introduction to the power of regular expressions. If you&#8217;re into regular expressions in Python, I highly recommend buying the book &#8211; or donating and reading it online.</p>
]]></content:encoded>
			<wfw:commentRss>http://monzool.net/blog/2007/10/15/help-on-python-regular-expressions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

