<?xml version="1.0" encoding="utf-8" standalone="yes"?><feed xmlns="http://www.w3.org/2005/Atom">
  <title></title>
  <subtitle></subtitle>
  <id>https://www.endpointdev.com/blog/tags/hl7/</id>
  <link href="https://www.endpointdev.com/blog/tags/hl7/"/>
  <link href="https://www.endpointdev.com/blog/tags/hl7/" rel="self"/>
  <updated>2022-01-27T00:00:00+00:00</updated>
  <author>
    <name>End Point Dev</name>
  </author>
  
    <entry>
      <title>How to use regular expression group quantifiers in PostgreSQL</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2022/01/regex-group-quantifiers-postgresql/"/>
      <id>https://www.endpointdev.com/blog/2022/01/regex-group-quantifiers-postgresql/</id>
      <published>2022-01-27T00:00:00+00:00</published>
      <author>
        <name>Selvakumar Arumugam</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2022/01/regex-group-quantifiers-postgresql/20220123_222150.webp&#34; alt=&#34;Side of brick building with windows and protruding roof, tiered shrubs in foreground, with a small corner of blue sky, clouds, and snowy mountaintop&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Jon Jensen --&gt;
&lt;p&gt;I recently encountered a situation where it was necessary to extract address content from text in HL7 V2 format from a PostgreSQL table&amp;rsquo;s column. The following example is representative:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To work with this data, we need to extract the address section from the PID segment of the HL7 V2 message, which carries patient demographic information. Segments use delimiters for fields (&lt;code&gt;|&lt;/code&gt;), components (&lt;code&gt;^&lt;/code&gt;), subcomponents (&lt;code&gt;&amp;amp;&lt;/code&gt;) and repetition (&lt;code&gt;~&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Our example has only fields and components delimited by pipe (&lt;code&gt;|&lt;/code&gt;) and caret (&lt;code&gt;^&lt;/code&gt;). The address contains nine components delimited by &lt;code&gt;^&lt;/code&gt;.&lt;/p&gt;
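&lt;p&gt;To make that structure visible, here is a minimal sketch, assuming a POSIX shell with &lt;code&gt;cut&lt;/code&gt;, &lt;code&gt;tr&lt;/code&gt; and &lt;code&gt;grep&lt;/code&gt;. The field position 12 is an assumption that holds only for this sample fragment, not a general rule for HL7 messages:&lt;/p&gt;

```shell
# Sample PID fragment from the article.
content='||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|'

# Assumption: in this particular fragment the address is the 12th pipe-delimited field.
address=$(printf '%s\n' "$content" | cut -d'|' -f12)

# Put each caret-delimited component on its own line and count the lines
# (grep -c '' counts every line, including empty ones).
count=$(printf '%s\n' "$address" | tr '^' '\n' | grep -c '')

printf '%s\n' "$address"
printf '%s components\n' "$count"
```

&lt;p&gt;Nine components means eight carets inside the address, which is where the &lt;code&gt;{8}&lt;/code&gt; quantifier in the patterns in this post comes from.&lt;/p&gt;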
&lt;p&gt;I hoped to do this with a regular expression (regex), because the address follows a standard format that a regex can match as a repeated run of alphanumeric text followed by a caret.&lt;/p&gt;
&lt;p&gt;Here is my journey figuring out how to match the data I wanted.&lt;/p&gt;
&lt;h3 id=&#34;regex-pattern-in-grep&#34;&gt;Regex pattern in grep&lt;/h3&gt;
&lt;p&gt;As a test, I got this regex working with the &lt;code&gt;grep&lt;/code&gt; command, which successfully extracts the address section from the content:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ &lt;span style=&#34;color:#369&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ &lt;span style=&#34;color:#038&#34;&gt;echo&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#369&#34;&gt;$content&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt; | grep -Eo &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;([A-Za-z0-9 #&amp;#39;.,/-]*\^){8}[A-Za-z0-9 ]*&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;123456&lt;/span&gt; SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&#34;postgresql-regex-attempt&#34;&gt;PostgreSQL regex attempt&lt;/h3&gt;
&lt;p&gt;The PostgreSQL &lt;code&gt;regexp_matches&lt;/code&gt; function extracts the parts of a string that match a pattern. But when I used the same regex pattern with &lt;code&gt;regexp_matches&lt;/code&gt;, it returned only the text captured by the eighth and final repetition of the group, not the whole address:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-postgres&#34; data-lang=&#34;postgres&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=# &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;SELECT&lt;/span&gt; regexp_matches(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;([A-Za-z0-9 #&amp;#39;&amp;#39;.,/-]*\^){8}[A-Za-z0-9 ]*&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;g&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; regexp_matches 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt; {^}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;More generally, when a capturing group carries a quantifier &lt;code&gt;{N}&lt;/code&gt;, the query returns only what the group captured on its &lt;em&gt;N&lt;/em&gt;th repetition, not the text spanned by all &lt;em&gt;N&lt;/em&gt; repetitions. So if we quantify the group with &lt;code&gt;{3}&lt;/code&gt;, each row holds the third component of one match instead of the three components together:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-postgres&#34; data-lang=&#34;postgres&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=# &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;SELECT&lt;/span&gt; regexp_matches(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;([A-Za-z0-9 #&amp;#39;&amp;#39;.,/-]*\^){3}[A-Za-z0-9 ]*&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;g&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   regexp_matches   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt; {^}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; {^}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; {&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;New York City^&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; {USA^}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
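&lt;p&gt;The same last-repetition behavior can be seen outside PostgreSQL. The following is a minimal sketch, assuming a &lt;code&gt;sed&lt;/code&gt; that supports &lt;code&gt;-E&lt;/code&gt; (GNU and BSD both do); it runs on a different regex engine, so it is only an analogy, and the toy input &lt;code&gt;a^b^c^d&lt;/code&gt; is made up for illustration:&lt;/p&gt;

```shell
# Toy input with four caret-delimited components (hypothetical, for illustration).
sample='a^b^c^d'
pattern='([a-z]*\^){3}[a-z]*'

# grep -o prints the whole match, regardless of capture groups.
whole=$(printf '%s\n' "$sample" | grep -Eo "$pattern")

# In sed, \1 holds what the quantified group captured on its final repetition.
last=$(printf '%s\n' "$sample" | sed -E "s/$pattern/\\1/")

printf 'whole match: %s\n' "$whole"
printf 'group 1:     %s\n' "$last"
```

&lt;p&gt;Here the whole match is &lt;code&gt;a^b^c^d&lt;/code&gt;, while group 1 holds only &lt;code&gt;c^&lt;/code&gt;, the third repetition.&lt;/p&gt;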

&lt;h3 id=&#34;plperl-function&#34;&gt;PL/Perl function&lt;/h3&gt;
&lt;p&gt;Since that doesn’t satisfy our requirements, I tried using a Perl regex through my own PL/Perl function, and got the expected answer:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-postgres&#34; data-lang=&#34;postgres&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=# &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;CREATE&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;OR&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;REPLACE&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;FUNCTION&lt;/span&gt; perl_regexp_matches (&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;IN&lt;/span&gt; str &lt;span style=&#34;color:#038&#34;&gt;text&lt;/span&gt;, &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;IN&lt;/span&gt; pattern &lt;span style=&#34;color:#038&#34;&gt;text&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;RETURNS&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;text&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;$$
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    my ($input, $pattern) = @_;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    my $output = [$input =~ m/($pattern)/];
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    return $output-&amp;gt;[0]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;$$&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;LANGUAGE&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;plperl&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=# &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;SELECT&lt;/span&gt; perl_regexp_matches(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;([A-Za-z0-9 #&amp;#39;&amp;#39;.,/-]*\^){8}[A-Za-z0-9 ]*&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    perl_regexp_matches
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;123456&lt;/span&gt; SAMPLE ROAD^^New York City^NY^&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;12345&lt;/span&gt;^USA^H^^New York
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That works, but I kept looking for a simpler solution that needs neither a custom function nor the PL/Perl extension.&lt;/p&gt;
&lt;h3 id=&#34;postgresql-regex-solution&#34;&gt;PostgreSQL regex solution&lt;/h3&gt;
&lt;p&gt;In Postgres regex syntax, parentheses &lt;code&gt;()&lt;/code&gt; create a numbered capturing group, and when a pattern contains capturing groups, &lt;code&gt;regexp_matches&lt;/code&gt; returns the text captured by those groups rather than the whole match.&lt;/p&gt;
&lt;p&gt;To get the entire match, add a question mark and a colon (&lt;code&gt;?:&lt;/code&gt;) right after the opening parenthesis of the group to make it a non-capturing group. Because the pattern then has no capturing group, the complete match is returned:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-postgres&#34; data-lang=&#34;postgres&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=&amp;gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;SELECT&lt;/span&gt; regexp_matches(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;(?:[A-Za-z0-9 #&amp;#39;&amp;#39;.,/-]*\^){8}[A-Za-z0-9 ]*&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                         regexp_matches                         
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;----------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt; {&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That turns out to be what my PL/Perl function was doing, since &lt;code&gt;m/($pattern)/&lt;/code&gt; wraps the whole pattern in an outer group that captures the entire match. It is also what &lt;code&gt;grep&lt;/code&gt; was doing because of its &lt;code&gt;-o&lt;/code&gt; (&lt;code&gt;--only-matching&lt;/code&gt;) option, which prints only the matching part of each line instead of the entire line.&lt;/p&gt;
&lt;p&gt;And we can also use the Postgres function &lt;code&gt;substring&lt;/code&gt; to return the bare text itself instead of arrays as &lt;code&gt;regexp_matches&lt;/code&gt; does:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-postgres&#34; data-lang=&#34;postgres&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=# &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;substring&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;||121212^^^2^ID 1|676767||SELVA^KUMAR^^^^|19480203|M||B||123456 SAMPLE ROAD^^New York City^NY^12345^USA^H^^New York||123456-7890|||M|NON|4000|&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;FROM&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;(?:[A-Za-z0-9 #&amp;#39;&amp;#39;.,/-]*\^){8}[A-Za-z0-9 ]*&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                         &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;substring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;123456&lt;/span&gt; SAMPLE ROAD^^New York City^NY^&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;12345&lt;/span&gt;^USA^H^^New York
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&#34;reference&#34;&gt;References&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.hl7.org/about/&#34;&gt;HL7 International website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP&#34;&gt;PostgreSQL POSIX regular expressions documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </content>
    </entry>
  
</feed>
