[gui-talk] quick and dirty regular expressions: was free application for converting BRF to word files

Dean Martineau dean at topdotenterprises.com
Sat Jun 16 05:35:29 UTC 2012


OK, I'll give a short introduction to a fairly long subject.  There is much
more to regular expressions than I know.  There may well be better ways to
do what I'm doing than what I'll show, but what I'll show you works.

Regular expressions are a way to use specific characters to stand for types
of text.  In English, the regular expression (sometimes shortened to regex)
I'm going to give you means: find all the carriage return characters that
aren't preceded by a period, question mark, exclamation point, right
parenthesis or quote mark, and replace each of them with a space.

I use EdSharp, a very accessible and powerful text editor created by Jamal
Mazrui.  It isn't perfect; it sometimes crashes, so save your work frequently.
It is free and pretty well documented.  You can get it from

http://www.EmpowermentZone.com/edsetup.exe

You don't have to use it.  Lots of text editors come with regular expression
options, but there are minor variations among them.  EdSharp uses the .NET
engine; others use the Perl engine.  I've gotten used to EdSharp, so I stick
with it.

If you're using a 32-bit system, EdSharp comes with NFBTrans built in, so
when you open a .brf file, it will back-translate it for you (if it doesn't
crash).

(It really doesn't crash that often, but it does sometimes.)

So I'll give three levels of instruction here.  First, if all you want to do
is carry out the steps without understanding what it all means, in EdSharp,
with a back-translated document open, press ctrl+shift+r and paste this in:

([^!").?])\n

Then press tab and type or paste:

$1\n\n

This gives you a paragraph demarcated by a blank line.
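If you'd like to see the same substitution outside EdSharp, here is a
minimal Python sketch.  It assumes the search expression ends with \n, as the
explanation further down describes, and it follows the goal stated at the
top: each broken line is joined to the next with a space.  Python's re module
writes the back-reference as \1 rather than $1, and the step that turns
Windows \r\n line endings into plain \n is an assumption about how the
back-translated file is stored.

```python
import re

# Search expression: one character that is NOT ! " ) . or ?, followed by
# a line break.  The character is captured in group 1 so it can be kept.
JOIN_PATTERN = re.compile(r'([^!").?])\n')


def join_broken_lines(text: str) -> str:
    """Replace each line break not preceded by sentence-ending punctuation
    with a space, keeping the character that came before it."""
    # Assumption: Windows files end lines with \r\n; normalize to \n so the
    # captured character is real text rather than a stray \r.
    text = text.replace("\r\n", "\n")
    # Python writes the back-reference as \1 where EdSharp (.NET) uses $1.
    return JOIN_PATTERN.sub(r"\1 ", text)


if __name__ == "__main__":
    sample = "The cat sat\non the mat.\nIt was\nhappy.\n"
    print(join_broken_lines(sample))
```

Lines ending in a period, question mark, exclamation point, right
parenthesis or quote mark keep their line breaks; everything else runs on.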

At the second level, there's an EdSharp feature, Transform Files, invoked by
Alt+Equals.  With this keystroke, you apply one or more regular expressions
to one or more files.  Here is the EdSharp help on the subject:

The Transform Files command, Alt+Equals, applies a saved set of search and
replace tasks to one or more files -- typically to massage data or
formatting in predictable ways. EdSharp prompts for the job file containing
the regular expressions to apply. Each task is defined by three lines: (1) a
comment explaining the operation, (2) the search expression, and (3) the
replacement expression. A blank line separates each task. The current editing
window should contain the list of files to process, one per line. Such a list
could be typed manually or generated via the Path List command
(Control+Shift+P). If a file does not include a leading path, the prior one is
assumed. An intervening dialog lets you test what changes would occur without
actually performing them. In either case, you can subsequently use the Review
Output command, Alt+Shift+F5, to examine the change log.

Here is the content of a sample transform job that defines two tasks:

[Begin Content of TrimLine.job]
Remove leading space or tab characters from each line
(\A|\n)( |\t)+
$1

Remove trailing space or tab characters from each line
( |\t)+(\r|\Z)
$2
[End Content of TrimLine.job]
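To make the three-line task format concrete, here is a minimal Python sketch
that parses a job file like TrimLine.job and applies each task in order.  It
is not how EdSharp works internally; the helper names are hypothetical, and
the only translation it does is rewriting .NET-style $1 references as
Python's \g<1>.  One small engine difference: Python's \Z matches only at the
very end of the text, while .NET's \Z also matches just before a final line
break.

```python
import re

# The sample job from the EdSharp help.  A raw string keeps the
# backslashes exactly as they appear in the file.
TRIMLINE_JOB = r"""Remove leading space or tab characters from each line
(\A|\n)( |\t)+
$1

Remove trailing space or tab characters from each line
( |\t)+(\r|\Z)
$2
"""


def parse_job(job_text):
    """Split job text into (comment, search, replacement) triples."""
    lines = job_text.splitlines()
    tasks = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # blank line between tasks
            continue
        if i + 3 > len(lines):
            raise ValueError(f"incomplete task starting at line {i + 1}")
        comment, search, replacement = lines[i:i + 3]
        tasks.append((comment, search, replacement))
        i += 3
    return tasks


def dotnet_to_python_replacement(replacement):
    """Rewrite .NET group references like $1 as Python's \\g<1>."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", replacement)


def apply_tasks(text, tasks):
    """Run each search-and-replace task over the text, in order."""
    for _comment, search, replacement in tasks:
        text = re.sub(search, dotnet_to_python_replacement(replacement), text)
    return text


if __name__ == "__main__":
    sample = "  alpha  \r\n\tbeta\t\r\n"
    print(repr(apply_tasks(sample, parse_job(TRIMLINE_JOB))))
```

Running the tasks in file order matters: each one sees the text as the
previous task left it.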


Finally, what does the regular expression mean?  We'll take a folksy look at
it from the outside in.

We use parentheses for grouping or capturing.  In this case, we're
capturing.  This powerful little expression is going to find all sorts of
characters followed by carriage returns.  I want it to remove the carriage
return, but keep the neighboring character, so I use parentheses, and I can
refer to the group later with the expression $1.  
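Here is a tiny Python illustration of why the parentheses matter.  Without a
capturing group, the letter in front of the line break is swallowed by the
match and disappears; with one, the back-reference puts it back.  Python
writes \1 where EdSharp writes $1.

```python
import re

text = "a broken\nline"

# Without a group, the matched letter "n" is replaced along with the break.
no_group = re.sub(r"[a-z]\n", " ", text)

# With a capturing group, \1 (EdSharp's $1) restores the letter.
with_group = re.sub(r"([a-z])\n", r"\1 ", text)

print(no_group)    # a broke line
print(with_group)  # a broken line
```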

Brackets let you designate a set of characters; the bracketed group matches
any one of them.  You can often use hyphens for ranges.  For instance, a-z
means any lowercase letter, A-Z means any uppercase letter, and 0-9 means any
digit.  Unfortunately, with what we want to do, there isn't a handy range, so
we just list the characters in ANSI order.
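A short Python sketch of bracketed sets: a range like a-z covers a span of
characters, while a plain list such as !").? names each one.  Either way, the
set matches exactly one character at a time.

```python
import re

# A range: any single lowercase letter.
letters = re.findall(r"[a-z]", "Hi 2 u")

# A range of digits.
digits = re.findall(r"[0-9]", "Room 42b")

# No handy range fits, so list the marks one by one.  Inside brackets,
# . ? and ) lose their special meanings and are taken literally.
marks = re.findall(r'[!").?]', 'Wait! "Why?" (Not sure.)')

print(letters, digits, marks)
```

Note that the left parenthesis is not in the list, so it is not found.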

The caret character, placed right after the opening bracket, turns this into
a negative group.  It means: find any one character not included in this
group.  The period is one of the listed characters, so it is excluded.  The
engine will find, say, a letter t followed by a carriage return, remove that
carriage return and leave the t; but when it finds a period followed by a
carriage return, it will leave that carriage return in the document.
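The effect of the caret can be checked in Python.  The same bracketed list
with ^ in front matches the t before a line break but not the period; the
helper name break_would_be_removed is just for illustration.

```python
import re

# The caret right after "[" means: any one character NOT in this list.
BREAK_AFTER_NON_ENDING = re.compile(r'[^!").?]\n')


def break_would_be_removed(text):
    """True if the text has a line break right after a character that is
    not sentence-ending punctuation."""
    return BREAK_AFTER_NON_ENDING.search(text) is not None


print(break_would_be_removed("cat\n"))   # "t" is not listed, so it matches
print(break_would_be_removed("sat.\n"))  # "." is listed, so the break stays
```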

The backslash n stands for the line break character.  (Strictly speaking, \n
is the line feed, or newline; \r is the carriage return.)  There are lots of
pairs of a backslash followed by a character that stand for important
characters or groups of characters.
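Here are a few of those backslash pairs in Python, including \r, since
Windows text files end each line with a carriage return followed by a line
feed (\r\n).  The sample line is made up for the demonstration.

```python
import re

line = "Page 12\tdone \r\n"

digits = re.findall(r"\d", line)        # \d: any digit
tabs = re.findall(r"\t", line)          # \t: the tab character
spaces = re.findall(r"\s+", line)       # \s: any whitespace, including \r and \n
windows_ending = line.endswith("\r\n")  # carriage return, then line feed

print(digits, tabs, spaces, windows_ending)
```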

In the replacement string, $1 means the text matched by the first set of
parentheses.  We only had one.  Whatever character that group matched is
included in the replacement text.

Well, that may be enough to make your head spin if you're even still
reading.

There are lots of documents teaching regular expressions on the Internet.  A
few are even intelligible, but I don't know which.  Bookshare has good books
on them too.  

Dean




-----Original Message-----
From: gui-talk-bounces at nfbnet.org [mailto:gui-talk-bounces at nfbnet.org] On
Behalf Of Humberto Avila
Sent: Friday, June 15, 2012 2:04 PM
To: 'Discussion of the Graphical User Interface, GUI Talk Mailing List'
Subject: Re: [gui-talk] free application for converting BRF to word files

Hello, what is f sharp? Where can I get it? Also, how do I fix the lines and
use the regular expressions you mentioned in Word? This might be a solution
worth trying for me, since I cannot afford the $625.00 or so cost of DBT.
Until I can afford it, I want to learn how to do this.

-----Original Message-----
From: gui-talk-bounces at nfbnet.org [mailto:gui-talk-bounces at nfbnet.org] On
Behalf Of Dean Martineau
Sent: Friday, June 15, 2012 1:41 PM
To: Discussion of the Graphical User Interface, GUI Talk Mailing List
Cc: <GUI-talk at nfbNet.org>
Subject: Re: [gui-talk] free application for converting BRF to word files

Well, unfortunately, you do get what you pay for. You won't find anything
free which is comparable to the $600 Duxbury program. 

My quick and dirty solution to these problems is to use regular expressions.
I run the back-translated file through EdSharp, though one could use many
text editors. There I can strip out any carriage return not followed by a
capital letter, and/or not preceded by one of several sentence-ending
punctuation marks. If the brf file is marked with indented paragraphs, this
is easier. These results aren't perfect, but they make for a more readable
file. You can also use these techniques to strip unwanted line breaks from
the brf file before back translating it.
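Two variations on that idea, sketched in Python under stated assumptions.
The first uses lookarounds, (?<!...) and (?!...), which the .NET engine also
supports, to drop a line break unless it both follows sentence-ending
punctuation and comes before a capital letter.  The second assumes every new
paragraph in the back-translated file starts with indentation, so any break
not followed by a space or tab is just a wrapped line.  Both assume \r\n line
endings should be normalized to \n first.

```python
import re

# Variation 1: join unless the break follows ! " ) . or ? AND precedes
# a capital letter.
BREAK_TO_JOIN = re.compile(r'(?<![!").?])\n|\n(?![A-Z])')

# Variation 2 (assumes indented paragraphs): a break not followed by a
# space or tab is only a wrapped line.
WRAPPED_BREAK = re.compile(r"\n(?![ \t])")


def join_by_punctuation_and_capitals(text):
    return BREAK_TO_JOIN.sub(" ", text.replace("\r\n", "\n"))


def join_unindented_lines(text):
    return WRAPPED_BREAK.sub(" ", text.replace("\r\n", "\n"))


if __name__ == "__main__":
    print(join_by_punctuation_and_capitals("The cat sat\non the mat.\nIt was Mr.\nsmith."))
    print(join_unindented_lines("  First line\nof para.\n  Second para\nhere."))
```

As with any of these, the results aren't perfect: an abbreviation such as
"Mr." at the end of a line, followed by a capitalized name, will still keep
its line break.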

Dean

On Jun 15, 2012, at 13:16, "Humberto Avila" <avila.bert.humberto2 at gmail.com>
wrote:

> Hi all.
> 
> Does anybody on this list know of an application for Windows XP or Windows 7
> that can accurately convert .brf files into .doc or .DOCX files and that is
> as good as Duxbury braille translator? I found some applications that can
> translate Word and text files into .brf, and even an online service that can
> do the text to braille conversion, but I need the reverse. I need to be able
> to read, if not convert, .brf files into Microsoft Word documents with no
> problem.
> 
> I find the free nfbTrans and winBT programs less compelling for me because
> they only translate .brf files into .txt text files, and the lines of the
> resulting .txt files are very short and sometimes run together or are
> separated by many blank lines.
> 
> Any suggestions are welcome.
> 
> Sincerely,
> 
> Humberto
> 
> _______________________________________________
> gui-talk mailing list
> gui-talk at nfbnet.org
> http://nfbnet.org/mailman/listinfo/gui-talk_nfbnet.org
> To unsubscribe, change your list options or get your account info for gui-talk:
> http://nfbnet.org/mailman/options/gui-talk_nfbnet.org/dean%40topdotenterprises.com
