
Refactoring exercise.

Started by: Roedy Green <see_website@mindprod.com.invalid>
First post: 2012-11-03 01:37 -0700
Last post: 2012-11-04 09:32 -0800
Articles: 5, participants: 3



Contents

  Refactoring exercise. Roedy Green <see_website@mindprod.com.invalid> - 2012-11-03 01:37 -0700
    Re: Refactoring exercise. Roedy Green <see_website@mindprod.com.invalid> - 2012-11-03 08:40 -0700
    Re: Refactoring exercise. markspace <-@.> - 2012-11-03 11:26 -0700
      Re: Refactoring exercise. Roedy Green <see_website@mindprod.com.invalid> - 2012-11-04 06:57 -0800
        Re: Refactoring exercise. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-04 09:32 -0800

#2217: Refactoring exercise.

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-11-03 01:37 -0700
Subject: Refactoring exercise.
Message-ID: <ddl9989u8pc6vq1r0vaduqp8iamvcurn25@4ax.com>

This is a beginner's exercise in refactoring.

Below is the source for Amper.
You can get any other code it needs or find out more about what it is
for from http://mindprod.com/products1.html#AMPER

Notice how similar the methods ampifyPossiblyCommentedString and
ampifyPossiblyScriptedString are.

Your task is to refactor out the common handling of text inside and
outside the markers so that the logic appears only once.

Hint: see http://mindprod.com/jgloss/callback.html

This is aimed at newbies.  If you are an old-timer, see if you can
come up with something clever to make the code more readable and
terse.

You don't have to show a complete working program, just the refactored
logic.
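For the old-timer variant, here is one terse direction, offered as a sketch rather than as Amper's own code: find comments and scripts with a single regex, copy them verbatim, and ampify everything between them. The class name TerseAmper and the patterns PROTECTED and BARE_AMP are hypothetical. BARE_AMP restates Amper's ENTITY_PATTERN (in the listing below) as a negative lookahead, so a bare & becomes &amp; unless it already begins an entity. Unlike Amper, this sketch does no error checking: an unterminated <!-- or <script is simply treated as ordinary text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical terse variant; not part of Amper. */
public final class TerseAmper {
    // Comments and scripts are "protected" regions copied unchanged.
    // (?s) lets . match newlines; .*? is reluctant, so each region ends at the first closer.
    private static final Pattern PROTECTED =
            Pattern.compile("(?s)<!--.*?-->|<script.*?</script>");

    // A bare & that does not already start an entity: alpha name, #x hex or # numeric.
    // The lookahead mirrors Amper's ENTITY_PATTERN (longest name 31 chars).
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:\\p{Alnum}{2,31}|#x[0-9a-fA-F]{1,8}|#\\p{Digit}{1,10});)");

    private TerseAmper() {
    }

    /** Converts bare & to &amp; in text known to contain no comments or scripts. */
    static String ampifyUnprotected(String chunk) {
        return BARE_AMP.matcher(chunk).replaceAll("&amp;");
    }

    /** Ampifies everything outside comments and scripts. */
    public static String ampify(String big) {
        StringBuilder sb = new StringBuilder(big.length() + 20);
        Matcher m = PROTECTED.matcher(big);
        int last = 0;
        while (m.find()) {
            // text in front of this protected region, possibly empty
            sb.append(ampifyUnprotected(big.substring(last, m.start())));
            // the protected region itself, unchanged
            sb.append(m.group());
            last = m.end();
        }
        // whatever follows the last protected region
        sb.append(ampifyUnprotected(big.substring(last)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: a &amp; b &amp; <!-- c & d --> <script>x && y</script> e&amp;f
        System.out.println(ampify("a & b &amp; <!-- c & d --> <script>x && y</script> e&f"));
    }
}
```

Because find() returns the leftmost match, a <script that appears inside a comment stays part of the comment, whereas Amper's scripts-first pass can reject such input.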

/*
 * [Amper.java]
 *
 * Summary: amper, converts invalid & to &amp; in html.
 *
 * Copyright: (c) 1999-2012 Roedy Green, Canadian Mind Products, http://mindprod.com
 *
 * Licence: This software may be copied and used freely for any purpose but military.
 *          http://mindprod.com/contact/nonmil.html
 *
 * Requires: JDK 1.5+
 *
 * Created with: JetBrains IntelliJ IDEA IDE http://www.jetbrains.com/idea/
 *
 * Version History:
 *  1.1 2006-03-05
 *  1.2 2007-03-26 fix bug in StripEntities. Was not doing &#xffff; properly.
 *  1.3 2007-04-07 recover from crash. Tidy code.
 *  1.4 2007-05-10 add icon, PAD file.
 *  1.5 2007-06-29 add -q command line support. New CommandLine interface.
 *  1.6 2008-08-03 change detail parameter so that you can request
 *                 three levels of detail, rather than two.
 *  1.7 2012-01-25 now handles HTML5 entities. It now leaves any
 *                 unusual entities as is.
 *  1.8 2012-02-09 fix bug. Now handles even the very longest HTML5
 *                 entities. No longer extends DeEntifyStrings.
 *  1.9 2012-06-18 allow you to ampify .htm and .csv files
 *  2.0 2012-11-03 text inside <script is no longer ampified.
 *                 new methods ampifyPossiblyScriptedString(String)
 *                 ampifyPossiblyCommentedString(String)
 *                 deprecated ampifyCommented.
 */
package com.mindprod.amper;

import com.mindprod.commandline.CommandLine;
import com.mindprod.common11.Misc;
import com.mindprod.filter.AllButSVNDirectoriesFilter;
import com.mindprod.filter.ExtensionListFilter;
import com.mindprod.hunkio.HunkIO;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import static java.lang.System.out;

/**
 * amper, converts invalid & to &amp; in html.
 * <p/>
 *
 * @author Roedy Green, Canadian Mind Products
 * @version 2.0 2012-11-03 text inside <script is no longer ampified.
 *                 new methods ampifyPossiblyScriptedString(String)
 *                 ampifyPossiblyCommentedString(String)
 *                 deprecated ampifyCommented.
 * @noinspection WeakerAccess
 * @since 1999
 */
public final class Amper
    {
    // ------------------------------ CONSTANTS ------------------------------

    /**
     * true if want extra debug output
     */
    private static final boolean DEBUGGING = false;

    /**
     * Longest an HTML 5 entity can be, at least in our tables,
     * including the lead & and trail ;.
     */
    static final int LONGEST_HTML5_ENTITY =
            "&CounterClockwiseContourIntegral;".length();

    /**
     * undisplayed copyright notice.
     *
     * @noinspection UnusedDeclaration
     */
    public static final String EMBEDDED_COPYRIGHT =
            "Copyright: (c) 1999-2012 Roedy Green, Canadian Mind Products, http://mindprod.com";

    /**
     * date this version released.
     *
     * @noinspection UnusedDeclaration
     */
    private static final String RELEASE_DATE = "2012-11-03";

    /**
     * how to use the command line
     */
    private static final String USAGE =
            "Amper needs a filename.html or a space-separated list of filenames, "
            + "with optional -s -q -v switches";

    /**
     * embedded version string.
     *
     * @noinspection UnusedDeclaration
     */
    public static final String VERSION_STRING = "2.0";

    /**
     * pattern to detect an entity after its lead &, including the trailing ;
     * alpha name, #x hex or # numeric
     */
    private static final Pattern ENTITY_PATTERN = Pattern.compile(
            "\\p{Alnum}{2," + ( LONGEST_HTML5_ENTITY - 2 ) +
            "};|#x[0-9a-fA-F]{1,8};|#\\p{Digit}{1,10};" );

    // -------------------------- PUBLIC STATIC METHODS --------------------------

    /**
     * convert all & except ones in comments to &amp;.
     *
     * @param big string possibly containing & and comments, but no <scripts
     *
     * @return tidied string.
     * @noinspection WeakerAccess
     * @see #ampifyPossiblyScriptedString(String)
     * @see #ampifyPossiblyCommentedString(String)
     * @see #ampifyUncommentedString(String)
     * @deprecated renamed to ampifyPossiblyCommentedString. You probably really want
     *             ampifyPossiblyScriptedString.
     */
    @Deprecated
    public static String ampifyCommented( String big )
        {
        return ampifyPossiblyCommentedString( big );
        }

    /**
     * fix amps in one file.
     *
     * @param fileBeingProcessed the file currently being processed.
     * @param detail             0=no output at all, 1=just files changed, 2=all files.
     *
     * @throws IOException if trouble reading or writing file
     * @noinspection SameParameterValue, WeakerAccess
     * @see #ampifyPossiblyScriptedString(String)
     * @see #ampifyPossiblyCommentedString(String)
     * @see #ampifyUncommentedString(String)
     */
    public static void ampifyFile( File fileBeingProcessed,
                                   int detail ) throws IOException
        {
        if ( !( fileBeingProcessed.getName().endsWith( ".html" )
                || fileBeingProcessed.getName().endsWith( ".htm" )
                || fileBeingProcessed.getName().endsWith( ".csv" ) ) )
            {
            out.println( "Cannot amp: "
                         + fileBeingProcessed.getName()
                         + " is not a .html, .htm or .csv file" );
            return;
            }
        String big = HunkIO.readEntireFile( fileBeingProcessed );
        String result = ampifyPossiblyScriptedString( big );
        if ( result.equals( big ) )
            {
            // nothing changed. No need to write results.
            if ( detail >= 2 )
                {
                out.println( "- " + fileBeingProcessed.getName() );
                }
            return;
            }
        // generate output into a temporary file until we are sure all is ok.
        // create a temp file in the same directory as filename
        if ( detail >= 1 )
            {
            // it changed
            out.println( "* " + fileBeingProcessed.getName() );
            }
        final File tempFile = HunkIO.createTempFile( "temp", ".tmp", fileBeingProcessed );
        final FileWriter emit = new FileWriter( tempFile );
        try
            {
            emit.write( result );
            }
        finally
            {
            // close even if the write fails, so the file handle is not leaked
            emit.close();
            }
        // successfully created output in same directory as input,
        // Now make it replace the input file.
        Misc.deleteAndRename( tempFile, fileBeingProcessed );
        }

    /**
     * convert all & except ones in comments to &amp;.
     *
     * @param big string possibly containing & and comments, but no <scripts
     *
     * @return tidied string.
     * @noinspection WeakerAccess
     * @see #ampifyPossiblyScriptedString(String)
     * @see #ampifyUncommentedString(String)
     */
    public static String ampifyPossiblyCommentedString( String big )
        {
        int originalLength = big.length();
        final StringBuilder sb = new StringBuilder( originalLength );
        // indexes which character we are working on
        int i = 0;
        while ( i < originalLength )
            {
            // search for start of comment
            int startCommentPlace = big.indexOf( "<!--", i );
            if ( startCommentPlace < 0 )
                {
                // no more comments, finish off this last chunk
                sb.append( ampifyUncommentedString( big.substring( i,
                        originalLength ) ) );
                break;
                }
            // we found the start of a comment
            // process html in front of comment, possibly empty
            sb.append( ampifyUncommentedString( big.substring( i,
                    startCommentPlace ) ) );
            // find the end of comment
            int endCommentPlace =
                    big.indexOf( "-->", startCommentPlace + "<!--".length() );
            if ( endCommentPlace < 0 )
                {
                throw new IllegalArgumentException( "missing --> on a comment" );
                }
            endCommentPlace += "-->".length();
            String commentText = big.substring( startCommentPlace, endCommentPlace );
            // make sure the comment is not malformed. There should be no
            // embedded start comment marker
            String commentGuts = commentText.substring( "<!--".length(),
                    commentText.length() - "-->".length() );
            if ( commentGuts.contains( "<!--" ) )
                {
                throw new IllegalArgumentException( "<!-- ... --> not balanced" );
                }
            // output the comment unchanged
            sb.append( commentText );
            i = endCommentPlace;
            }// end while
        return sb.toString();
        }

    /**
     * convert all & except ones in comments or inside <script to &amp;.
     *
     * @param big string possibly containing & and comments and <scripts
     *
     * @return tidied string.
     * @noinspection WeakerAccess
     * @see #ampifyPossiblyCommentedString(String)
     * @see #ampifyUncommentedString(String)
     */
    public static String ampifyPossiblyScriptedString( String big )
        {
        int originalLength = big.length();
        final StringBuilder sb = new StringBuilder( originalLength );
        // indexes which character we are working on
        int i = 0;
        while ( i < originalLength )
            {
            // search for start of <script
            int startScriptPlace = big.indexOf( "<script", i );
            if ( startScriptPlace < 0 )
                {
                // no more scripts, finish off this last chunk
                sb.append( ampifyPossiblyCommentedString( big.substring( i,
                        originalLength ) ) );
                break;
                }
            // we found the start of a <script
            // process html in front of <script, possibly empty
            sb.append( ampifyPossiblyCommentedString( big.substring( i,
                    startScriptPlace ) ) );
            // find the end of script. Search from just past "<script",
            // not past "</script>", or an empty <script></script> is missed.
            int endScriptPlace =
                    big.indexOf( "</script>", startScriptPlace + "<script".length() );
            if ( endScriptPlace < 0 )
                {
                throw new IllegalArgumentException( "missing </script>" );
                }
            endScriptPlace += "</script>".length();
            String scriptText =
                    big.substring( startScriptPlace, endScriptPlace );
            // make sure the <script is not malformed. There should be no embedded
            // start marker. Look for "<script", not a bare "script", which would
            // falsely reject type="text/javascript".
            String scriptGuts = scriptText.substring( "<script".length(),
                    scriptText.length() - "</script>".length() );
            if ( scriptGuts.contains( "<script" ) )
                {
                throw new IllegalArgumentException( "<script ... </script> not balanced" );
                }
            // output the script unchanged
            sb.append( scriptText );
            i = endScriptPlace;
            }// end while
        return sb.toString();
        }

    /**
     * convert all & to &amp; unless it has been done already. Leaves existing
     * entities as is.
     *
     * @param chunk the string to process
     *
     * @return tidied string
     * @noinspection WeakerAccess
     * @see #ampifyPossiblyScriptedString(String)
     * @see #ampifyPossiblyCommentedString(String)
     */
    public static String ampifyUncommentedString( String chunk )
        {
        // do a quick check. If chunk contains no &, we have nothing to do,
        // guaranteed
        if ( !chunk.contains( "&" ) )
            {
            return chunk;
            }
        int length = chunk.length();
        final StringBuilder sb2 = new StringBuilder( length + 20 );
        int i = 0;
        while ( i < length )
            {
            int ampPlace = chunk.indexOf( "&", i );
            if ( ampPlace < 0 )
                {
                // all done, copy over the remaining chunk.
                sb2.append( chunk.substring( i, length ) );
                // don't need to increment i
                break;
                }
            // we found an &
            // copy over stuff before the & we just found
            sb2.append( chunk.substring( i, ampPlace ) );
            i = ampPlace;
            // is it an &amp; or &lt; or some other entity already?

            // get string without lead & but with trailing ; if it exists.
            final String candidate = chunk.substring( i + 1,
                    Math.min( i + LONGEST_HTML5_ENTITY, length ) );

            final Matcher m = ENTITY_PATTERN.matcher( candidate );
            // quick test.  Just check pattern starting just after &
            if ( m.lookingAt() )
                {
                // this was an entity already, leave it alone.
                sb2.append( '&' );
                }
            else
                {
                // convert & to &amp;
                sb2.append( "&amp;" );
                }
            i++;
            }// end while
        return sb2.toString();
        }

    // --------------------------- CONSTRUCTORS ---------------------------

    /**
     * constructor, not used.
     *
     * @noinspection WeakerAccess
     */
    private Amper()
        {
        }

    // --------------------------- main() method ---------------------------

    /**
     * fixes ampersands in HTML files.
     *
     * @param args names of files to process, dirs, files, -s, *.*, no wildcards.
     */
    public static void main( String[] args )
        {
        if ( DEBUGGING )
            {
            out.println( ENTITY_PATTERN.toString() );
            }
        // gather all the files mentioned on the command line.
        // either directories, files, with -s and subdirs option.
        // warning. Windows expands any wildcards in a nasty way.
        // do not use wildcards.
        // See http://mindprod.com/jgloss/wildcard.html
        out.println( "Gathering html files to &ampify..." );
        CommandLine commandLine = new CommandLine( args,
                new AllButSVNDirectoriesFilter(),
                new ExtensionListFilter( "html", "htm", "csv" ) );
        if ( commandLine.size() == 0 )
            {
            throw new IllegalArgumentException( "No files found to process\n" + USAGE );
            }
        final boolean quiet = commandLine.isQuiet();
        for ( File file : commandLine )
            {
            try
                {
                // -q gives no output at all, otherwise just files that changed.
                ampifyFile( file, quiet ? 0 : 1 );
                }
            catch ( FileNotFoundException e )
                {
                out.println( "Error: "
                             + file.getAbsolutePath()
                             + " not found." );
                }
            catch ( Exception e )
                {
                out.println( e.getMessage()
                             + " in file "
                             + file.getAbsolutePath() );
                }
            }// end for
        }// end main
    }
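Here is one possible shape for the refactoring the exercise asks for, offered as a sketch rather than Roedy's own answer. The marker-scanning loop is written once, in processOutside, and each caller passes its start and end markers plus a callback (the Transformer interface) that says what to do with the text outside the markers. MarkedRegions, Transformer and processOutside are hypothetical names. The ampifyUncommentedString here is a simplified regex stand-in for Amper's version, so that the sketch compiles on its own. It sticks to anonymous classes, so it stays compatible with the JDK 1.5+ requirement in the header.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: the inside/outside-markers loop written once. */
public final class MarkedRegions {

    /** Callback applied to each stretch of text found outside the markers. */
    public interface Transformer {
        String transform(String outside);
    }

    private MarkedRegions() {
    }

    /**
     * Copies every startMarker...endMarker region verbatim and passes all other
     * text through the callback. Throws if a region is unterminated or contains
     * a nested start marker.
     */
    public static String processOutside(String big, String startMarker, String endMarker,
                                        Transformer outsideHandler) {
        final int length = big.length();
        final StringBuilder sb = new StringBuilder(length);
        int i = 0;
        while (i < length) {
            int start = big.indexOf(startMarker, i);
            if (start < 0) {
                // no more marked regions: transform the tail and stop
                sb.append(outsideHandler.transform(big.substring(i)));
                break;
            }
            // transform the (possibly empty) text in front of the region
            sb.append(outsideHandler.transform(big.substring(i, start)));
            int end = big.indexOf(endMarker, start + startMarker.length());
            if (end < 0) {
                throw new IllegalArgumentException("missing " + endMarker);
            }
            end += endMarker.length();
            String guts = big.substring(start + startMarker.length(), end - endMarker.length());
            if (guts.contains(startMarker)) {
                throw new IllegalArgumentException(startMarker + " ... " + endMarker + " not balanced");
            }
            // copy the marked region unchanged
            sb.append(big, start, end);
            i = end;
        }
        return sb.toString();
    }

    // Simplified stand-in for Amper.ampifyUncommentedString, so the sketch is self-contained.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:\\p{Alnum}{2,31}|#x[0-9a-fA-F]{1,8}|#\\p{Digit}{1,10});)");

    static String ampifyUncommentedString(String chunk) {
        return BARE_AMP.matcher(chunk).replaceAll("&amp;");
    }

    public static String ampifyPossiblyCommentedString(String big) {
        return processOutside(big, "<!--", "-->", new Transformer() {
            public String transform(String outside) {
                return ampifyUncommentedString(outside);
            }
        });
    }

    public static String ampifyPossiblyScriptedString(String big) {
        return processOutside(big, "<script", "</script>", new Transformer() {
            public String transform(String outside) {
                // text outside scripts may still contain comments
                return ampifyPossiblyCommentedString(outside);
            }
        });
    }

    public static void main(String[] args) {
        // prints: a &amp; b <!-- c & d --> <script>if (x && y) {}</script> &lt;
        System.out.println(ampifyPossiblyScriptedString(
                "a & b <!-- c & d --> <script>if (x && y) {}</script> &lt;"));
    }
}
```

Each caller now supplies only its markers and what to do with the outside text; the loop, the bounds checks and the balance check live in one place. On Java 8 or later each anonymous Transformer could shrink to a lambda such as s -> ampifyUncommentedString(s).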



-- 
Roedy Green Canadian Mind Products http://mindprod.com
Ironically, even though the Internet was created by the US military 
[DARPA (Defense Advanced Research Projects Agency)]
to withstand a nuclear attack, it is almost defenceless against malice
from any of its users



#2218

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-11-03 08:40 -0700
Message-ID: <jiea98hulrmjaieksldham6m9e94npcs3p@4ax.com>
In reply to: #2217
On Sat, 03 Nov 2012 01:37:20 -0700, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>This is a beginner's exercise in refactoring.

Hint: the point of this exercise is both to demonstrate the key values
of refactoring and to practice the mechanics of refactoring.


#2219

From: markspace <-@.>
Date: 2012-11-03 11:26 -0700
Message-ID: <k73nk0$o85$1@dont-email.me>
In reply to: #2217
On 11/3/2012 1:37 AM, Roedy Green wrote:

>   * Version History:

Stylistically I don't like including the version history in a source 
file.  It's fine at first when you only have a dozen edits or so but it 
quickly grows unmanageably to several hundred edits.  Just the version 
of the file is fine; if you need the history you can request it from the 
SCCS.




#2224

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-11-04 06:57 -0800
Message-ID: <j90d989bvkigf1gbfhmbuk1smn04paqk40@4ax.com>
In reply to: #2219
On Sat, 03 Nov 2012 11:26:06 -0700, markspace <-@.> wrote, quoted or
indirectly quoted someone who said :

>
>Stylistically I don't like including the version history in a source 
>file.  It's fine at first when you only have a dozen edits or so but it 
>quickly grows unmanageably to several hundred edits.  Just the version 
>of the file is fine; if you need the history you can request it from the 
>SCCS.

I distribute all my source both as zips and as access to my
repository.

I find it useful, when I do global searches in the IDE, to have it scan
this information.

The big problem with the way I do things is that there really should be
levels of version history, for the method, class and project, with only
one definitive copy that can't get out of sync.

Thankfully, the IntelliJ IDE is smart enough to hide that giant version
history most of the time.


#2225

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-11-04 09:32 -0800
Message-ID: <EExls.4295$ND1.4268@newsfe08.iad>
In reply to: #2224
On 11/4/12 6:57 AM, Roedy Green wrote:
> On Sat, 03 Nov 2012 11:26:06 -0700, markspace <-@.> wrote, quoted or
> indirectly quoted someone who said :
>
>>
>> Stylistically I don't like including the version history in a source
>> file.  It's fine at first when you only have a dozen edits or so but it
>> quickly grows unmanageably to several hundred edits.  Just the version
>> of the file is fine; if you need the history you can request it from the
>> SCCS.
>
> I distribute all my source both as zips and as access to my
> repository.
>
> I find it useful when I do global searching in the IDE to have it scan
> this information.
>
> The big problem with the way I do  things is there really should be
> levels of version history, for the method, class, project, with only
> one definitive copy that can't get out of sync.
>
> Thankfully Intellij IDE is smart enough to most of the time hide that
> giant version history.
>
Consider using OpenGrok for code searches which require history. It is 
an excellent tool. I use it frequently.
