Advise: A Refactoring Static Analysis Tool

 

 

 

By Jason Scherer

jason.scherer@gmail.com

jps19@columbia.edu

Columbia University

CS4117

Professor Alfred Aho


Abstract

Advise is a static analysis tool for ECMAScript which looks for code which should be refactored.  The current version supports detection of conditional statements which are complex enough to warrant being refactored into boolean functions or variables.  Advise has an extensible architecture based on the Prototype pattern which will allow for the addition of more refactoring rules in future iterations.  An "autorefactoring" feature is also planned, where Advise would interactively refactor code for the user.  Advise is implemented in C# using GPLEX and GPPG, and uses the Prototype design pattern.  Future iterations of the tool will integrate with the Microsoft Phoenix Compiler Framework.

Motivation

The Problem

Refactoring is the process of taking messy, but working, code, and cleaning it up so that it can be easily read and modified by developers.  The first draft of a program is often very clean and well-organized because the developer typically creates a well-thought-out and comprehensive solution to the problem as it is initially conceived.  Over time, however, code often becomes messy because the  initial problem changes (either because requirements change during development, or because it is discovered that the initial concept was incomplete or incorrect).  This often prompts the developer to create short-term fixes and "kludges" that make the code progressively uglier. 

Imagine for a moment the thought process of a developer who is introducing a kludge into the code:

"I forgot about Feature X.  Everything in this system was coded on the assumption that we wouldn't have to handle Feature X.  It's going to take me hours to rewrite everything to handle Feature X elegantly, and we're already behind schedule.  I'll just put in this little 'if' statement here, and be done."

Of course, it is not "done", and this developer will inevitably return to the subject of Feature X repeatedly.  With each kludge, the code becomes increasingly hairy to the point where a complete rewrite is required.  If a consistent refactoring discipline can be maintained in the first place, the problem can be effectively prevented.

The Need for Refactoring

Refactoring has been studied extensively.  It is well understood that maintaining a refactoring discipline during the development process results in easier maintenance and code with fewer bugs.  Martin Fowler, in [4], has compiled a catalog of different refactorings and grouped them into types (such as for organizing data, or for making method calls simpler).  For each one of these refactorings, there is a problem which necessitates the refactoring – whether it is code that is overly complex, exposes class members that should be private, is confusingly structured, etc.

Code that causes a program to execute incorrectly is called a defect.  But what do you call code that performs correctly, but violates one of Advise's refactoring rules?  Fowler, in [4], discusses the concept of "code smells," but I will attempt to create a more graceful analogy.  Leaving code in a messy state often leaves a problem for someone else to clean up.  Therefore, I will refer to these problems as "breaches of code etiquette" or just "breaches."

In some cases, finding breaches of code etiquette requires human intelligence.  For example, a method name that fails to accurately describe what the method does should be “refactored” (in this case, just renamed) to a more descriptive name.   However, in other cases, it is conceivable that a static analysis tool could find breaches in the code and report them to the user.

Enter Advise, the refactoring static analysis tool.

The Tool

Advise is an extensible platform for finding breaches in code etiquette.  It is a static analysis tool which takes a source program, written in ECMAScript, and compiles it to an Abstract Syntax Tree.  Once in AST form, the analysis engine iterates through the entire program looking for breaches of one of the “refactoring rules” supported by the tool.  The set of rules that the tool supports is represented in memory by a list of  prototype suggestion objects” (see Architecture section, below).  When a breach is found, one of the prototype suggestion objects is cloned and added to an output list.  Finally, the output list is dumped and the user receives a message for each breach explaining the problem and recommended change.

Figure 1

 

Additionally, plans exist for a phase two iteration of Advise which will perform source-to-source transformation on the AST data structure to rewrite the offending code in such a way that it is fully refactored and so that the breach is mended.

The decision to refactor is ultimately a human decision, and a compiler’s viewpoint into the deep semantics of a program must necessarily be limited.  We recognize that the criteria used by Advise to detect breaches must necessarily be imperfect.  However, this tool's utility can be increased in the future with a comprehensive set of controls for tool users that will enable them to tune the output of the tool on a per-rule or even per-statement basis.  It is our hope that, if a development team inserts Advise into its project toolchain and heeds the warnings given by our tool, it will approach the maintenance of a consistent refactoring discipline. 

Our Focus: Complex Conditionals

In order to create a proof of concept and to fully investigate the difficulty and inherent problems in creating refactoring rule tests, we have focused in our first version on supporting only one refactoring rule – “Decompose Conditional”.  In this case, the breach we need to detect is the existence of complex conditional statements.  A complex conditional statement qualifies as breach of code etiquette because although it may indicate exactly when a branch is taken, it often does not tell you why it is taken.  Additionally it is difficult to modify once created.  [4, p192] [7, p301]

The "Decompose Conditional" refactoring rule states that when a conditional clause is complicated, the conditional logic should be extracted from the test (the parentheses after the if) and removed to a method.  Alternately, a boolean variable could be initialized before the test to true or false depending on whether the test should succeed or fail – then, the “if” statement tests the boolean variable.  In either case, the name of the method or variable self-documents the meaning of the test. [4, p192] [8, p301].

This example, taken from [8], illustrates using a boolean variable to simplify a conditional: 

if ( ( elementIndex < 0 ) || ( MAX_ELEMENTS < elementIndex ) ||

   ( elementIndex == lastElementIndex )

) {

   ...

}

Becomes

finished = ( ( elementIndex < 0 ) || ( MAX_ELEMENTS < elementIndex ) );

repeatedEntry = ( elementIndex == lastElementIndex );

if (finished || repeatedEntry) {

) {

   ...

}

In order to create an automated test for complex conditionals, we needed to decide upon a metric to use for determining whether a particular conditional is complex enough to be considered a breach.  The naïve approach to this might seem to be to count the number of boolean tests in the conditional we are considering.  For example, if the test is:

(<term1> && <term2> && ( <term3> || <term4>))

then there are four terms.  Advise's algorithm would then compare this count to some (perhaps user-adjustable) threshold.  If the count is greater than the threshold, a message is generated indicating that a breach has been found. 

The problem with this method is that a complex conditional may have only a few boolean-valued terms, but still be complex for other reasons.  For example:

if ((

     Math.sqrt((

         v1[0] * v1[0]

       + v1[1] * v1[1]

       + v1[2] * v1[2]) / 3)

   - Math.sqrt((

         v2[0] * v2[0]

       + v2[1] * v2[1]

       + v2[2] * v2[2]) / 3)

   ) < err)

{

   doSomething();

}

Note that this conditional only computes one boolean value, yet (in addition to containing hard-coded vector lengths) it is clearly too complex and should be decomposed to something like the following:

if (floatEquals(RMS(v1), RMS(v2))) {

{

   doSomething();

}

Another heuristic for finding complex conditionals could be to count the number of AST nodes in the conditional test (the thing in parentheses after the "if") and use this count as a metric for complexity.  So, in the following example, the AST has 17 nodes (two logic operators, three equality operators, two literals, three member accessors, and seven identifiers):

if (

      segment == null

   || segment.type != SegmentType.COMMERCIAL

   || segment.payable == false

) {

   Throw new Exception("Should not send an invoice for a non-commercial (free) spot.");

}

Figure 2

Assuming the tool’s threshold was less than 17, this would be flagged as a breach.  This is obviously better decomposed as:

if (!validCommercial(segment)) {

   Throw new Exception("Should not send an invoice for a non-commercial (free) spot.");

}

which only contains five AST nodes (the “not”, the function invocation, the argument list for the function invocation, and the two identifiers).

Figure 3

This approach gets us closer to where we want to be, but it still presents a problem: there are potential situations where a satisfactorily refactored test could have more AST nodes than a complex test.

if (

      dateDiff("day", sale.date, today()) > 30

   && sale.paid == false

   && sale.client.id > 100

   && sale.price > 999

) {

   sendUrgentBillingNotice();

}

This conditional obviously needs refactoring.  But if we were to simply replace all conditions with one function name, what would it be called?

if (isOverdueAndExternalClientAndCriticalSale()) {

{

   sendUrgentBillingNotice();

}

This is obviously not workable.  A better solution is:

if (

      isOverdue(sale)

   && !isInternalClient(sale.client)

   && isCriticalSale(sale)

) {

   sendUrgentBillingNotice();

}

This conditional statement, though acceptably refactored, has 17 AST nodes, the same number as our previous example which needed refactoring!

Figure 4

In fact, this second algorithm can be made usable if modified only slightly.  It turns out that associating a “cost” or weighting factor with each type of ASTNode in the tree mitigates the problems in the second approach.  This is the algorithm that Advise uses.  In order to determine whether there is a breach, we iterate through the AST for the conditional test, and tally up the “cost” for each node in the subtree.  If the total cost exceeds the threshold, we add this to our list of breaches.  Negations, function invocations and their argument lists, and identifiers all have a cost of zero.  Other expression-level constructs which are likely to appear in conditionals have a cost of one.  Constructs such as “for” or “while” which can appear in conditionals only in pathological cases get a cost of int.MaxValue:

// this is valid javascript, but is pathological

if (function() { var i; for ( ; i < 10 ; i++); return i; }() == 10) {

}

Requirements and Rationale

ECMAScript

We targeted ECMAScript for our initial version of Advise for two reasons.  Firstly, ECMAScript (unlike C and C++) does not allow access to arbitrary memory via pointer variables.  We felt that this would simplify any data flow analysis that might be required in future iterations of the tool.  Secondly, ECMAScript, although it does allow for object-oriented programming, does not require it.  This simplified our task for phase one and allowed time to focus on the task of making refactoring recommendations.

Phoenix

The Phoenix Compiler Framework from Microsoft is a comprehensive platform that allows compiler writers to easily create back-end plugins for Microsoft compiler technologies.  With it, optimization “phases” can be created and placed into the overall sequence of code transformations that the intermediate representation undergoes, from the time it is created until the time it is finally “lowered” to machine code.  The Phoenix API has many different utility classes that can be used to easily perform common tasks, such as creating flow graphs, performing data flow analysis, etc. 

Our wish was to leverage the Phoenix API as much as possible.  However, we eventually found that writing a Phoenix plugin was not adviseable, given the specific goals of our application.  Phoenix plugins exist too far “downstream” from the source code for Advise to be able to determine whether refactoring is indicated, much less perform effective source code transformations (see phase two goals).  In order to look for breaches, Advise needs to have access to the code at the AST level, before it is turned into HIR (the highest-level IR in the Phoenix system) [9].

Although there are AST classes in the Phoenix framework, support does not currently exist for plugging in a module at the AST level of the compiler.  For this reason, we decided to create our own front end AST classes, in the hopes that a future iteration of Phoenix will have better AST support.  Meanwhile, we worked to create code that transforms our AST into Phoenix HIR (one of the four types of intermediate representation that Phoenix uses).  Our hope is that an HIR representation of the code would give us access to Phoenix API functionality that would help with tasks such as flow graph creation and data flow analysis.

Output Format and Controlling Output

We decided to create an output format where, in addition to indicating the file and line number of each warning, each suggestion type is tagged with an ID:

filename.js,31: [S3] message to user

Correspondingly, Advise allows a command line option to suppress warning types by ID:

Advise /S S3 filename.js

In phase two we plan to create more granular controls over warnings by allowing the user to insert special comments in their source code which instruct the Advise system to ignore certain rule types on a per-file or per-line basis. 

Project Toolchain

The Advise system was implemented in C# on the Windows platform.  We chose C# to allow easier integration with Phoenix.  We used a command line build model and a classic makefile (we used nmake.exe, the windows equivalent).  To create our scanner and parser, we used the Gardens Point Parser Generator (GPPG and GPLEX) from QUT [6][7].  These tools output a C# scanner and parser.

The project is broken up into several modules (parsing, AST generation, suggestion engine) and each of these modules compiles their code to an intermediate “netmodule” file (the C# equivalent of a .o file in C).  The final build combines the netmodule files with an entry point class to produce the final executable.

Unit testing is built into the toolchain.  The makefile default target depends on the “success” output textiles from the unit testing targets and these targets in turn depend on a successful compilation of advise.exe.  This means that if a unit test fails, the build fails.  Our unit tests involve invoking Advise on a carefully prepared input file and piping the output into a temporary output file, then comparing that temporary output file to a canonical (correct) version of the output.  If there are no differences, the test succeeds and a success file is generated for that target.

Architecture

The first step in processing the input file is to parse it and build an Abstract Syntax Tree from the parse output.  We use a commonly accepted model for an AST where subclasses are used to represent different language constructs [1].  The ASTNode class is the superclass for all nodes in the AST.  It contains a linked list of children and has methods for adding children, removing children, etc.  In some cases, distinct language constructs share the same representative subclass.  For example, all of the following literals share the same class (“Literal”):

3004

"hello world"

'3'

NaN

In some cases we created AST classes to represent language structures that can be ignored by conventional compilers.  For example, the following code would be parsed by many compilers to an AST like the one pictured in Figure 5.  However because Advise’s phase 2 functionality includes source-to-source transformations and rewriting code, its AST for this expression needs to retain the parentheses.  Hence, it resembles Figure 6.

a * (4 + foo())

 

Figure 5

Figure 6

Our basic approach was to make the Advise engine recurse through the AST tree and then create a “suggestion” object in each place it sees a breach.  These suggestion objects are queued and then output to the user on stdout in a manner similar to compiler output.  This problem is interesting because most of the work associated with a suggestion object is the work that determines whether or not it should be created.  Therefore, we decided to use the Prototype design pattern as our object creational style, though not for the reasons it is sometimes employed (e.g., performance reasons) [10]. 

The prototype design pattern is an object creational pattern which requires that objects participating in the pattern extend or implement a “prototype” interface which includes a “clone” method.  When a new instance of an object is desired, the client invokes the “clone” method on the prototype object rather than creating a new object using the “new” operator [5] [10].

In Advise's case, a vector of Suggestion objects is maintained, each of which is the prototype for its class.  When a new suggestion is required, it iterates through the prototype vector, and calls the “isApplicable” method on each prototype object.  If it returns true, then it calls “clone” and queues the returned suggestion object.  Advise then continues iterating over the prototypes (as more than one refactoring suggestion could apply to the same source code construct).  When the code is fully examined, it outputs the queue of accumulated suggestions to the user.

The most significant advantage to this approach is that the code for determining applicability for a suggestion resides strictly within that suggestion’s class.  This avoids crosscutting concerns [2]: the code for everything having to do with a particular suggestion is localized within its class.  Adding a new suggestion to the framework merely consists of adding it to the suggestion vector in the initialization function – everything else happens within the newly created class.

Figure 7

Gotchas

ECMAScript syntax is based on C, but there are some significant differences, some of which presented problems in the creation of a parser.  One of these was in the area of array and object initializers.

In C, a static initializer for an array looks like this:

int a[] = { 0, 1, 2, 3, 4 };

This statement creates an array of length 5 on the stack and initializes the array variable name a to point to it.  This array initializer syntax can only be used in a declaration statement, however.  The following code is illegal in C:

int *b;

b = { 0, 1, 2, 3, 4};

In ECMAScript, a similar array initialization construct exists, and it has an object initialization analog.  So:

var a = [0, 1, 2, 3, 4];

var car = {make:"Saab", model:"9-3", year:2001};

the first statement is analogous to the C statement, where the second creates an object called car with three fields (make, model, and year).

The “gotcha” is that in ECMAScript these array and object initializers are legal anywhere an expression is legal.  So, not only is it legal to write the ECMAScript equivalent of the disallowed C code above, all of the following statements are completely legal:

x = [ ‘a’, ‘b’, ‘c’, ‘d’, ‘e’ ][2];  // var gets the value of ‘c’

y = [0][0]; // sets y to zero.

z = [[0][0]][0]; // sets z to zero.

function({foo:y}.foo); // passes the value of y into the function

This introduces a blatant ambiguity into the ECMAScript grammar.  How should the following code be parsed?

function f(a, b, c) {

   var r;

   r += a;

   {}; // this line is the problem

   r += b;

   r += c;

   return r;

}

What is the pair of brackets followed by the semicolon?  Is it an array initializer (for an empty array) that gets created and then thrown away in the same statement (technically an “eval” statement on a non-assignment expression)?  Or is it an empty statement block, followed by the empty statement (the semicolon)?  The language specification is inherently ambiguous on this point.

The ECMAScript spec deals with this in its BNF by stating that the “ExpressionStatement -> Expression” production is only taken if the lookahead is not “{“. [3]  In effect, this means that the empty bracket above is always parsed as an empty statement block.  It also makes it such that, of the following two statements, the first (an array initializer that appears by itself in a statement "eval") is valid, but the second (the analogous object initializer) generates a compiler error:

['a', 'b', 'c', 'c'];

{0:'a', 1:'b', 2:'c', 3:'d'};

The upshot of this for Advise is that since GPPG, like Yacc, cannot perform conditional reduction based on lookahead token, there are four different reduce/reduce conflicts in the grammar which all can be traced back to this ambiguity.  In order to enable correct parsing, it was necessary to ensure that the default conflict resolution strategy of GPPG (choosing the lower-numbered reduction) would ensure correct parsing for all four of these conflicts.

Results

To test the tool on real-world code, we put together a test suite of 23 different Actionscript files taken from various sources.  Some statement terminators (semicolons) had to be added to some files since Advise does not currently support the optional-semicolon functionality in the ECMA standard (that is, statements can be separated by either semicolons or newlines).  Some results are worth highlighting here:

easing_equations.as

Robert Penner's "easing equations" is a well known download within the Flash community [11].  It is a set of short functions used to add "easing" (speeding up or slowing down at the beginning or end) to an animation.  This file received the second-highest count of complex conditionals (eleven).  In most cases this conditional was the same code, repeated in many places:

if ((t/=d/2) < 1) ...

This is a test that returns true if the animation has not yet reached the midpoint (t is the current time and d is the duration).  To get a sense for the flavor of the functions in this file, here is an entire function:

// cubic easing in/out - acceleration until halfway, then deceleration

Math.easeInOutCubic = function (t, b, c, d) {

   if ((t/=d/2) < 1) return c/2*t*t*t + b;

   return c/2*((t-=2)*t*t + 2) + b;

};

(Note the assignment to t and use of t within the same statement).  It would not cost much in download time and would increase readability immensely if the following function were created and used in place of the complex conditional:

function pastMidpoint(t, d) {

   return (t / (d / 2) >= 1)

}

mm_livepreview.as

This file is one of the many Actionscript files distributed with a Flash installation.  It is a fairly nondescript file and is typical of the type of Macromedia code that comes with Flash.  This code had the most warnings (twelve).  Although some of the flagged conditionals were confusing and did seem to warrant refactoring:

while ((index = str.indexOf('\'', strIndex)) != -1) ...

...

if (ch == undefined || ch == '}' || ch == ']' || ch == ',') ...

Others seemed fairly straightforward and easy to read:

if (typeof(_root.instanceID) == "undefined") {

In this case the typeof operator adds weight to the complexity calculation but it doesn't really contribute toward the difficulty-of-understanding of this construct.  Based on these results, we changed the weighting algorithm for our complex conditional test so that the typeof and instanceof operators were zero-weight constructs.

Future Enhancements

More Suggestions

We plan to support a more complete catalog of breaches in phase two.  As mentioned before, some refactorings do not lend themselves to automatic detection.  However, those that do are summarized in the table in Appendix A.  A short description of the detection method is also provided.  Most of the refactorings in Appendix A are taken from [4].

Autorefactoring

We plan to add “autorefactoring” functionality in phase two where Advise will refactor code automatically.

Naming will be a major hurdle.  If a block of code is extracted to a method, Advise cannot automatically generate a method name that describes what the code does; however, naming the method descriptively is an essential part of the refactoring process.  Similar issues exist with most other types of refactorings. 

Another problem is that of user control regarding which changes to apply.  Some of Advise’s suggestions will necessarily be spurious: how can the user easily select the changes they want, while ignoring the ones they don’t?

To solve both problems, we plan to create a second autorefactoring tool which runs on the output files created by the initial run of Advise.  Advise will continue to run as it does currently, generating output on stdout and exiting with a nonzero status if suggestions are made.  However (if this feature not suppressed by a command line flag), an additional output directory will be created and filled with “change files” – files which record all the changes that need to be made to the source in order to perform an autorefactoring operation.  If possible, these files will be in a standardized format (such as unified diff) to open them up to other tools besides Advise.

If the user wishes to perform autorefactoring on the source, he/she would invoke "correct.exe".  "Correct" would then look for change files in the change directory and interactively apply them, one by one, allowing the user to skip any refactorings that they decide are false positives.  Everywhere a new method or file name is needed, "correct" would prompt the user to type it.  Alternately, "correct" could display a change browser, in which refactorings could be selectively chosen from a list.

HIR and Phoenix integration

We plan to use the Phoenix framework to simplify some of the tasks involved in looking for refactoring problems.  We will convert our AST structure into Phoenix HIR.  Once the code is represented as HIR, we should be able to easily perform data flow analysis using Phoenix tools [9].  This becomes especially important within the realm of autorefactoring.  Consider this example:

void getIDRange(int startid, int endid) {

   Recordset rset;

   Connection ct;

   Query q;

  

   String query = "";

   query += "select * from products where ";

   query += "id > " + startid;

   query += "id < " + endid;

   q = new Query(query);

   c = Database.getConnection();

   rs = Connection.doQuery(q);

   for (var i = 0 ; i < rset.count ; i++) {

       int id = rset[i].getField("id");

       if (rset[i].getField("available")) {

          printf("The following product is available:");

          printf("\t%s\n", id);

          printf("\t%s\n", rset[i].getField("name"));

          printf("\t%s\n", rset[i].getField("date"));

          printf("\t%s\n", rset[i].getField("quantity"));

          printf("\t%s\n", rset[i].getField("price"));

          printf("\t%s\n", rset[i].getField("couponed"));

          printf("\t%s\n", rset[i].getField("VAT"));

       } else {

          printf("product %d not avail", id);

       }

   }

}

We want to extract the inner code into its own method, like so:

void getIDRange(int startid, int endid) {

   Recordset rset;

   Connection ct;

   Query q;

  

   String query = "";

   query += "select * from products where ";

   query += "id > " + startid;

   query += "id < " + endid;

   q = new Query(query);

   c = Database.getConnection();

   rs = Connection.doQuery(q);

   for (var i = 0 ; i < rset.count ; i++) {

       int id = rset[i].getField("id");

       if (rset[i].getField("available")) {

          printProduct(rset[i], id);

       } else {

          printf("product %d not avail", id);

       }

   }

}

 

void printProduct(Recordset r, int id) {

   printf("The following product is available:");

   printf("\t%s\n", id);

   printf("\t%s\n", r.getField("name"));

   printf("\t%s\n", r.getField("date"));

   printf("\t%s\n", r.getField("quantity"));

   printf("\t%s\n", r.getField("price"));

   printf("\t%s\n", r.getField("couponed"));

   printf("\t%s\n", r.getField("VAT"));

}

In order to do this, we need to look at each variable in the inner block and trace it back to its declaration to determine whether it needs to be made into a parameter of the new method.

Conclusion

Writing maintainable code requires communication with two parties at once using the same set of symbols -- on one hand with the compiler, giving it the instructions necessitated by the functional specification and on the other hand, with the next programmer to look at this code (who could also be yourself, in a week, or month, or year). 

Developers can often understand code that baffles the compiler: human minds easily forgive a missing brace or semicolon.  On the other hand, compilers often have no trouble understanding code that looks like spaghetti to a developer.  The path to superior code lies at the intersection of the strictures of human and machine.  Advise will attempt to lead developers down this path.


References

[1] Aho, Alfred, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman.  Compilers: Principles, Techniques, and Tools (2nd Edition).  Addison-Wesley, 2007.

[2] Eaddy, Marc.  Static Analysis and Program Transformation: Tools and Applications.  2007.  <http://www1.cs.columbia.edu/~aho/cs4117/lectures/07-10-11_Eaddy4117.pdf>.

[3] Ecma.  Standard ECMA-262: ECMAScript Language Specification, 3rd Edition (December 1999).  1999.  <http://www.ecma-international.org/publications/standards/Ecma-262.htm>.

[4] Fowler, Martin, et. al..  Refactoring: Improving the Design of Existing Code.  Addison-Wesley, 1999.

[5] Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides.  Design Patterns: Elements of Reusable Object-Oriented Software.  Addison-Wesley, 1995.

[6] Gough, John.  The GPLEX Scanner Generator.  2007.  <http://www.plas.fit.qut.edu.au/gplex/>.

[7] Gough, John and Wayne Kelly.  The GPPG Parser Generator.  2007.  <http://www.plas.fit.qut.edu.au/gppg/>.

[8] McConnell, Steve.  Code Complete: A Practical Handbook of Software Construction.  Microsoft Press, 2004.

[9] Microsoft.  The Microsoft Phoenix Platform Software Development Kit (SDK) Preview Edition, July 2007 Release.  2007. 

[10] Various authors.  Prototype Pattern.  <http://en.wikipedia.org/wiki/Prototype_pattern>.

[11] Penner, Robert.  Easing Equations.  <http://www.robertpenner.com/easing/>.

 


Appendix A

This is the list of additional refactorings that we plan to support in phase 2.

 

Refactoring Name

Description

Breach Detection Algorithm

Extract Method

Take a chunk of code (for example, inside an inner loop) and put it in a separate method.  Then call the method from the original place where the code came from

Look for inner scopes that have many local variables declared in the inner scope and which use few variables from outside that scope

Duplicate Code

Find places where code has been copied-and-pasted, and put the duplicate code in a function.  Then call the function in the places where the duplicate code was found.

Create a hash value for each AST node that depends on its type and the hash values for its child nodes.  Look for hash collisions.

Assignments to Parameters

In a function, values of parameters to that function should never be changed (the parameter variables should never be assigned to)

Look for parameter assignments

Move Method/Move Field

Take a method or field in one class and move it to another class

Look for method or field which is referenced in outside of its class more often than it is referenced in its own class.

Extract Class

A class is too complex and does too many things: take some methods and properties and put them into a separate class.

Create graph (similar to flow graph) representing method and property accesses.  Examine connectedness of this graph to find islands of functionality that do not communicate with the rest of the class.

Replace Magic Number with Symbolic Constant

Magic numbers are generally considered harmful.

Look for literal values that appear inside expressions and which are never assigned to variables.

Encapsulate Field

Public fields are generally considered harmful as well.

Look for fields marked public.

Replace Global Variable with Parameter

If a variable has been made global only so it can be used within one called function, perhaps it should be a parameter instead

For each global variable, count usage across methods, and compare with flow graph.

Create Parameter Object

If a group of parameters is always passed together to a bunch of different methods, they should be made properties of an object that gets passed to these methods

Look for repeated passing of the same parameters to multiple methods.

Pull up Field/Method

A field or method is used in multiple subclasses. 

Similar to Duplicate Code

Push Down Field/Method

A field or method is only used by one or a few subclasses

For each field and method, get total usage by “type context”: i.e. from pointers to the type or code in the type.  If a method is only used in a subtype “type context”, push it down.