Thursday, April 21, 2011

Finding Primes

So, when I do interviews, I almost always ask a math question. I don't ask a "brain teaser"--I find even the basics of mathematics to be challenging enough for most interviewees.

(And I've interviewed plenty of non-Americans. So, this post is not a diss on the American math education system, though it could possibly use some refinement. Different post.)

One question that has really shown to be a litmus test is the problem of detecting whether or not a natural number is prime. I don't count the one interviewee who didn't know the definition of a prime number (oh, dear). For the rest of them, though, I've found that only a small percentage can do even the most basic solution.

My favorite response was just a short time ago. After I posed the problem to the candidate, he said, "Now, if you had given me the Fibonacci problem, I could do that!"

Why is this problem so hard?

Mostly, I think people get tied up with efficiency. There is a brute force way that works great:


for ( int i = 2; i < number; i++ ) {
if ( number % i == 0 ) {
return false;
}
}

return true;


The majority of people that can't find a good solution get hung up on how inefficient the algorithm is because you are testing unnecessary numbers. For example, does one really need to test divisibility by 4? Of course not--you already eliminated that when you tested 2.

The mind-bending response that I get from some is, "well I would just have a list of all the primes from 2 up to that number available and would test the number's divisibility against those". Good idea, but how do you create that list of primes? Isn't that why you are creating the method anyway?

For fun, we could create a really terrible recursive solution based on that:


public boolean isPrime(int num) {
for ( int i = 2; i < number; i++ ) {
if ( isPrime(i) && num % i == 0 ) {
return false;
}
}

return true;
}


That said, I have never once gotten that as a response even though so many people say they only want to test the primes beneath that number against it.

So, both are inefficient, though O(n) ain't bad. Can we improve it? You betcha.

First of all, we don't need to test everything up to that number. Other than the number 2, there is no number on the natural number line that is evenly divisible by the number immediately beneath it. 3 isn't divisible by 2, 53 isn't divisible by 52, etc. Also, other than 3 and 4, there is no number on the natural number line that is evenly divisible by the number - 2. 5 isn't divisible by 3, 74 isn't divisible by 72.

In fact, speaking of 72, let's see what its factors are:

2 x 36
3 x 24
4 x 18
6 x 12
8 x 9
9 x 8
12 x 6
18 x 4
24 x 3
36 x 2

The first thing that you noticed is that no factor is above half 72. That is true for all numbers. Thus, we don't need to test above half way because we know for sure that none of the numbers above half will be factors.

We can do better, though. Do you notice how we just invert and repeat ourselves when listing the factors after we pass a certain point? Here, it is at 8x9. We certainly don't need to test above that since multiplication is commutative. Where is that point for other numbers.

How about 64?

2 x 32
4 x 16
8 x 8
16 x 4
2 x 32

Or how about 154?

2 x 77
7 x 22
11 x 14
14 x 11
22 x 7
77 x 2

You see the pattern? Right in the center, the factors invert and repeat. That center is the square root.

So, a more efficient algorithm would be:

double lim = Math.sqrt(num);
for ( int i = 2; i <= lim; i++ ) {
if ( lim % i == 0 ) {
return false;
}
}
return true;

That's O(sqrt(n)), which I think is pretty good.

If you were bored up until now (why are you still reading, then?), here is where things get interesting.

The fact is that that is a very brute force way to do it. It's probably the best we have as a general approach for discovering if a particular number is prime. What about getting a list back of all the primes beneath a given number?

So, I've always wanted to ask this question, but I think I'd definitely get a blank stare from 99% of candidates.

Somehow, the size of this question gives more room for bits to compress into a faster algorithm that the brute force way.

Let's review the brute force way really quick, though. To return a list of primes beneath a given number, one needs to test each one for primacy and add them to the list if they pass the test:

public static boolean isPrime(int num, List primes) {
double sqrt = Math.sqrt(num);
for ( Integer prime : primes ) {
if ( prime > sqrt ) {
return true;
}
if ( num % prime == 0 ) {
return false;
}
}
return true;
}

public static List findPrimes(int max) {
List primes = new ArrayList();
for ( int i = 2; i <= max; i++ ) {
if ( isPrime(i, primes) ) {
primes.add(i);
}
}
return primes;
}

You'll notice that I am adding one efficiency here which is only testing against those that I have already discovered to be prime in previous iterations. Regardless, though, this is still about O(n*sqrt(n)).

Enter the awesome Eratosthenes. He offered a different way to discover primes. He recognized that with the premise that k was prime, we could assert the following inductive reasoning:

All numbers k*n are composite where n = {2, 3, ... }, so

k = 1 is composite
k = k + 1: If unmarked, k is prime: Mark all k*n as composite; otherwise, k is composite

Following his sage wisdom, we get the following algorithm:

public static List findPrimes(int max) {
boolean[] isComposite = new boolean[max];
isComposite[0] = true;
for ( int i = 2; i <= max; i++ ) {
if ( !isComposite[i-1] ) {
primes.add(i);
for ( int j = i*2; j <= max; j = j + i ) {
isComposite[j-1] = true;
}
}
}

return primes;
}

We could probably trim some cycles of off this to reduce its coefficient, but, overall this is a huge improvement.

Here are the numbers:

primes under 100000: 9592; 33 ms
primes under 100000: 9592; 8 ms

primes under 1000000: 78498; 277 ms
primes under 1000000: 78498; 18 ms

primes under 10000000: 664579; 3872 ms
primes under 10000000: 664579; 551 ms

primes under 100000000: 5761455; 88055 ms
primes under 100000000: 5761455; 8635 ms

Anyway, there's your nerdy math update for the day.

Wednesday, October 27, 2010

Problems with Defensive Collection Getters and JAXB

A good general practice with collection getters is that they a) return a reference to the collection or b) return an unmodifiable wrapper around the collection. Further, it's often desirable to have an addXXX and removeXXX instead of a collection setter to complete the encapsulation picture.



public class Order {
private List items;

public List getItems() {
Collections.unmodifiableList(items);
}

// .. no setter

public void addItem(Item item) {
items.add(item);
}

public void removeItem(Item item) {
items.remove(item);
}
}

This practice runs awry when using this object for marshalling XML with JAXB. What you will notice when it tries to unmarshal and XML message representing this object is that the resulting items list will be empty.

That's really too bad.

What is happening is that JAXB in the absence of a collection setter is calling the getter in the hopes of calling "add" on the reference that it gets back. Unfortunately, since we are returning an unmodifiable list, JAXB silently fails by not adding any items at all and moving on.

The solution is actually pretty simple. We just need to tell JAXB to look at the field instead of the method. You do this with two annotations:

public class Order {
@XmlElementWrapper("items") @XmlElement("item") private List items;

...
}

And that's all! Now, you can keep your defensive collection encapsulation and still allow JAXB to do it's unmarshalling. Enjoy!

Monday, October 11, 2010

Adding paths to your class loader

So, a class loader is immutable in Java, right? At least, there are no setter methods in the public API (except the assert stuff) and there is no obvious way to specify what class loader you might want to use where.

This came to bug me while creating a maven plugin the other day in which the plugin needed to read a specific classpath resource from the project it was running in. Since maven plugins run in their own class loader, I wasn't going to be able to access project classpath resources.

I might have been able to add the @requiresDependencyResolution metadata annotation to resolve the problem, but we really didn't want to box ourselves in to needing an enclosing project to run the plug-in.

The Maven Exec Plugin gave me an idea.

The Maven Exec Plugin is the maven-y way to run command-line java through a maven goal. It can either be run within the maven process as a separate thread or as a separate process. Either way, it has the same challenge of propagating the enclosing project's classpath on to a separate and distinct classpath context.

In the in-maven-process case, what they do is create their own classloader, and then run the invocation of the main method inside a separate thread, setting that threads context classloader along the way:



private void executeWithClassLoader(Runnable runnable, ClassLoader classLoader) throws MojoExecutionException {
IsolatedThreadGroup threadGroup = new IsolatedThreadGroup( runnable.getClass().getName() );

Thread bootstrapThread = new Thread( threadGroup, runnable, runnable.getClass().getName() + ".run()" );
bootstrapThread.setContextClassLoader(classLoader);
bootstrapThread.start();

try {
bootstrapThread.join();
} catch ( InterruptedException e ) {
Thread.currentThread().interrupt(); // good practice if don't throw
getLog().warn( "interrupted while joining against thread " + bootstrapThread, e ); // not expected!
}

synchronized ( threadGroup )
{
if ( threadGroup.uncaughtException != null )
{
throw new MojoExecutionException( "An exception occured while executing the Java class. "
+ threadGroup.uncaughtException.getMessage(),
threadGroup.uncaughtException );
}
}
}


The Runnable that is passed in is the section of code where you actually need access to the project's classpath. The ClassLoader is created like this:

URL outputDirectory = new File( project.getBuild().getOutputDirectory() ).toURI().toURL();
ClassLoader classLoader = new URLClassLoader( new URL[] { outputDirectory } );
this.executeWithClassLoader(runnable, classLoader);

Of course, there may be other directories or artifacts that you need to add for your runnable to function correctly, but that is the basic idea.

For more ideas, check out http://svn.codehaus.org/mojo/tags/exec-maven-plugin-1.2/src/main/java/org/codehaus/mojo/exec/ExecJavaMojo.java

Thursday, October 7, 2010

EasyMock and varargs

EasyMock is a neat tool for creating mocks at runtime for unit tests. It does some pretty cool things, including writing their main API using Fluent Pattern.

For example, when adding behavior to your mocks, you would call the following:


EasyMock.expect(...).andReturn(...).times(3).andReturn(...).times(2)...


Awesome.

There is one feature, though, that is lacking that can send you for a loop if you are unaware, which is varags.

If you need to add a behavior to a method that takes var args, you will need to how many parameters are going in at test time.

For example, say I have the following method:


Object myMethod(Object... args)


That I want to supply behavior for. In EasyMock, there isn't a way to say "this is a var args method". So, you will need to expand it according to your test case:


EasyMock.expect(myMethod(arg0, arg1, arg2)).andReturn(...)...


The reason is that EasyMock's strategy for method matching is to count the number of arguments in your behavior with the number of arguments in the method invocation. If you just say


EasyMock.expect((String[])EasyMock.anyObject())...


or something like that, it will see that there is only one parameter, whereas the method invocation parameters will never match.

Thursday, August 19, 2010

Cross-referencing plugins in Sonar 2.2

Formerly, we had three Sonar plugins, two "mavenly" dependent on the other. The parent plugin held the common code for uploading non-Java files into Sonar for reporting. The other two took care of analyzing xml and css, respectively, and tying violations to those files.

This worked great in Sonar 2.0.1, but when we upgraded to Sonar 2.2, the violation coloring stopped working on the server for these files.

What could be the problem? I walked carefully through the Sonar code and saw that the violations were making it into the database, but that two resources were getting created for each file in the project.

This didn't make a whole lot of sense since the files and their violations (and other metrics) are housed in the singleton DefaultSonarIndex as a map of Files to metrics (the actual class name for all the class metrics is called Bucket). What could be causing to records of the same file to make it into the map.

Enter the Resource equals method:


1. public boolean equals(Object o) {
2. if (this == o) {
3. return true;
4. }
5. if (o == null || getClass() != o.getClass()) {
6. return false;
7. }
8.
9. Resource resource = (Resource) o;
10. return key.equals(resource.key);
11.
12. }

This is a pretty standard looking equals method that doesn't really seem to be the suspect since I know that they keys are the same by verifying it in my debugger. The crazy thing is that it breaks on line 5.

What?? o is definitely not null and they are definitely the same class...oh, right...as of Sonar 2.2, each plug-in loads in its own classloader. The first loading of the class was by the parent plugin to load the source into Sonar and the second loading was by the child plugin to specify violations.

So the quick solution was to take the small amount of code in the parent project and distribute it to the others. The cleaner fix would be to refactor it so that only one project is actually referring to the classes (probably the children).

Phew. That was a tricky one.

Integration Tests in Sonar

Integration tests are another important aspect of analyzing a project's overall health that Sonar does not yet support out of the box. To get this functionality, you'll need to build a couple of Sonar plugins (or try using the ones that I built) that will instrument your integration test code, run the integration tests, and collect the integration test results as well as the new coverage data.

It sounds like a tall order, but a lot of the work has already been done for you.

Sonar runs the unit tests in your project automatically by way of surefire. Sonar has a surefire plugin which executes the surefire:test goal and then collects the results by reading the TEST-xxx.XML files that it produces.

So, why not do the same with failsafe, the maven plugin for running integration tests? Sounds pretty simple and there are only a couple of catches.

The first catch is that there are not pre-defined metrics in Sonar for integration tests. So, if you don't mind piggybacking on the unit test metrics--meaning that your success, failure, and coverage numbers will be aggregated across both unit and integration--then simply copy the SurefireSensor, changing only the part where you are saving the metrics to updating existing metrics.

The second is a corner case. What if you don't have any unit tests, but you do have integration tests? If the surefire sensor doesn't find any unit tests to run, then it preemptively sets one of the metrics that later on is overwritten by the UnitTestDecorator (read: things explode). The only way around this one (that I've found) is to add another sensor to our failsafe plugin that, when the Sonar batch process first starts up, creates a dummy surefire empty test result for the surefire sensor to catch. While temporary and dummy files give me an icky feeling inside, it does the trick.

One more involved way around both of these is to create your own set of metrics and a decorator to display them. In this way, you wouldn't collide with any of Sonar's efforts to run unit tests. I haven't done this, yet, and I think that there might be more involved than that, so I didn't touch that route.

Okay, the failsafe plugin takes care of running and reporting on integration test execution. What about code coverage?

My coverage tool of choice is Emma. A while ago, Sonatype wrote an article explaining how to use emma4it to add integration test code coverage to emma. So, we can follow the same pattern, creating a Sonar plugin to execute the appropriate maven commands.

The only tricky part here was telling Sonar when to run each command. As far as I understand it, one cannot specify maven lifecycle points at which to run each maven goal. Instead, Sonar invokes all the goals serially at the point when it's that sensor's turn to execute.

What we really want is emma:instrument and emma4it:instrument-project-artifact to happen together, then the failsafe sensor, and then emma4it:report. Hmm...

The way we solved it was to have three different MavenPluginHandlers and Sensors in our emma4it sonar plugin. The emma:instrument and emma4it:instrument-project-artifact are dependent on emma finishing and emma4it:report is configured as running in the Phase.Name.POST phase. Failsafe, also specified as being dependent on code coverage finishing falls by default in between the two.

If there were a closer mapping to post-integration-test, process-classes, etc. in the future, this piece would become a lot cleaner.

I'm going to spend a bit of time cleaning up my code before I upload it, but after that, you are free to code by example. :)
Enhanced by Zemanta

Tuesday, August 3, 2010

Reporting more than Java code in Sonar (Part I)

Of course, anyone that has done static analysis on their project in the past has found certain bad practices that are out of their tools reach to spot. Some examples are:
  • Front-end code, like CSS and HTML
  • Configuration files, like a Maven pom.xml or a Spring applicationContext.xml
  • Localization files
While not supported out-of-the-box, Sonar makes reaching and reporting on these areas of your project much easier. Basically, here are the steps to getting Sonar to report on additional languages:
  1. Make Sonar aware of your new language.
  2. Attach quality rules for that language to an existing Quality Profile.
  3. Create/use a tool that will detect the bad practices you are looking for.
  4. Hook that tool together with Sonar either via the tool's Maven build plug-in or by invoking it programmatically within the Sonar plug-in framework.

Make Sonar Aware of Your New Language

First, it is easy to make Sonar aware of an additional language. In our case, we wanted to address configuration found in various xml application files for Spring, Maven, JSF, and the like. The following seven files are required:
  • XmlPlugin.java - What you are making in the most basic sense is a Sonar plug-in. For each sonar plug-in, there is a main plugin file like this one. I won't go into this here. Instead, I'd recommend you look at the Sonar Plug-in Documentation.
  • Xml.java - This file is what the rest of Sonar will refer to when it asks what language a file is, etc. Ours looks like this:

    public class Xml extends AbstractLanguage {
    protected static final String[] EXTENSIONS = { "xml", "xhtml" };
    public static final Xml INSTANCE = new Xml();

    public Xml() {
    super("xml", "XML"); // 'key' and 'name', or, in other words, internal and external names
    }

    public String[] getFileSuffixes() {
    return EXTENSIONS.clone();
    }

    // ... some other helper methods
    }


  • XmlFile.java and XmlPackage.java - These two classes represent the xml file and directory metadata. They aren't a way to get at the contents of the file, but instead its name, location, etc. The both extend org.sonar.api.resources.Resource.

  • XmlSourceImporter.java - This file is in charge of looking up all the "xml" files in the project and notifying Sonar about them. Because we also want to include the main pom.xml file, which is outside of the source directory, this class is a little more complicated. However, you can easily pull out the basics from it:

    public class XmlSourceImporter extends AbstractSourceImporter {
    public XmlSourceImporter() {
    super(Xml.INSTANCE);
    }

    public void analyse(Project project, SensorContext context) {
    try {
    doAnalyse(project, context);
    } catch ( IOException e ) {
    throw new SonarException("Parsing source files ended poorly", e);
    }
    }

    protected XmlFile createResource(File file, List sourceDirs, boolean unitTest) {
    ... create an XmlFile object ...
    }

    /* Depending on needs, one might ask the kind of project that it is or something. In this case, though, we want to execute this importer on every project, since we are anticipating the existence of xml files in every project. */
    public boolean shouldExecuteOnProject(Project project) {
    return isEnabled(project);
    }

    protected void doAnalyse(Project project, SensorContext context) throws IOException {
    ProjectFileSystem fileSystem = project.getFileSystem();
    File root = fileSystem.getBasedir();
    List sourceDirs = fileSystem.getSourceDirs();
    sourceDirs.add(root);
    List xmlFiles = ...magical method call that looks recursively in the sourceDirs for xml files...
    for ( File xmlFile : xmlFiles ) {
    // if it is in a derived directory, like the build directory, we don't want it
    if ( DefaultProjectFileSystem.getRelativePath(xmlFile, fileSystem.getBuildDir()) == null ) {
    sourceFiles.add(xmlFile);
    }
    }

    }
    }


Attach Quality Rules

Now, what is not so obvious is how to get Sonar to report on all languages, Java and otherwise, on one dashboard. In fact, in an email that I sent to the the Sonar developers, they apparently don't officially support it, yet. But, we found a way! And it works great for us.

The key lies in creating a rules repository that contributes to the existing Java quality profile that you already have set up. A rules repository is just another Sonar extension, one that represents an xml rule configuration file and marshals the contents of that file into a RulesProfile object.

Here is an example of what the getProvidedProfiles might look like:


public List getProvidedProfiles() {
RulesProfile profile = new RulesProfile("My Profile", Java.KEY);
profile.setDefaultProfile(true);
profile.setProvided(true);

List rules = getInitialReferential();
List activeRules = new ArrayList();
for (Rule rule : rules) {
activeRules.add(new ActiveRule(profile, rule, rule.getPriority()));
}
profile.setActiveRules(activeRules);

return Arrays.asList(profile);
}


In the end, it's a little bit of a hack.

The only other thing that is necessary from Sonar's perspective is a way to read the violations file that your analysis tool creates. We modeled ours after PMD's violations file. Here, you can extend AbstractViolationsXmlParser and follow the pattern in the PMD Sonar Plugin.


Enhanced by Zemanta