What I Learned at JavaOne, Part III - Language Design


This is part of a series explaining what I learned at JavaOne.

Language design was a big topic at JavaOne, and I found myself in several sessions that extolled the virtues of varying languages now running on the JVM as well as new features being added to Java itself.


Embedding Scripting Language into Java, by Paul Thwaite


I'm not sure exactly what I was expecting from this talk, but it was more basic than I had imagined.  It was still good information, but I probably could have gleaned it simply from reading up on JSR 223.

The talk was basically about the introduction of a standard API for evaluating java.next scripts from Java code.  You have probably evaluated Groovy or Javascript in the past from within a Java program using the Groovy script engine or Rhino; this JSR standardizes those interactions:


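import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class HelloScript {
    public static void main(String[] args) throws ScriptException {
        ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
        // Returns null unless a JSR 223 engine named "jruby" is on the classpath.
        ScriptEngine scriptEngine = scriptEngineManager.getEngineByName("jruby");
        scriptEngine.eval("puts 'Hello, World!'");
    }
}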


There are some gotchas around configuring the scripting engine as configuration is not part of JSR 223. This means that if you want to tweak certain aspects of, say, the Groovy Scripting Engine, you will need to invoke it with the Groovy API instead of the javax.script one.

That was basically the extent of the session.

But, what about debugging? Stack traces? Script injection (since you can supply a context object to the script)? Performance? Compilation to binaries? I would have liked to see these covered.

Also, there is the really interesting feature of type inference by method names via the Invocable interface. You can read the Java Scripting Programming Guide for more information. It seems like there are some interesting design patterns that could be derived from that (duck typing?). It would have been nice to delve into that a bit.
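
To make the Invocable idea a little more concrete, here is a minimal sketch. It assumes a JavaScript engine (Rhino or Nashorn, depending on the JDK) is registered under the name "javascript"; newer JDKs ship without one, in which case the sketch just returns an empty result. The class name InvocableSketch and the script text are made up for the example. The engine matches the script's top-level function to the interface purely by method name:

```java
import java.util.Optional;
import java.util.function.Supplier;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class InvocableSketch {
    // Returns the script's answer, or empty if no suitable engine is installed.
    static Optional<String> callScript() {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
        if (!(engine instanceof Invocable)) {
            return Optional.empty(); // assumption: not every JDK bundles a JavaScript engine
        }
        try {
            // The global function name matches Supplier.get().
            engine.eval("function get() { return 'hello from script'; }");
        } catch (ScriptException e) {
            throw new IllegalStateException("script failed to evaluate", e);
        }
        Supplier<?> adapter = ((Invocable) engine).getInterface(Supplier.class);
        if (adapter == null) {
            return Optional.empty(); // the engine could not match the interface
        }
        return Optional.of(String.valueOf(adapter.get()));
    }

    public static void main(String[] args) {
        System.out.println(callScript().orElse("no JavaScript engine available"));
    }
}
```

Nothing in the script declares that it implements Supplier; the match is by name only, which is where the duck-typing flavor comes from.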

Oh, well. Thwaite did a good job making it clear how to get started in embedding scripts; I sort of wish we had reached a higher level of discussion, though.

 

Nashorn:  Javascript on the JVM


Nashorn is basically a replacement for Rhino, the existing standard for Javascript on the JVM.  This session was an exciting one for me because we use Rhino for a significant piece of our architecture, and I would be very happy to replace it with something faster, more debuggable, etc.

I suppose the session could be summed up in four words:  It's Rhino, but better.

This session was interesting because it was equipped with in-person testimonials from Twitter and NetBeans folks.  The fellow from Twitter said they were using Nashorn for doing server-side template rendering with mustache.js.  The individual from NetBeans said they were using Nashorn for tool support--code completion and the like.

Nashorn, as of the talk, was 99.99% compliant with ECMA test262.  They anticipated 100% compliance by the end of the week.  We don't do anything particularly complex with our javascript, so this doesn't matter as much to me, but I'm always happy to see organizations give high priority to standards!

Unfortunately, it is being written for JDK 8.  They have plans to backport it to 7, but not to 6 (though the NetBeans guy said that they did some hacking and got it to work with 6).  I would suppose that this is mostly because Nashorn is probably leaning heavily on InvokeDynamic to get high performance and it would be a bear to take all of that out to have it be JDK 6-compatible.  As happens with a number of big companies, we just got onto JDK 6, so it might be a while for us to make the change.

 

Meet the Java Language Team, by Brian Goetz, Joe D'Arcy, Michael Trudeau



While it might sound a bit romanticized, it was great to "sit at the feet" of some real language experts.  Language design is very intriguing for me, and these guys get to do it every day.

The panel didn't have any planned discussion; they just took questions.  Here are a few of the interesting ones:

What is happening with annotations? There are some new annotation APIs, like caching.  Annotations on type parameters are being considered:  List<@NonNull String>, for example.  There is the possibility of specifying a method literal in an annotation.  There might be a change to the rules for annotation inheritance on methods.  The restriction on repeating annotations on a given element is being removed.

Will Java add Union or Intersection types? Probably not right now--it would make the language more complex for very little gain.  JDK 7 does add union typing for try-catch blocks, and it is more a case-by-case approach than adding union and/or intersection types wholesale.
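
For reference, the JDK 7 multi-catch form mentioned above looks roughly like this. The class and method names are invented for the sketch:

```java
final class MultiCatchSketch {
    // Parses a decimal string, treating null and malformed input the same way.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException | NullPointerException e) {
            // One handler covers both alternatives; e is implicitly final here.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));
        System.out.println(parseOrDefault("oops", -1));
        System.out.println(parseOrDefault(null, -1));
    }
}
```

The union only exists inside the catch clause; you cannot declare a field or parameter of type A | B, which is the case-by-case flavor the panel described.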

What about Design by Contract? Apparently this was the number one RFE for a while, but has since dropped off the radar for these guys.  C# has added support for it via Spec#, and you can use JML if you like.  Apparently, there are some very tricky problems that I don't completely understand regarding object invariance in a multi-threaded environment, but what if I just want to say this method parameter needs to be non-null or have a length greater than 10?
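
Without language support, the simple case in that last question usually ends up as a hand-written guard. A minimal sketch, with a hypothetical helper name, using only java.util.Objects and a plain exception:

```java
import java.util.Objects;

final class PreconditionSketch {
    // Rejects null and anything whose length is not greater than minLength.
    static String requireLongerThan(String value, int minLength) {
        Objects.requireNonNull(value, "value must not be null");
        if (value.length() <= minLength) {
            throw new IllegalArgumentException(
                "value must be longer than " + minLength + " characters");
        }
        return value;
    }
}
```

This is checked at runtime on every call, which is exactly the part a contract-aware compiler could in principle take over.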

Type inference improvements? Yes.

Could we skirt backwards compatibility for increased velocity, e.g. fork Java? Brian Goetz said, "There is a lot of life in Java without breaking backwards compatibility."  While I would agree with him there, I wonder if that is a bit of marginal thinking on his part.  Anyway, other groups have already done what this question proposed, so I'm not sure what this individual wants.

Support for immutability? Yes, though it is tricky to understand what engineers would really use.  For example, there are a number of cases where during object composition, you don't want it to be immutable, but that there is some point at which you would like to "make" the object immutable at runtime.  What is the right way to support that?
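
One way people approximate "mutable while composing, immutable afterwards" today is a builder. This is only a sketch of that workaround, not anything the panel proposed; Playlist and its Builder are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example type: mutable while being built, immutable afterwards.
final class Playlist {
    private final List<String> tracks;

    private Playlist(List<String> tracks) {
        // Copy first so later changes to the builder cannot leak into this object.
        this.tracks = Collections.unmodifiableList(new ArrayList<>(tracks));
    }

    List<String> tracks() {
        return tracks;
    }

    static final class Builder {
        private final List<String> tracks = new ArrayList<>();

        Builder add(String track) {
            tracks.add(track);
            return this;
        }

        // The "make it immutable now" step.
        Playlist build() {
            return new Playlist(tracks);
        }
    }
}
```

The cost is a second class and a copy at build time, which hints at why first-class support is harder to design than it sounds.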

Multiple Inheritance? No.  :)

And then there were a couple of comments from the panel that weren't super-related to the questions, but were insightful nonetheless.  One was the idea that there are a number of problems that are introduced in Java applications due to the fact that constructors have full access to the language, and maybe they shouldn't.  Of course, I'm not sure how you could make a backwards compatible change like that.  Another was a brief discussion of project Sumatra, which will allow Java applications to leverage GPUs and APUs for better performance.

 

Intro to Play Framework, by James Ward


This was pretty interesting for being a 101 talk.  The framework has what you would expect--REST support, server-side templates, etc.--which is great because it increases the prospect of using Scala at a broader level than Scala's niche of highly-concurrent applications.  Some nicer features are that it calls Google Closure and minifies Javascript and Coffeescript as part of the build process.  It also comes with its own web container; it leverages this to auto-refresh code in dev mode, give very detailed error information including displaying the code in context of the error in the browser, and make it possible to easily write functional, integration, and unit tests.

It looks intriguing, though I am cautious about one thing:  The controller methods need to be static.  Eww.  I'll bet that makes them harder to test in isolation, no? I suppose I'll have to try it out to see.

Scala Case Study, by Brian Tarbox


This session was much more interesting because of the software than the presenter's actual defense for converting his software from Java to Scala.

Consider the tale of two Java libraries:  The first is a very popular logging framework, the second a MIDI sound player API.  With the first, you can do things like log.error("Something terrifyingly bad happened").  With the second, you can do things like player.play("E D C D E E E") (Mary Had a Little Lamb, as we all know).  The first is typically either the blessing that helps a developer find the needle-bug in the haystack-application or the cursing that is looking for hours at esoteric messages that previous engineers apparently thought would be helpful.  The second is the source of relaxing symphonies emanating from your JVM.

Now, what would you get if you put these two technologies together? You would get Log4JFugue, of course, or a library that puts your logs to music.

Wow! How cool is that?? Brian's hypothesis is that you can listen to the sound of your car and tell what is wrong, so why not your application? And, since you can code while listening to music, you can suddenly be more efficient, too.  He played a segment of his log files for us, and it was great.

Anyway, I know that most people felt the same way as I because nearly all the questions after the presentation centered around the library as opposed to his choice to port it to Scala.

Since he was trying to make a case for making the jump, though, I should probably say a little bit about that. If I weren't already a huge fan of Scala--the actor model, immutability, currying, tuples, traits, DSLs, and on and on--he would not have convinced me.  Most of his examples were "Look at how much more compact my code is!" which, while a good reason, is certainly only one of many and not the most important for me.  It is true that the fewer lines of code you write, the fewer places you can have an error, but there's usually an API or a compiler for that.

Show me how it scales better (actors), is more secure (immutability), is easier to design code with (BDD), is easier to enhance the language for greater expressiveness (DSLs), and these will be more convincing for me than being able to write code in fewer lines.  The language features and APIs in Scala are definitely persuasive, but when I talk to most engineers at bigger companies, they say, "but the junior developer can't read it!".  I believe that Scala is a good move for enterprise-level companies to make, but it can't be just because the engineers would like to be more terse with their code.

 

Who's More Functional?, by Andrey Breslau


Breslau is a funny guy, which made this talk a pretty enjoyable listen.

His point was basically, "Once Java has closures, all the languages are equally functionally expressive, so can't we all just get along?" This was ironic since he is the lead designer on yet another functional-oo-language-hybrid-language, Kotlin.

He has his reasons, of course.  One of them being that he feels Scala is very good but 1) too complicated to ever give good tool support and 2) too hard to follow with the implicit keyword in place.  While Breslau may be right on the first one (I am more optimistic), I unfortunately like the implicit keyword since it allows me to add methods to String, etc. instead of having a zillion StringUtils classes all over my code.

What I think that Kotlin may need is a feature to differentiate it from the rest.  Maybe they can get traction with the "simplified Scala" approach, but honestly that's what Java seems like to me.  Once Java has closures, what does working with Kotlin buy you? (Actually, before I give my review, I should probably try it out, shouldn't I? :))

Kotlin aside, Breslau made two statements that I would have probably liked him to clarify.  First, he said that no alleged functional language was purely functional because they all had side effects.  This seems to be a pretty debatable topic as several folks see Haskell as a pure functional language even if it does do I/O (as I would suppose all languages must).  I suppose the same could be said for Lisp.  Are all four he was comparing (Java, Kotlin, Scala, Groovy) capable of programming with side-effects? Yes.  However, it might have been a bit of a controversial way to explicate his thinking.

Second, he said that this:

collection.filter().sort()

was backwards from this:

sort(filter(collection))

which I disagree with.  I think that it is probably true that the first one is more readable, but the latter is definitely a composition of functions (think f o g and g o f).  He might have been speaking in pragmatic terms though, not functional ones.
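
To see the two shapes side by side, here is a small sketch using Java 8 streams and java.util.function.Function (which postdates the talk, so treat it as an illustration rather than what Breslau showed). All names are invented for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CompositionSketch {
    static final Function<List<Integer>, List<Integer>> FILTER_EVEN =
        xs -> xs.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());
    static final Function<List<Integer>, List<Integer>> SORT =
        xs -> xs.stream().sorted().collect(Collectors.toList());

    // Explicit composition: "sort after filter", i.e. sort o filter.
    static final Function<List<Integer>, List<Integer>> SORT_AFTER_FILTER =
        SORT.compose(FILTER_EVEN);

    // Nested style: sort(filter(collection)), read inside-out.
    static List<Integer> nested(List<Integer> xs) {
        return SORT.apply(FILTER_EVEN.apply(xs));
    }

    // Chained style: collection.filter().sort(), read left to right.
    static List<Integer> chained(List<Integer> xs) {
        return xs.stream().filter(x -> x % 2 == 0).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 4, 3, 2, 1, 6);
        System.out.println(nested(input));
        System.out.println(chained(input));
        System.out.println(SORT_AFTER_FILTER.apply(input));
    }
}
```

All three apply the filter first and the sort second; only the reading order differs.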

Of course, I think there is room for both.  I think that Breslau gave a good example in the beginning when he showed how the Fibonacci numbers could be evaluated more quickly iteratively than recursively, where recursion is the standard functional approach.
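
I don't have Breslau's exact code, but the comparison was along these lines. A sketch with invented names: the naive recursive version recomputes the same subproblems over and over, while the loop does one addition per step.

```java
final class FibonacciSketch {
    // Naive recursion: exponential number of calls as n grows.
    static long recursive(int n) {
        return n < 2 ? n : recursive(n - 1) + recursive(n - 2);
    }

    // Iteration: n additions, constant space.
    static long iterative(int n) {
        long a = 0;
        long b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(recursive(30));
        System.out.println(iterative(30));
    }
}
```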

Anywho, I'll have to give Kotlin a shot some time.  I would still say that Scala is way more functional than Java.

 

Annotation Processors, by Ian Robertson


Even though they've been around since Java 5, I somehow had never heard of the Annotation Processor API.  With annotation processors you can:
  • Add compile time warnings and errors of your own
  • Generate code and other resources
While I guess auto-generation of DTOs or the like might be cool, I see a lot of potential in adding your own compiler errors.

Currently, when we want to enforce a particularly important development need, like using the right class name when naming a logger or not having a Spring-managed bean be final, the best we have are Sonar violations.  While Sonar violations are great from a technical debt standpoint, they aren't a good fit for alerting the developer to when he is doing something that is just not going to work.

Enter Annotation Processors.  The Annotation Processor API basically gives you access to the AST for Java files that your configuration indicates you would like to process.  There are a bunch of rules regarding using Elements vs. Mirrors, which I didn't get a complete grasp of during the discussion, but the essence is that you use the API to analyze each Java file during the compilation phase to see whether it complies.

Robertson gave the example of building two Annotation Processors.  One checks to make sure a POJO annotated with the JPA @Entity annotation has a default constructor.  JPA will not correctly deserialize to this POJO without it.  This was about 30 lines of code, which will save the developer a bunch of time since missing constructors will now be caught at compile time.  The second one checked to see if POJOs using the @OneToMany annotation have the correctly named property referenced in the annotation.  This one was much more complicated being upwards of 200 lines or so.
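
I don't have Robertson's source, so what follows is my own rough sketch of what the first processor might look like, using only the standard javax.annotation.processing and javax.lang.model APIs. The class name is hypothetical, and the annotation is named by string so the sketch compiles without a JPA jar. To actually register it with javac, the class would need to be public and in its own file.

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.ElementFilter;
import javax.tools.Diagnostic;

// Sketch only: make this class public before handing it to javac.
@SupportedAnnotationTypes("javax.persistence.Entity")
class EntityConstructorProcessor extends AbstractProcessor {

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                if (element.getKind() == ElementKind.CLASS
                        && !hasNoArgConstructor((TypeElement) element)) {
                    // Reported against the element, so the error points at the class.
                    processingEnv.getMessager().printMessage(
                        Diagnostic.Kind.ERROR,
                        "@Entity classes need a no-argument constructor",
                        element);
                }
            }
        }
        return false; // leave the annotation available to other processors
    }

    private static boolean hasNoArgConstructor(TypeElement type) {
        for (ExecutableElement ctor : ElementFilter.constructorsIn(type.getEnclosedElements())) {
            if (ctor.getParameters().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }
}
```

This version does not check the constructor's visibility, which a real rule would probably want to do.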

It sounds very verbose, and it makes me wonder what improvements might be made on it through a DSL and/or case classes since they are a bit more suited to compiler-type logic.

Anyway, I'm very excited to try my hand at it.


What I Learned at JavaOne - Part II, Performance Tuning


This post is the second in a series.

While I have worked for about three years professionally in the realm of security and feel that I now know a thing or two, I know very little in-depth about performance tuning, so the classes I attended were typically chock-full of new information from my perspective.

JVM Bytecode for Dummies, by Charles Nutter

I was really excited to go to a session by Charles Nutter, one of the lead developers on JRuby, especially one that talked about something so low-level as JVM bytecode, which he certainly knows a lot about by now.

First, there are some n00b things that I somehow hadn't gleaned over the last 13 years of coding in Java.  Don't judge me.

Opcodes.  There are over 200 opcodes currently supported by the JVM.  Wow! Years ago, I was tasked with writing some FindBugs rules where I had to learn about ldc, ifeq, ifne, and several others, but I really had no idea that there were so many.  Apparently, they are one byte long which means that there are about 50 possible opcodes left.

CLR.  CLR, the C# virtual machine, does not interpret C# code at runtime, which has the interesting consequence that it can't optimize out things like null checks by profiling the code at runtime.

Finally.  There are only two opcodes that have ever been deprecated:  jsr and ret.  Due to the deprecation of these opcodes, the contents of finally blocks are replicated at every possible exit point.  This could have footprint and optimization implications should your finally blocks be big (hopefully they are not; I don't recall ever having a finally block more than a half-dozen cleanly-spaced lines).

Double vs. AtomicLong.  Because the JVM stack elements are 32-bits each, it means that doubles take up two adjacent stack elements.  This means that double d = 32d; (for example) is not atomic.  So, use AtomicLong instead if you are in a highly-concurrent app.
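
AtomicLong holds a long, not a double, so one way to follow that advice is to store the double's bit pattern. This is a sketch under that assumption, with an invented class name; declaring the field volatile is the other common fix, since the language spec only allows non-volatile long and double writes to be split.

```java
import java.util.concurrent.atomic.AtomicLong;

final class AtomicDoubleSketch {
    // The double is stored as its 64-bit pattern, so reads and writes are atomic.
    private final AtomicLong bits = new AtomicLong(Double.doubleToLongBits(0.0));

    double get() {
        return Double.longBitsToDouble(bits.get());
    }

    void set(double value) {
        bits.set(Double.doubleToLongBits(value));
    }

    // Retry loop: recompute and try again if another thread got in first.
    double addAndGet(double delta) {
        while (true) {
            long current = bits.get();
            double updated = Double.longBitsToDouble(current) + delta;
            if (bits.compareAndSet(current, Double.doubleToLongBits(updated))) {
                return updated;
            }
        }
    }
}
```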

So, Nutter mostly talked about three things.  The first was a very quick overview of the most common opcodes and how a stack-based interpreter works.  This was an interesting review, though I was already familiar with the majority of them due to my FindBugs efforts (note:  I'm sure I don't have the deep understanding of them that language designers do as most of my work was running javap -c and representing the generated bytecode with the FindBugs API).  

The second was interlaced with the opcode overview, which was his demonstration of the bytecode DSL BiteScript.  This looked really interesting as kind of a bytecode++.  It looked a lot like bytecode (format and commands), but there were several enhancements like commands such as iftrue and iffalse instead of the ever-confusing ifeq and ifne.  It also allowed for macro definitions to make certain things like static invocations less wordy.  He also briefly mentioned ASM, Groovy and Scala support, and JiteScript.  I haven't had occasion to try any of these yet, but they definitely appear to be worth a shot.

The third was his rationale for writing bytecode in the first place.  Writing bytecode, first of all, can be a very good performance enhancement for heavily reflective code since it makes it possible to have reflection-less code.  Also, bytecoded data objects, like what Hibernate does, would be more efficient/cleaner, etc. at the byte code level.  Another example is the ability to support certain language features that are supported on the JVM but not yet available in Java (shameless plug for JRuby). 

In the end, I was happy, but somewhat bummed because I thought there would be more about performance tuning.  Though, I was elated to find out that there was a companion session the following day that was going to address it in more detail.  Wohoo!

JVM JIT For Dummies, by Charles Nutter

Nutter, again, offered a wealth of deep understanding about how the JVM works.  Beforehand, I was familiar with the basics of profiling and optimizations, but almost everything was brand-new for me.

There are a number of things that javac will do to unpackage syntactic sugar and other high-level expressiveness where it can from a static analysis standpoint.  A common case is loop unrolling where if it is statically apparent that the loop will always run x times, and the source of the index is resolvable statically (like int i = 0, etc.), then the compiler can just remove the for loop and replace it with inline statements.

The JIT does many of the same things and more via profiling.  Once a given method has been run over 10000 times, then the JIT considers that method as hot.  At that point, it will start looking at some of the common optimizations that it can do based on which branches in the code are being executed.  For example, if after 10000 times a certain if condition has never been hit, then the JVM can optimize it away, inserting guard logic to allow an optimization rollback just in case.

I won't go into a lot of detail here (though it was very intriguing), but here are some of the things that the JIT looks for:
  • Loop Unrolling:  Observing that a loop always completes in a known number of times and unrolling it into inline statements accordingly
  • Lock Eliding:  Observing locks where synchronization isn't adding benefit and removing them.
  • Escape Analysis:  Observing that an object never escapes the code that creates it, so the allocation can be removed and its fields treated like local variables
  • Call Site Analysis:  Observing monomorphic or bimorphic call sites and performing an according optimization.  Briefly, monomorphic means that A calls B and A will only ever call B.  Bimorphic means that A could call B or C, e.g. two implementations of the same interface.
The JIT will ignore code that is too big for it to analyse.  Boo to long methods! The really, really interesting thing is that the JIT can be encouraged to optimize more complicated stuff just by breaking the big method up into smaller methods.  Nutter told of a case with the JRuby parser where performance was starting to tank.  They learned that a main parsing method had gotten too big, so the JIT had stopped optimizing it.  Simply by breaking the parsing method into smaller methods, they got a big performance boost.
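
The call-site terms in the list above are easier to picture with code. In this sketch (all names invented), shape.area() inside totalArea is a single call site, and what matters to the JIT is how many concrete types actually reach it at runtime:

```java
final class CallSiteSketch {
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        @Override public double area() { return Math.PI * radius * radius; }
    }

    // If only Squares ever reach shape.area(), the call site is monomorphic;
    // if Squares and Circles both do, it is bimorphic.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }
}
```

Note that totalArea is also short, which fits the advice about keeping hot methods small.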

So, what is one to do if your performance is slow, and you'd like to see if JIT is able to optimize your code?

Monitoring the JVM

  • -XX:+PrintCompilation - This prints out methods as the JIT is optimizing them.  The output details what the JIT is doing with certain methods:
    • "made zombie" - This method is just about to be optimized out by the JIT
    • "made non entrant" - This method is now optimized out by the JIT
    • "uncommon trap" - Woah! I had already optimized this, but apparently someone just called it unexpectedly!
    • "!" - exception handling - Nutter didn't go into this, but the JIT apparently does some interesting optimization by finding the actual catch block up the call stack that is ultimately called when a given exception is thrown
    • "%" - on-stack replacement (OSR) - Sometimes the JIT will come up with an optimization for an entire method that it will then compile and replace at runtime.  I'm not sure when this is applicable, but maybe for things like chip-arch-specific implementations?
There are several options that are so secret that they need two JVM options.  The first is -XX:+UnlockDiagnosticVMOptions.  The second is one of the following:
  • -XX:+PrintInlining - More detail about specific inlining that the JIT is doing.  e.g. "intrinsic" means the JIT knows something special about this method and is going to replace it with best-known native code to support it.  Examples are Object#hashCode, String#equals, Atomics, and Math operations.
  • -XX:+LogCompilation - A lot more information, but specifically information about whether a method is too big for the JIT to optimize.  Use something like http://github.com/headius/logc to make the output of this option more readable.
  • -XX:MaxInlineSize, -XX:InlineSmallCode, -XX:FreqInlineSize, -XX:MaxInlineLevel, -XX:MaxRecursiveInlineLevel - Use these to tweak the default levels, with the caveat that the defaults were set by the JVM guys after much research on your specific chip architecture.
  • -XX:+PrintOptoAssembly - Lots of detail, including the assembly to which the code is getting compiled.  (Wow!)  Nutter demonstrated with this tool how much assembly goes into calling a single method.  The unoptimized assembly was nine instructions vs. the one instruction of just inlining the contents of the (one-line) method.  Here, Nutter also talked about two important outputs:
    • CALL - This means that the JIT cannot find a way to optimize the method
    • LOCK - This means that the JIT is performing a lock in this part of the code.  This was an interesting one because Nutter explained that at one point, he was seeing an enormous amount of LOCK statements during object construction.  It turns out that there was a private static reference that was being accessed in all JRuby constructors, which was causing a volatile write on every construction.  Removing the line of code gave them 4x performance gains.  Wow!
There was sort of an ominous note at the end that said that any kind of agent can seriously affect the JIT's ability to optimize.  Obviously this would include debuggers, but it also includes profiling products like Dynatrace or AppDynamics.  Something to look into.
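
If you want something small to point these flags at, a toy program like the following should do. It is my own sketch, not one of Nutter's demos: a tiny method called far more than 10000 times from a loop, so both the method and its caller ought to show up in the compilation log.

```java
final class HotLoopSketch {
    // Small enough to be an inlining candidate once the caller becomes hot.
    static int addOne(int x) {
        return x + 1;
    }

    static int run(int iterations) {
        int value = 0;
        for (int i = 0; i < iterations; i++) {
            value = addOne(value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Try: java -XX:+PrintCompilation HotLoopSketch
        // or:  java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining HotLoopSketch
        System.out.println(run(1_000_000));
    }
}
```

The exact log format varies by JVM version, so expect to spend a minute finding addOne and run in the noise.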

Do your GC logs speak to you?, by Richard Warburton

This was a great overview of the Java memory model as well as several tips on how to evaluate how your Garbage Collection is performing.  Richard had so much information to disperse, and I'm not completely sure that I understood all of it, but here goes:

First, a few JVM parameters:
  • -Xloggc:{logfile} - -verbose:gc doesn't come with timestamps, so use this parameter instead to get more useful information, and in a different file altogether
  • -XX:+PrintGCDetails - Lots more detail
  • -XX:+PrintSafepointStatistics - A safepoint is a line in the sand that the gc draws where all threads stop and wait for the gc to say it's okay to go again.  You want as few of these as possible
And now, an overview of what the heap looks like.  There are basically four parts:  Eden space, S0 (survivor space), S1 (survivor space), and Tenured space.  The first three can be together called "young memory" and the last one can be called "old memory".

When a GC runs, objects age.  Eden is completely evacuated on every GC (see the metaphor?) and its surviving objects are promoted to one of the two survivor spaces, whichever one is inactive.  The active survivor space is completely evacuated to Tenured space, and the inactive survivor space is changed to be the active one.  All evacuated spaces are empty after each GC, and once an object is in Tenured, a Full GC is required to get it out.

Because it requires a full GC to recover tenured memory, it is best to make sure that not too much is going into tenured.  A full GC will run once tenured is about 69% full, so it is good to try and keep it under that number.  Other indicators are a spiky CPU time graph, an average GC pause > 5% or a full GC-to-GC ratio of >30%.  These numbers can be garnered from the VisualVM product on java.net.

To tune the VM, then, it is typically a matter of making sure that Eden is big enough, the Eden-to-Survivor space ratio is right (to ensure the survivor spaces are big enough), and the max tenuring threshold is the right value (usually sits around 4).
  • -XX:NewRatio=x - The size of the Young memory vs. Old Memory.  Young memory will be 1/x the size of Old memory.  Richard recommended 1, though it probably has a lot to do with what you observe in the data.
  • -XX:SurvivorRatio=x - The size of the survivor spaces relative to eden space.  Each survivor space will be 1/(x+2) of the memory allocated in Young memory.  Richard recommended 1 here as well with the same caveat.
  • -XX:MaxTenuringThreshold=x - The number of collections that an object can survive before it is automatically promoted to tenured space.  You can lower this to make GCs less frequent since the JVM will be able to promote more memory to Tenured more quickly.  Raise it to keep things in Young memory for longer.  In Richard's case study, he set this to 5.
Richard's case study showed a dramatic improvement in GC pause times, GC ratios, etc.  I definitely want to try these out!
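
The ratio flags are easier to reason about with numbers plugged in. This sketch just applies the two formulas above (Young is 1/x of Old, each survivor space is 1/(x+2) of Young); the class name is made up, and a real JVM rounds and aligns these sizes, so treat the results as approximations.

```java
final class HeapLayoutSketch {
    // Young = Old / newRatio, and Young + Old = heap, so Young = heap / (newRatio + 1).
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // Each survivor space is 1/(survivorRatio + 2) of Young.
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Eden is whatever is left of Young after the two survivor spaces.
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1200;
        long youngMb = youngBytes(heapMb, 1);
        System.out.println("young=" + youngMb
            + " survivor=" + survivorBytes(youngMb, 1)
            + " eden=" + edenBytes(youngMb, 1));
    }
}
```

With a 1200 MB heap and both ratios set to 1 as Richard suggested, that works out to 600 MB of Young memory split evenly into Eden and two survivor spaces of 200 MB each.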

There were a couple of interesting notes at the end about GCing in general:
  • concurrent mode failure means that the Young memory is filling up too fast.  The concurrently running tenured collection (full GC) failed to complete before tenured was completely full.
  • mark/sweep doesn't do memory compaction, meaning that the data can get fragmented
  • slab allocators are a way to allocate a distinct amount of memory on the Java heap.  One strategy against memory fragmentation if your data set is very large is to create several slab allocators of varying sizes at once so they are adjacent in memory and then draw from those memory allocations.  Richard warned that these will cause you to develop many of your own GC semantics (a later talk mentioned that this is what memcached does)

Big RAM, by Neil Ferguson

This was basically a case study on alternatives to using out-of-the-box JVM garbage collection for huge applications (250GB-1TB RAM).

The first approach was to use an alternative Garbage Collector like Azul Zing.  For Ferguson's benchmark, Azul performed better than G1, the new garbage collector from Java.

The second approach was to use the Java APIs that allow you to allocate memory off-heap:  ByteBuffer.allocateDirect and SlabAllocators.
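
For a feel of the off-heap API, here is a minimal sketch built on ByteBuffer.allocateDirect. It is not Ferguson's code; OffHeapSlab is an invented name, and the "slab" here is just one fixed block of long-sized slots, nowhere near a real allocator.

```java
import java.nio.ByteBuffer;

final class OffHeapSlab {
    private final ByteBuffer slab;

    OffHeapSlab(int slots) {
        // Direct buffers live outside the garbage-collected heap.
        slab = ByteBuffer.allocateDirect(slots * Long.BYTES);
    }

    void put(int slot, long value) {
        slab.putLong(slot * Long.BYTES, value);
    }

    long get(int slot) {
        return slab.getLong(slot * Long.BYTES);
    }

    boolean isOffHeap() {
        return slab.isDirect();
    }
}
```

Everything stored this way has to be serialized to bytes by hand, which is a big part of the trade-off with these approaches.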

The third approach was to pick from a list of off-heap vendors.  Ferguson talked about Cassandra, Terracotta, BigCache, and Apache DirectMemory.

This was really interesting, but I'll be honest that this was the last session of the day, and I wasn't paying really close attention.  I'll have to re-watch it.  :)

Summary

All in all, these were really motivating, largely because it is a new space that I know (knew?) nearly zero about.  I look forward to trying a few of these back at work when I get the chance.


What I learned at Java One, Part I - Security


This week, I had the opportunity to attend my first JavaOne conference.  In a word:  Firehose.  Over the next few days, I'm going to try and summarize by topic the incredible amount of information I was given.  This is the first in the series.

There were a lot more talks on security than I expected, and I went to several of them (a couple of them to my dismay).

Cloud Identity Management, by Anil Saldhana

Possibly the most interesting thing that I got from Anil's talk was his by-the-way-reference to PicketLink, a JBoss Identity Federation server.  I took a look after the session, and found what looks like a very promising application.  We currently license a commercial, closed-source product, and PicketLink appears to have a great deal of what we would need in order to switch:
  • Management Console for configuring SPs, IDPs, Certificates, Attributes, etc.
  • Support for SAML 2.0, 1.1, and WS-Trust
  • Support for Username/Password auth (via PicketBox APIs)
Plus, the added bonus of future support for OAuth, which we are not using yet, but plan to in the future.

It really is tricky, though, to find one product that does everything I'd like it to.  For example, I like the approach of an Authentication Reverse Proxy, kind of like what Oracle WAM does where it authenticates the user and then passes along HTTP headers that specify the user's identity down to the proxied application.  I'd also like it to have some support for Mobile Devices.  Maybe a REST service that takes the device's UDID, username, and password, and then UDID thereafter.  Or make it easy for me to add a service like that myself.

Anyway, I digress.  Anil mostly talked about IdM 101 type stuff, so I didn't learn a whole bunch in that regard, but the product he referred to really got me thinking.

Protecting Machine-Only Processed Data, by Noam Camiel

I didn't realize that I was stepping into a vendor session until I sat down.  I didn't want to be impolite, so I figured I would stick around to hear what he had to say.

Basically, he has these innovative black boxes that have no external dependencies and contain within them secure, encrypted information, like passwords, certs, etc.  Any operation where access to those values is needed, like password comparison, is done in the box as well.  Noam's concept is that the secure data goes in, but it never comes out (other than for replication in aggregate).

Cool idea.  Worth checking into more.

IdM Expert Panel, Ludovic Poitou, Matt Hardin, Petr Jakobi, and Shawn McKinney

I was really excited about this one, but it turned out to be a facade for a bunch of vendors to brag about their products.  Ugh.  Then afterwards, they completely shut down one of the attendees (Anil from JBoss, no less) basically saying that the business requirement he gave them was not their problem.  McKinney seemed to really want to control the narrative.  Double ugh.  Sounds like the panel basically knows what they want to build already, and they are going to do it...

Anyway, each expert had their own product that they were pulling for, and a couple sounded compelling.  I downloaded OpenAM (Poitou) and read through a tutorial; I was excited to hear that it was a continuation of the OpenSSO product.  I also perused the Fortress site (McKinney) with reckless reserve.  The identity space is definitely one where open source is far behind the commercial folks; we'll see if we can catch up.

Securing Apps with HTTP Headers, by Frank Kim

I was excited to hear from this guy as I had taken a SANS certification class from him in the past, and I knew that he was a good teacher.  I was not disappointed.  I thought I knew something about securing web applications, but I was wrong.  :)

Kim talked about three kinds of attacks that engineers can defend against simply by using certain HTTP headers in the server response.

XSS

XSS is when a hacker finds a way to execute arbitrary JavaScript on your website.  By executing arbitrary JavaScript, the hacker can steal session cookies, initiate Cross-Site Request Forgery, and do a host of other nasty things.  Kim mentioned three headers here to help prevent XSS.

1.  HttpOnly flag

This is one that we already use in our company for sensitive cookies like session cookies.  HttpOnly is a flag that you place in your Set-Cookie header whenever you are sending a cookie down from the server.  This flag makes it so that JavaScript cannot read the cookie, which prevents an XSS attack from being able to steal it.

2.  X-XSS-Protection

This header is one that tells the browser to detect reflective XSS attacks.  (I just fixed one of these the other day!) These are attacks where a given HTTP parameter's contents are written to the page without any intermediate validation or encoding.  It turns out that the latest browser versions have this turned on by default (woohoo!), so you should already be benefiting from it.

Set it to "0" to turn it off (not recommended).  "1" means to just not render the bad part of the page if it detects reflective XSS.  "1;mode=block" means to render none of the page if it detects reflective XSS.

3.  Content Security Policy, X-Content-Security-Policy, X-Webkit-CSP

This is cool.  This header allows you to specify hefty restrictions on how the browser will process JavaScript and stylesheets.  At its most secure, it will not run any inline scripts or apply any inline styles on the page! There are all kinds of directives that allow you to tweak it to your needs, including where resources like scripts, stylesheets, images, and fonts can come from.

Because this one can have such a dramatic effect on a large website with millions of lines of code invested, there is also the X-Content-Security-Policy-Report-Only header, which does the same thing, except it only reports the violations to a specified uri instead of not rendering that content.
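
Here is a minimal sketch of what those three defenses look like as raw response headers, built in plain Java so it runs without a servlet container.  The class name, the cookie name and value, and the "default-src 'self'" policy are all assumptions for illustration; the sketch also uses the unprefixed Content-Security-Policy header name rather than the X- prefixed variants mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds the XSS-related response headers as plain strings.
final class XssHeaders {

    // Set-Cookie value carrying the HttpOnly flag, so page scripts cannot read the cookie.
    static String httpOnlyCookie(String name, String value) {
        return name + "=" + value + "; Path=/; HttpOnly";
    }

    // Collects the three headers in the order they were discussed.
    static Map<String, String> build(String sessionId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Set-Cookie", httpOnlyCookie("JSESSIONID", sessionId));
        // Ask the browser to render nothing if it detects reflective XSS.
        headers.put("X-XSS-Protection", "1; mode=block");
        // Illustrative policy: only load resources from the site's own origin.
        headers.put("Content-Security-Policy", "default-src 'self'");
        return headers;
    }

    public static void main(String[] args) {
        build("abc123").forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

In a servlet application the same names and values would be handed to the response object instead of a map.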

Session Hijacking

Session Hijacking is when a hacker is able to sniff your connection, like on a public wifi, and steal your session cookie.  The defense here comes in two parts.

First, set the Secure flag on your session cookie.  Like HttpOnly, it is a flag on the Set-Cookie header.  The Secure flag means that the cookie will only be sent on HTTPS requests, never on plain HTTP requests.

Apparently, there is a program out there written by Moxie Marlinspike (who wouldn't want to have that name?) called sslstrip which can strip off the SSL in a request (not quite sure how that works, but I will be watching this video about sslstrip to get a better understanding).  So, a second defense is needed called Strict-Transport-Security.

This second header mandates that all traffic for a website, regardless of the protocol specified in the link, must be HTTPS.  The format is like this:

Strict-Transport-Security:  max-age=seconds[; includeSubdomains]

This header must be specified over a legitimately-certified HTTPS response and thereafter (until max-age expires) the browser will send all requests for that domain over HTTPS.
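
As a rough sketch of the two-part defense, the following plain Java builds the cookie and header values as strings.  The class and method names and the JSESSIONID cookie name are assumptions made for the example.  The directive is spelled includeSubDomains here, as in the HSTS specification; directive names are case-insensitive.

```java
// Sketch only: string builders for the two session-hijacking defenses.
final class TransportHeaders {

    // Session cookie that is only sent over HTTPS (Secure) and hidden from scripts (HttpOnly).
    static String secureSessionCookie(String sessionId) {
        return "JSESSIONID=" + sessionId + "; Path=/; Secure; HttpOnly";
    }

    // Strict-Transport-Security value in the max-age=seconds[; includeSubDomains] format.
    static String hsts(long maxAgeSeconds, boolean includeSubdomains) {
        if (maxAgeSeconds < 0) {
            throw new IllegalArgumentException("max-age must not be negative");
        }
        String value = "max-age=" + maxAgeSeconds;
        return includeSubdomains ? value + "; includeSubDomains" : value;
    }

    public static void main(String[] args) {
        System.out.println("Set-Cookie: " + secureSessionCookie("abc123"));
        // 31536000 seconds is one year.
        System.out.println("Strict-Transport-Security: " + hsts(31536000L, true));
    }
}
```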

Clickjacking

The concept of clickjacking had to sink in for a bit before I understood it.  The idea is that there is a button that you would like an individual to click on, say your Like button on Facebook.  On an unprotected site, you can trick users into clicking your button by placing another button underneath it and making the desired button transparent.

This is done with iframes.  The site that contains the desired button (let's say Facebook in this case), is referenced in an iframe on the attacker's site.  The attacker's site looks like something completely different, like maybe a signup form for something which has a button at the end.  So, the attacker has his bogus page load with an iframe into Facebook.  He makes the Facebook frame transparent and positions it in just the right spot so that where you click on the bogus page, it actually clicks the button in the invisible iframe!

Make sense? It didn't to me for a bit.  Anyway, the defense is another header, X-Frame-Options.

X-Frame-Options is a header to indicate what sites are allowed to have your site inside their iframe.  It can take values DENY, SAMEORIGIN, or ALLOW-FROM.
  • DENY means that no one can put your site in an iframe
  • SAMEORIGIN means that sites from the same domain can put your site in an iframe (recommended)
  • ALLOW-FROM whitelist means that sites from the whitelist can put your site in an iframe
Phew! That's a lot of content.  Kim had more to say about a lot more, but I'll probably have to leave that to another day.

You are Hacked, by Karthik Shyamsunder and Phani Pattapu

I'll be brief on this one.  I was very excited to go to this one because I thought it was going to be a narrative of network forensics, guerilla warfare, and the like.  Sadly, it was two very smart individuals charging through a huge, dry slide deck about the standard JEE 6 security model.  Oi!  Afterwards, I went up to Phani and told him that his presentation inadvertently explained very well why people ought to just use Spring Security!

New Security Features in JDK 8+, by Jeffrey Nisewanger and Brad Whetmore

At this one I found out that the JKS keystores aren't encrypted! Apparently, Java has another kind of keystore called JCEKS that is encrypted.
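
For the curious, here is a small sketch of using a JCEKS keystore from code.  It round-trips an AES key through an in-memory keystore; the class name, the "demo-key" alias, and the password are assumptions for illustration.  Storing a secret key like this is also something the plain JKS type cannot do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.util.Arrays;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Sketch only: saves a fresh AES key in a JCEKS keystore and reads it back.
final class JceksSketch {

    // Returns true when the key recovered from the keystore matches the original.
    static boolean roundTrip(char[] password) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey original = generator.generateKey();

            KeyStore store = KeyStore.getInstance("JCEKS");
            store.load(null, null); // start from an empty keystore
            store.setEntry("demo-key",
                    new KeyStore.SecretKeyEntry(original),
                    new KeyStore.PasswordProtection(password));

            // Serialize to memory instead of a file to keep the sketch self-contained.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            store.store(out, password);

            KeyStore reloaded = KeyStore.getInstance("JCEKS");
            reloaded.load(new ByteArrayInputStream(out.toByteArray()), password);
            SecretKey recovered = (SecretKey) reloaded.getKey("demo-key", password);
            return Arrays.equals(original.getEncoded(), recovered.getEncoded());
        } catch (Exception e) {
            throw new IllegalStateException("keystore round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("key survived round trip: " + roundTrip("changeit".toCharArray()));
    }
}
```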

There are a bunch of things that are slated for JDK 8.  Here are just a few:
  • AEAD cipher suite support.  (AEAD, or Authenticated Encryption with Associated Data, covers cipher modes such as AES-GCM that encrypt and authenticate the data in a single step.)
  • doPrivileged is going to be changed to allow code to only assert certain privileges instead of all of them at once.
  • PKCS#11 API spec
  • Better support for PKCS#12 keystores
Awesome.

Cross-Build Injection Attacks, by Sander Mak

This session creeped me out a bit.  What Mak did was create a simple "Hello, World!" application and then built it with the typical "mvn clean compile".  He ran the class and instead of printing the familiar "Hello, World!" it said "You've been p0wned at JavaOne!"  What a neat trick! How did he do it?

It turns out that he purposely corrupted his local maven repository with a poisoned maven-compiler-plugin, which performed the compilation trick.  Then, he posed the questions:
  • What if someone compromised the central maven repository and uploaded a poisoned version of some broadly-used dependency?
  • What if someone hacked your dns to make the central repo url point to their own hacker repository?
  • What if someone stood up a proxy in between you and your connection to the central maven repo and replaced the jar in flight to your local machine?
Honestly, these were questions I'd never thought about before, but in that moment I understood why we have an internal repository at work.  Until then, I'd thought it was simply for performance, but it now made sense that it was necessary for security reasons.

He emphasized that there are three defenses that should be applied in sequence to guard against hacks like this:

Have an internal Maven Repo with no automatic mirroring

This one is pretty simple.  While automatic mirroring is very convenient, it sets you up for a problem should the central maven repo ever get compromised.

Verify PGP signatures

As of three years ago, new jars coming into Maven Central are required to be signed with a PGP certificate and be published with a .asc signature file.  The public key is loaded to http://pgp.mit.edu where Repository Managers can verify the signature against its public certificate.

It takes a few steps to do this manually, but apparently Sonatype offers automatic PGP signature verification as part of its paid edition.

Enter into a Web of Trust

PGP signatures aren't verified with certificates issued by certificate authorities like other protocols.  Instead, there is a Web of Trust where you specify the people who you trust in the web.  These people (you included) indicate which signatures they trust.  This web of trust is overlaid onto the key repository so that you can verify the signature with a public key that is trusted by at least one person that you have already specified that you trust.

Wow.  Security doesn't come easily does it?

Unfortunately, this got me thinking about things like the singularity and the possibility that my consciousness could one day be digitized and uploaded into the brain of another person.  How's that for corrupting a central repository?

Security in the Real World, by Ryan Sciampacone

It is hard to say whether this talk or Kim's talk was my favorite.  Sciampacone's was more fascinating for showing the ingenuity of hackers, but I walked away from Kim's with more tools in my pocket to use.

Anyway, I'm getting really tired of typing, so here are the four vulnerabilities that he highlighted:

Hashcode DoS Attack

The basic idea is that it is trivial to come up with an effectively unlimited number of very long strings that hash to the same value.  The hack is to send very big hashmap keys that all land in the same hash bucket, which burns so many CPU cycles on the server that it is effectively brought down.  Hackers could very easily issue this attack by sending in a long parameter list to a servlet, like this:

http://yourjavasite.com/context/page.jsp?param1=BlahBlahBlah..1 MB worth of characters..Aa&param2=BlahBlahBlah..1 MB worth of characters..BB&...16000 parameters

Such a url would, in the past, take any Java servlet container down.
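
The "Aa" and "BB" endings in that url are not an accident: those two strings have the same String.hashCode().  Because the hash is computed character by character, any concatenation of such equal-hash, equal-length blocks collides with every other one, so n blocks give 2^n colliding strings.  A minimal sketch (the class and method names are made up for the example):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: generates distinct strings that all share one String.hashCode().
final class HashCollisions {

    // Builds every combination of "Aa" and "BB" across the given number of blocks.
    static List<String> colliding(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(result.size() * 2);
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints 16 different keys; the hash column shows the same number on every line.
        for (String key : colliding(4)) {
            System.out.println(key + " -> " + key.hashCode());
        }
    }
}
```

Used as request parameter names, keys like these would all pile into a single bucket of an unprotected hash map.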

There were a few interesting take-aways from this.

The first interesting takeaway was the defense mechanism that was put into the JDK.  A random value is now mixed into the hash code to make it much harder to guess which parameters will hash to the same value.  The problem with this is that there is existing code all over the world that relies on HashMap#keySet returning the keys in the same order (even though the spec tells you not to).  If the Java engineers had simply introduced this random value for all hash codes, a great deal of code across the world would have broken.  Because of this, the random value is only used for very large maps.  The threshold can be tweaked with the command-line property -Djdk.map.althashing.threshold=x

The second is the defense mechanism for Tomcat.  Apparently, Tomcat originally had no limit to the number of parameters that it would take in from the request.  This meant that any sufficiently long url could take down a Tomcat server regardless of the hashcode part of things.  So, at the same time, they set the limit to 10000 parameters and introduced a connector attribute called maxParameterCount (configured in server.xml).  Can you imagine a legitimate case where 10000 parameters are needed??? 100 is probably more reasonable.
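
The idea behind a parameter cap is simple enough to sketch.  This is not Tomcat's implementation, just an illustration of rejecting a request before any parameter map gets built; the class name, method names, and limit values are assumptions.

```java
// Sketch only: counts query-string parameters before anything is parsed into a map.
final class ParameterLimit {

    // Counts '&'-separated parameters without allocating per-parameter objects.
    static int countParameters(String query) {
        if (query == null || query.isEmpty()) {
            return 0;
        }
        int count = 1;
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) == '&') {
                count++;
            }
        }
        return count;
    }

    // True when the request stays at or under the configured maximum.
    static boolean withinLimit(String query, int maxParameterCount) {
        return countParameters(query) <= maxParameterCount;
    }

    public static void main(String[] args) {
        System.out.println(withinLimit("a=1&b=2&c=3", 100));
    }
}
```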

Gondvv Vulnerability

The idea here is that it was temporarily possible in Java applets to use sun.awt.SunToolkit to call a public getField method that returned a read/write handle to any arbitrary field in any class on the classpath.  Eww...  The hacker would get his code deployed into an applet, use this trick to get the SecurityManager, change it to grant untold access to the user's computer, set the SecurityManager, and boom, the hacker could run arbitrary programs on the user's computer.  Wow!

Apparently this was fixed pretty quickly.  The lesson here was to make sure you know what you are returning from any method, especially if it is public.
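
To make that lesson concrete, here is a toy sketch of the same kind of mistake.  It is not the actual exploit and involves no SecurityManager: Account and LeakyToolkit are hypothetical classes invented for the example, where a public helper hands back a reflective handle with access checks already switched off.

```java
import java.lang.reflect.Field;

// Hypothetical class whose state is meant to be private.
final class Account {
    private String role = "guest";

    String role() {
        return role;
    }
}

// Hypothetical "helpful" utility: public, and it returns a Field that skips access checks.
final class LeakyToolkit {
    public static Field getField(Class<?> type, String name) throws NoSuchFieldException {
        Field field = type.getDeclaredField(name);
        field.setAccessible(true); // the caller inherits this privilege along with the handle
        return field;
    }
}

// Sketch only: any caller of the leaky helper can rewrite private state.
final class GondvvSketch {
    static String escalate() {
        try {
            Account account = new Account();
            Field role = LeakyToolkit.getField(Account.class, "role");
            role.set(account, "admin"); // private field changed from the outside
            return account.role();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("role after escalation: " + escalate());
    }
}
```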

Invokespecial Security Fix

This one is a bit more theoretical, but it was apparently once the case that the BytecodeVerifier did not enforce the rule that you can't skip a parent's constructor in the construction process.  The compiler will stop you if you do that, but if you generate a class file that skips your parent's constructor, older versions of the JDK would actually allow it.

The idea, then, is that if your parent is setting some security roles or something and you are able to arbitrarily skip that construction process, you may be able to gain unauthorized access to the trusted portions of your parent's business logic.

MethodHandles

Another theoretical but very tangible place for security holes is MethodHandles.  java.lang.reflect.Method is a class that represents a method that can be invoked on a Java object.  Each time the invoke() method is called, though, the SecurityManager runs to make sure that you have access to do so.  MethodHandles are faster than Methods in part because they only run the SecurityManager when you are first getting a reference to the MethodHandle.  After that, you have the keys to the kingdom.  This means that if you get a MethodHandle reference and you return it from your API, either "woe be unto you" or "you best be sure you know what you are doing!"
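
Here is a small sketch of that difference using ordinary Java access checks (no SecurityManager involved).  Vault and HandleDemo are hypothetical classes made up for the example: the handle is looked up once inside Vault, where access to the private method is allowed, and then anyone holding it can call through, while a plain reflective call from outside is refused.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical class with a private method it should not be exposing.
final class Vault {
    private static String secret() {
        return "s3cret";
    }

    // Access is checked here, once, against Vault's own lookup context.
    static MethodHandle leakHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup()
                .findStatic(Vault.class, "secret", MethodType.methodType(String.class));
    }
}

// Sketch only: compares a leaked MethodHandle with a plain reflective call.
final class HandleDemo {

    // Invoking the leaked handle performs no further access check.
    static String viaHandle() {
        try {
            MethodHandle handle = Vault.leakHandle();
            return (String) handle.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Reflection re-checks access on invoke and rejects this outside caller.
    static boolean reflectionBlocked() {
        try {
            Method method = Vault.class.getDeclaredMethod("secret");
            method.invoke(null);
            return false;
        } catch (IllegalAccessException denied) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("via handle: " + viaHandle());
        System.out.println("reflection blocked: " + reflectionBlocked());
    }
}
```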

Secure Coding Guidelines for Java, by Marc Shoenfeld

This was my last security class that I went to, and it nearly made me cry.  It was basically a person who had taken the table of contents from a university textbook, put them on some slides, and read them.  Oh.  My.  Goodness.  That was the first time I walked out from a presentation.

I did learn about the CVSS vulnerability scale in the first three minutes, so I suppose it wasn't a complete waste.

Summary

Wow! I hadn't realized how many security talks I'd gone to until I tried writing it all down.  I can't believe you made it all the way to the bottom! You should be doing something productive.  Like hacking into the university's transcript database and changing your grades or something!
