Friday, 13 October 2017

Notes on Java (hash) maps

Maps

Maps are one of the most common data structures, used every day by many programmers. In Java, the most popular implementation is HashMap, which uses open hashing (chaining) over an array to achieve constant-time (O(1)) get/put operations on average (barring degenerate cases - for example when hashCode() returns a constant).

Java HashMap

There are a few things concerning this topic that should be interesting even if you're not that into algorithms / data structures ;-)

First of all, the underlying implementation and possible performance benefits. As said, Java HashMap uses open hashing, which means that entries falling into the same bucket (array index calculated from the hash) are put on a list. Therefore, in order to find the value for a particular key, the list needs to be iterated. This is of course an O(n) operation (where n is the number of entries in the list).

Java 8, however, implements a nice solution to this problem. According to this blog post: http://www.nurkiewicz.com/2014/04/hashmap-performance-improvements-in.html Java switches the bucket implementation from a list to a balanced tree once a specific threshold is reached (TREEIFY_THRESHOLD = 8 entries at the time of writing). Thus, worst-case performance improves from O(n) to O(log n). What is particularly interesting is how the tree is ordered. For entries with different hashes, the hash value is used to compare entries. What if the hash values are equal? Well, in this case the implementation hopes that the keys implement the Comparable interface. If they don't, the tree can no longer pick a single branch for such keys and a lookup may have to search both subtrees, so with heavy collisions no performance benefit should be expected.
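To make the point about Comparable more tangible, here is a minimal sketch. BadKey is a hypothetical key type invented for this illustration; its hashCode() is deliberately constant, so every entry lands in the same bucket, and compareTo() gives the tree bin something to order by. It only demonstrates that such a map stays correct; measuring the actual speed-up would need a proper benchmark.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {

    /** Hypothetical key whose hashCode() is constant on purpose. */
    static final class BadKey implements Comparable<BadKey> {
        private final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // every key collides into one bucket
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            // Used by the tree bin when hash values are equal.
            return Integer.compare(id, other.id);
        }
    }

    /** Fills a map with n keys that all share the same hash. */
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(10_000);
        System.out.println(map.get(new BadKey(9_999))); // prints 9999
    }
}
```

Removing "implements Comparable" (and compareTo) keeps the map correct, but lookups among the colliding keys lose the ordering they could otherwise rely on.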

Robin Hood (RH) hashing 

As already mentioned, the Java stdlib implementation uses the open hashing approach. There is, however, a competing closed hashing (open addressing) approach. In case of a collision, a closed hashing algorithm calculates a new hash (and array index) until a free spot is found. This obviously means a performance penalty in high-collision scenarios compared to open hashing, in which the entry would simply be put at the end of the list. For get it's similar, as the algorithm needs to go one by one, comparing keys. Performance of course gets worse as the load of the underlying table grows.

I'd like to show here a neat idea for improving on typical linear probing, called Robin Hood hashing. The idea is quite old - it was first published in 1986 (paper here: https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf). Why such a name? Well, the idea is basically that some entries will get a free (insertion) slot sooner (with a lower number of tries) than others, due to their "better" hash code value. Therefore, the algorithm can put entries with "worse" hash codes closer (and move the "better" ones to further spots). Thus, it's like taking from the "richer" and giving to the "poorer" :-)

The implementation itself is really straightforward - for each entry, the probe length (distance between the originally calculated position and the current one) is kept. In case of a collision, if the probe length of the new entry is bigger than the probe length of the current one, the new one is inserted (as the poorer one) and the insert continues with the entry that previously occupied the slot. If the probe length of the new one is smaller, a new index is calculated (and the probe length for that entry increases). Therefore probe lengths will gradually even out.
According to this article https://www.sebastiansylvan.com/post/robin-hood-hashing-should-be-your-default-hash-table-implementation/ Robin Hood hashing is faster on current architectures because it causes fewer cache misses (due to lower probe count variance). There have been some attempts to measure the performance benefits of this approach. One of the best can be found here - http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion. To date, only Rust uses Robin Hood hashing in its standard hash map.
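The insertion rule described above can be sketched roughly as follows. This is a toy illustration rather than a production design: RobinHoodMap and all of its members are made-up names, the capacity is fixed, linear probing is assumed as the probing scheme, and resizing and deletion are left out on purpose.

```java
import java.util.Arrays;
import java.util.Objects;

/** Toy Robin Hood map: fixed capacity, linear probing, no resize, no delete. */
class RobinHoodMap<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private final int[] probeLengths; // distance from the ideal slot; -1 = empty
    private int size;

    RobinHoodMap(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
        probeLengths = new int[capacity];
        Arrays.fill(probeLengths, -1);
    }

    private int idealIndex(Object key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    /** Returns the slot holding the key, or -1 if the key is absent. */
    private int indexOf(Object key) {
        int index = idealIndex(key);
        for (int probe = 0; probe < keys.length; probe++) {
            // An empty slot (-1) or a "richer" resident means the key
            // would have been placed here already, so it is not present.
            if (probeLengths[index] < probe) {
                return -1;
            }
            if (key.equals(keys[index])) {
                return index;
            }
            index = (index + 1) % keys.length;
        }
        return -1;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int index = indexOf(key);
        return index < 0 ? null : (V) values[index];
    }

    int size() {
        return size;
    }

    void put(K key, V value) {
        Objects.requireNonNull(key);
        int existing = indexOf(key);
        if (existing >= 0) { // key already present: just update the value
            values[existing] = value;
            return;
        }
        if (size == keys.length) {
            throw new IllegalStateException("table is full");
        }
        Object k = key;
        Object v = value;
        int probe = 0;
        int index = idealIndex(k);
        while (true) {
            if (probeLengths[index] < 0) { // free slot: settle here
                keys[index] = k;
                values[index] = v;
                probeLengths[index] = probe;
                size++;
                return;
            }
            if (probeLengths[index] < probe) {
                // The resident is "richer": it gives up the slot and the
                // insert continues with the displaced entry.
                Object displacedKey = keys[index];
                Object displacedValue = values[index];
                int displacedProbe = probeLengths[index];
                keys[index] = k;
                values[index] = v;
                probeLengths[index] = probe;
                k = displacedKey;
                v = displacedValue;
                probe = displacedProbe;
            }
            index = (index + 1) % keys.length;
            probe++;
        }
    }
}
```

Note how the lookup can stop early: as soon as it meets a resident with a shorter probe length than its own, the key cannot be stored any further along.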

Key takeaways


  • The Java hash map implementation is very fast, and in Java 8 even faster (balanced tree)
  • it's worthwhile to implement Comparable for keys used in a hash map
  • Robin Hood hashing is an interesting idea that's worth trying, especially under memory constraints (high load factors - 0.9 or even 0.95 without a big performance hit) or other non-functional requirements (it seems reasonable for disk storage due to its linear access pattern)

Friday, 15 September 2017

Java 8 continued - cheat sheet ;)

Continuing the topic of Java 8. Folks at Zeroturnaround (authors of JRebel) wrote a nice article regarding:

  • default methods in interfaces
  • best practices with lambdas usage
  • proper Optional usage
or download/print the cheat sheet as a PDF: http://files.zeroturnaround.com/pdf/zt_java8_best_practices.pdf

Enjoy! :-) 

Thursday, 14 September 2017

Java 8 for late-adopters - Venkat's 5 cents

Some time ago I wrote a short post about some Java 8 features that may be interesting for current pre-8 developers and may motivate them to switch to the latest version. It was really basic, more of a teaser than a tutorial.

As one of the projects I'm in is currently moving to Java 8, I recently did a presentation on the topic. During my research I found a number of great tutorials - ranging from simple introductions to deep-in-details encyclopedias. One, however, caught my eye.

Venkat's "Java 8 programming idioms" series, published here: http://blog.agiledeveloper.com/2017/09/java-8-programming-idioms-series-at-ibm.html It mainly touches on the functional side of Java 8. It can be particularly handy, as I see that some developers who have so far lived in the object-oriented programming world have a hard time switching to a functional style of thinking. Enjoy!

Tuesday, 25 July 2017

[memo] hibernate's import.sql

There is a neat, poorly documented feature of Hibernate - import.sql.

If Hibernate creates the DB schema from scratch - that is, "hibernate.hbm2ddl.auto" is set to either "create" or "create-drop" - it will load the "import.sql" file located in the root of the classpath. If Spring Boot is used, the property controlling this behaviour is "spring.jpa.hibernate.ddl-auto".

But why would anyone like to import some static data in a webapp? Well, for prototyping purposes for example. Or for "static" webapp mocking some external resources for integration testing. 

Monday, 24 July 2017

Continuous Integration for (non)hipsters

Continuous Integration? Soooo ooold, so nineties. Not hot anymore, not trendy. Currently even Continuous Delivery sounds like something regular, typical, boring. Everybody does CI... or do they?

In order to answer this question, let's refer to the excellent article, published by Martin Fowler here. Headline states:
"Continuous Integration is a software development practice where members of a team integrate their work frequently (...). Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible."
That's it.
  • Code (yes, this includes tests!)
  • Commit.
  • Pull/push (integrate).
  • Let something (a CI server for instance) run tests and confirm that everything is in order

Four easy steps. However, the devil is in the detail.

First of all, the steps should be executed at least once a day, preferably multiple times a day, by each team member. This way, any integration issues will be discovered on the spot. If your team has a habit of keeping private branches for a long time, without source integration and pushes (triggering a full CI build), well, you're not doing CI! If you use feature branches, which are a good thing, keep them public, so that the CI server can watch them during development.

Second of all, tests are the cornerstone of CI. Usually not all tests are run during a local build - in order to keep it fast, only the quick, small unit ones are. If the project build isn't fast, developers won't build before pushing changes (we're all always super busy, aren't we?). Therefore all expensive tests - especially integration ones - should be left for the CI server. It's a general rule for bigger builds. This way, if an issue falls into the 1% not covered by unit tests, it should be found by the end of the day. By the way, it's a really good habit to have blame mails enabled in the CI server. The name is a bit misleading - it's not about the blame, it's all about fixing a broken build (or bad tests!) as soon as possible.

All of the above can be found in Martin Fowler's aforementioned article - in a more elaborate form. Yes, it's quite an old one, but still very, very relevant.

Thursday, 13 July 2017

[memo] purging broken JARs from Maven repository

Ever encountered the dreaded "java.util.zip.ZipException: invalid LOC header (bad signature)" while building a Maven project? If yes, it means that Maven (somehow) downloaded a broken JAR into your repository. There is not much that can be done besides purging all of the "broken" JARs from the repository.

Two options are possible:

  • delete the whole Maven repository - which means downloading all of the JARs, of all projects (possibly a few gigabytes) again (!!)
  • delete only the offending JARs - the list can be obtained, for example, by issuing the following command:
find  /home/me/.m2/repository/ -name "*jar" | xargs -L 1 zip -T | grep error | grep invalid

Both options, well, suck :-) Fortunately there is a way to purge from the local repository only the JARs that pertain to one project. It's provided by the Maven dependency plugin. Just issue
mvn dependency:purge-local-repository
and it's done. NEAT!
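If you'd rather see which JARs are actually broken before purging anything, the find / zip -T pipeline above can be approximated in plain Java. The sketch below is only an assumption of how one might do it: BrokenJarFinder and its methods are names made up for this memo, and the default repository path (~/.m2/repository) is the usual Maven location, which may differ on your machine. It opens every *.jar and reads each entry to the end, treating any I/O error as "broken".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class BrokenJarFinder {

    /** Returns true if every entry of the archive can be read to the end. */
    static boolean isReadable(Path jar) {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            byte[] buffer = new byte[8192];
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    while (in.read(buffer) != -1) {
                        // just drain the entry; we only care whether it fails
                    }
                }
            }
            return true;
        } catch (IOException | RuntimeException e) {
            // ZipException is an IOException; anything thrown here means "broken"
            return false;
        }
    }

    /** Walks the directory tree and collects JARs that cannot be read. */
    static List<Path> findBroken(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .filter(p -> !isReadable(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location of the local Maven repository.
        Path repo = Paths.get(args.length > 0
                ? args[0]
                : System.getProperty("user.home") + "/.m2/repository");
        findBroken(repo).forEach(System.out::println);
    }
}
```

The program only lists suspicious files; deleting them (or running the purge goal) is still up to you.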

Monday, 12 June 2017

Hazelcast - CLI access

Ever wondered how to quickly take a peek into your Hazelcast structures, for example on a UAT / CIT environment? Look no further! :-) I've recently had a similar task. While debugging an issue on a test environment, I reckoned it might be caused by faulty data in the Hazelcast cluster. Fortunately we can use what comes with a Java application using Hazelcast (in client mode) - the ClientTestApp class.

First of all we're going to need two Hazelcast JARs (here - version 3.2.5):

  • hazelcast-client-3.2.5.jar
  • hazelcast-3.2.5.jar

Both JARs should be copied into an empty directory; let's call it "lib" for the purposes of this demo. Next, a shell script used to run the test app needs to be created:


In order to have the client connect with your cluster, a configuration needs to be provided as well. The file needs to be named hazelcast-client.xml and reside in the same directory as the run.sh script. Sample content looks as follows:


After running the client, you should get a Hazelcast console able to query / manage the cluster. Full manual of the CLI is here: http://docs.hazelcast.org/docs/2.3/manual/html/ch17s09.html

Enjoy querying your data! 

Wednesday, 31 May 2017

GeeCon part two

A little late (almost two weeks!) but finally completed. Second day of GeeCon (https://www.geecon.org/). How was it? Continue reading to find out.

Improving Java EE with reactive - Ondrej Mihalyi

The presenter started with an overview of what "reactive" really means in terms of a web application. The idea he focused on was that, in order to improve latency, the execution of a request should never block, even on I/O. Also, in order to utilise the resources (CPU) as efficiently as possible, there should ideally be as many threads running as there are cores in the system. In a typical J2EE-based application these goals are impossible to achieve, as each request is served completely by a separate thread, which blocks whenever there is, e.g., a call to a dependent service or a DB. There are, however, libraries that can help make such an application more "reactive". In his demo application ("cargo"), Ondrej showed how to use the built-in Java 8 CompletableFuture or external RxJava (https://github.com/ReactiveX/RxJava) observables in order to avoid blocking the execution thread. He also mentioned (although he was short on time) that the UI part should be improved as well, by using technologies like WebSockets or CDI events.
The whole presentation was quite good content-wise. It was nice to learn how performance can be improved in an existing application by refactoring it to use some "reactive" stuff. The presenter could improve a bit on the delivery side. Keeping an eye on the clock and minimising "live coding" in favour of ready, refactored examples would make the whole presentation easier to follow.
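For readers who haven't seen the non-blocking style yet, here is a tiny sketch using only the built-in CompletableFuture. It is not code from the talk: findCustomer and countOrders are hypothetical stand-ins for a remote service and a DB call, simulated here with supplyAsync.

```java
import java.util.concurrent.CompletableFuture;

class NonBlockingLookup {

    /** Hypothetical remote call, simulated asynchronously. */
    static CompletableFuture<String> findCustomer(int id) {
        return CompletableFuture.supplyAsync(() -> "customer-" + id);
    }

    /** Hypothetical DB call that depends on the result of the first one. */
    static CompletableFuture<Integer> countOrders(String customer) {
        return CompletableFuture.supplyAsync(customer::length);
    }

    /** Chains both calls without blocking the calling thread. */
    static CompletableFuture<String> summary(int id) {
        return findCustomer(id).thenCompose(customer ->
                countOrders(customer).thenApply(count ->
                        customer + " has " + count + " orders"));
    }

    public static void main(String[] args) {
        // join() is used only so that the demo JVM waits for the result;
        // in a request handler the future would be handed back to the framework.
        summary(7).thenAccept(System.out::println).join();
    }
}
```

The request thread only wires the stages together; the actual work runs on the common pool and the result is delivered through callbacks.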

Reactive Spring - Josh Long 

START. DOT SPRING. DOT IO. I could write that this was the second "reactive" presentation, taking on the topic from the perspective of the new Spring library. But no, this was not an ordinary presentation. This was a show. A Josh show. The topic itself is very interesting and the Spring implementations are interesting (and impressive) as well. Reactive versions of repositories (Mongo DB implementation here), Reactive Streams (Subscriber, Publisher, Processor) and Reactor (Flux, Mono) concepts, Web Sockets or Server Sent Events for publishing. They even have (in the upcoming release) reactive, non-blocking security! I had, however, a strong impression that Josh would be able to make an interesting and engaging presentation on any topic... even PHP ;-) Besides a strong interest in reactive Spring, what I took out of the presentation is - START. DOT SPRING. DOT IO! :-D

Microservices - stages of maturity - Jarosław Pałka / Jakub Marchwicki

The multi-threaded presentation ran on two cores in parallel ;-) It was a mixed one - a bit about the main topic (microservices), a bit about working in a tech organisation. The microservices-oriented part dealt with principles such as being event-driven (publish, publish, publish!) and asynchronous, even in terms of application logging. Consumer-Driven Contracts for better APIs were mentioned as well. The message from the presenters here was that many ideas from the microservices world can be applied to the world of monoliths. The organisation-oriented part was, from my point of view, much more interesting. Jarek and Jakub said many things that may seem obvious, but need to be stated aloud from time to time, just to remind everyone about them. Among others, the words of wisdom I remembered from the presentation were:
- legacy products - usually nobody wants to work on these, but usually these are the ones that bring in the MONEY!
- 3rd Newtonian law of management - if you push people, they will push back even harder
- ignoring the infrastructure is a wide road to failure
- developers should be assigned responsibilities, not roles
- during development the focus should be on building a resilient system, not a perfect one
- in general, it's better to focus on fixing what's failing than on blaming others (the famous "witch hunt")
- never ever do any manual changes on production - if you need to, your product is crap ;-)
Overall, I liked the presentation, especially as my own work experience leads to conclusions similar to those voiced by Jarek and Jakub.

Consumer-Driven Contracts to enable API evolution - Marcin Grzejszczak

Consumer-Driven Contracts is a pattern that enables service evolution, as described by Martin Fowler (https://martinfowler.com/articles/consumerDrivenContracts.html). In his presentation Marcin focused on how CDC can be enabled for a web application using a Spring project called Spring Cloud Contract (http://cloud.spring.io/spring-cloud-contract/spring-cloud-contract.html). The idea behind it is quite simple - allow writing a Contract for an API using a statically typed Groovy DSL, which can later be automagically converted into stubs used for (client) integration testing (in the most hyped microservices architecture ;) ). As an added bonus, the same tool will generate (server) acceptance tests out of the same Contracts.
To make a long story short - easier decoupling of services for testing and less boilerplate code, provided by one nice tool. I like that! :-)


To sum up - this year's edition of GeeCon was both entertaining and educational. I learned a few new things, I met some friends from the good old days, I ate some cookies. One of the most important aspects for me was the opportunity to see how differently things are done in different companies. Having a broader overview is never a bad thing!

Thursday, 18 May 2017

GeeCon has arrived! Day one

GeeCon 2017 is finally here! Packed with presentations, ranging from higher management talking about how organisations work in the big picture to "geeky" CPU internals. Everything in a Multikino, providing what's needed for a presenter (and attendees), spiced up with really good catering.

Here's my very short report on what I saw today.

Keynote - David Moore

A very interesting presentation, especially taking into account that higher management talks are usually a bit... boring (so to speak ;-) ). He talked about approaching refactoring / replatforming on an organisational level. Most of his observations were exactly aligned with my experiences, both concerning things that work and things that don't.
What is worth remembering is Conway's law - "products mirror organisations". A tech organisation with bad communication or a bad structure will always create lousy products. Period! Leaders of any sort should also take away the statement that their role is usually to hire great people, trust them, let them loose and get the hell out of the way ;-)

Caching - Michael Plöd

A general introduction to caching in typical business web applications, that is, applications that can't allow "eventual consistency". Some basic concepts (local cache, clustered, distributed, local off-heap), some examples, some real-world cases (issues). Overall, a nice presentation from both the delivery (the way the presentation was given) and the content point of view. Some (salut, Grzegorz!) would say that nothing new / leading edge was presented, but in my opinion it was worth attending anyway. Perhaps the title - "best practices and gotchas" - suggested something different, that is, a much more detailed, technical presentation? I personally especially liked that the whole speech was focused on which basic questions should be asked when introducing a cache into a system and how these should be answered, step by step.
Last but not least - quoting the author, remember that a cache will not solve the performance issues of an application - first optimise, then introduce a local cache, and only then, eventually, a distributed one (to tackle scalability issues etc.). And never ever implement your own cache! :-)

Domain Driven Design - Cyrille Martraire

A brief introduction to DDD, mixed with funny cats and some jokes (5 minutes into the presentation - we're done, thank you!). Basic concepts (ubiquitous language, Value Objects, Bounded Context). Some examples to help the audience visualise the ideas behind DDD. From my perspective the presentation lacked a clear example comparing a small application (MVP) written with and without DDD. Such an exercise would be very helpful to see the value that can be brought by introducing DDD and which problems it can solve. Then it would be much easier to decide if the benefits outweigh the additional costs.

How SCRUM moved away from developers - Matthew Brylka

Reading the title - and just the title - I understood that the presentation would be a critique of SCRUM implementations in many companies, where the process no longer helps deliver value but is rather a good excuse for added paperwork loved by all sorts of bureaucrats. What I got instead was mostly an overview of SCRUM principles (with some emphasis on what is not there) along with a particularly interesting "how to succeed" section. Two thoughts from the presentation to be remembered:
  • Use agreements with stakeholders instead of contracts / demands
  • Technical excellence, debt backlog and refactoring are a constant practice - every manager should understand this, instead of pushing refactoring further away, deprioritising it or asking for a separate, estimated task (that could be kept in the backlog forever)
After this presentation I got the impression that, besides developers, line managers and almighty architects of all sorts should also attend the GeeCon conference! ;-)

Developer Plantations - Wojciech Seliga

Writing code (aka "coding") is becoming a commodity. Soon everyone, after a few months of training, will be able to write *something*, just as literacy is common now compared to the 19th century. In order to stay ahead of the crowd, we (meaning - skilled engineers) need to understand why a product is created, besides knowing how and what it is. These are the main theses about the future of our industry. A bit of a "lifestyle" presentation, without any technical aspects. Worth listening to just for a glimpse of a different perspective (something other than yet another tool, library or architecture).


To sum up - an interesting day. On day two I plan to attend some more "technical" presentations. Stay tuned!

Friday, 5 May 2017

[memo] Accessing files in a web application

Sometimes the easiest things are also the easiest to miss (or to forget :-) ). Therefore, in order to have this once and for all in one place - a quick guide on how to access files in a servlet environment (a Spring Boot application in my case).

1. Classpath resource

A file that is present in a classpath (typically /WEB-INF/classes, /WEB-INF/lib).
Can be anything, most probably a configuration file that is packed with WAR during package preparation stage. In case of good old Maven - all of the resources (by default src/main/resources) will end up in the classpath. Access - via context class loader.
For example:
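The snippet that originally followed is not available here, so the block below is only a minimal sketch of the approach. ClasspathResources and load are made-up names, and "app.properties" in the usage note is a hypothetical file name; the only API used is the standard ClassLoader.getResourceAsStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

class ClasspathResources {

    /**
     * Loads a .properties file from the classpath via the context class loader.
     * The name is relative to the classpath root and has no leading slash.
     */
    static Optional<Properties> load(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader(); // fallback, just in case
        }
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return Optional.empty(); // resource not found on the classpath
            }
            Properties properties = new Properties();
            properties.load(in);
            return Optional.of(properties);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a file placed in src/main/resources, a call such as ClasspathResources.load("app.properties") should find it once the application is packaged.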

2. Web resource 

All of the web application files - including static content (images, styles, html, ...) and files that are loaded into classpath (JARs from /WEB-INF/lib etc.) can be accessed this way. The only requirement is the presence of a ServletContext. 
PRO TIP: In a Spring application, the ServletContext can be simply autowired into a service requiring it. 
For example: 
As a last word - be __very__ cautious when planning to use ServletContext.getRealPath(). Why? Just take a look here: http://stackoverflow.com/questions/12160639/what-does-servletcontext-getrealpath-mean-and-when-should-i-use-it

Tuesday, 11 April 2017

Java 8 for late-adopters ;-)

I've just realised that Java 8 is already 3 years old! For a version of a general-purpose language, that is a lot.

What's even more striking is that there are still a lot of projects that haven't been migrated to the newest (but already mature) version. There are in fact many projects that still use the first "real" JRE, that is version 5, which introduced generics, enums, autoboxing, improved concurrency and all of the stuff nowadays recognised as a minimum standard.

Why is that? Well, sometimes it is because the cost of migration, especially for big, monolithic systems, is deemed too large. Sometimes that may be true. However, often the cost is overestimated or the benefits are underestimated. Therefore in this short post I'd like to present some Java 8 features that in my opinion can really bring substantial benefits and at the same time provide good arguments to justify the migration:

1. Lambdas aka functional programming for the masses.

Yes, we have Guava's Function. Yes, it's a bit of functional programming. But let's just compare this:
and this:
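The two snippets being compared are not available here, so the following is a stand-in sketch of the same contrast. To keep it free of extra dependencies it uses java.util.function.Function instead of Guava's Function (an assumption on my side; the shape of the anonymous class is the same). LambdaComparison and its members are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class LambdaComparison {

    // The old way: an anonymous class, five lines for one expression.
    static final Function<String, Integer> LENGTH_OLD = new Function<String, Integer>() {
        @Override
        public Integer apply(String s) {
            return s.length();
        }
    };

    // The Java 8 way: a lambda...
    static final Function<String, Integer> LENGTH_LAMBDA = s -> s.length();

    // ...or an even shorter method reference.
    static final Function<String, Integer> LENGTH_REF = String::length;

    /** Applies the given function to every word. */
    static List<Integer> lengths(List<String> words, Function<String, Integer> f) {
        return words.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(lengths(words, LENGTH_OLD));    // prints [1, 2, 3]
        System.out.println(lengths(words, LENGTH_LAMBDA)); // prints [1, 2, 3]
        System.out.println(lengths(words, LENGTH_REF));    // prints [1, 2, 3]
    }
}
```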

How cool is that? And it's only the beginning! Java 8 added a real functional programming API, as advanced as it can be given the object-oriented nature of the language and the dreaded "backwards compatibility".
In order to fit some functional programming into Java, the following concepts had to be introduced:
  • functional interfaces - basically a "type" for a "function"
  • new "->" notation, with following syntax rules:
    • types of the parameters are optional
    • parentheses around the parameter are optional if you have only one parameter
    • curly braces are optional (unless multiple statements are present)
    • the "return" keyword is optional in case of a single expression that returns a value
  • method references (with "::" operator)
It's really a lot of fun and a lot less of a boilerplate code. You simply need to try it for yourself :-) 
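The syntax rules listed above are easier to remember when seen side by side. A small sketch follows; the class and constant names are illustrative only, and all functional interfaces come from java.util.function.

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;

class LambdaSyntax {

    // Parameter types spelled out explicitly...
    static final BinaryOperator<Integer> TYPED = (Integer a, Integer b) -> a + b;

    // ...or left for the compiler to infer.
    static final BinaryOperator<Integer> INFERRED = (a, b) -> a + b;

    // A single parameter needs no parentheses.
    static final Function<Integer, Integer> SINGLE = x -> x * 2;

    // Multiple statements need curly braces and an explicit return.
    static final Function<Integer, Integer> BLOCK = x -> {
        int doubled = x * 2;
        return doubled + 1;
    };

    // A method reference instead of s -> Integer.parseInt(s).
    static final Function<String, Integer> REFERENCE = Integer::parseInt;
}
```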

2. Streams and parallel collections

Basically streams (implementing the Stream interface, who would guess?) are Iterators on steroids. Why on steroids? Because streams support:
  • out of the box parallel execution!
  • map / filter / reduce pattern
  • functional "iteration" via an old friend - the forEach method
  • primitive (Int / Double / Long) counterparts for even greater performance
Map / filter / reduce together with lambdas effectively eradicate the need for Guava's FluentIterable and friends. Sorry! The standard library always wins!
Built-in parallelization and lazy evaluation bring the need to collect, join, group or partition. This is also provided. Nice! Just take a look at this example - joining User names:
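The original snippet is not available here, so below is a minimal sketch of what joining names with a stream can look like. The User class is a hypothetical stand-in with just a name; the stream part uses the standard map and Collectors.joining.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoinUserNames {

    /** Hypothetical user type, reduced to what the example needs. */
    static final class User {
        private final String name;

        User(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Maps users to their names and joins them with a comma. */
    static String joinNames(List<User> users) {
        return users.stream()
                .map(User::getName)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("Ann"), new User("Bob"));
        System.out.println(joinNames(users)); // prints Ann, Bob
    }
}
```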

3. Joda (well, almost) date / time

The only thing that can be said is at last! LocalTime, LocalDate, LocalDateTime and ZonedDateTime along with a reasonable API (copied from Joda :-) ). Plus fast-and-easy creation (now(), toInstant()). Especially the last one is nice, allowing for a fast migration:
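The original snippet is not available here; a minimal sketch of such a migration bridge could look like this. LegacyDateBridge and its method names are made up, while Date.toInstant(), Date.from() and LocalDateTime.ofInstant() are the standard Java 8 calls. Note that the time zone has to be chosen explicitly, as java.util.Date does not carry one.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Date;

class LegacyDateBridge {

    /** Old java.util.Date to the new API, interpreted in the given zone. */
    static LocalDateTime toLocalDateTime(Date legacy, ZoneId zone) {
        return LocalDateTime.ofInstant(legacy.toInstant(), zone);
    }

    /** And back again, for code that still expects java.util.Date. */
    static Date toDate(LocalDateTime modern, ZoneId zone) {
        return Date.from(modern.atZone(zone).toInstant());
    }

    public static void main(String[] args) {
        Date legacy = new Date();
        LocalDateTime modern = toLocalDateTime(legacy, ZoneId.systemDefault());
        System.out.println(modern);
    }
}
```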
 I wish this had been added years ago!

4. Misc (Optional, Base64 and others)

  • Everyone knows (or at least should know) the Optional class. Now available in the standard library!
  • Everyone has needed Base64 at least once. Now available in the standard library as well!
  • Tuples, sadly, are still missing: java.util has no general tuple type, and AbstractMap.SimpleEntry remains the closest stand-in for a pair
  • No more Permanent Generation (one JVM parameter less, heh)
Maybe it's not something that would change one's life, but definitely something that should have been in Java, like, since the beginning.
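A quick taste of the first two items, as a sketch: MiscJava8 and its methods are illustrative names, while Optional and Base64 are the standard java.util classes added in Java 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

class MiscJava8 {

    /** Null-safe greeting built with Optional instead of if/else chains. */
    static String greeting(String nameOrNull) {
        return Optional.ofNullable(nameOrNull)
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .map(name -> "Hello, " + name)
                .orElse("Hello, stranger");
    }

    /** Encodes text as Base64 using the built-in encoder. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes Base64 back into text. */
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(greeting(null));   // prints Hello, stranger
        System.out.println(encode("hello"));  // prints aGVsbG8=
    }
}
```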

Java 8 brings a lot. Really, in my opinion it's the first "REAL" update since version 5.0. There is much more than mentioned in this short post, I encourage everyone to go and try (and then try to persuade the management to migrate projects as well...).