MAY/JUNE 2016

FROM BIG DATA TO INSIGHTS

14 APACHE SPARK: GETTING STARTED
20 SPARK FOR PERSONAL PROJECTS
26 100 GB OF DATA IN MEMORY BUT OFF HEAP

ORACLE.COM/JAVAMAGAZINE
//table of contents /
14 APACHE SPARK 101
By Diana Carroll
Getting up to speed on the popular big data engine

20 USING SPARK AND BIG DATA FOR HOME PROJECTS
By Nic Raboy
Create a small personal project using big data pipelines.

26 BUILDING A MASSIVE OFF-HEAP DATA QUEUE
By Peter Lawrey
How one company built a data queue that scales to more than 100 GB

30 BIG DATA BEST PRACTICES FOR JDBC AND JPA
By Josh Juneau
Focus on the fundamentals so you’re not overwhelmed by large amounts of data.

COVER ART BY WES ROWELL
04 From the Editor
With more companies leaving the business and the survivors in an intense price war, is the model of free open source hosting sustainable?

06 Letters to the Editor
Comments, questions, suggestions, and kudos

09 Events
Upcoming Java conferences and events

11 Java Books
Review of Java Testing with Spock

38 Testing
JUnit 5: A First Look
By Mert Çalişkan
The long-awaited release of JUnit 5 is a complete redesign with many useful additions.

45 New to Java
Understanding Generics
By Michael Kölling
Use generics to increase type safety and readability.

50 JVM Languages
Ceylon Language: Say More, More Clearly
By Stéphane Épardaud
A low-ceremony, high-productivity JVM language that integrates easily with Java and also runs on JavaScript VMs

56 User Groups
Barcelona JUG

61 Cloud
Getting Onboard Oracle Java Cloud Service
By Harshad Oak
A hands-on, step-by-step guide to trying out an enterprise cloud

66 Fix This
By Simon Roberts
Our latest code challenges

Contact Us
Have a comment? Suggestion? Want to submit an article proposal? Here’s how.
ORACLE.COM/JAVAMAGAZINE /////////////////////////////////////////////// MAY/JUNE 2016
EDITORIAL
Editor in Chief: Andrew Binstock
Managing Editor: Claire Breen
Copy Editors: Karen Perkins, Jim Donahue
Technical Reviewer: Stephen Chin

DESIGN
Senior Creative Director: Francisco G Delgadillo
Design Director: Richard Merchán
Senior Designer: Arianna Pucherelli
Designer: Jaime Ferrand
Senior Production Manager: Sheila Brennan
Production Designer: Kathy Cygnarowicz

PUBLISHING
Publisher: Jennifer Hamilton +1.650.506.3794
Associate Publisher and Audience Development Director: Karin Kinnear +1.650.506.1985
Audience Development Manager: Jennifer Kurtz

ADVERTISING SALES
Sales Director: Tom Cometa
Account Manager: Mark Makinney
Account Manager: Marcin Gamza
Advertising Sales Assistant: Cindy Elhaj +1.626.396.9400 x 201
Mailing-List Rentals: Contact your sales representative.

RESOURCES
Oracle Products: +1.800.367.8674 (US/Canada)
Oracle Services: +1.888.283.0591 (US)

ARTICLE SUBMISSION
If you are interested in submitting an article, please email the editors.

SUBSCRIPTION INFORMATION
Subscriptions are complimentary for qualified individuals who complete the subscription form.

MAGAZINE CUSTOMER SERVICE
java@halldata.com  Phone +1.847.763.9635

PRIVACY
Oracle Publishing allows sharing of its mailing list with selected third parties. If you prefer that your mailing address or email address not be included in this program, contact Customer Service.

Copyright © 2016, Oracle and/or its affiliates. All Rights Reserved. No part of this publication may be reprinted or otherwise reproduced without permission from the editors. JAVA MAGAZINE IS PROVIDED ON AN “AS IS” BASIS. ORACLE EXPRESSLY DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED. IN NO EVENT SHALL ORACLE BE LIABLE FOR ANY DAMAGES OF ANY KIND ARISING FROM YOUR USE OF OR RELIANCE ON ANY INFORMATION PROVIDED HEREIN. Opinions expressed by authors, editors, and interviewees—even if they are Oracle employees—do not necessarily reflect the views of Oracle. The information is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Oracle and Java are registered trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Java Magazine is published bimonthly and made available at no cost to qualified subscribers by Oracle, 500 Oracle Parkway, MS OPL-3A, Redwood City, CA 94065-1600.
//letters to the editor /

page 35), Cédric Beust wrote, “Since 2004, there has been only one major update to annotations in the JDK: JSR 308, which added more locations where annotations could be placed.”

However, it’s worth knowing that Java 8 improves brevity by allowing multiple annotations of the same type to be written on an element without the need for a container annotation. In “What’s New in JPA: The Criteria API,” the code

@NamedQueries({
  @NamedQuery(name=..., query=...),
  ... })

could drop @NamedQueries if NamedQuery was repeatable, and simply use multiple @NamedQuery annotations directly on the class. The javadoc for java.lang.reflect.AnnotatedElement has more info for interested readers.
—Alex Buckley, Specification Lead, Java Language and VM, Oracle

[Pictured: the March/April 2016 issue, “Inside Java and the JVM.”]

var vs. val
Re: your editorial on JDK Enhancement Proposal 286 (March/April 2016, page 3), I don’t agree that

var surprise = new HaydnSymphony();

is better than

HaydnSymphony surprise = new HaydnSymphony();

In the example you provided, you didn’t really cut down on the verbosity in a meaningful way; you just went from saying “here’s a new variable with the specified type” to “here’s a new variable.” Yes, the first approach potentially saves me a few keystrokes, but for whom are those keystrokes really a problem? I’ve been coding in Java awhile and never thought (or heard anyone else say), “Gee, I wish I didn’t have to define the type of a local variable—that would save me so much time and effort.”

Also, many type names are shorter—sometimes significantly so—than HaydnSymphony. In fact, I suspect that the somewhat longish name was deliberately chosen to exaggerate the supposed benefits of the proposal. Plus there’s also the fact that something like this is fairly common:

Symphony surprise = new HaydnSymphony();

Using the var approach here would not only eliminate useful information (the type I really care about versus the implementation that was instantiated), but all I’d get in exchange for the loss of readability is the time I saved not typing a whopping five characters. And that, in a nutshell, seems to be the problem with this proposal: the loss of code readability isn’t offset by a comparable gain in the amount of time it takes to create or maintain the code. It seems like a solution in search of a problem and a desperate attempt to find some way to change Java rather than something that will actually be beneficial for either beginning or experienced programmers.

Regarding val versus const, I have to disagree there, too. In all honesty, when I first saw your val example, I wondered what it was that identified it as a constant in val normalTemp = 98.6;.
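The Java 8 repeatable-annotation mechanics Alex Buckley describes can be sketched in a few lines. The annotation types below are simplified, hypothetical stand-ins for JPA’s @NamedQuery and @NamedQueries pair (the real ones have more members); the reflection call at the end is the actual java.lang.reflect.AnnotatedElement API:

```java
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-in for the container annotation type
@Retention(RetentionPolicy.RUNTIME)
@interface NamedQueries {
    NamedQuery[] value();  // holds the repeated annotations
}

// The repeatable type, linked to its container via @Repeatable (new in Java 8)
@Retention(RetentionPolicy.RUNTIME)
@Repeatable(NamedQueries.class)
@interface NamedQuery {
    String name();
    String query();
}

// With @Repeatable in place, no explicit @NamedQueries wrapper is needed:
@NamedQuery(name = "all", query = "SELECT c FROM Customer c")
@NamedQuery(name = "byName", query = "SELECT c FROM Customer c WHERE c.name = :n")
class Customer { }

public class RepeatableDemo {
    public static void main(String[] args) {
        // getAnnotationsByType looks inside the implicit container annotation
        for (NamedQuery q : Customer.class.getAnnotationsByType(NamedQuery.class)) {
            System.out.println(q.name());
        }
    }
}
```

Note that the compiler still generates the container behind the scenes, which is why getAnnotationsByType, rather than getAnnotation, is the right lookup for repeated annotations.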
//letters to the editor /
Then I realized that the earlier example (HaydnSymphony) used var while this one was using val. In other words, without careful scrutiny it’s easy to get them confused. Part of the problem is the visual similarity, but it’s also an issue of semantics: the name val by itself doesn’t imply something that’s constant. From a Java perspective, var doesn’t just mean something that can vary, because both officially (that is, in the Java Language Specification, which talks about “constant variables”) and informally (in your editorial), variable is used as an umbrella term that includes both constants and true variables. That’s also how it’s used in the vernacular by Java programmers, making var a poor choice for indicating something that’s mutable and nonfinal.

The one thing I do think would be an improvement is something that wasn’t even presented, specifically something like this:

const normalTemp = 98.6;

This provides the abbreviated grammar without the vagueness of val and seems like a no-brainer when compared to its current equivalent:

final float normalTemp = 98.6f;

Here the const keyword stands in for final, and it indicates that the type is to be inferred from the assigned value. There’s still a loss of explicit information (the variable being defined is a float), but at least now you’ve gone from two tokens down to one instead of trading one token for a different and less meaningful one.
—Brett Spell

Editor Andrew Binstock responds: Thank you for your very thoughtful, detailed note. You make some excellent points, even though my preferences run counter to yours. However, I do disagree with your dismissal of the value of fewer keystrokes. You see little benefit as the initial programmer. However, as a reader of code, being faced with long listings that repeat unneeded type information is a tiring exercise that delivers little value. While you view my example as purposely lengthy to help make my point, my concern at the time was that it was unrealistically short. Developers writing enterprise apps, especially ones that rely on frameworks, are familiar with (and tired of) dealing with data types that carry much longer names. For example, from the Spring framework: AbstractInterceptorDrivenBeanDefinitionDecorator. That, I grant you, is perhaps an extreme case; however, long class names are the norm in those applications.

Shorten from the Other End
Regarding your editorial in the March/April issue about the proposed additions to Java, why is the variability removed from the left side to be inferred from the right to mimic dynamically typed language syntax? Consider your example:

var surprise = new HaydnSymphony();

versus my recommendation:

HaydnSymphony surprise = new();

I prefer the fact that a statically typed language has a strong reference to the type a variable is declared to be. The verbosity comes in when you have to repeat it for the new operator. My approach also makes it easier in debugging, and not just in local scope use.
Similarly for the native or fixed-value types, I am less conflicted but still prefer the type information to be added to the variable declaration:

val normalTemp = 98.6;

would convey more information if it used the literal specifier of f:

const normalTemp = 98.6f;

From South Africa, thanks for a wonderful magazine.
—Rudolf Harmse

Why Not Left-Side “new”?
Instead of the var keyword, I propose an idea I submitted a long time ago to Project Coin. It’s the use of new on the left side of assignments. Instead of

HaydnSymphony surprise = new HaydnSymphony();

I suggest

new surprise = HaydnSymphony();

This avoids an additional keyword and is the simplest statement. As for val, I think using the reserved const keyword is a better solution, if in fact it’s truly a final constant and not just an initial assignment. I think more Java programmers come from the C and C++ languages than from Scala, although for younger programmers, Java or JavaScript is now their initial language. In any case, there’s less need for simplifying constant declarations than object declarations.
—Barry Kleinman

Short Books for Learning Java
I’m contemplating becoming a mobile game developer, and I’m struggling to grasp Java as a first-time programming language. Why is it so difficult to learn? The books that I have been studying are more than 1,000 pages long. It’s a serious commitment to make. Are most professional Java programmers self-taught? How should I train my mind for this? What should I be thinking of?
—Frederick Piña

Editor Andrew Binstock responds: Only a fraction of programmers are formally trained. Many are self-taught, and even those who are formally trained eventually need to be self-taught to keep up with advances in software development. When learning a new language, a common approach is to begin by writing small programs that do simple tasks. A classic, for example, is a program that accepts a centigrade temperature on the command line and displays the equivalent Fahrenheit temperature. If this approach of learning by doing—using small, incremental programs—appeals to you, then I recommend Murach’s Beginning Java. It comes in two flavors depending on which development environment you use—Eclipse or NetBeans—and it weighs in at well under 1,000 pages. I reviewed this book in the January/February 2016 issue, on page 12.

Contact Us
We welcome comments, suggestions, grumbles, kudos, article proposals, and chocolate chip cookies. All but the last two might be edited for publication. If your note is private, indicate this in your message. Write to us at javamag_us@oracle.com. For other ways to reach us, see the last page of this issue.
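The classic centigrade-to-Fahrenheit starter program Binstock mentions fits in a dozen lines. This is one possible sketch (the class and method names are mine, not from any book):

```java
public class CelsiusToFahrenheit {
    // F = C * 9/5 + 32
    static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        // e.g., java CelsiusToFahrenheit 100  ->  100.0 C = 212.0 F
        double c = Double.parseDouble(args[0]);
        System.out.println(c + " C = " + toFahrenheit(c) + " F");
    }
}
```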
//events /
GeeCON
MAY 11–13, KRAKOW, POLAND
GeeCON is a conference focused on Java and JVM-based technologies, with special attention to dynamic languages such as Groovy and Ruby. The event covers topics such as software development methodologies, enterprise architectures, design patterns, and distributed computing. More than 80 sessions are slated.

O’Reilly OSCON
MAY 16–19, AUSTIN, TEXAS
The popular open source conference moves to Texas this year, with two days of training and tutorials before the two-day conference. Topics this year include Go unikernels, scaling microservices in Ruby, Apache Spark for Java and Scala developers, and an Internet of Things (IoT) keynote from best-selling science fiction author and Creative Commons champion Cory Doctorow.

JavaCro
MAY 18–20, ROVINJ, CROATIA
JavaCro, hosted by the Croatian Association of Oracle Users and Croatian Java Users Association, will once again be held on St. Andrew Island, also known as the Red Island. Touted as the largest Java community event in the region, JavaCro is expected to gather 50 speakers and 300 participants.

JEEConf
MAY 20–21, KIEV, UKRAINE
JEEConf is the largest Java conference in Eastern Europe. The annual conference focuses on Java technologies for application development. This year offers five tracks and 45 speakers on modern approaches in the development of distributed, highly loaded, scalable enterprise systems with Java, among other topics.

jPrime
MAY 26–27, SOFIA, BULGARIA
jPrime is a relatively new conference with talks on Java, various languages on the JVM, mobile and web development, and best practices. Its second edition will be held in the Sofia Event Center, run by the Bulgarian Java User Group, and backed by the biggest companies in the city. Scheduled speakers this year include former Oracle Java evangelist Simon Ritter and Java Champion and founder of JavaLand Markus Eisele.

IndicThreads
JUNE 3–4, PUNE, INDIA
IndicThreads enters its 10th year featuring sessions on the latest in software development techniques and technologies, from IoT to big data, Java, web technologies, and more.

Devoxx UK
JUNE 8–10, LONDON, ENGLAND
Devoxx UK focuses on Java, web, mobile, and JVM languages. The conference includes more than 100 sessions, with tracks devoted to server-side Java, architecture and security, cloud and containers, big data, IoT, and more.

JavaDay Lviv
JUNE 12, LVIV, UKRAINE
More than a dozen sessions are planned on topics such as Java SE, JVM languages and new programming paradigms, web development and Java enterprise technologies, big data, and NoSQL.

Java Enterprise Summit
JUNE 15–17, MUNICH, GERMANY
Java Enterprise Summit is a large enterprise Java training event held in conjunction with a concurrent Micro Services Summit. Together, they feature 24 workshops covering topics such as the best APIs, new architecture patterns, JavaScript frameworks, and Java EE. (No English page available.)

JBCNConf
JUNE 16–18, BARCELONA, SPAIN
The Barcelona Java Users Group hosts this conference dedicated to Java and JVM development. Last year’s highlights included tracks on microservices and Kubernetes.

JavaOne Latin America
JUNE 28–30, SÃO PAULO, BRAZIL
The Latin American version of the premier Java event includes presentations by the core Java team, tutorials by experts, and numerous information-rich sessions, all focused tightly on Java.

The Developer’s Conference (TDC)
JULY 5–9, SÃO PAULO, BRAZIL
Celebrating its 10th year, TDC is one of Brazil’s largest conferences for students, developers, and IT professionals. Java-focused content on topics such as IoT, UX design, mobile development, and functional programming are featured. (No English page available.)

Java Forum
JULY 6–7, STUTTGART, GERMANY
Organized by the Stuttgart Java User Group, Java Forum typically draws more than 1,000 participants. A workshop for Java decision-makers takes place on July 6. The broader forum will be held on July 7, featuring 40 exhibitors and including lectures, presentations, demos, and Birds of a Feather sessions. (No English page available.)

JCrete
JULY 31–AUGUST 7, KOLYMBARI, GREECE
This loosely structured “unconference” will take place at the Orthodox Academy of Crete. A JCrete4Kids component introduces youngsters to programming and Java. Attendees often bring their families.

JavaZone
SEPTEMBER 7–8, OSLO, NORWAY
This event consists of a day of workshops followed by two days of presentations and more workshops. Last year’s event drew more than 2,500 attendees and featured 150 talks covering a wide range of Java-related topics.

JavaOne
SEPTEMBER 18–22, SAN FRANCISCO, CALIFORNIA
The ultimate Java gathering, JavaOne features hundreds of sessions and hands-on labs. Topics include the core Java platform, security, DevOps, IoT, scalable services, and development tools.

Send us a link and a description of your event four months in advance at javamag_us@oracle.com.

PHOTOGRAPH BY DIEGO TORRES SILVESTRE/FLICKR
PHOTOGRAPH BY DAVID HOLT/FLICKR
//java books /
JAVA TESTING WITH SPOCK
By Konstantinos Kapelonis
Manning Publications
The Spock testing framework has been around for many years. It was used originally by developers who wanted to leverage Groovy’s uniquely good support for scripting unit tests. Over its lifetime, it has evolved into a comprehensive framework that includes a unit-test runner (which drives JUnit), a mocking framework, and a behavior-driven development (BDD)–style testing tool. As a result, you can add Spock to your testing without the need for separate mocking/stubbing tools or behavior-testing add-ons.

I first started using Spock years ago because of its excellent support for data-driven tests (a leadership position it still retains). Since then, my admiration has only grown. The big knock on it during the intervening years was the need to learn Groovy scripting to write tests. But as Groovy has grown in popularity (driven in good part by Gradle), this design is seen increasingly as an advantage.

Correctly written, Spock tests can work at the unit level (for test-driven development, for example), for integration tests, for doing BDD, and for system-wide tests. Spock is one of the few tools that can be used by both developers and QA teams, without the excess complexity such a product would imply.

What Spock has lacked, however, is good documentation. Even today, its website is deficient in many ways (despite active product development and fairly active mailing lists). This book fills that gap. It provides what you need to get started (including integration with IDEs and build tools) and then pushes onward into exploiting Spock’s features, both its principal capabilities and the advanced options. To overcome the language barrier for developers mostly familiar with Java, Konstantinos Kapelonis provides a 30-page introduction to Groovy that presents the basic knowledge needed to write and run tests, even complex ones.

The author’s style is clear and rarely lacking. In addition, the examples he provides display his deep familiarity with Spock and the role it plays in modern development. Although the book is written for developers, Kapelonis dips into QA perspectives on testing for the issues raised. He also combines BDD with mocking, from which he elaborates an interesting mock-based approach to design—all of which gives this book a commendable applicability. Recommended.
—Andrew Binstock
APACHE SPARK 101
By Diana Carroll
In recent years, the amount of data that organizations process has grown astronomically—as much as hundreds of terabytes a day in some cases, and dozens or even hundreds of petabytes in total. This “big data” far exceeds what can be stored or processed on a single computer. Handling the volume, velocity, and variety of data required by modern applications has prompted many organizations to move to distributed systems, where clusters of dozens or hundreds of computers work together to solve their data ingestion, processing, and analysis needs.

But distributed programming is challenging: the complexity involved with keeping data and processes in sync while dealing with the reality of limited bandwidth and individual system failures initially meant programmers were spending more time on the plumbing of distributed systems than actually processing or analyzing their data.

One of the most successful approaches to solving these issues has been Apache Hadoop and its principal core components: the Hadoop MapReduce data processing engine and the Hadoop Distributed File System (HDFS) data storage platform. Hadoop has been widely adopted and is used today by many organizations. But Hadoop MapReduce has limitations: it is cumbersome to program; it fully supports only Java (with limited support for other languages); and it is bottlenecked by the requirement that data be read from disk and then written to disk after each task.

Apache Spark is designed as the next-generation distributed computing framework, and it takes Hadoop to the next level.

What Is Spark?
Spark is a fast, scalable, general-purpose distributed processing engine. It provides an elegant high-level API for in-memory processing and significant performance improvements over Hadoop MapReduce.

Spark includes not only the core API but also a rich set of libraries, which includes Spark SQL for interacting with structured or tabular data; Spark Streaming for processing streaming data in near real time; MLlib for machine learning; and GraphX for graph processing.

Spark is written in Scala, a scalable JVM language with a Java-inspired syntax. In addition to Scala, the Spark API also supports Java, Python, and R, making it easy to integrate with third-party libraries and accessible to developers with a wide range of backgrounds.

Spark was originally conceived and developed at Berkeley’s AMPLab. Now, it is an Apache project and is available directly from Apache or preintegrated with several Apache Hadoop distributions including those from Cloudera and other vendors.

Spark Architecture Overview
Although Spark can be run locally on a single computer for testing or learning purposes, it is more often deployed on a distributed computing cluster that includes the following:
■ A distributed data storage platform. This is most often HDFS, but Spark is increasingly being deployed on other distributed file storage systems such as Amazon Simple
//big data /
Storage Service (S3). As Spark emerges as the new default execution engine for the Hadoop ecosystem, support for it is increasingly included in new projects such as Apache Kudu (an Apache incubator project).
■ A cluster resource management platform. Hadoop YARN (which is part of core Hadoop) is the most common such platform for enterprise production systems. Spark also includes a small built-in cluster system called Spark Standalone, which is suitable primarily for small clusters, testing, or proof-of-concept deployment. Spark supports Apache Mesos as well. Because YARN is the most common, I have chosen to use it with the examples for this article, but the concepts are similar for all three supported cluster platforms.

Spark can be invoked interactively using the Spark shell (available for Scala or Python), or you can submit an application to run on the cluster. Both are shown as options in Figure 1. In both cases, when the application starts, it connects with the YARN Resource Manager. The Resource Manager provides CPU and memory resources on worker nodes within the cluster to run the Spark application, which consists of a single driver and a number of executors.

[Figure 1. Spark and YARN working together: an application submitted with "spark-submit --master yarn-cluster --class MyClass MyProgram.jar", or an interactive session started with "spark-shell --master yarn", connects to the YARN Resource Manager, which allocates Spark executors on the cluster’s worker nodes for the Spark driver program.]

The driver is the main program, which distributes and manages tasks that run on the executors. The driver and each executor run in their own JVM running on cluster worker nodes. (You can also configure your application so that the driver runs locally rather than on the cluster, but this is less common in production environments.)

Code Examples
To help demonstrate how the Spark API works, I walk through two code examples below. While Spark does support Java, the vast majority of installations use Scala. The examples are written in Scala and run in the interactive Spark shell. If you don’t know Scala, don’t worry; the syntax is similar to Java and my explanations will make the function of the code clear.

If you want to work through the examples on your own, you first need to download and run Spark. The easiest approach is to download the “Pre-built for Hadoop 2.6 and later” package of the latest release of Spark and simply unpack it into your home directory. Once it is unpacked, you can run the spark-shell script from the package’s bin directory. For simplicity, these instructions start the shell locally rather than on a Hadoop cluster. Detailed instructions on how to download and launch Spark are on the Spark website.

You also need to download and unpack the example data, from the Java Magazine download area. The code examples assume that you have unpacked the data directory (weblogs) into your home directory.

Example 1: Count Unique Visitors
The example data set is a collection of web server log files from the customer support site of a fictional mobile provider
called Loudacre. This is a typical line from the sample data (wrapped here to fit):

3.94.78.5 - 69827 [15/Sep/2013:23:58:36 +0100]
"GET /KBDOC-00033.html HTTP/1.0" 200 14417
"http://www.loudacre.com"
"Loudacre Mobile Browser iFruit 1"

The task in this first example is to find the number of unique site visitors—that is, the number of user IDs in the data set. User IDs are in the third field in the data file, such as 69827 in the example data above.

The Spark context. Every Spark program has a single Spark context object, an instance of the SparkContext class. When using the interactive Spark shell, this object is automatically created for you and given the name sc. When writing an application, you will create one yourself. The Spark context provides access to Spark functionality, such as defining and loading data sets and configuring the application. The Spark context is the entry point for all Spark functionality.

scala> val weblogs =
     |   sc.textFile("weblogs/*")

(Note that the Spark shell scala> prompt is shown at the beginning of each line. Line continuation is indicated with a pipe character: |. Do not enter the prompt or pipe character in the shell. Also, Microsoft Windows users might need to substitute the full Windows pathname of the example data directory, such as C:\\Users\\diana\\weblogs\\*.)

Because my code example uses the interactive Spark shell, I can use the precreated Spark context sc. The line of code above creates a new Resilient Distributed Dataset (RDD) object for the data in the specified set of files, and assigns it to a new immutable variable called weblogs. The files are located in your home directory in the default file system. (If you download Spark as described previously, the default file system is simply your computer’s hard drive. If you install Spark as part of a Hadoop distribution, such as Cloudera’s, the default file system is HDFS.)

Resilient distributed data sets. RDDs are a key concept in Spark programming. They are the fundamental unit of computation in Spark. RDDs represent a set of data, such as a set of weblogs in this example.

RDDs are distributed because the data they represent may be distributed across multiple executors in the cluster. The tasks to process that data run locally on the executor JVM where that data is located.

RDDs are resilient because the lineage of the data is preserved and, therefore, the data can be re-created on a new node at any time. Lineage is the sequence of operations that was applied to the base data set, resulting in its current state. This is important because one of the challenges of distributed computing is dealing with the possibility of node failure. Spark’s answer to this challenge is RDD lineage.

RDDs can contain any type of data, including objects and nested data, such as arrays and sets. In this example, the weblogs RDD contains strings—each element is a string corresponding to a single line in the files from which the data is loaded.

RDDs provide many methods to transform and interact with the data they represent. Below are two simple examples: count(), which returns the number of items in the data set, and take(n), which returns an array of the first n items in the data set.

scala> weblogs.count()
» Long = 574023

scala> weblogs.take(2)
» Array[String] = [3.94.78.5 - 69827
[15/Sep/2013:23:58:36 +0100]
"GET /KBDOC-00033.html HTTP/1.0"
200 14417 "http://www.loudacre.com"
"Loudacre Mobile Browser iFruit 1",

19.38.140.62 - 21475
[15/Sep/2013:23:58:34 +0100]
"GET /KBDOC-00277.html HTTP/1.0"
200 15517 "http://www.loudacre.com"
"Loudacre Mobile Browser Ronin S1"]

The character » indicates the result of a command executed in the Spark shell. [The blank line in the output was added to highlight the two array elements. —Ed.]

Transformations and actions. In addition to those shown in this example, Spark provides a rich set of dozens of operations you can perform with an RDD. These operations are categorized as either actions or transformations.

Actions return data from the executors (where data is processed) to the driver (such as the Spark shell or the main program). For example, you saw above that the count() action returns the number of data items in the RDD’s data set. To do this, the driver initiates a task on each executor to count its portion of the data, and then it adds those together to produce

immutable. Once loaded, the data associated with an RDD does not change; rather, you perform one or more transformations to create a new RDD with the data you need in the form you need.

Let’s examine one of the most common transformations, map(), continuing with the previous example. map() applies a function, which is passed to each item in the RDD, to produce a new item:

scala> val userids = weblogs.
     |   map(item => item.split(" ")(2))

You might be unfamiliar with the double-arrow operator (=>) in the call to the map method. This is how Scala represents lambda functions. Java 8, which also provides lambda functions, uses similar syntax with a single arrow (->).

In this case, the map() method calls the passed function once for each item in the RDD. It splits the weblog entry (a String) at each space character, and returns the third string in the resulting array. In other words, it returns the user ID for each item in the data set.

The results of the transformation are returned as a new RDD and assigned to the variable userids. You can take a
look at the results of the new RDD by calling the action meth-
the inal total. Other exam- many types of data ods described above, count() and take():
ples of actions include min(),
max(), first() (which returns
processing and analysis scala> userids.count()
the irst item from the data require aggregating » Long = 574023
set), take() (seen earlier), and data across multiple
takeSample() (which returns a Note that the total count is the same for the new RDD as
random sampling of items from
items. Fortunately, it was for the parent (base) RDD; this is always true when
the data set). the Spark API using the map transformation because map is a one-to-
A transformation is an RDD provides a rich set one function, returning one new item for every item in the
operation that transforms the existing RDD.
data in the base or parent RDDs
of aggregation take(2) returns an array of the irst two items—here, the
to create a new RDD. This is functions. user IDs from the irst two lines of data in the data set:
important because RDDs are 17
ORACLE.COM/JAVAMAGAZINE /////////////////////////////////////////////// MAY/JUNE 2016
//big data /
scala> userids.take(2)
» Array[String] = [69827,21475]

At this point, this example is almost complete; the task was to find the number of unique site visitors. So another step is required: to remove duplicates. To do this, I call another transformation, distinct(), which returns a new RDD with duplicates filtered out of the original data set. Then only a single action, count(), is needed to complete the example task.

scala> userids.distinct().count()
» Long = 12582

And there's the answer: there are 12,582 unique site visitors in the data set.
Note the use of chaining in the previous code snippet. Chaining a sequence of operations is a very common technique in Spark. Because transformations are methods on an RDD that return another RDD, transformations are chainable. Actions, however, do not return an RDD, so no further transformations can be appended on a chain that ends with an action, as in this example.

Example 2: Analyze Unique Visitors
I'll move on to a second example task to demonstrate some additional Spark features using pair RDDs. The task is to find all the IP addresses for each site visitor (that is, each unique user ID) who has visited the site.
Pair RDDs. The previous example (finding the number of distinct user IDs) involved two transformations: map and distinct. Both of these transformations work on individual items in the data set, either transforming one item into another (map), or retaining or filtering out an item (distinct). Single-item operations such as these are powerful, but many types of data processing and analysis require aggregating data across multiple items. Fortunately, the Spark API provides a rich set of aggregation functions. The first step in accessing these is to convert the RDD representing your data into a pair RDD.
A pair RDD is an RDD consisting of key-value pairs. Each element in a pair RDD is a two-item tuple. (A tuple in Scala is a collection similar to a Java list that contains a fixed number of items, in this case exactly two.) The first element in the pair is the key, and the second is the value. For this second example, you need to construct an RDD in which the key is the user ID and the value is the IP address for each item of data.

scala> val userIPpairs = weblogs.
     |   map(item => item.split(" ")).
     |   map(strings => (strings(2), strings(0)))

Several things are going on in the code snippet above. First, note the two calls to map in the same line. This is another example of chaining, this time the chaining of multiple transformations. This technique is very common in Spark. It does not change how the code executes; it is simply a syntactic convenience.
The first map call splits up each weblog line by space, similar to the first example. But rather than selecting just a single element of the array that is returned from the split, the whole array is returned containing all the fields in the current data item. Therefore, the result of the first map call is an RDD consisting of string arrays, one array for each item (weblog line) in the base RDD.
The input to the second map call is the result of the first map call—that is, the RDD consisting of string arrays. The second map call generates a pair (tuple) from each array: a key-value pair consisting of the user ID (third element) as the key and the IP address (first element) as the value.
Use first() to view the first item in the userIPpairs RDD:

scala> userIPpairs.first()
» (String, String) = (69827,3.94.78.5)

Note that the first item is, in fact, a pair consisting of a user ID and an IP address. The userIPpairs RDD consists of the same number of items as the base RDD, each user ID paired with a single IP address. Each pair corresponds to a single line in the weblog file.
Data aggregation. Now that the RDD is in key-value form, you can use several aggregation transformations, including countByKey (which counts the number of values for each key), reduceByKey (which applies a function you pass to sum or aggregate all the values for a key), join (which groups values associated with keys from two different RDDs), and mapValues (which applies a function to transform the value of each key). To complete the task for the second example, use the groupByKey transformation, which groups all the values for each key in the RDD into a single collection:

scala> userIPpairs.groupByKey().first()
» (69827, CompactBuffer(3.94.78.5,
190.196.39.14, 67.250.113.209,
67.250.113.209, …))

In the snippet above, the groupByKey transformation results in a new pair RDD, in which the pair's key is the same as it was before, a user ID, but the pair's value is a collection of all the values (IP addresses) for that key in the data set.

Conclusion
As you can see, Spark is a powerful, high-level API that provides programmers the ability to perform complex big data processing and analysis tasks on a distributed cluster, without having to be concerned with the "plumbing" that makes distributed processing difficult. This article has barely scratched the surface of Spark's capabilities and those of its add-on libraries, such as Spark Streaming and Spark SQL. For more information, start at the Apache Spark website. To explore Spark and the Hadoop stack independently, download the Cloudera QuickStart Virtual Machine or Docker image.

Diana Carroll is a senior curriculum developer at Cloudera. She has been working with and teaching about Spark since version 0.9 in late 2013. Carroll wrote and taught one of the first commercially available courses on Java in 1997, and has also produced more than 20 courses for software developers, system administrators, data analysts, and business users on subjects ranging from ecommerce to business process management.
expecting their first child, my wife and I came across a situation familiar to many parents: choosing a name for our baby. Being the software developer that I am, I turned to data to search for a name that my wife and I might both like.
A data set on the Kaggle data science website contains a list of baby names that have been chosen in the US for almost a century. This data set—one of many on the internet—greatly simplifies the search. Each record includes fields such as:
■■ Name
■■ Gender
Before this data can be queried and analyzed, it must first be processed into a more manageable format. There are several ways to transform it. For this example I'm using both RxJava and Apache Spark.
What is RxJava? If this is your first time hearing of Reactive Java, otherwise known as RxJava, it is defined this way in the official documentation:
"RxJava is a Java VM implementation of ReactiveX (Reactive Extensions): a library for composing asynchronous and event-based programs by using observable sequences."
What does this mean? RxJava is reactive programming, which in this case means programming with asynchronous data streams. These data streams can be anything from application variables to data structures. It doesn't matter. You can listen for these streams and react appropriately when they come in.
In the example of processing a CSV file, each line of the CSV file could be a data stream. When the line is read, RxJava reacts and transforms it as necessary.
What is Apache Spark? Apache Spark is a data-processing engine designed to handle massive amounts of data. While RxJava works well, it wasn't meant to handle the petabytes of information in an enterprise application. Spark offers in-memory data storage and processing, which makes it significantly faster than competing big data technologies. I'll use it for that purpose. [An accompanying article in this issue, "Apache Spark 101" (page 14), presents a deep dive into using Apache Spark. —Ed.]
Before querying and analyzing the loaded and transformed data, the goal is to get the unstructured data into a NoSQL database. For this example, I'll use the NoSQL document database, Couchbase.
What is Couchbase? Couchbase is an open source NoSQL document database. Document databases such as Couchbase store complex data using JSON, the same data format commonly used in public-facing APIs. For example, the following would be considered a JSON document:

{
  "first_name": "Nic",
  "last_name": "Raboy",
  "address": {
    "city": "San Francisco",
    "state": "California",
    "country": "USA"
  }
}

Notice that nested objects and arrays are fair game in a schemaless database such as this. The benefit of using a schemaless database instead of a relational database is flexibility in how the data is stored. You don't need to declare any constraints beforehand, and if the data is irregular you can add or omit fields as needed. Raw data can be a mess, and sometimes it's easier to store it without forcing it into a relational schema.
NoSQL document databases are not the only type of NoSQL database platforms, and NoSQL is not the only type of database platform. In the NoSQL database market segment, there are several kinds of options: document, key-value, tabular databases, and specialized products such as graph databases—all of which store data differently. Unlike relational databases, NoSQL databases store data in an unstructured way, as demonstrated in the JSON example.
Couchbase is distinguished by its memory-centric architecture. It is composed of a RAM layer and a persisted disk layer. Data is written to and read from the memory layer and asynchronously written to disk. This makes Couchbase very quick and a good resource to use with big data.
Couchbase Apache Spark Connector. Apache Spark also does
most of its data processing in memory, using hard disk only when necessary. Couchbase uses a Spark connector to move data into and out of the database directly. This uses a feature of Spark called resilient distributed data sets, most often referred to as RDDs.

Loading and Transforming the CSV Data into Couchbase
As mentioned a few times earlier, the CSV data for the baby names first needs to be loaded into a database before I start to process it. There are many ways to do this, but when working with potentially massive amounts of data, it would make the most sense to load this data either through RxJava or Apache Spark.
Coming from a Java background, you might not be familiar with big data tools such as Apache Spark, and that's not a problem. This CSV data set, with roughly 2 million records, can be loaded successfully using Java.
The requirements for loading with RxJava. There are a few dependencies that must be included in the Java project before attempting to load the CSV data into Couchbase:
■■ A CSV reader such as OpenCSV
■■ RxJava
■■ The Couchbase Java SDK
You can obtain all of these dependencies through Maven by including them in the project pom.xml file.
Developing an RxJava CSV loader. I'll create a class—it doesn't matter what I call it—that represents the RxJava way for processing the CSV file. I'll also create a class for the Apache Spark way later.
To load, but not read, the CSV file, I'll create a new CSVReader object, as follows:

CSVReader reader =
    new CSVReader(new FileReader("PATH_TO_CSV_FILE"));

Because the data will eventually be written to the database, I must connect to my server and open the bucket, which is a collection of NoSQL documents.

Bucket bucket =
    CouchbaseCluster.create("http://localhost:8091").
        openBucket("default", "");

This code assumes that Couchbase is running locally and the data will be saved in the default bucket without a password.
To process the CSV data set, I must create an RxJava Observable:

Observable
    .from(reader)
    .map(
        csvRow -> {
            JsonObject object = JsonObject.create();
            object
                .put("Name", csvRow[1])
                .put("Year", csvRow[2])
                .put("Gender", csvRow[3])
                .put("Count", csvRow[4]);
            return JsonDocument.create(
                csvRow[0], object);
        }
    )
    .subscribe(document -> bucket.upsert(document),
        error -> System.out.println(error));

Let's look at what is happening in the Observable. The CSVReader creates an Iterable<String[]>. The Observable will use the Iterable<String[]> as the source of data for the .from() method.
The data that is read will be an array of strings, not something that can be stored directly in the database. Using the
.map() function, the array of strings can be transformed into whatever I decide. In this case, the goal is to map each line of the CSV file to a database document. During this mapping process, I could do some data cleanup. For example, while I don't do it here, I could use something such as csvRow[*].trim() to strip out any leading and trailing whitespace in each CSV column.
Finally, I must save to the database each read line that's processed. The .subscribe() method from the previous snippet of code subscribes to notifications that the Observable emits—in this case, the manipulated data record.
After executing the RxJava class for loading CSV data, there should be almost 2 million documents created containing baby names, containing the fields specified in the Observable. An example of one of the documents might be this:

{
  "Name": "Nicolas",
  "Year": "1988",
  "Gender": "M",
  "Count": 400
}

Transforming raw data into JSON using Apache Spark. RxJava works great, but what if you're working with hundreds of millions of CSV records? Apache Spark was designed to process massive amounts of data. Using the same sample data as in the RxJava segment, I give Apache Spark a try.
The requirements for loading with Apache Spark. I must include a few dependencies in the Java project before I attempt to load the CSV data into Couchbase:
■■ Spark Core
■■ Spark CSV
■■ Spark SQL
■■ Couchbase Apache Spark Connector
You can obtain all of these dependencies through Maven by including them in your pom.xml file.
Developing an Apache Spark CSV loader. To use Apache Spark in the Java project, I must first configure it in the code:

SparkConf conf = new SparkConf()
    .setAppName("Simple Application")
    .setMaster("local[*]")
    .set("com.couchbase.bucket.default", "");
JavaSparkContext javaSparkContext =
    new JavaSparkContext(conf);

The application name will be Simple Application, and the master Spark cluster will be the local machine, because Spark will be running locally in this example. The Couchbase bucket I will use is once again the default bucket.
To proceed, I need to create a Spark DataFrame. DataFrames are distributed collections of data that are organized similarly to a relational database table and are particularly useful when it comes to CSV data because CSV data closely resembles a table.
To create a Spark DataFrame, I need to create a SQLContext. I can do this by using the JavaSparkContext. The JavaSparkContext represents a connection to the Spark cluster, which in this case is being run locally. With the JavaSparkContext, a SQLContext can be created that allows for the creation of DataFrames and the usage of SparkSQL:

SQLContext sqlContext =
    new SQLContext(javaSparkContext);

Using the SQLContext, the CSV data can be read like this:

DataFrame dataFrame = sqlContext.read()
    .format("com.databricks.spark.csv")
.option("inferSchema", "true") thing like the following:
.option("header", "true")
.load("PATH_TO_CSV_FILE"); /path/to/apache/spark/bin/spark-submit \
--class "com.app.Main" \
The read process will use the Spark CSV package and preserve target/project-jar-with-dependencies.jar
the header information that exists at the top of the CSV ile.
When read into a DataFrame, the CSV data is now something Querying the Data for a Perfect Baby Name
Couchbase can understand. There is no need to map the data, Until now, the raw CSV data containing the baby names has
as was done with RxJava. been transformed and saved as JSON data in the database.
I must make an adjustment to the ID data. Spark will rec- The goal of this project hasn’t been met yet. The goal was to
ognize it as an integer or numeric value because this data set come up with some nice baby name options. The project is in
has only numeric values in the column. Couchbase, however, a good position at the moment, because the data is now in a
expects a string ID, so this bit of Java code using the Spark format that can be easily queried.
API solves the problem: Choosing a great name with RxJava. With the naming data
loaded into Couchbase, it can now be queried. In this
dataFrame = dataFrame.withColumn( instance, I’m going to use RxJava to query the data to try to
"Id", df.col("Id").cast("string")); come up with a good baby name.
Let’s say, for example, that I want to name my baby using
I can now prepare the DataFrame for saving to the database: one of the most popular names. I could create the following
RxJava function:
DataFrameWriterFunctions
dataFrameWriterFunctions =
public void getPopularNames(
new DataFrameWriterFunctions(
String gender, int threshold) {
dataFrame.write());
String queryStr =
Map<String, String> options =
"SELECT Name, Gender, SUM(Count) AS Total " +
new Map.Map1<String, String>("idField", "Id");
"FROM 'default' WHERE Gender = $1 GROUP BY " +
"Name, Gender HAVING SUM(Count) >= $2";
With the DataFrame data piped into the appropriate Data JsonArray parameters = JsonArray.create()
FrameWriterFunctions object, I can map the ID value to a .add(gender)
document ID. At this point, I can save the data: .add(threshold);
ParameterizedN1qlQuery query =
dataFrameWriterFunctions. ParameterizedN1qlQuery.parameterized(
couchbase(options); queryStr, parameters);
this.bucket
By calling the Couchbase function of DataFrameWriter .query(query)
Functions, massive amounts of Couchbase documents will .forEach(System.out::println);
now be saved to the bucket. }
I can execute the project after I package it by doing some-
24
ORACLE.COM/JAVAMAGAZINE /////////////////////////////////////////////// MAY/JUNE 2016
//big data /
In the previous function, two parameters can be passed: the gender I am looking for and a threshold to determine popularity. Let's say, for example, my wife and I find out we're having a boy (I don't know yet), and we want to choose from a list of names that have been used more than 50,000 times. I could pass M and 50000 into the function, and the parameterized query would be executed. Each row that satisfies the query would be printed to the console logs.
Because the Couchbase Java SDK supports RxJava and a SQL-like language called N1QL, there are many more options when it comes to querying the name data.
Choosing a great name with Apache Spark. What if the data set were in the hundreds of millions of documents? While RxJava might perform well, it is probably best to use something such as Spark because it was designed for massive amounts of data.
To use Spark to get a list of popular baby names based on a gender and a threshold, I might do something like this:

public void getPopularNames(
        String gender, int threshold) {
    String queryStr = "SELECT Name, Gender, " +
        "SUM(Count) AS Total FROM `default` " +
        "WHERE Gender = $1 GROUP BY Name, " +
        "Gender HAVING SUM(Count) >= $2";
    JsonArray parameters = JsonArray.create()
        .add(gender)
        .add(threshold);
    ParameterizedN1qlQuery query =
        ParameterizedN1qlQuery.parameterized(
            queryStr, parameters);
    this.couchbaseSparkContext
        .couchbaseQuery(query)
        .foreach(queryResult ->
            System.out.println(queryResult));
}

Notice that the setup and parameterization is the same. The difference comes in where the query is actually executed. Instead of the built-in RxJava features of the Couchbase Java SDK, I use the Apache Spark Connector, which makes it possible to use Spark to run our application.

Conclusion
Let me be clear. I haven't actually chosen a name for my firstborn child, but if I wanted to make a decision in a statistical way through the use of RxJava and Apache Spark, I could. By using big data tools such as Apache Spark combined with a NoSQL database such as Couchbase, which offers a memory-centric architecture, you can process massive amounts of raw data quickly even on systems designed for personal-scale projects.

Nic Raboy (@nraboy) is a developer advocate for Couchbase in the San Francisco Bay Area. He has released several native and hybrid mobile applications to iTunes and Google Play and writes about his development experiences with a focus on making web and mobile app development easier to understand.

learn more
Code download from the Java Magazine download area
OpenCSV, a Java parser for CSV files
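For readers without a Couchbase server handy, the logic of the N1QL query in getPopularNames—filter by gender, group by name, sum the counts, and keep totals at or above a threshold—can be sketched with plain Java streams over a few made-up records. The NameRecord type and the sample numbers below are illustrative, not part of the Kaggle data set:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PopularNames {
    // A tiny stand-in for one baby-name document: Name, Gender, Count
    record NameRecord(String name, String gender, int count) {}

    // Mimics: SELECT Name, SUM(Count) ... WHERE Gender = $1
    //         GROUP BY Name HAVING SUM(Count) >= $2
    static Map<String, Integer> popularNames(
            List<NameRecord> records, String gender, int threshold) {
        Map<String, Integer> totals = records.stream()
            .filter(r -> r.gender().equals(gender))          // WHERE Gender = $1
            .collect(Collectors.groupingBy(NameRecord::name, // GROUP BY Name
                LinkedHashMap::new,
                Collectors.summingInt(NameRecord::count)));  // SUM(Count)
        totals.values().removeIf(total -> total < threshold); // HAVING clause
        return totals;
    }

    public static void main(String[] args) {
        List<NameRecord> records = Arrays.asList(
            new NameRecord("Nicolas", "M", 40000),
            new NameRecord("Nicolas", "M", 30000),
            new NameRecord("James", "M", 20000),
            new NameRecord("Emma", "F", 80000));
        System.out.println(popularNames(records, "M", 50000)); // {Nicolas=70000}
    }
}
```

The database does the same aggregation server side, which is what makes the N1QL approach practical once the data grows beyond what fits in one JVM.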
disappear after they’re read. This message retention has mul- lion messages, by replaying all the messages that led to the
tiple advantages when compared with the usual queue opera- bug triggering—but, more importantly, you can have coni-
tion of message removal: dence that the bug, rather than a bug, has been ixed.
■■ A message can be replayed as many times as needed. ■■ You can test a microservice replaying from the same input
■■ A day of production messages can be replayed in testing ile repeatedly without the producer or downstream con-
months later. sumers running.
26
ORACLE.COM/JAVAMAGAZINE /////////////////////////////////////////////// MAY/JUNE 2016
//big data /
■■ You can test every microservice independently because there is no flow control interaction between them. If you have flow control between, let's say, 20 services, any one of those services could slow down any producer and, in turn, its producer until the entire system locks up. This is something you need to test for. Without flow control, this can't happen.

Inside Chronicle Queue
What might be surprising is that Chronicle Queue is written entirely in pure Java. It can outperform many data storage solutions written in C. You might be wondering how that's possible, given that well-written C is usually faster than Java.
One problem with any complex application is that you need a degree of protection between your services and your data storage to minimize the risk of corruption. As Java uses a JVM, it already has an abstraction layer and a degree of protection. If an application throws an exception, this doesn't mean the data structure is corrupted. To get a degree of isolation in C, many data storage solutions use TCP to communicate with large data structures. The overhead of using TCP (even over loopback) can exceed the performance benefit of using C. Because Chronicle Queue supports sharing of the data structure in memory for multiple JVMs, it eliminates the need to use TCP to share data.
Memory management. Chronicle Queue is built on a class called MappedBytes in the package Chronicle-Bytes. It implements the message queue as a memory-mapped file. MappedBytes in turn maps the underlying file into memory in chunks, with an overlapping region to allow messages to easily pass from one chunk to another. The default chunk size is 64 MB, and the overlap is 16 MB. When you write just one byte, 80 MB (64 + 16) of virtual memory is allocated. When you get to the end of the first 64 MB, another region is mapped in and you get another 64 MB. It drops mappings on a least recently used (LRU) basis. The result is a region of memory that appears to be unbounded and can exceed the virtual memory limit of your machine, which is usually between 128 TB and 256 TB, depending on your operating system.
How do you load data into memory and save data? This is what the operating system does for you. It zeroes out memory pages you haven't used, loads from disk pages that have been used before, and saves data to disk asynchronously without the JVM being involved or even running. For example, if you write some data and your process dies, the data will still appear in cache and be written to disk. (That is, if the operating system didn't also die. To protect from operating system failure, we use TCP replication.)
Linux (and Windows similarly) allows up to 10 percent of main memory to be dirty/written to, but not saved to disk (with the default setting). A machine with 512 GB of main memory can have up to 51 GB of data uncommitted to disk. This is an enormous amount of data—and you can accept a burst of data this size with minimal impact on the application. Once this threshold is reached, the application is prevented from dirtying any new pages before some are written to disk.

Other Design Considerations
What if you don't have a big server? Even a PC with 8 GB will allow up to 800 MB of unwritten data. If your typical message size is 200 bytes, this is still a capacity for a sudden burst of 4 million messages. Even while these messages are being captured, data will be written to disk asynchronously. As writes are generally sequential, you can achieve the maximum write throughput of your disk subsystem.
Chronicle Queue also supports a memory-mapped HashMap. This is a good option if you need random access and your data set fits in memory; however, once your working set of data exceeds the main memory size, performance can drop by an order of magnitude.
A feature of the JVM that was essential for building this library was the use of sun.misc.Unsafe. While its use is highly discouraged, there wasn't a practical alternative up to
Java 8. In Java 9, I hope, we will see tions and use of-heap memory as the storage and transport
a replacement in the form of cus- By using a between processes. For example, the header of the Chronicle
tom intrinsics, which promise to be transparent Queue has a ield for the last acknowledged index so replica-
a more powerful approach. What tion can be monitored. This is wrapped as a LongValue inter-
makes intrinsics so important for
format, we can face. When this object is read, written, or atomically updated,
performance is that you don’t pay validate the the of-heap memory it points to is accessed. This value is
the cost of using the Java Native data structure both shared between processes and persisted by the operat-
Interface (JNI). While the cost is low, ing system without the need for a system call.
it’s not low enough if you’re calling
…it around a billion times per second. (An intrinsic is a method that the JVM recognizes and replaces with machine code to do the same thing. Usually this is a native method, but it can even be a Java method.)
The key benefit of using memory-mapped files is that you are no longer limited by the size of your heap, or even the size of your main memory. You are limited only by the amount of disk space you have. The cost of disk space is usually still much lower than main memory.
A surprising benefit of memory-mapped files is that you can use them in a mode in which there is only one copy in memory. This means that in the operating system's disk cache, the memory in process A and the memory in process B are shared. There is only one copy. If process A updates it, not only is this visible to process B in the time it takes the L2 caches to become coherent (that is, synchronize the data, which today typically takes around 25 nanoseconds), but the operating system can asynchronously write the data to disk. In addition, the operating system can predictively read ahead, loading data from disk when you access the memory (or files) sequentially.
Finally, an important benefit of using a memory-mapped file is the ability to bind a portion of memory to an object. Java doesn't have support for pointers to random areas of memory. We turn to an interface of getters, setters, and atomic operations.
The data structure in detail. Each entry is a blob with a prefix of four bytes. The prefix contains one bit indicating whether the entry is user data or metadata needed to support the queue itself, another bit indicates whether the message in the entry is complete or not, and the remaining 30 bits contain the length of the message.
When the message is not complete, it cannot be read. But if the length is known, a writer can skip such messages and attempt to write after them. If Thread1 is in the middle of writing a message but it doesn't know how long it will be, it can write four bytes, which contains the length. Thread2 can see that there will be a message and skip over it looking for a place to write. This way multiple threads can write to the queue concurrently. Any message that is detected as bad, such as a thread that died while writing, can be marked as bad metadata and skipped by the reader.
As the files grow without limit, you may need to compress or delete portions while the data structure is still in use. To support this, we have time-based file rolling. By default you get one file per day. This allows you to manage data simply by compressing, copying, and deleting this daily file. The rolling can also be performed more or less often, as required.
There is a special value that is a "poison pill" value, which indicates that the file has been rolled. This ensures that all writers and readers roll to the new file at the same point in a timely manner.
For the message data itself, we use a binary form of YAML to store the messages because it's a format that is designed to
//big data /
be human-readable. We view JSON and XML as technologies
aligned with JavaScript and SGML, respectively, and not as
readable. We support JSON for messaging with browsers. The
binary form of YAML is used for performance reasons, and it
can be automatically translated to YAML for viewing.
We didn't always use such a transparent format, and that placed a limit on how complex the data structure could be. By using a transparent format, we can validate the data structure and focus on portions of it at a time, allowing us to implement much more complex data structures as well as support backward compatibility for older formats.
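To make the earlier description concrete, the four-byte entry prefix (one metadata bit, one completeness bit, and 30 bits of length) can be modeled with plain bit operations. The following is an illustrative sketch only; the class and method names are hypothetical and this is not Chronicle Queue's actual code:

```java
// Sketch of the 4-byte entry prefix described in the article:
// bit 31: metadata flag, bit 30: complete flag, bits 0-29: message length.
public final class EntryPrefix {
    private static final int METADATA_BIT = 1 << 31;
    private static final int COMPLETE_BIT = 1 << 30;
    private static final int LENGTH_MASK  = (1 << 30) - 1;

    static int pack(boolean metadata, boolean complete, int length) {
        if ((length & ~LENGTH_MASK) != 0)
            throw new IllegalArgumentException("length must fit in 30 bits");
        return (metadata ? METADATA_BIT : 0)
                | (complete ? COMPLETE_BIT : 0)
                | length;
    }

    static boolean isMetadata(int prefix) {
        return (prefix & METADATA_BIT) != 0;
    }

    static boolean isComplete(int prefix) {
        return (prefix & COMPLETE_BIT) != 0;
    }

    static int length(int prefix) {
        return prefix & LENGTH_MASK;
    }
}
```

A reader that sees isComplete(prefix) == false can still use length(prefix) to skip over the in-flight entry, which is what makes the concurrent writing scheme described above possible.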
Append-only benefits. Chronicle Queue is designed for sequential writes and reads. It also supports random access and updates in place, although you can't change the size of an existing entry. You can minimize this limitation by padding an entry for future use.
Sequential reads and writes are more efficient for persistence to HDD and SSD disks. The append-only structure makes replication much simpler as well.
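The length-prefixed, memory-mapped layout discussed above can be sketched with standard java.nio. This is a toy illustration under simplifying assumptions (a single writer and a hypothetical file name), not Chronicle Queue's implementation:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Writes one length-prefixed record through a memory mapping and reads it back.
public class AppendDemo {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("demo-queue.dat");   // hypothetical file name
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapped region is limited by disk space, not by heap size.
            MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

            byte[] body = "hello".getBytes();
            map.putInt(0, body.length);   // 4-byte prefix holding the length
            map.position(4);
            map.put(body);                // message body follows the prefix

            // Read the record back through the same mapping.
            int len = map.getInt(0);
            byte[] read = new byte[len];
            map.position(4);
            map.get(read);
            System.out.println(new String(read));  // prints "hello"
        }
        Files.deleteIfExists(file);       // clean up the demo file
    }
}
```

Because the mapping is backed by the operating system's page cache, a second process mapping the same file would see the write without any copying, as the article explains.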
Conclusion
Using native memory to complement managed, on-heap memory can allow your Java application to exploit all the memory of your server. It can even extend the available data structure to exceed main memory by flushing to disk.
By using a simple data lifecycle and a transparent data structure, you can move a significant portion of your data out of the heap. This can improve the performance, stability, and scalability of your application.
Peter Lawrey (@PeterLawrey) has answered the most questions about Java, JVM, concurrency, memory, performance, and file I/O on StackOverflow.com (roughly 12,000), and he writes the Vanilla Java blog. He is the architect of Chronicle Software and a Java Champion.
MERT ÇALIŞKAN
JUnit, the widely used Java unit testing framework, has just seen the first alpha release of version 5 after 10 years on version 4. JUnit 5 consists of a revamped codebase with a modular architecture, a new set of annotations, an extensible model for third-party library integration, and the ability to use lambda expressions in assertions. The predecessor of JUnit 5 was the JUnit Lambda project, which sowed the first ideas for the next generation of unit testing and was crowdfunded until October 2015.
Through the years, JUnit has captured the essence of what a unit testing framework should be. However, its core mostly stayed intact, which made it difficult for it to evolve. This new version is a complete rewrite of the whole product that aims to provide a sufficient and stable API for running and reporting tests. Implementing unit tests with JUnit 5 requires Java 8 at a minimum, but it can run tests on code written for earlier versions of Java.
In this article, I describe the principal features of this early release of JUnit 5, illustrating them with detailed examples. All the code in this article is based on JUnit version 5.0.0-ALPHA, which was released in February 2016. The complete source code for the examples in this article is available at the Java Magazine download area.
The JUnit team is planning to ship a release candidate of the framework in the third quarter of 2016. Milestone 1 is one of the last steps before JUnit 5 officially ships. This will surely be one of the most consequential releases ever.

Configuring Tools to Use JUnit 5
JUnit 5 dependency definitions are available for both Maven and Gradle. For this article, I used Maven and its dependency definition for the JUnit 5 API. The following shows the Maven inclusion of JUnit 5:

<dependency>
    <groupId>org.junit</groupId>
    <artifactId>junit5-api</artifactId>
    <version>5.0.0-ALPHA</version>
    <scope>test</scope>
</dependency>

JUnit 5 now consists of multiple modules including the junit5-api module, which provides a transitive dependency, and opentest4j, which is named after the Open Test Alliance for the JVM project. Its aim is to provide a minimal common foundation for testing libraries, IDEs, and other tools such as TestNG, Hamcrest, Spock, Eclipse, Gradle, Maven, and many others.
In addition, JUnit 5 has the following modules:
■■ junit5-api, an API module that contains classes for implementing tests.
■■ junit4-engine, a JUnit 4 engine implementation. It locates and runs JUnit 4–based tests.
■■ junit5-engine, a JUnit 5 engine implementation module. It locates and runs JUnit 5–based tests.
■■ junit-engine-api, an abstraction API module for test engines, used by build tools and IDEs to run tests. It makes it possible to run JUnit 4 and JUnit 5 tests in a single run.
■■ junit-console, an API module for running JUnit 4 and JUnit 5 tests from the command line. It prints execution results to the console.
■■ junit-commons, a common API module that is being used by all modules.
■■ junit4-runner, an API module for running JUnit 5 tests on JUnit 4. This eases the migration of JUnit 4–based implementations to JUnit 5, because the IDEs and the build tools don't support JUnit 5 tests yet.
■■ surefire-junit5, a module that contains JUnitGen5Provider, which integrates with the Maven Surefire plugin for running JUnit 5 tests on JUnit 4.
■■ junit-gradle, a module that contains JUnit5Plugin, which integrates with Gradle builds for running JUnit 5 tests on JUnit 4.
One of the main goals of JUnit 5 modules is to decouple the API for executing the tests from the APIs for implementing the tests.

Anatomy of a JUnit 5 Test
Let's look at some JUnit 5 tests, starting with the simple JUnit test shown in Listing 1.

Listing 1.
import org.junit.gen5.api.Test;

class SimpleTest {

    @Test
    void simpleTestIsPassing() {
        assertTrue(true);
    }
}

For a simple JUnit 5 test class, such as the one shown in Listing 1, there is almost no difference to be seen at first glance when compared with a JUnit 4 test class. The main difference is that there is no need to have test classes and methods defined with the public modifier. Also, the @Test annotation—along with the rest of the annotations—has moved to a new package named org.junit.gen5.api, which must be imported.

Capitalizing on the Power of Annotations
JUnit 5 offers a revised set of annotations, which, in my view, provide essential features for implementing tests. The annotations can be declared individually or they can be composed to create custom annotations. In the following section, I describe each annotation and give details with examples.
@DisplayName. It's now possible to display a name for a test class or its methods by using the @DisplayName annotation. As shown in Listing 2, the description can contain spaces and special characters. It can even contain emojis.

Listing 2.
@DisplayName("This is my awesome test class")
class SimpleNamedTest {

    @DisplayName("This is my lonely test method")
    @Test
    void simpleTestIsPassing() {
        assertTrue(true);
    }
}
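For Gradle users, the same coordinates shown in the Maven example earlier can be declared in a build script. This is an illustrative fragment only (Gradle's pre-3.x testCompile configuration; as the module list notes, the junit-gradle plugin is what actually runs JUnit 5 tests under Gradle):

```groovy
// hypothetical Gradle build fragment using the article's coordinates
dependencies {
    testCompile 'org.junit:junit5-api:5.0.0-ALPHA'
}
```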
//testing /
@Disabled. The @Disabled annotation is analogous to the @Ignore annotation of JUnit 4, and it can be used to disable the whole test class or one of its methods from execution. The reason for disabling the test can be added as description to the annotation, as shown in Listing 3.

Listing 3.
class DisabledTest {

    @Test
    @Disabled("test is skipped")
    void skippedTest() {
        fail("feature not implemented yet");
    }
}

@Tags and @Tag. It's possible to tag test classes, their methods, or both. Tagging provides a way of filtering tests for execution. This approach is analogous to JUnit 4's Categories. Listing 4 shows a sample test class that uses tags.

Listing 4.
@Tag("marvelous-test")
@Tags({@Tag("fantastic-test"),
       @Tag("awesome-test")})
class TagTest {

    @Test
    void normalTest() {
    }

    @Test
    @Tag("fast-test")
    void fastTest() {
    }
}

You can filter tests for execution or exclusion by providing tag names to the test runners. The way of running ConsoleRunner is described in detail shortly. With ConsoleRunner, you can use the -t parameter for providing required tag names or the -T parameter for excluding tag names.
@BeforeAll, @BeforeEach, @AfterEach, and @AfterAll. The behavior of these annotations is exactly the same as the behavior of JUnit 4's @BeforeClass, @Before, @After, and @AfterClass, respectively. The method annotated with @BeforeEach will be executed before each @Test method, and the method annotated with @AfterEach will be executed after each @Test method. The methods annotated with @BeforeAll and @AfterAll will be executed before and after the execution of all @Test methods. These four annotations are applied to the @Test methods of the class in which they reside and they will also be applied to the class hierarchy, if any exists. (See the next section on test hierarchies.) The methods annotated with @BeforeAll and @AfterAll need to be defined as static.
@Nested test hierarchies. JUnit 5 supports creating hierarchies of test classes by nesting them inside each other. This option enables you to group tests logically and have them under the same parent, which facilitates applying the same initialization methods for each test. Listing 5 shows an example.

Listing 5.
class NestedTest {

    private Queue<String> items;

    @BeforeEach
    void setup() {
        items = new LinkedList<>();
    }

    @Test
    void isEmpty() {
        assertTrue(items.isEmpty());
    }
    @Nested
    class WhenEmpty {

        @Test
        public void removeShouldThrowException() {
            expectThrows(
                NoSuchElementException.class,
                items::remove);
        }
    }
}

        assertEquals(2 == 2, true);
    }

    @Test
    void assertionShouldBeTrueWithLambda() {
        assertEquals(3 == 2, true,
            () -> "3 not equals to 2!");
    }
}

With JUnit 5, it's also possible to assign the exception to a variable in order to assert conditions on its values, as shown in Listing 9.

Listing 9.
class Exceptions2Test {

    @Test
    void expectingArithmeticException() {
        StringIndexOutOfBoundsException exception =
            expectThrows(
                StringIndexOutOfBoundsException.class,
                () -> "JUnit5 Rocks!".substring(-1));

        assertEquals(exception.getMessage(),
            "String index out of range: -1");
    }
}

Listing 10.
class ResolversTest {

    @Test
    @DisplayName("my awesome test")
    void shouldAssertTrue(
            TestInfo info, TestReporter reporter) {
        System.out.println(
            "Test " + info.getDisplayName() +
            " is executed.");

        assertTrue(true);
        reporter.publishEntry(
            "a key", "a value");
    }
}

Instances of TestInfo or TestReporter can also be injected into methods annotated with @BeforeAll, @BeforeEach, @AfterEach, and @AfterAll.

…for dynamically resolving method parameters at runtime. So the parameter annotated with the marker annotation @InjectMock will be mocked and stored through methods annotated with @BeforeEach and @Test. You can access the source code for MockitoExtension and @InjectMock online.
Listing 12.
@RunWith(JUnit5.class)
public class RunWithTest {

    @Test
    public void simpleTestIsPassing() {
        org.junit.gen5.api.Assertions.
            assertTrue(true);
    }
}
Conclusion
The JUnit team has succeeded in offering a new, redesigned version of JUnit that addresses nearly all the limitations of previous versions. Note that the JUnit 5 API is still subject to change; the team is annotating the public types with the @API annotation and assigning values such as Experimental, Maintained, and Stable.
Give JUnit 5 a spin, and be prepared for the release that'll hit the streets in late 2016. Keep your green bar always on!
learn more
JUnit 5 official documentation
//new to java /
Understanding Generics
Use generics to increase type safety and readability.
MICHAEL KÖLLING
In this installment of the "New to Java" series, I want to talk about generics.
If you have programmed for a little while in Java, it is likely that you have come across generics, and you have probably used them. (They are hard to avoid when using collections, and it is hard to do anything really interesting without collections.) If you are coming to Java from C++, you might have encountered the same concept as generics under the name of parameterized types or templates. (Templates in C++ are not the same as generics in all aspects, but they are closely related.)
Many novice Java programmers use generics without a full understanding of how they work and what they can do. This gap is what I address in this article.
In Java, the concept of generics is simple and straightforward in principle but tricky in the details. There is much to say about the corner cases, and it is also interesting to look into how generics are implemented in the Java compiler and the JVM. This knowledge helps you understand and anticipate some of the more surprising behaviors.
My discussion is spread over two parts. In this issue, I discuss the principles and fundamental ideas of generic types. I look at the definition and use of generics and provide a basic, overall understanding. In the next issue of Java Magazine, I will look at the more subtle parts, advanced uses, and implementation. If you read both articles, you will arrive at a good understanding of how generics can help you write better code.

A Bit of History
Before Java 5 was released in 2004, Java did not have generic types. Instead, it had what are called heterogeneous collections. Here is what an example of a heterogeneous list looked like in those days:

List students = new ArrayList();

(Knowing this history is important, because you can still write this code in Java today—even though you shouldn't. And you might come across these collections in legacy code you maintain.)
In the example above, I intend to create a list of Student objects. I am using subtyping—the declared type of the variable is the interface List, a supertype of ArrayList, which is the actual object type created. This approach is a good idea because it increases flexibility. I can then add my student objects to the list:

students.add(new Student("Fred"));

When the time comes to get the student object out of my list again, the most natural thing to write would be this:

Student s = students.get(0);

This, however, does not work. In Java, in order to give the List type the ability to hold elements of any type, the add method was defined to take a parameter of type Object, and
the get method equally returns an Object type. This makes the previous statement an attempt to assign an expression of type Object to a variable of type Student—which is an error.
In the early days of Java, this was fixed with a cast:

Student s = (Student) students.get(0);

Now, writing this cast was always a mild annoyance. You usually know what types of objects are stored in any particular list—why can't the compiler keep track of them as well?
But the problem goes deeper than mere annoyance. Because the element type of the list is declared to be Object, you can actually add objects of any type to the same list:

students.add(new Integer(42));
students.add("a string");

The fact that this is possible—that the same list can hold objects of different types—is the reason it is referred to as heterogeneous: a list can contain mixed element types. Having lists of different, unrelated element types is rarely useful, but it is easily done in error.
The problem with heterogeneous lists is that this error cannot be detected until runtime. Nothing prevents you from accidentally inserting the wrong element type into the student list. Worse, even if you get the element out of the list and cast it to a Student, the code compiles. The error surfaces only at runtime, when the cast fails. Then a runtime type error is generated.
The problem with this runtime error is not only that it occurs late (at runtime, when the application might already have been delivered to a customer), but also that the source location of the detected error might be far removed from the actual mistake: you are notified about the problem when getting the element out, while the actual error was made when putting the element in. This might well be in a different part of the program entirely.

Java and Type Safety
Java was always intended to be a type-safe language. This means that type errors should be detected at compile time, and runtime type errors should not be possible. This aim was never achieved completely, but it's a goal that the language designers strove for as much as possible. Casting breaks this goal: every time you use a cast, you punch a hole in type safety. You tell the type checker to look the other way and just trust you. But there is no guarantee that you will get things right.
Many times, when a cast is used, the code can be rewritten: often better object-oriented techniques can be used to avoid casting and maintain type safety. However, collections presented a different problem. There was no way to use them without casting, and this jarred with the philosophy of Java. That such an important area of programming could not be used in a type-safe way was a real annoyance. Thus in 2004, the Java language team fixed this problem by adding generics to the Java language.

Type Loss
The term for the problem with heterogeneous collections is type loss. In order to construct collections of any type, the element type of all collections was defined to be Object. The add method, for example, might be defined as follows:

public void add(Object element)
This works very well to put elements—say, objects of type Student—into the collection, but it creates a problem when you want to get them out again. Even though the runtime system will know that the element is of type Student, the compiler does not keep track of this. When you later use a get method to retrieve the element, all the compiler knows is that the element type is Object. The compiler loses track of the actual element type—thus the term type loss.

Introduction to Generics
The solution to avoid type loss was to give the compiler enough information about the element type. This was done by adding a type parameter to the class or interface definition. Consider an (incomplete and simplified) definition of an ArrayList. Before generics, it might have looked like this:

class ArrayList {
    public void add(Object element);
    public Object get(int index);
}

The element type here is Object. Now, with the generic parameter, the definition looks as follows:

class ArrayList<E> {
    public void add(E element);
    public E get(int index);
}

An ArrayList holding Student objects can then be created like this:

new ArrayList<Student>()

Just as with parameters for methods, you have a formal parameter specification in the definition (the E) and an actual parameter at the point of use (Student). Unlike method parameters, the actual parameter is not a value but a type.
By creating an ArrayList<Student> (which is usually read out loud as "an ArrayList of Student"), the other mentions of the type parameter E in the specification are also replaced with the actual type parameter Student. Thus, the parameter type of the add method and the return type of the get method are now both Student. This is very useful: now only Student objects can be added as elements, and you retrieve Student objects when you get them out again—no casting is needed.
parameter, the deinition looks as follows:
This greatly reduces verbosity browsers and Node.js A Novel Friendly Type System
while remaining statically typed. virtual machines. Ceylon features a powerful type system based on the follow-
In Ceylon, functions can be ing parts:
invoked either with parenthe- ■■ Most types will be inferred (you’ve already seen this with
ments by name (path and service, in this code). In this certain type or is not null, the type checker will remember
case, the service argument is passed a trivial function that it and allow you to treat it as such.
writes hello world to the client. ■■ The type of null and of object values should be distinct.
Something unique to Ceylon is that out of the box, the tools An Object can never be null, and its type is distinct from
know how to compile and run this module. The IDE and even Null (the type of null).
the command-line version know how to deal with modules; ■■ Union types allow you to specify that a value should be of
resolve and fetch dependencies; set up classpaths; compile one of several types. For example, you can say that a value
and package modules—locally or even to remote module is of type String or of type Null with the String|Null
repositories; and all the tasks that other languages usually union type. You can access members common to both
need many tools to perform. This is all you need to do on the types, or narrow the type to one of the cases with low
command line to compile and run your module: typing: if(is String foo) then foo.lowercased
else "<null>".
$ ceylon compile,run hello ■■ Intersection types let you describe a value that should
class Counter(){
Optional Parameters // starts at zero, does not change on access
There are many great reasons to love Java. However, one fea- shared variable Integer count = 0;
ture I miss is the lack of optional parameters. Consider this // increments count every time it is accessed
Java code: shared Integer increase => ++count;
}
void log(String message){
log(message, Priority.INFO); // no need for "new":
} // classes are functions in Ceylon
void log(String message, Priority priority){ value counter = Counter();
log(message, priority, null); print(counter.count); // 0
} print(counter.increase); // 1
void log(String message, Priority priority, print(counter.count); // 1
Exception exception){
... There are many such examples of common Java itches that
} Ceylon scratches—and I haven’t even talked about how awe-
some function types are.
log("foo");
log("bar", debug);
// ouch, need to know the default The Ceylon SDK and Interoperability with
// value of "priority" Other Languages
log("gee", info, x); Ceylon compiles to both Java bytecode and JavaScript (a Dart
back end is nearing completion, too). This multiplatform
The corresponding code in Ceylon is this: support enables you to run Ceylon programs not just on the
JVMs you are used to, but also on browsers and Node.js virtual
void log(String message, Priority priority = info, machines (VMs). Once you have learned the Ceylon language
Exception? exception = null){
(which was designed to be easy to learn), you can reuse that
}
knowledge in your web applications on both the front and
log("foo"); back ends.
log("bar", debug); Even better, you’re not limited by what APIs you ind in
log{ a particular back end. When running on the JVM, you have
exception = x;
//jvm languages /
excellent interoperability with every library out there: not just the JDK, but also every OSGi and Maven module. (Ceylon also knows how to resolve Maven modules from the Maven central repository.) People have run Ceylon code interacting with Spring, WildFly, Vert.x, and even Apache Spark (written in Scala). When compiled for the JVM, Ceylon produces idiomatic Java APIs that adhere to conventions (JavaBeans, for example) in most cases, which makes interoperability from Java to Ceylon easy.
When running on JavaScript VMs, you can easily access whatever APIs your browser provides, by using dynamic blocks, which relax the type checker to access untyped APIs. At the moment, the Ceylon team is working on npm integration (allowing you to use npm modules and publish to npm repositories) and on TypeScript interfacing, so that you can provide a typed interface to many JavaScript APIs, such as the DOM and browser APIs.
If you want to write libraries that work on both JVM and JavaScript VMs, though, interoperability is not always enough (although it is possible to write a module that delegates to different interoperability code depending on the back end). If you had to limit yourself to the JDK for collections, or to the corresponding Node.js module, you would never be able to write portable Ceylon modules. For that reason, Ceylon comes with a full-fledged modern SDK containing most of what you need to write portable Ceylon applications.
Ceylon runs out of the box on OpenShift, Vert.x (where you can even write your verticles in Ceylon), WildFly (produce a WAR file from your Ceylon modules with the ceylon war command), and OSGi environments, or simply from the java command line.
Additionally, because the Ceylon JVM back end is based on the Java compiler (javac), it can compile both Java and Ceylon source files at the same time, allowing parts of your project to be written in both languages.

Reified Generics
When using generics in languages that erase type-argument information at compile time (that is, languages such as Java without reified generics), it is often frustrating that all this information is lost at runtime. For example, you cannot use those type arguments anymore to reason about them. Without reified generics, you cannot ask if a value is of an argument's type, or use introspection on that type argument. For example, the following code can only be implemented in languages that have reified generics, such as Ceylon:

shared ObjectArray<T> toArray<T>(List<T> list){
    // if you have a List<Foo>, you create a
    // JVM array of Foo[], not just Object[]
    value ret = ObjectArray<T>(list.size);
    variable value x = 0;
    for(elem in list){
        ret.set(x++, elem);
    }
    return ret;
}

Nor is it possible to write the following without reified generics:

shared List<Subtype>
narrow<Type,Subtype>(List<Type> origin)
        given Subtype extends Type {

    value ret = ArrayList<Subtype>(origin.size);
    for(elem in origin){
        // Here we can check that the run-time
        // type is an instance of Subtype, which
        // is a type parameter
        if(is Subtype elem){
            ret.add(elem);
        }
    }
    return ret;
}

Or to write this more simply using a comprehension, which in Ceylon is a way to declare Iterable literals whose elements are generated by procedural code:

shared List<Subtype>
narrow<Type,Subtype>(List<Type> origin)
        given Subtype extends Type {
    // return an ArrayList whose elements
    // consist of every element of origin
    // that is an instance of Subtype
    return ArrayList<Subtype>{
        for (elem in origin)
            if (is Subtype elem)
                elem
    };
}

Metamodel
Ceylon also features a metamodel, which enables you to inspect all running modules, their packages, and their members (types, functions, and values), optionally filtered by annotation. This is very useful when creating frameworks. In order to point a framework to your packages, modules, or types, you can even use metamodel literals, which are similar to Foo.class literals in Java, except they can point to any element of the Ceylon program. For example, here is how you can scan the current package for Java Persistence API (JPA) entities:

shared SessionFactory
createSessionFactory(Package modelPackage){
    value cfg = Configuration();
    for (klass in modelPackage
            .annotatedMembers<ClassDeclaration,
                Entity>()) {
        cfg.addAnnotatedClass(
            javaClassFromDeclaration(klass));
    }
    return cfg.buildSessionFactory();
}

value s = createSessionFactory(`package`);

Good Tooling
Ceylon tooling does an excellent job, and many tools designed for Ceylon will make development easier. The first tool is the Ceylon IDE for Eclipse, which is a full-featured IDE for Ceylon with interoperability with Eclipse Java development tools. This IDE is what I use every day to develop Ceylon modules. It is very solid and supports more refactorings, quick fixes, and features than the Eclipse Java plugin. Naturally, like the rest of the Ceylon tools, it knows how to deal with modules and module repositories, because they are first-class citizens of the language.
In a few months, the Ceylon team will release the first version of the Ceylon IDE for IntelliJ IDEA. The team has been abstracting all the Eclipse plugin code into an IDE-agnostic module for the last few months, rewriting it in Ceylon at the same time, so that it can be used in both IntelliJ and Eclipse without having to maintain quick fixes and refactorings for both platforms. In the end, most of both IDEs will be written in Ceylon—a task for which the easy interoperability with Java helps a lot.
If you just want to try Ceylon without installing anything, there is a great Ceylon web IDE. It features several code examples and will let you write your own code, too, and it provides code completion, error markers, and documentation. So don't hesitate to throw your Ceylon code at it until you need a full-fledged IDE.
The other programmer's best friend is the command line. Ceylon has an extensive command line, inspired by Git, with a single ceylon command, from which you can discover every other subcommand via completion or via the --help argument. With it, you can compile and run on both back ends, document your modules, list available versions of a module, search for modules, copy modules or module repositories, and even package and install new command-line plugins.
For example, if you want to install and use the ceylon format command, just type:

$ ceylon plugin install ceylon.formatter/1.2.2
# now you can use it!
$ ceylon format ...

The Ceylon API documentation generator outputs beautiful, usable documentation, as you can observe from the language module documentation online. One of the features I like most is the ability to search and navigate results from the keyboard by typing s, searching, and then using the keyboard arrows. Try it!
The Ceylon distribution ships with Ant plugins for all standard command-line subcommands. Maven and Gradle plugins for Ceylon have been written by the community and are available, too, in case you want to include Ceylon as part of an existing Java application.
As your Ceylon modules mature, they can graduate from your computer to Herd: the online module repository for Ceylon. This open source web application functions as the central repository for every public Ceylon module. This is where you will find the Ceylon SDK, for example (it does not need to be bundled with the distribution). Anyone can have an account on Herd to publish modules, but if you want private deployments, you can also download the web application and run your own instance of it. By default, Ceylon tools attempt to fetch modules from the Herd repository if they cannot be found locally, but it's very easy to add other repositories.
Since the release of version 1.2.2, you can also ship a Ceylon autoinstaller with your projects. This installer uses the ceylon bootstrap command, which functions like the famous Gradle Wrapper, gradlew: it is a very small library that autoinstalls the right version of Ceylon for users who do not have it installed locally.

Conclusion
It is impossible to cover the entire language and ecosystem in one article. But if this introduction has made you curious, check out a good tour of Ceylon online.
Ceylon 1.0 was released two years ago. Ceylon is now at the mature version of 1.2.2, with a release cycle of around two or three months, to bring you bug fixes and features as fast as possible. The team has already merged two big feature branches bringing support for writing Android applications in Ceylon, as well as rebasing the JVM back end on the Java 8 compiler, which allows you to produce Java 8 bytecode on top of the existing Java 7 bytecode that Ceylon already supports.
Join the friendly Ceylon community online and feel free to post your questions.

From deep in the mountains of Nice, France, Stéphane Épardaud (@UnFroMage) works for Red Hat on the Ceylon project, including the JVM compiler back end, various SDK modules, and the Herd module repository. He is a frequent speaker and is the co-lead of the Riviera Java User Group.

learn more
The official, in-depth tour of Ceylon
55
ORACLE.COM/JAVAMAGAZINE /////////////////////////////////////////////// MAY/JUNE 2016
Getting Onboard
Oracle Java Cloud Service
A hands-on, step-by-step guide to trying out an enterprise cloud
HARSHAD OAK
Conclusion
As this article has shown, using Oracle Java Cloud Service is simple enough and gets you scale and the other benefits that make the cloud a compelling proposition, especially for enterprise applications.

The Java cloud space has matured rapidly over the past few years. In its early days, many developers had concerns: “Can the cloud be tweaked to get exactly what I want? Will the cloud bring all the power and functionality that I am used to getting from my on-premises server? Will it be flexible enough for my business?” And so on. In my experience, the newer Oracle Java Cloud Service solutions enable you to do all that and more.
learn more
Oracle Developer Cloud Service
Oracle Managed Cloud Services