Escolar Documentos
Profissional Documentos
Cultura Documentos
John Feilmeier
Kutztown University
john.feilmeier@gmail.com
RapidMiner provides a convenient and thorough tutorial RapidMiner accepts CSV (Comma Separated Values) and
system. In the “Help” panel window, there is information WEKA’s ARFF (Attribute-Relation File Format)
about the selected process and a link to an interactive formatted files, as well as a plethora of other file types
tutorial. For the predictive operators, the modelling and database sources, including Twitter and Salesforce.
technique is explained, and its use is outlined. This is
handy for beginners and scholastic users, as the concepts Operators can be dragged from the “Operators” panel or
behind the operators are often as important as the results. inserted in the workflow by right-clicking. Each operator
Home users that sign up for a RapidMiner account will widget has specific input and output ‘ports’, which are
also receive invitations to webinars about current data detailed in the “Help” panel. When a process is complete,
mining techniques and trends. the “Run” button is pressed, and each operator in turn is
completed. If any operators are missing a connection, a
2.3 Data Visualization warning pops up and the process will not run. If an
operator in the workflow is not being used, it must be
The visualization of a data set is where RapidMiner really deactivated to allow the process to run. Some operators
shines. In the “Results” view, a smorgasbord of graph will only handle attributes of a certain type; this
styles will provide the scholastic user with informative information is not readily obvious until a run is attempted.
visual confirmation of attribute relationships, and the This caused great frustration with some operators, as a
graphs can be exported easily for presentation. There is process is run, an attribute changed, and another error
also an “Advanced Charts” option (fig. 2), where the user message appears.
can create their own chart style, choosing attributes,
colors, shapes, etc. The execution of some operators may hang up on larger
datasets; the author could not get some of the Rule
2.4 Attribute Selection operators to complete. No error is presented, but the
progress halts and must be terminated in the program. The
RapidMiner provides some operators to automatically operator can then be replaced with a different one, or the
remove attributes. “Remove Useless Attributes,” when data changed and other attempts made. While the
applied to the USGS stream dataset, did clean some of the operator’s execution stalled, the program itself did not
actually useless attributes (agency_code and tz_code) but crash.
also removed attributes that play an important role in
dissolved Oxygen levels (TimeOfDay). “Remove Managing the connections between different operators
Correlated Attributes” also helped to clean the data. The can be frustrating. One must hover over the correct thread
and click the “X” or right click and remove the
connection, and the connecting threads between multiple warnings from the workflow execution. The font seems a
connections tend to overlap. bit small and cannot obviously be changed in the GUI.
3.6 Result Presentation Orange has a variety of tutorials and documentation for
the beginner user of its visual programming software.
The results of the prediction methods must be scored, There is a text tutorial for initial setup, a series of videos
which produces an ungainly confusion matrix for larger covering the use of Orange and some theory behind the
datasets. This graph also contains Accuracy measures. different “widgets,” and detailed documentation for each
of these data operators, including examples of their use.
3.7 Impressions The videos are quick and clear in their description. There
is a blog which covers different techniques and new
It seems from the tutorial videos that this platform is widgets and their use. Orange also provides paid training
capable of very involved and thorough data analytics, but courses covering different aspects of data mining and
the lackluster visual presentations and minimal graphing analytics.
capabilities, combined with the ponderous GUI and
similarity to the more beginner-friendly RapidMiner leave 4.3 Data Visualization
KNIME Analytics Platform in the do-not-use category.
Orange has many different graphing and visualization
widgets. These are simply connected to the “File”
operator and clicked on to view the results. The graphs are
4. Orange
colorful and informative, allowing the user to customize
shape, color, and size. Switching between types of graph
“Data Mining, Fruitful and Fun. Open source machine is a bit cumbersome, each representation requiring a
learning and data visualization for novice and expert. different operator.
than “Fun.” Each operator/widget has an “input” side and
an “Output” side where appropriate. Clicking and