
11.7 Representing Discrete Categories: (1/8) Introduction


Introduction: Representing Discrete Categories
When text labels represent a finite set of possibilities, storing them in a cell array of strings uses more memory than necessary. Instead, you can use a categorical array.

11.7 Representing Discrete Categories: (3/8) Converting to and Operating on Categoricals


You can't compare items in a cell array using ==, nor can you use a histogram to visualize the contents of a cell array. However, you can do these things with a categorical array.

You can convert a cell array to a categorical array using the categorical function.

catArray = categorical(cellArray);

TASK
Convert x into a categorical array named y.

y = categorical(x)

Categorical arrays take up less memory than cell arrays of strings.

TASK
Issue the whos x y command to see how much memory each variable uses.

whos x y

You can see the categories represented in a categorical array by calling the categories function on the array.

TASK
Save the categories represented in y to cats.

cats = categories(y)

Categorical arrays allow the use of == for comparison.

y == 'A'
ans =
     0   0   0   1   0   1   0

TASK
Create a variable named iC which contains values of true corresponding to the values in y that equal C.

iC = y == 'C'

TASK
Create a variable named iNB which contains values of true corresponding to the values in y that are not equal to B.

iNB = y ~= 'B'
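
The conversion and comparison steps above can be tried together in one short script. This is only a sketch: the notes never show the contents of x, so the sample cell array below is an assumption, chosen to agree with the outputs that appear elsewhere in these notes (two A values, two B values, three C values).

```matlab
% Hypothetical sample data; the course's actual x is not shown in these notes.
x = {'C','B','C','A','B','A','C'};   % cell array of character vectors

y = categorical(x);      % convert the cell array to a categorical array
cats = categories(y);    % category names as a column cell array

iC  = (y == 'C');        % true where y equals C
iNB = (y ~= 'B');        % true where y is not B

whos x y                 % compare the Bytes column for x and y
```

The parentheses around the comparisons are optional; they only make it easier to see that the logical result is what gets assigned.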
11.7 Representing Discrete Categories: (5/8) Premier League Players
TASK
Convert the Position variable in players to a categorical array.

players.Position = categorical(players.Position)

TASK
Use the nnz function to count the number of values that are striker in the Position variable in players. Save the result in a variable named ns.

ns = nnz(players.Position == 'striker')
You can view the category names, along with the number of elements in each category, using the summary function.

summary(players.Team)
     Arsenal                3
     Crystal Palace         1
     Everton                1
     Liverpool              1
     Manchester City        1
     Manchester United      1
     Tottenham Hotspur      2

TASK
Print a summary of the Position variable in players.

summary(players.Position)

The mergecats function allows you to merge multiple categories into one, and even give it a new name.

x = categorical([ 2 1 2 3 ])
x =
     2      1      2      3
y = mergecats(x,{'1','3'},'C')
y =
     2      C      2      C

TASK
In the Position variable, merge attacking midfielder, defensive midfielder, and winger into a category named midfielder.

players.Position = mergecats(players.Position,{'attacking midfielder','defensive midfielder','winger'},'midfielder')

Compare (note the misspelled variable name Postion in this version, which does not match the Position variable):

players.Postion = mergecats(players.Postion,{'attacking midfielder','defensive midfielder','winger'},'midfielder')

TASK
In the Position variable, merge striker and forward into a category named forward.

players.Position = mergecats(players.Position,{'striker','forward'},'forward')
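
The two merges in this section can be checked on a small stand-alone array. This is a minimal sketch, assuming made-up position labels rather than the course's players table; pos and nForward are hypothetical names introduced only for illustration.

```matlab
% Hypothetical positions; this is not the course's players table.
pos = categorical({'striker','winger','forward', ...
                   'attacking midfielder','defensive midfielder','striker'});

% Merge three categories into a new category named midfielder.
pos = mergecats(pos, {'attacking midfielder','defensive midfielder','winger'}, 'midfielder');

% Merge striker into the existing forward category.
pos = mergecats(pos, {'striker','forward'}, 'forward');

nForward = nnz(pos == 'forward');   % count the elements now labeled forward
summary(pos)                        % print each category with its count
```

In the second call the new name is one of the merged categories, so forward simply absorbs striker.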
11.7 Representing Discrete Categories: (6/8) Category Names and Ordinals
By default, the possible values and the names of the categories will be determined automatically from the data. You can specify these as additional inputs to categorical, where the second input indicates the unique category values in the original array, and the third input indicates the names that correspond to these categories.

v = [ 10 5 0 0 ];
levels = { 'beg' 'mid' 'last' };
categorical(v,[0 5 10],levels)
ans =
     last      mid      beg      beg

TASK
Convert x into a categorical array with category names represented by xlevels. Assume that there are four categories, 1, 2, 3, and 4, in x. Store the result in a variable named y.

y = categorical(x,1:4,xlevels)
If your categories have an inherent ordering (for example, "low", "medium", and "high"), you can specify this with the optional 'Ordinal' property:

v = [ 10 5 0 0 ];
levels = { 'beg' 'mid' 'last' };
c = categorical(v,...
    [0 5 10],levels,...
    'Ordinal',true)
c > 'mid'
ans =
     1   0   0   0

This allows you to compare the values using operators such as greater than (>) and less than (<).

TASK
Create an ordinal variable named y with the data from x and levels represented by xlevels.

y = categorical(x,1:4,xlevels,'Ordinal',true)

Categorical arrays allow the use of == for comparison.

y == 'A';

TASK
Create a variable named iSmall which contains values of true corresponding to the values in y that equal 'small'.

iSmall = y == 'small'

TASK
Create a variable named idx which contains values of true corresponding to the values in y that are larger than tiny.

idx = y > 'tiny'
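
The ordinal tasks above can be reproduced end to end with stand-in data. This is a sketch under stated assumptions: the notes do not show x or xlevels, so the values and the four level names below are hypothetical (only 'small' and 'tiny' appear in the tasks).

```matlab
% Hypothetical data; the course's x and xlevels are not shown in these notes.
x = [2 4 1 3 2];
xlevels = {'tiny','small','medium','large'};   % names for the values 1, 2, 3, 4

% Map 1:4 onto the names in xlevels and mark the categories as ordered.
y = categorical(x, 1:4, xlevels, 'Ordinal', true);

iSmall = (y == 'small');   % true where y equals small
idx    = (y > 'tiny');     % true where y ranks above tiny
```

The relational comparison with > is only defined because 'Ordinal' is true; the order of the names in xlevels sets the ranking from lowest to highest.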
11.7 Representing Discrete Categories: (7/8) Premier League 2015-16 Season
TASK
Convert the ManagerNationality variable in teamInfo to a categorical.

teamInfo.ManagerNationality = categorical(teamInfo.ManagerNationality)

TASK
Print a summary of the categories in ManagerNationality.

summary(teamInfo.ManagerNationality)

How many managers are from the United Kingdom (UK)? You can combine the UK managers into one category, then view the categories to see.

TASK
Combine the ManagerNationality countries of England, Scotland, and Wales into a category named UK.

teamInfo.ManagerNationality = mergecats(teamInfo.ManagerNationality,{'England','Scotland','Wales'},'UK')

Summary: Representing Discrete Categories

x is a cell array.

You can convert x into a categorical array, y, using the categorical function.

y = categorical(x)

You can use == to create a logical array, and count elements using nnz.

nnz(y == 'C')
ans =
     3

You can view category statistics using the summary function.

summary(y)
     A      B      C
     2      2      3
11.8 Project - Organizing Data: (1/2) Tallest Buildings Exercise
TASK
Add the information in newBuildings to the end of buildings using vertical concatenation.

buildings = [ buildings ; newBuildings ]

TASK
Change the name of the variable Height_feet_ in buildings to HeightFeet.

buildings.Properties.VariableNames{'Height_feet_'} = 'HeightFeet'

TASK
Sort the values in buildings in order of decreasing height.

buildings = sortrows(buildings,'HeightFeet','descend')
You may double-click namesYears in the workspace to view the contents.

TASK
Add YearOfCompletion in namesYears to the data in buildings.

buildings = join(buildings,namesYears)

TASK
Convert the City variable of buildings to a categorical.

buildings.City = categorical(buildings.City)

TASK
Print a summary of the City variable in buildings.

summary(buildings.City)

TASK
In buildings.City, New York City appears as both New York City and NYC. Combine the two categories into a category with the name New York City.

buildings.City = mergecats(buildings.City,{'New York City','NYC'},'New York City')
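
The whole buildings workflow can be rehearsed on a miniature table. This is only a sketch: the building names, heights, and years below are made-up placeholders, not the course data, and the tables are deliberately tiny so each step is easy to inspect.

```matlab
% Hypothetical miniature tables; all names and numbers are placeholders.
Name = {'Tower A';'Tower B'};
Height_feet_ = [1500; 1800];
City = {'NYC';'Chicago'};
buildings = table(Name, Height_feet_, City);

Name = {'Tower C'};
Height_feet_ = 1650;
City = {'New York City'};
newBuildings = table(Name, Height_feet_, City);

% Append rows, rename the height variable, and sort tallest first.
buildings = [buildings; newBuildings];
buildings.Properties.VariableNames{'Height_feet_'} = 'HeightFeet';
buildings = sortrows(buildings, 'HeightFeet', 'descend');

% join matches rows on the variable the two tables share (Name).
Name = {'Tower B';'Tower A';'Tower C'};
YearOfCompletion = [2001; 1999; 2010];
namesYears = table(Name, YearOfCompletion);
buildings = join(buildings, namesYears);

% Convert City to categorical and fold NYC into New York City.
buildings.City = categorical(buildings.City);
buildings.City = mergecats(buildings.City, {'New York City','NYC'}, 'New York City');
```

Note that join keeps the row order of the first table, so the sort by height survives the merge even though namesYears lists the buildings in a different order.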
11.8 Project - Organizing Data: (2/2) All-Time Box Office Grosses
TASK
There are two tables in the workspace named movies and release. Merge them so that the ReleaseDate variable is added to the end of movies.

movies = join(movies,release)
TASK
Plot the adjusted gross revenue, AdjustedGross, against the release date, ReleaseDate, in movies.

If plotting with a visible line, put the dates in order first.

movies = sortrows(movies,'ReleaseDate')
plot(movies.ReleaseDate,movies.AdjustedGross)

TASK
Convert the data in Studio to categorical. Then make a histogram of the studios.

movies.Studio = categorical(movies.Studio)
histogram(movies.Studio)
TASK
Create a cell array named mgmMovies containing the titles for all movies released by the studio MGM. Determine the average release date for all MGM movies and save it to a datetime variable named mgmARD.

% Save titles for MGM movies
mgmIdx = movies.Studio == 'MGM'
mgmMovies = movies.Title(mgmIdx);
disp('MGM Movies:')
disp(mgmMovies)

% Average date of release
mgmARD = mean(movies.ReleaseDate(mgmIdx));
disp(['Average MGM Release Date: ',char(mgmARD)])
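
The MGM task combines a categorical comparison with logical indexing, which can be tried without the course workspace. This is a minimal sketch with placeholder data: the titles, studios, and dates below are invented for illustration and are not the course's movies table.

```matlab
% Hypothetical data; titles, studios, and dates are placeholders.
Title = {'Film A';'Film B';'Film C'};
Studio = categorical({'MGM';'Fox';'MGM'});
ReleaseDate = datetime([1939 12 15; 1977 5 25; 1965 12 22]);   % [year month day] rows
movies = table(Title, Studio, ReleaseDate);

mgmIdx = (movies.Studio == 'MGM');             % logical index of the MGM rows
mgmMovies = movies.Title(mgmIdx);              % cell array of the MGM titles
mgmARD = mean(movies.ReleaseDate(mgmIdx));     % mean of datetime values is a datetime

disp(['Average MGM Release Date: ', char(mgmARD)])
```

The same logical index is reused for both the titles and the dates, so the two results always refer to the same set of rows.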
