Another useful subcategory of record data is document data. It is
somewhat similar to a data matrix, in that every entry,
every data attribute, has a numeric value. But here
the values are counts, so they’re discrete. Each row, each data object, is
represented by what we call a term vector. There are several ways you can
build a term vector, but in this case it just counts the
number of times a given word appears
in the document. So document 1 has team appear
three times and play appear five times, but coach doesn’t appear at all. Document 2, on the other hand,
has coach appear seven times, but play never appears over
the course of the document. So because these attributes
are all discrete, because they’re all
integer attributes, different kinds of algorithms
and processing methods are appropriate for document data
than for data matrices or mixed data.

All right, so the
last special kind of record data that we’re
going to talk about here is transaction data. It shares some
similarities with document data, and you can use some
of the same analysis, but the semantics
are different as well. Transaction data is
exactly what it sounds like: it’s record data where each
record involves a set of items. If we’re at a
grocery store, the set of products purchased
by a customer during one shopping trip
constitutes a transaction, and the individual products that
were purchased are the items. So the difference between
this and document data is that these items usually
have more information associated with them than just a count. It’s not only bread: there’s
a price associated with it, and maybe an
inventory level, how many are left,
all of those sorts of things. So we can do things
similar to document analysis, but there’s other
information we have to consider as well. So that’s transaction data.
