Data Objects and Attribute Types
Record
- Relational records
- Data matrix, e.g., numerical matrix, crosstabs
- Document data: text documents: term-frequency vector
- Transaction data
Graph and network
- World Wide Web
- Social or information networks
- Molecular structures
Ordered
- Video data: sequence of images
- Temporal data: time-series
- Sequential data: transaction sequences
- Genetic sequence data
Spatial, image and multimedia
- Spatial data: maps
- Image data
- Video data
Data objects: data sets are made up of data objects.
A data object represents an entity.
Attributes
Attribute (or dimensions, features, variables): a data field representing a characteristic or feature of a data object.
Nominal
Binary
Ordinal
Numeric: quantity (integer or real-valued)
- Interval-scaled
- Ratio-scaled
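A minimal Python sketch of why the interval/ratio distinction matters, using made-up temperatures: Celsius is interval-scaled (0 °C is not a true zero point), while Kelvin is ratio-scaled (0 K is absolute zero). Differences agree on both scales, but only the Kelvin ratio is meaningful.

```python
# Illustrative values only: the same two temperatures on two scales.
celsius = (20.0, 10.0)                       # interval-scaled: 0 °C is not a true zero
kelvin = tuple(c + 273.15 for c in celsius)  # ratio-scaled: 0 K is absolute zero

# Differences are meaningful on both scales and agree (1 degree C = 1 kelvin).
diff_c = celsius[0] - celsius[1]
diff_k = kelvin[0] - kelvin[1]

# Ratios are meaningful only on the ratio scale.
ratio_c = celsius[0] / celsius[1]  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_k = kelvin[0] / kelvin[1]    # about 1.035

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```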
Basic Statistical Descriptions of Data
Motivation
Measuring the Central Tendency
Mean (algebraic measure) (sample vs. population)
Median
Mode
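The three measures of central tendency can be computed with Python's standard `statistics` module. The list below is an illustrative sample (think salaries in thousands), not data from these notes.

```python
import statistics

# Illustrative sample, already sorted for readability.
values = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# Mean: sum of the values divided by their count. The same formula gives the
# sample mean (x-bar) or the population mean (mu), depending on what the data covers.
mean = statistics.mean(values)

# Median: middle value of the sorted data; with an even count it is the
# average of the two middle values.
median = statistics.median(values)

# Mode: most frequent value(s); multimode returns every value tied for the top count.
modes = statistics.multimode(values)

print("mean:", mean, "median:", median, "modes:", modes)  # mean 58, median 54, modes [52, 70]
```

The single large value 110 pulls the mean above the median, which is why the median is often preferred for skewed data; the two tied modes make this sample bimodal.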
Measuring the Dispersion of Data
Quartiles, outliers and boxplots
Variance and standard deviation (sample: s, population: σ)
Five-number summary
Boxplot
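A sketch of these dispersion measures using Python's `statistics` module on an illustrative sample. Quartile definitions differ between textbooks and tools; this sketch assumes `method="inclusive"` interpolation and the common 1.5 × IQR outlier rule, so other tools may report slightly different quartiles.

```python
import statistics

# Illustrative sample (sorted so min and max are the first and last items).
values = sorted([30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110])

# Variance and standard deviation: sample (divide by n - 1) vs population (divide by N).
s2, s = statistics.variance(values), statistics.stdev(values)
sigma2, sigma = statistics.pvariance(values), statistics.pstdev(values)

# Quartiles and interquartile range, IQR = Q3 - Q1.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Outlier rule: values more than 1.5 * IQR below Q1 or above Q3.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_numbers = (values[0], q1, median, q3, values[-1])

# Boxplot whiskers extend to the most extreme values that are not outliers;
# outliers are then plotted individually.
inliers = [v for v in values if low_fence <= v <= high_fence]
whiskers = (min(inliers), max(inliers))

print("sample s^2, s:", round(s2, 2), round(s, 2))
print("population sigma^2, sigma:", round(sigma2, 2), round(sigma, 2))
print("five-number summary:", five_numbers)
print("outliers:", outliers, "whiskers:", whiskers)
```

With this convention Q1 = 49.25 and Q3 = 64.75, so the upper fence is 88.0 and 110 is flagged as the only outlier.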
Graphic Displays of Basic Statistical Descriptions
Boxplot
Histogram
Quantile plot
Quantile-quantile (q-q) plot
Scatter plot
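The quantile plot and q-q plot only need sorted data. This sketch computes the plot coordinates (drawing them is left to any plotting library), assuming the plotting position f_i = (i - 0.5)/n and, for the q-q plot, two samples of equal size. The function names and the branch prices are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value x_i with f_i = (i - 0.5) / n, for i = 1..n.

    Roughly f_i * 100 percent of the data lie at or below x_i; plotting
    the pairs (f_i, x_i) gives a quantile plot.
    """
    xs = sorted(values)
    n = len(xs)
    return [((i - 0.5) / n, x) for i, x in enumerate(xs, start=1)]


def qq_plot_points(sample_a, sample_b):
    """Pair the quantiles of two equal-sized samples for a q-q plot.

    With equal sizes, the i-th smallest value of one sample is simply
    paired with the i-th smallest value of the other.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch only handles samples of equal size")
    return list(zip(sorted(sample_a), sorted(sample_b)))


# Illustrative unit prices from two hypothetical branches.
branch_1 = [40, 43, 47, 52, 60]
branch_2 = [42, 45, 49, 50, 70]
print(quantile_plot_points(branch_1))
print(qq_plot_points(branch_1, branch_2))
```

Points of a q-q plot lying along the line y = x suggest the two samples have similar distributions.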
Data dispersion characteristics
Numerical dimensions
Dispersion analysis on computed measures
Data Visualization
Categorization of visualization methods:
Pixel-oriented visualization techniques
Geometric projection visualization techniques
Icon-based visualization techniques
1. Chernoff Faces
2. Stick Figures
Hierarchical visualization techniques
1. Dimensional Stacking
2. Worlds-within-Worlds
3. Tree-Map
4. InfoCube
5. Three-D Cone Trees
Visualizing complex data and relations
Measuring Data Similarity and Dissimilarity
Similarity
Data matrix
Dissimilarity
Dissimilarity matrix
Proximity
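A data matrix stores n objects by p attributes; a dissimilarity matrix stores only the pairwise distances d(i, j). Because d(i, j) = d(j, i) and d(i, i) = 0, the lower triangle is enough. A minimal sketch, assuming Euclidean distance (`math.dist`, Python 3.8+) as the dissimilarity; the helper name and the four points are illustrative.

```python
import math

def dissimilarity_matrix(data, dist):
    """Turn an n x p data matrix into a lower-triangular n x n dissimilarity matrix.

    Row i holds d(i, 0), ..., d(i, i-1) followed by d(i, i) = 0; the upper
    triangle is omitted because d(i, j) = d(j, i).
    """
    return [[dist(data[i], data[j]) for j in range(i)] + [0.0]
            for i in range(len(data))]


# Data matrix: 4 objects (rows) described by 2 numeric attributes (columns).
data = [(1, 2), (3, 5), (2, 0), (4, 5)]
for row in dissimilarity_matrix(data, math.dist):
    print(["%.2f" % d for d in row])
```

For these points the rows come out as [0.00], [3.61, 0.00], [2.24, 5.10, 0.00] and [4.24, 1.00, 5.39, 0.00].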
Simple matching
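Simple matching compares objects described by nominal attributes: if m of the p attributes match, d(i, j) = (p - m)/p, and the corresponding similarity is m/p. A minimal sketch with hypothetical objects:

```python
def simple_matching_dissimilarity(x, y):
    """d(i, j) = (p - m) / p for two objects described by p nominal attributes,
    where m is the number of attributes on which they match."""
    if len(x) != len(y) or not x:
        raise ValueError("objects must have the same, non-zero number of attributes")
    p = len(x)
    m = sum(a == b for a, b in zip(x, y))
    return (p - m) / p


# Hypothetical objects with three nominal attributes (color, size, shape).
d = simple_matching_dissimilarity(("red", "small", "round"), ("red", "large", "round"))
print(d)  # 1/3: two of the three attributes match
```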
A contingency table for binary data
Distance measure for symmetric binary variables
Distance measure for asymmetric binary variables
Jaccard coefficient
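The Jaccard coefficient is the same as “coherence”: $\mathrm{coherence}(i, j) = \frac{\sup(i, j)}{\sup(i) + \sup(j) - \sup(i, j)} = \frac{q}{(q + r) + (q + s) - q}$, where q counts attributes that are 1 for both objects, r those that are 1 for i and 0 for j, and s those that are 0 for i and 1 for j.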
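The binary measures all derive from the contingency counts q (both 1), r (1 for i, 0 for j), s (0 for i, 1 for j) and t (both 0): the symmetric distance (r + s)/(q + r + s + t), the asymmetric distance (r + s)/(q + r + s), which ignores negative matches t, and the Jaccard coefficient q/(q + r + s). A Python sketch with hypothetical patient records, each attribute coded 1 for Y/P and 0 for N:

```python
def contingency_counts(x, y):
    """Counts for two binary vectors (1 = present/positive, 0 = absent/negative):
    q: both 1, r: x is 1 and y is 0, s: x is 0 and y is 1, t: both 0."""
    pairs = list(zip(x, y))
    q = sum(1 for a, b in pairs if a == 1 and b == 1)
    r = sum(1 for a, b in pairs if a == 1 and b == 0)
    s = sum(1 for a, b in pairs if a == 0 and b == 1)
    t = sum(1 for a, b in pairs if a == 0 and b == 0)
    return q, r, s, t


def symmetric_binary_distance(x, y):
    # Both states are equally important, so 0-0 matches (t) count as agreement.
    q, r, s, t = contingency_counts(x, y)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_distance(x, y):
    # Negative matches (t) carry no information and are dropped.
    q, r, s, _ = contingency_counts(x, y)
    if q + r + s == 0:
        raise ValueError("undefined when both objects are all zeros")
    return (r + s) / (q + r + s)


def jaccard_coefficient(x, y):
    # Similarity for asymmetric binary data; equals 1 - asymmetric distance.
    q, r, s, _ = contingency_counts(x, y)
    if q + r + s == 0:
        raise ValueError("undefined when both objects are all zeros")
    return q / (q + r + s)


# Hypothetical patients with six asymmetric binary attributes each.
p1 = [1, 0, 1, 0, 0, 0]
p2 = [1, 0, 1, 0, 1, 0]
p3 = [1, 1, 0, 0, 0, 0]
pairs = {"p1-p2": (p1, p2), "p1-p3": (p1, p3), "p3-p2": (p3, p2)}
for name, (a, b) in pairs.items():
    print(name, round(asymmetric_binary_distance(a, b), 2),
          round(jaccard_coefficient(a, b), 2))
```

The asymmetric distances come out as 0.33, 0.67 and 0.75, so p1 and p2 are the most alike.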
Standardizing Numeric Data
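Standardizing puts attributes measured in different units on a common scale before distances are computed. A sketch of z-score standardization, z = (x - mean)/spread: by default it divides by the population standard deviation, and optionally by the mean absolute deviation, a variant some textbooks prefer because it is less affected by outliers. The function and the `use_mean_abs_dev` flag are names invented for this sketch.

```python
import statistics

def standardize(values, use_mean_abs_dev=False):
    """Return z-scores z = (x - mean) / spread for one numeric attribute.

    spread is the population standard deviation by default; with
    use_mean_abs_dev=True it is the mean absolute deviation
    s_f = (1/n) * sum(|x - mean|).
    """
    mean = statistics.fmean(values)
    if use_mean_abs_dev:
        spread = statistics.fmean(abs(x - mean) for x in values)
    else:
        spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError("attribute has no spread; z-scores are undefined")
    return [(x - mean) / spread for x in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardize(values))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(standardize(values, use_mean_abs_dev=True))
```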
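Minkowski distance: a popular distance measure
$L_h$ norm: $d(i, j) = \left(|x_{i1} - x_{j1}|^h + |x_{i2} - x_{j2}|^h + \cdots + |x_{ip} - x_{jp}|^h\right)^{1/h}$, where $i = (x_{i1}, \dots, x_{ip})$ and $j = (x_{j1}, \dots, x_{jp})$ are two p-dimensional data objects and h is the order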
h = 1: Manhattan (city block, L1 norm) distance
h = 2: Euclidean (L2 norm) distance
h → ∞: “supremum” (Lmax norm, L∞ norm) distance
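A direct implementation of the Minkowski distance and its three special cases, assuming plain Python sequences of numbers and using `math.inf` to stand for h → ∞:

```python
import math

def minkowski(x, y, h):
    """Minkowski (L_h) distance between two equal-length numeric vectors.

    h = 1 gives Manhattan, h = 2 Euclidean, and h = math.inf the supremum
    (L_max) distance, i.e. the largest absolute attribute difference.
    """
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if h == math.inf:
        return max(diffs)
    if h < 1:
        raise ValueError("h must be >= 1 for the result to be a metric")
    return sum(d ** h for d in diffs) ** (1 / h)


x1, x2 = (1, 2), (3, 5)
print(minkowski(x1, x2, 1))         # 5.0
print(minkowski(x1, x2, 2))         # about 3.61, i.e. sqrt(13)
print(minkowski(x1, x2, math.inf))  # 3
```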
Ordinal Variables
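Ordinal values can be handled like interval-scaled ones after replacing each value by its rank r in {1, ..., M} and normalizing with z = (r - 1)/(M - 1), so that every ordinal attribute lands in [0, 1]. A sketch with a hypothetical three-level attribute:

```python
def ordinal_to_interval(values, levels):
    """Map ordinal values to [0, 1] via z = (r - 1) / (M - 1).

    levels lists the M states in increasing order; r is the 1-based rank
    of a value among them. The results can then be treated as numeric.
    """
    M = len(levels)
    if M < 2:
        raise ValueError("need at least two ordered states")
    rank = {level: r for r, level in enumerate(levels, start=1)}
    return [(rank[v] - 1) / (M - 1) for v in values]


levels = ["fair", "good", "excellent"]
print(ordinal_to_interval(["excellent", "fair", "good", "excellent"], levels))
# [1.0, 0.0, 0.5, 1.0]
```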
Attributes of Mixed Type
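For objects with attributes of mixed type, one common approach combines per-attribute dissimilarities d_f into a weighted average d(i, j) = Σ_f δ_f d_f / Σ_f δ_f, where δ_f = 0 when a value is missing. The sketch below is simplified: the `specs` format is hypothetical, numeric ranges are passed in rather than computed from the data, ordinal states are assumed to span the full [0, 1] range after rank normalization, and the asymmetric-binary rule (δ_f = 0 for a 0-0 match) is left out.

```python
def mixed_dissimilarity(x, y, specs):
    """d(i, j) = sum_f delta_f * d_f / sum_f delta_f for mixed-type objects.

    specs[f] describes attribute f (hypothetical format for this sketch):
      {"kind": "nominal"}                      d_f = 0 if equal, else 1
      {"kind": "numeric", "range": max - min}  d_f = |x - y| / range
      {"kind": "ordinal", "levels": [...]}     rank-normalize, then as numeric
    delta_f = 0 when either value is missing (None), so f is skipped.
    """
    num, den = 0.0, 0
    for a, b, spec in zip(x, y, specs):
        if a is None or b is None:
            continue  # delta_f = 0: attribute does not contribute
        kind = spec["kind"]
        if kind == "nominal":
            d = 0.0 if a == b else 1.0
        elif kind == "numeric":
            d = abs(a - b) / spec["range"]
        elif kind == "ordinal":
            levels = spec["levels"]
            # |z_a - z_b| with z = (r - 1) / (M - 1)
            d = abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        num += d
        den += 1
    if den == 0:
        raise ValueError("no attribute is usable for this pair")
    return num / den


specs = [
    {"kind": "nominal"},
    {"kind": "ordinal", "levels": ["fair", "good", "excellent"]},
    {"kind": "numeric", "range": 64 - 22},  # assumed observed max minus min
]
obj1 = ("code A", "excellent", 45)
obj2 = ("code B", "fair", 22)
obj3 = ("code A", None, 64)  # missing ordinal value
print(round(mixed_dissimilarity(obj1, obj2, specs), 2))
print(round(mixed_dissimilarity(obj1, obj3, specs), 2))
```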
Cosine Similarity
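$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$, where $d_1 \cdot d_2$ is the vector dot product and $\|d\|$ is the Euclidean length of vector d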
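Cosine similarity is typically applied to sparse term-frequency vectors such as those used for document data. A plain-Python sketch; the two vectors are hypothetical counts over the same ten terms.

```python
import math

def cosine_similarity(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||) for two equal-length vectors."""
    if len(d1) != len(d2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


# Hypothetical term-frequency vectors for two documents.
d1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
d2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]
print(round(cosine_similarity(d1, d2), 2))  # 0.94
```

Here d1 · d2 = 25, ||d1|| = √42 and ||d2|| = √17; a value near 1 means the documents use terms in nearly the same proportions, regardless of their lengths.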