Dimensions

The Dimensions app provides functionality for querying dimension metadata, including the distribution of a dimension's values over a dataset.

Registry

Import this module to get access to dimension instances.

from msgvis.apps.dimensions import registry
time = registry.get_dimension('time') # returns a TimeDimension
time.get_distribution(a_dataset)

msgvis.apps.dimensions.registry.register(dimensionClass, kwargs)

Register a dimension.

msgvis.apps.dimensions.registry.get_dimension(dimension_key)

Get a specific dimension by key.

msgvis.apps.dimensions.registry.get_dimensions()

Get a list of all the registered dimensions.

msgvis.apps.dimensions.registry.get_dimension_ids()

Get a list of all the dimension keys.
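The registry functions above follow a common key-to-instance lookup pattern. The following is a minimal, self-contained sketch of that pattern, not the msgvis implementation. `SketchDimension`, the module-level `_registry` dictionary, the choice to instantiate the class with `kwargs` inside `register`, and the `KeyError` on unknown keys are all assumptions made for illustration.

```python
from collections import OrderedDict


class SketchDimension(object):
    """Hypothetical stand-in for a dimension class (not the msgvis class)."""

    def __init__(self, key, name=None):
        self.key = key
        self.name = name or key


# Assumption: dimensions are kept in registration order, keyed by `key`.
_registry = OrderedDict()


def register(dimension_class, kwargs):
    """Instantiate the class with kwargs and store the instance under its key."""
    dimension = dimension_class(**kwargs)
    _registry[dimension.key] = dimension
    return dimension


def get_dimension(dimension_key):
    """Look up one registered dimension; raises KeyError for unknown keys."""
    return _registry[dimension_key]


def get_dimensions():
    """Return all registered dimension instances."""
    return list(_registry.values())


def get_dimension_ids():
    """Return all registered dimension keys."""
    return list(_registry.keys())


register(SketchDimension, dict(key='time', name='Time'))
register(SketchDimension, dict(key='hashtags'))
print(get_dimension_ids())
```

Keeping the instances in one module-level mapping means callers only need a string key to reach a configured dimension.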

Models

msgvis.apps.dimensions.models.find_messages(queryset)

If the given queryset is actually a Dataset model, get its messages queryset.
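As a rough illustration of this unwrapping, the sketch below works on plain Python objects rather than Django models. `FakeDataset`, its `message_set` attribute, and the `isinstance` check are hypothetical; the real function may detect a Dataset differently.

```python
class FakeDataset(object):
    """Hypothetical stand-in for a Dataset model that holds messages."""

    def __init__(self, messages):
        self.message_set = list(messages)


def find_messages_sketch(queryset):
    """Return the messages of a dataset-like object, or the input unchanged."""
    if isinstance(queryset, FakeDataset):
        return queryset.message_set
    return queryset


messages = ['first message', 'second message']
print(find_messages_sketch(FakeDataset(messages)))
print(find_messages_sketch(messages))
```

The point of such a helper is that callers can pass either a dataset or a collection of messages to the same function.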

class msgvis.apps.dimensions.models.CategoricalDimension(key, name=None, description=None, field_name=None, domain=None)

A basic categorical dimension class.

Attributes:

  • key (str): A string id for the dimension (e.g. ‘time’)

  • name (str): A nicely-formatted name for the dimension (e.g. ‘Number of Tweets’)

  • description (str): A longer explanation for the dimension (e.g. “The total number of tweets produced by this author.”)

  • field_name (str): The name of the field in the database for this dimension (defaults to the key)

    Related to the Message model: if you want sender name, use sender__name.

is_categorical()

Return True for real categorical dimensions

filter(queryset, **kwargs)

Apply a filter to a queryset and return the new queryset.

exclude(queryset, **kwargs)

Exclude some points from a queryset and return the new queryset.

group_by(queryset, grouping_key=None, values_list=False, values_list_flat=False, **kwargs)

Return a ValuesQuerySet that has been grouped by this dimension. The group value will be available as grouping_key in the dictionaries.

The grouping key defaults to the dimension key.

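from django.db.models import Count

# Group the messages by this dimension, exposing the group value as 'value'
messages = dim.group_by(messages, 'value')
# Count the messages in each group
distribution = messages.annotate(count=Count('id'))
print(distribution[0])
# { 'value': 'hello', 'count': 5 }

select_grouping_expression(queryset, expression)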

Add an expression for grouping to the queryset’s SELECT. Returns the queryset plus the alias for the expression.

For categorical dimensions this is a no-op. Beware if your expression refers to a related table!

get_domain(queryset, **kwargs)

Get the list of values of the dimension, either in natural order or sorted by frequency. The values will be drawn from the queryset.
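The two orderings can be illustrated on a plain list of values. This is only a sketch: `domain_sketch` and its `by_frequency` flag are hypothetical names, "natural order" is assumed here to mean sorted order, and the real method draws values from a queryset and takes keyword arguments that are not documented here.

```python
from collections import Counter


def domain_sketch(values, by_frequency=False):
    """Return the distinct values, in sorted order or most frequent first."""
    counts = Counter(values)
    if by_frequency:
        # Most frequent first; ties are broken by value for determinism.
        ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [value for value, _ in ordered]
    return sorted(counts)


languages = ['en', 'fr', 'en', 'de', 'en', 'fr']
print(domain_sketch(languages))
print(domain_sketch(languages, by_frequency=True))
```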

get_domain_labels(domain)

Return a list of labels corresponding to the domain values.

get_grouping_expression(queryset, **kwargs)

Given a set of messages (possibly filtered), returns a string that could be used with QuerySet.values() to group the messages by this dimension.
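Grouping by a field name and then counting each group amounts to the following in plain Python. This sketch uses dictionaries in place of a Django queryset; `group_and_count` is a hypothetical helper, and only the `grouping_key` default (falling back to the field name) is modeled on the description of `group_by` above.

```python
from collections import Counter


def group_and_count(messages, field_name, grouping_key=None):
    """Group dict-like messages by field_name and count each group."""
    grouping_key = grouping_key or field_name
    counts = Counter(message[field_name] for message in messages)
    # One dictionary per group, sorted by group value for a stable order.
    return [{grouping_key: value, 'count': count}
            for value, count in sorted(counts.items())]


messages = [{'language': 'en'}, {'language': 'fr'}, {'language': 'en'}]
print(group_and_count(messages, 'language', 'value'))
```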

class msgvis.apps.dimensions.models.ChoicesCategoricalDimension(key, name=None, description=None, field_name=None, domain=None)

A categorical dimension where the values come from a choices set.

Don’t use for related fields.

class msgvis.apps.dimensions.models.RelatedCategoricalDimension(key, name=None, description=None, field_name=None, domain=None)

A categorical dimension where the values are in a related table, e.g. sender name.

Currently doesn’t really do much beyond CategoricalDimension.

Return True for related categorical dimensions

class msgvis.apps.dimensions.models.QuantitativeDimension(key, name=None, description=None, field_name=None, default_bins=50, min_bin_size=1)

A generic quantitative dimension. This works for fields on Message or on related fields, e.g. field_name=sender__message_count

get_range(queryset)

Find a min and max for this dimension, as a tuple. If there isn’t one, (None, None) is returned.
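The return contract can be sketched on a plain list of values. `get_range_sketch` is a hypothetical helper; skipping `None` values is an assumption, chosen because SQL `MIN` and `MAX` aggregates ignore NULLs.

```python
def get_range_sketch(values):
    """Return (min, max) of the non-None values, or (None, None) if none exist."""
    present = [value for value in values if value is not None]
    if not present:
        return (None, None)
    return (min(present), max(present))


print(get_range_sketch([3, None, 10, 7]))
print(get_range_sketch([]))
```

Callers should check for `(None, None)` before computing a bin size from the range.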

get_grouping_expression(queryset, bins=None, bin_size=None, **kwargs)

Generate a SQL expression for grouping this dimension. If you already know the bin size you want, you may provide it; alternatively, provide the number of bins.
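One common way to bin a numeric field is to derive a bin size from the value range and the desired number of bins, then round each value down to the start of its bin. The sketch below shows that arithmetic in plain Python. It illustrates the general technique and is not the SQL that msgvis generates; the parameter names `bins`, `bin_size`, and `min_bin_size` mirror the signatures above, while `choose_bin_size` and `bin_start` are hypothetical helpers.

```python
import math


def choose_bin_size(min_val, max_val, bins=50, min_bin_size=1):
    """Pick a bin size so that roughly `bins` bins cover [min_val, max_val]."""
    span = max_val - min_val
    size = int(math.ceil(span / float(bins)))
    # Never go below the configured minimum bin size.
    return max(size, min_bin_size)


def bin_start(value, bin_size):
    """Round a value down to the start of the bin that contains it."""
    return bin_size * int(math.floor(value / float(bin_size)))


bin_size = choose_bin_size(0, 1000, bins=50)
print(bin_size)
print([bin_start(v, bin_size) for v in (0, 19, 20, 999)])
```

Grouping on the bin start rather than the raw value is what keeps the number of groups bounded for fields with many distinct values.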

select_grouping_expression(queryset, expression)

Add an expression for grouping to the queryset’s SELECT.

Returns a queryset, grouping_key tuple. The grouping_key could be used in values to identify the grouping expression.

group_by(queryset, grouping_key=None, bins=None, bin_size=None, **kwargs)

Return a ValuesQuerySet that has been grouped by this dimension. The group value will be available as grouping_key in the dictionaries.

The grouping key defaults to the dimension key.

If neither bins nor bin_size is provided, an estimate will be used.

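from django.db.models import Count

# Group into 100 bins, exposing the bin value as 'value'
messages = dim.group_by(messages, 'value', 100)
# Count the messages in each bin
distribution = messages.annotate(count=Count('id'))
print(distribution[0])
# e.g. { 'value': 20, 'count': 5 } (illustrative; the group value is a bin of the field)

class msgvis.apps.dimensions.models.RelatedQuantitativeDimension(key, name=None, description=None, field_name=None, default_bins=50, min_bin_size=1)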

A quantitative dimension on a related model, e.g. sender message count.

class msgvis.apps.dimensions.models.TimeDimension(key, name=None, description=None, field_name=None, default_bins=50, min_bin_size=1)

A dimension for time fields on Message.

class msgvis.apps.dimensions.models.TextDimension(key, name=None, description=None, field_name=None, domain=None)

A dimension based on the words in a text field.

Return True for related categorical dimensions

class msgvis.apps.dimensions.models.DimensionKey(*args, **kwargs)

Dimension names for research questions.

key = None

The id of the dimension