Corpus

The Corpus app is concerned with the representation of raw message data and its associated metadata.

Models

class msgvis.apps.corpus.models.Dataset(*args, **kwargs)[source]

A top-level dataset object containing messages.

name = None

The name of the dataset

description = None

A description of the dataset.

created_at = None

The datetime.datetime when the dataset was created.

start_time = None

The time of the first real message in the dataset

end_time = None

The time of the last real message in the dataset

get_example_messages(filters=[], excludes=[])[source]

Get example messages that match the given filters. Each filter is a dictionary containing a dimension and its filter parameters.
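The exact shape of the filter dictionaries is not specified in this section. As a rough sketch of the idea, assuming each filter is a dictionary with hypothetical `dimension` and `value` keys, the selection logic can be illustrated on plain dictionaries without Django. The helpers `matches` and `example_messages` are illustrative only and are not part of msgvis.

```python
def matches(message, condition):
    """Return True if the message has the given value for the dimension.

    The "dimension" and "value" keys are assumptions made for this sketch.
    """
    return message.get(condition["dimension"]) == condition["value"]


def example_messages(messages, filters=None, excludes=None):
    """Keep messages that satisfy every filter and none of the excludes."""
    filters = filters or []
    excludes = excludes or []
    return [
        m for m in messages
        if all(matches(m, f) for f in filters)
        and not any(matches(m, e) for e in excludes)
    ]


# Stand-in messages; real Message objects are database-backed models.
messages = [
    {"text": "hello", "language": "en", "type": "original"},
    {"text": "RT hello", "language": "en", "type": "retweet"},
    {"text": "hola", "language": "es", "type": "original"},
]

selected = example_messages(
    messages,
    filters=[{"dimension": "language", "value": "en"}],
    excludes=[{"dimension": "type", "value": "retweet"}],
)
```

With no filters or excludes, the sketch returns every message unchanged.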

class msgvis.apps.corpus.models.MessageType(*args, **kwargs)[source]

The type of a message, e.g. retweet, reply, original, system...

name = None

The name of the message type

class msgvis.apps.corpus.models.Language(*args, **kwargs)[source]

Represents the language of a message or a user

code = None

A short language code like ‘en’

name = None

The full name of the language

class msgvis.apps.corpus.models.Url(*args, **kwargs)[source]

A URL from a message.

domain = None

The root domain of the URL.

short_url = None

A shortened form of the URL.

full_url = None

The full URL.
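How the domain is derived from the full URL is not stated here. A minimal sketch, assuming the host name with any leading `www.` removed is an acceptable approximation, could use the standard library. The helper `url_domain` is hypothetical, and finding a true registrable root domain would additionally need a public suffix list.

```python
from urllib.parse import urlparse


def url_domain(full_url):
    """Return the lowercased host of a URL, without a leading 'www.'.

    This approximates a root domain; it does not consult a public suffix list.
    """
    host = urlparse(full_url).hostname or ""
    return host[4:] if host.startswith("www.") else host


domain = url_domain("https://www.example.com/articles/42?ref=feed")
```

Strings that do not parse as a URL with a host yield an empty string in this sketch.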

class msgvis.apps.corpus.models.Hashtag(*args, **kwargs)[source]

A hashtag in a message

text = None

The text of the hashtag, without the hash
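To illustrate what "without the hash" means, here is a small sketch that pulls hashtag text out of a message string. The pattern and the helper `hashtag_texts` are assumptions for illustration; platform tokenizers (for example Twitter's) apply more detailed rules.

```python
import re

# One '#' followed by word characters; the capture group drops the '#'.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def hashtag_texts(message_text):
    """Return the hashtag texts in a message, without the leading '#'."""
    return HASHTAG_PATTERN.findall(message_text)


tags = hashtag_texts("Loving the #weather in #Seattle today")
```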

class msgvis.apps.corpus.models.Media(*args, **kwargs)[source]

Linked media, e.g. photos or videos.

type = None

The kind of media this is.

media_url = None

A URL where the media may be accessed.

class msgvis.apps.corpus.models.Timezone(*args, **kwargs)[source]

The timezone of a message or user

olson_code = None

The timezone code from pytz.

name = None

An alternative name for the timezone, such as the country where it is located.

class msgvis.apps.corpus.models.Person(*args, **kwargs)[source]

A person who sends messages in a dataset.

dataset

Which Dataset this person belongs to

original_id = None

An external id for the person, e.g. a user id from Twitter

username = None

A short, system-level name for the person, such as a login handle.

full_name = None

A longer, user-friendly name for the person.

language

The person’s primary Language

message_count = None

The number of messages the person produced

replied_to_count = None

The number of times the person’s messages were replied to

shared_count = None

The number of times the person’s messages were shared or retweeted

mentioned_count = None

The number of times the person was mentioned in other people’s messages

friend_count = None

The number of people this person has connected to

follower_count = None

The number of people who have connected to this person

profile_image_url = None

The URL of the person’s profile image.

class msgvis.apps.corpus.models.Message(*args, **kwargs)[source]

The Message is the central data entity for the dataset.

dataset

Which Dataset the message belongs to

original_id = None

An external id for the message, e.g. a tweet id from Twitter

type

The MessageType of the message, e.g. retweet, reply, original...

sender

The Person who sent the message

time = None

The datetime.datetime (in UTC) when the message was sent
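Because the time is stored in UTC, a local timestamp needs to be normalized before it is assigned. A minimal sketch using only the standard library follows; the helper `to_utc` and the fixed UTC-8 offset are illustrative assumptions, not part of msgvis.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time):
    """Convert a timezone-aware datetime to UTC."""
    if local_time.tzinfo is None:
        # A naive datetime has no offset, so the conversion would be a guess.
        raise ValueError("expected a timezone-aware datetime")
    return local_time.astimezone(timezone.utc)


# A fixed UTC-8 offset stands in for a real timezone in this example.
utc_minus_8 = timezone(timedelta(hours=-8))
sent_local = datetime(2015, 1, 15, 9, 30, tzinfo=utc_minus_8)
sent_utc = to_utc(sent_local)
```

Rejecting naive datetimes is a design choice in this sketch: it avoids silently interpreting them in the server's local timezone.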

language

The Language of the message.

sentiment = None

The sentiment label for the message.

timezone

The Timezone of the message.

replied_to_count = None

The number of replies this message received.

shared_count = None

The number of times this message was shared or retweeted.

contains_hashtag = None

True if the message has a Hashtag.

contains_url = None

True if the message has a Url.

contains_media = None

True if the message has any Media.

contains_mention = None

True if the message mentions any Person.

urls

The set of Url objects in the message.

hashtags

The set of Hashtag objects in the message.

media

The set of Media objects in the message.

mentions

The set of Person objects mentioned in the message.

text = None

The actual text of the message.
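The `contains_*` flags on Message summarize whether the corresponding sets (`urls`, `hashtags`, `media`, `mentions`) are non-empty. The sketch below shows that relationship with a plain dataclass. It is an assumption for illustration: `MessageSketch` is hypothetical, and in the model above the flags are listed as stored attributes rather than computed properties.

```python
from dataclasses import dataclass, field


@dataclass
class MessageSketch:
    """A simplified stand-in for Message, holding only text and related sets."""
    text: str
    urls: list = field(default_factory=list)
    hashtags: list = field(default_factory=list)
    media: list = field(default_factory=list)
    mentions: list = field(default_factory=list)

    @property
    def contains_url(self):
        return bool(self.urls)

    @property
    def contains_hashtag(self):
        return bool(self.hashtags)

    @property
    def contains_media(self):
        return bool(self.media)

    @property
    def contains_mention(self):
        return bool(self.mentions)


msg = MessageSketch(text="Rain again #weather", hashtags=["weather"])
```

Storing the flags as real fields, as the model appears to do, would let them be filtered on directly without joining the related tables.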