Common frequency distribution methods

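Method                         Description
fdist = nltk.FreqDist(text)    create a frequency distribution from a sequence of samples (e.g. tokens)
fdist.pprint()                 print the most common samples and their counts (10 by default)
fdist['example']               count of the sample 'example'
fdist.freq('example')          relative frequency of 'example' (its count divided by fdist.N())
fdist.N()                      total number of samples counted (tokens, not distinct types)
fdist.keys()                   the distinct samples (in NLTK 3 not sorted by frequency; use fdist.most_common() for that)
for sample in fdist:           iterate over the distinct samples
fdist.max()                    sample with the highest count
fdist.tabulate()               print the frequency distribution as a table
fdist.plot()                   plot the frequency distribution
fdist.plot(cumulative=True)    cumulative plot of the frequency distribution
fdist1 < fdist2                test whether samples in fdist1 occur less frequently than in fdist2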
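The methods in the table can be tried on a small hand-made token list before moving to a full corpus. This is a minimal sketch: the token list is illustrative data (not from the notebook), and it only assumes NLTK is installed, with no corpus download needed.

```python
from nltk import FreqDist

# Illustrative token list (hypothetical data, not a real corpus)
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "end"]
fdist = FreqDist(tokens)

print(fdist["the"])        # count of one sample: 3
print(fdist["dog"])        # a sample never seen has count 0
print(fdist.freq("the"))   # relative frequency, count / N: 0.375
print(fdist.N())           # total samples counted (tokens): 8
print(fdist.max())         # sample with the highest count: the

# Iterating over a FreqDist yields each distinct sample once
for sample in fdist:
    print(sample, fdist[sample])
```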

  1. tokenize
  2. FreqDist
  3. findall
  4. pprint
  5. freq
  6. plot
  7. Text Corpora / Corpus
  8. pretty table

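import nltk
from nltk.book import text1  ## Moby Dick; needs the NLTK book data: nltk.download('book')

## frequency distribution of the tokens in text1
fdist = nltk.FreqDist(text1)

## print([(sample, fdist[sample]) for sample in fdist])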
fdist.pprint()
FreqDist({',': 18713, 'the': 13721, '.': 6862, 'of': 6536, 'and': 6024, 'a': 4569, 'to': 4542, ';': 4072, 'in': 3916, 'that': 2982, ...})
fdist['Sunday']
7
fdist.freq('Sunday')
2.683853553613809e-05
fdist.N()
260819
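fdist.freq() is the sample's count divided by the total number of samples, so the value for 'Sunday' can be checked by hand from the outputs of fdist['Sunday'] and fdist.N():

```latex
\mathrm{freq}(w) = \frac{\mathrm{count}(w)}{N},
\qquad
\mathrm{freq}(\text{Sunday}) = \frac{7}{260819} \approx 2.684 \times 10^{-5}
```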
## fdist.keys()
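In NLTK 3, FreqDist is a subclass of Python's collections.Counter, so keys() returns the distinct samples as a plain dictionary view rather than sorted by frequency; most_common() is the method that ranks them. A minimal sketch on illustrative data:

```python
from nltk import FreqDist

# Illustrative samples (hypothetical data)
fdist = FreqDist(["b", "a", "a", "c", "a", "b"])

# keys(): the distinct samples as a dict view, not ordered by frequency
print(sorted(fdist.keys()))   # ['a', 'b', 'c']

# most_common(): (sample, count) pairs in decreasing order of count
print(fdist.most_common())    # [('a', 3), ('b', 2), ('c', 1)]
print(fdist.most_common(1))   # [('a', 3)]
```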
fdist.max()
','
## fdist.tabulate()
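Called with no arguments, fdist.tabulate() prints every distinct sample, which for text1 means many thousands of columns. Passing an integer limits the table to that many of the most common samples, and cumulative=True prints running totals instead of raw counts. A small sketch on illustrative data:

```python
from nltk import FreqDist

# Illustrative token list (hypothetical data)
fdist = FreqDist("the cat sat on the mat the end".split())

# First row: samples; second row: counts, for the 3 most common samples
fdist.tabulate(3)

# Same samples, but each count is a running total
fdist.tabulate(3, cumulative=True)
```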
fdist = nltk.FreqDist(text1)
## plot the count of every sample, most frequent first
fdist.plot()
[Figure: frequency distribution of text1, x-axis "Samples", y-axis "Counts"]
fdist.plot(cumulative=True)
[Figure: cumulative frequency distribution of text1, x-axis "Samples", y-axis "Cumulative Counts"]
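The cumulative plot orders samples by decreasing count, as most_common() does, and draws a running total of those counts. A minimal sketch of the numbers behind it, using itertools.accumulate on illustrative data (no plotting, so matplotlib is not needed):

```python
from itertools import accumulate
from nltk import FreqDist

# Illustrative token list (hypothetical data)
fdist = FreqDist("the cat sat on the mat the end".split())

# Samples in decreasing order of count, as on the plot's x-axis
ranked = fdist.most_common()
counts = [count for _, count in ranked]

# Running totals: the y-values of the cumulative plot
cumulative = list(accumulate(counts))
print(cumulative)   # [3, 4, 5, 6, 7, 8]

# The last point always equals the total number of samples
assert cumulative[-1] == fdist.N()
```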
## compare
## fdist1 < fdist2
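As NLTK's FreqDist implements it, fdist1 < fdist2 holds when every sample in fdist1 also occurs in fdist2 with at least the same count, and the two distributions are not identical. A small sketch with illustrative data:

```python
from nltk import FreqDist

# Two illustrative distributions (hypothetical data)
small = FreqDist(["a", "b"])
large = FreqDist(["a", "a", "b", "c"])

# True: every sample in `small` occurs in `large` at least as often,
# and the two distributions differ
print(small < large)   # True

# False: 'c' occurs in `large` but not in `small`
print(large < small)   # False
```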

Conditional Frequency

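A conditional frequency distribution counts samples separately for each condition: it is built from (condition, sample) pairs, and each condition gets its own frequency distribution of the samples paired with it.

CFD: Conditional Frequency Distribution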
## (condition, sample) pairs: 'F' = fruit, 'V' = vegetable
cItems = [
    ('F','apple'),
    ('F','apple'),
    ('F','kiwi'),
    ('V','cabbage'),
    ('V','cabbage'),
    ('V','potato')
]
## one FreqDist per condition
cfd = nltk.ConditionalFreqDist(cItems)
cfd.conditions()
['F', 'V']
cfd['F']
FreqDist({'apple': 2, 'kiwi': 1})
cfd['V']
FreqDist({'cabbage': 2, 'potato': 1})
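A ConditionalFreqDist behaves like a dictionary that maps each condition to its own FreqDist. As a sketch of that idea (not a copy of NLTK's implementation), the same per-condition counts can be built by hand with collections.defaultdict and compared against NLTK's result:

```python
from collections import defaultdict
from nltk import ConditionalFreqDist, FreqDist

# The same (condition, sample) pairs as cItems above
pairs = [
    ("F", "apple"), ("F", "apple"), ("F", "kiwi"),
    ("V", "cabbage"), ("V", "cabbage"), ("V", "potato"),
]

# Hand-built version: one FreqDist per condition, created on first use
manual = defaultdict(FreqDist)
for condition, sample in pairs:
    manual[condition][sample] += 1

cfd = ConditionalFreqDist(pairs)

# Both give the same counts for every condition (prints True for each)
for condition in cfd.conditions():
    print(condition, manual[condition] == cfd[condition])
```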

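Method                                 Description
cfdist = ConditionalFreqDist(pairs)    create a conditional frequency distribution from a list of (condition, sample) pairs
cfdist.conditions()                    list the conditions
cfdist[condition]                      the frequency distribution for this condition
cfdist[condition][sample]              count of the sample under this condition
cfdist.tabulate()                      tabulate the conditional frequency distribution
cfdist.plot()                          plot the conditional frequency distribution
cfdist.plot(cumulative=True)           cumulative plot of the conditional frequency distribution
cfdist1 < cfdist2                      test whether samples in cfdist1 occur less frequently than in cfdist2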
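The methods in the table can be exercised on the fruit/vegetable pairs from the cells above. This is a minimal sketch that rebuilds those pairs so it runs on its own; the smaller CFD used for the comparison is illustrative.

```python
from nltk import ConditionalFreqDist

# The same illustrative (condition, sample) pairs as cItems above
pairs = [
    ("F", "apple"), ("F", "apple"), ("F", "kiwi"),
    ("V", "cabbage"), ("V", "cabbage"), ("V", "potato"),
]
cfd = ConditionalFreqDist(pairs)

print(cfd["F"]["apple"])   # count of 'apple' under condition 'F': 2
print(cfd["V"].max())      # most frequent sample under 'V': cabbage
print(cfd.N())             # total samples over all conditions: 6

# Rows are conditions, columns are samples
cfd.tabulate()

# A CFD built from the first three pairs only (just condition 'F')
cfd_small = ConditionalFreqDist(pairs[:3])
print(cfd_small < cfd)     # True
```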