this post was submitted on 03 Jul 2024

The number of lines for each character by percentage of the series (lemmy.ca)

submitted 6 months ago by danielquinn@lemmy.ca to c/startrek@startrek.website


It would seem that I have far too much time on my hands. After the post about a Star Trek "test", I started wondering if there could be any data to back it up and... well here we go:

The Next Generation

Name	Percentage of Lines
PICARD	20.16
RIKER	11.64
DATA	10.1
LAFORGE	6.93
WORF	6.14
TROI	5.4
CRUSHER	5.11
WESLEY	2.32

DS9

Name	Percentage of Lines
SISKO	13.0
KIRA	8.23
BASHIR	7.79
O'BRIEN	7.31
ODO	7.26
QUARK	6.98
DAX	5.73
WORF	3.18
JAKE	2.31
GARAK	2.29
NOG	2.01
ROM	1.89
DUKAT	1.76
EZRI	1.53

Voyager

Name	Percentage of Lines
JANEWAY	17.7
CHAKOTAY	8.76
EMH	8.34
PARIS	7.63
TUVOK	6.9
KIM	6.57
TORRES	6.45
SEVEN	6.1
NEELIX	4.99
KES	2.06

Enterprise

Name	Percentage of Lines
ARCHER	24.52
T'POL	13.09
TUCKER	12.72
REED	7.34
PHLOX	5.71
HOSHI	4.63
TRAVIS	3.83
SHRAN	1.26

Discovery

Note: This is a limited dataset, as the source site only has transcripts for seasons 1, 2, and 4.

Name	Percentage of Lines
BURNHAM	22.92
SARU	8.2
BOOK	6.21
STAMETS	5.44
TILLY	5.17
LORCA	4.99
TARKA	3.32
TYLER	3.18
GEORGIOU	2.96
CULBER	2.83
RILLAK	2.17
DETMER	1.97
OWOSEKUN	1.79
ADIRA	1.63
COMPUTER	1.61
ZORA	1.6
VANCE	1.07
CORNWELL	1.07
SAREK	1.06
T'RINA	1.02

If anyone is interested, here's the (rather hurried) Python used:

#!/usr/bin/env python3

#
# This script assumes that you've already downloaded all the episode lines from
# the fantastic chakoteya.net:
#
# wget --accept=html,htm --relative --wait=2 --include-directories=/STDisco17/ http://www.chakoteya.net/STDisco17/episodes.html -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/Enterprise/ http://www.chakoteya.net/Enterprise/episodes.htm -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/Voyager/ http://www.chakoteya.net/Voyager/episode_listing.htm -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/DS9/ http://www.chakoteya.net/DS9/episodes.htm -m
# wget --accept=html,htm --relative --wait=2 --include-directories=/NextGen/ http://www.chakoteya.net/NextGen/episodes.htm -m
#
# Then you'll probably have to convert the following files to UTF-8 as they
# differ from the rest:
#
# * Voyager/709.htm
# * Voyager/515.htm
# * Voyager/416.htm
# * Enterprise/41.htm
#

import re
from collections import defaultdict
from pathlib import Path

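# Episode pages are named by number (e.g. "709.htm"); this skips index pages.
EPISODE_REGEX = re.compile(r"^\d+\.html?$")
# A spoken line starts with an upper-case speaker name followed by ": ".
LINE_REGEX = re.compile(r"^(?P<name>[A-Z']+): ")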

EPISODES = Path("www.chakoteya.net")
DISCO = EPISODES / "STDisco17"
ENT = EPISODES / "Enterprise"
TNG = EPISODES / "NextGen"
DS9 = EPISODES / "DS9"
VOY = EPISODES / "Voyager"


class CharacterLines:
    def __init__(self, path: Path) -> None:
        self.path = path
        self.line_count = defaultdict(int)

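    def collect(self) -> None:
        # Count one line for every text line that starts with "NAME: ".
        for episode in self.path.glob("*.htm*"):
            if EPISODE_REGEX.match(episode.name):
                # Decode as UTF-8 explicitly rather than relying on the locale default.
                for line in episode.read_text(encoding="utf-8").split("\n"):
                    if m := LINE_REGEX.match(line):
                        self.line_count[m.group("name")] += 1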

    @property
    def as_percentages(self) -> dict[str, float]:
        # Each character's share of all matched lines, highest first.
        total = sum(self.line_count.values())
        r = {}
        for k, v in self.line_count.items():
            percentage = round(v * 100 / total, 2)
            # Keep only characters with more than 1% of the lines.
            if percentage > 1:
                r[k] = percentage
        return dict(reversed(sorted(r.items(), key=lambda item: item[1])))

    def render(self) -> None:
        print(self.path.name)
        print("| Name             | Percentage of Lines |")
        print("| ---------------- | ------------------- |")
        for character, pct in self.as_percentages.items():
            print(f"| {character:16} | {pct} |")


if __name__ == "__main__":
    for series in (TNG, DS9, VOY, ENT, DISCO):
        counter = CharacterLines(series)
        counter.collect()
        counter.render()
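
The header comment in the script mentions converting a handful of files to UTF-8 by hand. Here is a minimal sketch of one way that step could be automated. `to_utf8` is a hypothetical helper, and the `cp1252` default is only a guess: the post doesn't say what encoding those files actually use.

```python
from pathlib import Path


def to_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Rewrite `path` as UTF-8 if it isn't already; return True if rewritten.

    `source_encoding` is an assumption, not something stated in the post.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, leave the file untouched
    except UnicodeDecodeError:
        # Decode with the assumed legacy encoding, then re-encode as UTF-8.
        text = raw.decode(source_encoding, errors="replace")
        path.write_bytes(text.encode("utf-8"))
        return True
```

It could be called once for each of the files listed in the comment, for example `to_utf8(Path("www.chakoteya.net") / "Voyager" / "709.htm")`, before running the main script.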


[–] Corgana@startrek.website 21 points 6 months ago (1 children)

Fascinating stuff, I love that you did this. I'm surprised Morn didn't rank higher considering how chatty he is in every scene.

[–] ericjmorey@discuss.online 5 points 6 months ago

Number of lines vs. number of words spoken vs. length of time speaking would probably give quite different results.
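
A rough sketch of how lines and words could be counted side by side, reusing the speaker pattern from the script in the post. `SPEECH_REGEX` and `count_lines_and_words` are hypothetical names, and the sample dialogue is made up for illustration. It also assumes a whole speech sits on the same text line as the speaker tag, which may not hold for speeches that wrap across lines in the transcripts.

```python
import re
from collections import Counter

# Same speaker pattern as the post's script, plus a group capturing the speech.
SPEECH_REGEX = re.compile(r"^(?P<name>[A-Z']+): (?P<speech>.*)$")


def count_lines_and_words(lines):
    """Return (line_counts, word_counts), each keyed by speaker name."""
    line_counts = Counter()
    word_counts = Counter()
    for line in lines:
        if m := SPEECH_REGEX.match(line):
            line_counts[m.group("name")] += 1
            # Words are whitespace-separated tokens in the rest of the line.
            word_counts[m.group("name")] += len(m.group("speech").split())
    return line_counts, word_counts


# Invented sample lines, not taken from any transcript.
sample = [
    "PICARD: Make it so.",
    "DATA: Captain, I believe the probability of success is low.",
    "PICARD: Engage.",
]
lines, words = count_lines_and_words(sample)
print(lines.most_common())
print(words.most_common())
```

In this toy sample PICARD has more lines while DATA has more words, which is the kind of divergence the comment is pointing at.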