Find out if You've Been Naughty or Nice Based on Your Reddit Comments

Steven Johnson

In the past year, I've had the opportunity to announce some groundbreaking Shipyard integrations with tools like Hex and Tableau, but we waited until the end of the year to announce our biggest partner yet. The big guy has finally given us the go-ahead to share the news.

Santa contacted us earlier in the year and mentioned that he was having a hard time monitoring comments on Reddit for his naughty or nice list. We told him that we would be happy to help. We wanted to share the tool with you now, so you can get a jump start on expectations for gifts or coal.

Below, I am going to go into detail on how we built this out for Santa using webhook parameters in Shipyard along with Hex. However, if you want to try it out before we get into the technical details, head over to Hex, enter your Reddit username and email, and click Run. You will receive Santa’s decision in an email along with a CSV of your comments.

Getting Reddit Comments

Reddit is a unique place where individuals from around the world can come together and share ideas on subreddits. To train the logistic regression model for Santa's task, we knew that we would need comments directly from Reddit. We used comments that were sourced from Reddit by Social Grep. To get a variety of comments, we pulled from three different sets of subreddits:

  • IRL (/r/meirl and /r/me_irl)
  • Confessions (/r/trueoffmychest, /r/confession, /r/confessions, and /r/offmychest)
  • /r/datasets

After downloading the datasets, I used Python code Vessels in Shipyard to clean the data by getting rid of links and subreddit calls in the comments because they would not affect Santa's judgement. I also got rid of shorter comments and comments from bots to make the training data better.
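The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the code that ran in the Vessels: the regular expressions, the five-word minimum, and the bot check (AutoModerator or any author name ending in "bot") are all assumptions made for the example.

```python
import re

# Assumed patterns; the exact cleaning rules used for Santa are not shown in this post.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
SUBREDDIT_PATTERN = re.compile(r"/?\br/\w+")
MIN_WORDS = 5  # hypothetical cutoff for "shorter comments"


def clean_comment(text):
    """Strip links and subreddit mentions, then collapse leftover whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = SUBREDDIT_PATTERN.sub(" ", text)
    return " ".join(text.split())


def is_probable_bot(author):
    """Hypothetical heuristic: treat AutoModerator and '*bot' accounts as bots."""
    name = author.lower()
    return name == "automoderator" or name.endswith("bot")


def clean_comments(rows, min_words=MIN_WORDS):
    """Take (author, body) pairs and return the cleaned bodies worth keeping."""
    cleaned = []
    for author, body in rows:
        if is_probable_bot(author):
            continue
        text = clean_comment(body)
        if len(text.split()) >= min_words:
            cleaned.append(text)
    return cleaned
```

Links are removed before subreddit mentions so that a subreddit path inside a URL disappears along with the rest of the link.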

After the comments from each set of subreddits were cleaned, I merged them together and sent them over to a Google Sheet to allow the elves at Shipyard to begin to work on them.

Determining what is Naughty or Nice

Our initial plan to train the model was to run the comments through sentiment analysis and use those values to train the model. However, Santa shot that idea down because he needs naughty or nice, not positive or negative. He informed us that the following items being referenced in a post would make it naughty:

  • Anything sexual
  • Aggression (Verbal, Emotional, Physical)
  • Racism
  • Drugs or Alcohol
  • Hate
  • Bullying/Threats
  • Dark Web
  • Gambling
  • Cursing

Our elves tagged the downloaded comments as naughty or nice. Once all comments were classified, we went back and checked twice just to be sure. After using the data to train a Scikit-Learn Logistic Regression model, we exported the model to be ready for Santa's task.
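For readers who want to try the training step themselves, here is a small sketch of fitting and exporting a Scikit-Learn Logistic Regression model on tagged comments. The six sample comments, the "naughty"/"nice" string labels, the choice of TfidfVectorizer, and the use of pickle for the export are assumptions for illustration; the post does not say which vectorizer or export format the elves used.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up sample; the real training set was the hand-tagged Reddit comments.
comments = [
    "thank you so much, this made my day",
    "congrats on the new job, well deserved",
    "what a lovely photo of your dog",
    "shut up, nobody asked you, idiot",
    "i hate everyone in this stupid thread",
    "i will find you and make you regret this",
]
labels = ["nice", "nice", "nice", "naughty", "naughty", "naughty"]

# Turn the text into numeric features, then fit the classifier on them.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(comments)

model = LogisticRegression()
model.fit(features, labels)

# Export the model together with the fitted vectorizer; the model cannot score
# new text without the same vocabulary. In practice these bytes go to a file.
exported = pickle.dumps({"vectorizer": vectorizer, "model": model})
restored = pickle.loads(exported)

# predict_proba columns follow model.classes_, so find the "nice" column by name.
nice_index = list(restored["model"].classes_).index("nice")
probabilities = restored["model"].predict_proba(
    restored["vectorizer"].transform(["thank you for the help"])
)
nice_probability = probabilities[0][nice_index]
```

Looking up the column through classes_ avoids guessing which position holds the "nice" probability, since Scikit-Learn orders the classes by sorted label value.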

Using Reddit's API to Pull a User's Comments

Using PRAW, the Python Reddit API Wrapper, we pull the 50 most recent comments for a user. After the API finishes grabbing the comments, they are run through the trained Logistic Regression model. Each comment is scored from 0 to 1 based on the probability of the comment being nice.

import praw

reddit = praw.Reddit(
    client_id="YOUR_REDDIT_CLIENT_ID",
    client_secret="YOUR_REDDIT_CLIENT_SECRET",
    user_agent="Comment Extraction (by u/YOUR_REDDIT_USERNAME)",
)

username = reddit_username
redditor = reddit.redditor(username)

comments = redditor.comments.new(limit=50)

user_comments = []
for comment in comments:
    user_comments.append(comment.body)

Scoring the User's Comments

Now that we have the last 50 comments for the user, we need to tell how naughty or nice they were. To do this, we loop through the comments with the model that we trained earlier, which gives the probability of each comment being nice.

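user_scores = []
for comment in user_comments:
    # predict_proba columns follow loaded_model.classes_; index 0 is assumed
    # here to be the "nice" class, so check classes_ if your labels differ.
    score = loaded_model.predict_proba(vectorizer.transform([comment]))[0][0]
    user_scores.append(score)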

After scoring each comment, we find the average probability for the user. If the average is less than 0.5, we report the user to Santa as naughty. Averages greater than 0.5 are sent to Santa as nice.
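The averaging and cutoff described above can be sketched as a small helper. The function name santa_verdict is hypothetical, and so is the handling of two edge cases the rule leaves open: here an average of exactly 0.5 counts as naughty, and a user with no comments raises an error.

```python
from statistics import mean


def santa_verdict(nice_probabilities, threshold=0.5):
    """Return (average, verdict) for a list of per-comment "nice" probabilities."""
    if not nice_probabilities:
        # Assumption: with no comments there is nothing to average, so fail loudly.
        raise ValueError("No comments to score for this user.")
    average = mean(nice_probabilities)
    # Assumption: an average of exactly 0.5 is treated as naughty.
    verdict = "nice" if average > threshold else "naughty"
    return average, verdict
```

Calling santa_verdict(user_scores) on the list built in the loop above would give both the number to include in the email and the label to report.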

We knew that we would need to go a little bit further for the general public to be able to use Santa's system. Combining the power of Shipyard's webhook parameters and Hex's application view allows us to send results for any Reddit username to any email address.

Combining an Interface with Webhook Parameters

The Hex interface pictured at the beginning of the blog post is put together with 2 entry fields, 1 button, and 1 block of code.

We use Hex's text input fields to allow the user to provide their Reddit username and email address. These two values are stored as variables to be sent using a webhook parameter.

The run button is stored as a variable that is used in an if block. When the run button is pressed, the if block executes. The block contains a Python requests call to a Shipyard Fleet webhook. The username and email are stored as JSON to be sent with the webhook. This lets us feed those values into the code that Santa used, for anyone across the world.

import requests
if run_button:
    json_data = {
        'username': f'{reddit_username}',
        'email': f'{email}',
    }

    username_error = 'We need your Reddit username to check if you are naughty or nice!'
    email_error = 'We need your email to send you your results!'
    results_message = f'The results for {reddit_username} are being emailed to {email}!'

    if reddit_username == '':
        message = username_error
    elif email == '':
        message = email_error
    else:
        response = requests.post(
            'SHIPYARD_WEBHOOK_URL',
            json=json_data,
        )
        message = results_message
else:
    message = 'We will wait on your inputs by checking our list twice.'

Sending your Results

We used the following code to get the values from the JSON body sent with the webhook:

import json
import os

body_path = os.getenv('SHIPYARD_WEBHOOK_BODY_FILE')
with open(body_path) as f:
    body = json.load(f)
reddit_username = body['username']
user_email = body['email']
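Because a webhook can be called by anything, not just the Hex app, it may be worth validating the body before using it. The sketch below is one way to do that; parse_webhook_body is a hypothetical helper and the checks it performs are assumptions, not part of Santa's original Fleet.

```python
import json


def parse_webhook_body(body_path):
    """Read the webhook body file and return (username, email) after basic checks."""
    with open(body_path) as f:
        body = json.load(f)
    if not isinstance(body, dict):
        raise ValueError("Webhook body must be a JSON object.")

    username = str(body.get("username", "")).strip()
    email = str(body.get("email", "")).strip()

    if not username:
        raise ValueError("Webhook body is missing a Reddit username.")
    # Assumption: a very loose sanity check, not full email validation.
    if "@" not in email:
        raise ValueError("Webhook body is missing a usable email address.")
    return username, email
```

It would be called with the same path as above, for example parse_webhook_body(os.getenv('SHIPYARD_WEBHOOK_BODY_FILE')).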

With the username from Hex, we can grab the most recent 50 comments from the specific user and give them a score as I described above. Santa already knows the results of the model. Now, we needed to get them to you.

Each of the user's comments is stored in a CSV along with the score that the model gave it. The CSV and the user's overall score are then emailed to them. Here is the code block that we are using to send that email with the file attached:

import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate


def send_mail(subject, body, send_from, send_to, file1, smtpServer, smtpPort, username, password, isTls=True):
    """Sends an email with an attachment.
      :param subject: Email subject.
      :param body: Email body, sent as HTML.
      :param send_from: From email address.
      :param send_to: To email address.
      :param file1: Filename (absolute path) to attach to the email.
      :param smtpServer: SMTP server address.
      :param smtpPort: SMTP server port.
      :param username: SMTP login username.
      :param password: SMTP login password.
      :param isTls: Whether to upgrade the connection with STARTTLS before logging in.
    """
    msg = MIMEMultipart()
    msg['From'] = send_from
    msg['To'] = send_to
    msg['Date'] = formatdate(localtime=True)
    msg['Subject'] = subject
    msg.attach(MIMEText(body, "html"))

    # Read the attachment and close the file handle once it is loaded.
    part = MIMEBase('application', "octet-stream")
    with open(file1, "rb") as f:
        part.set_payload(f.read())
    encoders.encode_base64(part)
    part.add_header('Content-Disposition', 'attachment', filename=file1)
    msg.attach(part)

    smtp = smtplib.SMTP(smtpServer, smtpPort)
    if isTls:
        smtp.ehlo()
        smtp.starttls()
    # Log in and send regardless of the TLS setting.
    smtp.login(username, password)
    smtp.sendmail(msg['From'], [msg['To']], msg.as_string())
    smtp.quit()


send_mail(
    subject,
    body,
    'YOUR_EMAIL',
    user_email,
    f'{reddit_username}_comments.csv',
    'smtp.gmail.com',
    587,
    'YOUR_EMAIL',
    'YOUR_EMAIL_PASSWORD',
    isTls=True)
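The CSV attached to that email pairs each comment with its score. A minimal sketch of writing such a file with Python's csv module is below; the function name write_comment_scores, the column headers, and the three-decimal formatting are assumptions for illustration rather than the exact layout Santa receives.

```python
import csv


def write_comment_scores(path, comments, scores):
    """Write one row per comment with its "nice" probability and return the path."""
    if len(comments) != len(scores):
        raise ValueError("Each comment needs exactly one score.")
    # newline="" lets the csv module handle line endings inside quoted comments.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["comment", "nice_probability"])  # hypothetical headers
        for comment, score in zip(comments, scores):
            writer.writerow([comment, f"{score:.3f}"])
    return path
```

The csv module quotes commas and line breaks automatically, which matters here because Reddit comments often contain both.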

If you want to see the three Fleets that were used to build this project, sign up for our free developer plan, and then head here to check them out in Shipyard.

If you want to use the Fleets as a starting point for your own exploration, navigate to the YAML editor and use this guide to learn how to build with our configuration files.