Background

Motivation

As it becomes increasingly clear that our habits are harming the environment, there is a growing need to support animals. I recently became interested in the field of eco-linguistics: the study of how language shapes our relationship with the environment. This project explores whether performing sentiment analysis on the language of large government documents related to environmental policy can highlight sections with particularly strong sentiment, sections that may inspire interesting further human analysis from an eco-linguistic perspective.

To demonstrate, a 150-page document originally prepared for the California Department of Transportation was used. The document concerns the relationship between bats and highway infrastructure (e.g., bridges and culverts). It includes background research, environmental rules and regulations, and recommendations to the department for future policy.

Basics of Sentiment Analysis

Sentiment analysis is a natural language processing technique used to determine whether a section of text is positive or negative. It is commonly applied to product reviews, customer feedback, and social media posts. TextBlob is a Python natural language processing library built on the popular NLTK (Natural Language Toolkit) library. TextBlob makes basic sentiment analysis simple: given a string of text, it returns two values, subjectivity (from 0, very objective, to 1, very subjective) and polarity.

Polarity is the more important value for this project. It represents the feeling implied by the text and ranges from -1 to 1, where -1 represents an extremely negative sentiment, 1 represents an extremely positive sentiment, and 0 is neutral.
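For illustration, here is a minimal sketch of reading both values from TextBlob, assuming it is installed (`pip install textblob`). The helper names `sentiment_of` and `polarity_label` are hypothetical, not part of TextBlob, and the labels simply apply the -1 to 1 polarity scale, with 0 as neutral.

```python
from typing import Tuple


def sentiment_of(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) using TextBlob's default analyzer."""
    # Imported lazily so polarity_label below works even without TextBlob.
    from textblob import TextBlob
    result = TextBlob(text).sentiment  # namedtuple Sentiment(polarity, subjectivity)
    return result.polarity, result.subjectivity


def polarity_label(polarity: float) -> str:
    """Map a polarity on the -1..1 scale to a coarse label (hypothetical helper)."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"
```

For example, `polarity_label(sentiment_of(sentence)[0])` returns a coarse label for any sentence string.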

The Script

  1. Reads in the text file (the document to analyze)
  2. Separates the text into sentences
  3. Gets the polarity of each sentence
  4. Saves outlier sentences: those with a polarity of -0.5 or less (strongly negative) or 0.5 or more (strongly positive)
    Also does some text formatting and error handling for cleaner output
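The four steps above can be sketched roughly as follows. This is not the original script: the regex sentence splitter is a swapped-in simplification (TextBlob's own `.sentences` property relies on NLTK's tokenizer data), only whitespace cleanup stands in for the original formatting, and `split_sentences`, `find_outliers`, `textblob_polarity`, and `main` are hypothetical names. The scorer is passed in as a function so the outlier logic can be checked without TextBlob installed.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Naive splitter: break after ".", "!" or "?" followed by whitespace.
# It will mis-split abbreviations such as "e.g." or "Dept."
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

Scored = List[Tuple[float, str]]


def split_sentences(text: str) -> List[str]:
    """Step 2: collapse line breaks and repeated spaces, then split into sentences."""
    text = " ".join(text.split())
    return [s for s in _SENTENCE_END.split(text) if s]


def find_outliers(sentences: Iterable[str], score: Callable[[str], float],
                  threshold: float = 0.5) -> Tuple[Scored, Scored]:
    """Steps 3 and 4: score each sentence and keep those at or beyond +/-threshold."""
    negative: Scored = []
    positive: Scored = []
    for sentence in sentences:
        if not sentence.strip():
            continue  # skip empty fragments
        polarity = score(sentence)
        if polarity <= -threshold:
            negative.append((polarity, sentence))
        elif polarity >= threshold:
            positive.append((polarity, sentence))
    return negative, positive


def textblob_polarity(sentence: str) -> float:
    """Default scorer: TextBlob polarity (requires `pip install textblob`)."""
    from textblob import TextBlob
    return TextBlob(sentence).sentiment.polarity


def main(path: str, score: Callable[[str], float] = textblob_polarity) -> None:
    # Step 1: read the document as plain text.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    negative, positive = find_outliers(split_sentences(text), score)
    print(f"{len(negative)} sentences at -0.5 or less, {len(positive)} at 0.5 or more")
    for polarity, sentence in negative + positive:
        print(f"{polarity:+.2f}  {sentence}")
```

Running `main("bats_and_highways.txt")` (a hypothetical filename for the document exported to text) prints the counts followed by each outlier sentence and its polarity.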

The Result

The strongly worded sentences skew heavily positive: only five sentences within the 150-page document had a polarity of -0.5 or less, while 74 sentences had a polarity of 0.5 or more.

Investigating the sentences with positive sentiment reveals many false positives: sentences calculated to have a positive sentiment that actually have negative connotations.

For example, the sentence "In many cases, bats have adapted to roosting in transportation structures as a result of lost or degraded habitats" has a polarity of 0.5. This may suggest that the algorithm doesn't recognize habitat degradation as a negative outcome, but it could also imply that the language used here to describe habitat degradation has been softened. The surrounding context could be interesting to analyze through an eco-linguistic lens.

The results also demonstrate a limitation of the sentiment analyzer used. On eight separate occasions, sentences describing insufficient research or information were calculated to have a positive sentiment. For example, the sentences "However, the efficacy of this technique needs more research," "More research and monitoring are needed to fully assess the effectiveness of these mitigation structures," and "However, with an increased awareness of vehicular collisions with bats, more research is needed to understand this threat" each have a polarity of 0.5. The sentence "Even experienced bat biologists cannot guarantee when prescribed mitigation measures will work as intended" has a polarity of 0.8. All of these sentences can be considered false positives.
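TextBlob's default analyzer is lexicon-based: it looks words up in a dictionary of pre-scored words and averages the scores of the ones it finds, with some extra handling for negation and intensifiers. That suggests one plausible explanation for these false positives: if a word like "more" carries a positive score while "research" and "needed" carry none, the whole sentence averages out as positive. The toy scorer below is a simplified illustration of that averaging. Its three-word lexicon is hypothetical (the values are made up, not TextBlob's), and it ignores negation and intensifiers entirely.

```python
import re
from typing import Dict

# HYPOTHETICAL toy lexicon: illustrative values, not TextBlob's real entries.
TOY_LEXICON: Dict[str, float] = {
    "more": 0.5,
    "harmful": -0.6,
    "effective": 0.6,
}


def toy_polarity(sentence: str, lexicon: Dict[str, float] = TOY_LEXICON) -> float:
    """Average the polarity of every word found in the lexicon; 0.0 if none match.

    Simplification: unlike TextBlob's analyzer, this skips negation
    ("not good") and intensifier ("very good") handling.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0
```

With this lexicon, "More research is needed to understand this threat." scores 0.5, because "more" is the only word it recognizes. Recent TextBlob versions also expose a `sentiment_assessments` property listing which words contributed to a score, which would be the way to confirm or rule out this explanation for the actual sentences.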

Conclusion

Sentiment analysis can be used to quickly parse large documents, and may highlight sections of language that warrant further human analysis. However, the off-the-shelf sentiment analyzer used here often mistakenly classifies sentences about insufficient research as positive, which may be a pitfall for anyone using similar unmodified models to analyze academic sources, whether related to the environment or not.
