Inspiration

This was inspired by a Reddit challenge to do some data handling, which I thought was fairly cool. It used data supplied openly by Data.Iowa.gov about alcohol statistics within the state. The data set was very comprehensive, and the project was an opportunity to learn more about my shortcomings. I did initially assume that I would be beaten by the task, and this event was an excellent place to be beaten by a program.

What it does

It breaks, mostly. The original plan was to read the data into objects, then use an array of those objects to do some data manipulation and answer the questions set out by the Reddit challenge, which were:

  • What's the most popular non-beer beverage bought in 2016?
  • What store has made the most profit (the difference between the state cost per bottle and the sales price per bottle, times the quantity of all bottles sold)?
  • What are the top types of alcohol commonly bought together? (e.g. "wine and tequila")
  • What day of the week sees the most vodka sales?
  • Which streets in Iowa are really Beer Street and Gin Lane?
  • Where in the world is all of that root beer schnapps going?
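As a minimal sketch of the "read the data into objects" plan, the profit question only needs four fields per row, so an object can stay small instead of mirroring every column. The class name `Sale` and its field names are hypothetical; in the Iowa data set the cost and price probably correspond to columns like "State Bottle Cost" and "State Bottle Retail", but that mapping is an assumption to check against the file's header.

```java
// Hypothetical row object: keeps only the fields the profit question needs,
// rather than one field per CSV column.
final class Sale {
    final String storeNumber;
    final double stateCostPerBottle; // what the state paid per bottle (assumed column)
    final double salePricePerBottle; // what the bottle was sold for (assumed column)
    final int bottlesSold;

    Sale(String storeNumber, double stateCostPerBottle, double salePricePerBottle, int bottlesSold) {
        this.storeNumber = storeNumber;
        this.stateCostPerBottle = stateCostPerBottle;
        this.salePricePerBottle = salePricePerBottle;
        this.bottlesSold = bottlesSold;
    }

    /** Profit as the challenge defines it: (sale price - state cost) per bottle, times quantity. */
    double profit() {
        return (salePricePerBottle - stateCostPerBottle) * bottlesSold;
    }
}
```

For example, a row with a cost of 4.50, a price of 6.75 and 12 bottles sold gives (6.75 - 4.50) × 12 = 27.00 profit for that store.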

How I built it

With Java, and time. I started with the CSV file of all the data, which comes out to 3.5 GB. This proved to be a challenge to even open on Windows, as things just didn't seem to want to work with it. However, it turned out I could at least preview it, somewhat jankily, through Excel without actually loading the file into anything. Once I was able to preview it, I could determine the fields to construct the object with (mistake 1). This was tedious and long to write, although IntelliJ helped with its autocomplete.

The other problem was that the file has 13 million rows, which made it unreasonable to put into an array. At the time I didn't know this and thought, "I have so much RAM, this can't possibly be a problem." Boy, was I wrong. The first time I ran the program successfully, it crashed after pinning my processor at 100% for 5 minutes and using 6 GB of RAM, which was spectacular to watch. There was also a small attempt at running it on an AWS EC2 instance, but this failed fairly spectacularly too. Eventually I just settled on this being the program that breaks itself, as I had run out of time by the time I'd even gotten that far.
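One way around the memory problem, sketched here as an alternative rather than what the program actually does, is to stream the file one line at a time and keep only a running total per store. The class `StreamingProfit` is hypothetical, the caller supplies the column positions (assumed, not taken from the real header), and the plain `split(",")` assumes no field contains a quoted comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: column positions are hypothetical and passed in by the caller,
// and the naive split assumes no field contains a quoted comma.
final class StreamingProfit {

    /** Streams the CSV once, keeping a running profit total per store. */
    static Map<String, Double> profitByStore(Reader source, int storeCol, int costCol,
                                             int retailCol, int bottlesCol) {
        Map<String, Double> totals = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header row
            String line;
            // Only the current line is held in memory, not every row of the file.
            while ((line = reader.readLine()) != null) {
                String[] f = line.split(",", -1);
                double perBottle = Double.parseDouble(f[retailCol]) - Double.parseDouble(f[costCol]);
                double profit = perBottle * Integer.parseInt(f[bottlesCol]);
                totals.merge(f[storeCol], profit, Double::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }

    /** Returns the store with the highest total, or null if there were no rows. */
    static String mostProfitable(Map<String, Double> totals) {
        return totals.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

Memory then grows with the number of distinct stores rather than the number of rows, which is the main design point of this approach.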

Challenges I ran into

Learning OpenCSV. This was new to me and I had never used it before. I got some help from someone who had used it, and from there I was able to figure out the rest; they were very helpful in telling me where I had gone wrong.

Accomplishments that I'm proud of

I learnt a lot about Java, and a lot about using CSV files in Java: delimiting through them by hand is a pain and is better done with the help of OpenCSV, although even that is a less than clever approach when coupled with a lack of experience.
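A small illustration of why delimiting by hand is a pain: in CSV, a field wrapped in double quotes may contain commas, so `String.split(",")` cuts it in the wrong place. The `CsvLine` class below is a hypothetical, minimal quote-aware splitter; it handles quoted commas and doubled quotes (`""`) but not line breaks inside quoted fields, which is one of the cases a library such as OpenCSV takes care of. The sample address is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal quote-aware splitter for a single CSV line (hypothetical helper).
// Does NOT handle newlines inside quoted fields.
final class CsvLine {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // commas inside quotes are kept as data
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // unquoted comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }
}
```

With the line `S1,"123 Main St, Ames",4.50`, a naive `split(",")` returns four pieces, while this splitter returns the three intended fields.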

What I learned

Data handling with large files needs a lot more care than I knew how to employ. There are much faster languages and tools for what I want to do, and doing it the way I did was inefficient and designed to fail.

What's next for myjavathateats

Make it less hungry, I hope: rewrite it using file streams and more efficient structuring so that fewer resources are used, and make use of while loops to build a less baroque and more "dynamic" system for manipulating the data.
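One possible shape for that "dynamic" while-loop system, offered as a sketch under several assumptions: a single loop reads each row once and hands it to a list of per-question handlers, so adding a question means adding a handler rather than another pass over 3.5 GB. The names `OnePass` and `VodkaByDay`, the column positions, the `MM/dd/yyyy` date format, and the idea that vodka rows can be spotted by "VODKA" appearing in the category text are all hypothetical and would need checking against the real file. As with the naive split elsewhere, it assumes no quoted commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

final class OnePass {
    /** Reads every data row once and passes it to each question's handler. */
    static void run(Reader source, List<Consumer<String[]>> handlers) {
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",", -1); // assumes no quoted commas
                for (Consumer<String[]> handler : handlers) {
                    handler.accept(fields);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Handler for "what day of the week sees the most vodka sales?" (bottles sold). */
    static final class VodkaByDay implements Consumer<String[]> {
        // Assumed date format; check it against the actual CSV.
        private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("MM/dd/yyyy");
        private final int dateCol;
        private final int categoryCol;
        private final int bottlesCol;
        final Map<DayOfWeek, Long> bottles = new EnumMap<>(DayOfWeek.class);

        VodkaByDay(int dateCol, int categoryCol, int bottlesCol) {
            this.dateCol = dateCol;
            this.categoryCol = categoryCol;
            this.bottlesCol = bottlesCol;
        }

        @Override
        public void accept(String[] f) {
            // Assumption: vodka rows have "VODKA" somewhere in their category text.
            if (f[categoryCol].toUpperCase(Locale.ROOT).contains("VODKA")) {
                DayOfWeek day = LocalDate.parse(f[dateCol], DATE).getDayOfWeek();
                bottles.merge(day, Long.parseLong(f[bottlesCol]), Long::sum);
            }
        }

        /** Day with the most vodka bottles sold, or null if none were seen. */
        DayOfWeek busiest() {
            return bottles.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElse(null);
        }
    }
}
```

Each handler only keeps its own small summary (here, at most seven totals), so memory stays flat no matter how many rows the file has.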

Built With

  • friends
  • https://data.iowa.gov/economy/iowa-liquor-sales/m3tr-qhgy
  • intellij-idea
  • java
  • opencsv