#100Devs Asset Manager

What's the point of data if it's not organized?

The target of this organization spree was the #100Devs classes, specifically the streams themselves. Each class has a number of linked assets that I use to keep notes, so I can easily look up anything I've forgotten:

Exported Twitch Chapters
Chat History
Captions
Links
Date

Some of these are barely worth mentioning (Date and Links combined come to less than a tweet), while others are quite sizable: the chat JSONs are anywhere from 10 to 60 MB. Not only did I want to keep all these assets organized together, but I also wanted an easy way to search all of them at once and get a direct, timestamped link to the relevant moment.

Too lazy for a frontend

It had been a while since I'd used inquirer.js, so I decided to combine it with some TS and start parsing the classes. Of course, I first needed something to parse: while I had the assets mostly together, I needed to enforce a uniform layout before parsing them, which led to this file structure:

YYYY-MM-DD
- chapters
- links
- chat.json
- captions
  - YOUTUBE_TITLE.en.vtt
  - YOUTUBE_TITLE.en.txt
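
To make that layout concrete, here is a minimal TypeScript sketch of how a class folder could map onto a typed info object. The names `ClassAssets`, `classPaths`, `isCaptionFile`, and `parseFolderDate` are hypothetical, not taken from the actual project. Since YOUTUBE_TITLE varies per video, caption files are matched by their `.en.vtt` / `.en.txt` suffix, and the folder date is parsed in local time so that `getDay()` reports the local weekday.

```typescript
import * as path from "node:path";

// SecondMap is the Map<number, string> the post converts time-based content into.
type SecondMap = Map<number, string>;

// Hypothetical shape of one class's assets, mirroring the folder layout.
interface ClassAssets {
  date: Date;           // parsed from the YYYY-MM-DD folder name
  links?: string[];     // contents of `links`
  chapters?: SecondMap; // contents of `chapters`
  chat?: SecondMap;     // parsed from `chat.json`
  captions?: SecondMap; // parsed from `captions/*.en.txt`
}

// Build the expected paths for one class folder (assumes the layout above).
function classPaths(root: string, folder: string) {
  const dir = path.posix.join(root, folder);
  return {
    dir,
    chapters: path.posix.join(dir, "chapters"),
    links: path.posix.join(dir, "links"),
    chat: path.posix.join(dir, "chat.json"),
    captionsDir: path.posix.join(dir, "captions"),
  };
}

// The YOUTUBE_TITLE part varies, so caption files are matched by suffix only.
function isCaptionFile(name: string, ext: "vtt" | "txt"): boolean {
  return name.endsWith(`.en.${ext}`);
}

// Parse a YYYY-MM-DD folder name into a local-time Date (null if malformed).
function parseFolderDate(folder: string): Date | null {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(folder);
  if (!match) return null;
  const [, y, m, d] = match;
  return new Date(Number(y), Number(m) - 1, Number(d));
}
```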

With all my files organized, all that was left was the parsing. The only files that needed manual work were the en.txt captions: the script I found to transform the VTT into minute-delimited blocks of text wasn't perfect, so I wrote more code for them than for the other assets to parse them into my SecondMaps (Map<number, string>). I converted most of the other time-based content, Chapters and Chats, into SecondMaps as well.
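
The caption parsing could look roughly like the sketch below. It assumes, hypothetically, since the exact output of the VTT script isn't shown, that each minute-delimited block begins with a line holding only a timestamp such as `00:01:00`, followed by the caption text for that block. `toSeconds`, `STAMP_LINE`, and `parseCaptionBlocks` are illustrative names.

```typescript
type SecondMap = Map<number, string>;

// Convert "H:MM:SS" or "MM:SS" into a number of seconds.
function toSeconds(stamp: string): number {
  return stamp.split(":").reduce((total, part) => total * 60 + Number(part), 0);
}

// Hypothetical block header: a line containing only a timestamp.
const STAMP_LINE = /^\d{1,2}(?::\d{2}){1,2}$/;

function parseCaptionBlocks(text: string): SecondMap {
  const map: SecondMap = new Map();
  let current: number | null = null;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    if (STAMP_LINE.test(line)) {
      current = toSeconds(line); // start a new block
      continue;
    }
    if (current === null) continue; // text before the first timestamp is ignored
    // Join multi-line blocks into one string (also merges repeated timestamps).
    const existing = map.get(current);
    map.set(current, existing ? `${existing} ${line}` : line);
  }
  return map;
}
```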

Missing-Asset Detector

A feature I hadn't thought of until I was halfway through: this could also help me remember which assets I still needed to gather. So I wrote a basic generator that yields a reason string for every asset a class is missing.

There was one discrepancy among the assets: office hours lack both the captions directory and a YouTube link, since they aren't uploaded to YouTube and my captions are actually YouTube's automatically generated ones. Thankfully, their schedule is extremely consistent, so I only needed to branch some of these checks on whether getDay() returned 0, i.e. Sunday.
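
A minimal sketch of such a generator, assuming a hypothetical `ClassInfo` shape where an `undefined` property means the asset hasn't been found. The Sunday branch mirrors the office-hours rule above; the field names and reason strings are made up for illustration.

```typescript
// Hypothetical info object: undefined means the asset is missing.
interface ClassInfo {
  date: Date;
  twitchLink?: string;
  youtubeLink?: string;
  chapters?: Map<number, string>;
  chat?: Map<number, string>;
  captions?: Map<number, string>;
}

// Yield one human-readable reason per missing asset.
function* missingAssets(info: ClassInfo): Generator<string> {
  // Office hours happen on Sundays and aren't uploaded to YouTube,
  // so they never have a YouTube link or YouTube captions.
  const isOfficeHours = info.date.getDay() === 0;

  if (!info.twitchLink) yield "missing Twitch link";
  if (!info.chapters) yield "missing chapters";
  if (!info.chat) yield "missing chat.json";
  if (!isOfficeHours) {
    if (!info.youtubeLink) yield "missing YouTube link";
    if (!info.captions) yield "missing captions";
  }
}
```

Collecting the reasons is then just `Array.from(missingAssets(info))`, where an empty array means the class is complete.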

Performance

Finally I had it working: it would ask what to search for, which assets to search, and whether the search should be case-sensitive, then run the search and display the results per class, with the link to the Twitch (or YouTube) video right beside each one.
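
The search itself can be as simple as a substring check over each SecondMap. This sketch, with hypothetical names `searchSecondMap` and `Match`, lowercases both sides when the search isn't case-sensitive and returns the matching seconds in chronological order, ready to be turned into links.

```typescript
type SecondMap = Map<number, string>;

interface Match {
  second: number;
  text: string;
}

// Return every entry whose text contains the query.
function searchSecondMap(map: SecondMap, query: string, caseSensitive: boolean): Match[] {
  const needle = caseSensitive ? query : query.toLowerCase();
  const matches: Match[] = [];
  map.forEach((text, second) => {
    const haystack = caseSensitive ? text : text.toLowerCase();
    if (haystack.includes(needle)) matches.push({ second, text });
  });
  // Map iteration follows insertion order, so sort to list results chronologically.
  return matches.sort((a, b) => a.second - b.second);
}
```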

But there was an issue. The entire point of this is to be faster than me manually running grep commands, and while the inquirer.js prompts already introduce some user-driven delay, a much larger and more noticeable delay was the 15 seconds it took to parse all the classes. This would not do, so I pointed ndb at it and profiled the program, finding out that my parsing of the chat.json files was taking 25% of the time!

At this point I started thinking about performance, and realized that I don't need the assets I'm not searching through. So I updated my fetchClasses() to leave these file-dependent assets undefined, and moved their population to after the user has selected which assets to search. Now the parsing delay is gone for most searches and rarely shows up at all: chats were its primary cause, and because of their sheer volume I usually don't search other assets alongside them.
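
The lazy-loading change can be sketched as two phases: a cheap `fetchClasses` that only records which classes exist, and a hypothetical `populate` step that runs the parsers only for the assets the user selected. This is an illustration, not the project's actual code; the loader functions are injected as an assumption so the sketch stays independent of how each file is really read.

```typescript
type SecondMap = Map<number, string>;
type AssetKind = "chapters" | "chat" | "captions";

// Hypothetical class info: file-backed assets start out undefined.
interface LazyClassInfo {
  folder: string;
  chapters?: SecondMap;
  chat?: SecondMap;
  captions?: SecondMap;
}

// A loader parses one asset for one class folder (e.g. reading chat.json).
type Loader = (folder: string) => SecondMap;

// Cheap first pass: record which classes exist, parse nothing.
function fetchClasses(folders: string[]): LazyClassInfo[] {
  return folders.map((folder) => ({ folder }));
}

// After the user picks assets, parse only those, skipping anything already loaded.
function populate(classes: LazyClassInfo[], selected: AssetKind[], loaders: Record<AssetKind, Loader>): void {
  for (const info of classes) {
    for (const kind of selected) {
      if (info[kind] === undefined) info[kind] = loaders[kind](info.folder);
    }
  }
}
```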

I did move the missing-asset detector into a separate file, but only because I didn't want to rewrite the reason generator to check whether files exist rather than whether the info object's properties were falsy, which no longer worked now that those properties start out undefined.

Class Selection

Now I had another issue: too much raw information! The results were simply dumped line by line into the console, which gave me no real sense of how many matches there were per class. So I added another inquirer.js prompt that lets the user choose which classes to view, with each class suffixed by its number of matches of each kind.
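
One way to build those suffixed entries is to turn each class's match counts into `{ name, value }` objects, the choice shape inquirer.js accepts in its list and checkbox prompts. `ClassResults` and `toChoices` are hypothetical, and hiding classes with zero matches is an assumption of this sketch rather than something described above.

```typescript
// Hypothetical per-class match counts, keyed by asset kind.
interface ClassResults {
  date: string; // YYYY-MM-DD folder name
  counts: Record<string, number>;
}

// Choice object: the displayed name plus the value returned when selected.
interface Choice {
  name: string;
  value: string;
}

function toChoices(results: ClassResults[]): Choice[] {
  return results
    // Assumption: classes without any matches are left out of the prompt.
    .filter((r) => Object.keys(r.counts).some((k) => r.counts[k] > 0))
    .map((r) => {
      const summary = Object.keys(r.counts)
        .filter((k) => r.counts[k] > 0)
        .map((k) => `${r.counts[k]} ${k}`)
        .join(", ");
      return { name: `${r.date} (${summary})`, value: r.date };
    });
}
```

The resulting array could then be passed as `choices` to a `checkbox` question in `inquirer.prompt`.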

This helped. I still felt it could be cleaner, though I had no actionable ideas for how at that moment; even so, it was a great improvement over my existing workflow.

Polishing

First, I moved the missing-asset detection back into the main program, swapping the intensive parsing for simple file-existence checks. Lastly, I had initially generated the timestamps in plain seconds, which does work on both Twitch and YouTube, but is less useful than a link with the segments delimited by h, m, and s characters: one can simply look at the link and know where in the video it points, without having to click it or do mental math.
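
A minimal sketch of the h/m/s formatting, assuming the video URL doesn't already carry a `t` parameter. `formatTimestamp` and `timestampedLink` are hypothetical names; the helper appends `t` with `?` or `&` depending on whether the URL already has a query string (as YouTube watch links do).

```typescript
// Format seconds as "1h2m3s", omitting leading zero units ("1m30s", "59s").
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const sec = s % 60;
  let out = "";
  if (h > 0) out += `${h}h`;
  if (h > 0 || m > 0) out += `${m}m`;
  return out + `${sec}s`;
}

// Append the timestamp as a `t` query parameter (assumes no existing `t`).
function timestampedLink(videoUrl: string, seconds: number): string {
  const separator = videoUrl.includes("?") ? "&" : "?";
  return `${videoUrl}${separator}t=${formatTimestamp(seconds)}`;
}
```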