Person With Probability 69%*, or Fun With Machine Learning and Telegram Bots

Another week, another post. That’s what I would say if I had been consistently posting these like I implied I would in the last post. Regardless, in starting this post I was almost foiled by the annoying editor changes of WordPress.

The new editor no longer has the strike-through option front and centre. How am I supposed to use my favourite syntactic sugar now? Where is it even. Ah, there we go. I’m sure people who blog enough to have reasonable feelings about the editor will have something better to say than I. However, it is another motivation to get going with my Hugo exploration.

On to the topic for today

A few weeks ago I was ruminating with colleagues about the poor state of motion detection in home security cameras (notwithstanding the numerous selling-my-privacy-to-the-highest-bidder functional cloud solutions). Someone mentioned they wanted to set up a person detector on one of their cameras and have it notify them when a person is found. I was interested, but dismissed it as probably being a bit too difficult to implement over a weekend; I’m not quite up to date with the latest hip and happening things in machine learning.

I cut my teeth on deep learning when debates were still raging on “whether this TensorFlow thing was worth the effort” and whether you should use Keras or TFLearn on top of it. Wisely, I chose TFLearn.

For most of my professional applications the AI winter remains frosty and littered with dubious, half-understood PoC journal articles, so I’ve been out of the loop a bit doing different things and just keeping tangentially aware of developments. Imagine my surprise when a brief search on Thursday told me I can literally just git clone a repo, feed an IP cam stream into it, and get a working object detector. I am sometimes in awe of what people make publicly accessible.

Suddenly we are thrust from the murky realm of maybe into the sleep-depriving reality of “I can do this before the end of the weekend”, and the result is the mishmash you can find in this GitHub repo:

The code operates as two distinct entities: the object detector and the Telegram bot. The object detector is strongly derived from the one provided in the original YOLOv3 repo, and needs quite a bit more work before I will be fully satisfied with it. The Telegram bot is vastly expanded from some standard boilerplate code. I’m insufferably pleased with it and all the little jokes nobody other than my wife and I will likely ever see. Unless you also run it. Please don’t judge the numerous sleeps scattered about; they are there so I can have the pleasure of having the bot show as typing at me.
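For anyone curious how a "bot is typing" pause can work, here is a minimal sketch, not the code from the repo. It assumes direct calls to the Telegram Bot API methods `sendChatAction` and `sendMessage`; the helper names `typing_request` and `reply_with_typing`, the `send` callable, and the pause length are all made up for illustration. The actual bot may well use a library that wraps these calls.

```python
import time
from typing import Callable, Dict, Tuple

API_ROOT = "https://api.telegram.org"

Payload = Dict[str, object]


def typing_request(token: str, chat_id: int) -> Tuple[str, Payload]:
    """Build the URL and payload that ask Telegram to show 'typing...'."""
    url = f"{API_ROOT}/bot{token}/sendChatAction"
    return url, {"chat_id": chat_id, "action": "typing"}


def reply_with_typing(
    send: Callable[[str, Payload], object],
    token: str,
    chat_id: int,
    text: str,
    pause: float = 1.5,  # hypothetical pause; Telegram clears the status after ~5 s
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Show the typing indicator, wait a moment, then send the message.

    `send` is any callable that POSTs a JSON payload to a URL. It is
    injected so the flow can be exercised without network access.
    """
    url, payload = typing_request(token, chat_id)
    send(url, payload)  # the chat now shows the bot as typing
    sleep(pause)        # the "numerous sleeps" live here
    send(f"{API_ROOT}/bot{token}/sendMessage", {"chat_id": chat_id, "text": text})
```

With the `requests` package installed, `send` could be something like `lambda url, payload: requests.post(url, json=payload)`. The indicator only lasts a few seconds, so pauses longer than that would need the chat action to be sent again.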

The detector runs quite well when dealing with H.264 streams: in my case, three 1080p25 streams and one 1520p25 stream. My cameras originally ran H.265, but my VM could not reliably decode all streams without hardware assistance (the VM is running okay with 4 threads on a Xeon E3-1230v2 as seen below). All streams are processed in about 5 to 7 seconds when using a 608x352 input size to the network. Using 416x256 bumps that down to below 5 seconds in most instances.
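Both input sizes above are multiples of 32 in each dimension, which is what YOLOv3 expects since its coarsest detection scale downsamples by 32. As a rough sketch of how such sizes can be picked, the hypothetical helper `net_input_size` below takes a target width, scales the height to keep the frame's aspect ratio, and rounds it up to the next multiple of 32. This is an assumption about how one might arrive at these numbers, not necessarily how they were chosen here.

```python
from typing import Tuple


def net_input_size(
    frame_w: int, frame_h: int, target_w: int, stride: int = 32
) -> Tuple[int, int]:
    """Return (width, height) for the network input.

    The width is taken as given; the height follows the frame's aspect
    ratio and is rounded up to the nearest multiple of `stride`.
    """
    if target_w % stride != 0:
        raise ValueError(f"target width must be a multiple of {stride}")
    # Ceiling division in integers avoids floating-point rounding surprises.
    rows = -(-(target_w * frame_h) // (frame_w * stride))
    return target_w, rows * stride


# A 1080p frame (1920x1080) at the two widths mentioned in the post:
print(net_input_size(1920, 1080, 608))  # (608, 352)
print(net_input_size(1920, 1080, 416))  # (416, 256)
```

The smaller size has roughly half the pixels of the larger one (106,496 versus 214,016), which fits the general direction of the timing difference, although decoding and other overheads mean processing time will not scale exactly with pixel count.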

I would like to go into a bit more depth, looking at the code snippet by snippet, but I’m a bit exhausted from the weekend and looking forward to some much-needed sleep. To tide you over, here are some images from the bot:

* the Nice™ multiple of three in the title comes from one of the first images classified of myself by the detector