Categorizing Posts On Tumblr

Millions of posts are published on Tumblr every day. Understanding the topical structure of this massive collection of data is a fundamental step toward connecting users with the content they love, as well as answering important philosophical questions, such as “cats vs. dogs: who rules on social networks?”

As a first step in this direction, we recently developed a post-categorization workflow that aims to associate posts with broad-interest categories, where the list of categories is defined by Tumblr’s on-boarding topics.

Methodology

Posts are heterogeneous in form (video, images, audio, text) and consist of semi-structured data (e.g., a text post has a title and a body, but the actual textual content is unstructured). Luckily, our users do a great job of summarizing the content of their posts with tags. As the distribution below shows, more than 50% of posts are published with at least one tag.

image

However, tags define micro-interest segments that are too fine-grained for our goal. Hence, we editorially aggregate tags into semantically coherent topics: our on-boarding categories.

We also compute a score that represents the strength of each (tag, topic) affiliation, based on approximate string matching and semantic relationships.
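The post does not say which string-matching method is used. As a minimal sketch, assuming Python's standard `difflib` similarity ratio stands in for the approximate-matching part (the semantic part is not covered here), a (tag, topic keyword) affinity could look like this. The function name `string_affinity` is hypothetical.

```python
from difflib import SequenceMatcher


def string_affinity(tag: str, topic_keyword: str) -> float:
    """Return a similarity in [0, 1] between a tag and a topic keyword.

    Assumption: a plain character-level ratio is used as a stand-in for
    whatever approximate string matching the real workflow applies.
    """
    a = tag.strip().lower()
    b = topic_keyword.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


# A misspelled tag still lands close to its intended topic keyword,
# while an unrelated tag scores low.
close = string_affinity("fashon", "fashion")
far = string_affinity("cats", "television")
```

A score like this would only seed the dictionary; editorial review would still decide which pairs are kept.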

Given this input, we can compute a score for each pair (post,topic) as:

image

where

w(f,t) is the (tag, topic) affiliation score, or zero if the pair (f,t) is not in the dictionary W.

tag-features(p) contains the features extracted from the tags associated with the post: raw tag, “normalized” tag, n-grams.

q(f,p) is a weight in [0,1] that takes into account the source of the feature (f) in the post (p).
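The definitions above can be sketched in code. This is an illustration only: it assumes the per-feature contributions w(f,t) · q(f,p) are summed over tag-features(p), and the dictionary entries, weights, and function name are made up.

```python
from typing import Dict, Tuple

# Hypothetical excerpt of the dictionary W: (feature, topic) -> affiliation score.
W: Dict[Tuple[str, str], float] = {
    ("cat", "Animals"): 0.9,
    ("kitten", "Animals"): 0.8,
    ("runway", "Fashion"): 0.7,
}


def post_topic_score(
    features: Dict[str, float],
    topic: str,
    w: Dict[Tuple[str, str], float] = W,
) -> float:
    """Score a (post, topic) pair.

    `features` maps each feature f in tag-features(p) to its source
    weight q(f, p) in [0, 1]. Pairs missing from `w` contribute zero.
    Assumption: contributions are combined by a plain sum.
    """
    return sum(q * w.get((f, topic), 0.0) for f, q in features.items())


# Hypothetical post: raw tags get weight 1.0, a derived n-gram gets 0.5.
features = {"cat": 1.0, "kitten": 0.5, "monday": 1.0}
animals = post_topic_score(features, "Animals")  # 0.9 * 1.0 + 0.8 * 0.5
fashion = post_topic_score(features, "Fashion")  # no matching feature
```

The tag "monday" is not in the dictionary, so it adds nothing to either topic, which is exactly the coverage problem a sparse dictionary creates.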

The drawback of this approach is that it relies heavily on the dictionary W, which is far from complete.

To address this issue, we exploit another source of data: RelatedTags, an index that provides a list of similar tags by exploiting co-occurrence patterns. For each pair (tag,topic) in W, we propagate the affiliation with the topic to the tag's top related tags, smoothing the affiliation score w to reflect the fact that these new (tag,topic) entries could be noisy.

image
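The propagation step can be sketched as follows. The RelatedTags index is modeled as a plain mapping from a tag to its ranked related tags, and the smoothing is assumed to be a constant damping factor; the real smoothing function, the cutoff `top_k`, and all names here are assumptions.

```python
from typing import Dict, List, Tuple

Affiliations = Dict[Tuple[str, str], float]


def propagate(
    w: Affiliations,
    related_tags: Dict[str, List[str]],
    top_k: int = 3,
    damping: float = 0.5,
) -> Affiliations:
    """Extend W with (related tag, topic) entries at a reduced score.

    Editorial entries already in `w` are never overwritten. When several
    source tags reach the same new pair, the highest damped score wins.
    """
    expanded = dict(w)
    for (tag, topic), score in w.items():
        for neighbor in related_tags.get(tag, [])[:top_k]:
            key = (neighbor, topic)
            if key in w:
                continue  # keep the editorial score as is
            expanded[key] = max(expanded.get(key, 0.0), damping * score)
    return expanded


# Hypothetical inputs: one editorial entry and a small related-tags index.
w = {("cat", "Animals"): 0.8}
related = {"cat": ["kitten", "meow", "pets", "lolcats"]}
expanded = propagate(w, related)
```

With these inputs the three top related tags inherit a score of 0.4 for Animals, and the fourth tag is left out by the `top_k` cutoff.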

This computation is followed by a filtering phase that removes (post,topic) entries with a low confidence score. Finally, the category with the highest score is assigned to the post.

image
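Filtering and final assignment can be sketched in a few lines. The threshold value and the function name are assumptions; the post does not state how the confidence cutoff is chosen.

```python
from typing import Dict, Optional


def best_category(scores: Dict[str, float], min_score: float = 0.5) -> Optional[str]:
    """Pick the highest-scoring topic among those above a confidence cutoff.

    Returns None when no topic is confident enough, which leaves the
    post uncategorized.
    """
    confident = {topic: s for topic, s in scores.items() if s >= min_score}
    if not confident:
        return None
    return max(confident, key=confident.get)


# Hypothetical (post, topic) scores for a single post.
label = best_category({"Animals": 1.3, "Fashion": 0.2})
unlabeled = best_category({"Fashion": 0.2})
```

Returning None rather than a weak guess keeps low-confidence posts out of the categorized set.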

Evaluation

This unsupervised approach to post categorization runs daily on posts created the day before. The next step is to assess the alignment between the predicted category and the most appropriate one.

image

The results of an editorial evaluation show that our framework identifies a relevant category in most cases, but they also highlight some limitations, such as limited robustness to polysemy.

We are currently looking into improving overall performance by exploiting NLP techniques for word embedding and by integrating the extraction and analysis of visual features into the processing pipeline.

Some fun with data

What is the distribution of posts published on Tumblr? Which categories drive the most engagement? To answer these and other questions, we analyzed the categorized posts over a period of 30 days.

Almost 7% of categorized posts belong to Fashion, with Art as runner-up.

image

The category that drives the most engagement is Television, which accounts for over 8% of the reblogs on categorized posts.

image

However, when normalizing by the number of posts published, the category with the highest average engagement per post is Gif Art, followed by Astrology.

image
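The normalization behind this ranking can be sketched as follows. The counts are made up for illustration and are not Tumblr's data; the function name is hypothetical.

```python
from typing import Dict


def engagement_per_post(
    posts: Dict[str, int], engagements: Dict[str, int]
) -> Dict[str, float]:
    """Average engagements per post for every category with at least one post."""
    return {
        category: engagements.get(category, 0) / count
        for category, count in posts.items()
        if count > 0
    }


# Hypothetical counts: a large category can lead in total engagement
# while a smaller one leads once the totals are divided by post volume.
posts = {"Television": 1000, "Gif Art": 100}
engagements = {"Television": 8000, "Gif Art": 1500}
averages = engagement_per_post(posts, engagements)
ranking = sorted(averages, key=averages.get, reverse=True)
```

In this toy example Television has more total engagement, but Gif Art ranks first per post, mirroring the kind of reversal described above.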

Last but not least, here are the stats you all have been waiting for!! Cats are winning on Tumblr… for now…

image

More Posts from Philosophical-amoeba and Others

7 years ago

i think the coolest thing would be to see a new color


7 years ago

The Hippocratic Oath is one of the most famous pieces of medical writing, and it includes some of the basic ethical guidelines for medical practitioners. It is also constantly evolving. The images above come from a version of the oath that we found in the 1634 edition of Peter Lowe’s surgical text. If you compare it to this example of a modern version, you’ll notice some similarities and some differences. Both of them emphasize respecting the work of prior physicians and protecting the patient’s privacy. On the other hand, the modern oath doesn’t begin with an invocation to the gods, and it makes no mention of refusing to assist in abortions or any type of treatment that involves cutting. These changes illustrate how the practice of medicine, and what we expect of medical practitioners, changes over time.

New students at the Washington University School of Medicine are given the chance to devise their own student oath that is similar to the Hippocratic Oath. Take a look at the 2016 class oath here.


8 years ago

In general I am a casual observer and usually do not make comments, especially since I am here to learn and have no background in linguistics. But in this case I feel strongly compelled to put my 2 cents' worth of thoughts in.

Although I cannot say that I am anything like fluent, I do have a reasonable amount of Mandarin Chinese and Japanese, and I have to say the first thing I thought when I saw this article was "ah". Because although I can see how katakana is derived from Chinese, using the rather restricted stroke combinations that are the basis of all Chinese characters, the same cannot be said for hiragana, because at the very least, squiggles do not exist in Chinese, at least by the time it was exported to Japan. What you might think are squiggles in Chinese are in fact just our possibly lazy, or perhaps more elegant, way of writing, the way cursive would look compared to printed letters. Hiragana bears only a superficial resemblance to Chinese and always feels like it must have another source of inspiration.

Also keep in mind that Chinese was basically an imported language in Japan, and an attempt to shoehorn Japanese sounds into Chinese characters (which I think I can safely say did not sound the same) must have been unwieldy at best. In fact, today, Japanese pronunciations of kanji differ so much from the Chinese, and often their usage too, that I would use my knowledge of the characters only as a rough starting point as to what they might mean in Japanese.

Also, I looked up Kūkai, and, to cut a long story short, he was a Japanese Buddhist monk who went to China to study the sutras, and, to quote from the Wikipedia page directly:

Kūkai arrived back in Japan in 806 as the eighth Patriarch of Esoteric Buddhism, having learnt Sanskrit and its Siddhaṃ script, studied Indian Buddhism, as well as having studied the arts of Chinese calligraphy and poetry, all with recognized masters. He also arrived with a large number of texts, many of which were new to Japan and were esoteric in character, as well as several texts on the Sanskrit language and the Siddhaṃ script.

And a quick look at the Siddham script shows that it has its roots in the Aramaic alphabet.

This is the man to whom the invention of the kana system is attributed, and if that is the case, I see a possible connection that is not as far-fetched as it seems.

The History of Hiragana

In the Japanese language, we have three types of letters: Kanji, Hiragana, and Katakana.

Hiragana’s roots lie in old Ivrit (Hebrew) and Palmyra letters.

The first column: Phoenician alphabet. The second column: Ostracon. The third column: Old Aramaic. The fourth column: Imperial Aramaic. The fifth column: Dead Sea scrolls. The sixth column: Palmyrene script. The seventh column: Palmyra.

The first column: Hiragana. The second column: Consonants. The third column: Vowels. The fourth column: the consonant and the vowel combined. The fifth column: Sousho-tai (a handwriting style). The sixth column: Kanji.


7 years ago

Katherine Stinson in Tokyo, Japan - Her Asian tour in 1916


7 years ago

Book Lovers Day - Free Aeronautics e-Books from NASA

image

Quieting the Boom

image

The Shaped Sonic Boom Demonstrator and the Quest for Quiet Supersonic Flight.

Download it HERE

Elegance in Flight

image

A comprehensive History of the F-16XL Experimental Prototype and its Role in our Flight Research. 

Download it HERE

Probing the Sky

image

Selected National Advisory Committee for Aeronautics (NACA) Research Airplanes and Their Contributions to Flight.

Download it HERE

Cave of the Winds

image

The huge Langley Full-Scale Tunnel building dominated the skyline of Langley Air Force Base for 81 years (1930–2011). Explore how the results of critical tests conducted within its massive test section contributed to many of the Nation’s most important aeronautics and space programs.

Download it HERE

A New Twist in Flight Research

image

A New Twist in Flight Research describes the origins and design development of aeroelastic wing technology, its application to research aircraft, the flight-test program, and follow-on research and future applications.

Download it HERE

Sweeping Forward

image

Developing & Flight Testing the Grumman X-29A Forward Swept Wing Research Aircraft.

Download it HERE

Thinking Obliquely

image

Robert T. Jones, the Oblique Wing, our AD-1 Demonstrator, and its Legacy.

Download it HERE

The Apollo of Aeronautics

image

The fuel crisis of the 1970s threatened not only the airline industry but also the future of American prosperity itself. It also served as the genesis of technological ingenuity and innovation from a group of scientists and engineers at NASA, who initiated planning exercises to explore new fuel-saving technologies.

Download it HERE

X-15: Extending the Frontiers of Flight

image

X-15: Extending the Frontiers of Flight describes the genesis of the program, the design and construction of the aircraft, years of research flights and the experiments that flew aboard them.

Download it HERE

Ikhana

image

Delve into the story of the Ikhana, a remotely piloted vehicle used by NASA researchers to conduct Earth science research, which became an unexpected flying and imaging helper to emergency workers battling California wildfires.

Download it HERE

NASA’s Contributions to Aeronautics, Volume 1

image

This first volume in a two-volume set includes case studies and essays on NACA-NASA research for contributions such as high-speed wing design, the area rule, rotary-wing aerodynamics research, sonic boom mitigation, hypersonic design, computational fluid dynamics, electronic flight control and environmentally friendly aircraft technology.

Download it HERE

NASA’s Contributions to Aeronautics, Volume 2

image

Continue your journey into the world of NASA’s Contributions to Aeronautics with case studies and essays on NACA-NASA research for contributions including wind shear and lightning research, flight operations, human factors, wind tunnels, composite structures, general aviation aircraft safety, supersonic cruise aircraft research and atmospheric icing.

Download it HERE

Interested in other free e-books on topics from space, science, research and more? Discover the other e-books HERE.

Make sure to follow us on Tumblr for your regular dose of space: http://nasa.tumblr.com


7 years ago

This week, we’re taking a look at manuscripts having to do with health, medicine, and human physiology, specifically focusing on how bodies are displayed in manuscript illuminations or diagrams across different cultures.

LJS 389, shown above, is a 14th-century Chinese treatise on the anatomy, physiology, and pathology of blood vessels titled Shi si jing fa hui. The manuscript is made from bamboo paper, and the diagrams and kaishu script are written in black ink. Focus on the diagrams of the bodies and stay tuned this week to see not only how the details and forms of these depictions change from culture to culture, but also the mediums with which these manuscripts are created.

The full LJS 389 manuscript filled with more diagrams can be found on Openn: http://openn.library.upenn.edu/Data/0001/html/ljs389.html

and Penn In Hand: http://hdl.library.upenn.edu/1017/d/medren/4824235


8 years ago
Hey resident neuroscientist @sixpenceee, wanna explain why the strawberries look red?


7 years ago
1971 Japanese re-release poster for THE GRADUATE (Mike Nichols, USA, 1967)

Designer: unknown

Poster source: Heritage Auctions

Celebrating the films of storyboard artist Harold Michelson and researcher Lillian Michelson–the subjects of the upcoming HAROLD AND LILLIAN - A HOLLYWOOD LOVE STORY. This weekend, TCM will mark the 50th anniversary of The Graduate—a film that Harold storyboarded and contributed an iconic shot to—by screening a 4K restoration of the film in 700 theaters nationwide on April 23 and 26. Read more at the Harold and Lillian blog and find out where to see The Graduate here.

HAROLD AND LILLIAN opens next Friday at the Quad Cinema in New York.


7 years ago
1984

George Orwell

