SOLAR_FIELDS 5 years ago

I read this story in the most chilling way: the same tactics used to perform this analysis will eventually be used to unmask people who post anonymously now but who might say something in the future that can be linked back to them with the same type of analysis. To phrase it another way, we have come to a point where your very prose is a digital signature.

  • _nedR 5 years ago

    This is already a thing. In fact the rumour is that the US govt discovered the identity of Bitcoin's Satoshi using this.

    https://medium.com/cryptomuse/how-the-nsa-caught-satoshi-nak...

    • jmkni 5 years ago

      I always thought Satoshi was more than one person; if this is true, it would suggest otherwise.

      • _nedR 5 years ago

        Actually, in the article the author says he got the impression it might be more than one person (but his sources do not confirm or deny it), which, if true, probably means the NSA found the writings to belong to several people.

        Sorry if my original comment seemed to imply otherwise.

  • ggggtez 5 years ago

    I keep forgetting the general public is not aware of these things.

    Data is like nuclear waste. Everything you do online leaves a pattern of behavior that is unique to you. Your only saving grace is no one cares about you specifically, until they do.

  • yogsototh 5 years ago

    It was already known that a simple Markov chain had been used to detect that another author had written a chapter inside a book. It was in 2003, I think; unfortunately I cannot find a reference for this. Just to say that Markov chains are a very basic and old ML method that is quite efficient for this kind of task.
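    For illustration, here is a minimal Python sketch of the kind of Markov-chain check described above. It is not a reconstruction of the 2003 work: it assumes a character-bigram model with add-one smoothing, and all function names are hypothetical. Each chapter is scored against a model trained on the other chapters, and the chapter that fits worst is flagged.

```python
import math
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count character-to-character transitions in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def avg_log_likelihood(model, text, alphabet_size=256):
    """Average per-transition log-probability of `text` under the model.

    Add-one (Laplace) smoothing keeps unseen transitions from scoring -inf.
    `alphabet_size` is an assumed upper bound on distinct characters.
    """
    total = 0.0
    n = 0
    for prev, nxt in zip(text, text[1:]):
        row = model.get(prev, Counter())
        prob = (row[nxt] + 1) / (sum(row.values()) + alphabet_size)
        total += math.log(prob)
        n += 1
    return total / n if n else float("-inf")


def least_typical_chapter(chapters):
    """Score each chapter against a model trained on all other chapters.

    Returns (index of the worst-fitting chapter, list of all scores).
    """
    scores = []
    for i, chapter in enumerate(chapters):
        rest = " ".join(c for j, c in enumerate(chapters) if j != i)
        model = train_bigram_model(rest)
        scores.append(avg_log_likelihood(model, chapter))
    worst = min(range(len(chapters)), key=scores.__getitem__)
    return worst, scores
```

    A low score only says a chapter is statistically unlike the rest; on short texts the signal is weak, and a real study would use richer features and proper validation.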

  • TheOperator 5 years ago

    This sort of analysis is older than Tolkien. There are pretty substantial processing requirements to do it at scale, and it's pretty inaccurate. In the future, people who say controversial things will stick to short, sentence-long statements to render this sort of analysis useless.

  • mmcwilliams 5 years ago

    Isn't this the reason grep was created? It was used to determine which parts of the Federalist Papers were written by which author.[0]

    Considering this occurred in 1974, I can only imagine that techniques for de-anonymizing authors have gotten much better, given how much written text individuals post on social media sites like HN. Uh oh.

    [0] https://en.wikipedia.org/wiki/Grep

  • cocochanel 5 years ago

    It's already a thing; see how the FBI caught the Silk Road admin. Although that was not done in the automated fashion that you suggest, I am pretty sure the algos are already in use.

  • kzrdude 5 years ago

    Aren't college students everywhere already exposed to this with every text, due to "plagiarism detection" software?

  • salutonmundo 5 years ago

    Possible solution: run your text through a different machine translator for each account. Make minor corrections for cohesiveness.

  • lordnacho 5 years ago

    But the whole cat-and-mouse game hasn't really started yet. Once people find out what the algo looks at, they can try to game it. E.g., if you know it looks for recurring phrases like "first of all", you can stop using them. Or if it looks at specific errors, you can sprinkle them in one text but not another.
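    To make the "what the algo looks at" idea concrete, here is a hedged Python sketch. It assumes the analysis boils down to comparing how often a fixed list of marker phrases appears in two texts; the marker list and function names are hypothetical, and real stylometric systems use far larger feature sets.

```python
import math
import re

# Hypothetical marker list, chosen only for illustration.
MARKERS = ["first of all", "however", "but", "and", "of", "the"]


def profile(text, markers=MARKERS):
    """Rate of each marker per 1000 words, counted case-insensitively."""
    lowered = text.lower()
    n_words = max(len(lowered.split()), 1)
    rates = []
    for m in markers:
        # Boundary checks so that "of" does not match inside "offer".
        pattern = r"(?<!\w)" + re.escape(m) + r"(?!\w)"
        hits = len(re.findall(pattern, lowered))
        rates.append(1000.0 * hits / n_words)
    return rates


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

    Under this toy model, dropping a habitual phrase such as "first of all" from one account zeroes out that component of its profile and lowers its similarity to your other writing, which is exactly the gaming described above.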

    • stedaniels 5 years ago

      Even better, it would be trivial to write a tool that rephrases what you wrote to be anonymous or, even better/worse, to match someone else's signature.

      • toyg 5 years ago

        It already exists: Google Translate.

        Feed your text once or twice to GT, then re-correct it just enough to make it understandable.

        • herewulf 5 years ago

          Ain't nobody got time for that.

          (Apologies for the vernacular but I am attempting to conceal my digital signature)

  • metastew 5 years ago

    Wasn't that how they caught the Unabomber? I saw a documentary about the guy who caught him by using this sort of analysis, although his method was quite analog (scanning through written letters and Unabomber's correspondences to the press).

    • commandar 5 years ago

      The Unabomber was turned in by his brother, who recognized some unique phrasing in the released manifesto and reported it to authorities.

      EDIT:

      After looking, it looks like the brother turned him in after having this sort of analysis done to confirm his suspicions. So it does look like there's a tie to the story, but the analysis wasn't done blindly in that case.

  • Simple_Guy 5 years ago

    Only a matter of time before AI-powered 'prose anonymiser' is developed.

    Then you can just run all your naughty words through the Russian styliser.

  • viko-h 5 years ago

    Simple & obscure solution: Google Translate to another language and back to the original.

    • eru 5 years ago

      That will add a bit of entropy, but might not be enough.

      • genera1 5 years ago

        To add more entropy, you can do multiple passes, use a different set of languages every time you write, or even use a different translator than GT.

        • gpderetta 5 years ago

          Fun fact of the day: Galactic Pot-Healer (written in 1969 by PKD) starts with the protagonist doing exactly this to play a game.

    • jammygit 5 years ago

      The secret to the bad grammar in spam emails?

  • full-o-shit 5 years ago

    Stylometry isn't hard science. And it is vulnerable to attack.

    You can fake it. If you have a target to frame then it is possible to impersonate them. If you wish to evade detection then it is possible to attempt a disguise. It can be useful when looking at innocuous passages of text but the context of stylometry is ruined by an author aware of the technique.

slg 5 years ago

Has the idea that Beowulf started as an oral tradition fallen out of favor? There is no mention of that in the article, and that would seem to be an obvious flaw in the study if it truly was originally told orally. There is certainly less freedom when transcribing structured verse like Beowulf compared to prose, but the transcriber is still going to have an impact. A single transcriber could therefore provide a stabilizing voice that helps mesh together a story that was originally built piece by piece.

  • kro92kfmrzz 5 years ago

    It still seems possible that could be the case. The conclusion here seems to be “one person wrote down the words that tell the overall tale.”

    Does not seem to preclude it originating as a series of verbal tales.

    • paulific 5 years ago

      That was more or less the point of Tolkien's lecture "The Monsters and the Critics". He compared it to building a tower out of rocks that came from a historical ruin. Everybody was interested in taking it apart to see where the rocks came from, but he felt it had value when considered as a single creative work in its own right.

      • kijin 5 years ago

        It's also the kind of idea that Tolkien was in a unique position to entertain. If one man can create an entire mythology spanning LOTR, The Hobbit, and The Silmarillion, there is no reason to suppose that another man could not have achieved a similar feat with Beowulf.

        According to the article, doubts about single authorship began to be raised in the 19th century. This was a time when a lot of people thought that they were living at the pinnacle of history, and that cultures of the distant past must have been strictly inferior to the modern one. Troy could not have possibly existed; no single person in such a barbaric age could possibly have produced great poems like the Homeric epics -- or Beowulf.

        Well, we found Troy. We also found evidence of great devastation at Troy right around the time when the Homeric war supposedly took place. It seems that people of the distant past did possess the ability to tell a great story after all, moving freely between history and mythology, filled with allegory and philosophical depth. Just like Tolkien did, but hundreds or even thousands of years earlier.

  • thaumasiotes 5 years ago

    > Has the idea that Beowulf started as an oral traditional fallen out of favor?

    It's hard to imagine that that idea could have fallen out of favor, given the incredibly sparse written record of the Germanic tribes. They didn't produce enough literature to have a non-oral tradition.

    • hadepabade 5 years ago

      We don't know, because the Catholic Church burned or hid everything, just like they would later do in Mexico.

  • dmckeon 5 years ago

    > Scholars agree that these were different scribes copying the poem, not two different poets.

    Which suggests that some of the perceived homogeneity may stem from the scribal style, rather than the oral style.

  • jdminhbg 5 years ago

    Even if originally an oral tradition, the current version could still be the work of a single author. Many of Shakespeare's plays are built on pre-existing stories/histories, but the plays as they exist now are his.

    • kijin 5 years ago

      We could say the same about the Homeric epics. One does not end up producing the canonical version of an oral tradition by blindly recording whatever lyrics are circulating at the time. A great poet(s?) rearranges, reinterprets, and infuses the story with a unique perspective that can be identified as his (or theirs), even thousands of years later.

rhokstar 5 years ago

"Across many of the proposed breaks in the poem, we see that these measures are homogeneous," said Krieger. "So as far as the actual text of Beowulf is concerned, it doesn't act as though there is supposed to be a major stylistic change at these breaks. The absence of major stylistic shifts is an argument for unity."

I'm imagining this methodology applied across many other literary works. So many insights could be generated from works throughout the ages!

  • bhritchie 5 years ago

    And have been! Careful attention to stylistic features has allowed us to get a pretty good idea of the chronological groupings of Plato's dialogues, for example, which helps us understand things like how his views evolved over time. That's a typical sort of use of stylometry.

  • cocochanel 5 years ago

    What if they had style guidelines like software engineers do?

  • TheGRS 5 years ago

    Have they applied this to Homer? That was another one that I understood to be supposedly the work of multiple authors.

    • mrb 5 years ago

      Yes they have. The authors write:

      "Like Beowulf, the Greek epics Iliad and Odyssey have also generated much debate about their authorship and composition. Conventionally attributed to a single author—Homer—both works nevertheless clearly originate in a long oral tradition and show signs of considerable evolution in the course of their transmission history, including the possible influence of written versions[37,38]. Since the two Homeric epics have numerous features in common, we hypothesized that they might also have a similar pattern of sense-pauses. However, as shown in Fig. 2a, the Odyssey has a higher proportion of intraline sense-pauses relative to the Iliad. This difference suggests a slight change of compositional practice between the two Greek poems, whether due to a single poet’s stylistic evolution or natural variation across the oral tradition. "

    • N_trglctc_joe 5 years ago

      Homer's a bit of an unusual case. The Iliad was the first written work produced after a long dark age (or close to it; I'm not sure where the consensus is right now on whether Hesiod came earlier), so Homer was drawing on a few centuries of pent-up oral tradition from a culture that had itinerant historian-poets. As such, he probably didn't compose all his own verses but could well have been the first to write them down. I'm not sure how effective the technique referred to in the article (stylometry) would be at teasing apart the distinction between composer-of-verse and author-of-lines.

    • ehsankia 5 years ago

      What about Shakespeare? Isn't he the subject of some of the most extensive authorship research? The article briefly mentions that an older analysis mistakenly attributed someone else's poem to Shakespeare; to me, that only adds to the mystery of the authorship question.

eyerow 5 years ago

The scholars should try this on the Book of Mormon.

  • thousandautumns 5 years ago

    Why? There's no point. If the results come back indicating multiple authors, great. If the results come back indicating a single author, you could just make the argument that it is the result of the book having a single translator.

    Academic pursuits should have more meaning than trying to dunk on other people's belief systems when those belief systems are fairly harmless.

    • herewulf 5 years ago

      You wouldn't think they are "fairly harmless" if you were a non-Mormon living in Utah.

dorkHeroics 5 years ago

This is just the vague bullshit artistry of stylometry. It's like phrenology or "psychic detectives" standing in as a form of pseudoscience, for those moments when we lack a means to describe true methodologies, and need an alternative form of evidence fabrication to service parallel construction's objective.

In this case, scholars enjoy and admire Tolkien, and it's fun for them to validate his ideas and steal a headline or two. Thus post-hoc rationalization leads them to negotiate a codified reasoning behind Tolkien's linguistic gut instincts.

Voilà, now they earn geek credibility and we all can cite their pseudoscience as precedent for our own purposes.

Gunstig2Snath 5 years ago

I always preferred Grendel by John Gardner. But if Zack Snyder directed the original with Ben Affleck as Beo.... oh, hell, I'd hate that.

elfakyn 5 years ago

I doubt this would get as much media attention if Tolkien wasn't involved.

  • Mediterraneo10 5 years ago

    Seamus Heaney’s translation of Beowulf two decades ago got quite a bit of attention in mainstream magazines and newspapers. Plus, 2007 saw a film adaptation of Beowulf directed by Robert Zemeckis and written by Neil Gaiman and Roger Avary. There is public interest in this poem as a classic of English literature even when Tolkien is not involved.

    • hinkley 5 years ago

      And there is also The 13th Warrior, a screen adaptation of Eaters of the Dead, Michael Crichton’s ode to Beowulf.

    • scottlocklin 5 years ago

      Heaney's translation is bloody awful too.

      Reading a chunk of Beowulf in the original was, along with learning Maxwell's equations and reading Gibbon, one of the best things I ever did for myself. It's ... fairly obvious it's one person. The latter half with the dragon could have been an add-on.

      • libraryatnight 5 years ago

        Heaney's translation is pretty fun to read; I don't know what you're talking about with this "bloody awful" stuff. It's a poet's translation for sure, and he admittedly takes license, but "bloody awful" feels like a stretch.

        Heaney's translation felt like he took Tolkien's "The Monsters and the Critics" to heart with his understanding, and clear love of the source.

        • scottlocklin 5 years ago

          I don't care if he loves the source; it was an awful translation and it gives me the heebie jeebies like nails on a chalkboard. Just picking it up gives me the creeps; vandalism.

          Rebsamen is closer to something like what was actually written.

          • libraryatnight 5 years ago

            Your Rebsamen comment is true - everything else you say makes you sound like those people who go around telling others they can't possibly appreciate or enjoy sushi if they've never been to Japan.

            You may want to try dislodging the stick from your wart ridden anus.

            • scottlocklin 5 years ago

              Well, at least you've read the damn thing. There's no accounting for taste, and unless you're Heaney, there is no reason to be personally insulting.

              • libraryatnight 5 years ago

                Disliking something is fine, calling it shit is pompous and ignorant for someone boasting of being well read.

                • scottlocklin 5 years ago

                  Man, you're really butthurt; are you a relative of Heaney's? One of the terrible diseases of our time, along with things like never reading the Anglo-Saxon classics (I learned OE and read Beowulf in the original because both my parents did so in high school, as, apparently, did everyone in the US at one point in time), is not calling out terrible things as terrible, and ascribing importance to people the media has declared as "great." Heaney's translation is bloody awful, and will be remembered as such for as long as anyone remembers what his name is.

    • krapp 5 years ago

      >Plus, 2007 saw a film adaptation of Beowulf directed by Robert Zemeckis and written by Neil Gaiman and Roger Avary.

      That film would likely not have happened if not for the success of the Lord of the Rings films, and most people likely never heard of Beowulf until then, unless they dimly remembered having to read it in class once.

      And no translation of anything, much less any book without a big media tie in, gets anything close to "quite a bit of attention" in the mainstream press. Coverage in literature sections of the newspapers or dedicated literary sites are far from mainstream.

      And this is an article in Ars Technica, which to HN may seem mainstream, but which is far from it for the masses. A quick Google of "Tolkein Beowulf single author" brings up little in the way of mainstream coverage, with the Ars article being on top.

      Don't get me wrong, I love Beowulf, and Seamus Heaney's translation is one of the few books I'll reread regularly, but elfakyn is correct. If Tolkien's name weren't involved, no one would be covering this at all, and really, almost no one is now.

      • NeedMoreTea 5 years ago

        Well, it does become difficult to separate the two, considering Tolkien lectured about Beowulf, translated it in the 1920s (not published until this decade!), and spent decades working on ancient languages, philology and linguistics.

        Nothing to do with LotR films, more to do with an intellectual giant well known in the field (who also wrote LotR). The books were far better anyway.

        Neither is it Tolkien's fault Beowulf is considered the most significant work from Old English. It was often discussed in the broadsheets I once read, though it is not ever likely to reach those who read the Sun or the Mirror. That still doesn't stop it being a highly significant work (without Tolkien or Jackson).

        The Beeb trot it out regularly - not buried in dusty literary sections that no one normal would encounter, which seems to be what you're driving at.

        Mind it probably even reached down to tabloid readers from time to time. There was a fun Australian cartoon version, narrated by Peter Ustinov retelling from Grendel's point of view. Managed to become a bit of a cult classic in its day. There's been a couple of TV mini series. Probably a game and festival too for all I know!

      • Mediterraneo10 5 years ago

        > That film would likely not have happened if not for the success of the Lord of the Rings films

        That particular film may not have been made, but it’s not hard to imagine an adaptation being made by someone even in the absence of the Lord of the Rings trilogy. Michael Crichton’s Eaters of the Dead, which riffs on the Beowulf story, got a film adaptation (as The 13th Warrior) in 1999. The Beowulf story isn’t The Dream of the Rood or other esoteric Old English literature; it has adventure elements that will attract ordinary audiences from time to time.

        > Coverage in literature sections of the newspapers or dedicated literary sites are far from mainstream.

        Literature sections of mainstream newspapers are mainstream reporting, even if many readers are going to skip over those columns. And are you seriously arguing that mags like e.g. The New Yorker or The New York Review of Books are not mainstream? Those may be bought by a certain demographic of bookish people, but those mags are sold at ordinary newsagents. They are not specialist journals.

        • krapp 5 years ago

          >And are you seriously arguing that mags like e.g. The New Yorker or The New York Review of Books are not mainstream?

          Maybe. Most people read neither nowadays. Unless my understanding of the definition of "mainstream" is flawed, that makes them essentially niche publications.

          But that wasn't actually my argument. My argument is that most people don't care about literature beyond what is tied into a popular media franchise, non-literary books, or books by famous authors, and Beowulf is none of those things.

          • dwringer 5 years ago

            Whether "most people ... care about [it as] literature", I'm not sure, but most people in many school districts and universities were at least forced to read Beowulf as part of a standard curriculum, possibly more than once over the years. Isn't the question merely whether they would've cared enough to upvote or comment on the HN posting without seeing "Tolkien" in the headline? Beowulf is part of what one might call literary canon. What constitutes a literary canon is always going to be subject to debate, as it is ultimately subjective at some level. How one is to define "popular" media franchise, "literary" books, or "famous" authors can only pose an even greater challenge in forming any consensus.

      • garmaine 5 years ago

        Most people I know are familiar with The 13th Warrior, and know that it is a (reimagined) retelling of Beowulf by Michael Crichton. From wikipedia:

        > In an afterword in the novel Crichton gives a few comments on its origin. A good friend of Crichton's was giving a lecture on the "Bores of Literature". Included in his lecture was an argument on Beowulf and why it was simply uninteresting. Crichton stated his views that the story was not a bore and was, in fact, a very interesting work. The argument escalated until Crichton stated that he would prove to him that the story could be interesting if presented in the correct way.

        • krapp 5 years ago

          To be fair, chances are you and your friends are not representative of the mainstream. Just being on Hacker News makes that unlikely.

          Michael Crichton is a famous enough author that people are more likely than not to see a movie based on his work because it's a "Michael Crichton movie" and neither know nor care about the source material. To most people, the Beowulf movie is just a fantasy movie where Angelina Jolie plays a sexy demon, not the adaptation of Beowulf they've been waiting for years to see, the way people were waiting to see (or dreading to see) the Lord of the Rings.

          Beowulf just isn't that significant or relevant in popular culture - it just isn't. I don't even know why this is controversial.

          • justin66 5 years ago

            > Beowulf just isn't that significant or relevant in popular culture - it just isn't. I don't even know why this is controversial.

            Not everyone slept through their high school English class and failed to notice when characters in movies they were watching were named "Beowulf."

            And we're talking about one of the few things that is examined in almost every high school English class.

            • thaumasiotes 5 years ago

              > we're talking about one of the few things that is examined in almost every high school English class.

              I don't think this comes close to being true. Maybe in Britain.

              Ancient epics and ancient languages are a primary interest for me, but no school class ever covered Beowulf.

              • justin66 5 years ago

                > no school class ever covered Beowulf

                How interesting. In that case am I right in guessing that your coverage of the Medieval part of the canon was limited to Chaucer and didn't include anything else? I'm just curious how much things have changed.

                • thaumasiotes 5 years ago

                  Chaucer was covered in a sense, but in History rather than English. The class did not read him, except for one student who chose that as the focus of a class project.

                  I did have a high school English class covering (among other, non-medieval works) Sir Gawain and the Green Knight, and the story of Tristan and Iseult. Sir Gawain and the Green Knight was read in translation, but Tristan and Iseult was a fairly modern reimagining (set in the original period), with an author's introduction discussing how she chose to omit the magic that was present in the original because she thought it detracted from the agency of the characters.

                  Edit: found it - it was this one. https://www.amazon.com/dp/0374479828/ . "Tristan and Iseult: an inspired retelling of the legendary love story".

                  • justin66 5 years ago

                    That sounds really good, I don't think I ever read Tristan and Iseult.

                    I'd have slotted Sir Gawain and the Green Knight in with Beowulf in the "medieval" part of the literary canon but I could be off-base there. I remember reading Beowulf in high school but not the other. That might be a function of which one I found more interesting at the time, I'm not sure.

                    • thaumasiotes 5 years ago

                      I agree that Sir Gawain and the Green Knight is "medieval". I meant to say that the English class covering it was not focused on a historical period, covering literature that was much more modern in the same year.

                      Beowulf is from around the 8th century; I guess that's technically "medieval" but I think of it as belonging to some nameless period that's older than "medieval". There's a huge difference between Old English of the 8th century and Middle English of the 14th.

                      In terms of story quality, Sutcliff's Tristan and Iseult was in fact quite good. And it gave me a bit more appreciation for this: https://arthurkingoftimeandspace.com/1020.htm .

                      • justin66 5 years ago

                        I think the "medieval" terminology is a little dated anyhow. I guess Harold Bloom's categorizations and listings and so on are a lot more authoritative now (they sure pop on a google search) and it doesn't look like he uses the term. I have no real opinion on how much any of that matters.

                        Memory is unreliable, but I recall my high school class using a pretty good textbook that included Beowulf in both Old English and modern translation, but also the chapter of The Hobbit where Bard shoots the dragon, which stylistically invited some interesting comparisons. It was a pretty good lesson for a high school kid who was also a fan of Tolkien, back before that was something you could be without reading any books.

            • krapp 5 years ago

              Plenty of people studied Chaucer in English class as well, and yet almost no one in mainstream culture cares about The Canterbury Tales.

              And yes, more people more or less slept through English class than not.

          • inflatableDodo 5 years ago

            >Beowulf just isn't that significant or relevant in popular culture - it just isn't. I don't even know why this is controversial.

            I dunno, all those superhero films are doing pretty well.

      • darkpuma 5 years ago

        I don't understand what point you're trying to make. Are you trying to criticize the general public for not caring about Beowulf in "the right way" or are you trying to criticize the media for not caring about Beowulf in "the right way"? Or do you think this story should not have been reported at all?

        • krapp 5 years ago

          > Are you trying to criticize the general public for not caring about Beowulf in "the right way" or are you trying to criticize the media for not caring about Beowulf in "the right way"? Or do you think this story should not have been reported at all?

          I'm criticizing the premise that Beowulf is as well known as Tolkien's works in popular culture, or even that well known at all outside of niche literary circles, as counter to the claim that Tolkien's attachment to the story has no relevance to the degree of its coverage, which, itself, is limited to begin with.

          • darkpuma 5 years ago

            Okay, well the way you are phrasing it seems to suggest you're annoyed that the story was published and/or posted here.

            • krapp 5 years ago

              No, not at all. This is exactly the kind of diverse content we need more of.

      • ctdonath 5 years ago

        Several other movie renditions of Beowulf were made before the LotR films.

        • krapp 5 years ago

          How much money did they make?

  • openredbull 5 years ago

    I disagree. Beowulf is a staple of English literature, and a well-known poem for many.

  • libraryatnight 5 years ago

    Beowulf is required reading for many, many students. When I worked in bookstores, students from all levels of schooling came in every new semester to buy either the Heaney or the Raffel translation, so I think it's probably interesting to more people than you think.

  • SiVal 5 years ago

    I think this claim is true in its explicit sense. The Tolkien connection makes an interesting story even more interesting, which probably increases media attention by some non-zero value.

    But it would be an interesting story for many of us without the Tolkien connection. Beowulf is an important artifact in the history of the language many of us are deeply attached to. And better than a potsherd, this artifact literally speaks to us from the distant past (literal if you consider writing of this sort to be a form of speech, as I do.) If the claim is implying that most of the coverage is due to the Tolkien angle, and it would have little to no coverage without it, I believe that to be incorrect. But I don't know if that is what was meant, and the explicit interpretation of the claim is probably correct.

  • walshemj 5 years ago

    Tolkien was well known as one of the leading scholars in Norse literature before he wrote LOTR.

    • iguy 5 years ago

      And IIRC was largely responsible for getting people to read Beowulf in particular as literature. I mean to appreciate the work of art, as opposed to dissecting it as evidence about language etc.

      • thaumasiotes 5 years ago

        It was my impression (not based on a lot) that Beowulf was sort of forced into the status of "great literature" by the fact that it is the only major work of Anglic literature at all, and English elites wanted something from their own native tradition (which, again, didn't really exist) to compete with the classical epics.

    • labster 5 years ago

      Beowulf is English literature though.

  • sneakernets 5 years ago

    Ah, never miss an opportunity to turn a positive and engaging story into a cynical jab against society.

    • rgrieselhuber 5 years ago

      It’s always low karma, anonymous accounts. There seem to be a bunch of them recently.

      • ineedasername 5 years ago

        I've noticed the same, unfortunately. Lots more posts on anything that isn't coding or directly tech related that just say something like, "this shouldn't even be on HN".

        At least they tend to get downvoted relatively fast.

        • rgrieselhuber 5 years ago

          It seems like a lot of bots or sleeper accounts have been activated recently.