The report in SF Gate speaks for itself, especially with regard to modified bank statements.
…according to the affidavit, Olguin and Soberal sent an investor an “altered” bank statement that showed a Bitwise account on March 31, 2022, with over $20 million in cash in it. First Republic Bank provided the government with the actual statement, which showed that the company had just $325,000, the affidavit said. Olguin and Soberal “explained that they made the alterations because they believed … no one would invest in the Series B-2 if people knew the company’s actual condition,” per the affidavit.
They believed nobody would invest if the "company's actual condition" were known, so they lied in the most unintelligent way possible to attract investors.
Do AI chatbots have the ability to comprehend lengthy texts and provide accurate answers to questions about the content? Not quite. Anthropic recently disclosed internal research data explaining the reasons behind the shortcomings (though it presents the data as a significant improvement over its previous failures).
Before I get to the news, let me first share a tale about the nuances of American "intelligence" engineering endeavors by delving into the realm of an English class. I distinctly recall the simplicity with which American schools, along with standardized tests purporting to gauge "aptitude," assessed performance through rudimentary "comprehension" questions based on extensive texts. This inclination toward quick answers is evident in the popularity of resources like the renowned CliffsNotes, serving as a convenient "study aid" for any literary work encountered in school, including this succinct summary of "To Kill a Mockingbird" by Harper Lee:
… significant in understanding the epigraph is Atticus’ answer to Jem’s question of how a jury could convict Tom Robinson when he’s obviously innocent: “‘They’ve done it before and they did it tonight and they’ll do it again and when they do it — it seems that only children weep.'”
To illuminate this point further, allow me to recount a brief narrative from my advanced English class in high school. Our teacher mandated that each student craft three questions for every chapter of “Oliver Twist” by Charles Dickens. A student would be chosen daily to pose these questions to the rest of the class, with grades hinging on accurate responses.
While I often sidestepped this ritual by occupying a discreet corner, fate had its way one day, and I found myself tasked with presenting my three questions to the class.
The majority of students, meticulous in their comprehension endeavors, adopted formats reminiscent of the CliffsNotes example, prompting a degree of general analysis. For instance:
Why did Oliver’s friend Dick wish to send Oliver a note?
Correct answer: Dick wanted to convey affection, love, good wishes, and so on; you get the idea.
Or, to phrase it differently, unraveling the motives behind Dickens’ character Bill Sikes exclaiming, “I’ll cheat you yet!” demands a level of advanced reasoning.
For both peculiar and personal objectives, when the moment arrived for me to unveil my trio of questions, they veered into somewhat… distinct territory. As vividly as if it transpired yesterday, I posed to the class:
How many miles did Oliver walk “that day”?
The accurate response aligns more with the rudimentary function of a simple and straightforward search engine than with any genuine display of intelligence.
Correct answer: twenty miles. That’s it. No other answer accepted.
This memory is etched in my mind because the classroom erupted into a cacophony of disagreement and discord over the correct number. Ultimately, I had to deliver the disheartening news that none of them, not even the most brilliant minds among them, could recall the exact phrase or number from memory.
What did I establish on that distant day? The notion that the intelligence of individuals isn’t accurately gauged by the ability to recall trivial details, and, more succinctly, that ranking systems may hide the fact that dumb questions yield dumb answers.
Now, shift your gaze to AI companies endeavoring to demonstrate their software’s prowess in extracting meaningful insights from extensive texts. Their initial attempts, naturally, involve the most elementary format: identifying a sentence containing a specific fact or value.
Anthropic (the company perhaps best known for disgruntled staff departing a Google competitor to accept Google investments and compete against their former employer) has published a fascinatingly promotional blog post that gives us insights into major faults in its own product.
Claude 2.1 shows a 30% reduction in incorrect answers compared with Claude 2.0, and a 3-4x lower rate of mistakenly stating that a document supports a claim when it does not.
Notably, the blog post emphasizes the software “requires some careful prompting” to accurately target and retrieve a buried asset.
The embedded sentence was: “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.” Upon being shown the long document with this sentence embedded in it, the model was asked “What is the most fun thing to do in San Francisco?”
In this evaluation, Claude 2.1 returned some negative results by answering with a variant of “Unfortunately the essay does not provide a definitive answer about the most fun thing to do in San Francisco.”
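To make the setup concrete, here is a minimal sketch of such a "needle in a haystack" recall harness, assuming Python. The needle sentence and question are quoted from Anthropic's post; the filler text, function names, depth parameter, and grading rule are my own illustrative assumptions, not Anthropic's actual evaluation code.

```python
# Hypothetical "needle in a haystack" recall harness, for illustration only.
# The NEEDLE and QUESTION are quoted from Anthropic's post; everything
# else here is an assumption about how such a test could be built.

NEEDLE = ("The best thing to do in San Francisco is eat a sandwich "
          "and sit in Dolores Park on a sunny day.")
QUESTION = "What is the most fun thing to do in San Francisco?"

def build_haystack(filler_paragraphs, depth):
    """Embed the needle at a relative depth (0.0 = start, 1.0 = end)."""
    i = int(len(filler_paragraphs) * depth)
    return "\n\n".join(filler_paragraphs[:i] + [NEEDLE] + filler_paragraphs[i:])

def recalled(answer):
    """Naive pass/fail grading: did the answer surface the buried fact?"""
    return "Dolores Park" in answer

# Example: bury the needle halfway through 1,000 filler paragraphs.
filler = [f"Filler paragraph {n} about nothing in particular." for n in range(1000)]
haystack = build_haystack(filler, depth=0.5)
```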
To be fair about careful prompting, the "best thing to do" was in the sentence being targeted; however, their query clearly asked for "the most fun" instead.
This query had an obvious problem: the best things often can be very, very NOT FUN. As a result, and arguably not a bad one, the AI software balked at being forced into a collision and…
would often report that the document did not give enough context to answer the question, instead of retrieving the embedded sentence
I see a human trying to hammer meaning into places where it doesn’t exist, incorrectly prompting an exacting machine to give inexact answers, which also means I see sloppy work.
In other words, “best” and “most fun” are literally NOT the same things. Amputation may be the best thing. Fun? Not so much.
Was the sloppy prompt intentional or a mistake? Hard to tell, because… Anthropic clearly wants to believe it is improving, and the blog reads like they are hunting for proof at any cost.
Indeed. The test results are said by Anthropic to improve dramatically when they go about lowering the bar for success.
We achieved significantly better results on the same evaluation by adding the sentence “Here is the most relevant sentence in the context:” to the start of Claude’s response. This was enough to raise Claude 2.1’s score from 27% to 98% on the original evaluation.
Source: Anthropic
Not the best idea, even though I’m sure it was fun.
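Mechanically, what Anthropic describes is response prefilling: seeding the start of the assistant turn so the model continues from it. A minimal sketch of that trick, assuming the anthropic Python SDK's Messages API; the model name, parameters, and input file are illustrative assumptions, not Anthropic's published test code:

```python
# Sketch of the response-prefill trick Anthropic describes: seed the
# assistant turn so the model begins by quoting a sentence. The file
# name and parameters are illustrative; the haystack could be built
# as in the harness sketch above.
import anthropic

haystack = open("long_essay_with_needle.txt").read()
question = "What is the most fun thing to do in San Francisco?"

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-2.1",
    max_tokens=300,
    messages=[
        {"role": "user", "content": f"{haystack}\n\n{question}"},
        # Prefilled assistant turn: the model continues from this exact
        # string, the one Anthropic says raised its score from 27% to 98%.
        {"role": "assistant",
         "content": "Here is the most relevant sentence in the context:"},
    ],
)
print(response.content[0].text)
```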
Adding “relevance” in this setup definitely seems like stretching the goal posts. Imagine Anthropic selling a football robot. They have just explained to us that by allowing “relevant” kicks at the goal to be treated the same as scoring a goal, their robot suddenly goes from zero points to winning every game.
"Here is the most relevant kick in the context:"
Sure, that may be considered improvement by shady characters like Bill Sikes, but it completely obscures that the goal posts were changed in order to accommodate low scores (regrading them as high).
I find myself reluctant to embrace the notion that the gamified test result of someone desperate to show improvement holds genuine superiority over the basic recognition ingrained in a search engine, let alone to consider such gamification compelling evidence of intelligence. Google should know better.
Wingtip 30,000 feet over the English Channel. Source: It’s a real photo, really. Taken by me.
The Library of Congress (LOC) gives a full-context presentation of John Gillespie Magee's famous "High Flight" poem, written in 1941 from the cockpit of his Spitfire as he trained to defeat the Nazis.
Oh! I have slipped the surly bonds of Earth
And danced the skies on laughter-silvered wings;
Sunward I’ve climbed, and joined the tumbling mirth
Of sun-split clouds,—and done a hundred things
You have not dreamed of—wheeled and soared and swung
High in the sunlit silence. Hov’ring there,
I’ve chased the shouting wind along, and flung
My eager craft through footless halls of air. . . .
Up, up the long, delirious, burning blue
I’ve topped the wind-swept heights with easy grace
Where never lark nor ever eagle flew—
And, while with silent lifting mind I’ve trod
The high untrespassed sanctity of space,
Put out my hand, and touched the face of God.
LOC offers us this concluding analysis, a nod to cognitive warriors of non-physical battles.
By writing “High Flight,” John Gillespie Magee, Jr., achieved a place in American consciousness arguably greater than any he could have achieved through heroism in battle.
*cough*
Non-physical, lyrical combat is in fact… a form of battle more relevant today than ever, given the acceleration of attacks using AI.
For decades a dilemma of privacy versus safety has nagged commercial malls, as compared with public spaces.
More specifically, law enforcement trying to provide safety faced a serious data ownership boundary issue when many large open spaces for assembly were privatized and controlled for profit by very small groups (e.g. corporations).
Enter detailed map and geolocation software vendors.
While many, or perhaps nearly all, people think about databases of spaces in terms of shoppers and commuters, behind the scenes are special operators training in high-stakes, rapid, targeted insertions for hostage rescues and threat elimination.
A very long time ago we would be talking about maps of rebel compounds traced by hand in charcoal onto a headscarf, imaged, and transmitted by radio to rescue teams (de oppresso liber)… and "here" we are today simply talking about APIs and a finger touching a screen.
A good example of the latest achievement — very open steps pushing public knowledge through private space boundaries — is being showcased by German engineers at HERE working with Japanese corporations.
“Yahoo! JAPAN Maps’ easy interface guides users through complex indoor venues such as mega-shopping mall LaLaport Tokyo Bay (shown on left) and the commercial hub around Shibuya — Japan’s world-famous fashion epicenter (shown on right).” Source: HERE
With HERE, opening Yahoo! JAPAN Maps on your smartphone will reveal a seamless navigation experience. Each shopping mall floor is clear and easy to read. For example, all stores are shaded in pink, restaurants are colored orange and additional icons for escalators, elevators, ATMs and toilets are highlighted accordingly. As you are guided through the space, you can quickly switch floors with a simple tap of your screen.
Tokyo’s shopping centers are just as fast-paced inside as the roads that surround them — powered by HERE Indoor Map, Yahoo! JAPAN Maps’ floor plans are updated monthly so any renovations or new store launches are automatically captured and made visible.
Private spaces "automatically captured and made visible" sounds like constant surveillance positioned as a public good, or in other words some subtle direct marketing for law-and-order enforcement, if I've ever seen it.
Shopping, dining, or eliminating a dangerous threat. What’s your preferred tool and destination? “A woman walks past as South Korean soldiers participate in an anti-chemical and anti-terror exercise… Seoul, South Korea, August 22, 2023.” Source: Chung Sung-Jun / Getty
After all, who truly benefits from the mass privatization of open spaces, especially in terms of freedoms, such as freedom from harm?
The next logical step of this map innovation will be highly precise 3D fly-through data in VR for rescue training practice (like 1990s VRML all over again). It's a relatively small data storage and processing market, but there's nonetheless a lot of quiet public money fueling these seemingly large commercial efforts.