Anatomy of a Large-Scale Social Search Engine

Back in October, we wrote a research paper entitled “Anatomy of a Large-Scale Social Search Engine” and submitted it to WWW 2010. We found out last week that it has been accepted, so we wanted to share a preview with you today!

Our paper was inspired by the classic Google paper, “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, in which Sergey Brin and Larry Page first described the algorithms and architecture of Google. That paper was published 12 years ago at the same WWW conference.

Our goal with this paper is to follow their example by providing a thorough presentation of the approach, architecture, algorithms, interfaces, and issues involved in Aardvark’s new social search paradigm.

The paper describes the fundamental differences between the traditional “Library” paradigm of web search — in which answers are found in existing online content — and the new “Village” paradigm of social search — in which answers arise in conversation with the people in your network. We explain that in social search:

  • Users can ask questions in natural language, not keywords
  • Content is generated “on-demand”, tapping the huge amount of information in people’s heads
  • The system is fueled by the goodwill of its users

We demonstrate that there is a large class of subjective questions — especially longer, contextualized requests for recommendations or advice — which are better served by social search than by web search. And our key finding is that whereas in the Library paradigm, users trust information depending upon the authority of its author, in the Village paradigm, trust comes from our sense of intimacy and connection with the person we are getting an answer from.

We also provide a detailed analysis of user behavior, and include dozens of interesting statistics. For example, of the 90,361 users we had in October 2009…

  • 87.7% of questions sent to Aardvark got answered (very high answer rate!)
  • 75.0% of users who asked Aardvark a question also answered a question for someone else (very high participation rate!)
  • 70.4% of answer feedback had a rating of ‘good’ as opposed to ‘ok’ or ‘bad’ (high quality!)

Writing a paper like this requires being more open, and sharing more information, than most small internet startups might be comfortable with. But we recognize that we have benefitted from the open culture of the scientific community, and would like to do our part. Further, we think that the opportunity presented by social search is truly significant, and we’d like to engage with the rest of the research community on the many challenges it presents. There are very interesting problems to explore around question classification, analysis of social relationships, person-to-person matching, maintaining a question/answer economy, and many other areas.

I wrote the paper with my good friend, Sep Kamvar, who started Kaltix, a search company acquired by Google in 2003. He led personalized search at Google for several years, and is now a professor at Stanford — and an advisor to Aardvark. But this paper would not have been possible without the hard work and support of the whole Aardvark team over the past few years. And, of course, Aardvark itself would not be possible without the continued enthusiastic contributions of all of you, our users!

We’re very excited about presenting this at the WWW conference, which has been providing a great forum for web research for 19 years, and we hope to see you there in April. So take a read, and let us know what you think…

(Note: the preview version we’re sharing here has some changes inspired by the great reviewer comments we received; we may make further changes for the camera-ready version that will be presented at the conference.)

70 Comments

  1. Posted February 2, 2010 at 11:36 am | Permalink

    This is revolutionary. You have defined a whole new Internet.

  2. Posted February 2, 2010 at 12:20 pm | Permalink

    We don’t only have to define a whole new internet; we have to build it too.

  3. Gercek
    Posted February 2, 2010 at 1:28 pm | Permalink

    Thanks for sharing this information with the world. Keep up the good work…

  4. Posted February 2, 2010 at 2:21 pm | Permalink

    Can’t wait to read this. It’s going straight to my Nook :)

  5. Devlin Dunsmore
    Posted February 2, 2010 at 4:28 pm | Permalink

    This paper looks fantastic! If 10-12 years down the road social search is as ubiquitous as keyword search I wouldn’t be surprised to see this research as a major referenced work.

  6. Posted February 3, 2010 at 1:32 am | Permalink

    Congratulations Vark! You’re doing great work.
    It’s important to note that having great technology is not enough. Your service’s user experience and integration with IM contributed to its adoption by many users.
    Looking forward to hearing great news from you!

  7. Posted February 3, 2010 at 3:53 am | Permalink

    remarkable!

  8. Posted February 3, 2010 at 6:56 am | Permalink

    I will read it with great care—and I apologize for describing your service as “secretive” to a fellow varker, asking for details.

  9. Posted February 3, 2010 at 3:45 pm | Permalink

    This is superb! Wonderful to see social platforms—particularly one as revolutionary and impactful as Aardvark—producing academic, data-driven papers. Much respect.

  10. Posted February 3, 2010 at 5:50 pm | Permalink

    I am a big fan of Aardvark and I think it is a very good service that serves a purpose and can help people in general. Best of luck to you guys…

  11. Posted February 4, 2010 at 11:40 am | Permalink

    I feel like something important’s missing from your paper: how do you precisely define one’s “extended social network”? How do you even compute it (sounds like lots of recursion and that might hurt scalability)?

  12. Alison, Purchaser of Exotic Animals
    Posted February 5, 2010 at 3:18 pm | Permalink

    Thanks for everyone’s awesome comments!

    Julien, we consider your social network to be friends, friends-of-friends and people with common group affiliations (such as alumni networks), as defined by your connections on Vark.com. Let us know if you have any other questions!

  13. Posted February 10, 2010 at 2:11 am | Permalink

    WOW! This simply sounds too good to be true. But I am a true believer in the wisdom of the many. Together with a friend I have recently released Twick.it - The explain engine. Like Aardvark, we ask users to generate short explanations. The difference is that we do not answer individual questions but explain topics. So our concept is in a sense still relying on the “library” paradigm. But with our Tool Tip, the explanations can be delivered to readers on demand. Check it out.

    I will follow Aardvark very closely and participate if I have any time left on my hands. Let’s make the web a better place.

  14. Posted February 10, 2010 at 7:46 am | Permalink

    We think this new site Aardvark is stupendous, to say the least. In fact we have added a link to Aardvark on our website, ferret bumper stickers. Aardvark (the social network) is the greatest thing since chopped liver, or is that sliced bread!

  15. Apostol Apostolov
    Posted February 11, 2010 at 11:13 am | Permalink

    Congratulations on the Google sale! I really hope it gets tightly integrated into Google Profiles and Google Buzz.

  16. Posted February 15, 2010 at 1:18 am | Permalink

    Really cool to want to try new things, and create a new rigorous paradigm for search.
    At last, something fresh and not from Google!:)
    There exists a network of human brains, and it has remained under-exploited.
    With this you can tap into the knowledge people have in their brains.

    I like simplicity. So calculating the probability that a user can answer a question from two separate components, i.e. familiarity/social proximity and expertise, is, I think, an elegant way to do it.

    My main criticisms:
    - You break your probabilities down into too many different components, which means you have not yet found a simple enough encompassing principle.
    - The social graph is, I think, a false idea, since, for example, most of the people I am connected to on SNS are complete strangers to me.
    - You rely too much on probability, even when trying to measure how expert a user is on a topic (though this is unavoidable, since we don’t have the means to rigorously download each human brain).

    The Web does not have these last two issues, since the network of linked pages is well defined, and determining the relevance of a page to a particular query is straightforward.

    But, still, it’s rigorous and robust, and it sees the brain as a computational device. I like that!
    Keep up the good work! :)

    -julian

  17. Posted February 15, 2010 at 5:57 am | Permalink

    linguistics is key here. nice paper

  18. kapil madhwani
    Posted February 21, 2010 at 1:04 pm | Permalink

    “village search ” a new generation search …really machines do need humans…..

  19. Posted February 23, 2010 at 5:58 pm | Permalink

    Loving the post, read the paper, reminding me how much I enjoy working with researchers!! Any Stanford EESOR alum reading this post, get in touch with me!

  20. shimelis
    Posted April 17, 2010 at 12:22 am | Permalink

    This is a very good concept because search has to take the form of a dialogue marked by a series of feedback in both directions. I have issues with the ‘library’ and ‘village’ paradigms, though. Assuming the ‘village’ is a pool of interconnected expertise in a particular area, there is a limit to what its members can answer unless the questions are very factual and straightforward in nature. When the questions involve critical analysis or serious research, the experts still have to resort to the ‘library’ paradigm, in which case the issue would be to create a system that understands content. I argue that the bulk of knowledge, and thus of answers, is still in documents, whether online or on the shelves.

    Cheers.

  21. Posted April 19, 2010 at 9:19 pm | Permalink

    Oh,you said right,i like you!

  22. Posted June 7, 2010 at 3:28 pm | Permalink

    Great post. It will really help a lot of people.

  23. Posted June 8, 2010 at 7:47 pm | Permalink

    thanks admin….

  26. Posted June 24, 2010 at 8:25 pm | Permalink

    Thanks Arena! It was a great panel, glad we got to talk.

  27. Posted July 7, 2010 at 4:01 am | Permalink

    WOW! GREAT POST! THANK YOU!!!

  28. Posted July 7, 2010 at 7:00 pm | Permalink

    why cant i find her on webcam

  29. Posted July 8, 2010 at 10:35 am | Permalink

    Well, the view of the passage is totally correct, your details are really reasonable, and you guys have given us a valuable, informative post. I totally agree with the standpoint of the commenter above.

  30. Posted July 13, 2010 at 12:30 am | Permalink

    thank you for sharing this with us !!!!

  31. Posted July 19, 2010 at 11:36 pm | Permalink

    Good, smart post. I’ve done this as well and I have to say it’s great in terms of creating brand and, ultimately, awareness. I speak about pumashoescom, which is the investor’s equivalent of lifestreaming — using social media tools to add to and plug into the collective pumashoescom. I recently launched a book under the same brand. My website, http://www.pumashoescom.com, will also ultimately serve as the home base for other content products using the same content category.

  32. Posted July 19, 2010 at 11:36 pm | Permalink

    good, perfect

  35. Posted July 24, 2010 at 4:55 am | Permalink

    :)

  36. Posted July 24, 2010 at 4:56 am | Permalink

    Hi

  38. Posted July 24, 2010 at 11:22 am | Permalink

    One of the best articles I have found on using social media. Thanks! :-)

  39. Posted July 24, 2010 at 11:47 am | Permalink

    Incredible snippet of your research paper! I’d be interested in reading all of it…

  40. Posted July 24, 2010 at 2:04 pm | Permalink

    Thank you very much for this information! This is really a revolutionary method of social interaction. I’m going to start using it right now!

  41. Posted July 24, 2010 at 2:48 pm | Permalink

    Thank you very much for sharing. I’m with Jason, I would also be interested in reading all of it…

  42. Posted July 24, 2010 at 4:54 pm | Permalink

    Really great paper, although I fear it might be a little too advanced for me… thanks for the share anyway! I hope you had a great time at the conference.

  43. Posted July 24, 2010 at 5:54 pm | Permalink

    Surely there is a way that the research you’ve done and the Twick.it service could be incorporated into social media? Particularly Twitter, as it is directly answerable rather than following the library form.

    A database could then be created from comments and cataloged to provide a real scope of answers that people could use to form their own judgement…

  44. Posted July 26, 2010 at 6:50 pm | Permalink

    What a very interesting concept! I agree it really will change the way we connect.

  45. Posted July 27, 2010 at 12:13 am | Permalink

    This will surely revolutionize the way we search for information. If this concept would find its way to the right intellect, the open culture will surely benefit from it aside from being innovative. This paper will gather more and more attention when a full consideration on the subject will be endorsed. Search Engine in a Social Media platform is great idea if you think about it.

  46. Posted July 27, 2010 at 6:55 am | Permalink

    I think the stat that stands out most is the “75.0% of users who asked Aardvark a question also answered a question”. I think that people being willing to help others out after having been helped will greatly influence what others do on the site.

  47. Posted July 27, 2010 at 10:24 am | Permalink

    I think people’s shift to social media has proven your paper correct. I think the internet is becoming a growing, thinking life of its own, and it’s naturally tending towards the “village” theme you talk of.

  48. Posted July 27, 2010 at 9:33 pm | Permalink

    Hi Damon

    I’m really intrigued and surprised with these stats:

    - 87.7% of questions sent to Aardvark got answered (very high answer rate!)
    - 75.0% of users who asked Aardvark a question also answered a question for someone else (very high participation rate!)
    - 70.4% of answer feedback had a rating of ‘good’ as opposed to ‘ok’ or ‘bad’ (high quality!)

    Any further news or update?

  49. Posted July 28, 2010 at 3:09 am | Permalink

    This concept is a very cool idea for how to find answers to searches.

    Would you have or see any value in having people from social networks that are niche oriented for say a party social network like ElboRoom.com to get the average person to submit things?

    You never know who might be drunk at the bar that used to be smart and might have some ideas to share from the bar in Fort Lauderdale where most college professors used to drink in the spring break days.

  50. Posted July 28, 2010 at 7:34 am | Permalink

    This concept is a very good idea for how to find answers to searches.

42 Trackbacks

  1. [...] This promises, in the coming days, to shake up the way social networks are viewed. At the very least, the courage to imitate the title of the paper “Anatomy of a Large-Scale Hypertextual Web Search Engine” by Google’s founders is there. [...]

  2. [...] Read more on Aardvark blog Share and Enjoy: [...]

  3. By SearchCap: The Day In Search, February 2, 2010 on February 2, 2010 at 3:06 pm

    [...] Anatomy of a Large-Scale Social Search Engine, blog.vark.com [...]

  4. By Fresh From Twitter | mobile geo social on February 2, 2010 at 5:07 pm

    [...] Aardvark paper on social search http://j.mp/c9nFTh @vark methinks social graph + ‘PeopleRank’ also helps the Library paradigm. I wonder if this [...]

  5. By Fresh From Twitter | mobile geo social on February 2, 2010 at 7:20 pm

    [...] but native apps still best for Mobile Gaming /via @rww. Interesting Aardvark paper on social search http://j.mp/c9nFTh @vark methinks social graph + ‘PeopleRank’ also helps the Library paradigm. I wonder if this [...]

  6. By Links for February 2nd, 2010 on February 2, 2010 at 10:37 pm

    [...] Anatomy of a Large-Scale Social Search Engine [...]

  7. [...] of the Aardvark project (a question-and-answer service with a search engine) published a research paper, "Anatomy of a Large [...]

  8. [...] recently published a comprehensive research paper entitled “Anatomy of a Large-Scale Social Search Engine,” whereby it delves into the [...]

  9. [...] strategy for social search has been getting a good deal of attention in tech circles. The paper, “Anatomy of a Large Scale Social Search Engine,” was written by Damon Horowitz and Sepandar Kamvar of Aardvark, one of several companies working [...]

  10. [...] this week, the team at Aardvark unveiled a new paper “The Anatomy of a Large-Scale Social Search Engine” which will be presented in April at WWW 2010. Inspired by and patterned after “The [...]

  11. [...] From a Summary Blog Post: The paper describes the fundamental differences between the traditional “Library” paradigm of web search — in which answers are found in existing online content — and the new “Village” paradigm of social search — in which answers arise in conversation with the people in your network. We explain that in social search: [...]

  12. [...] with questions to people with answers. The company has detailed their proposal in a paper titled, Anatomy of a Large Scale Social Search Engine. Aardvark is a network that harnesses the knowledge of its users within the community to create a [...]

  13. [...] number of rhetorical questions. I want to share three of these with you, so you can think about the Aardvark paper and your own experience with question answering [...]

  14. By Lengthy blueprint for reinventing higher education on February 9, 2010 at 12:24 am

    [...] The real world gives professors collaboration opportunities in their department and with whom they meet, but just think of the potential serendipities a people-indexer like Aardvark could produce. [...]

  15. [...] today, I ran across a data point in Aardvark’s new social search report that I find way more interesting than Google’s theoretical downfall. It’s not whether [...]

  16. [...] favorite web services of 2010, but the company still remains relatively small. Aardvark had around 100,000 users in October 2009. Aardvark co-founder Max Ventilla just confirmed to us that the company has indeed [...]

  17. [...] recently published a paper, “Anatomy of a Large-Scale Social Search Engine”, which took its inspiration from a similarly named paper published by the founders of [...]

  18. By The Rise of GoogVark | Sanjay Kairam on February 11, 2010 at 4:33 pm

    [...] Google has purchased Aardvark for $50 million. My last blog post was about Aardvark’s recent paper describing their social search engine, which included allusions to the research paper which was [...]

  19. [...] a blog post last week, Aardvark’s CTO Damon Horowitz acknowledged the company’s debt to Google. The [...]

  20. By Aardvark joins Google! on February 12, 2010 at 12:04 pm

    [...] Join now! « Anatomy of a Large-Scale Social Search Engine [...]

  21. [...] On a related note, Aardvark also had a research paper accepted at WWW2010 which might be of interest as well, especially if you haven’t heard of Aardvark: http://blog.vark.com/?p=352 [...]

  22. By Anatomy of a Large Scale Socia… | Jiva Technology on February 12, 2010 at 3:31 pm

    [...] Anatomy of a Large Scale Social Search Engine, paper for WWW2010 by Damon of Aardvark http://blog.vark.com/?p=352 [...]

  23. [...] by the aforementioned Damon Horowitz and Google’s former head of personalization, Sep Kamvar (Damon’s post and the paper itself). Their paper outlines, for all intents and purposes, exactly how [...]

  25. By » Google Poaches Social Search Service Aardvark on February 13, 2010 at 10:49 am

    [...] if you are interested in how Vark.com actually works, check out this remarkably detailed paper the company published just last month (ironically with a title that riffs on the famous paper from [...]

  26. By Puzzlepieces – Aardvark (February 13, 2010) on February 13, 2010 at 4:53 pm

    [...] they published a paper Anatomy of a Large-Scale Social Search Engine (the name is a reference to a famous Google paper) which I found quite interesting. I was expecting [...]

  27. [...] and the really curious are invited to read the paper he published on the subject: Anatomy of a Large-Scale Social Search Engine. [...]

  28. By Fresh From Twitter | mobile geo social on February 15, 2010 at 8:01 am

    [...] but native apps still best for Mobile Gaming /via @rwwinteresting Aardvark paper on social search http://j.mp/c9nFTh @vark methinks social graph + ‘PeopleRank’ also helps the Library paradigm Powered by Fresh [...]

  29. By Aardvark: Google’s new baby. | Technoheads on February 16, 2010 at 6:40 pm

    [...] that registration was now open to the public. Hmm. So, I gave it a shot. Aardvark is a self-defined social search engine, helping you find people rather than web pages. The site is really, really simple. There are two [...]

  30. [...] strategy for social search has been getting a good deal of attention in tech circles. The paper, “Anatomy of a Large Scale Social Search Engine,” was written by Damon Horowitz and Sepandar Kamvar of Aardvark, one of several companies [...]

  31. [...] Anatomy of a Large-Scale Social Search Engine – (tags: aardvark statistics search social research socialmedia paper google internet ) [...]

  32. [...] out their recent paper on the inner workings of the Aardvark [...]

  33. [...] has defined a new kind of social search: sometimes you want a person, not a web page, to answer your question. We’re extremely excited [...]

  34. [...] [Social] Aardvark releases its social search paper submitted to WWW2010 http://blog.vark.com/?p=352 Comments RSS [...]

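  35. [...] According to an academic paper that Aardvark made public this February (yes, it really is an academic paper, already accepted by WWW2010 and soon to be presented; it is worth mentioning that in 1998 Google founder Larry Page presented, at the same conference, the research on what later became Google’s core technology and PageRank), there are several key points worth noting: [...]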

  36. [...] Aardvark “Anatomy of a Large Scale Social Search Engine“ [...]

  37. By Aaron Johnson – Links: 6-1-2010 on June 2, 2010 at 1:00 am

    [...] Anatomy of a Large-Scale Social Search Engine Aardvark stats: * 87.7% of questions sent to Aardvark got answered (very high answer rate!) * 75.0% of users who asked Aardvark a question also answered a question for someone else (very high participation rate!) * 70.4% of answer feedback had a rating of ‘good’ as opposed to ‘ok’ or ‘bad’ (high quality!) (categories: social-search search socialsoftware aardvark ) [...]

  38. [...] As friends replace inbound links as votes for page authority, a new “Village” paradigm of social search emerges—one in which answers arise in conversation between people in your network. “Trust comes from our sense of intimacy and connection with whom we are getting an answer from,” wrote Damon Horowitz in “Anatomy of a Large-Scale Social Engine.” [...]

  40. [...] networks influence search and information. This paper follows on the idea of PageRanks but moves it deeper into the interconnections we have created since 1998; We demonstrate that there is a large class of subjective questions — especially longer, [...]

  41. By READING 8.26 | MTEC1101/IMT1101 – Fall 2010 on August 25, 2010 at 5:40 am

    [...] Read the short article at http://blog.vark.com/?p=352 [...]

  42. By READING 8.26 | MTEC1101/IMT1101 – Fall 2010 on August 25, 2010 at 7:19 pm

    [...] Read the short article at Anatomy of Large Scale Social Search Engine [...]
