(Dan) Hello everyone. So this session is about teaching SPARQL. The presenter is Martin Poulter, so I leave you the stage. Have fun.
(Martin) Thank you very much. Hi, everybody. I trust you'll agree that Wikidata is great. It has lots of interesting data on different topics, and the tools people make with it are fun to use, fun to explore, and easy to use. Maybe you'll also agree with the suggestion that to get the best out of Wikidata you need to know SPARQL: you need to be able to phrase your own queries. You might see that as a barrier, an obstacle. On that view, we ideally need a big program of training for developers, for librarians, for curators, and for ordinary people to make them literate in this language, and that's a big effort, an aspect of Wikidata outreach.
My suggestion is to turn that around. Wikidata, and especially the Query Service, is the ideal platform for people to learn SPARQL. It's so helpful, so full of good stuff, so colorful, and it has so many visualization abilities. It's also a good place to learn about databases, about knowledge representation, and about data and computers. There's no reason someone's first encounter with data and computers has to be a relational database system.
So I'm going to report on a training workshop I've delivered to library staff at the University of Oxford. I've also run it as a public event for members of the public who came to an open data week the university hosted, and I've done some of this with researchers as well.
I teach in a way that is very particular to me, so I'm not handing materials over to you. I'll show you my approach, and then you'll take it up, improve on it, and make it personal to you and the audiences you're dealing with. And I want to avoid this.
In my career, I had to learn data technologies such as SQL and XML, and the content of the tutorials and examples was very much like this. I'm not objecting to the language, because that's what you've got to learn, but employees, invoices... Your task might be that you have a sales force and you've got to identify the person who sold the most items, calculate their bonus, and then issue the invoices to the customers. It's the most boring thing. I can't get excited about that, and I don't feel like I'm learning a topic.
With Wikidata, we have so many topics we can engage people in: things in the solar system, or characters in Shakespeare, or things in the solar system named after characters in Shakespeare, which is what most of this is.
When you have a teaching approach, one question is what you leave out. In the workshop I run, I don't explain what SPARQL stands for, because that doesn't help you write SPARQL at all. It doesn't help to explain what RDF is.
64 00:03:20,591 --> 00:03:22,763 Obviously, it's historically really important, 65 00:03:22,763 --> 00:03:25,713 but telling people there's a format for describing resources 66 00:03:25,713 --> 00:03:27,630 that's called resource description format, 67 00:03:27,630 --> 00:03:30,966 and resource is whatever's described, it's not really a format. 68 00:03:30,966 --> 00:03:32,226 That doesn't help people, 69 00:03:32,226 --> 00:03:36,650 that gets people no closer to actually, practically, using this. 70 00:03:36,650 --> 00:03:40,639 Linked open data, LOD, I may mention. 71 00:03:40,639 --> 00:03:44,317 So the library museum professionals that come to my training 72 00:03:44,317 --> 00:03:46,830 have definitely heard about linked open data, 73 00:03:46,830 --> 00:03:50,697 and know that it's the future of their discipline, 74 00:03:50,697 --> 00:03:52,564 and it's going to revolutionize their work. 75 00:03:52,564 --> 00:03:54,879 But at the moment, they're not using that kind of system. 76 00:03:54,879 --> 00:03:58,404 So they've not seen a real practical example of that technology. 77 00:03:58,404 --> 00:04:00,206 So that's what they're going to get from this. 78 00:04:00,206 --> 00:04:01,895 So I might mention linked open data, 79 00:04:01,895 --> 00:04:03,971 but I don't get into the definition. 80 00:04:03,971 --> 00:04:06,404 I basically say, this is a service you can use for free. 81 00:04:06,404 --> 00:04:08,113 It's been given to you to use for free, 82 00:04:08,113 --> 00:04:10,675 and that gets the point across. 83 00:04:10,675 --> 00:04:14,925 Semantic identifiers and namespaces, 84 00:04:14,925 --> 00:04:16,518 I want to get across implicitly, 85 00:04:16,518 --> 00:04:18,294 I don't want to teach people these concepts, 86 00:04:18,294 --> 00:04:21,271 I want them to pick up the concepts even if I don't use the terms. 
Reification is one example. People already using an RDF database want to know whether Wikidata has statement IDs, and I try to avoid that. I hardly even mention Wikidata. These workshops are advertised as *Introduction to SPARQL,* or, for the public event, *Asking and Answering Questions with Open Data.* In the blurb, I say we're going to be using this platform. When I introduce it, I say it's the best platform on which to learn this language, this skill: it's the most helpful, and it's got the most interesting stuff. Then, in the course of the workshop, maybe we'll get into more about Wikidata, such as why it exists and who put this data here. Professional RDF or linked data people have a whole lot of background that you don't need.
I just want to get people thinking about nodes and arcs, thinking in triples, and imagining how a triple representation can be created and queried. I want them to phrase questions in their own language and translate them into SPARQL, via a kind of baby-talk intermediary.
I want them to think in triples and get used to asking questions that way. I want them to reach the point where they ask interesting questions relevant to their work, their hobbies, or whatever else, and come away with something. Theoretical understanding isn't what I'm after in these quite short sessions.
The first thing I present them with is this, and they've got to look at it. There's a "what the hell?" reaction in the workshop, and probably in the room now: "I thought this was about technology skills! Why have we got to look at a cute dog?" But this is to introduce my toy world. There are three human beings. Two of them are a married couple, and one is the child of that couple. There are two beings that are pets of this couple, and we've got the types of the pets. Clearly, this is not official data. It is a knowledge representation, but it only exists in this slide. It's not a database. So I'm getting people thinking about a toy world, and there's loads that can be learnt just from discussing it and role-playing with it. And you're going to make your own toy world.
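The toy world on the slide is, underneath, nothing more than a list of triples. Below is a minimal Python sketch, not the slide itself. The property names (`has_pet`, `spouse`, `has_child`, `instance_of`) are hypothetical. So is the second pet, `Felix`, and its type, `cat`, because the talk only says there are two pets and that their types are shown.

```python
# A toy world as a list of (subject, property, object) triples.
# Hypothetical: all property names, and the second pet "Felix" of type "cat".
TOY_WORLD = [
    ("Cindy", "spouse", "Bob"),
    ("Bob", "spouse", "Cindy"),
    ("Cindy", "has_child", "Martin"),
    ("Bob", "has_child", "Martin"),
    ("Cindy", "has_pet", "Tilly"),
    ("Bob", "has_pet", "Tilly"),
    ("Cindy", "has_pet", "Felix"),
    ("Bob", "has_pet", "Felix"),
    ("Cindy", "instance_of", "human"),
    ("Bob", "instance_of", "human"),
    ("Martin", "instance_of", "human"),
    ("Tilly", "instance_of", "dog"),
    ("Felix", "instance_of", "cat"),
]


def nodes(triples):
    """Every subject and every object is a node in the graph."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def arcs_from(triples, node):
    """Outgoing arcs of a node: each arc points from subject to object."""
    return [(p, o) for s, p, o in triples if s == node]


print(sorted(nodes(TOY_WORLD)))
print(arcs_from(TOY_WORLD, "Tilly"))  # [('instance_of', 'dog')]
print(arcs_from(TOY_WORLD, "dog"))    # []: 'dog' only ever appears as an object
```

Anything outside these thirteen triples, such as other relatives or other pets, simply isn't there. Querying this list tells you about the representation, not the world.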
One point to take from this is that it isn't a representation of all of my family, or of all my parents' pets. It's a tiny fragment. When we query things, we're querying a representation of the world, not the world, and so much is left out. That's a really important first lesson about any database and any querying.
Everything's expressed in triples, nodes, and arcs, and arcs have a direction. How do the names work? One of these nodes is marked *Bob.* Does that node stand for the name Bob? Well, not quite, because other people use the name Bob. And Dan, you probably know a Bob.
(Dan) Like Bob [inaudible].
(Martin) Yeah, you know a Bob, and that's not this Bob. So we talk about that. Names are relative to the system they're in, and we can say that Martin's Bob and Dan's Bob are not the same person. So it's not about the names themselves, because they're relative to a system. We can even say *Martin:Bob* is the name for one thing, and *Dan:Bob* identifies another thing in another system.
And I emphasize triples: three things.
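The *Martin:Bob* / *Dan:Bob* idea is how prefixed names work in SPARQL: a prefix stands for a namespace, and the full identifier is the namespace plus the local name. In this minimal sketch, the `example.org` namespaces are hypothetical. `wd:` is the prefix the Wikidata Query Service uses for items.

```python
# Names only mean something relative to a system (a namespace).
# Hypothetical: the example.org IRIs. Real: the "wd" prefix used for Wikidata items.
PREFIXES = {
    "martin": "http://example.org/martin/",
    "dan": "http://example.org/dan/",
    "wd": "http://www.wikidata.org/entity/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'martin:Bob' into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no known prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


martin_bob = expand("martin:Bob")
dan_bob = expand("dan:Bob")
print(martin_bob)             # http://example.org/martin/Bob
print(dan_bob)                # http://example.org/dan/Bob
print(martin_bob == dan_bob)  # False: same local name, different things
print(expand("wd:Q146"))      # http://www.wikidata.org/entity/Q146
```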
You might be tempted to say, "Cindy and Bob, together, have a pet dog," but you can't do that in this system unless you have a node for the couple. Things have to have a direction, and at first that may not make much sense. Being a married couple doesn't have a direction, because it's a relation between two people. But we are modeling it with arcs that have a direction, so we need both directions.
There are arbitrary choices. Why have "Cindy has child Martin" and not "Martin has parent Cindy"? It's an arbitrary choice. Choices like that, of names and of direction, are built into this system and intrinsic to it. There are arbitrary choices to be made about how to represent this, and the same facts could be represented in different ways. Who makes that decision? Whoever creates the system, whoever sets up the knowledge base.
People can see that this graph can be serialized, that is, expressed as triple statements such as "Cindy has pet Tilly" and "Martin is a human." Getting to the core insight means comparing this with how we make a question in English.
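Three points from the passage above fit in a few lines of code. A directionless relation like marriage has to be stored as two directed triples. "Has child" versus "has parent" is a modelling choice. The whole graph can be serialized as one statement per line. This is a minimal sketch: the property names are hypothetical, and the output only loosely resembles N-Triples or Turtle (no IRIs, no quoting).

```python
def add_symmetric(triples, a, prop, b):
    """Arcs have a direction, so a relation without one (e.g. being
    married) is stored as two directed triples."""
    triples.append((a, prop, b))
    triples.append((b, prop, a))


def invert(triples, prop, inverse_prop):
    """Restate facts in the other direction: 'Cindy has_child Martin'
    becomes 'Martin has_parent Cindy'. Which one to store is an arbitrary
    choice made by whoever sets up the knowledge base."""
    return [(o, inverse_prop, s) for s, p, o in triples if p == prop]


def serialize(triples):
    """One 'subject property object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


world = []
add_symmetric(world, "Cindy", "spouse", "Bob")  # hypothetical property names
world.append(("Cindy", "has_child", "Martin"))
world.append(("Cindy", "has_pet", "Tilly"))
world.append(("Martin", "instance_of", "human"))

print(serialize(world))
print(invert(world, "has_child", "has_parent"))  # [('Martin', 'has_parent', 'Cindy')]
```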
Well, we take a statement and make it incomplete, as in "Who has pet Tilly?" We go from "Cindy has pet Tilly" to "Who has pet Tilly?" We've taken something out, put in a placeholder, and introduced a question mark. I say that's just what we do with SPARQL. We take something out, which leaves an incomplete statement or several incomplete statements. We put a placeholder in the missing place, and we use a question mark to show that it's a placeholder.
So it can be a role play where I'm the query service for this knowledge base. People learn what a query service does by seeing one, by role-playing, and by being a query service themselves, which we'll get to. People can see that working at the level of triples. If you ask me, "Who has pet Tilly?", I can answer, "Results: *Cindy, Bob.*"
Then I put it to the trainees: how do you ask more complicated questions? Say, "Who has a dog as a pet?" Some will get it straightaway. Some will say, "Oh, it's a triple: Who? has pet dog?" My role as the query service is to look at the graph and match your triple, "Who? has pet dog." I've got to find things that have pet dog, and the result is *None.* This is where the discussion starts: what is this node I've called *dog*? It's not a dog. Although it's called dog, it stands for a class. That's obvious when you're a SPARQL user, but this gets people over the threshold of thinking in this way.
Then you've got to ask, "What kinds of things have pets?" People see that they can't do that in one triple. They need multiple triples, and those multiple triples ask for multiple things. To answer "What kinds of things have pets?", you first identify the owners, and then you identify their types. The question naturally comes up: "How do I specify the columns I want? How do I specify that I want the types?" Then you say, "You take these partial statements, enclose them in curly brackets, and put *Select* in front."
This is the first half hour of the workshop, and it's not on computers. It's all role play and thinking. I invite people in the workshop to make their own toy world, and I hope you'll go and make toy worlds after this.
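The role play above, with a person acting as the query service, can be mimicked with a tiny pattern matcher. In this sketch, terms starting with `?` are placeholders, a list of patterns must all match with consistent bindings, and `select` projects the requested columns. It is a teaching toy with hypothetical data and property names, not how a real SPARQL engine is built.

```python
# Hypothetical toy data; property names are illustrative, not Wikidata's.
TRIPLES = [
    ("Cindy", "spouse", "Bob"),
    ("Bob", "spouse", "Cindy"),
    ("Cindy", "has_child", "Martin"),
    ("Cindy", "has_pet", "Tilly"),
    ("Bob", "has_pet", "Tilly"),
    ("Cindy", "instance_of", "human"),
    ("Bob", "instance_of", "human"),
    ("Martin", "instance_of", "human"),
    ("Tilly", "instance_of", "dog"),
]


def is_var(term):
    """A leading question mark marks a placeholder (variable)."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to something else
            new[p] = t
        elif p != t:
            return None  # a fixed term that doesn't match
    return new


def select(variables, patterns, triples=TRIPLES, distinct=False):
    """Roughly SELECT <variables> WHERE { <patterns> }: every pattern must
    match, and patterns sharing a variable must agree on its value."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    rows = [tuple(b[v] for v in variables) for b in bindings]
    if distinct:
        rows = list(dict.fromkeys(rows))  # like SELECT DISTINCT, keeps order
    return rows


# "Who has pet Tilly?"
print(select(["?who"], [("?who", "has_pet", "Tilly")]))  # [('Cindy',), ('Bob',)]
# "Who? has pet dog": no match, because 'dog' is a class, not anyone's pet.
print(select(["?who"], [("?who", "has_pet", "dog")]))  # []
# Two triples sharing ?pet: who has a pet that is an instance of dog?
print(select(["?who"], [("?who", "has_pet", "?pet"), ("?pet", "instance_of", "dog")]))  # [('Cindy',), ('Bob',)]
# "What kinds of things have pets?" Only the ?kind column is requested.
print(select(["?kind"], [("?owner", "has_pet", "?pet"), ("?owner", "instance_of", "?kind")], distinct=True))  # [('human',)]
```

In SPARQL, the third query would read roughly `SELECT ?who WHERE { ?who :has_pet ?pet . ?pet :instance_of :dog . }`, assuming a `:` prefix has been declared for the toy vocabulary.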
They get five minutes to use eight to ten nodes to represent their family, their workplace, the thing they're working on, or the TV they watched last night, with some meaningful links between them. The lesson is that you make arbitrary decisions. You name things and you create properties, and they're the creation of the person who sets up the knowledge system.
Then, in pairs, they explain their graphs to each other and query them: "What's a query you could ask about this little world, and what would the answer be?" Like I say, people mostly get it, but some want a four- or five-part relation. They might want to say, "This couple, together, have a pet," or "Tilly is a pet, is a dog." You have to enforce the rules: nodes, triples, and triples have a direction.
I explain what a triple is, and I also say, though not in this example, "Triples generally have an item and a property, and then a number of other things, which could be values, time periods, or locations on a globe."
With that role-play exercise, we're 40 minutes into a two-hour workshop in a computer room, and we haven't touched the computers yet.
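The remark that a triple's "other things" can be values, time periods, or locations on a globe can be made concrete with a few typed value classes. This sketch is loosely inspired by the quantity, time, and globe-coordinate datatypes Wikidata offers. It is not the Wikibase data model, and every example value is invented.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Quantity:
    amount: float
    unit: str


@dataclass(frozen=True)
class Time:
    value: str      # an ISO-style date string
    precision: str  # e.g. "year" or "day"


@dataclass(frozen=True)
class GlobeCoordinate:
    latitude: float
    longitude: float


# A value may be another node (here just its name) or one of the typed values.
Value = Union[str, Quantity, Time, GlobeCoordinate]


@dataclass(frozen=True)
class Statement:
    item: str
    prop: str
    value: Value


def describe(s: Statement) -> str:
    """Render a statement as 'item property value' text."""
    v = s.value
    if isinstance(v, Quantity):
        text = f"{v.amount:g} {v.unit}"
    elif isinstance(v, Time):
        text = f"{v.value} (precision: {v.precision})"
    elif isinstance(v, GlobeCoordinate):
        text = f"lat {v.latitude}, lon {v.longitude}"
    else:
        text = v
    return f"{s.item} {s.prop} {text}"


# Invented example values, for illustration only.
statements = [
    Statement("ExamplePainting", "instance_of", "painting"),
    Statement("ExamplePainting", "height", Quantity(90.0, "centimetre")),
    Statement("ExamplePainting", "inception", Time("1890", "year")),
    Statement("ExampleTown", "coordinate_location", GlobeCoordinate(51.5, -1.0)),
]
for s in statements:
    print(describe(s))
```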
249 00:13:14,270 --> 00:13:17,387 But I think it's useful to get people thinking in that way, 250 00:13:17,387 --> 00:13:19,535 and to think about how they would make the model 251 00:13:19,535 --> 00:13:23,793 and what the query is, and to actually translate, 252 00:13:23,793 --> 00:13:25,149 so your translation exercise. 253 00:13:26,339 --> 00:13:32,597 And then I'd direct people to *query.wikidata.org.* 254 00:13:34,197 --> 00:13:36,240 So there's a bunch of things they've got to take on. 255 00:13:36,240 --> 00:13:40,086 We've been doing-- I will have a flip chart, and we will-- 256 00:13:40,086 --> 00:13:41,539 Is that six? 257 00:13:41,539 --> 00:13:43,290 Six minutes elapsed? 258 00:13:43,290 --> 00:13:45,278 (man) [inaudible] 259 00:13:45,278 --> 00:13:46,318 Right. 260 00:13:50,548 --> 00:13:52,485 So I'll give them a task. 261 00:13:52,485 --> 00:13:55,679 I don't want them to learn Q numbers and P numbers. 262 00:13:55,679 --> 00:14:00,646 So I'll tell them what the names are and show them the *Ctrl+Shift* trick. 263 00:14:00,646 --> 00:14:01,894 But there's a lot to take on, 264 00:14:01,894 --> 00:14:04,210 so they're taking on Q numbers and P numbers, 265 00:14:04,210 --> 00:14:08,240 they've seen the triple format, and they've seen *Select,* 266 00:14:08,240 --> 00:14:11,338 but they've got to apply this all in one go. 267 00:14:11,338 --> 00:14:14,538 So I'll give people a task. 268 00:14:14,538 --> 00:14:17,299 Some will get it immediately, some will struggle 269 00:14:17,299 --> 00:14:18,896 because they missed a bit of discussion, 270 00:14:18,896 --> 00:14:22,866 or more often, because they're familiar with another kind of database system, 271 00:14:22,866 --> 00:14:25,490 and they have particular expectations from that. 272 00:14:26,890 --> 00:14:30,656 So I set bonus things or more complicated things 273 00:14:30,656 --> 00:14:31,874 if people are getting bored. 
Or I say, "If you get bored and work on an entirely different question, that's fine, but show me." I run through the task in front of them, tell them to do it while showing just hints of which properties they'll need, and then run through it again. Then we go through the cycle of adding extra things to enhance the query. After a query, I might say, "Here's how you add an optional property," and then give them a task that involves one. In the Bodleian, I say, "Find manuscripts in Latin." At a public event at the University of Bristol, where lots of celebrities studied, people who studied there made a good example.
Going to the interface, there's still a hump in the learning curve. They've got to put the query into action, think in this language, and look up Q numbers and P numbers. Then there are all the things they can do with a query once they've written it: the visualization options, the bookmarking, getting the data out. So I suggest refinements.
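For the "optional property" step, two things help: the SPARQL text for a Latin-manuscripts task, and what `OPTIONAL` does to the results. The Wikidata IDs below are as I understand them (P31 instance of, Q87167 manuscript, P407 language of work or name, Q397 Latin, P18 image), so confirm each one with the query service's hover and autocomplete before relying on them. The join functions and the toy data are hypothetical, and they only imitate OPTIONAL's left-join behaviour.

```python
LATIN_MANUSCRIPTS = """SELECT ?ms ?msLabel ?image WHERE {
  ?ms wdt:P31 wd:Q87167 ;
      wdt:P407 wd:Q397 .
  OPTIONAL { ?ms wdt:P18 ?image . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""


def left_join(rows, facts, subject_var, prop, value_var):
    """OPTIONAL in miniature: keep every row and add the optional value when
    it exists, leaving it unbound (None) otherwise. A row with several
    values for the property is repeated once per value."""
    out = []
    for row in rows:
        values = [o for s, p, o in facts if s == row[subject_var] and p == prop]
        if values:
            out.extend({**row, value_var: v} for v in values)
        else:
            out.append({**row, value_var: None})
    return out


def inner_join(rows, facts, subject_var, prop, value_var):
    """The same pattern without OPTIONAL: rows lacking the value are dropped."""
    return [r for r in left_join(rows, facts, subject_var, prop, value_var)
            if r[value_var] is not None]


# Hypothetical toy data: two manuscripts, only one of which has an image.
manuscripts = [{"?ms": "MS_A"}, {"?ms": "MS_B"}]
facts = [("MS_A", "image", "ms_a.jpg")]

print(left_join(manuscripts, facts, "?ms", "image", "?image"))   # MS_B kept, image None
print(inner_join(manuscripts, facts, "?ms", "image", "?image"))  # MS_B dropped
```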
We can take a succession of steps: get people doing a query, then take it up to the next level, for example, "Find landscape paintings taller than they are wide." Within the two hours, we get people writing basic queries and adding refinements to them. We don't do much filtering, but we start to introduce measurements and so on. We don't get into qualifiers or the next level, though in a whole-day session you probably could.
Inevitably, someone asks, "Where else can I use the SPARQL language?" I point out that this is itself a question, and questions can be framed in SPARQL and put to Wikidata to get answers. There is a Wikidata property called *SPARQL endpoint.* So when they ask, that becomes their task, and they get a list of institutions that have SPARQL endpoints.
It's worth pointing out that in an introductory session on other computer languages, people typically learn how to do loops, functions, and conditionals. They learn the basic grammar, but they don't make anything fantastic and useful.
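Two of the tasks above can be sketched as query text: "landscape paintings taller than they are wide" and "who has a SPARQL endpoint?" The IDs are as I understand them, so verify them in the query service before use: Q3305213 painting, P136 genre, Q191163 landscape art, P2048 height, P2049 width, P5305 SPARQL endpoint. The `taller_than_wide` function and its rows are hypothetical. They isolate what the `FILTER` line does.

```python
TALL_LANDSCAPES = """SELECT ?painting ?paintingLabel ?height ?width WHERE {
  ?painting wdt:P31 wd:Q3305213 ;
            wdt:P136 wd:Q191163 ;
            wdt:P2048 ?height ;
            wdt:P2049 ?width .
  FILTER(?height > ?width)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""

SPARQL_ENDPOINTS = """SELECT ?thing ?thingLabel ?endpoint WHERE {
  ?thing wdt:P5305 ?endpoint .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""


def taller_than_wide(rows):
    """The FILTER step on its own: keep rows whose height exceeds their width.
    The numbers carry no units, so the comparison only makes sense when
    height and width were recorded in the same unit."""
    return [label for label, height, width in rows if height > width]


# Hypothetical rows: (label, height, width).
rows = [("Painting A", 120.0, 80.0), ("Painting B", 60.0, 90.0), ("Painting C", 100.0, 100.0)]
print(taller_than_wide(rows))  # ['Painting A']
```

The unit caveat also applies to the real query. As I understand it, `wdt:` quantity values are bare amounts, so a painting recorded in inches for one dimension and centimetres for the other would be compared incorrectly.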
But in an introductory session on Wikidata SPARQL, someone interested in German literature can make a map of the birthplaces of German poets, and so on. So we get feedback like this, which shows how great the Wikidata Query Service is as an educational tool. "What is this sorcery?" isn't even from someone in the room. A trainee made a map and emailed it to her colleagues, who wrote back, "What is this sorcery!? How have you made this?" They just weren't expecting it.
People don't expect to look at a picture of a cute dog. They don't expect a role play where they represent their family and query each other. They don't expect to make something concrete that they can take away as a link and show to their colleagues. Because all of this is unexpected, it's memorable, and it makes them want to go away and tell other people about it. It's not your run-of-the-mill IT training.
The lower quote is from a researcher who saw that he could make a map of famous people who share his first name, and another of famous people who share his wife's first name.
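The German-poets map mentioned above could be sketched as follows. The IDs are as I understand them (P106 occupation, Q49757 poet, P27 country of citizenship, Q183 Germany, P19 place of birth, P625 coordinate location). `#defaultView:Map` tells the web interface to draw a map. If you handle the results yourself, note that coordinates come back as WKT points with longitude first. The sample response below is illustrative only, with its header trimmed to the variables used. It is not real query output.

```python
import re

GERMAN_POETS_MAP = """#defaultView:Map
SELECT ?poet ?poetLabel ?coord WHERE {
  ?poet wdt:P106 wd:Q49757 ;
        wdt:P27 wd:Q183 ;
        wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""

# WKT points list longitude first, then latitude: Point(lon lat).
POINT = re.compile(r"Point\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)")


def map_points(results):
    """Pull (label, latitude, longitude) out of SPARQL JSON results."""
    points = []
    for row in results["results"]["bindings"]:
        m = POINT.search(row["coord"]["value"])
        if m:
            lon, lat = float(m.group(1)), float(m.group(2))
            points.append((row["poetLabel"]["value"], lat, lon))
    return points


# Illustrative response in the SPARQL JSON results shape; not real query output.
SAMPLE = {
    "head": {"vars": ["poetLabel", "coord"]},
    "results": {"bindings": [
        {"poetLabel": {"type": "literal", "value": "Example poet 1"},
         "coord": {"type": "literal", "value": "Point(8.68 50.11)"}},
        {"poetLabel": {"type": "literal", "value": "Example poet 2"},
         "coord": {"type": "literal", "value": "Point(-1.25 51.75)"}},
    ]},
}

print(map_points(SAMPLE))  # [('Example poet 1', 50.11, 8.68), ('Example poet 2', 51.75, -1.25)]
```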
Then he kept having more ideas for things, charts, and so on that he was going to create with Wikidata, and he was glad to say, "You've destroyed my productivity for the next month."
So that's my recommendation. I think we can take it as a positive and go beyond training people about Wikidata to training people about data. The stuff that came up in the keynote this morning, making people literate about ideas of representation and starting them off so they can take part in that discussion, involves this [inaudible]. It doesn't have to be workplace training. It could be a public event that gets people familiar with these technologies.
But I'll stop there for discussion. Like I say, this is respectfully submitted to people in the room who do SPARQL training a different way, and I hope it's useful to you.
(audience applause)
(Dan) Okay, are there any questions?
(man) Hi, it's [Mohammed Hijah] from Palestine. Thank you for the session. I was wondering whether there are resources we can use to learn the SPARQL language professionally?
(Martin) I've got the SPARQL book, the O'Reilly book. I find the Wikibook on SPARQL really, really useful. It's the most useful and accessible reference. The tutorials on Wikidata itself vary in quality.
(Mohammed) I think those are for beginners. I can handle SPARQL at the beginner level, but I want to deal with it professionally.
(Martin) My concern is to get as many people as possible across the threshold: aware of how this works and dabbling. I'd like a deeper course that goes into more of how it works, such as qualifiers and references. In a professional context, you're probably aiming at people using a particular SPARQL endpoint, and Wikidata has some customizations. We've discussed on Twitter that some things we use actually aren't standard SPARQL. They're more like optimizations.
In a professional context, I'd hope the training would be tailored to that particular dataset and endpoint, but there's no demand for that yet. Like I said, the people I deal with know about linked open data and have heard it's a good thing, but they haven't yet seen an example they can apply to their work, so they're not enthusiastic about it yet. I want to get my whole workplace, other workplaces, and developers across that threshold, to the point where they demand that kind of in-depth training, such as how to use an endpoint in a library.
(Mohammed) Thank you.
(woman) It's just a question. I really liked that, thank you so much. Is it documented step-by-step anywhere?
(Martin) I can share my succession of tasks. It's very much tailored to where I'm presenting. Like I said, with librarians I start with manuscripts and go on from there. You want to end up with people asking the question they had in their heads when they came to the event. So there's an order: querying with one triple, then with multiple triples, then with an optional triple, then with a measurement in a filter, and so on. And, yeah, I can share...
411 00:22:22,438 --> 00:22:24,338 Yeah, I'll share a separate set of slides 412 00:22:24,338 --> 00:22:25,421 for those exercises. 413 00:22:25,421 --> 00:22:27,379 (woman) Thank you so much, because I will take that 414 00:22:27,379 --> 00:22:29,783 and customize it for my own needs. Thank you. 415 00:22:31,010 --> 00:22:33,095 (Dan) Okay. No questions? 416 00:22:34,953 --> 00:22:38,994 (man) What would you recommend if you also want to teach editing, 417 00:22:38,994 --> 00:22:41,595 apart from just querying? 418 00:22:46,968 --> 00:22:53,476 I'm pleased to report that people find Wikidata editing, 419 00:22:53,476 --> 00:22:56,632 when I demonstrate it, to be so simple 420 00:22:56,632 --> 00:22:58,943 that it just takes them by surprise. 421 00:22:58,943 --> 00:23:01,568 "It's Wikidata editing, and I've got to add knowledge 422 00:23:01,568 --> 00:23:03,018 to this huge knowledge base." 423 00:23:03,018 --> 00:23:05,435 Sounds like something only really technical people can do. 424 00:23:05,435 --> 00:23:08,524 And then you show it, and they go, "Oh, right. 425 00:23:08,524 --> 00:23:11,096 Martin is instance of human." 426 00:23:13,296 --> 00:23:18,851 So I haven't done that systematically yet. 427 00:23:21,498 --> 00:23:26,007 I think a precondition would be getting people thinking in triples, 428 00:23:26,007 --> 00:23:29,675 and maybe underline that triples need references, 429 00:23:29,675 --> 00:23:34,237 and triples need qualifiers, and that there can be multiple triples 430 00:23:34,237 --> 00:23:37,442 with conflicting values. 431 00:23:37,442 --> 00:23:39,949 So I'd still do the toy world, 432 00:23:39,949 --> 00:23:45,149 maybe a more professionally relevant toy world, and the translation exercise, 433 00:23:45,149 --> 00:23:48,222 but then go to, "So now the exercise we're going to do with triples 434 00:23:48,222 --> 00:23:49,661 is adding them."
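The "toy world" and "thinking in triples" idea can be made concrete in a few lines of code: every fact is one row of exactly three columns, and a query is a list of patterns whose `?variables` must bind to the same value across rows, which is roughly what a SPARQL basic graph pattern does. This is a minimal sketch under stated assumptions: apart from "Martin is instance of human", the facts in `TOY_WORLD` are invented for illustration, and `match_triple` and `query` are hypothetical helper names, not part of any library.

```python
# Hypothetical toy world: every fact is one (subject, predicate, object) row.
# The facts are invented for illustration only.
TOY_WORLD = [
    ("Martin", "instance of", "human"),
    ("Martin", "works for", "Example Library"),
    ("Example Library", "instance of", "library"),
    ("Example Library", "located in", "Oxford"),
    ("Dan", "instance of", "human"),
]


def match_triple(pattern, triple, binding):
    """Unify one pattern with one triple.

    Terms starting with "?" are variables. Returns an extended copy of
    `binding`, or None if the triple contradicts the pattern or binding.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None  # variable is already bound to something else
            extended[term] = value
        elif term != value:
            return None  # a fixed term does not match this row
    return extended


def query(patterns, data=TOY_WORLD):
    """Join several triple patterns, like the body of a SPARQL WHERE clause."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    # "Who is an instance of human?"
    print(query([("?who", "instance of", "human")]))
    # prints [{'?who': 'Martin'}, {'?who': 'Dan'}]

    # Two patterns share ?org, so they are joined on it.
    print(query([("?who", "works for", "?org"), ("?org", "located in", "?place")]))
    # prints [{'?who': 'Martin', '?org': 'Example Library', '?place': 'Oxford'}]
```

The toy deliberately has no references or qualifiers. Wikidata's RDF model handles those by giving each statement its own node, reachable in queries through the `p:`, `ps:` and `pq:` prefixes, which is a natural next step once learners are comfortable with plain triples.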
435 00:23:51,561 --> 00:23:54,522 There's a lot of work done, and maybe Jason's done, 436 00:23:54,522 --> 00:23:58,402 with guessing a table of identifiers. 437 00:23:58,402 --> 00:23:59,581 So, something I'd like to do: 438 00:23:59,581 --> 00:24:03,710 there's an online database 439 00:24:03,710 --> 00:24:06,710 of people who've won a Rhodes Scholarship. 440 00:24:06,710 --> 00:24:10,616 It's a scholarship to Oxford University for students from other countries. 441 00:24:10,616 --> 00:24:12,221 But it's not in Wikidata yet. 442 00:24:12,221 --> 00:24:14,381 So you can kind of divide up the room and say, 443 00:24:14,381 --> 00:24:16,595 "You're going to find these people in Wikidata 444 00:24:16,595 --> 00:24:18,874 and your task is to add the scholarship 445 00:24:18,874 --> 00:24:21,106 with a reference to this online database." 446 00:24:21,106 --> 00:24:23,449 And then you can do a query to see how many have been added 447 00:24:23,449 --> 00:24:25,545 in that session. 448 00:24:25,545 --> 00:24:28,246 So I think, with all the training I do, 449 00:24:28,246 --> 00:24:31,582 I think the comprehension is more important 450 00:24:31,582 --> 00:24:33,554 than taking action immediately. 451 00:24:33,554 --> 00:24:35,543 So when I'm training people on Wikipedia, 452 00:24:35,543 --> 00:24:39,514 I first show them article histories, contribution records, talk pages, 453 00:24:39,514 --> 00:24:44,800 the quality scale, so they're comprehending the process before they edit 454 00:24:44,800 --> 00:24:47,439 and actually change something. 455 00:24:49,939 --> 00:24:52,636 (man) Not really a question, but a comment.
456 00:24:52,636 --> 00:24:58,570 There is, for beginners, a good tutorial on YouTube, 457 00:24:58,570 --> 00:25:01,423 *How to Query and Start with SPARQL,* 458 00:25:01,423 --> 00:25:04,421 and if you want to go deeper, also, 459 00:25:04,421 --> 00:25:08,521 *How to Add Data with OpenRefine.* 460 00:25:08,521 --> 00:25:12,621 And I've also made some videos 461 00:25:12,621 --> 00:25:15,121 and uploaded them in German. 462 00:25:15,121 --> 00:25:16,916 Oh, great! Thanks. 463 00:25:17,894 --> 00:25:21,823 I should also mention that Hilary Thorsen, who's from Stanford Library, 464 00:25:21,823 --> 00:25:25,076 did, last week, a really good video capture 465 00:25:25,076 --> 00:25:28,857 of adding a data set to Wikidata with OpenRefine. 466 00:25:28,857 --> 00:25:33,529 This is for the LD4P, the Linked Data for Production project, 467 00:25:33,529 --> 00:25:35,932 and that was a really good video tutorial 468 00:25:35,932 --> 00:25:38,392 I'd recommend to anybody for-- 469 00:25:38,392 --> 00:25:42,426 That's the next couple of levels up from what I'm doing. 470 00:25:43,189 --> 00:25:45,029 (Dan) Is there a last question? 471 00:25:49,486 --> 00:25:52,203 (man) So SPARQL's sort of SQL-ish. 472 00:25:52,203 --> 00:25:54,856 If someone walked into your tutorial with an SQL background, 473 00:25:54,856 --> 00:25:57,291 is that a blessing or a curse? 474 00:25:57,291 --> 00:26:00,164 It's a bit of a curse, because I had to learn SQL, 475 00:26:00,164 --> 00:26:03,398 so I did the... 476 00:26:03,398 --> 00:26:09,498 generate the invoices using SQL for your fictitious company, 477 00:26:09,498 --> 00:26:14,369 and definitely had to unlearn an SQL way of thinking about things 478 00:26:14,369 --> 00:26:15,712 to get to SPARQL. 479 00:26:15,712 --> 00:26:17,638 But it was freeing, it was freeing. 480 00:26:17,638 --> 00:26:21,302 Databases without built-in schemas are liberating.
481 00:26:22,102 --> 00:26:24,042 When you think about how many columns there are, 482 00:26:24,042 --> 00:26:25,727 and it's this number of columns for a book, 483 00:26:25,727 --> 00:26:27,638 and it's this number of columns for the address, 484 00:26:27,638 --> 00:26:28,984 and here it's just three columns. 485 00:26:28,984 --> 00:26:31,406 Well, three and a bit more. 486 00:26:31,406 --> 00:26:34,443 That's really liberating. 487 00:26:34,443 --> 00:26:36,814 So that's the point I kind of glanced at, 488 00:26:36,814 --> 00:26:41,810 that people make different progress in these workshops, as in all training, 489 00:26:41,810 --> 00:26:43,869 but it's not like intelligent versus dumb; 490 00:26:43,869 --> 00:26:46,588 it's more that the preconceptions you're coming with 491 00:26:46,588 --> 00:26:47,823 are the obstacle. 492 00:26:47,823 --> 00:26:50,242 So it's actually more-- 493 00:26:50,242 --> 00:26:55,655 I'm more optimistic about training people who have never encountered databases, 494 00:26:55,655 --> 00:26:58,805 coding, or any of that before, than... 495 00:26:58,805 --> 00:27:02,232 The worst people to try and train are linked data experts, 496 00:27:02,232 --> 00:27:04,631 because they've used DBpedia a lot. 497 00:27:04,631 --> 00:27:07,180 They've used a particular approach to querying 498 00:27:07,180 --> 00:27:08,834 and expect to get certain things, 499 00:27:08,834 --> 00:27:12,429 and it looks odd when Wikidata does things differently. 500 00:27:12,429 --> 00:27:14,540 And they need to get with the program. 501 00:27:15,205 --> 00:27:17,867 (Dan) Okay, let's thank Martin for his insights. 502 00:27:17,867 --> 00:27:18,884 Thanks very much. 503 00:27:18,884 --> 00:27:21,888 (audience applause)